
CometNativeException: "arrays of different length" when using to_date on Iceberg Timestamp column #3255


Description

@boudica-dev-eng

Describe the bug

I am encountering a CometNativeException when performing standard date transformations (to_date or datediff) on a Timestamp column read from an Iceberg table.

The error "Cannot perform binary operation on arrays of different length" is raised even though the table schema contains no ArrayType columns (only scalar columns).

The issue appears to be related to how Comet vectorises the Timestamp column, possibly involving dictionary encoding in the underlying Parquet files.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage X failed 4 times...
Caused by: org.apache.comet.CometNativeException: Compute error: Cannot perform binary operation on arrays of different length
    at org.apache.comet.Native.executePlan(Native Method)
    ...
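One way to check the dictionary-encoding hypothesis is to inspect the Parquet column-chunk metadata of one of the Iceberg data files with pyarrow; a minimal sketch (the file path is a hypothetical example):

import pyarrow.parquet as pq

# Hypothetical path to one of the Parquet data files backing db.table
pf = pq.ParquetFile("/warehouse/db/table/data/part-00000.parquet")
meta = pf.metadata
for rg in range(meta.num_row_groups):
    for ci in range(meta.row_group(rg).num_columns):
        col = meta.row_group(rg).column(ci)
        if col.path_in_schema == "ts":
            # e.g. ('PLAIN_DICTIONARY', 'PLAIN', 'RLE') indicates dictionary-encoded pages
            print(f"row group {rg}: {col.encodings}")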

With Comet disabled, Spark executes the same job without errors.
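A sketch of how Comet can be toggled off for this comparison (config keys are from the Comet documentation; whether they can be flipped per session or must be set at startup may depend on the Comet version):

# Fall back to vanilla Spark execution while keeping the session otherwise unchanged
spark.conf.set("spark.comet.exec.enabled", "false")
# or disable Comet entirely:
spark.conf.set("spark.comet.enabled", "false")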

Steps to reproduce

  1. Read an Iceberg table containing a TIMESTAMPTZ column.
  2. Apply F.to_date() to the timestamp column.
  3. Trigger an action (e.g., .count() or a write).
from pyspark.sql import functions as F

# Schema is simple: id (String), ts (Timestamp) - no ArrayType columns present
df = spark.read.format("iceberg").load("db.table")

# This crashes Comet:
df.withColumn("date_col", F.to_date(F.col("ts"))).count()

# This ALSO crashes Comet:
df.withColumn("diff", F.datediff(F.current_date(), F.col("ts"))).count()

Expected behavior

The date_col column is added with DateType values and the job completes successfully, as it does with Comet disabled.
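For comparison, a minimal sketch of the expected result using an in-memory DataFrame, which should not go through the Comet/Iceberg scan path (sample values are made up):

from pyspark.sql import functions as F

sample = spark.createDataFrame([("1", "2024-01-15 10:30:00")], ["id", "ts_str"]) \
    .withColumn("ts", F.col("ts_str").cast("timestamp"))

sample.withColumn("date_col", F.to_date(F.col("ts"))).printSchema()
# root
#  |-- id: string (nullable = true)
#  |-- ts_str: string (nullable = true)
#  |-- ts: timestamp (nullable = true)
#  |-- date_col: date (nullable = true)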

Additional context

  • Comet version: built from ea26629
  • Spark version: 4.0.1 (Scala 2.13), via Spark Connect on Kubernetes (no Python/PySpark)
