Open
Labels
bug (Something isn't working)
Description
Describe the bug
I am encountering a CometNativeException when applying standard date transformations (to_date or datediff) to a Timestamp column read from an Iceberg table.
The error message "Cannot perform binary operation on arrays of different length" is raised even though the table schema contains no ArrayType columns, only scalar types.
The issue appears to be related to how Comet handles the vectorisation of the Timestamp column, possibly involving dictionary encoding in the underlying Parquet files.
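For context on why the message mentions "arrays" when the schema has none: in a columnar engine, each column of a batch is itself an array, and an element-wise kernel needs both inputs to have the same number of rows. The sketch below is plain Python, not Comet or Arrow code. The function name `binary_op` and the sample values are hypothetical, and the dictionary-related failure mode is only the suspicion stated above, not a confirmed cause.

```python
from operator import sub


def binary_op(left, right, op):
    """Apply op element-wise, the way a columnar kernel would (illustrative only)."""
    if len(left) != len(right):
        # Same wording as the error in this report, raised here by our own check.
        raise ValueError("Cannot perform binary operation on arrays of different length")
    return [op(a, b) for a, b in zip(left, right)]


# One batch of 5 rows; "ts" is shown as days since epoch to keep the example small.
ts_days = [20000, 20001, 20000, 20002, 20001]
today = [20010] * len(ts_days)  # a scalar expanded to the batch length

ok = binary_op(today, ts_days, sub)  # lengths match, so this succeeds

# Hypothetical failure mode: the column is dictionary-encoded (3 distinct values,
# 5 indices) and the kernel receives the dictionary values instead of decoded rows.
dictionary = [20000, 20001, 20002]
indices = [0, 1, 0, 2, 1]
decoded = [dictionary[i] for i in indices]  # what the kernel should have received

try:
    binary_op(today, dictionary, sub)  # 5 rows vs 3 dictionary values
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

If something like this is happening, the mismatch would be between batch lengths inside the native plan, which would explain why no ArrayType column is needed to trigger it.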
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage X failed 4 times...
Caused by: org.apache.comet.CometNativeException: Compute error: Cannot perform binary operation on arrays of different length
at org.apache.comet.Native.executePlan(Native Method)
...
When Comet is disabled, Spark executes the job flawlessly.
Steps to reproduce
- Read an Iceberg table containing a TIMESTAMPTZ column.
- Apply F.to_date() to the timestamp column.
- Trigger an action (e.g., .count() or a write).
from pyspark.sql import functions as F

# `spark` is an existing SparkSession with Comet enabled.
# Schema is simple: id (String), ts (Timestamp) - No Arrays present
df = spark.read.format("iceberg").load("db.table")
# This crashes Comet:
df.withColumn("date_col", F.to_date(F.col("ts"))).count()
# This ALSO crashes Comet:
df.withColumn("diff", F.datediff(F.current_date(), F.col("ts"))).count()
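To confirm that the failure is specific to Comet, the same transformation can be run once with Comet off and once with it on. This is a minimal sketch: the helper names `comet_conf`, `run_repro` and `compare` are hypothetical, and it assumes `spark.comet.enabled` can be changed on a live session. If that does not hold in your deployment, set it at session start instead.

```python
def comet_conf(enabled: bool) -> dict:
    """Session settings that turn Comet on or off (assumed to apply at runtime)."""
    return {"spark.comet.enabled": "true" if enabled else "false"}


def run_repro(spark, table="db.table"):
    """Run the failing to_date transformation and return the row count."""
    # Imported lazily so comet_conf() stays usable without PySpark installed.
    from pyspark.sql import functions as F

    df = spark.read.format("iceberg").load(table)
    return df.withColumn("date_col", F.to_date(F.col("ts"))).count()


def compare(spark, table="db.table"):
    """Run the repro with Comet disabled, then enabled, and collect the outcomes."""
    results = {}
    for enabled in (False, True):
        for key, value in comet_conf(enabled).items():
            spark.conf.set(key, value)
        try:
            results[enabled] = ("ok", run_repro(spark, table))
        except Exception as exc:  # report the failure instead of aborting the comparison
            results[enabled] = ("failed", type(exc).__name__)
    return results
```

Calling `compare(spark)` on the affected table would be expected to report success for the disabled run and a failure for the enabled run if the behaviour described here reproduces.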
Expected behavior
The new column is added as a date and the job completes, as it does when Comet is disabled.
Additional context
- Comet version: built from ea26629
- Spark version: 4.0.1_2.13 (Spark Connect, Kubernetes, no Python/PySpark)