What would you like to happen?
In Beam's latest release 2.74.0, Hash Distribution Mode and AutoSharding for ICEBERG IO WRITES were enabled via this PR 38061.
This PR introduced a new transform WriteToPartitions (file WriteToPartitions.java, new in 2.74.0) and this file uses GroupIntoBatches Transform which is not supported in Spark Runners as Spark Runner does not support @OnWindowExpiration annotation (States and Timers) which is why i m unable to use the new feature in Spark Runner.
When i use SparkStructuredStreamingRunner, the code fails at this line
When i use SparkRunner, the code fails at this line
Is there a workaround or solve for Spark Runner Users to use the "hash" distribution mode with "autoSharding" for Iceberg Writes using Beam.
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
What would you like to happen?
In Beam's latest release 2.74.0, Hash Distribution Mode and AutoSharding for ICEBERG IO WRITES were enabled via this PR 38061.
This PR introduced a new transform WriteToPartitions (file WriteToPartitions.java, new in 2.74.0) and this file uses GroupIntoBatches Transform which is not supported in Spark Runners as Spark Runner does not support @OnWindowExpiration annotation (States and Timers) which is why i m unable to use the new feature in Spark Runner.
When i use SparkStructuredStreamingRunner, the code fails at this line
When i use SparkRunner, the code fails at this line
Is there a workaround or solve for Spark Runner Users to use the "hash" distribution mode with "autoSharding" for Iceberg Writes using Beam.
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components