blog: Apache DataFusion Comet 0.17.0 release post#198

Merged
mbutrovich merged 20 commits into apache:main from andygrove:blog-comet-0.17.0
Jun 22, 2026

@andygrove

Member

Draft release announcement for Apache DataFusion Comet 0.17.0.

The post is structured as a standard PMC release announcement, with the focus on this cycle's headline work:

  • Arrow-native framing: leads with the "Arrow-native end to end" value proposition and the Rust-implemented vs JVM-implemented distinction, following the nomenclature direction in "Establish nomenclature style guide and audit operator names for clarity" (datafusion-comet#4419).
  • JVM codegen dispatch: the maturation of the codegen dispatcher (Scala/Java UDFs enabled by default, 100% Spark-compatible regex and JSON, additional scalar/text functions, and the Incompatible-to-dispatch change).
  • Expanded expression coverage: more than 120 Spark expressions newly supported since 0.16.0, curated by category.
  • Performance: removing the Arrow FFI round trip and per-batch deep copy from the native shuffle write path, Arrow vector buffer-address caching, and GroupsAccumulator for statistical aggregates.
  • Correctness: the cross-version expression audits, framed as preparation toward a future 1.0.0 release.

Opening as a draft because the following still need to be finalized at release time (flagged with a TODO comment in the post):

  • Release date (filename and front matter currently use a placeholder)
  • PR count and contributor count (to be confirmed against the generated dev/changelog/0.17.0.md)
  • The expressions reference, installation, and changelog links assume the standard release-doc publishing

Feedback welcome on scope: a few other 0.17.0 themes (Iceberg RewriteDataFiles native scan, NullType shuffle support, pluggable S3 credentials) were intentionally left out to keep the focus tight.

Draft release announcement for Comet 0.17.0, focused on the JVM codegen dispatcher, expanded expression coverage, and the native-shuffle FFI round-trip removal, told through the Arrow-native framing. Stats and date are placeholders to finalize at release.
@andygrove changed the title from "blog: Apache DataFusion Comet 0.17.0 release post" to "blog: Apache DataFusion Comet 0.17.0 release post [WIP]" on Jun 11, 2026
@mbutrovich

Contributor

Do we want a section noting that we're starting to discuss criteria for a 1.0.0 release, have a tracking issue for it (apache/datafusion-comet#4082), and are looking for as much user feedback as possible?

andygrove added 10 commits June 12, 2026 07:34

Fill in PR count (192) from the generated changelog and remove the
pre-publish TODO. Expand the codegen dispatch coverage to reflect the
expressions wired up since the draft (collection and higher-order
functions, AES, mask, try_to_number, timezone conversions). Add the
Arrow C Stream Interface input-path change and the native broadcast
nested loop join. Correct Spark 4.1 to 4.1.2. Tighten editorializing
prose throughout.

The post referenced a global spark.comet.expr.allowIncompatible, which
does not exist. The flag is per-expression and lives under the
spark.comet.expression prefix: spark.comet.expression.<name>.allowIncompatible.

Replace raw TPC-H runtimes with the combined improvement: the two FFI
changes improve TPC-DS at 1TB by around 9%.

The 9% is the 0.16.0-to-0.17.0 improvement at 1TB; the FFI changes are
the largest contributor but not the sole cause.
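
The per-expression flag described in the commit notes above can be illustrated with a small helper that builds the configuration key. This is a minimal sketch and not part of the PR: the function name `allow_incompatible_key` and the expression name `SomeExpr` are hypothetical, and setting the value to `true` assumes the flag is a boolean. The only detail taken from the thread is the key pattern `spark.comet.expression.<name>.allowIncompatible`.

```python
def allow_incompatible_key(expr_name: str) -> str:
    """Build the per-expression opt-in key for a given expression name.

    Only the key pattern comes from the thread; which expression names
    Comet actually accepts is not stated there.
    """
    name = expr_name.strip()
    if not name:
        raise ValueError("expression name must not be empty")
    return f"spark.comet.expression.{name}.allowIncompatible"


# "SomeExpr" is a placeholder, not a real Comet expression name.
# The value "true" assumes the flag is boolean.
key = allow_incompatible_key("SomeExpr")
print(f"--conf {key}=true")
```

Building the key from the expression name makes the point of the correction visible: there is no single global switch, so each expression is opted in separately.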
@andygrove marked this pull request as ready for review on June 21, 2026 13:20
@andygrove changed the title from "blog: Apache DataFusion Comet 0.17.0 release post [WIP]" to "blog: Apache DataFusion Comet 0.17.0 release post" on Jun 21, 2026
@mbutrovich

Contributor

Taking another pass this morning.

@mbutrovich mbutrovich left a comment

Contributor

Minor nits. Thanks for driving the blog post, @andygrove! 0.17.0 looks like a good one!

Comment thread content/blog/2026-06-20-datafusion-comet-0.17.0.md Outdated
Comment thread content/blog/2026-06-20-datafusion-comet-0.17.0.md Outdated
andygrove and others added 2 commits June 22, 2026 07:31
Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>
Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>

@mbutrovich mbutrovich left a comment

Contributor

Thanks @andygrove!

@mbutrovich mbutrovich merged commit c1fd23f into apache:main Jun 22, 2026
4 checks passed

4 participants