diff --git a/content/blog/2026-06-12-datafusion-54.0.0.md b/content/blog/2026-06-12-datafusion-54.0.0.md
new file mode 100644
index 00000000..6febb8d2
--- /dev/null
+++ b/content/blog/2026-06-12-datafusion-54.0.0.md
@@ -0,0 +1,442 @@
+---
+layout: post
+title: Apache DataFusion 54.0.0 Released
+date: 2026-06-12
+author: pmc
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+We are proud to announce the release of [DataFusion 54.0.0]. This post highlights
+some of the major improvements since [DataFusion 53.0.0]. Notable additions
+include `LATERAL` joins, SQL lambda functions, and a new Avro reader, alongside
+significant join, scan, and planning performance improvements. The complete list
+of changes is available in the [changelog]. This release represents roughly 11
+weeks of development and 740 commits. Thanks to the [139 contributors]
+(a new record!) for making it possible.
+
+[DataFusion 54.0.0]: https://crates.io/crates/datafusion/54.0.0
+[DataFusion 53.0.0]: https://datafusion.apache.org/blog/2026/04/02/datafusion-53.0.0/
+[changelog]: https://github.com/apache/datafusion/blob/branch-54/dev/changelog/54.0.0.md
+[139 contributors]: https://github.com/apache/datafusion/blob/branch-54/dev/changelog/54.0.0.md#credits
+
+## Performance Improvements 🚀
+
+<img
+src="/blog/images/datafusion-54.0.0/performance_over_time_clickbench.png"
+width="100%"
+class="img-fluid"
+alt="Performance over time"
+/>
+
+**Figure 1**: Average and median normalized execution times for DataFusion 54.0.0 on ClickBench queries, compared to previous releases.
+Query times are normalized using the ClickBench definition. See the
+[DataFusion Benchmarking Page](https://alamb.github.io/datafusion-benchmarking/)
+for more details.
+
+We continue to make significant performance improvements in DataFusion, as
+explained below. This release prunes more redundant work out of plans and
+makes joins, repartitioning, scans, and many built-in functions faster.
+
+### Execution Operator Improvements
+
+**Physical Execution of Uncorrelated Scalar Subqueries**:
+DataFusion previously executed an uncorrelated scalar subquery (one that doesn't
+depend on the outer query) by rewriting it into a join. DataFusion 54 instead
+evaluates it once with a new physical operator. This lets functions use their
+specialized scalar code paths, and allows uncorrelated scalar subqueries in
+`ORDER BY`, `JOIN ON`, and as arguments to aggregate functions.
+Thanks to [@neilconway] for implementing this feature, with reviews from
+[@Dandandan], [@alamb], and [@timsaucer]. Related PRs: [#21240]
+
+**Faster Sort-Merge Joins**:
+Semi, anti, and mark joins now track matches with a per-row bitset instead of
+materializing `(outer, inner)` pairs. Batched deferred filtering makes
+near-unique `LEFT` and `FULL` joins 20-50x faster. Finally, join-key comparisons
+now use a `DynComparator` that resolves the column type once rather than per row,
+making microbenchmarks up to 12% faster and TPC-H ~5% faster overall.
+Thanks to [@mbutrovich] for this work, with reviews from [@Dandandan],
+[@comphead], and [@rluvaton]. Related PRs: [#20806], [#21184], [#21484], [#21517]
+
+**Faster Repartitioning**:
+`RepartitionExec` now coalesces batches before sending them to distributor
+channels, cutting per-batch overhead for up to 50% faster execution on some
+repartition-heavy queries.
+Thanks to [@gabotechs] for this work, with reviews from [@Dandandan] and
+[@alamb]. Related PRs: [#22010]
+
+**Faster Functions and Hashing**:
+DataFusion ships hundreds of built-in functions, so speeding them up pays off
+across many workloads. This release optimizes many, including [array_to_string],
+[array_concat], [array_sort], [split_part], [substr], [strpos], [left], [right],
+[string_agg], and [approx_distinct], plus better `NULL` handling across many
+array and datetime functions. The `first_value` and `last_value` aggregates are
+also substantially faster over `Utf8` and `Binary` columns thanks to a new
+`GroupsAccumulator` ([#21090]). DataFusion 54 also swaps `ahash` for [foldhash] in
+`datafusion-common`, and optimizes [regexp_replace] by stripping trailing `.*`
+from anchored patterns.
+Thanks to the many contributors who drove this work, especially [@UBarney],
+[@neilconway], [@Dandandan], [@zhangxffff], [@lyne7-sc], [@CuteChuanChuan],
+[@kumarUjjawal], and [@coderfender].
+
+### Planner Improvements
+
+**Pruning Functionally Redundant Sort Keys**:
+Sorting is expensive, so it pays to sort by as few columns as possible.
+DataFusion 54 now drops functionally redundant `ORDER BY` keys: when an earlier
+key determines a later one, the later key can't change the ordering, so removing
+it cuts sorting cost without affecting results.
+Thanks to [@xiedeyantu] for implementing this feature, with reviews from
+[@alamb] and [@neilconway]. Related PRs: [#21362]
+
+**Skip Redundant Parquet Filters**:
+When statistics prove a filter matches every row in a Parquet row group,
+DataFusion now skips evaluating it — both row filters and page-level pruning —
+for that row group instead of re-checking each row.
+Thanks to [@xudong963] for implementing this, building on a suggestion from
+[@crepererum]. Related issues and PRs: [#19028], [#21637]
+
+**Statistics-Driven Sort Pushdown and TopK**:
+Files and Parquet row groups are now ordered using statistics, which can avoid
+sorting entirely and improve dynamic filtering and early stopping for TopK
+(`ORDER BY ... LIMIT`) queries. The most promising data is read first, often
+satisfying the `LIMIT` before scanning the rest.
+Thanks to [@zhuqi-lucas] for driving this work, with reviews from [@adriangb].
+Related PRs: [#21182], [#21426], [#21956]
+
+**Improved Statistics and Cardinality Estimation**:
+Good plans depend on good statistics. This release extracts NDV (number of
+distinct values) statistics from Parquet metadata, uses NDV for equality-filter
+selectivity, adds a pluggable [StatisticsRegistry] for operator-level statistics
+propagation, and improves cardinality estimation for semi and anti joins.
+Thanks to [@asolimando], [@jonathanc-n], and [@buraksenn] for driving this work.
+Related PRs: [#19957], [#20789], [#21077], [#21081], [#21483], [#20904]
+
+### Scan Improvements
+
+**Morsel-Driven Parquet Scans**:
+Parquet scan parallelism was previously bounded by the slowest scan thread, so
+data skew (large row groups, less-selective filters, or variable object store
+latency) left cores underutilized. DataFusion 54 reworks the Parquet scan around
+a [morsel-driven design], where idle threads dynamically pull small units of work
+("morsels") instead of each being assigned a fixed partition up front. This
+spreads work more evenly and can be up to ~2x faster for skewed scans such as
+ClickBench.
+Thanks to [@Dandandan], [@alamb], [@adriangb], [@xudong963], and [@zhuqi-lucas]
+for collaborating on this substantial effort. Related issues and PRs: [#20529],
+[#21327], [#21342], [#21351]
+
+**Struct Field Filter Pushdown and Leaf-Level Projection**:
+Filters on struct fields (e.g. `WHERE s['foo'] > 67`) are now pushed down into the
+Parquet decoder rather than evaluated after a full scan, and both filtering and
+projection read only the struct leaves they actually access,
+significantly improving performance for nested and `Variant` data in large
+Parquet files.
+Thanks to [@friendlymatthew] for this work, with reviews from [@adriangb],
+[@cetra3], and [@AdamGS]. Related PRs: [#20822], [#20854], [#20925]
+
+## New Features ✨
+
+### `LATERAL` Joins
+
+Lateral joins have been long requested ([#10048]). DataFusion 54 adds basic
+support for `CROSS JOIN LATERAL`, `INNER JOIN LATERAL`, and `LEFT JOIN LATERAL`
+([#21202], [#21352]). A lateral subquery in the `FROM` clause can reference
+columns from preceding tables — handy for expanding a per-row series or
+correlating against a set-returning function. It uses decorrelation, so the
+subquery is evaluated once rather than re-executed per outer row.
+
+```sql
+-- For each row in t1, expand a series 1..t1_int and join the values back
+SELECT t1_id, t1_name, i
+FROM join_t1 t1
+CROSS JOIN LATERAL (
+    SELECT * FROM unnest(generate_series(1, t1_int))
+) AS series(i);
+```
+
+Thanks to [@neilconway] for implementing this feature, with reviews from
+[@Dandandan], [@alamb], and [@crm26].
+
+### Lambda Functions
+
+DataFusion now supports lambda expressions (`x -> expr`) with column capture,
+plus new higher-order array UDFs like [array_transform], [array_filter], and
+[array_any_match] ([#21323], [#21679]). Lambdas express per-element computation
+directly in SQL:
+
+```sql
+-- Apply `x * 10` to every element
+SELECT array_transform([1, 2, 3, 4, 5], x -> x * 10);
+-- [10, 20, 30, 40, 50]
+
+-- Keep only elements where `x > 2`
+SELECT array_filter([1, 2, 3, 4, 5], x -> x > 2);
+-- [3, 4, 5]
+
+-- True if any element satisfies `x > 2`
+SELECT array_any_match([1, 2, 3], x -> x > 2);
+-- true
+```
+
+Lambdas compose, so you can filter then transform in one expression:
+
+```sql
+-- Keep elements > 2, then multiply each survivor by 10
+SELECT array_transform(array_filter([1, 2, 3, 4, 5], x -> x > 2), x -> x * 10);
+-- [30, 40, 50]
+```
+
+Thanks to [@gstvg] and [@rluvaton] for leading this effort (first prototyped in
+[#18921]), [@ologlogn] and [@LiaCastaneda] for the [array_filter] and
+[array_any_match] functions, and [@benbellick], [@comphead], [@martin-g],
+[@pepijnve], and [@shehabgamin] for reviews.
+
+### Spilling Nested Loop Joins
+
+`NestedLoopJoinExec` previously failed with an out-of-memory error when the build
+side exceeded the memory budget. DataFusion 54 adds a memory-limited path that
+transparently spills to disk and completes the query instead ([#21448]), with
+zero overhead when memory is sufficient. It currently covers `INNER`, `LEFT`,
+`LEFT SEMI`, `LEFT ANTI`, and `LEFT MARK` joins. Thanks to [@viirya] for
+implementing this feature, with reviews from [@2010YOUY01].
+
+### New Avro Reader
+
+The Avro reader now uses the `arrow-avro` crate ([#17861]), replacing internal
+conversion code with a faster, better-maintained implementation shared with the
+Arrow ecosystem (see [Introducing arrow-avro]). Thanks to [@getChan] for this
+work, with reviews from [@adriangb], [@alamb], and [@jecsand838].
+
+[Introducing arrow-avro]: https://arrow.apache.org/blog/2025/10/23/introducing-arrow-avro/
+
+### Extension Type Registry
+
+Arrow extension types let users layer their own semantics on top of a physical
+storage type. DataFusion 54 adds a registry for registering their behavior
+([#18223], [#20312]), several more canonical extension types ([#21291]), and the
+ability to cast to an extension type in logical expressions ([#18136]). Thanks to
+[@tschwarzinger] and [@paleolimbot] for driving this work, with reviews from
+[@adriangb] and [@cetra3].
+
+### Content-Defined Chunking for Parquet
+
+DataFusion's Parquet writer can now use content-defined chunking (CDC)
+([#21110]), which aligns data page boundaries with the data rather than fixed row
+counts. This improves deduplication and incremental storage, since inserting or
+editing a few rows no longer shifts every later page boundary. For background,
+see [Parquet Content-Defined Chunking] and [Improving Parquet Dedupe on Hugging
+Face Hub]. Thanks to [@kszucs] for this feature, with reviews from [@alamb].
+
+[Parquet Content-Defined Chunking]: https://huggingface.co/blog/parquet-cdc
+[Improving Parquet Dedupe on Hugging Face Hub]: https://huggingface.co/blog/improve_parquet_dedupe
+
+### New Functions
+
+**SQL and Scalar Functions**:
+DataFusion 54 adds new scalar functions including [array_compact],
+[cosine_distance], [inner_product], [array_normalize], [cast_to_type], and
+[with_metadata], plus nanosecond `date_part` support ([#20674]) and the `:` JSON
+access operator ([#20628]). The [cosine_distance], [inner_product], and
+[array_normalize] additions round out DataFusion's vector-search building blocks. Thanks to [@comphead], [@crm26],
+[@adriangb], [@mhilton], and [@Samyak2] for these contributions.
+
+**Spark-Compatible Functions**:
+The [datafusion-spark crate] gains many new or improved Spark-compatible
+functions, including [round], [floor], [ceil], [soundex], [xxhash64],
+[array_contains], [array_compact], int/float-to-timestamp casts, and UTF-8
+validation functions.
+Thanks to the contributors who drove this work, especially [@comphead],
+[@coderfender], [@SubhamSinghal], [@kazantsev-maksim], [@andygrove], [@buraksenn],
+[@davidlghellin], [@athlcode], and [@shivbhatia10].
+
+## Upgrade Guide and Changelog 📖
+
+Upgrading to 54.0.0 should be straightforward for most users, though there are
+some breaking changes. See the [Upgrade Guide] for details and
+migration snippets, and the [changelog] for the full list of changes.
+
+## About DataFusion
+
+[Apache DataFusion] is an extensible query engine, written in [Rust], that uses
+[Apache Arrow] as its in-memory format. DataFusion is used by developers to
+create new, fast, data-centric systems such as databases, dataframe libraries,
+and machine learning and streaming applications. While [DataFusion's primary
+design goal] is to accelerate the creation of other data-centric systems, it
+provides a reasonable experience directly out of the box as a [dataframe
+library], [Python library], and [command-line SQL tool].
+
+DataFusion's core thesis is that, as a community, together we can build much
+more advanced technology than any of us as individuals or companies could build
+alone. Without DataFusion, highly performant vectorized query engines would
+remain the domain of a few large companies and world-class research
+institutions. With DataFusion, we can all build on top of a shared foundation
+and focus on what makes our projects unique.
+
+## How to Get Involved
+
+DataFusion is not a project built or driven by a single person, company, or
+foundation. Rather, our community of users and contributors works together to
+build a shared technology that none of us could have built alone.
+
+If you are interested in joining us, we would love to have you. You can try out
+DataFusion on some of your own data and projects and let us know how it goes,
+contribute suggestions, documentation, bug reports, or a PR with documentation,
+tests, or code. A list of open issues suitable for beginners is [here], and you
+can find out how to reach us on the [communication doc].
+
+[Apache DataFusion]: https://datafusion.apache.org/
+[Rust]: https://www.rust-lang.org/
+[Apache Arrow]: https://arrow.apache.org
+[DataFusion's primary design goal]: https://datafusion.apache.org/user-guide/introduction.html#project-goals
+[dataframe library]: https://datafusion.apache.org/user-guide/dataframe.html
+[Python library]: https://datafusion.apache.org/python/
+[command-line SQL tool]: https://datafusion.apache.org/user-guide/cli/
+[Upgrade Guide]: https://datafusion.apache.org/library-user-guide/upgrading.html
+[here]: https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22
+[communication doc]: https://datafusion.apache.org/contributor-guide/communication.html
+
+[@xiedeyantu]: https://github.com/xiedeyantu
+[@zhuqi-lucas]: https://github.com/zhuqi-lucas
+[@neilconway]: https://github.com/neilconway
+[@mbutrovich]: https://github.com/mbutrovich
+[@gabotechs]: https://github.com/gabotechs
+[@Dandandan]: https://github.com/Dandandan
+[@zhangxffff]: https://github.com/zhangxffff
+[@lyne7-sc]: https://github.com/lyne7-sc
+[@CuteChuanChuan]: https://github.com/CuteChuanChuan
+[@kumarUjjawal]: https://github.com/kumarUjjawal
+[@coderfender]: https://github.com/coderfender
+[@gstvg]: https://github.com/gstvg
+[@rluvaton]: https://github.com/rluvaton
+[@getChan]: https://github.com/getChan
+[@tschwarzinger]: https://github.com/tschwarzinger
+[@paleolimbot]: https://github.com/paleolimbot
+[@kszucs]: https://github.com/kszucs
+[@comphead]: https://github.com/comphead
+[@crm26]: https://github.com/crm26
+[@adriangb]: https://github.com/adriangb
+[@mhilton]: https://github.com/mhilton
+[@Samyak2]: https://github.com/Samyak2
+[@SubhamSinghal]: https://github.com/SubhamSinghal
+[@kazantsev-maksim]: https://github.com/kazantsev-maksim
+[@andygrove]: https://github.com/andygrove
+[@buraksenn]: https://github.com/buraksenn
+[@davidlghellin]: https://github.com/davidlghellin
+[@asolimando]: https://github.com/asolimando
+[@jonathanc-n]: https://github.com/jonathanc-n
+[@alamb]: https://github.com/alamb
+[@timsaucer]: https://github.com/timsaucer
+[@martin-g]: https://github.com/martin-g
+[@jecsand838]: https://github.com/jecsand838
+[@cetra3]: https://github.com/cetra3
+[@xudong963]: https://github.com/xudong963
+[@crepererum]: https://github.com/crepererum
+[@viirya]: https://github.com/viirya
+[@2010YOUY01]: https://github.com/2010YOUY01
+[@UBarney]: https://github.com/UBarney
+[@friendlymatthew]: https://github.com/friendlymatthew
+[@AdamGS]: https://github.com/AdamGS
+[@athlcode]: https://github.com/athlcode
+[@shivbhatia10]: https://github.com/shivbhatia10
+[@ologlogn]: https://github.com/ologlogn
+[@LiaCastaneda]: https://github.com/LiaCastaneda
+[@benbellick]: https://github.com/benbellick
+[@pepijnve]: https://github.com/pepijnve
+[@shehabgamin]: https://github.com/shehabgamin
+
+[array_to_string]: https://github.com/apache/datafusion/pull/20639
+[array_concat]: https://github.com/apache/datafusion/pull/20620
+[array_sort]: https://github.com/apache/datafusion/pull/21083
+[split_part]: https://github.com/apache/datafusion/pull/21119
+[substr]: https://github.com/apache/datafusion/pull/21366
+[strpos]: https://github.com/apache/datafusion/pull/20754
+[left]: https://github.com/apache/datafusion/pull/21442
+[right]: https://github.com/apache/datafusion/pull/21442
+[string_agg]: https://github.com/apache/datafusion/pull/21154
+[approx_distinct]: https://github.com/apache/datafusion/pull/21037
+[foldhash]: https://github.com/apache/datafusion/pull/20958
+[regexp_replace]: https://github.com/apache/datafusion/pull/21379
+[array_transform]: https://github.com/apache/datafusion/pull/21679
+[array_filter]: https://github.com/apache/datafusion/pull/21895
+[array_any_match]: https://github.com/apache/datafusion/pull/21903
+[array_compact]: https://github.com/apache/datafusion/pull/21522
+[cosine_distance]: https://github.com/apache/datafusion/pull/21542
+[inner_product]: https://github.com/apache/datafusion/pull/21861
+[array_normalize]: https://github.com/apache/datafusion/pull/22013
+[cast_to_type]: https://github.com/apache/datafusion/pull/21322
+[with_metadata]: https://github.com/apache/datafusion/pull/21509
+
+[round]: https://github.com/apache/datafusion/pull/21062
+[floor]: https://github.com/apache/datafusion/pull/21933
+[ceil]: https://github.com/apache/datafusion/pull/20593
+[soundex]: https://github.com/apache/datafusion/pull/20725
+[xxhash64]: https://github.com/apache/datafusion/pull/21967
+[array_contains]: https://github.com/apache/datafusion/pull/20685
+
+[morsel-driven design]: https://db.in.tum.de/~leis/papers/morsels.pdf
+[StatisticsRegistry]: https://docs.rs/datafusion/latest/datafusion/physical_plan/operator_statistics/struct.StatisticsRegistry.html
+
+[#17861]: https://github.com/apache/datafusion/pull/17861
+[#18921]: https://github.com/apache/datafusion/pull/18921
+[#10048]: https://github.com/apache/datafusion/issues/10048
+[#20674]: https://github.com/apache/datafusion/pull/20674
+[#20628]: https://github.com/apache/datafusion/pull/20628
+[#19028]: https://github.com/apache/datafusion/issues/19028
+[#20529]: https://github.com/apache/datafusion/issues/20529
+[#21327]: https://github.com/apache/datafusion/pull/21327
+[#21342]: https://github.com/apache/datafusion/pull/21342
+[#21351]: https://github.com/apache/datafusion/pull/21351
+[#21448]: https://github.com/apache/datafusion/pull/21448
+[#21637]: https://github.com/apache/datafusion/pull/21637
+[#21090]: https://github.com/apache/datafusion/pull/21090
+[#20822]: https://github.com/apache/datafusion/pull/20822
+[#20854]: https://github.com/apache/datafusion/pull/20854
+[#20925]: https://github.com/apache/datafusion/pull/20925
+[#18223]: https://github.com/apache/datafusion/issues/18223
+[#18136]: https://github.com/apache/datafusion/pull/18136
+[#20312]: https://github.com/apache/datafusion/pull/20312
+[#20789]: https://github.com/apache/datafusion/pull/20789
+[#20806]: https://github.com/apache/datafusion/pull/20806
+[#20904]: https://github.com/apache/datafusion/pull/20904
+[#19957]: https://github.com/apache/datafusion/pull/19957
+[#21077]: https://github.com/apache/datafusion/pull/21077
+[#21081]: https://github.com/apache/datafusion/pull/21081
+[#21110]: https://github.com/apache/datafusion/pull/21110
+[#21182]: https://github.com/apache/datafusion/pull/21182
+[#21184]: https://github.com/apache/datafusion/pull/21184
+[#21202]: https://github.com/apache/datafusion/pull/21202
+[#21240]: https://github.com/apache/datafusion/pull/21240
+[#21291]: https://github.com/apache/datafusion/pull/21291
+[#21323]: https://github.com/apache/datafusion/pull/21323
+[#21352]: https://github.com/apache/datafusion/pull/21352
+[#21362]: https://github.com/apache/datafusion/pull/21362
+[#21426]: https://github.com/apache/datafusion/pull/21426
+[#21483]: https://github.com/apache/datafusion/pull/21483
+[#21484]: https://github.com/apache/datafusion/pull/21484
+[#21517]: https://github.com/apache/datafusion/pull/21517
+[#21679]: https://github.com/apache/datafusion/pull/21679
+[#21956]: https://github.com/apache/datafusion/pull/21956
+[#22010]: https://github.com/apache/datafusion/pull/22010
+
+[datafusion-spark crate]: https://docs.rs/datafusion-spark/latest/datafusion_spark/index.html
diff --git a/content/images/datafusion-54.0.0/performance_over_time_clickbench.png b/content/images/datafusion-54.0.0/performance_over_time_clickbench.png
new file mode 100644
index 00000000..d2f039fd
Binary files /dev/null and b/content/images/datafusion-54.0.0/performance_over_time_clickbench.png differ