You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Note: this plan / investigation was AI generated based on my human direction and guidance.
Problem
datafusion-proto serializes every built-in PhysicalExpr through a single ~300-line downcast_ref chain in serialize_physical_expr_with_converter, with a symmetric match on ExprType in from_proto.rs. Because the serializer lives outside the crate where each expression is defined, every piece of internal state an expression wants to round-trip has to be pub.
The concrete incident that motivated this issue was #21807, where supporting proto round-tripping of DynamicFilterPhysicalExpr required exposing pub struct Inner, pub fn inner(), pub fn from_parts(), pub fn original_children(), pub fn remapped_children() — all marked "warning: not stable; proto-only" in the docstrings. See discussion at #21807 (comment) / #21807 (comment).
The underlying issue is not specific to DynamicFilterPhysicalExpr. Any stateful expression (one whose round-trip state isn't expressible through its normal public constructor) will hit the same wall. It also means:
Built-in expressions use a big central switch; third-party expressions go through PhysicalExtensionCodec::try_encode_expr. Two different shapes for the same thing.
Every new built-in PhysicalExpr requires editing datafusion-proto, which many contributors don't touch for their "real" change.
The .proto schema for each expression lives far away from the expression itself.
Proposal
Two changes, in sequence.
1. Extract datafusion-proto-models
Mirror the existing datafusion-proto-common split, but for the physical/logical plan schemas. The new crate contains only:
the .proto file(s) and the prost-generated Rust types,
optional pbjson/serde derives behind a json feature,
zero datafusion deps beyond datafusion-proto-common.
datafusion-proto keeps its current public API by re-exporting from datafusion-proto-models. Downstream consumers (datafusion-ffi, datafusion-examples, benchmarks) need no changes.
This is a pure refactor — no semantic change, no behavior change.
2. Add PhysicalExpr::to_proto (feature-gated)
Add a method to the PhysicalExpr trait, gated on a new proto feature:
Default returns Ok(None) → "fall through to the existing codec path" (matches today's behavior for extension expressions).
Explicit Ok(Some(...)) → the expression has serialized itself.
Expressions that should serialize but don't implement this override the default with internal_err!("{typename} does not implement to_proto").
PhysicalExprEncoder is a small trait defined in datafusion-proto-models, wrapping the existing PhysicalExtensionCodec + PhysicalProtoConverterExtension plumbing, with helpers for encode_child, encode_udf, encode_udaf, encode_udwf. This keeps physical-expr-common free of datafusion-proto as a dep.
The existing serialize_physical_expr_with_converter tries expr.to_proto(ctx) first; if it returns Ok(None), it falls through to the current downcast chain. This lets expressions migrate one at a time without breaking anything.
What this unlocks
Private state stays private.DynamicFilterPhysicalExpr::to_proto reads the RwLock directly — no more pub struct Inner. Once migrated, proto: serialize and dedupe dynamic filters v2 #21807's "pub for proto" scaffolding (Inner, from_parts, inner(), original_children, remapped_children) can be reverted.
One-file changes. Adding serialization for a new expression is a method impl next to the expression, not a round trip through datafusion-proto.
Parity between built-in and third-party. Everyone uses the same hook shape.
Decode side (follow-up)
The encode-side win is clear. The decode side is still a central match on ExprType, which is fine — oneof ExprType gives us an exhaustive Rust enum, a strict improvement over today's runtime downcast chain.
As a follow-up (probably best after the encode migrations land), we can push decoding back to the expressions too, via associated fns like:
The central match becomes a dispatch table. The main payoff is that public god-constructors for private state go away: DynamicFilterPhysicalExpr::from_proto reads the proto fields and builds Inner internally, so we can fully delete the from_parts/Inner-as-pub scaffolding.
Alternatives considered
Bytes-level try_encode_self hook, no prost in physical-expr. Each expression encodes into an opaque Vec<u8>. Preserves privacy without pulling prost into more crates. Downside: you lose the single .proto file as the schema source of truth — cross-language debuggability and interop suffer. Rejected.
Keep the central switch, mark unstable accessors #[doc(hidden)]. Cheap. Doesn't fix the third-party/built-in asymmetry or the central-edit-per-new-expression problem. Reasonable as a stopgap, not a solution.
Separate PhysicalExprProto trait with a blanket fallback impl. Would avoid adding to PhysicalExpr. Requires a runtime "does it implement this" check (narrow Any cast). Slightly more complex than option (2) above and delivers the same thing.
Scope
This issue proposes the change only for PhysicalExpr. ExecutionPlan, LogicalPlan, and logical Expr have the same shape and would benefit from the same treatment, but each is its own migration (~40 ExecutionPlan impls alone). Worth tackling after the PhysicalExpr path is proven.
datafusion-substrait is unaffected — it operates independently and does not share code with datafusion-proto.
Open questions
Should PhysicalExprEncoder live in datafusion-proto-models or be a narrow sub-trait that PhysicalExtensionCodec re-implements? Leaning toward the former for minimum dep surface.
Window / aggregate expressions are serialized through a separate code path in to_proto.rs and are not regular PhysicalExpr impls. Probably get a parallel WindowExpr::to_proto hook, or stay central. Defer until the core PhysicalExpr migration is done.
proto feature default. I'd default it off on physical-expr-common/physical-expr/physical-plan (so developers who don't care pay nothing) and flip it on from datafusion-proto. CI would test with the feature on; off-mode is a "it still compiles" smoke test.
Plan: split datafusion-proto and add PhysicalExpr::to_proto
Context
Today, datafusion-proto serializes every built-in PhysicalExpr via a ~300-line downcast_ref chain in datafusion/proto/src/physical_plan/to_proto.rs::serialize_physical_expr_with_converter (and a symmetric ExprType match in from_proto.rs). Because the serializer lives outside the crate where each expression is defined, every piece of internal state an expression wants to round-trip must be pub. The motivating example is the DynamicFilterPhysicalExpr PR (#21807), which had to expose pub struct Inner, pub fn inner(), pub fn from_parts(), pub fn original_children(), pub fn remapped_children() — all "warning: not stable; proto-only". The same shape will recur for every stateful expression.
Two problems to solve together:
Private state stays private. Serialization code should live next to the data it serializes.
Central switch is a maintenance sink. 20 production PhysicalExpr impls already; every new one requires editing proto/src/physical_plan/to_proto.rs + from_proto.rs. Third parties have to use PhysicalExtensionCodec with a different shape than built-ins.
Decision (user-approved):
Extract datafusion-proto-models (mirror of datafusion-proto-common): just the prost-generated structs + optional pbjson. No datafusion deps beyond proto-common.
Add fn to_proto(&self, ctx: &ProtoEncodeCtx<'_>) -> Result<PhysicalExprNode> directly on the PhysicalExpr trait, #[cfg(feature = "proto")]. Default impl returns internal_err!("<typename> does not implement proto serialization").
CI tests with the proto feature on; off-mode is a best-effort "it still compiles" signal.
This plan covers PhysicalExpr only. ExecutionPlan, LogicalPlan, and logical Expr have the same shape and can follow once this lands, but each is its own body of work (~40 ExecutionPlan impls alone).
Scope boundaries
In scope: PhysicalExpr encode path, infrastructure for the decode path, migration of all ~18 currently-serialized PhysicalExpr types.
Out of scope for this plan:ExecutionPlan and logical-side refactors, changes to PhysicalExtensionCodec shape for third-party codecs, datafusion-substrait (operates independently; unaffected).
Sequencing
Three PRs, each independently mergeable and reviewable.
PR 1 — Extract datafusion-proto-models
Pure refactor: move generated prost code + proto schema into a new crate. No semantic change. datafusion-proto becomes a thin layer on top.
proto/datafusion.proto — moved from datafusion/proto/proto/datafusion.proto.
gen/src/main.rs + gen/Cargo.toml — moved from datafusion/proto/gen/.
src/lib.rs — re-exports generated types under a protobuf module (same shape as datafusion-proto-common's protobuf_common).
src/generated/prost.rs + src/generated/pbjson.rs — moved from datafusion/proto/src/generated/.
regen.sh — mirrored from datafusion/proto-common/regen.sh.
Modified:
Cargo.toml (workspace root) — register datafusion/proto-models member + workspace dep entry.
datafusion/proto/Cargo.toml — depend on datafusion-proto-models. Remove prost direct dep if fully delegated (keep if datafusion-proto itself still uses prost::Message directly — it does, for DataEncoderTuple, so keep).
datafusion/proto/src/lib.rs — pub use datafusion_proto_models::protobuf::*; (plus keep existing proto_error, FromProtoError, ToProtoError re-exports from proto-common). Maintain the existing public protobuf module so downstream users don't break.
datafusion/proto/regen.sh — update to invoke proto-models regen.
Invariants: public API surface of datafusion-proto is unchanged. datafusion/ffi, datafusion-examples, benchmarks — the only three external consumers — keep compiling without any Cargo.toml change.
Introduce the trait method. Keep the existing downcast chain in place as a fallback so this PR is non-breaking — no expression migrates yet.
Modified:
datafusion/physical-expr-common/Cargo.toml — add optional datafusion-proto-models dep + proto feature. Feature is enabled in datafusion/physical-expr and datafusion/physical-plan when they want serialization (gated through them as well).
#[cfg(feature = "proto")]fnto_proto(&self,_ctx:&datafusion_proto_models::ProtoEncodeCtx<'_>,) -> Result<datafusion_proto_models::protobuf::PhysicalExprNode>{internal_err!("PhysicalExpr `{}` does not implement to_proto; \ implement it or use PhysicalExtensionCodec",
std::any::type_name::<Self>(),)}
New (in datafusion-proto-models or a small sibling crate — TBD during implementation):
ProtoEncodeCtx<'a> — thin context type carrying &'a dyn PhysicalExtensionCodec and a reference to the PhysicalProtoConverterExtension converter, exposing helpers:
fn encode_child(&self, expr: &Arc<dyn PhysicalExpr>) -> Result<PhysicalExprNode> — recurses through the converter (preserves dedup support).
Open question for implementation: ProtoEncodeCtx references PhysicalExtensionCodec, which is defined in datafusion-proto. To avoid a cyclic dep, either:
(a) Move PhysicalExtensionCodec trait (or a stripped-down ProtoCodecExpr sub-trait) into proto-models, or
(b) Keep ProtoEncodeCtx in datafusion-proto and have PhysicalExpr::to_proto take &dyn PhysicalExprEncoder (a narrow trait defined in proto-models) that the real ProtoEncodeCtx implements.
datafusion/proto/src/physical_plan/to_proto.rs::serialize_physical_expr_with_converter — before the downcast chain, try expr.to_proto(&ctx). If it returns the default "not implemented" error (detect via sentinel error kind, or via a Result<Option<PhysicalExprNode>> where Ok(None) means "fall through"), fall through to the existing chain. Going with Ok(None) is cleaner.
Decode path (unchanged structurally): keep parse_physical_expr_with_converter as the central match proto.expr_type in from_proto.rs. Per-expression decoders become associated fns fn try_from_proto(node: &X, ctx: &ProtoDecodeCtx<'_>) -> Result<Arc<Self>> colocated with the type (next PR). The central match stays exhaustive on ExprType, which is a strict improvement over today's runtime downcast chain.
Follow-up for decode side (not blocking): once expressions own both ends, consider formalizing the decoder as a sibling trait so the symmetry with to_proto is explicit — e.g.
// On each concrete PhysicalExpr:implBinaryExpr{fnfrom_proto(proto:&PhysicalBinaryExprNode,ctx:&ProtoDecodeCtx<'_>) -> Result<Arc<Self>>{ ...}}
and the central match in from_proto.rs becomes a dispatch table: ExprType::BinaryExpr(n) => BinaryExpr::from_proto(n, ctx), etc. The key payoff is that no public god-constructor is needed for private state — e.g. DynamicFilterPhysicalExpr::from_proto reads the proto fields and constructs Inner directly, so we can delete from_parts/Inner-as-pub entirely. Doing this after PR 3+ migrations rather than during keeps each PR small.
PR 3+ — Migrate expressions one at a time
One commit per type (or small group). Each migration:
Implement fn to_proto(&self, ctx) -> Result<Option<PhysicalExprNode>> on the expression.
Implement fn try_from_proto(node, ctx) -> Result<Arc<Self>> as an associated fn.
Remove the corresponding downcast_ref::<T>() arm from to_proto.rs.
Change the corresponding arm in from_proto.rs's central match to call Type::try_from_proto.
Roll back any "pub for proto" scaffolding that is no longer needed (e.g., eventually the pub struct Inner + from_parts + inner() bits added in proto: serialize and dedupe dynamic filters v2 #21807 can become private fields, since DynamicFilterPhysicalExpr::to_proto reads the RwLock directly).
ScalarFunctionExpr — involves UDF encoding via ctx.encode_udf(...).
HashExpr, HashTableLookupExpr — lives in datafusion-physical-plan, so enable the proto feature there. HashTableLookupExpr's current "replace with lit(true)" hack (to_proto.rs:277) moves into its own to_proto impl.
Window + aggregate exprs (PlainAggregateWindowExpr, SlidingAggregateWindowExpr, StandardWindowExpr, WindowUDFExpr, AggregateFunctionExpr) — shaped slightly differently (not PhysicalExpr impls in the same way); may need a sibling trait or stay central.
After all in-workspace expressions migrate, the fallback downcast chain in to_proto.rs shrinks to just HashTableLookupExpr's special case (and eventually goes away entirely).
PhysicalExtensionCodec::try_encode_udf/udaf/udwf (datafusion/proto/src/physical_plan/mod.rs:3683–3723) — wrapped by ProtoEncodeCtx helpers, not changed.
PhysicalProtoConverterExtension (datafusion/proto/src/physical_plan/mod.rs:3751) — stays as-is; its physical_expr_to_proto becomes the entry point that calls into the trait method.
DeduplicatingSerializer/DeduplicatingDeserializer (mod.rs:3844+) — unchanged. ProtoEncodeCtx::encode_child routes through them, preserving dedup.
serialize_physical_exprs helper (to_proto.rs:220) — stays, becomes a thin wrapper over to_proto.
Verification
PR 1 (proto-models extraction):
cargo check -p datafusion-proto-models
cargo check -p datafusion-proto
cargo test -p datafusion-proto
cargo test -p datafusion-ffi
cargo check -p datafusion-examples
# Regen to confirm reproducibility:
./datafusion/proto-models/regen.sh && git diff --exit-code
Confirm no pub surface of datafusion-proto changed (cargo public-api or manual diff of src/lib.rs).
PR 2 (trait hook added, no migration):
cargo check -p datafusion-physical-expr-common --features proto
cargo check -p datafusion-physical-expr --features proto
cargo test -p datafusion-proto # all roundtrip tests still pass (fallback path)
cargo test -p datafusion-proto --no-default-features # smoke: still compiles
Also: verify the proto feature is wired into datafusion-proto's dev build so its roundtrip tests go through the trait hook with Ok(None) fallback — behavior identical to today.
PR 3+ (per-expression migration):
For each migrated expression, the existing roundtrip tests in datafusion/proto/tests/cases/roundtrip_physical_plan.rs and roundtrip_physical_expr.rs must continue to pass. Because the central from_proto.rs match stays exhaustive, missing migration arms cause compile errors, not silent regressions.
After DynamicFilterPhysicalExpr migrates: revert the "pub for proto" scaffolding added in #21807 (Inner becomes private, from_parts/inner()/original_children()/remapped_children() removed) and run the existing dynamic-filter roundtrip tests.
End-to-end signal: workspace CI runs ./dev/rust_lint.sh + cargo test --workspace --features proto green; cargo test -p datafusion-proto green on both feature settings.
Risks / open items
PhysicalExprEncoder trait placement (see open question above) — decided during PR 2 implementation. Pre-decision: put it in datafusion-proto-models so physical-expr-common has one dep under proto feature.
Window / aggregate expressions are not regular PhysicalExpr impls and are serialized through a different code path in to_proto.rs (lines 117–151). Migrating them may need a parallel WindowExpr::to_proto or can stay central — defer decision to PR 4+.
ForeignPhysicalExpr (datafusion-ffi) is a PhysicalExpr impl but not serialized today (not in the downcast chain). The default Ok(None) behavior preserves that — nothing to do.
datafusion-ffi depends on datafusion-proto — the re-export surface has to stay stable through PR 1. Verified above.
Feature-flag propagation is the main source of mechanical churn. Keep proto off by default on physical-expr-common/physical-expr/physical-plan; datafusion-proto flips it on. Developers who run cargo test --workspace without features get the no-proto variant; CI should explicitly --features proto for the proto crate's test job.
Note: this plan / investigation was AI generated based on my human direction and guidance.
Problem
datafusion-protoserializes every built-inPhysicalExprthrough a single ~300-linedowncast_refchain inserialize_physical_expr_with_converter, with a symmetricmatchonExprTypeinfrom_proto.rs. Because the serializer lives outside the crate where each expression is defined, every piece of internal state an expression wants to round-trip has to bepub.The concrete incident that motivated this issue was #21807, where supporting proto round-tripping of
DynamicFilterPhysicalExprrequired exposingpub struct Inner,pub fn inner(),pub fn from_parts(),pub fn original_children(),pub fn remapped_children()— all marked "warning: not stable; proto-only" in the docstrings. See discussion at #21807 (comment) / #21807 (comment).The underlying issue is not specific to
DynamicFilterPhysicalExpr. Any stateful expression (one whose round-trip state isn't expressible through its normal public constructor) will hit the same wall. It also means:PhysicalExtensionCodec::try_encode_expr. Two different shapes for the same thing.PhysicalExprrequires editingdatafusion-proto, which many contributors don't touch for their "real" change..protoschema for each expression lives far away from the expression itself.Proposal
Two changes, in sequence.
1. Extract
datafusion-proto-modelsMirror the existing
datafusion-proto-commonsplit, but for the physical/logical plan schemas. The new crate contains only:.protofile(s) and theprost-generated Rust types,pbjson/serdederives behind ajsonfeature,datafusion-proto-common.datafusion-protokeeps its current public API by re-exporting fromdatafusion-proto-models. Downstream consumers (datafusion-ffi,datafusion-examples,benchmarks) need no changes.This is a pure refactor — no semantic change, no behavior change.
2. Add
PhysicalExpr::to_proto(feature-gated)Add a method to the
PhysicalExprtrait, gated on a newprotofeature:Ok(None)→ "fall through to the existing codec path" (matches today's behavior for extension expressions).Ok(Some(...))→ the expression has serialized itself.internal_err!("{typename} does not implement to_proto").PhysicalExprEncoderis a small trait defined indatafusion-proto-models, wrapping the existingPhysicalExtensionCodec+PhysicalProtoConverterExtensionplumbing, with helpers forencode_child,encode_udf,encode_udaf,encode_udwf. This keepsphysical-expr-commonfree ofdatafusion-protoas a dep.The existing
serialize_physical_expr_with_convertertriesexpr.to_proto(ctx)first; if it returnsOk(None), it falls through to the current downcast chain. This lets expressions migrate one at a time without breaking anything.What this unlocks
DynamicFilterPhysicalExpr::to_protoreads theRwLockdirectly — no morepub struct Inner. Once migrated, proto: serialize and dedupe dynamic filters v2 #21807's "pub for proto" scaffolding (Inner,from_parts,inner(),original_children,remapped_children) can be reverted.datafusion-proto.Decode side (follow-up)
The encode-side win is clear. The decode side is still a central
matchonExprType, which is fine —oneof ExprTypegives us an exhaustive Rust enum, a strict improvement over today's runtime downcast chain.As a follow-up (probably best after the encode migrations land), we can push decoding back to the expressions too, via associated fns like:
The central match becomes a dispatch table. The main payoff is that public god-constructors for private state go away:
DynamicFilterPhysicalExpr::from_protoreads the proto fields and buildsInnerinternally, so we can fully delete thefrom_parts/Inner-as-pub scaffolding.Alternatives considered
try_encode_selfhook, noprostinphysical-expr. Each expression encodes into an opaqueVec<u8>. Preserves privacy without pullingprostinto more crates. Downside: you lose the single.protofile as the schema source of truth — cross-language debuggability and interop suffer. Rejected.#[doc(hidden)]. Cheap. Doesn't fix the third-party/built-in asymmetry or the central-edit-per-new-expression problem. Reasonable as a stopgap, not a solution.PhysicalExprPrototrait with a blanket fallback impl. Would avoid adding toPhysicalExpr. Requires a runtime "does it implement this" check (narrowAnycast). Slightly more complex than option (2) above and delivers the same thing.Scope
This issue proposes the change only for
PhysicalExpr.ExecutionPlan,LogicalPlan, and logicalExprhave the same shape and would benefit from the same treatment, but each is its own migration (~40ExecutionPlanimpls alone). Worth tackling after the PhysicalExpr path is proven.datafusion-substraitis unaffected — it operates independently and does not share code withdatafusion-proto.Open questions
PhysicalExprEncoderlive indatafusion-proto-modelsor be a narrow sub-trait thatPhysicalExtensionCodecre-implements? Leaning toward the former for minimum dep surface.to_proto.rsand are not regularPhysicalExprimpls. Probably get a parallelWindowExpr::to_protohook, or stay central. Defer until the corePhysicalExprmigration is done.protofeature default. I'd default it off onphysical-expr-common/physical-expr/physical-plan(so developers who don't care pay nothing) and flip it on fromdatafusion-proto. CI would test with the feature on; off-mode is a "it still compiles" smoke test.Related
datafusiondependency fromdatafusion-proto#17713 (removedatafusiondependency fromdatafusion-proto) — this refactor is consistent with that direction.Full implementation plan
Long Claude generated plan
Plan: split
datafusion-protoand addPhysicalExpr::to_protoContext
Today,
datafusion-protoserializes every built-inPhysicalExprvia a ~300-linedowncast_refchain indatafusion/proto/src/physical_plan/to_proto.rs::serialize_physical_expr_with_converter(and a symmetricExprTypematch infrom_proto.rs). Because the serializer lives outside the crate where each expression is defined, every piece of internal state an expression wants to round-trip must bepub. The motivating example is theDynamicFilterPhysicalExprPR (#21807), which had to exposepub struct Inner,pub fn inner(),pub fn from_parts(),pub fn original_children(),pub fn remapped_children()— all "warning: not stable; proto-only". The same shape will recur for every stateful expression.Two problems to solve together:
PhysicalExprimpls already; every new one requires editingproto/src/physical_plan/to_proto.rs+from_proto.rs. Third parties have to usePhysicalExtensionCodecwith a different shape than built-ins.Decision (user-approved):
datafusion-proto-models(mirror ofdatafusion-proto-common): just theprost-generated structs + optionalpbjson. No datafusion deps beyondproto-common.fn to_proto(&self, ctx: &ProtoEncodeCtx<'_>) -> Result<PhysicalExprNode>directly on thePhysicalExprtrait,#[cfg(feature = "proto")]. Default impl returnsinternal_err!("<typename> does not implement proto serialization").protofeature on; off-mode is a best-effort "it still compiles" signal.This plan covers PhysicalExpr only.
ExecutionPlan,LogicalPlan, and logicalExprhave the same shape and can follow once this lands, but each is its own body of work (~40ExecutionPlanimpls alone).Scope boundaries
PhysicalExprtypes.ExecutionPlanand logical-side refactors, changes toPhysicalExtensionCodecshape for third-party codecs,datafusion-substrait(operates independently; unaffected).Sequencing
Three PRs, each independently mergeable and reviewable.
PR 1 — Extract
datafusion-proto-modelsPure refactor: move generated prost code + proto schema into a new crate. No semantic change.
datafusion-protobecomes a thin layer on top.New crate:
datafusion/proto-models/Cargo.toml— deps:prost,arrow(workspace),datafusion-proto-common, optionalpbjson/serde/serde_jsonbehindjsonfeature. Nodatafusion-common, no datafusion crates beyondproto-common. Mirrordatafusion/proto-common/Cargo.toml.proto/datafusion.proto— moved fromdatafusion/proto/proto/datafusion.proto.gen/src/main.rs+gen/Cargo.toml— moved fromdatafusion/proto/gen/.src/lib.rs— re-exports generated types under aprotobufmodule (same shape asdatafusion-proto-common'sprotobuf_common).src/generated/prost.rs+src/generated/pbjson.rs— moved fromdatafusion/proto/src/generated/.regen.sh— mirrored fromdatafusion/proto-common/regen.sh.Modified:
Cargo.toml(workspace root) — registerdatafusion/proto-modelsmember + workspace dep entry.datafusion/proto/Cargo.toml— depend ondatafusion-proto-models. Removeprostdirect dep if fully delegated (keep ifdatafusion-protoitself still usesprost::Messagedirectly — it does, forDataEncoderTuple, so keep).datafusion/proto/src/lib.rs—pub use datafusion_proto_models::protobuf::*;(plus keep existingproto_error,FromProtoError,ToProtoErrorre-exports fromproto-common). Maintain the existing publicprotobufmodule so downstream users don't break.datafusion/proto/regen.sh— update to invokeproto-modelsregen.datafusion/proto/gen/— deleted (moved).datafusion/proto/proto/datafusion.proto— deleted (moved).datafusion/proto/src/generated/— deleted (moved).Invariants: public API surface of
datafusion-protois unchanged.datafusion/ffi,datafusion-examples,benchmarks— the only three external consumers — keep compiling without any Cargo.toml change.PR 2 — Add
PhysicalExpr::to_protohook +ProtoEncodeCtxIntroduce the trait method. Keep the existing downcast chain in place as a fallback so this PR is non-breaking — no expression migrates yet.
Modified:
datafusion/physical-expr-common/Cargo.toml— add optionaldatafusion-proto-modelsdep +protofeature. Feature is enabled indatafusion/physical-expranddatafusion/physical-planwhen they want serialization (gated through them as well).datafusion/physical-expr-common/src/physical_expr.rs— insidepub trait PhysicalExpr, add:New (in
datafusion-proto-modelsor a small sibling crate — TBD during implementation):ProtoEncodeCtx<'a>— thin context type carrying&'a dyn PhysicalExtensionCodecand a reference to thePhysicalProtoConverterExtensionconverter, exposing helpers:fn encode_child(&self, expr: &Arc<dyn PhysicalExpr>) -> Result<PhysicalExprNode>— recurses through the converter (preserves dedup support).fn encode_udf(&self, udf: &ScalarUDF) -> Result<Vec<u8>>— wrapscodec.try_encode_udf.encode_udaf/encode_udwf.Open question for implementation:
ProtoEncodeCtxreferencesPhysicalExtensionCodec, which is defined indatafusion-proto. To avoid a cyclic dep, either:PhysicalExtensionCodectrait (or a stripped-downProtoCodecExprsub-trait) intoproto-models, orProtoEncodeCtxindatafusion-protoand havePhysicalExpr::to_prototake&dyn PhysicalExprEncoder(a narrow trait defined inproto-models) that the realProtoEncodeCtximplements.Leaning toward (b) — cheaper surface change, smaller blast radius on
PhysicalExtensionCodec.Modified:
datafusion/proto/src/physical_plan/to_proto.rs::serialize_physical_expr_with_converter— before the downcast chain, tryexpr.to_proto(&ctx). If it returns the default "not implemented" error (detect via sentinel error kind, or via aResult<Option<PhysicalExprNode>>whereOk(None)means "fall through"), fall through to the existing chain. Going withOk(None)is cleaner.Revised trait signature:
Default = fall through. Explicit
Ok(Some(...))= serialized.Err(...)= genuine error.Decode path (unchanged structurally): keep
parse_physical_expr_with_converteras the centralmatch proto.expr_typeinfrom_proto.rs. Per-expression decoders become associated fnsfn try_from_proto(node: &X, ctx: &ProtoDecodeCtx<'_>) -> Result<Arc<Self>>colocated with the type (next PR). The central match stays exhaustive onExprType, which is a strict improvement over today's runtime downcast chain.Follow-up for decode side (not blocking): once expressions own both ends, consider formalizing the decoder as a sibling trait so the symmetry with
to_protois explicit — e.g.and the central match in
from_proto.rsbecomes a dispatch table:ExprType::BinaryExpr(n) => BinaryExpr::from_proto(n, ctx), etc. The key payoff is that no public god-constructor is needed for private state — e.g.DynamicFilterPhysicalExpr::from_protoreads the proto fields and constructsInnerdirectly, so we can deletefrom_parts/Inner-as-pub entirely. Doing this after PR 3+ migrations rather than during keeps each PR small.PR 3+ — Migrate expressions one at a time
One commit per type (or small group). Each migration:
fn to_proto(&self, ctx) -> Result<Option<PhysicalExprNode>>on the expression.fn try_from_proto(node, ctx) -> Result<Arc<Self>>as an associated fn.downcast_ref::<T>()arm fromto_proto.rs.from_proto.rs's central match to callType::try_from_proto.pub struct Inner+from_parts+inner()bits added in proto: serialize and dedupe dynamic filters v2 #21807 can become private fields, sinceDynamicFilterPhysicalExpr::to_protoreads theRwLockdirectly).Suggested migration order (easiest → hardest):
Column,UnKnownColumn,Literal,NoOp— trivial fields.NotExpr,IsNullExpr,IsNotNullExpr,NegativeExpr— single child.BinaryExpr(the linearization logic moves too),CastExpr,TryCastExpr,LikeExpr,InListExpr,CaseExpr.ScalarFunctionExpr— involves UDF encoding viactx.encode_udf(...).HashExpr,HashTableLookupExpr— lives indatafusion-physical-plan, so enable theprotofeature there.HashTableLookupExpr's current "replace withlit(true)" hack (to_proto.rs:277) moves into its ownto_protoimpl.DynamicFilterPhysicalExpr— after migration, revert proto: serialize and dedupe dynamic filters v2 #21807'spub Inner/from_partsscaffolding.PlainAggregateWindowExpr,SlidingAggregateWindowExpr,StandardWindowExpr,WindowUDFExpr,AggregateFunctionExpr) — shaped slightly differently (notPhysicalExprimpls in the same way); may need a sibling trait or stay central.After all in-workspace expressions migrate, the fallback downcast chain in
to_proto.rsshrinks to justHashTableLookupExpr's special case (and eventually goes away entirely).Critical files
Created:
datafusion/proto-models/Cargo.tomldatafusion/proto-models/proto/datafusion.proto(moved)datafusion/proto-models/gen/{Cargo.toml,src/main.rs}(moved)datafusion/proto-models/src/{lib.rs,generated/*.rs}(moved)datafusion/proto-models/regen.shModified (PR 1):
Cargo.toml(workspace root)datafusion/proto/Cargo.tomldatafusion/proto/src/lib.rsdatafusion/proto/regen.shModified (PR 2):
datafusion/physical-expr-common/Cargo.tomldatafusion/physical-expr-common/src/physical_expr.rs(trait definition, line 163+)datafusion/physical-expr/Cargo.toml(protofeature passthrough)datafusion/physical-plan/Cargo.toml(protofeature passthrough — needed forHashExpr/HashTableLookupExprmigration in PR 3)datafusion/proto/src/physical_plan/to_proto.rs(serialize_physical_expr_with_convertertries trait method first)PhysicalExprEncodertrait +ProtoEncodeCtx(location TBD; see open question above)Modified (PRs 3+): each expression's source file + corresponding removal from
datafusion/proto/src/physical_plan/{to_proto.rs,from_proto.rs}.Reused existing code
datafusion-proto-commonorganization (proto/,gen/,src/generated/,regen.sh,src/{from_proto,to_proto}/) — mirror forproto-models. Ref:datafusion/proto-common/.PhysicalExtensionCodec::try_encode_udf/udaf/udwf(datafusion/proto/src/physical_plan/mod.rs:3683–3723) — wrapped byProtoEncodeCtxhelpers, not changed.PhysicalProtoConverterExtension(datafusion/proto/src/physical_plan/mod.rs:3751) — stays as-is; itsphysical_expr_to_protobecomes the entry point that calls into the trait method.DeduplicatingSerializer/DeduplicatingDeserializer(mod.rs:3844+) — unchanged.ProtoEncodeCtx::encode_childroutes through them, preserving dedup.serialize_physical_exprshelper (to_proto.rs:220) — stays, becomes a thin wrapper overto_proto.Verification
PR 1 (
proto-modelsextraction):Confirm no
pubsurface ofdatafusion-protochanged (cargo public-apior manual diff ofsrc/lib.rs).PR 2 (trait hook added, no migration):
Also: verify the
protofeature is wired intodatafusion-proto's dev build so its roundtrip tests go through the trait hook withOk(None)fallback — behavior identical to today.PR 3+ (per-expression migration):
For each migrated expression, the existing roundtrip tests in
datafusion/proto/tests/cases/roundtrip_physical_plan.rsandroundtrip_physical_expr.rsmust continue to pass. Because the centralfrom_proto.rsmatch stays exhaustive, missing migration arms cause compile errors, not silent regressions.After
DynamicFilterPhysicalExprmigrates: revert the "pub for proto" scaffolding added in #21807 (Innerbecomes private,from_parts/inner()/original_children()/remapped_children()removed) and run the existing dynamic-filter roundtrip tests.End-to-end signal: workspace CI runs
./dev/rust_lint.sh+cargo test --workspace --features protogreen;cargo test -p datafusion-protogreen on both feature settings.Risks / open items
PhysicalExprEncodertrait placement (see open question above) — decided during PR 2 implementation. Pre-decision: put it indatafusion-proto-modelssophysical-expr-commonhas one dep underprotofeature.PhysicalExprimpls and are serialized through a different code path into_proto.rs(lines 117–151). Migrating them may need a parallelWindowExpr::to_protoor can stay central — defer decision to PR 4+.ForeignPhysicalExpr(datafusion-ffi) is aPhysicalExprimpl but not serialized today (not in the downcast chain). The defaultOk(None)behavior preserves that — nothing to do.datafusion-ffidepends ondatafusion-proto— the re-export surface has to stay stable through PR 1. Verified above.protooff by default onphysical-expr-common/physical-expr/physical-plan;datafusion-protoflips it on. Developers who runcargo test --workspacewithout features get the no-proto variant; CI should explicitly--features protofor the proto crate's test job.