Describe the bug
Unparser::expr_to_sql overflows the stack on deeply-nested expressions, and — more importantly — the recursive_protection feature does not prevent it for a large class of expressions, even though it is enabled by default in the datafusion crate.
expr_to_sql_inner is annotated with #[cfg_attr(feature = "recursive_protection", recursive::recursive)], so direct inner -> inner recursion (binary ops, etc.) is protected. However, several recursion paths route back through the public Unparser::expr_to_sql, which is not annotated:
function_args_to_sql (every scalar function argument) -> self.expr_to_sql(e)
make_array_to_sql, array_element_to_sql, map_to_sql, get_field_to_sql -> self.expr_to_sql(...)
- dialect scalar-function overrides, e.g.
PostgreSqlDialect::array_has_to_sql_any -> unparser.expr_to_sql(needle) / expr_to_sql(haystack)
Because the recursive trampoline only kicks in on the annotated expr_to_sql_inner, recursing through the un-annotated public expr_to_sql means the stack-growing protection is effectively bypassed for nested function arguments. So a query whose unparse path goes through nested scalar functions overflows the real OS stack regardless of recursive_protection.
This is related to #19787 (general "reduce recursion level of the unparser") but is a distinct, concrete defect: the protection that already exists is silently defeated on the function-argument / dialect-override paths.
To Reproduce
Self-contained, public-API reproduction (uses datafusion-functions-nested for array_has, but any nested scalar function — e.g. nested array_element, make_array, or even nested Expr arithmetic via function_args_to_sql-bound functions — exhibits the same behavior; a deep chain of plain binary + overflows at the same rate too):
use datafusion::logical_expr::{col, lit, Expr};
use datafusion::sql::unparser::{dialect::PostgreSqlDialect, Unparser};
use datafusion_functions_nested::expr_fn::array_has;
fn main() {
const DEPTH: usize = 2_000;
// A simple linear chain: array_has(array_has(... array_has(col, 'x') ...), 'x')
let mut e: Expr = col("c");
for _ in 0..DEPTH {
e = array_has(e, lit("x"));
}
// Unparse on a small thread stack to make the overflow fast/deterministic.
// (It also overflows on the default stack at a larger DEPTH.)
let handle = std::thread::Builder::new()
.stack_size(512 * 1024)
.spawn(move || {
let dialect = PostgreSqlDialect {};
let unparser = Unparser::new(&dialect);
unparser.expr_to_sql(&e).map(|sql| sql.to_string())
})
.unwrap();
let _ = handle.join(); // aborts the process before returning
}
Output:
thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow, aborting
The same overflow occurs whether or not the recursive_protection feature is enabled, and with both PostgreSqlDialect and DefaultDialect. The cost is roughly linear in nesting depth (measured ~200 KB of stack per nested level in a debug build), so the protection genuinely never engages on this path — it does not merely shift the threshold.
Expected behavior
With recursive_protection enabled (the default for the datafusion crate), unparsing a deeply-nested but otherwise valid expression should not overflow the OS stack — it should either succeed (via the stack-growing trampoline) or return a Result::Err for a recursion-limit, never abort the process.
Additional context
- datafusion commit:
0fc55d06797cac2190239c82b2d5f0a06b4d42e7 (workspace version 54.0.0)
- Affected code:
datafusion/sql/src/unparser/expr.rs: public expr_to_sql (~L96) is not #[recursive]; expr_to_sql_inner (~L105) is. The leak sites are the self.expr_to_sql(...) calls in make_array_to_sql (~L663), array_element_to_sql (~L690), map_to_sql (~L784/L792), and function_args_to_sql (~L926).
datafusion/sql/src/unparser/dialect.rs: PostgreSqlDialect::array_has_to_sql_any (~L409) calls unparser.expr_to_sql(needle) / expr_to_sql(haystack).
Likely fixes: have these internal recursion sites call the (annotated) expr_to_sql_inner instead of the public expr_to_sql, and/or move the #[recursive] annotation onto the public expr_to_sql (or add an annotated private recursion entry point that both use). Note expr_to_sql also runs remove_unnecessary_nesting in pretty mode, which would need the same treatment.
Describe the bug
Unparser::expr_to_sqloverflows the stack on deeply-nested expressions, and — more importantly — therecursive_protectionfeature does not prevent it for a large class of expressions, even though it is enabled by default in thedatafusioncrate.expr_to_sql_inneris annotated with#[cfg_attr(feature = "recursive_protection", recursive::recursive)], so directinner -> innerrecursion (binary ops, etc.) is protected. However, several recursion paths route back through the publicUnparser::expr_to_sql, which is not annotated:function_args_to_sql(every scalar function argument) ->self.expr_to_sql(e)make_array_to_sql,array_element_to_sql,map_to_sql,get_field_to_sql->self.expr_to_sql(...)PostgreSqlDialect::array_has_to_sql_any->unparser.expr_to_sql(needle)/expr_to_sql(haystack)Because the
recursivetrampoline only kicks in on the annotatedexpr_to_sql_inner, recursing through the un-annotated publicexpr_to_sqlmeans the stack-growing protection is effectively bypassed for nested function arguments. So a query whose unparse path goes through nested scalar functions overflows the real OS stack regardless ofrecursive_protection.This is related to #19787 (general "reduce recursion level of the unparser") but is a distinct, concrete defect: the protection that already exists is silently defeated on the function-argument / dialect-override paths.
To Reproduce
Self-contained, public-API reproduction (uses
datafusion-functions-nestedforarray_has, but any nested scalar function — e.g. nestedarray_element,make_array, or even nestedExprarithmetic viafunction_args_to_sql-bound functions — exhibits the same behavior; a deep chain of plain binary+overflows at the same rate too):Output:
The same overflow occurs whether or not the
recursive_protectionfeature is enabled, and with bothPostgreSqlDialectandDefaultDialect. The cost is roughly linear in nesting depth (measured ~200 KB of stack per nested level in a debug build), so the protection genuinely never engages on this path — it does not merely shift the threshold.Expected behavior
With
recursive_protectionenabled (the default for thedatafusioncrate), unparsing a deeply-nested but otherwise valid expression should not overflow the OS stack — it should either succeed (via the stack-growing trampoline) or return aResult::Errfor a recursion-limit, never abort the process.Additional context
0fc55d06797cac2190239c82b2d5f0a06b4d42e7(workspace version54.0.0)datafusion/sql/src/unparser/expr.rs: publicexpr_to_sql(~L96) is not#[recursive];expr_to_sql_inner(~L105) is. The leak sites are theself.expr_to_sql(...)calls inmake_array_to_sql(~L663),array_element_to_sql(~L690),map_to_sql(~L784/L792), andfunction_args_to_sql(~L926).datafusion/sql/src/unparser/dialect.rs:PostgreSqlDialect::array_has_to_sql_any(~L409) callsunparser.expr_to_sql(needle)/expr_to_sql(haystack).Likely fixes: have these internal recursion sites call the (annotated)
expr_to_sql_innerinstead of the publicexpr_to_sql, and/or move the#[recursive]annotation onto the publicexpr_to_sql(or add an annotated private recursion entry point that both use). Noteexpr_to_sqlalso runsremove_unnecessary_nestingin pretty mode, which would need the same treatment.