Skip to content

Unparser stack overflow: recursive_protection is bypassed for nested function arguments / dialect overrides (public expr_to_sql not #[recursive]) #23056

@adriangb

Description

@adriangb

Describe the bug

Unparser::expr_to_sql overflows the stack on deeply-nested expressions, and — more importantly — the recursive_protection feature does not prevent it for a large class of expressions, even though it is enabled by default in the datafusion crate.

expr_to_sql_inner is annotated with #[cfg_attr(feature = "recursive_protection", recursive::recursive)], so direct inner -> inner recursion (binary ops, etc.) is protected. However, several recursion paths route back through the public Unparser::expr_to_sql, which is not annotated:

  • function_args_to_sql (every scalar function argument) -> self.expr_to_sql(e)
  • make_array_to_sql, array_element_to_sql, map_to_sql, get_field_to_sql -> self.expr_to_sql(...)
  • dialect scalar-function overrides, e.g. PostgreSqlDialect::array_has_to_sql_any -> unparser.expr_to_sql(needle) / expr_to_sql(haystack)

Because the recursive trampoline only kicks in on the annotated expr_to_sql_inner, recursing through the un-annotated public expr_to_sql means the stack-growing protection is effectively bypassed for nested function arguments. So a query whose unparse path goes through nested scalar functions overflows the real OS stack regardless of recursive_protection.

This is related to #19787 (general "reduce recursion level of the unparser") but is a distinct, concrete defect: the protection that already exists is silently defeated on the function-argument / dialect-override paths.

To Reproduce

Self-contained, public-API reproduction (uses datafusion-functions-nested for array_has, but any nested scalar function — e.g. nested array_element, make_array, or even nested Expr arithmetic via function_args_to_sql-bound functions — exhibits the same behavior; a deep chain of plain binary + overflows at the same rate too):

use datafusion::logical_expr::{col, lit, Expr};
use datafusion::sql::unparser::{dialect::PostgreSqlDialect, Unparser};
use datafusion_functions_nested::expr_fn::array_has;

fn main() {
    const DEPTH: usize = 2_000;

    // A simple linear chain: array_has(array_has(... array_has(col, 'x') ...), 'x')
    let mut e: Expr = col("c");
    for _ in 0..DEPTH {
        e = array_has(e, lit("x"));
    }

    // Unparse on a small thread stack to make the overflow fast/deterministic.
    // (It also overflows on the default stack at a larger DEPTH.)
    let handle = std::thread::Builder::new()
        .stack_size(512 * 1024)
        .spawn(move || {
            let dialect = PostgreSqlDialect {};
            let unparser = Unparser::new(&dialect);
            unparser.expr_to_sql(&e).map(|sql| sql.to_string())
        })
        .unwrap();
    let _ = handle.join(); // aborts the process before returning
}

Output:

thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow, aborting

The same overflow occurs whether or not the recursive_protection feature is enabled, and with both PostgreSqlDialect and DefaultDialect. The cost is roughly linear in nesting depth (measured ~200 KB of stack per nested level in a debug build), so the protection genuinely never engages on this path — it does not merely shift the threshold.

Expected behavior

With recursive_protection enabled (the default for the datafusion crate), unparsing a deeply-nested but otherwise valid expression should not overflow the OS stack — it should either succeed (via the stack-growing trampoline) or return a Result::Err for a recursion-limit, never abort the process.

Additional context

  • datafusion commit: 0fc55d06797cac2190239c82b2d5f0a06b4d42e7 (workspace version 54.0.0)
  • Affected code:
    • datafusion/sql/src/unparser/expr.rs: public expr_to_sql (~L96) is not #[recursive]; expr_to_sql_inner (~L105) is. The leak sites are the self.expr_to_sql(...) calls in make_array_to_sql (~L663), array_element_to_sql (~L690), map_to_sql (~L784/L792), and function_args_to_sql (~L926).
    • datafusion/sql/src/unparser/dialect.rs: PostgreSqlDialect::array_has_to_sql_any (~L409) calls unparser.expr_to_sql(needle) / expr_to_sql(haystack).

Likely fixes: have these internal recursion sites call the (annotated) expr_to_sql_inner instead of the public expr_to_sql, and/or move the #[recursive] annotation onto the public expr_to_sql (or add an annotated private recursion entry point that both use). Note expr_to_sql also runs remove_unnecessary_nesting in pretty mode, which would need the same treatment.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions