Skip to content

SQL planner correctness & procedural execution safety (8 sub-items) #47

@hollanf

Description

@hollanf

Eight distinct bugs across the SQL planner, plan-conversion, and procedural executor. Two are security-adjacent (RLS predicate corruption in #7 from the companion parser epic, trigger fuel in #3 here, procedure cache collision in #7). The rest are correctness bugs that return wrong rows or silently drop user-written constraints.


1. const_fold::fold_binary uses unchecked integer arithmetic → debug panic, release silent wrap

File: nodedb-sql/src/planner/const_fold.rs:67-72

(SqlValue::Int(a), BinaryOp::Add, SqlValue::Int(b)) => SqlValue::Int(a + b),
(SqlValue::Int(a), BinaryOp::Sub, SqlValue::Int(b)) => SqlValue::Int(a - b),
(SqlValue::Int(a), BinaryOp::Mul, SqlValue::Int(b)) => SqlValue::Int(a * b),

No checked_*. Debug build panics (attempt to add with overflow) and drops the connection; release build silently wraps, embedding the wrong constant in the plan. Postgres raises integer out of range.

Repro:

SELECT 9223372036854775807 + 1;
INSERT INTO t(id, v) VALUES ('k', 9223372036854775807 * 2);

2. Procedural executor eval_binary_op panics on i64 overflow; float divide-by-zero returns silent ±Inf

File: nodedb/src/control/planner/procedural/executor/eval.rs:176-202

(Value::Integer(a), Value::Integer(b)) => match op {
    Plus => Some(Value::Integer(a + b)),
    Minus => Some(Value::Integer(a - b)),
    Multiply => Some(Value::Integer(a * b)),
    Divide if *b != 0 => Some(Value::Integer(a / b)),    // still panics on i64::MIN / -1
    ...
(Value::Float(a), Value::Float(b)) => match op {
    ...
    Divide if *b != 0.0 => Some(Value::Float(a / b)),    // -0.0 passes the guard

Trigger / stored-procedure bodies that go through ASSIGN/IF/FOR bounds reach this path. At i64::MAX, a trivial IF x + 1 > 0 … panics debug workers, wraps to negative in release (flipping the branch). Integer i64::MIN / -1 also panics. Float / -0.0 returns -inf, and evaluate_condition treats Infinity as truthy.

Repro:

DO $$
DECLARE x BIGINT := 9223372036854775807;
BEGIN
  IF x + 1 > 0 THEN RAISE EXCEPTION 'bad'; END IF;
END$$;

3. Triggers run with effectively unlimited fuel (u64::MAX) + 1-hour deadline

Files:

pub fn unlimited() -> Self {
    Self {
        fuel_remaining: u64::MAX,
        deadline: Instant::now() + std::time::Duration::from_secs(3600),
        max_iterations: u64::MAX,
        timeout_secs: 3600,
    }
}

execute_block (triggers) calls ExecutionBudget::unlimited() unconditionally. consume_iteration only decrements once per loop iteration, and with u64::MAX fuel the loop runs until the 3600-second wall clock — per triggered row. A trivial LOOP ; END LOOP; trigger body freezes a Control Plane worker for an hour and compounds under concurrency. No operator-facing knob caps the fuel.

Repro:

CREATE TABLE t (id TEXT PRIMARY KEY, v INT);
CREATE TRIGGER tr AFTER INSERT ON t FOR EACH ROW EXECUTE PROCEDURE
  $$BEGIN LOOP END LOOP; END$$;
INSERT INTO t(id, v) VALUES ('a', 1);
-- worker pinned for 1 hour, connection hangs, another insert pins another worker

4. convert_join RIGHT-JOIN swap forgets to swap inline_left / inline_right

File: nodedb/src/control/planner/sql_plan_convert/scan.rs:172-189

let inline_left  = if matches!(left, SqlPlan::Join { .. }) { ... } else { None };
let inline_right = super::aggregate::inline_join_side(right, tenant_id, ctx)?;

let mut on_keys = on.to_vec();
let effective_join_type = if join_type.as_str() == "right" {
    std::mem::swap(&mut left_collection, &mut right_collection);
    std::mem::swap(&mut left_alias, &mut right_alias);
    on_keys = on_keys.into_iter().map(|(l, r)| (r, l)).collect();
    "left".to_string()
} ...

For a RIGHT JOIN the collections/aliases/keys are swapped to "left join" form, but inline_left/inline_right were computed from the original plans and are not swapped — so after the rewrite, left_collection holds what was originally the right side while inline_left points to the original left plan. Any RIGHT JOIN with a nested join on the left produces wrong output because the hash/merge join iterates the wrong side as the build/probe input. Additionally, inner_tasks.into_iter().next() at line 174 silently drops all but the first task if the nested join returned more than one.

Repro:

CREATE TABLE t1 (id TEXT PRIMARY KEY, c TEXT);
CREATE TABLE t2 (id TEXT PRIMARY KEY, t1_id TEXT);
CREATE TABLE t3 (id TEXT PRIMARY KEY, t2_id TEXT);
SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id
  RIGHT JOIN t3 ON t2.id = t3.t2_id;

Compare against a known-correct engine; row set differs.


5. plan_recursive_cte collapses the UNION into a flat RecursiveScan and hardcodes max_iterations: 100

File: nodedb-sql/src/planner/cte.rs:34-95

let base = plan_cte_branch(left, catalog, functions)?;
let recursive = plan_cte_branch(right, catalog, functions)?;
let collection = extract_collection(&base)
    .or_else(|| extract_collection(&recursive))
    .unwrap_or_default();
Ok(SqlPlan::RecursiveScan {
    collection,
    base_filters: extract_filters(&base),
    recursive_filters: extract_filters(&recursive),
    max_iterations: 100,      // hardcoded
    distinct: true,           // hardcoded — breaks UNION ALL semantics
    limit: 10000,
})

The recursive branch of a WITH RECURSIVE CTE always references the CTE (typically via a join on the CTE name). This planner extracts only the underlying collection and a flat filters list from each branch, so:

  1. Projections/aliases in the CTE column list are dropped.
  2. Joins inside the recursive branch are thrown away — the engine just re-filters the base table N times, not the recursive result.
  3. max_iterations: 100 is a literal — no OPTION (MAXRECURSION …) / SET override; deep recursion returns partial results silently.
  4. distinct: true overrides any UNION ALL the user wrote — duplicates that should be preserved are removed.

Repro:

WITH RECURSIVE c(n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM c WHERE n < 5
)
SELECT * FROM c;
-- Expected: 1,2,3,4,5 (5 rows). Actual: varies — often just base row, or wrong collection.

6. extract_join_constraint keeps only the first non-equi predicate; NATURAL JOIN silently becomes cartesian

File: nodedb-sql/src/planner/join.rs:198-224

let cond = if non_equi.is_empty() {
    None
} else {
    Some(convert_expr(non_equi.first().unwrap())?)
};
...
ast::JoinConstraint::Natural => Ok((Vec::new(), None)),
ast::JoinConstraint::None    => Ok((Vec::new(), None)),

extract_equi_keys walks top-level ANDs, collecting col = col pairs in keys and everything else in non_equi. Only non_equi.first() is attached as the join condition; every subsequent non-equi predicate is silently dropped. Rows the user filtered out survive the join.

The Natural / None arms return empty keys + None condition — i.e. cartesian product with no error raised.

Repros:

-- Second non-equi condition silently dropped:
SELECT * FROM t1 JOIN t2
  ON t1.id = t2.id AND t1.x > t2.x AND t1.y < t2.y;

-- Cartesian product, no error:
SELECT * FROM t1 NATURAL JOIN t2;

7. ProcedureBlockCache keys on 64-bit hash with no fallback equality check → collisions return the wrong body

File: nodedb/src/control/planner/procedural/executor/plan_cache.rs:29-113

struct CacheEntry { block: Arc<ProceduralBlock> }    // ← no body_sql stored
...
if let Some(entry) = inner.entries.get(&key) {
    return Ok(Arc::clone(&entry.block));
}
...
fn hash_body(body_sql: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    body_sql.hash(&mut hasher);
    hasher.finish()
}

Two trigger/procedure bodies whose 64-bit SipHash output collides cause the cache to hand back the previously-cached block for a completely different body. Entries store only the parsed block, never the source string, so there is no equality check to catch the collision. Symptom is catastrophic: e.g. a BEFORE UPDATE trigger runs the compiled body of a different AFTER INSERT trigger.


8. evaluate_default_expr silently drops the column if DEFAULT isn't in a fixed keyword list

File: nodedb/src/control/planner/sql_plan_convert/value/defaults.rs:7-71 + caller at nodedb/src/control/planner/sql_plan_convert/dml.rs:37-43

match upper.as_str() {
    "UUID_V7" | ... => Some(...),
    "NOW()" => Some(...),
    _ => parse_parametric_or_literal(expr, &upper),
}

Only UUID_V7, NOW(), NANOID(N), CUID2(N), integer, float, and bare-quoted string are recognised. Any other DDL default (DEFAULT upper('x'), DEFAULT current_timestamp, DEFAULT date_add(now(),'1d'), DEFAULT gen_random_uuid()) returns None, and the caller in convert_insert treats None as "no default" → the column is omitted from the row:

if !expanded.iter().any(|(k, _)| k == col_name)
    && let Some(val) = super::value::evaluate_default_expr(default_expr)
{
    expanded.push((col_name.clone(), ...));
}

INSERT succeeds silently with the column missing.

Repro:

CREATE TABLE t (id TEXT PRIMARY KEY, a TEXT DEFAULT upper('x'));
INSERT INTO t(id) VALUES ('k');
-- Expected: a = 'X'. Actual: a is absent / null.

Checklist

  • 1. const_fold::fold_binary use checked_*; return error or keep expression un-folded on overflow
  • 2. Procedural eval_binary_op use checked_*; explicit divide-by-zero error including -0.0; guard i64::MIN / -1
  • 3. Cap trigger fuel by tenant config; remove ExecutionBudget::unlimited() for user-supplied bodies
  • 4. Swap inline_left/inline_right alongside the collections/aliases in RIGHT-JOIN rewrite; handle multi-task nested-join output
  • 5. Preserve CTE structure (projections, joins) and expose max_iterations / distinct as plan parameters
  • 6. Attach every non-equi predicate as AND-of-conditions; raise an error (or compute NATURAL join column list) for Natural/None
  • 7. Store body_sql in CacheEntry and compare on hit; or switch to HashMap<String, _>
  • 8. Route unknown DEFAULT expressions through the full planner/const-folder instead of a keyword whitelist

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions