Skip to content

SQL parser & DDL hand-rolled-parse correctness bugs (6 sub-items) #46

@hollanf

Description

@hollanf

Six independent bugs in hand-rolled parsing across DDL and preprocessor paths. All are reproducible via plain SQL, all corrupt statement semantics silently (no error, or error on valid input). One of them (#6) can panic the process on non-ASCII input.


1. CREATE TRIGGER body split mis-detects BEGIN inside string literals / WHEN clauses

File: nodedb/src/control/server/pgwire/ddl/trigger/parse.rs:321-338 (find_begin_pos)

let pos = upper[search_from..].find("BEGIN")?;
let abs_pos = search_from + pos;
let before_ok = abs_pos == 0 || !s.as_bytes()[abs_pos - 1].is_ascii_alphanumeric() && ...
// ↑ word-boundary check but does not skip over string literals

find_begin_pos scans for BEGIN on the full string including inside single-quoted literals in the WHEN clause. First match wins, so the header/body split lands at the wrong byte.

Repro:

CREATE TRIGGER tr BEFORE INSERT ON c FOR EACH ROW
  WHEN (NEW.label = 'BEGIN' OR NEW.status = 'open')
  BEGIN RETURN; END

The split places BEGIN from the string literal as the body start → malformed header parse or silently-mangled body.


2. INSERT with ) inside a string literal breaks the VALUES pre-parse

File: nodedb/src/control/server/pgwire/ddl/collection/insert_parse.rs:128-149

let first_close = match sql[first_open..values_kw].rfind(')') { ... };
// ...
let vals_close = match after_values.rfind(')') { ... };

Both rfind(')') calls scan raw bytes and ignore quoted strings. A value containing ) pushes vals_close past the true VALUES closing paren; downstream split_values sees unbalanced parens and coerces wrong types.

Repro:

INSERT INTO t (a) VALUES ('hello)world') RETURNING id

3. split_values loses quote tracking inside [...] / (...)

File: nodedb/src/control/server/pgwire/ddl/sql_parse.rs:6-29

for i in 0..bytes.len() {
    match bytes[i] {
        b'\'' if bracket_depth == 0 => in_quote = !in_quote,
        b'[' | b'(' if !in_quote => bracket_depth += 1,
        b']' | b')' if !in_quote => bracket_depth = (bracket_depth - 1).max(0),
        ...

b'\'' only flips in_quote when bracket_depth == 0. Inside an array or nested paren, quote tracking is disabled → a ) inside a string decrements the bracket counter, and subsequent real delimiters balance wrongly. A comma outside the array but still inside the intended string run is treated as a top-level separator (or vice versa).

Repro:

INSERT INTO t (a, b) VALUES (ARRAY['x)y', 'z'], 42)

4. CREATE RLS POLICY predicate corrupted by trim_matches('(' | ')')

File: nodedb/src/control/server/pgwire/ddl/rls/parse.rs:73-77

let predicate_str = parts[using_idx + 1..pred_end]
    .join(" ")
    .trim_matches(|c: char| c == '(' || c == ')')
    .to_string();

trim_matches removes all leading and trailing parens (any of ( or )), not only one matched pair. A balanced ((x > 0) AND (y = 1)) becomes x > 0) AND (y = 1 → broken parse, often falls through to a legacy space-split path that stores a nonsensical ScanFilter. Since this governs RLS policy storage, the stored predicate is a security-sensitive malfunction.

Repro:

CREATE RLS POLICY p ON t FOR READ USING ((x > 0) AND (y = 1))

5. Object-literal { ... } rewriter in SQL preprocessor breaks on ''-escaped quotes

File: nodedb-sql/src/parser/preprocess.rs:270-294 (find_matching_brace)

for i in start..chars.len() {
    match chars[i] {
        '\'' if !in_string => in_string = true,
        '\'' if in_string => {
            if i + 1 < chars.len() && chars[i + 1] == '\'' {
                continue;            // ← only advances by 1; re-processes the escape's second quote
            }
            in_string = false;
        }
        ...

The continue in a for i in start..chars.len() only advances by one, landing on the escape's second ', which the next iteration reads as a closing quote. in_string flips to false prematurely; the closing } is then encountered while in_string is still wrong, so the function returns None and the caller skips the rewrite — passing raw { ... } text to sqlparser which errors out.

Repro:

SELECT * FROM t WHERE text_match(x, 'q', { note: 'it''s' })

6. CRITICAL: Expression tokenizer in nodedb-query panics on multi-byte UTF-8

File: nodedb-query/src/expr_parse.rs:100-127, 139-160

if i + 1 < bytes.len() {
    let two = &input[i..i + 2];         // ← panics if i..i+2 crosses a char boundary
    if matches!(two, "<=" | ">=" | ...) { ... }
}
...
// string-literal body:
s.push(bytes[i] as char);               // ← each raw byte becomes Latin-1 char

The tokenizer iterates byte-by-byte but slices &input[i..i+2] as a &str. If i points into the middle of a multi-byte UTF-8 codepoint, the slice is not a char boundary and Rust panics with byte index N is not a char boundary. This tokenizer is reachable from CREATE COLLECTION … GENERATED ALWAYS AS (<expr>) and the typeguard / check-constraint compile path. The string-literal loop additionally corrupts multi-byte content by casting each byte as a single char.

Repro:

CREATE COLLECTION c FIELDS (x STRING, y STRING GENERATED ALWAYS AS ('' || x))

Any 3-byte codepoint followed by <=/>=/!=/<>/|| nearby in the expression triggers the panic.


7. CREATE MATERIALIZED VIEW chops SELECT body on any WITH (including CTEs)

File: nodedb/src/control/server/pgwire/ddl/materialized_view/parse.rs:48-108

let with_pos = remaining.find(" WITH").or_else(|| { ... });
let query_end = with_pos.map(|p| query_start + p).unwrap_or(sql.len());
let query_sql = sql[query_start..query_end].trim().to_string();
...
let with_pos = match upper.rfind("WITH") { ... };   // in extract_refresh_mode

Both find(" WITH") and rfind("WITH") match any occurrence of the substring — including inside the SELECT body (a CTE WITH cte AS (...), a column alias with_x, a string literal 'WITH'). The parser truncates query_sql at the CTE start and parses the CTE-prefix as the refresh-options clause, so a valid CTE-based SELECT becomes an error or a corrupted materialized view.

Repros:

CREATE MATERIALIZED VIEW v ON t AS
  SELECT * FROM t WITH (recursive_opt = 1);
CREATE MATERIALIZED VIEW v ON t AS
  WITH s AS (SELECT * FROM t) SELECT * FROM s
  WITH (refresh = 'manual');

Checklist

  • 1. Skip string literals in find_begin_pos (trigger parse)
  • 2. Skip string literals in rfind(')') calls in insert_parse
  • 3. Track quote state independently of bracket depth in split_values
  • 4. Replace trim_matches('(' | ')') with a single balanced-paren strip (or don't strip at all)
  • 5. Advance cursor by 2 on '' escape in find_matching_brace
  • 6. Switch expr_parse tokenizer to &[char] or use str::char_indices / pattern = "<=" matching on &str
  • 7. Parse MATERIALIZED VIEW via sqlparser rather than substring-find

All items are independently reproducible via the SQL above.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions