Six independent bugs in hand-rolled parsing across DDL and preprocessor paths. All are reproducible via plain SQL, all corrupt statement semantics silently (no error, or error on valid input). One of them (#6) can panic the process on non-ASCII input.
1. CREATE TRIGGER body split mis-detects BEGIN inside string literals / WHEN clauses
File: nodedb/src/control/server/pgwire/ddl/trigger/parse.rs:321-338 (find_begin_pos)
let pos = upper[search_from..].find("BEGIN")?;
let abs_pos = search_from + pos;
let before_ok = abs_pos == 0 || !s.as_bytes()[abs_pos - 1].is_ascii_alphanumeric() && ...
// ↑ word-boundary check but does not skip over string literals
find_begin_pos scans for BEGIN on the full string including inside single-quoted literals in the WHEN clause. First match wins, so the header/body split lands at the wrong byte.
Repro:
CREATE TRIGGER tr BEFORE INSERT ON c FOR EACH ROW
WHEN (NEW.label = 'BEGIN' OR NEW.status = 'open')
BEGIN RETURN; END
The split places BEGIN from the string literal as the body start → malformed header parse or silently-mangled body.
2. INSERT with ) inside a string literal breaks the VALUES pre-parse
File: nodedb/src/control/server/pgwire/ddl/collection/insert_parse.rs:128-149
let first_close = match sql[first_open..values_kw].rfind(')') { ... };
// ...
let vals_close = match after_values.rfind(')') { ... };
Both rfind(')') calls scan raw bytes and ignore quoted strings. A value containing ) pushes vals_close past the true VALUES closing paren; downstream split_values sees unbalanced parens and coerces wrong types.
Repro:
INSERT INTO t (a) VALUES ('hello)world') RETURNING id
3. split_values loses quote tracking inside [...] / (...)
File: nodedb/src/control/server/pgwire/ddl/sql_parse.rs:6-29
for i in 0..bytes.len() {
match bytes[i] {
b'\'' if bracket_depth == 0 => in_quote = !in_quote,
b'[' | b'(' if !in_quote => bracket_depth += 1,
b']' | b')' if !in_quote => bracket_depth = (bracket_depth - 1).max(0),
...
b'\'' only flips in_quote when bracket_depth == 0. Inside an array or nested paren, quote tracking is disabled → a ) inside a string decrements the bracket counter, and subsequent real delimiters balance wrongly. A comma outside the array but still inside the intended string run is treated as a top-level separator (or vice versa).
Repro:
INSERT INTO t (a, b) VALUES (ARRAY['x)y', 'z'], 42)
4. CREATE RLS POLICY predicate corrupted by trim_matches('(' | ')')
File: nodedb/src/control/server/pgwire/ddl/rls/parse.rs:73-77
let predicate_str = parts[using_idx + 1..pred_end]
.join(" ")
.trim_matches(|c: char| c == '(' || c == ')')
.to_string();
trim_matches removes all leading and trailing parens (any of ( or )), not only one matched pair. A balanced ((x > 0) AND (y = 1)) becomes x > 0) AND (y = 1 → broken parse, often falls through to a legacy space-split path that stores a nonsensical ScanFilter. Since this governs RLS policy storage, the stored predicate is a security-sensitive malfunction.
Repro:
CREATE RLS POLICY p ON t FOR READ USING ((x > 0) AND (y = 1))
5. Object-literal { ... } rewriter in SQL preprocessor breaks on ''-escaped quotes
File: nodedb-sql/src/parser/preprocess.rs:270-294 (find_matching_brace)
for i in start..chars.len() {
match chars[i] {
'\'' if !in_string => in_string = true,
'\'' if in_string => {
if i + 1 < chars.len() && chars[i + 1] == '\'' {
continue; // ← only advances by 1; re-processes the escape's second quote
}
in_string = false;
}
...
The continue in a for i in start..chars.len() only advances by one, landing on the escape's second ', which the next iteration reads as a closing quote. in_string flips to false prematurely; the closing } is then encountered while in_string is still wrong, so the function returns None and the caller skips the rewrite — passing raw { ... } text to sqlparser which errors out.
Repro:
SELECT * FROM t WHERE text_match(x, 'q', { note: 'it''s' })
6. CRITICAL: Expression tokenizer in nodedb-query panics on multi-byte UTF-8
File: nodedb-query/src/expr_parse.rs:100-127, 139-160
if i + 1 < bytes.len() {
let two = &input[i..i + 2]; // ← panics if i..i+2 crosses a char boundary
if matches!(two, "<=" | ">=" | ...) { ... }
}
...
// string-literal body:
s.push(bytes[i] as char); // ← each raw byte becomes Latin-1 char
The tokenizer iterates byte-by-byte but slices &input[i..i+2] as a &str. If i points into the middle of a multi-byte UTF-8 codepoint, the slice is not a char boundary and Rust panics with byte index N is not a char boundary. This tokenizer is reachable from CREATE COLLECTION … GENERATED ALWAYS AS (<expr>) and the typeguard / check-constraint compile path. The string-literal loop additionally corrupts multi-byte content by casting each byte as a single char.
Repro:
CREATE COLLECTION c FIELDS (x STRING, y STRING GENERATED ALWAYS AS ('你' || x))
Any 3-byte codepoint followed by <=/>=/!=/<>/|| nearby in the expression triggers the panic.
7. CREATE MATERIALIZED VIEW chops SELECT body on any WITH (including CTEs)
File: nodedb/src/control/server/pgwire/ddl/materialized_view/parse.rs:48-108
let with_pos = remaining.find(" WITH").or_else(|| { ... });
let query_end = with_pos.map(|p| query_start + p).unwrap_or(sql.len());
let query_sql = sql[query_start..query_end].trim().to_string();
...
let with_pos = match upper.rfind("WITH") { ... }; // in extract_refresh_mode
Both find(" WITH") and rfind("WITH") match any occurrence of the substring — including inside the SELECT body (a CTE WITH cte AS (...), a column alias with_x, a string literal 'WITH'). The parser truncates query_sql at the CTE start and parses the CTE-prefix as the refresh-options clause, so a valid CTE-based SELECT becomes an error or a corrupted materialized view.
Repros:
CREATE MATERIALIZED VIEW v ON t AS
SELECT * FROM t WITH (recursive_opt = 1);
CREATE MATERIALIZED VIEW v ON t AS
WITH s AS (SELECT * FROM t) SELECT * FROM s
WITH (refresh = 'manual');
Checklist
All items are independently reproducible via the SQL above.
Six independent bugs in hand-rolled parsing across DDL and preprocessor paths. All are reproducible via plain SQL, all corrupt statement semantics silently (no error, or error on valid input). One of them (#6) can panic the process on non-ASCII input.
1.
CREATE TRIGGERbody split mis-detectsBEGINinside string literals / WHEN clausesFile:
nodedb/src/control/server/pgwire/ddl/trigger/parse.rs:321-338(find_begin_pos)find_begin_posscans forBEGINon the full string including inside single-quoted literals in theWHENclause. First match wins, so the header/body split lands at the wrong byte.Repro:
The split places
BEGINfrom the string literal as the body start → malformed header parse or silently-mangled body.2.
INSERTwith)inside a string literal breaks the VALUES pre-parseFile:
nodedb/src/control/server/pgwire/ddl/collection/insert_parse.rs:128-149Both
rfind(')')calls scan raw bytes and ignore quoted strings. A value containing)pushesvals_closepast the true VALUES closing paren; downstreamsplit_valuessees unbalanced parens and coerces wrong types.Repro:
3.
split_valuesloses quote tracking inside[...]/(...)File:
nodedb/src/control/server/pgwire/ddl/sql_parse.rs:6-29b'\''only flipsin_quotewhenbracket_depth == 0. Inside an array or nested paren, quote tracking is disabled → a)inside a string decrements the bracket counter, and subsequent real delimiters balance wrongly. A comma outside the array but still inside the intended string run is treated as a top-level separator (or vice versa).Repro:
4.
CREATE RLS POLICYpredicate corrupted bytrim_matches('(' | ')')File:
nodedb/src/control/server/pgwire/ddl/rls/parse.rs:73-77trim_matchesremoves all leading and trailing parens (any of(or)), not only one matched pair. A balanced((x > 0) AND (y = 1))becomesx > 0) AND (y = 1→ broken parse, often falls through to a legacy space-split path that stores a nonsensicalScanFilter. Since this governs RLS policy storage, the stored predicate is a security-sensitive malfunction.Repro:
5. Object-literal
{ ... }rewriter in SQL preprocessor breaks on''-escaped quotesFile:
nodedb-sql/src/parser/preprocess.rs:270-294(find_matching_brace)The
continuein afor i in start..chars.len()only advances by one, landing on the escape's second', which the next iteration reads as a closing quote.in_stringflips tofalseprematurely; the closing}is then encountered whilein_stringis still wrong, so the function returnsNoneand the caller skips the rewrite — passing raw{ ... }text to sqlparser which errors out.Repro:
6. CRITICAL: Expression tokenizer in
nodedb-querypanics on multi-byte UTF-8File:
nodedb-query/src/expr_parse.rs:100-127, 139-160The tokenizer iterates byte-by-byte but slices
&input[i..i+2]as a&str. Ifipoints into the middle of a multi-byte UTF-8 codepoint, the slice is not a char boundary and Rust panics withbyte index N is not a char boundary. This tokenizer is reachable fromCREATE COLLECTION … GENERATED ALWAYS AS (<expr>)and the typeguard / check-constraint compile path. The string-literal loop additionally corrupts multi-byte content by casting each byte as a singlechar.Repro:
Any 3-byte codepoint followed by
<=/>=/!=/<>/||nearby in the expression triggers the panic.7.
CREATE MATERIALIZED VIEWchops SELECT body on anyWITH(including CTEs)File:
nodedb/src/control/server/pgwire/ddl/materialized_view/parse.rs:48-108Both
find(" WITH")andrfind("WITH")match any occurrence of the substring — including inside the SELECT body (a CTEWITH cte AS (...), a column aliaswith_x, a string literal'WITH'). The parser truncatesquery_sqlat the CTE start and parses the CTE-prefix as the refresh-options clause, so a valid CTE-based SELECT becomes an error or a corrupted materialized view.Repros:
Checklist
find_begin_pos(trigger parse)rfind(')')calls in insert_parsesplit_valuestrim_matches('(' | ')')with a single balanced-paren strip (or don't strip at all)''escape infind_matching_brace&[char]or usestr::char_indices/pattern = "<="matching on&strAll items are independently reproducible via the SQL above.