Background
SqlParser::parse_statements currently relies on the AST tokenizer
to find ; boundaries. This creates a tight coupling: any change
in tokenizer error-recovery behaviour (e.g. the 0.2.5 change to
/* handling) can silently break statement splitting.
A pre-scan (unclosed_block_comment_start) was added as a
workaround, but it reimplements a subset of the tokenizer's lexical
rules (', ", $$, --) and cannot cover all literal forms
the tokenizer accepts (backtick-quoted identifiers, @ literals,
backslash escapes).
Proposal
Replace the tokenizer-based splitting with a standalone state
machine whose only job is to find ; outside of:
- single/double/backtick-quoted strings (with escape handling)
- dollar-quoted strings ($$...$$)
- block comments (/* ... */)
- line comments (-- ...\n)
The tokenizer is then only used downstream for syntax validation,
highlighting, and formatting — never for splitting.
This also unblocks multi-statement support: the splitter produces
N statements, and the upper layer decides whether to execute them
sequentially or in batch.
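A minimal sketch of what such a splitter could look like is given below. It is an illustration of the proposal, not the project's implementation. The names State, split_statements and push_trimmed are hypothetical. The sketch assumes that a backslash or a doubled quote character escapes the next character inside any quoted literal, that block comments do not nest, and that only the untagged $$ form of dollar quoting is needed. Empty statements are dropped, and text after the last ; (including an unterminated literal or comment) is returned as a final statement.

```rust
/// Lexical states of the hypothetical splitter (names are illustrative).
#[derive(Clone, Copy)]
enum State {
    Normal,
    Quoted(u8),   // inside '...', "..." or `...`; holds the closing quote byte
    DollarQuoted, // inside $$ ... $$
    BlockComment, // inside /* ... */
    LineComment,  // inside -- ... up to the next newline
}

/// Push a statement unless it is empty after trimming whitespace.
fn push_trimmed<'a>(out: &mut Vec<&'a str>, piece: &'a str) {
    let piece = piece.trim();
    if !piece.is_empty() {
        out.push(piece);
    }
}

/// Split `sql` at every `;` that is outside literals and comments.
/// All delimiters are ASCII, so scanning bytes never cuts a UTF-8 character.
fn split_statements(sql: &str) -> Vec<&str> {
    let bytes = sql.as_bytes();
    let mut statements = Vec::new();
    let mut state = State::Normal;
    let mut start = 0; // byte offset where the current statement begins
    let mut i = 0;

    while i < bytes.len() {
        let rest = &bytes[i..];
        let mut step = 1; // number of bytes consumed by this iteration
        match state {
            State::Normal => {
                if rest[0] == b';' {
                    push_trimmed(&mut statements, &sql[start..i]);
                    start = i + 1;
                } else if matches!(rest[0], b'\'' | b'"' | b'`') {
                    state = State::Quoted(rest[0]);
                } else if rest.starts_with(b"$$") {
                    state = State::DollarQuoted;
                    step = 2;
                } else if rest.starts_with(b"/*") {
                    state = State::BlockComment;
                    step = 2;
                } else if rest.starts_with(b"--") {
                    state = State::LineComment;
                    step = 2;
                }
            }
            State::Quoted(quote) => {
                if rest[0] == b'\\' {
                    step = 2; // assumed backslash escape: skip the next byte
                } else if rest[0] == quote {
                    if rest.get(1) == Some(&quote) {
                        step = 2; // doubled quote stays inside the literal
                    } else {
                        state = State::Normal;
                    }
                }
            }
            State::DollarQuoted => {
                if rest.starts_with(b"$$") {
                    state = State::Normal;
                    step = 2;
                }
            }
            State::BlockComment => {
                if rest.starts_with(b"*/") {
                    state = State::Normal;
                    step = 2;
                }
            }
            State::LineComment => {
                if rest[0] == b'\n' {
                    state = State::Normal;
                }
            }
        }
        i += step;
    }

    // Whatever follows the last top-level `;` becomes the final statement.
    push_trimmed(&mut statements, &sql[start..]);
    statements
}
```

With this sketch, split_statements("SELECT ';' AS a; SELECT 2") yields two statements, because the first ; sits inside a single-quoted literal. The caller receives plain string slices and can pass each one to the tokenizer for validation, then run them sequentially or as a batch.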