fix(codegen): support double-quoted identifiers wherever gram.y uses IDENT #37

Merged
gmr merged 2 commits into main from fix/quoted-identifiers-and-index-elem-options
Jun 12, 2026

gmr commented Jun 11, 2026
Owner

Summary

Quoted identifiers ("My Table", "a""b") now parse everywhere PostgreSQL accepts them. Also fixes the generate-postgres recipe, which silently wrote the regenerated parser to a gitignored directory.

Problem

postgres/grammar.js defined a quoted_identifier token but nothing referenced it — ColId, type_function_name, ColLabel, etc. only accepted $.identifier, so CREATE TABLE "Foo" (id int); and SELECT "a""b"; produced ERROR nodes. In PostgreSQL's lexer a double-quoted identifier is an IDENT terminal, so every IDENT call site must accept both forms.
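As a rough illustration of the two forms named above, the sketch below recognises a double-quoted identifier and recovers the name it denotes. The regular expression and the helper name `unquoteIdentifier` are assumptions made for this example; they are not taken from postgres/grammar.js.

```javascript
// Assumed pattern (not copied from the grammar): one or more characters
// between double quotes, where an embedded quote is written as "".
const QUOTED_IDENTIFIER = /^"(?:[^"]|"")+"$/;

// Hypothetical helper: returns the identifier's name, or null when the
// input is not a well-formed double-quoted identifier.
function unquoteIdentifier(text) {
  if (!QUOTED_IDENTIFIER.test(text)) return null;
  // Drop the surrounding quotes, then collapse each "" into a single ".
  return text.slice(1, -1).replace(/""/g, '"');
}

console.log(unquoteIdentifier('"My Table"')); // My Table
console.log(unquoteIdentifier('"a""b"'));     // a"b
console.log(unquoteIdentifier('Foo'));        // null
```
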

Separately, the `just generate-postgres` recipe ran `tree-sitter generate postgres/grammar.js` from the repo root, which with the current CLI writes output to ./src (gitignored) instead of postgres/src. Regeneration appeared to succeed while leaving the committed parser stale.

Solution

Per the codegen-first rule, the fix is in script/codegen.js: BASE_TOKEN_MAP now maps IDENT/UIDENT to a hidden _ident rule (choice($.identifier, $.quoted_identifier)) emitted with the lexer rules. Because the rule is hidden, the CST shape stays (ColId (identifier)) for unquoted names, with quoted names surfacing as (ColId (quoted_identifier)). word stays on the bare identifier token as tree-sitter requires, and the prec.dynamic keyword-vs-identifier wrappers now wrap _ident — safe since quoted identifiers never lex as keywords. The recipe now runs tree-sitter generate from inside postgres/, matching generate-plpgsql.
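A minimal sketch of the mapping described above, for readers who have not opened script/codegen.js. Only `BASE_TOKEN_MAP`, `_ident`, `identifier`, and `quoted_identifier` come from this PR; `HIDDEN_RULES`, `ruleForToken`, and `emitHiddenRules` are hypothetical names, and the real generator is likely structured differently.

```javascript
// Sketch only: both gram.y identifier terminals route to the same hidden rule.
const BASE_TOKEN_MAP = new Map([
  ["IDENT", "$._ident"],
  ["UIDENT", "$._ident"],
]);

// Hypothetical table of rules emitted with the lexer rules. The leading
// underscore hides the rule, so it contributes no node of its own to the CST.
const HIDDEN_RULES = new Map([
  ["_ident", "choice($.identifier, $.quoted_identifier)"],
]);

// Hypothetical lookup: the grammar.js expression for a gram.y terminal.
function ruleForToken(token) {
  const rule = BASE_TOKEN_MAP.get(token);
  if (rule === undefined) throw new Error(`unmapped token: ${token}`);
  return rule;
}

// Hypothetical emitter: renders the hidden rules as grammar.js source lines.
function emitHiddenRules() {
  return [...HIDDEN_RULES]
    .map(([name, body]) => `    ${name}: $ => ${body},`)
    .join("\n");
}

console.log(ruleForToken("IDENT"));
console.log(emitHiddenRules());
```

Keeping the choice in one hidden rule means every IDENT call site picks up both forms from a single definition, rather than each production listing the alternatives itself.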

Impact

New (quoted_identifier) leaves can appear anywhere an (identifier) leaf could; downstream consumers that match (ColId (identifier)) exclusively will not match quoted names (they previously got ERROR nodes, so this is strictly additive). No new GLR conflicts.
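For downstream consumers, handling both leaf kinds might look like the sketch below. The node shape `{ type, text, children }` and the function name `colIdName` are assumptions for illustration; real tree-sitter bindings expose a different API. Lowercasing the unquoted form mirrors PostgreSQL's case folding of unquoted identifiers.

```javascript
// Hypothetical consumer over a simplified node shape: { type, text, children }.
function colIdName(node) {
  if (node.type !== "ColId") return null;
  const leaf = node.children[0];
  if (!leaf) return null;
  if (leaf.type === "identifier") {
    // Unquoted identifiers are case-folded to lowercase by PostgreSQL.
    return leaf.text.toLowerCase();
  }
  if (leaf.type === "quoted_identifier") {
    // Quoted identifiers keep their case; strip the quotes and unescape "".
    return leaf.text.slice(1, -1).replace(/""/g, '"');
  }
  // Other children (for example keyword nodes) are out of scope for this sketch.
  return null;
}

const unquoted = { type: "ColId", children: [{ type: "identifier", text: "Foo", children: [] }] };
const quoted = { type: "ColId", children: [{ type: "quoted_identifier", text: '"Foo"', children: [] }] };

console.log(colIdName(unquoted)); // foo
console.log(colIdName(quoted));   // Foo
```
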

Testing

  • Regenerated with PG_SOURCE_DIR on REL_18_3; diff confined to generated postgres files
  • Full corpus passes (50 postgres including 4 new quoted-identifier cases, 60 plpgsql)
  • cargo test (Rust bindings) passes
  • Downstream acceptance: patched pglifecycle (rust-rewrite) to use this checkout and removed the #[ignore]s on quoted_identifiers_unquote and parses_index_options; both pass (the latter was fixed by #28, "#22 Fix/create index error"); pglifecycle was then reverted

Summary by CodeRabbit

  • New Features

    • Extended PostgreSQL grammar to support quoted identifiers in additional parsing contexts including role options, column labels, type functions, extract arguments, zone values, collation names, and XML table column options.
  • Tests

    • Added test coverage for quoted identifier parsing in DDL and SELECT operations, including qualified names and complex identifier scenarios.

gmr and others added 2 commits June 11, 2026 19:52
postgres/grammar.js defined quoted_identifier but nothing referenced
it, so CREATE TABLE "Foo" (id int) and SELECT "a""b" produced ERROR
nodes. In PostgreSQL's lexer a double-quoted identifier is an IDENT
terminal, so every IDENT call site must accept both forms.

- Map IDENT/UIDENT in BASE_TOKEN_MAP to a new hidden _ident rule,
  choice($.identifier, $.quoted_identifier), emitted with the lexer
  rules. Hidden, so the CST keeps (ColId (identifier)) stable and
  quoted forms surface as (ColId (quoted_identifier))
- word stays on the bare identifier token as tree-sitter requires;
  the prec.left/prec.dynamic keyword-vs-identifier wrappers now wrap
  _ident, which is safe because quoted identifiers never lex as
  keywords
- Add corpus cases for quoted table/column names, quoted qualified
  names, and COLLATE with a quoted collation in index options

No new GLR conflicts. Validated against pglifecycle rust-rewrite:
its quoted_identifiers_unquote and parses_index_options tests pass
with this checkout patched in (the latter was fixed by #28).

Co-authored-by: Claude <noreply@anthropic.com>

With the current tree-sitter CLI, `tree-sitter generate
postgres/grammar.js` from the repo root writes output to ./src
(gitignored) instead of postgres/src, silently leaving the committed
parser stale. Run generate from inside postgres/ instead, matching
the generate-plpgsql recipe.

Co-authored-by: Claude <noreply@anthropic.com>
coderabbitai Bot commented Jun 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 53892562-b921-4735-b1b6-211732622c3d

📥 Commits

Reviewing files that changed from the base of the PR and between 96e6b52 and 528eee3.

📒 Files selected for processing (8)
  • justfile
  • postgres/grammar.js
  • postgres/src/grammar.json
  • postgres/src/node-types.json
  • postgres/src/parser.c
  • postgres/test/corpus/ddl.txt
  • postgres/test/corpus/select.txt
  • script/codegen.js

📝 Walkthrough

This PR extends the PostgreSQL tree-sitter grammar to accept quoted identifiers in 13 grammar productions where only unquoted identifiers were previously allowed. It introduces a hidden _ident rule, updates the code generator and build system, regenerates grammar artifacts, and adds test coverage for the new functionality.

Changes

Quoted Identifier Grammar Extension

| Layer / File(s) | Summary |
|---|---|
| Build and code generator infrastructure (justfile, script/codegen.js) | Build recipe updated to run tree-sitter generate from postgres directory. Code generator routes IDENT and UIDENT tokens through new hidden _ident lexer rule that unions quoted and unquoted identifier forms. |
| Grammar rule definition and applications (postgres/grammar.js) | New hidden _ident rule defined as choice of identifier and quoted_identifier. Updated 13 productions to use $._ident: AlterOptRoleElem, zone_value, RowSecurityDefaultPermissive, old_aggr_elem, createdb_opt_name, xmltable_column_option_el, extract_arg, and core rules (ColId, type_function_name, NonReservedWord, ColLabel, BareColLabel). |
| Generated grammar and node-type artifacts (postgres/src/grammar.json, postgres/src/node-types.json, postgres/src/parser.c) | Grammar JSON introduces _ident symbol and routes 11 references through it. Node-types JSON adds quoted_identifier as accepted child type in 13 productions. Parser C binary pointer updated. |
| Test corpus coverage for quoted identifiers (postgres/test/corpus/ddl.txt, postgres/test/corpus/select.txt) | DDL corpus adds CREATE TABLE test with escaped-quote identifiers and CREATE INDEX test with quoted collation. SELECT corpus adds tests for quoted identifiers in select targets and qualified schema/table names. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • gmr/tree-sitter-postgres#1: Adds PostgreSQL grammar and code generator infrastructure that this PR extends with token routing through the new _ident rule.

Poem

🐰 A rabbit hops through quoted names so fine,
With escapes and schema, all in line,
From ColId to zone_value they align,
The _ident rule makes grammar shine! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding support for double-quoted identifiers in code generation wherever gram.y uses IDENT, which is the core fix described in the PR objectives. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.43.0)
postgres/grammar.js

gmr commented Jun 12, 2026
Owner Author

PR monitoring complete: ready to merge.

  • CI: test check passing on head 528eee3
  • CodeRabbit: review completed with no actionable comments
  • Review threads: 0 unresolved

No changes were needed during monitoring. Leaving the merge to @gmr.

gmr merged commit 653d7f3 into main Jun 12, 2026
2 checks passed
gmr deleted the fix/quoted-identifiers-and-index-elem-options branch June 12, 2026 00:08