[Compiler] Implement shared tokenizer + recursive-descent parser (ADR 0010) #306

@cssbruno

Follow-up from #299 (ADR 0010) and PR #300.

Problem

ADR 0010 (docs/engineering/decisions/0010-tokenizer-recursive-descent-parser.md)
accepted the direction: replace the line-oriented parser with a single shared
tokenizer and a recursive-descent parser with error recovery, behind the stable
gwdkast AST seam. The decision is recorded, but implementation has not started.
This issue tracks building it.

Relevant code:

  • internal/lang/lexer.go: the rune scanner to promote into the shared lexer
  • internal/parser/syntax.go, patterns.go: the line-oriented parser and
    per-line lexLine to retire
  • internal/parser/braces.go: brace state that becomes ordinary lexer state
  • internal/gwdkast: the stable AST output contract
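
A minimal sketch of what "brace state becomes ordinary lexer state" could look like, together with byte offsets on every token. The Token type, the Kind constants, and Tokenize are hypothetical names for illustration; none of them are taken from internal/lang, and the real token set will differ.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Kind classifies a token. This set is illustrative, not the real gwdk token set.
type Kind int

const (
	Ident Kind = iota
	LBrace
	RBrace
	Other
	EOF
)

// Token carries half-open byte offsets [Start, End) into the source.
// Depth is the brace nesting level outside the token, so a '{' and its
// matching '}' report the same depth.
type Token struct {
	Kind       Kind
	Start, End int
	Depth      int
}

// Tokenize scans src once, rune by rune, and returns every token followed by EOF.
func Tokenize(src string) []Token {
	var toks []Token
	depth := 0
	for i := 0; i < len(src); {
		r, w := utf8.DecodeRuneInString(src[i:])
		switch {
		case unicode.IsSpace(r):
			i += w
		case r == '{':
			toks = append(toks, Token{LBrace, i, i + w, depth})
			depth++
			i += w
		case r == '}':
			if depth > 0 {
				depth-- // an unmatched '}' stays at depth 0
			}
			toks = append(toks, Token{RBrace, i, i + w, depth})
			i += w
		case unicode.IsLetter(r) || r == '_':
			start := i
			for i < len(src) {
				r, w = utf8.DecodeRuneInString(src[i:])
				if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' {
					break
				}
				i += w
			}
			toks = append(toks, Token{Ident, start, i, depth})
		default:
			toks = append(toks, Token{Other, i, i + w, depth})
			i += w
		}
	}
	return append(toks, Token{EOF, len(src), len(src), depth})
}

func main() {
	src := "page héllo {\n}"
	for _, t := range Tokenize(src) {
		fmt.Printf("kind=%d [%d,%d) depth=%d %q\n", t.Kind, t.Start, t.End, t.Depth, src[t.Start:t.End])
	}
}
```

Because offsets are in bytes rather than runes, src[t.Start:t.End] always slices the exact token text, including identifiers with multi-byte characters.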

Scope

Implement the ADR in phases behind the AST seam:

  1. Promote internal/lang's scanner into the shared tokenizer (with byte
    offsets, following up on #294, "[Compiler] Carry byte offsets in source
    positions") and retire internal/parser lexLine.
  2. Build a recursive-descent parser to gwdkast.File with error recovery
    (synchronize at declaration boundaries and block braces; accumulate
    diagnostics instead of returning on the first error).
  3. Gate with golden AST-equivalence tests against the line-oriented parser and
    the conformance corpus (#295, "[Language] Add a machine-checked .gwdk
    conformance corpus"); cut over per declaration kind; remove the
    line-oriented path when equivalence holds.
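
The error-recovery behaviour in phase 2 could be sketched as follows. This is an illustration under loud assumptions: the grammar (a "page" or "component" keyword, a name, then a braced body with no nested declarations), the Decl and Diagnostic types, and Parse are all hypothetical, and strings.Fields stands in for the shared tokenizer, so positions are token indexes rather than byte offsets.

```go
package main

import (
	"fmt"
	"strings"
)

// Decl is a stand-in for a gwdkast declaration node (hypothetical shape).
type Decl struct{ Kind, Name string }

// Diagnostic records one problem; Pos is a token index in this sketch.
type Diagnostic struct {
	Pos int
	Msg string
}

type parser struct {
	toks  []string
	pos   int
	diags []Diagnostic
}

func isDeclKeyword(t string) bool { return t == "page" || t == "component" }

// peek returns the current token, or "" at end of input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) errorf(format string, args ...any) {
	p.diags = append(p.diags, Diagnostic{p.pos, fmt.Sprintf(format, args...)})
}

// synchronize skips ahead to the next declaration keyword or end of input.
func (p *parser) synchronize() {
	for p.peek() != "" && !isDeclKeyword(p.peek()) {
		p.pos++
	}
}

// parseDecl parses: keyword NAME "{" ... "}". It reports false on error
// and leaves the offending token for the caller to synchronize past.
func (p *parser) parseDecl() (Decl, bool) {
	d := Decl{Kind: p.peek()}
	p.pos++ // consume the keyword
	if n := p.peek(); n == "" || n == "{" || n == "}" || isDeclKeyword(n) {
		p.errorf("expected name after %q", d.Kind)
		return d, false
	}
	d.Name = p.peek()
	p.pos++
	if p.peek() != "{" {
		p.errorf("expected '{' after %q", d.Name)
		return d, false
	}
	p.pos++
	for depth := 1; depth > 0; p.pos++ {
		switch t := p.peek(); {
		case t == "" || isDeclKeyword(t):
			p.errorf("unclosed block in %q", d.Name)
			return d, false
		case t == "{":
			depth++
		case t == "}":
			depth--
		}
	}
	return d, true
}

// Parse keeps going after errors and returns everything it could recover.
func Parse(src string) ([]Decl, []Diagnostic) {
	p := &parser{toks: strings.Fields(src)}
	var decls []Decl
	for p.peek() != "" {
		if !isDeclKeyword(p.peek()) {
			p.errorf("unexpected %q at top level", p.peek())
			p.pos++
			p.synchronize()
			continue
		}
		if d, ok := p.parseDecl(); ok {
			decls = append(decls, d)
		} else {
			p.synchronize()
		}
	}
	return decls, p.diags
}

func main() {
	decls, diags := Parse("page Home { } page { } component Nav { x } stray page About { }")
	fmt.Println(decls)
	for _, d := range diags {
		fmt.Printf("token %d: %s\n", d.Pos, d.Msg)
	}
}
```

In this input the nameless "page" and the stray token each produce one diagnostic, while the three well-formed declarations are still returned. Every loop iteration consumes at least one token, so recovery cannot spin in place.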

The AST-backed formatter and granular per-construct diagnostic codes deferred in
#250 consume this parser and are out of scope here.

Acceptance Criteria

  • One shared tokenizer feeds both the compiler parser and editor/CLI tooling.
  • One syntax error no longer hides the rest of the file; diagnostics accumulate.
  • AST output matches the line-oriented parser for the supported subset until
    cutover, verified by golden equivalence tests and the corpus.
  • The line-oriented parser and lexLine are removed once equivalence holds.
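
The equivalence gate could be sketched as a helper that runs both parsers over the corpus and reports the entries whose ASTs differ. File, ParseFunc, and Mismatches are hypothetical names, the toy parsers in main are stand-ins, and skipping inputs the old parser rejects is an assumed reading of "the supported subset".

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// File is a stand-in for gwdkast.File; the real type has more structure.
type File struct{ Decls []string }

// ParseFunc is the hypothetical shape both parsers are adapted to.
type ParseFunc func(src string) (File, error)

// Mismatches returns the sorted names of corpus entries where newParse
// fails or produces a different AST than oldParse. Entries that oldParse
// rejects are treated as outside the supported subset and skipped.
func Mismatches(corpus map[string]string, oldParse, newParse ParseFunc) []string {
	names := make([]string, 0, len(corpus))
	for name := range corpus {
		names = append(names, name)
	}
	sort.Strings(names) // stable report order across runs

	var bad []string
	for _, name := range names {
		want, err := oldParse(corpus[name])
		if err != nil {
			continue
		}
		got, err := newParse(corpus[name])
		if err != nil || !reflect.DeepEqual(want, got) {
			bad = append(bad, name)
		}
	}
	return bad
}

func main() {
	// Toy parsers standing in for the line-oriented and recursive-descent ones.
	oldParse := func(src string) (File, error) {
		if strings.Contains(src, "\t") {
			return File{}, fmt.Errorf("tabs unsupported")
		}
		return File{Decls: strings.Fields(src)}, nil
	}
	newParse := func(src string) (File, error) {
		return File{Decls: strings.Fields(strings.ToLower(src))}, nil
	}
	corpus := map[string]string{
		"lower.gwdk": "page home",
		"upper.gwdk": "page Home",
		"tabs.gwdk":  "page\thome",
	}
	fmt.Println(Mismatches(corpus, oldParse, newParse)) // prints [upper.gwdk]
}
```

In a real test this helper would be called per declaration kind, so cutover can proceed as soon as that kind's slice of the corpus reports no mismatches.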

Verification

go test ./internal/parser ./internal/lang ./internal/compiler ./cmd/gowdk

Labels

  • compiler: Compiler internals, pipeline, and generated metadata
  • parser: .gwdk parser and syntax handling
