fix(protobuf): string/comment-aware lexer — robustness fixes that missed #593's merge #597

Merged: coseto6125 merged 1 commit into main from fix/protobuf-robustness-to-main on Jun 22, 2026
Conversation

@coseto6125

Owner

Why

The robustness work from the code review of #593 (string/comment-aware lexer + span/reuse cleanup) was pushed to the feature branch after #593 had already been squash-merged, so those commits never reached main. As a result main's protobuf parser still has the 8 silent mis-parse bugs the review found and fixed.

This must land before the 0.8.0 tag — otherwise 0.8.0 ships the broken parser.

main currently lacks: lex_statements, parse_keyword_ident, and the entire protobuf_robustness.rs regression suite (verified via grep on origin/main).

What

Replaces the naive per-line brace-count parser with a single char-level lex_statements pass tracking string / line-comment / block-comment state, splitting the source at { } ; into logical statements. Both extractors walk that clean stream.
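The single-pass idea can be illustrated with a minimal sketch. This is not the code from this PR: the name `lex_statements_sketch`, the `State` enum, and the choice to keep each delimiter at the end of its statement are all assumptions made for illustration. It tracks string, line-comment, and block-comment state per character and only splits on `{`, `}`, `;` while in plain code.

```rust
/// Lexer state (hypothetical names, for illustration only).
#[derive(Clone, Copy)]
enum State {
    Code,
    Str(char), // inside a string literal opened by this quote character
    LineComment,
    BlockComment,
}

/// Splits proto source into logical statements at `{`, `}` and `;`.
/// Delimiters inside strings and comments are ignored; comments are dropped.
fn lex_statements_sketch(src: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut state = State::Code;
    let mut chars = src.chars().peekable();

    while let Some(c) = chars.next() {
        match state {
            State::Code => match c {
                '"' | '\'' => {
                    cur.push(c);
                    state = State::Str(c);
                }
                '/' if chars.peek() == Some(&'/') => {
                    chars.next();
                    state = State::LineComment;
                }
                '/' if chars.peek() == Some(&'*') => {
                    chars.next();
                    state = State::BlockComment;
                }
                '{' | '}' | ';' => {
                    // A structural delimiter closes the current statement.
                    cur.push(c);
                    out.push(cur.trim().to_string());
                    cur.clear();
                }
                _ => cur.push(c),
            },
            State::Str(quote) => {
                cur.push(c);
                if c == '\\' {
                    // Keep the escaped character so `\"` does not end the string.
                    if let Some(escaped) = chars.next() {
                        cur.push(escaped);
                    }
                } else if c == quote {
                    state = State::Code;
                }
            }
            State::LineComment => {
                if c == '\n' {
                    cur.push(' ');
                    state = State::Code;
                }
            }
            State::BlockComment => {
                if c == '*' && chars.peek() == Some(&'/') {
                    chars.next();
                    cur.push(' ');
                    state = State::Code;
                }
            }
        }
    }
    // Flush a trailing statement that has no closing delimiter (truncated input).
    let tail = cur.trim();
    if !tail.is_empty() {
        out.push(tail.to_string());
    }
    out
}
```

With this sketch, the braces in `get: "/v1/{id}"` stay inside one statement, because they are consumed in the `Str` state and never reach the delimiter arm.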

Fixes (each with a regression test, all live-verified during review):

  • single-line message/service bodies dropped their field/route
  • per-rpc option (google.api.http) = { get: "/v1/{id}" } (string-literal brace) desynced depth and dropped every following rpc — pervasive in gRPC-gateway protos
  • multi-line rpc bodies (rpc X(A) returns (B) { ... }) dropped the rpc
  • oneof-only messages vanished; oneof fields now belong to the enclosing message (block-kind stack keeps nested message fields scoped to it, unlike transparent oneof)
  • truncated/unclosed message at EOF flushes its owner Struct node (else schema_field_mirrors re-drops the orphaned fields)
  • // inside a string literal no longer treated as a comment
  • malformed package foo. / .foo / foo..bar rejected (was producing double-dot wire paths)
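The malformed-package rejection in the last item could look roughly like the sketch below. The function name `is_valid_package_sketch` is hypothetical, and the per-segment rule (ASCII letter or underscore first, then letters, digits, or underscores) is an assumption rather than the check this PR actually implements.

```rust
/// Returns true when every dot-separated segment is a non-empty identifier.
/// Hypothetical helper: the identifier rule here is an assumption.
fn is_valid_package_sketch(name: &str) -> bool {
    name.split('.').all(|segment| {
        let mut chars = segment.chars();
        // An empty segment (from `foo.`, `.foo`, or `foo..bar`) fails here.
        matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
            && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
    })
}
```

Splitting on `.` turns each leading, trailing, or doubled dot into an empty segment, so the three malformed shapes named above are rejected by the same check.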

Also: Struct span now covers header→closing brace; extracted shared parse_keyword_ident helper (service/message/rpc); updated two existing tests that asserted the old buggy "oneof/nested fields not emitted" behavior.
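For readers unfamiliar with the block-kind stack mentioned in the oneof fix, here is a rough sketch of the scoping rule. The names `BlockKind` and `field_owners_sketch`, the statement shapes, and the field-name heuristic are assumptions for illustration, not the extractor in this PR. A `oneof` block is transparent (its fields go to the nearest enclosing message), while a nested `message` owns its own fields.

```rust
#[derive(Clone, Copy, PartialEq)]
enum BlockKind {
    Message,
    Oneof,
    Other, // service, enum, option bodies, and so on
}

/// Maps each field statement to `(owner_message, field_name)`.
/// Expects pre-split statements that end in `{`, `}` or `;` (an assumption).
fn field_owners_sketch(statements: &[&str]) -> Vec<(String, String)> {
    let mut stack: Vec<(BlockKind, String)> = Vec::new();
    let mut owned = Vec::new();

    for stmt in statements {
        let s = stmt.trim();
        if let Some(header) = s.strip_suffix('{') {
            // Block opener: record its kind and name.
            let mut words = header.split_whitespace();
            let kind = match words.next() {
                Some("message") => BlockKind::Message,
                Some("oneof") => BlockKind::Oneof,
                _ => BlockKind::Other,
            };
            stack.push((kind, words.next().unwrap_or("").to_string()));
        } else if s.ends_with('}') {
            stack.pop();
        } else if let Some((lhs, _)) = s.split_once('=') {
            if lhs.split_whitespace().next() == Some("option") {
                continue; // `option x = ...;` is not a field
            }
            // Only statements directly inside a message or oneof are fields.
            let direct = matches!(
                stack.last(),
                Some((BlockKind::Message | BlockKind::Oneof, _))
            );
            if !direct {
                continue;
            }
            // Oneof is transparent: walk outward to the nearest message.
            let owner = stack.iter().rev().find(|(k, _)| *k == BlockKind::Message);
            if let (Some((_, name)), Some(field)) = (owner, lhs.split_whitespace().last()) {
                owned.push((name.clone(), field.to_string()));
            }
        }
    }
    owned
}
```

Under these assumptions, a field inside `oneof choice { ... }` within `message Outer` is attributed to `Outer`, while a field inside a nested `message Inner` is attributed to `Inner`.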

Verification

Full ecp-analyzer suite green; cargo clippy --workspace --all-targets --all-features -- -D warnings clean. End-to-end: a proto with block comments + oneof + google.api.http options indexes to the correct Struct + SchemaField×3 + Route×2 (previously dropped most of them).

This is a config/IaC-style single-grammar detector (.proto only), so the 14-language coverage rule doesn't apply.

The robustness work done in code review of #593 (string/comment-aware
lexer + span/reuse cleanup) landed on the feature branch AFTER the PR had
already been squash-merged, so it never reached main. This PR brings it
in so the 0.8.0 release ships the fixed parser, not the version with 8
known silent mis-parses.

Replaces the naive per-line brace-count parser with a single char-level
`lex_statements` pass tracking string / line-comment / block-comment
state, splitting at `{` `}` `;` into logical statements. Both extractors
walk that clean stream, so structure is never confused by string/comment
content or by multiple statements sharing a line.

Fixes (each with a regression test in protobuf_robustness.rs):
- single-line `message`/`service` bodies dropped their field/route
- per-rpc `option (google.api.http) = { get: "/v1/{id}" }` desynced depth
  and dropped following rpcs (pervasive in gRPC-gateway protos)
- multi-line rpc bodies dropped the rpc
- oneof-only messages vanished; oneof fields now belong to the enclosing
  message (a block-kind stack keeps nested `message` fields scoped to it)
- truncated/unclosed message at EOF flushes its owner Struct node
- `//` inside a string literal no longer treated as a comment
- malformed `package foo.` rejected (was producing double-dot wire paths)

Also: Struct span now covers header→closing brace; extracted shared
parse_keyword_ident helper; updated two tests that asserted the old buggy
oneof/nested behavior.

Full ecp-analyzer suite green; clippy --all-features -D warnings clean.
@coseto6125 coseto6125 enabled auto-merge (squash) June 22, 2026 21:49
@coseto6125 coseto6125 added the merge-queue label (Opt-in to Mergify merge queue) Jun 22, 2026
@github-actions

Contributor
ecp impact cache (0 symbols) — internal, used by ecp dev pr-analyze

[]

@github-actions github-actions Bot added the ecp:risk-low label (ecp signal) Jun 22, 2026
@coseto6125 coseto6125 merged commit 91e02f3 into main Jun 22, 2026
18 checks passed
@coseto6125 coseto6125 deleted the fix/protobuf-robustness-to-main branch June 22, 2026 22:13

Labels

ecp:risk-low (ecp signal)
merge-queue (Opt-in to Mergify merge queue)


1 participant