fix(protobuf): string/comment-aware lexer — robustness fixes that missed #593's merge#597
Merged
Merged
Conversation
The robustness work done in code review of #593 (string/comment-aware lexer + span/reuse cleanup) landed on the feature branch AFTER the PR had already been squash-merged, so it never reached main. This PR brings it in so the 0.8.0 release ships the fixed parser, not the version with 8 known silent mis-parses. Replaces the naive per-line brace-count parser with a single char-level `lex_statements` pass tracking string / line-comment / block-comment state, splitting at `{` `}` `;` into logical statements. Both extractors walk that clean stream, so structure is never confused by string/comment content or by multiple statements sharing a line. Fixes (each with a regression test in protobuf_robustness.rs): - single-line `message`/`service` bodies dropped their field/route - per-rpc `option (google.api.http) = { get: "/v1/{id}" }` desynced depth and dropped following rpcs (pervasive in gRPC-gateway protos) - multi-line rpc bodies dropped the rpc - oneof-only messages vanished; oneof fields now belong to the enclosing message (a block-kind stack keeps nested `message` fields scoped to it) - truncated/unclosed message at EOF flushes its owner Struct node - `//` inside a string literal no longer treated as a comment - malformed `package foo.` rejected (was producing double-dot wire paths) Also: Struct span now covers header→closing brace; extracted shared parse_keyword_ident helper; updated two tests that asserted the old buggy oneof/nested behavior. Full ecp-analyzer suite green; clippy --all-features -D warnings clean.
Contributor
ecp impact cache (0 symbols) — internal, used by
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The robustness work from the code review of #593 (string/comment-aware lexer + span/reuse cleanup) was pushed to the feature branch after #593 had already been squash-merged, so those commits never reached
main. As a resultmain's protobuf parser still has the 8 silent mis-parse bugs the review found and fixed.This must land before the 0.8.0 tag — otherwise 0.8.0 ships the broken parser.
maincurrently lacks:lex_statements,parse_keyword_ident, and the entireprotobuf_robustness.rsregression suite (verified via grep onorigin/main).What
Replaces the naive per-line brace-count parser with a single char-level
lex_statementspass tracking string / line-comment / block-comment state, splitting the source at{};into logical statements. Both extractors walk that clean stream.Fixes (each with a regression test, all live-verified during review):
message/servicebodies dropped their field/routeoption (google.api.http) = { get: "/v1/{id}" }(string-literal brace) desynced depth and dropped every following rpc — pervasive in gRPC-gateway protosrpc X(A) returns (B) { ... }) dropped the rpcmessagefields scoped to it, unlike transparentoneof)//inside a string literal no longer treated as a commentpackage foo./.foo/foo..barrejected (was producing double-dot wire paths)Also: Struct span now covers header→closing brace; extracted shared
parse_keyword_identhelper (service/message/rpc); updated two existing tests that asserted the old buggy "oneof/nested fields not emitted" behavior.Verification
Full ecp-analyzer suite green;
cargo clippy --workspace --all-targets --all-features -D warningsclean. End-to-end: a proto with block comments + oneof +google.api.httpoptions indexes to the correctStruct + SchemaField×3 + Route×2(previously dropped most of them).This is a config/IaC-style single-grammar detector (
.protoonly), so the 14-language coverage rule doesn't apply.