feat(protobuf): gRPC service/rpc + make proto message/schema reach the graph #593
Merged
The protobuf provider extracted only `message` fields — `service { rpc }`
blocks were explicitly ignored, so gRPC service contracts were invisible
to the graph (graph-completeness gap, CLAUDE.md gate A): an LLM tracing
"what endpoints does this service expose" or matching a client stub call
to its definition hit a dead end at the `.proto`.
Add a line-oriented `service`/`rpc` extractor mirroring the existing
`extract_proto_fields` state machine. Each rpc becomes a `RawRoute`
(method `GRPC`, path `/<package.>Service/Method` — the gRPC HTTP/2 wire
convention), which the builder finalizes into a `NodeKind::Route` exactly
like an HTTP endpoint. Reusing `Route` means zero schema change (no new
NodeKind/RelType — keeps rkyv discriminants stable) and gRPC services flow
through the existing `ecp routes` / `ecp contracts` tooling for free; the
`/Service/Method` path string is the same one a stub call keys on, so
cross-repo contract matching works without new edge types.
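
The path convention described above can be sketched as follows. This is a minimal illustration, not the provider's actual code: `grpc_wire_path`, `rpc_to_route`, and the two-field `RawRoute` stand-in are hypothetical names, and the real `RawRoute` type likely carries more fields than shown here.

```rust
/// Hypothetical stand-in for the provider's `RawRoute` record
/// (only the two fields mentioned in this PR are modeled).
#[derive(Debug, PartialEq)]
struct RawRoute {
    method: String,
    path: String,
}

/// Build the gRPC wire path `/<package.>Service/Method`.
/// `package` is `None` (or empty) when the .proto declares no package,
/// in which case the path is just `/Service/Method`.
fn grpc_wire_path(package: Option<&str>, service: &str, method: &str) -> String {
    match package {
        Some(pkg) if !pkg.is_empty() => format!("/{pkg}.{service}/{method}"),
        _ => format!("/{service}/{method}"),
    }
}

/// Turn one `rpc` declaration into a route record with method `GRPC`.
fn rpc_to_route(package: Option<&str>, service: &str, rpc: &str) -> RawRoute {
    RawRoute {
        method: "GRPC".to_string(),
        path: grpc_wire_path(package, service, rpc),
    }
}
```

Under these assumptions, `rpc_to_route(Some("api.v1"), "UserService", "GetUser")` yields the path `/api.v1.UserService/GetUser`, the same string a generated client stub would use as its `:path`.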
Scope: REST cross-service edges (`RelType::Fetches`) already exist and
already normalize dynamic path segments — this closes the gRPC half of
the polyglot service-contract gap. GraphQL remains uncovered.
Tests: provider-level (same verification standard as the existing
`protobuf_schema.rs`) — package-prefixed paths, streaming rpc, multi-
service files, message-only files emit no routes, and message field
extraction still coexists. Config/IaC-style single-grammar detector, so
the 14-language rule does not apply.
The protobuf provider's `GRPC` RawRoutes were silently dropped at builder.rs Pass 1.5: `detect_from_call` gated every route on the HTTP method allowlist (`HTTP_METHODS.contains`), so `GRPC` matched nothing and returned None. gRPC service contracts parsed correctly by the provider never reached the graph: end-to-end, a `.proto` indexed to zero Route nodes despite `parse_file` returning them.

Add a gRPC fast path: a `GRPC` method with a wire-format `/…` path is a confirmed service endpoint (the provider emits already-normalized records, not literals to be filtered), so it bypasses the HTTP allowlist whose only job is rejecting non-route call sites.

HTTP detection is unchanged: all 21 builder + 25 route_detector tests pass, including the Express `use` mount-point case.

Regression coverage: `detect_from_call_accepts_grpc_service_method`, `detect_from_call_grpc_requires_leading_slash` (route_detector), and `grpc_raw_route_promotes_to_route_node` (builder end-to-end: RawRoute → Route node).

Verified live: `ecp routes` on a `.proto` now lists `GRPC /pkg.Service/Method` endpoints.

Note: the sibling gap, protobuf `message` SchemaFields also not reaching the graph (different root cause: messages emit no owner Class node, so schema_field_mirrors drops them), is NOT fixed here; tracked separately.
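
The fast path can be pictured roughly as below. This is a sketch under stated assumptions: `detect_route` is a hypothetical simplification of `detect_from_call` (whose real signature is not shown in this PR), the `Route` struct is a stand-in, and the contents of `HTTP_METHODS` here are an illustrative subset rather than the real allowlist.

```rust
/// Illustrative subset of an HTTP method allowlist (assumption: the real
/// `HTTP_METHODS` list is not shown in this PR and may differ).
const HTTP_METHODS: &[&str] = &["GET", "POST", "PUT", "PATCH", "DELETE", "HEAD", "OPTIONS"];

/// Hypothetical stand-in for a detected route.
#[derive(Debug, PartialEq)]
struct Route {
    method: String,
    path: String,
}

/// Simplified detector: returns `Some` only for confirmed routes.
fn detect_route(method: &str, path: &str) -> Option<Route> {
    // gRPC fast path: the provider emits already-normalized records, so a
    // `GRPC` method with a leading-slash wire path skips the HTTP allowlist.
    if method == "GRPC" {
        return path.starts_with('/').then(|| Route {
            method: "GRPC".to_string(),
            path: path.to_string(),
        });
    }
    // HTTP path: unchanged, still gated on the allowlist so that
    // non-route call sites are rejected.
    let upper = method.to_ascii_uppercase();
    if HTTP_METHODS.contains(&upper.as_str()) {
        Some(Route { method: upper, path: path.to_string() })
    } else {
        None
    }
}
```

The two gRPC branches correspond to the intent of the regression tests named above: a `GRPC` record with a `/…` path is accepted, and one without a leading slash is rejected.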
Proto `message` fields parsed correctly but were dropped end-to-end: a message emitted no owner node, so `schema_field_mirrors` (which resolves each RawSchemaField.owner_class against the SymbolTable to attach HasProperty) found no owner and silently discarded every field. A `.proto` indexed to a lone File node; the schema-field feature was dead through the full pipeline, only ever exercised by parse_file-level unit tests.

Emit each top-level `message` carrying ≥1 field as a `NodeKind::Struct` (value-type aggregate, no inheritance/vtable; must not be pattern-matched as a `Class`). The owner lookup now resolves, so SchemaField nodes + HasProperty edges land. Empty messages emit no node (no schema surface to own → would be an orphan).

Perf: fold the message-node collection into the existing single `extract_proto_fields` line scan (pending-message flush on block close) rather than adding a second pass over the source, so fields and their owner struct come out of one walk. Net cost over the prior behavior is one deferred Vec push per non-empty message, no extra iteration.

Tests: new `protobuf_graph_e2e.rs` drives the full ProtobufProvider → GraphBuilder::build pipeline (Struct + SchemaField + HasProperty; empty-message orphan guard; service/message coexistence). It is the first proto test exercising the builder rather than parse_file alone. Full ecp-analyzer suite green (2484 tests).

Closes the SchemaField half of the proto-to-graph gap; the gRPC-route half was fixed in the sibling commit.
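
A rough sketch of the single-scan idea described above: fields and their owner struct come out of one walk, with the owner flushed on block close only when it has at least one field. Everything here is a simplified assumption: `scan`, `Field`, and `Pending` are hypothetical names, and the parser handles only flat top-level messages with one declaration per line (no nesting, options, or comments).

```rust
/// Hypothetical simplified field record; the real provider types differ.
#[derive(Debug, PartialEq)]
struct Field {
    owner: String,
    name: String,
}

/// State for the currently open top-level message: (name, fields seen so far).
/// `Some` means "inside a message"; `None` means "at top level".
type Pending = (String, usize);

/// One pass over the source. Returns (names of non-empty messages, fields).
fn scan(src: &str) -> (Vec<String>, Vec<Field>) {
    let mut structs = Vec::new();
    let mut fields = Vec::new();
    let mut pending: Option<Pending> = None;

    for line in src.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("message ") {
            // Header line such as `message User {`.
            let name = rest.trim_end_matches('{').trim().to_string();
            pending = Some((name, 0));
        } else if line == "}" {
            // Block close: flush the owner only if it carries >= 1 field,
            // so empty messages never become orphan nodes.
            if let Some((name, count)) = pending.take() {
                if count > 0 {
                    structs.push(name);
                }
            }
        } else if let Some((owner, count)) = pending.as_mut() {
            // Field line such as `string email = 1;`: the field name is the
            // last token before `=`.
            if let Some((decl, _)) = line.split_once('=') {
                if let Some(name) = decl.split_whitespace().last() {
                    fields.push(Field { owner: owner.clone(), name: name.to_string() });
                    *count += 1;
                }
            }
        }
    }
    (structs, fields)
}
```

For a source containing `message User { string email = 1; int32 age = 2; }` spread over four lines followed by an empty `message Empty {}` block on two lines, this sketch reports one struct (`User`) and two fields owned by it, with nothing emitted for `Empty`.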
…e clone

`extract_proto_fields` tracked the open message in two places, `current_message: Option<String>` and `pending.0`, holding the same name, which forced a `name.clone()` on every message header. Derive everything from `pending` instead: `Some` ⟺ inside a top-level message, `pending.0` is the owner name for field attribution. Removes the clone (one heap alloc per message) and one piece of duplicated state. Pure simplification: all 26 proto tests unchanged and green.
…plexity

The pre-push clippy hook (`--all-features -D warnings`) flagged the inline `Option<(String, (u32,u32,u32,u32), bool)>` as `type_complexity`. Extract a `PendingMessage` alias. No behavior change.
coseto6125 added a commit that referenced this pull request on Jun 22, 2026
…#597)

The robustness work done in code review of #593 (string/comment-aware lexer + span/reuse cleanup) landed on the feature branch AFTER the PR had already been squash-merged, so it never reached main. This PR brings it in so the 0.8.0 release ships the fixed parser, not the version with 8 known silent mis-parses.

Replaces the naive per-line brace-count parser with a single char-level `lex_statements` pass tracking string / line-comment / block-comment state, splitting at `{` `}` `;` into logical statements. Both extractors walk that clean stream, so structure is never confused by string/comment content or by multiple statements sharing a line.

Fixes (each with a regression test in protobuf_robustness.rs):
- single-line `message`/`service` bodies dropped their field/route
- per-rpc `option (google.api.http) = { get: "/v1/{id}" }` desynced depth and dropped following rpcs (pervasive in gRPC-gateway protos)
- multi-line rpc bodies dropped the rpc
- oneof-only messages vanished; oneof fields now belong to the enclosing message (a block-kind stack keeps nested `message` fields scoped to it)
- truncated/unclosed message at EOF flushes its owner Struct node
- `//` inside a string literal no longer treated as a comment
- malformed `package foo.` rejected (was producing double-dot wire paths)

Also: Struct span now covers header→closing brace; extracted shared parse_keyword_ident helper; updated two tests that asserted the old buggy oneof/nested behavior. Full ecp-analyzer suite green; clippy --all-features -D warnings clean.
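
A char-level statement lexer of the kind described above might look like the following. This is a sketch only: the function borrows the name `lex_statements` from the commit message, but its signature, the `State` enum, and the choice to keep each delimiter at the end of its statement are assumptions, not the merged implementation.

```rust
/// Lexer state: plain code, inside a string (with its quote char),
/// inside a `//` comment, or inside a `/* */` comment.
#[derive(Clone, Copy)]
enum State {
    Code,
    Str(char),
    LineComment,
    BlockComment,
}

/// Split proto source into logical statements at `{`, `}` and `;`,
/// ignoring those characters inside strings and comments.
/// Each statement keeps its terminating delimiter as its last character;
/// comments are dropped and replaced by a single space.
fn lex_statements(src: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut state = State::Code;
    let mut chars = src.chars().peekable();

    while let Some(c) = chars.next() {
        match state {
            State::Code => match c {
                '/' if chars.peek() == Some(&'/') => {
                    chars.next();
                    state = State::LineComment;
                }
                '/' if chars.peek() == Some(&'*') => {
                    chars.next();
                    state = State::BlockComment;
                }
                '"' | '\'' => {
                    cur.push(c);
                    state = State::Str(c);
                }
                '{' | '}' | ';' => {
                    cur.push(c);
                    out.push(cur.trim().to_string());
                    cur.clear();
                }
                _ => cur.push(c),
            },
            State::Str(quote) => {
                cur.push(c);
                if c == '\\' {
                    // Copy the escaped char so `\"` does not end the string.
                    if let Some(next) = chars.next() {
                        cur.push(next);
                    }
                } else if c == quote {
                    state = State::Code;
                }
            }
            State::LineComment => {
                if c == '\n' {
                    cur.push(' ');
                    state = State::Code;
                }
            }
            State::BlockComment => {
                if c == '*' && chars.peek() == Some(&'/') {
                    chars.next();
                    cur.push(' ');
                    state = State::Code;
                }
            }
        }
    }
    // Unterminated trailing text (e.g. a truncated file) is kept as a final statement.
    let tail = cur.trim();
    if !tail.is_empty() {
        out.push(tail.to_string());
    }
    out
}
```

Because braces and `//` inside string literals never reach the splitting arm, an option such as `get: "/v1/{id}"` stays inside one statement, and several statements sharing a line are still separated.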
Makes the protobuf provider's output actually reach the graph end-to-end, and adds gRPC service contracts. Investigation of the original "cross-service HTTP edge" idea found REST is already covered (`RelType::Fetches` links consumer calls to Route nodes with dynamic-segment normalization + cross-repo `ecp contracts`), so the real gaps were gRPC and the fact that protobuf output never reached the graph at all.

5 commits, three distinct fixes around one theme (proto → graph):

### 1. feat: capture gRPC `service { rpc }` as Route nodes

The provider parsed only `message` fields; `service`/`rpc` was explicitly ignored, so gRPC contracts were invisible (gate A: graph completeness). Added a line-oriented `service`/`rpc` extractor: each rpc → `RawRoute { method: "GRPC", path: "/<package.>Service/Method" }` (the gRPC HTTP/2 wire convention), finalized into a `NodeKind::Route` like an HTTP endpoint. No new NodeKind/RelType: it reuses `Route`, so rkyv discriminants stay stable and gRPC flows through existing `ecp routes` / `ecp contracts`.

### 2. fix: let gRPC RawRoutes survive `detect_from_call` (pre-existing bug)

`detect_from_call` gated every route on the HTTP method allowlist, so `GRPC` matched nothing and was dropped at builder Pass 1.5. Added a gRPC fast path (a `GRPC` method + `/…` path is a confirmed endpoint, bypasses the HTTP allowlist whose only job is rejecting non-route call sites). HTTP detection unchanged: all 21 builder + 25 route_detector tests pass incl. the Express `use` mount-point case.

### 3. fix: emit `message` as a Struct node so schema fields reach the graph (pre-existing dead feature)

The bigger find: proto `message` fields parsed but were dropped end-to-end. A message emitted no owner node, so `schema_field_mirrors` (resolves `owner_class` against the SymbolTable to attach `HasProperty`) found no owner and silently discarded every field. A `.proto` indexed to a lone File node; the schema-field feature was dead through the full pipeline, only ever exercised by `parse_file`-level unit tests.

Fix: emit each top-level `message` with ≥1 field as a `NodeKind::Struct` (value-type aggregate, no inheritance/vtable; must not be pattern-matched as `Class`). Owner lookup now resolves → SchemaField nodes + HasProperty edges land. Empty messages emit no node (no schema surface → orphan).

### Perf (no negative impact)

- Message-node collection is folded into the existing single `extract_proto_fields` line scan (deferred flush on block-close), so there is no second pass over the source.
- Removed the redundant `current_message` state and its per-message `name.clone()`; derive owner from the `pending` tuple instead. Pure simplification.

These touch only index-time parsing (`parse_file` has zero query-hot-path callers, verified via `ecp impact`), so no effect on per-query latency.

### Verification

- `ecp admin index` on a `.proto` now yields `Struct(User)` + `SchemaField(email→User, age→User)` + `HasProperty(User→email/age)` + `Route(GRPC /api.v1.UserService/GetUser)`; previously just a File node.
- `protobuf_graph_e2e.rs` drives the full `ProtobufProvider → GraphBuilder::build` pipeline (the first proto test exercising the builder, not just `parse_file`): Struct+SchemaField+HasProperty, empty-message orphan guard, service/message coexistence.
- `cargo clippy --workspace --all-targets --all-features -D warnings` clean.

### Scope notes

Config/IaC-style single-grammar detector (`.proto` only), so the 14-mainstream-language rule doesn't apply (no per-language variants). GraphQL service contracts remain a separate uncovered gap. Resolves the proto-to-graph follow-up filed during this work.