feat(protobuf): gRPC service/rpc + make proto message/schema reach the graph #593

Merged: coseto6125 merged 5 commits into main from feat/grpc-service-rpc-graph, Jun 22, 2026

coseto6125 (Owner) commented Jun 22, 2026

Makes the protobuf provider's output actually reach the graph end-to-end, and adds gRPC service contracts. Investigation of the original "cross-service HTTP edge" idea found REST is already covered (RelType::Fetches links consumer calls to Route nodes with dynamic-segment normalization + cross-repo ecp contracts), so the real gaps were gRPC and the fact that protobuf output never reached the graph at all.

5 commits, three distinct fixes around one theme (proto → graph):

1. feat: capture gRPC service { rpc } as Route nodes

The provider parsed only message fields; service/rpc was explicitly ignored, so gRPC contracts were invisible (gate A — graph completeness). Added a line-oriented service/rpc extractor: each rpc → RawRoute { method: "GRPC", path: "/<package.>Service/Method" } (the gRPC HTTP/2 wire convention), finalized into a NodeKind::Route like an HTTP endpoint. No new NodeKind/RelType — reuses Route, so rkyv discriminants stay stable and gRPC flows through existing ecp routes / ecp contracts.
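
The rpc-to-route mapping described above can be sketched as follows. This is a minimal illustration, not the provider's code: the `RawRoute` struct here models only the two fields named in the description, and `grpc_route` is a hypothetical helper name.

```rust
/// Hypothetical stand-in for the provider's route record; only the
/// `method` and `path` fields mentioned in the description are modeled.
#[derive(Debug, PartialEq)]
pub struct RawRoute {
    pub method: String,
    pub path: String,
}

/// Build a route whose path follows the gRPC wire convention
/// `/<package.>Service/Method`. A missing or empty package yields
/// `/Service/Method`.
pub fn grpc_route(package: Option<&str>, service: &str, rpc: &str) -> RawRoute {
    let qualified = match package {
        Some(pkg) if !pkg.is_empty() => format!("{pkg}.{service}"),
        _ => service.to_string(),
    };
    RawRoute {
        method: "GRPC".to_string(),
        path: format!("/{qualified}/{rpc}"),
    }
}
```

With these assumptions, `grpc_route(Some("api.v1"), "UserService", "GetUser")` produces the same `/api.v1.UserService/GetUser` path that appears in the Verification section.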

2. fix: let gRPC RawRoutes survive detect_from_call (pre-existing bug)

detect_from_call gated every route on the HTTP method allowlist, so GRPC matched nothing and was dropped at builder Pass 1.5. Added a gRPC fast path (a GRPC method + /… path is a confirmed endpoint, bypasses the HTTP allowlist whose only job is rejecting non-route call sites). HTTP detection unchanged — all 21 builder + 25 route_detector tests pass incl. the Express use mount-point case.
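
A simplified sketch of the fast path, under stated assumptions: the real `detect_from_call` signature and the contents of `HTTP_METHODS` are not shown in this PR, so the two-argument signature, the `DetectedRoute` type, and the allowlist entries below are illustrative only.

```rust
/// Illustrative allowlist; the real `HTTP_METHODS` contents are an assumption.
const HTTP_METHODS: &[&str] = &["GET", "POST", "PUT", "PATCH", "DELETE", "HEAD", "OPTIONS"];

/// Hypothetical result type for an accepted route.
#[derive(Debug, PartialEq)]
pub struct DetectedRoute {
    pub method: String,
    pub path: String,
}

/// Hypothetical, simplified signature: accept or reject a (method, path) pair.
pub fn detect_from_call(method: &str, path: &str) -> Option<DetectedRoute> {
    let accepted = if method == "GRPC" {
        // gRPC fast path: the provider emits already-normalized records,
        // so only the leading slash of the wire path is checked.
        path.starts_with('/')
    } else {
        // HTTP path: the allowlist rejects call sites that are not routes.
        HTTP_METHODS.contains(&method)
    };
    accepted.then(|| DetectedRoute {
        method: method.to_string(),
        path: path.to_string(),
    })
}
```

Before the fix, the `GRPC` branch did not exist, so every gRPC record fell through to the allowlist check and returned `None`.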

3. fix: emit message as a Struct node so schema fields reach the graph (pre-existing dead feature)

The bigger find: proto message fields parsed but were dropped end-to-end — a message emitted no owner node, so schema_field_mirrors (resolves owner_class against the SymbolTable to attach HasProperty) found no owner and silently discarded every field. A .proto indexed to a lone File node; the schema-field feature was dead through the full pipeline, only ever exercised by parse_file-level unit tests.

Fix: emit each top-level message with ≥1 field as a NodeKind::Struct (value-type aggregate, no inheritance/vtable — must not be pattern-matched as Class). Owner lookup now resolves → SchemaField nodes + HasProperty edges land. Empty messages emit no node (no schema surface → orphan).
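
Why the missing owner node silently killed every field can be illustrated with a small model. The types and function names below (`Message`, `struct_owners`, `has_property_edges`) are hypothetical; only `owner_class` and the idea of resolving it against a symbol table come from the description.

```rust
use std::collections::HashSet;

/// Hypothetical parsed message: a name plus its field names.
pub struct Message {
    pub name: String,
    pub fields: Vec<String>,
}

/// Hypothetical schema-field record keyed by its owner's name.
pub struct RawSchemaField {
    pub owner_class: String,
    pub name: String,
}

/// Owner names to emit as Struct nodes: only messages with at least one
/// field, so an empty message never becomes an orphan node.
pub fn struct_owners(messages: &[Message]) -> HashSet<String> {
    messages
        .iter()
        .filter(|m| !m.fields.is_empty())
        .map(|m| m.name.clone())
        .collect()
}

/// (owner, field) pairs standing in for HasProperty edges. A field whose
/// owner is absent from the symbol set is dropped without an error, which
/// mirrors the silent-discard behavior described above.
pub fn has_property_edges(
    fields: &[RawSchemaField],
    symbols: &HashSet<String>,
) -> Vec<(String, String)> {
    fields
        .iter()
        .filter(|f| symbols.contains(&f.owner_class))
        .map(|f| (f.owner_class.clone(), f.name.clone()))
        .collect()
}
```

With an empty symbol set (the pre-fix state) no edges come out; once `struct_owners` contributes `User`, both of its fields attach.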

Perf (no negative impact)

  • Fold message-node collection into the existing single extract_proto_fields line scan (deferred flush on block-close) — no second pass over the source.
  • Drop the redundant current_message state and its per-message name.clone(); derive owner from the pending tuple instead. Pure simplification.

These touch only index-time parsing (parse_file has zero query-hot-path callers — verified via ecp impact), so no effect on per-query latency.

Verification

  • ecp admin index on a .proto now yields Struct(User) + SchemaField(email→User, age→User) + HasProperty(User→email/age) + Route(GRPC /api.v1.UserService/GetUser) — previously just a File node.
  • New protobuf_graph_e2e.rs drives the full ProtobufProvider → GraphBuilder::build pipeline (the first proto test exercising the builder, not just parse_file): Struct+SchemaField+HasProperty, empty-message orphan guard, service/message coexistence.
  • Full ecp-analyzer suite green (2484 tests). `cargo clippy --workspace --all-targets --all-features -- -D warnings` is clean.

Scope notes

Config/IaC-style single-grammar detector (.proto only), so the 14-mainstream-language rule doesn't apply (no per-language variants). GraphQL service contracts remain a separate uncovered gap. Resolves the proto-to-graph follow-up filed during this work.

The protobuf provider extracted only `message` fields — `service { rpc }`
blocks were explicitly ignored, so gRPC service contracts were invisible
to the graph (graph-completeness gap, CLAUDE.md gate A): an LLM tracing
"what endpoints does this service expose" or matching a client stub call
to its definition hit a dead end at the `.proto`.

Add a line-oriented `service`/`rpc` extractor mirroring the existing
`extract_proto_fields` state machine. Each rpc becomes a `RawRoute`
(method `GRPC`, path `/<package.>Service/Method` — the gRPC HTTP/2 wire
convention), which the builder finalizes into a `NodeKind::Route` exactly
like an HTTP endpoint. Reusing `Route` means zero schema change (no new
NodeKind/RelType — keeps rkyv discriminants stable) and gRPC services flow
through the existing `ecp routes` / `ecp contracts` tooling for free; the
`/Service/Method` path string is the same one a stub call keys on, so
cross-repo contract matching works without new edge types.

Scope: REST cross-service edges (`RelType::Fetches`) already exist and
already normalize dynamic path segments — this closes the gRPC half of
the polyglot service-contract gap. GraphQL remains uncovered.

Tests: provider-level (same verification standard as the existing
`protobuf_schema.rs`) — package-prefixed paths, streaming rpc, multi-
service files, message-only files emit no routes, and message field
extraction still coexists. Config/IaC-style single-grammar detector, so
the 14-language rule does not apply.

coseto6125 enabled auto-merge (squash) June 22, 2026 20:33
coseto6125 added the merge-queue label (Opt-in to Mergify merge queue) Jun 22, 2026

The protobuf provider's `GRPC` RawRoutes were silently dropped at
builder.rs Pass 1.5: `detect_from_call` gated every route on the HTTP
method allowlist (`HTTP_METHODS.contains`), so `GRPC` matched nothing and
returned None — gRPC service contracts parsed correctly by the provider
never reached the graph. End-to-end a `.proto` indexed to zero Route
nodes despite `parse_file` returning them.

Add a gRPC fast path: a `GRPC` method with a wire-format `/…` path is a
confirmed service endpoint (the provider emits already-normalized
records, not literals to be filtered), so it bypasses the HTTP allowlist
whose only job is rejecting non-route call sites. HTTP detection is
unchanged — all 21 builder + 25 route_detector tests pass, including the
Express `use` mount-point case.

Regression coverage: `detect_from_call_accepts_grpc_service_method`,
`detect_from_call_grpc_requires_leading_slash` (route_detector), and
`grpc_raw_route_promotes_to_route_node` (builder end-to-end:
RawRoute → Route node). Verified live: `ecp routes` on a `.proto` now
lists `GRPC /pkg.Service/Method` endpoints.

Note: the sibling gap (protobuf `message` SchemaFields also not reaching
the graph; different root cause: messages emit no owner Class node, so
schema_field_mirrors drops them) is NOT fixed in this commit; it is
tracked separately.

Proto `message` fields parsed correctly but were dropped end-to-end: a
message emitted no owner node, so `schema_field_mirrors` (which resolves
each RawSchemaField.owner_class against the SymbolTable to attach
HasProperty) found no owner and silently discarded every field. A `.proto`
indexed to a lone File node — the schema-field feature was dead through
the full pipeline, only ever exercised by parse_file-level unit tests.

Emit each top-level `message` carrying ≥1 field as a `NodeKind::Struct`
(value-type aggregate, no inheritance/vtable — must not be pattern-matched
as a `Class`). The owner lookup now resolves, so SchemaField nodes +
HasProperty edges land. Empty messages emit no node (no schema surface to
own → would be an orphan).

Perf: fold the message-node collection into the existing single
`extract_proto_fields` line scan (pending-message flush on block close)
rather than adding a second pass over the source — fields and their owner
struct come out of one walk. Net cost over the prior behavior is one
deferred Vec push per non-empty message, no extra iteration.
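
The single-walk idea can be sketched as below. This is a deliberately naive line-oriented model, not the real `extract_proto_fields`: the `Scan` output type, the `scan_messages` and `field_name` names, and the `(name, saw_field)` shape of the pending state are all assumptions for illustration.

```rust
/// Hypothetical scan output: owner struct names and (owner, field) pairs.
#[derive(Debug, Default, PartialEq)]
pub struct Scan {
    pub structs: Vec<String>,
    pub fields: Vec<(String, String)>,
}

/// One pass over the source lines. `pending` holds the open message and
/// whether it has produced a field yet; `Some` means "inside a message".
pub fn scan_messages(src: &str) -> Scan {
    let mut out = Scan::default();
    let mut pending: Option<(String, bool)> = None;
    for raw in src.lines() {
        let line = raw.trim();
        if pending.is_none() {
            if let Some(rest) = line.strip_prefix("message ") {
                let name = rest.trim_end_matches('{').trim();
                pending = Some((name.to_string(), false));
            }
        } else if line.starts_with('}') {
            // Deferred flush on block close: emit the owner struct only
            // if the message carried at least one field.
            if let Some((name, has_field)) = pending.take() {
                if has_field {
                    out.structs.push(name);
                }
            }
        } else if let (Some(field), Some((owner, has_field))) =
            (field_name(line), pending.as_mut())
        {
            out.fields.push((owner.clone(), field.to_string()));
            *has_field = true;
        }
    }
    out
}

/// Extract the name from a `type name = N;` field line, if it has that shape.
fn field_name(line: &str) -> Option<&str> {
    let (decl, _number) = line.strip_suffix(';')?.split_once('=')?;
    decl.split_whitespace().last()
}
```

Because this sketch is line-oriented, a single-line body such as `message A { string x = 1; }` is not handled, which is one of the cases the follow-up commit referenced later in this thread addresses.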

Tests: new `protobuf_graph_e2e.rs` drives the full
ProtobufProvider → GraphBuilder::build pipeline (Struct + SchemaField +
HasProperty; empty-message orphan guard; service/message coexistence) —
the first proto test exercising the builder rather than parse_file alone.
Full ecp-analyzer suite green (2484 tests).

Closes the SchemaField half of the proto-to-graph gap; the gRPC-route
half was fixed in the sibling commit.

…e clone

`extract_proto_fields` tracked the open message in two places —
`current_message: Option<String>` and `pending.0` — holding the same name,
which forced a `name.clone()` on every message header. Derive everything
from `pending` instead: `Some` ⟺ inside a top-level message, `pending.0`
is the owner name for field attribution. Removes the clone (one heap alloc
per message) and one piece of duplicated state. Pure simplification — all
26 proto tests unchanged and green.

…plexity

The pre-push clippy hook (--all-features -D warnings) flagged the inline
Option<(String, (u32,u32,u32,u32), bool)> as type_complexity. Extract a
PendingMessage alias. No behavior change.
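
For readers unfamiliar with the lint, a sketch of the alias extraction. The commit names only the `PendingMessage` alias and the tuple shape; the `Span` alias, the meaning of the four `u32` values and of the `bool`, whether the alias includes the `Option`, and the two helper functions are assumptions.

```rust
/// Assumed meaning: (start_line, start_col, end_line, end_col).
pub type Span = (u32, u32, u32, u32);

/// Open-message state: name, span, and a flag assumed here to mean
/// "has produced at least one field". Naming the type keeps the nested
/// tuple out of function signatures and local annotations.
pub type PendingMessage = Option<(String, Span, bool)>;

/// Hypothetical helper: start tracking a message header.
pub fn open_message(name: &str, span: Span) -> PendingMessage {
    Some((name.to_string(), span, false))
}

/// Hypothetical helper: the owner name for field attribution, derived
/// from the pending state rather than from a second variable.
pub fn owner_name(pending: &PendingMessage) -> Option<&str> {
    pending.as_ref().map(|(name, _, _)| name.as_str())
}
```

A type alias changes no behavior; it only gives the compound type a name.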
coseto6125 changed the title from "feat(protobuf): capture gRPC service/rpc as Route nodes" to "feat(protobuf): gRPC service/rpc + make proto message/schema reach the graph" Jun 22, 2026
github-actions (Contributor)
ecp impact cache (0 symbols) — internal, used by ecp dev pr-analyze

[]

github-actions bot added the ecp:risk-low label (ecp signal) Jun 22, 2026
coseto6125 merged commit f3cdde9 into main Jun 22, 2026 (18 checks passed)
coseto6125 deleted the feat/grpc-service-rpc-graph branch June 22, 2026 21:09
coseto6125 added a commit that referenced this pull request Jun 22, 2026
…#597)

The robustness work done in code review of #593 (string/comment-aware
lexer + span/reuse cleanup) landed on the feature branch AFTER the PR had
already been squash-merged, so it never reached main. This PR brings it
in so the 0.8.0 release ships the fixed parser, not the version with 8
known silent mis-parses.

Replaces the naive per-line brace-count parser with a single char-level
`lex_statements` pass tracking string / line-comment / block-comment
state, splitting at `{` `}` `;` into logical statements. Both extractors
walk that clean stream, so structure is never confused by string/comment
content or by multiple statements sharing a line.
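
A minimal sketch of such a char-level pass, assuming a `&str -> Vec<String>` shape. Only the function name `lex_statements` and the tracked states come from the commit message; the return type, the decision to keep the delimiter at the end of each statement, and the escape handling are assumptions.

```rust
/// Split proto source into logical statements at `{`, `}` and `;`,
/// ignoring those characters inside string literals and comments.
/// Each statement keeps its terminating delimiter.
pub fn lex_statements(src: &str) -> Vec<String> {
    #[derive(Clone, Copy)]
    enum State {
        Code,
        Str(char), // inside a literal opened by this quote character
        LineComment,
        BlockComment,
    }

    let mut out = Vec::new();
    let mut cur = String::new();
    let mut state = State::Code;
    let mut chars = src.chars().peekable();

    while let Some(c) = chars.next() {
        match state {
            State::Code => match c {
                '"' | '\'' => {
                    cur.push(c);
                    state = State::Str(c);
                }
                '/' if chars.peek() == Some(&'/') => {
                    chars.next();
                    state = State::LineComment;
                }
                '/' if chars.peek() == Some(&'*') => {
                    chars.next();
                    state = State::BlockComment;
                }
                '{' | '}' | ';' => {
                    cur.push(c);
                    out.push(cur.trim().to_string());
                    cur.clear();
                }
                _ => cur.push(c),
            },
            State::Str(quote) => {
                cur.push(c);
                if c == '\\' {
                    // Keep the escaped character so `\"` cannot close the literal.
                    if let Some(escaped) = chars.next() {
                        cur.push(escaped);
                    }
                } else if c == quote {
                    state = State::Code;
                }
            }
            State::LineComment => {
                if c == '\n' {
                    cur.push(' ');
                    state = State::Code;
                }
            }
            State::BlockComment => {
                if c == '*' && chars.peek() == Some(&'/') {
                    chars.next();
                    cur.push(' ');
                    state = State::Code;
                }
            }
        }
    }
    // Flush a trailing statement that was never terminated.
    if !cur.trim().is_empty() {
        out.push(cur.trim().to_string());
    }
    out
}
```

Under these assumptions, a single-line `message A { string url = 1; }` yields three statements, and braces or `//` inside a string literal stay part of the statement that contains them.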

Fixes (each with a regression test in protobuf_robustness.rs):
- single-line `message`/`service` bodies dropped their field/route
- per-rpc `option (google.api.http) = { get: "/v1/{id}" }` desynced depth
  and dropped following rpcs (pervasive in gRPC-gateway protos)
- multi-line rpc bodies dropped the rpc
- oneof-only messages vanished; oneof fields now belong to the enclosing
  message (a block-kind stack keeps nested `message` fields scoped to it)
- truncated/unclosed message at EOF flushes its owner Struct node
- `//` inside a string literal no longer treated as a comment
- malformed `package foo.` rejected (was producing double-dot wire paths)
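
The last item can be illustrated with a small validity check. The function names are hypothetical, and accepting a leading underscore in a segment is an assumption rather than something the commit states.

```rust
/// Accept a dotted package name only if every dot-separated segment is a
/// non-empty identifier. A trailing dot produces an empty final segment
/// and is rejected, so no double-dot wire path can be built from it.
pub fn is_valid_package(name: &str) -> bool {
    !name.is_empty() && name.split('.').all(is_ident)
}

/// Identifier check: a letter or underscore, then letters, digits, or
/// underscores (ASCII only in this sketch).
fn is_ident(segment: &str) -> bool {
    let mut chars = segment.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

With this check, `api.v1` is accepted while `foo.` and `a..b` are rejected.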

Also: Struct span now covers header→closing brace; extracted shared
parse_keyword_ident helper; updated two tests that asserted the old buggy
oneof/nested behavior.

Full ecp-analyzer suite green; clippy --all-features -D warnings clean.