feat(analyzer): Kubernetes manifest detector (resource → container-image RawImport)#595

Merged
coseto6125 merged 1 commit into main from feat/k8s-manifest-detector
Jun 22, 2026

What

A schema-aware Kubernetes manifest detector, modeled closely on the existing DockerComposeProvider. Reuses tree-sitter-yaml (no new grammar dependency) and layers a Rust schema-walk on top of the raw AST.

Semantic mapping (mirrors compose's service=Class, image=RawImport)

| k8s construct | Graph |
| --- | --- |
| <doc>.metadata.name of a resource | NodeKind::Class; the kind (Deployment/Service/…) lands in type_annotation so an LLM can disambiguate two same-named resources |
| <doc>…containers[].image: <ref> | RawImport (the load-bearing edge) |
| container name, apiVersion, status, other spec scalars | ignored (compose-style restraint) |

  • Multi-document (`---`-separated) files are fully walked; Service + Deployment bundles are the canonical k8s shape.
  • Container images are found at any nesting depth (Pod spec.containers, Deployment spec.template.spec.containers, CronJob spec.jobTemplate.spec.template.spec.containers, plus initContainers) by recursive descent, not a per-kind path table, so the walk survives new workload kinds and corpus relayout.
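
The recursive descent described above can be sketched roughly as follows. This is an illustration only, not the provider's code: the `Yaml` enum is a hypothetical stand-in for the tree-sitter-yaml AST the real provider walks, and `collect_images`, `scalar`, and `map` are names invented for this example.

```rust
/// Hypothetical stand-in for a parsed YAML value. The real provider walks the
/// tree-sitter-yaml AST; this enum only exists to keep the sketch self-contained.
#[derive(Debug, Clone)]
enum Yaml {
    Scalar(String),
    Seq(Vec<Yaml>),
    Map(Vec<(String, Yaml)>),
}

/// Convenience constructors (hypothetical helpers for building fixtures).
fn scalar(s: &str) -> Yaml {
    Yaml::Scalar(s.to_string())
}

fn map(entries: Vec<(&str, Yaml)>) -> Yaml {
    Yaml::Map(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Collect the `image` scalar of every item in any `containers` or
/// `initContainers` sequence, at any nesting depth. No per-kind path table:
/// the walk descends through every mapping and sequence it meets.
fn collect_images(node: &Yaml, out: &mut Vec<String>) {
    match node {
        Yaml::Scalar(_) => {}
        Yaml::Seq(items) => {
            for item in items {
                collect_images(item, out);
            }
        }
        Yaml::Map(entries) => {
            for (key, value) in entries {
                if key == "containers" || key == "initContainers" {
                    if let Yaml::Seq(containers) = value {
                        for container in containers {
                            if let Yaml::Map(fields) = container {
                                for (k, v) in fields {
                                    if k == "image" {
                                        if let Yaml::Scalar(image) = v {
                                            out.push(image.clone());
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
                // Keep descending so deeper pod specs (e.g. under jobTemplate)
                // are reached. An `image` key is only recorded above, directly
                // inside a container item, so nothing is counted twice.
                collect_images(value, out);
            }
        }
    }
}
```

Because an `image` key is only recorded when it sits directly inside a container item, an `image:` key elsewhere in a document contributes nothing. In the PR the content sniff is what actually keeps non-k8s files out; this sketch does not model that gate.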

LLM-utility justification — Gate A (Graph completeness)

Without the container-image edge, an LLM tracing "what runs this service" dead-ends at the manifest YAML. The image ref links the Deployment/Pod to its Dockerfile / registry artifact. Mixed-stack monorepos (DevOps + Web) are the load-bearing use case. No new NodeKind was introduced — Class / RawImport are reused exactly as compose does, so no new schema doc-comment is owed.

Dispatch design decision + why

The hard part: .yaml/.yml shares its extension with arbitrary YAML — unlike compose (fixed filename) or GitHub Actions (fixed path). Path alone cannot identify a k8s manifest.

Chosen approach: route .yaml/.yml (non-compose, non-GHA) to the Kubernetes provider, which content-sniffs apiVersion: + kind: internally. This mirrors the established "json" => "openapi" precedent (the OpenAPI provider applies a 200-byte prefix gate; non-OpenAPI JSON costs near-zero). Rejected the alternative of a pipeline-level pre-sniff because it would duplicate gate logic and diverge from how every other content-gated provider is wired.

  • from_path stays honest as path-only: Language::from_normalized_path still returns Language::Yaml for .yaml (it has no bytes to sniff, and the resolver language-barrier only needs the yaml family). The Language::Kubernetes variant exists for display (as_str) and was appended at the end of the enum. The append is defensive: Language is not rkyv-archived today (verified: it derives only Debug, Clone, Copy, PartialEq, Eq, Default), but other resolution code matches on it.
  • Sniff is cheap: byte-level scan of the first ≤1KB for apiVersion: + kind: at column 0 (start-of-file or after \n), before UTF-8 validation and before any tree-sitter parse. No allocation. Indented/nested kind: keys in arbitrary YAML do not false-positive. Non-k8s YAML returns an empty LocalGraph at near-zero cost — the hot path for non-k8s YAML is not slowed.
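
A minimal sketch of such a column-0 sniff, assuming a 1024-byte window and requiring both keys. The names `SNIFF_WINDOW`, `has_key_at_column_zero`, and `looks_like_k8s_manifest` are hypothetical and not taken from the PR; the actual gate may differ in detail.

```rust
/// Upper bound on how many leading bytes the sniff inspects
/// (assumed value, matching the "first ≤1KB" described above).
const SNIFF_WINDOW: usize = 1024;

/// Returns true when `needle` starts a line inside `window`, i.e. it appears
/// at start-of-file or immediately after a `\n`. Works on raw bytes, so it
/// needs neither UTF-8 validation nor any allocation.
fn has_key_at_column_zero(window: &[u8], needle: &[u8]) -> bool {
    let mut line_start = 0;
    while line_start < window.len() {
        if window[line_start..].starts_with(needle) {
            return true;
        }
        // Jump to the byte after the next newline; stop if there is none.
        match window[line_start..].iter().position(|&b| b == b'\n') {
            Some(offset) => line_start += offset + 1,
            None => break,
        }
    }
    false
}

/// Content gate: both `apiVersion:` and `kind:` must appear at column 0
/// within the first `SNIFF_WINDOW` bytes.
fn looks_like_k8s_manifest(bytes: &[u8]) -> bool {
    let window = &bytes[..bytes.len().min(SNIFF_WINDOW)];
    has_key_at_column_zero(window, b"apiVersion:")
        && has_key_at_column_zero(window, b"kind:")
}
```

Indented keys fail the `starts_with` check at every line start, which is why a nested `kind:` in arbitrary YAML does not trigger the gate. A leading `---` line is simply skipped like any other line.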

Pre-existing routing gap fixed as a side effect

provider_name_for_path returned None for .yaml/.yml (only the compose/GHA filename specials routed yaml), and should_analyze_path did not list yml/yaml — so plain .yaml files were never scanned in the analyze path. Verified empirically: indexing a repo with an OpenAPI .yaml produced 0 nodes on main. Routing .yaml to the sniffing k8s provider also makes these files reachable. (The orphaned OpenAPI-yaml and YAML document-block paths remain out of scope for this PR — noted as a follow-up below.)

All dispatch sites touched (grepped: from_path, from_normalized_path, provider_name_for_path, DockerCompose, Language::Yaml, match ext, DockerComposeProvider, provider registration, dump tool)

  1. crates/ecp-analyzer/src/lib.rs: pub mod kubernetes;
  2. crates/ecp-core/src/analyzer/pipeline.rs: provider_name_for_path routes .yaml/.yml → "kubernetes" (after the compose + GHA specials); doc-comment updated
  3. crates/ecp-analyzer/src/resolution/index.rs: Language::Kubernetes (appended), as_str arm
  4. crates/ecp-cli/src/commands/admin/index.rs: NeededProviders.kubernetes flag, detect_needed_providers yaml → kubernetes, provider registration (add!), should_analyze_path yaml allow-list
  5. crates/ecp-cli/src/reanalyze.rs: make_provider("kubernetes") + ALL_PROVIDER_NAMES

No commands/analyze.rs exists (the from_normalized_path doc-comment references a stale path; the real dispatch is provider_name_for_path). No language-dump/debug tool enumerates Language variants beyond as_str (exhaustive match — compiler-enforced).

Tests — crates/ecp-analyzer/tests/k8s_manifest.rs (11 tests)

  • deployment_emits_class_and_image_import: Deployment → Class + image RawImport, kind in type_annotation, alias = resource name
  • pod_containers_directly_under_spec: Pod's direct spec.containers
  • cronjob_deeply_nested_containers_reached: jobTemplate.spec.template.spec.containers
  • init_containers_image_captured: initContainers + containers both captured
  • multi_document_service_and_deployment: `---`-separated bundle; both resources emitted, only Deployment carries an image
  • plain_app_config_yaml_emits_nothing: negative; ordinary app config emits nothing
  • ci_config_with_image_key_not_misclassified: negative; a CI config with an image: key but no top-level apiVersion+kind is not classified as k8s
  • sniff_requires_both_apiversion_and_kind, sniff_rejects_indented_keys, sniff_allows_keys_after_document_separator: content-sniff correctness
  • resource_without_containers_emits_class_no_imports: ConfigMap → Class, no image

Plus yaml_routes_to_kubernetes_provider in pipeline.rs (asserts .yaml/.yml → "kubernetes" while compose/GHA specials still win).
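
The routing order that test exercises might look roughly like the sketch below. It is not the real provider_name_for_path: the function name `provider_name_for_path_sketch`, the compose filenames, the `.github/workflows/` check, and the provider strings `"docker_compose"` and `"github_actions"` are assumptions made for illustration. Only `"kubernetes"` and `"openapi"` are named in this PR.

```rust
/// Path-only dispatch sketch: fixed-filename and fixed-path specials win,
/// then every remaining .yaml/.yml goes to the content-sniffing provider.
fn provider_name_for_path_sketch(path: &str) -> Option<&'static str> {
    let normalized = path.replace('\\', "/");
    let file_name = normalized.rsplit('/').next().unwrap_or("");
    let ext = file_name.rsplit_once('.').map(|(_, e)| e).unwrap_or("");
    let is_yaml = ext == "yaml" || ext == "yml";

    // Compose is identified by a fixed filename (assumed list of names).
    if matches!(
        file_name,
        "docker-compose.yml" | "docker-compose.yaml" | "compose.yml" | "compose.yaml"
    ) {
        return Some("docker_compose");
    }
    // GitHub Actions workflows are identified by a fixed path (assumed check).
    if is_yaml && normalized.contains(".github/workflows/") {
        return Some("github_actions");
    }
    // Everything else with a YAML extension is handed to the Kubernetes
    // provider, which decides from the file content whether to emit anything.
    if is_yaml {
        return Some("kubernetes");
    }
    // Same precedent for JSON: route by extension, gate by content.
    if ext == "json" {
        return Some("openapi");
    }
    None
}
```

Keeping the specials ahead of the generic YAML arm is what lets compose and GHA files keep their dedicated providers even though they share the extension.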

14-language rule does not apply: this is a config/IaC detector targeting a single schema (k8s manifests), the way docker_compose and openapi are single-schema providers — not a change to the 14 mainstream-language parsers. (Stated per the project test-discipline note so this isn't rejected for single-language coverage.)

Verification (actual output)

  • cargo test -p ecp-analyzer — all suites pass; k8s_manifest 11/11
  • cargo test -p egent-code-plexus --tests: 1243 passed, 9 ignored, 0 failed (204 suites)
  • cargo test -p ecp-core --lib analyzer::pipeline — dispatch test passes
  • cargo clippy -p ecp-analyzer --all-targets and -p egent-code-plexus --all-targets — clean (only pre-existing tree-sitter-swift C warnings)
  • E2E: ecp admin index on a fixture repo (multi-doc Service+Deployment + a plain app-config.yaml) produced exactly 2 Class nodes (web-svc, web-deploy) with correct kind annotations; the plain config yaml produced no Class (sniff gate confirmed). Image RawImport lands in LocalGraph.imports (asserted by unit tests); it promotes to a graph edge when a matching target node exists — identical behavior to the compose provider (verified compose emits the same node set and no edge for unresolved image refs).

Out of scope (follow-ups)

  • Kustomize overlays — overlay merge semantics need separate LLM-utility analysis.
  • Helm — templated charts are not plain YAML.
  • The pre-existing orphaned OpenAPI-yaml / YAML document-block routing (those still don't reach the graph in the analyze path) is left untouched here to keep this PR surgical.

…ort)

Schema-aware k8s manifest provider modeled on DockerComposeProvider —
reuses tree-sitter-yaml (no new grammar dep) and layers a Rust schema
walk on top.

Semantic mapping (mirrors compose's service=Class, image=RawImport):
- <doc>.metadata.name      → NodeKind::Class; `kind` (Deployment/Service)
                             lands in type_annotation for LLM disambiguation
- <doc>…containers[].image → RawImport (load-bearing edge: links the
                             manifest to its Dockerfile/registry artifact so
                             "what runs this service" resolves instead of
                             dead-ending at the YAML)
- container name / apiVersion / status → ignored (compose-style restraint)

Multi-document (`---`) files are fully walked (Service+Deployment bundles
are canonical). Container images are found at any nesting depth
(Pod spec.containers, Deployment spec.template.spec.containers, CronJob
jobTemplate…) by recursive descent, not a per-kind path table.

Dispatch: `.yaml`/`.yml` shares its extension with arbitrary YAML, so path
alone can't identify a k8s manifest. The provider content-sniffs top-level
`apiVersion:` + `kind:` in the first ≤1KB before any parse — non-k8s YAML
costs near-zero (mirrors `.json` → openapi's 200-byte gate). `from_path`
stays path-only; routing sends all non-compose/non-GHA `.yaml` to the
sniffing provider.

Dispatch sites touched (all greps for from_path/provider registration):
- ecp-analyzer lib.rs: `pub mod kubernetes`
- ecp-core pipeline.rs provider_name_for_path: `.yaml`/`.yml` → "kubernetes"
- ecp-analyzer resolution/index.rs: Language::Kubernetes (appended), as_str
- ecp-cli admin/index.rs: NeededProviders.kubernetes, detection,
  registration, should_analyze_path yaml allow
- ecp-cli reanalyze.rs: make_provider + ALL_PROVIDER_NAMES

LLM-utility gate A (graph completeness): without this edge an LLM tracing
"what runs this service" hits a dead end at the manifest YAML. Mixed-stack
monorepos (DevOps + Web) are the load-bearing use case.

Out of scope: Kustomize overlays, Helm.

@coseto6125 enabled auto-merge (squash) June 22, 2026 20:50
@coseto6125 added the merge-queue label (Opt-in to Mergify merge queue) Jun 22, 2026
@github-actions

ecp impact cache (0 symbols) — internal, used by ecp dev pr-analyze

[]

@github-actions (bot) added the ecp:risk-low label (ecp signal) Jun 22, 2026
@coseto6125 merged commit a920746 into main Jun 22, 2026
18 checks passed
@coseto6125 deleted the feat/k8s-manifest-detector branch June 22, 2026 21:14