feat(analyzer): Kubernetes manifest detector (resource → container-image RawImport) #595
Merged
Schema-aware k8s manifest provider modeled on DockerComposeProvider —
reuses tree-sitter-yaml (no new grammar dep) and layers a Rust schema
walk on top.
Semantic mapping (mirrors compose's service=Class, image=RawImport):
- <doc>.metadata.name → NodeKind::Class; `kind` (Deployment/Service)
lands in type_annotation for LLM disambiguation
- <doc>…containers[].image → RawImport (load-bearing edge: links the
manifest to its Dockerfile/registry artifact so
"what runs this service" resolves instead of
dead-ending at the YAML)
- container name / apiVersion / status → ignored (compose-style restraint)
Multi-document (`---`) files are fully walked (Service+Deployment bundles
are canonical). Container images are found at any nesting depth
(Pod spec.containers, Deployment spec.template.spec.containers, CronJob
jobTemplate…) by recursive descent, not a per-kind path table.
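A minimal sketch of the recursive-descent idea, under stated assumptions: the `Yaml` enum and `collect_images` are hypothetical stand-ins invented for illustration (the real provider walks a tree-sitter-yaml AST and emits `RawImport` entries, neither of which is reproduced here). It only shows why no per-kind path table is needed: any `containers` or `initContainers` list is handled wherever it appears.

```rust
// Hypothetical stand-in for a parsed YAML document. The real provider works
// on a tree-sitter-yaml AST; this enum exists only to make the sketch runnable.
#[derive(Debug, Clone, PartialEq)]
enum Yaml {
    Scalar(String),
    Seq(Vec<Yaml>),
    Map(Vec<(String, Yaml)>),
}

/// Collect every `image` scalar that sits directly inside an entry of a
/// `containers` or `initContainers` list, at any nesting depth.
fn collect_images(node: &Yaml, out: &mut Vec<String>) {
    match node {
        Yaml::Scalar(_) => {}
        Yaml::Seq(items) => {
            for item in items {
                collect_images(item, out);
            }
        }
        Yaml::Map(entries) => {
            for (key, value) in entries {
                if key == "containers" || key == "initContainers" {
                    if let Yaml::Seq(containers) = value {
                        for container in containers {
                            if let Yaml::Map(fields) = container {
                                for (k, v) in fields {
                                    if k == "image" {
                                        if let Yaml::Scalar(image) = v {
                                            out.push(image.clone());
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
                // Keep descending regardless of the key, so Pod, Deployment
                // and CronJob layouts are all reached by the same walk.
                collect_images(value, out);
            }
        }
    }
}
```

Calling `collect_images(&doc, &mut images)` on a Deployment-shaped value yields the init-container and container images in document order. An `image:` key that is not inside a container list is ignored by this sketch.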
Dispatch: `.yaml`/`.yml` shares its extension with arbitrary YAML, so path
alone can't identify a k8s manifest. The provider content-sniffs top-level
`apiVersion:` + `kind:` in the first ≤1KB before any parse — non-k8s YAML
costs near-zero (mirrors `.json` → openapi's 200-byte gate). `from_path`
stays path-only; routing sends all non-compose/non-GHA `.yaml` to the
sniffing provider.
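The gate described above can be sketched roughly as follows. This is an illustration, not the PR's code: `SNIFF_LIMIT`, `has_top_level_key` and `looks_like_k8s_manifest` are hypothetical names, and the exact matching rules of the real provider may differ. The sketch works on raw bytes, allocates nothing, and only accepts keys at column 0.

```rust
/// Assumed upper bound on the inspected prefix ("first ≤1KB" in the PR text).
const SNIFF_LIMIT: usize = 1024;

/// True if `key` starts a line (column 0) somewhere in `prefix`.
fn has_top_level_key(prefix: &[u8], key: &[u8]) -> bool {
    let mut start = 0;
    while start < prefix.len() {
        if prefix[start..].starts_with(key) {
            return true;
        }
        // Jump to the byte after the next newline, i.e. the next column 0.
        match prefix[start..].iter().position(|&b| b == b'\n') {
            Some(offset) => start += offset + 1,
            None => break,
        }
    }
    false
}

/// Cheap pre-parse check: both `apiVersion:` and `kind:` must appear at
/// column 0 within the first `SNIFF_LIMIT` bytes. No UTF-8 validation and
/// no allocation happen here.
fn looks_like_k8s_manifest(bytes: &[u8]) -> bool {
    let prefix = &bytes[..bytes.len().min(SNIFF_LIMIT)];
    has_top_level_key(prefix, b"apiVersion:") && has_top_level_key(prefix, b"kind:")
}
```

Because the check only looks at line starts, an indented `kind:` inside arbitrary YAML is rejected, while keys that follow a `---` separator are accepted since they still begin a line.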
Dispatch sites touched (all greps for from_path/provider registration):
- ecp-analyzer lib.rs: `pub mod kubernetes`
- ecp-core pipeline.rs provider_name_for_path: `.yaml`/`.yml` → "kubernetes"
- ecp-analyzer resolution/index.rs: Language::Kubernetes (appended), as_str
- ecp-cli admin/index.rs: NeededProviders.kubernetes, detection,
registration, should_analyze_path yaml allow
- ecp-cli reanalyze.rs: make_provider + ALL_PROVIDER_NAMES
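To make the routing order concrete, here is a hedged sketch of path-only dispatch. Only the `.yaml`/`.yml` → `"kubernetes"` and `"json" => "openapi"` arms and the "compose/GHA specials win first" ordering come from this PR. The function name `provider_name_for_path_sketch`, the compose filename list, the `.github/workflows/` check, and the `"docker_compose"` / `"github_actions"` return values are assumptions made for illustration.

```rust
use std::path::Path;

/// Path-only routing sketch: no file bytes are read here. Content sniffing
/// is left to the provider that receives the file.
fn provider_name_for_path_sketch(path: &str) -> Option<&'static str> {
    let p = Path::new(path);
    let file_name = p.file_name()?.to_str()?;
    let ext = p.extension().and_then(|e| e.to_str()).unwrap_or("");

    // Assumed compose special: fixed filenames win over the generic yaml arm.
    if matches!(
        file_name,
        "docker-compose.yml" | "docker-compose.yaml" | "compose.yml" | "compose.yaml"
    ) {
        return Some("docker_compose");
    }
    // Assumed GitHub Actions special: fixed directory wins as well.
    if path.contains(".github/workflows/") && matches!(ext, "yml" | "yaml") {
        return Some("github_actions");
    }
    match ext {
        // Every remaining yaml file goes to the sniffing Kubernetes provider.
        "yaml" | "yml" => Some("kubernetes"),
        "json" => Some("openapi"),
        _ => None,
    }
}
```

With this ordering, `k8s/web.yaml` routes to `"kubernetes"` while `docker-compose.yml` and `.github/workflows/ci.yml` keep their dedicated providers. A non-k8s yaml file still reaches the Kubernetes provider but is rejected there by the content sniff.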
LLM-utility gate A (graph completeness): without this edge an LLM tracing
"what runs this service" hits a dead end at the manifest YAML. Mixed-stack
monorepos (DevOps + Web) are the load-bearing use case.
Out of scope: Kustomize overlays, Helm.
What
A schema-aware Kubernetes manifest detector, modeled closely on the existing `DockerComposeProvider`. Reuses `tree-sitter-yaml` (no new grammar dependency) and layers a Rust schema walk on top of the raw AST.
Semantic mapping (mirrors compose's `service=Class`, `image=RawImport`):
- `<doc>.metadata.name` of a resource → `NodeKind::Class`; the `kind` (Deployment/Service/…) lands in `type_annotation` so an LLM can disambiguate two same-named resources
- `<doc>…containers[].image: <ref>` → `RawImport`, the load-bearing edge
- container `name`, `apiVersion`, `status`, other spec scalars → ignored
Multi-document (`---`-separated) files are fully walked: Service + Deployment bundles are the canonical k8s shape. Container images are found at any nesting depth (`Pod` → `spec.containers`, `Deployment` → `spec.template.spec.containers`, `CronJob` → `spec.jobTemplate.spec.template.spec.containers`, plus `initContainers`) by recursive descent, not a per-kind path table, so the walk survives new workload kinds and corpus relayout.
LLM-utility justification: Gate A (graph completeness)
Without the container-image edge, an LLM tracing "what runs this service" dead-ends at the manifest YAML. The `image` ref links the Deployment/Pod to its Dockerfile / registry artifact. Mixed-stack monorepos (DevOps + Web) are the load-bearing use case. No new `NodeKind` was introduced: `Class`/`RawImport` are reused exactly as compose does, so no new schema doc-comment is owed.
Dispatch design decision + why
The hard part: `.yaml`/`.yml` shares its extension with arbitrary YAML, unlike compose (fixed filename) or GitHub Actions (fixed path). Path alone cannot identify a k8s manifest.
Chosen approach: route `.yaml`/`.yml` (non-compose, non-GHA) to the Kubernetes provider, which content-sniffs `apiVersion:` + `kind:` internally. This mirrors the established `"json" => "openapi"` precedent (the OpenAPI provider applies a 200-byte prefix gate; non-OpenAPI JSON costs near-zero). Rejected the alternative of a pipeline-level pre-sniff because it would duplicate gate logic and diverge from how every other content-gated provider is wired.
- `from_path` stays honest as path-only: `Language::from_normalized_path` still returns `Language::Yaml` for `.yaml` (it has no bytes to sniff, and the resolver language-barrier only needs the yaml family). The `Language::Kubernetes` variant exists for display (`as_str`) and was appended at the end of the enum. The append is defensive: `Language` is not rkyv-archived today (verified: it derives only `Debug, Clone, Copy, PartialEq, Eq, Default`), but other resolution code matches on it.
- The sniff gate looks for `apiVersion:` + `kind:` at column 0 (start-of-file or after `\n`) in the first ≤1KB, before UTF-8 validation and before any tree-sitter parse. No allocation. Indented/nested `kind:` keys in arbitrary YAML do not false-positive. Non-k8s YAML returns an empty `LocalGraph` at near-zero cost, so the hot path for non-k8s YAML is not slowed.
Pre-existing routing gap fixed as a side effect
`provider_name_for_path` returned `None` for `.yaml`/`.yml` (only the compose/GHA filename specials routed yaml), and `should_analyze_path` did not list `yml`/`yaml`, so plain `.yaml` files were never scanned in the analyze path. Verified empirically: indexing a repo with an OpenAPI `.yaml` produced 0 nodes on `main`. Routing `.yaml` to the sniffing k8s provider also makes these files reachable. (The orphaned OpenAPI-yaml and YAML document-block paths remain out of scope for this PR; noted as a follow-up below.)
All dispatch sites touched (grepped: `from_path`, `from_normalized_path`, `provider_name_for_path`, `DockerCompose`, `Language::Yaml`, `match ext`, `DockerComposeProvider`, provider registration, dump tool)
- `crates/ecp-analyzer/src/lib.rs`: `pub mod kubernetes;`
- `crates/ecp-core/src/analyzer/pipeline.rs`: `provider_name_for_path` maps `.yaml`/`.yml` → `"kubernetes"` (after the compose + GHA specials); doc-comment updated
- `crates/ecp-analyzer/src/resolution/index.rs`: `Language::Kubernetes` (appended), `as_str` arm
- `crates/ecp-cli/src/commands/admin/index.rs`: `NeededProviders.kubernetes` flag, `detect_needed_providers` yaml → kubernetes, provider registration (`add!`), `should_analyze_path` yaml allow-list
- `crates/ecp-cli/src/reanalyze.rs`: `make_provider("kubernetes")` + `ALL_PROVIDER_NAMES`
No `commands/analyze.rs` exists (the `from_normalized_path` doc-comment references a stale path; the real dispatch is `provider_name_for_path`). No language-dump/debug tool enumerates `Language` variants beyond `as_str` (exhaustive match, compiler-enforced).
Tests:
`crates/ecp-analyzer/tests/k8s_manifest.rs` (11 tests)
- `deployment_emits_class_and_image_import`: Deployment → Class + image RawImport, kind in `type_annotation`, alias = resource name
- `pod_containers_directly_under_spec`: Pod's direct `spec.containers`
- `cronjob_deeply_nested_containers_reached`: `jobTemplate.spec.template.spec.containers`
- `init_containers_image_captured`: `initContainers` + `containers` both captured
- `multi_document_service_and_deployment`: `---`-separated bundle; both resources emitted, only Deployment carries an image
- `plain_app_config_yaml_emits_nothing`: negative, ordinary app config emits nothing
- `ci_config_with_image_key_not_misclassified`: negative, a CI config with an `image:` key but no top-level `apiVersion` + `kind` is not classified as k8s
- `sniff_requires_both_apiversion_and_kind`, `sniff_rejects_indented_keys`, `sniff_allows_keys_after_document_separator`: content-sniff correctness
- `resource_without_containers_emits_class_no_imports`: ConfigMap → Class, no image
Plus `yaml_routes_to_kubernetes_provider` in `pipeline.rs` (asserts `.yaml`/`.yml` → `"kubernetes"` while compose/GHA specials still win).
14-language rule does not apply: this is a config/IaC detector targeting a single schema (k8s manifests), the way `docker_compose` and `openapi` are single-schema providers, not a change to the 14 mainstream-language parsers. (Stated per the project test-discipline note so this isn't rejected for single-language coverage.)
Verification (actual output)
- `cargo test -p ecp-analyzer`: all suites pass; `k8s_manifest` 11/11
- `cargo test -p egent-code-plexus --tests`: 1243 passed, 9 ignored, 0 failed (204 suites)
- `cargo test -p ecp-core --lib analyzer::pipeline`: dispatch test passes
- `cargo clippy -p ecp-analyzer --all-targets` and `-p egent-code-plexus --all-targets`: clean (only pre-existing tree-sitter-swift C warnings)
- `ecp admin index` on a fixture repo (multi-doc Service+Deployment + a plain app-config `.yaml`) produced exactly 2 `Class` nodes (`web-svc`, `web-deploy`) with correct `kind` annotations; the plain config yaml produced no Class (sniff gate confirmed). Image RawImport lands in `LocalGraph.imports` (asserted by unit tests); it promotes to a graph edge when a matching target node exists, identical behavior to the compose provider (verified compose emits the same node set and no edge for unresolved image refs).
Out of scope (follow-ups)