feat: cloudgov platform audit — Platform-tenant conformance auditor#8

Merged
stxkxs merged 1 commit into main from platform-auditor
May 30, 2026

@stxkxs stxkxs commented May 30, 2026

See the commit message for full detail. First slice of Phase 2 (the independent auditor) from the approved plan.

Summary

  • cloudgov platform audit — reads every Platform CR (agents.stxkxs.io/v1alpha1) and verifies each tenant's live K8s state against the eks-agent-platform contract: namespace + PSS=restricted + ownership labels, tenant-default ResourceQuota/LimitRange, tenant-egress NetworkPolicy (egress-typed, namespace-wide), and tenant-runtime ServiceAccount IRSA annotation vs status.iamRoleArn, plus the spec.identity model-list invariant. Read-only — the operator enforces; this catches drift.
  • Reuses everything: the --fail-on gate, table/JSON/SARIF output, and a new platform_audit MCP tool (16th). No new dependency — uses the existing client-go (typed + dynamic clients).
  • Tests: fake typed+dynamic clients cover conformant / drift / namespace-missing / not-ready-skip / identity-invalid.

Verification

go build, go test ./..., go vet, golangci-lint v2.12.2 (uncapped) all pass. Binary advertises sarif for platform audit; MCP exposes 16 tools incl. platform_audit.

Next slice

AWS-side conformance (IRSA role / baseline Bedrock policy / suspension tags / KMS grant / trust policy) and budget cross-refs (SOC2 kill-switch, Platform ≤ Tenant compliance).

Adds the independent-auditor capability (Phase 2): cloudgov now verifies
that deployed nanohype Platform tenants still match the eks-agent-platform
contract. Read-only — the operator enforces; this catches drift, manual
tampering, and reconcile gaps, and produces evidence the fab merge-gate
can cite.

─── Auditor ───

- internal/platform: lists Platform CRs (agents.stxkxs.io/v1alpha1) via a
  dynamic client and, for each Ready/Suspended platform, verifies its
  tenant namespace against the operator's contract:
    - namespace exists with pod-security.kubernetes.io/enforce=restricted
      and the eks-agent-platform/{platform,tenant,persona} labels
    - tenant-default ResourceQuota and LimitRange present
    - tenant-egress NetworkPolicy present, egress-typed (ingress
      default-deny preserved), applied namespace-wide
    - tenant-runtime ServiceAccount carries the eks.amazonaws.com/role-arn
      annotation matching Platform.status.iamRoleArn
    - spec.identity declares exactly one of allowedModels /
      allowedModelFamilies
  Not-yet-Ready platforms are skipped with an informational note.
- internal/cloud/k8s: NewClients returns a typed clientset + dynamic
  client from the same kubeconfig chain, so the auditor reads CRs
  alongside core objects. No new dependency — reuses client-go.
- internal/cloud: PlatformFinding + PlatformFindingType.
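
The per-namespace checks listed above can be sketched over plain maps, without client-go. This is an illustration only, not the code in internal/platform: the function names are hypothetical, and treating an empty label value or an empty model list as "not declared" is an assumption. The label and annotation keys are the ones named in this commit message.

```go
package main

import "fmt"

// Keys named in the commit message.
const (
	pssEnforceLabel    = "pod-security.kubernetes.io/enforce"
	irsaRoleAnnotation = "eks.amazonaws.com/role-arn"
)

// The eks-agent-platform/{platform,tenant,persona} ownership labels.
var requiredOwnershipLabels = []string{
	"eks-agent-platform/platform",
	"eks-agent-platform/tenant",
	"eks-agent-platform/persona",
}

// checkNamespace (hypothetical name) returns one message per drifted label.
func checkNamespace(labels map[string]string) []string {
	var drift []string
	if got := labels[pssEnforceLabel]; got != "restricted" {
		drift = append(drift, fmt.Sprintf("%s=%q, want %q", pssEnforceLabel, got, "restricted"))
	}
	for _, key := range requiredOwnershipLabels {
		// Assumption: an empty value counts the same as a missing label.
		if labels[key] == "" {
			drift = append(drift, "missing label "+key)
		}
	}
	return drift
}

// checkIRSA (hypothetical name) reports whether the ServiceAccount annotation
// matches Platform.status.iamRoleArn. An empty status ARN never matches.
func checkIRSA(saAnnotations map[string]string, statusRoleArn string) bool {
	return statusRoleArn != "" && saAnnotations[irsaRoleAnnotation] == statusRoleArn
}

// checkIdentity (hypothetical name) enforces "exactly one of allowedModels /
// allowedModelFamilies". Assumption: "declared" means a non-empty list.
func checkIdentity(allowedModels, allowedModelFamilies []string) bool {
	return (len(allowedModels) > 0) != (len(allowedModelFamilies) > 0)
}
```

Each function is pure, so it can be called from a test or a main function with literal maps; the real auditor would feed it objects read through the typed and dynamic clients.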

─── Surfaces ───

- cmd/platform.go: `cloudgov platform audit`
  (--kubeconfig / --output table|json|sarif / --severity), wired into the
  --fail-on severity gate.
- output: WritePlatform (JSON), PlatformFindings (table),
  WritePlatformSARIF.
- MCP: a 16th tool, platform_audit; AGENTS.md updated.
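
For readers unfamiliar with a --fail-on style gate, a minimal sketch of the idea follows. The severity names and their ordering are assumptions (the commit message does not list them), and shouldFail is a hypothetical name, not cloudgov's actual function.

```go
package main

// severityRank orders hypothetical severity names from least to most severe.
// The real cloudgov severity vocabulary may differ.
var severityRank = map[string]int{"low": 1, "medium": 2, "high": 3, "critical": 4}

// shouldFail reports whether any finding is at or above the failOn threshold.
// In this sketch an empty or unknown threshold disables the gate.
func shouldFail(findingSeverities []string, failOn string) bool {
	threshold, ok := severityRank[failOn]
	if !ok {
		return false
	}
	for _, s := range findingSeverities {
		// Unknown severities rank 0 and therefore never trip the gate.
		if severityRank[s] >= threshold {
			return true
		}
	}
	return false
}
```

A CLI would typically turn a true result into a non-zero exit code after the report has been written, so the table, JSON, or SARIF output is still produced when the gate trips.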

─── Tests ───

- internal/platform/audit_test.go: fake typed + dynamic clients cover
  conformant, drift (missing NetworkPolicy/ServiceAccount, wrong PSS),
  namespace-missing, not-ready-skip, and identity-invalid cases.

Verification: go build ./..., go test ./..., go vet ./..., and
golangci-lint v2.12.2 (uncapped) all pass. The binary advertises the
sarif format for platform audit and the MCP server exposes 16 tools
including platform_audit.

This is the K8s-side slice of the auditor; AWS-side IRSA/KMS/trust-policy
conformance and budget cross-references (SOC2 kill-switch, Platform <=
Tenant compliance strictness) are the next slice.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
@stxkxs stxkxs merged commit 30b749e into main May 30, 2026
4 checks passed
@stxkxs stxkxs deleted the platform-auditor branch May 30, 2026 04:13
stxkxs added a commit that referenced this pull request May 30, 2026
…I group

Completes the Platform-tenant auditor with the budget + compliance
cross-resource checks, and corrects the custom-resource API group that
slices #8/#9 hardcoded incorrectly.

─── Bug fix (affects the already-merged auditor) ───

The Platform and Tenant CRs live in the platform.nanohype.dev API group
and BudgetPolicy in governance.nanohype.dev — NOT agents.stxkxs.io, which
the auditor's platformGVR hardcoded. Against a real cluster, `platform
audit` listed zero Platforms (the GVR matched nothing). Corrected
platformGVR and added the tenant + budget GVRs. The earlier unit tests
passed only because their fixtures used the same wrong group; the fixtures
now use the real groups, so they actually exercise the shipping GVR.
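
The failure mode described above can be reproduced with a small stand-in for a dynamic client. This is a sketch under assumptions: GVR only mirrors the shape of apimachinery's schema.GroupVersionResource, fakeCluster and listPlatforms are hypothetical, and the "v1alpha1" version and "platforms" resource name are guesses for the corrected group (only the group names come from the commit message).

```go
package main

// GVR mirrors the Group/Version/Resource fields of schema.GroupVersionResource.
type GVR struct{ Group, Version, Resource string }

// wrongGVR is the group the earlier slices hardcoded.
var wrongGVR = GVR{Group: "agents.stxkxs.io", Version: "v1alpha1", Resource: "platforms"}

// platformGVR uses the corrected group. Version and Resource are assumptions.
var platformGVR = GVR{Group: "platform.nanohype.dev", Version: "v1alpha1", Resource: "platforms"}

// fakeCluster maps each GVR to the names of the objects stored under it,
// standing in for what a fake dynamic client would serve.
type fakeCluster map[GVR][]string

// listPlatforms returns the objects registered under exactly the given GVR.
// A GVR with the wrong group matches nothing and yields an empty list, not an error.
func listPlatforms(c fakeCluster, gvr GVR) []string {
	return c[gvr]
}
```

Because a lookup under the wrong group returns an empty list rather than an error, a fixture built with the same wrong GVR passes while a real cluster reports zero Platforms. Building fixtures from the real group, or from the same variable the shipping code uses plus one test that pins its value, closes that gap.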

─── Budget + compliance cross-references ───

- internal/platform: auditBudgetCompliance runs for every Platform (spec
  consistency, independent of phase) and reports:
    - spec.budget.name empty or pointing at a BudgetPolicy that doesn't
      exist (BUDGET_POLICY_MISSING)
    - a SOC2 platform whose referenced BudgetPolicy has
      killSwitchEnabled=false (KILL_SWITCH_DISABLED)
    - a Platform less strict than its owning Tenant — Tenant requires
      soc2/hipaa but the Platform doesn't (COMPLIANCE_WEAKER_THAN_TENANT)
    - spec.tenant pointing at a Tenant CR that doesn't exist (TENANT_MISSING)
  BudgetPolicy is looked up in the Platform's namespace; the Tenant CR is
  cluster-scoped. Reuses the dynamic client already threaded through Audit.
- internal/cloud: the four new finding types; SARIF rules extended.
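
The cross-reference rules above can be sketched with plain structs and maps. The finding names come from this commit message; everything else is an assumption made for illustration: the struct fields, representing compliance as a list of strings, the map-based lookups, and the choice to skip the kill-switch check when the BudgetPolicy is missing.

```go
package main

// Simplified, hypothetical views of the three custom resources.
type Platform struct {
	Namespace  string
	BudgetName string   // spec.budget.name
	Tenant     string   // spec.tenant
	Compliance []string // assumed representation, e.g. "soc2", "hipaa"
}
type BudgetPolicy struct{ KillSwitchEnabled bool }
type Tenant struct{ Compliance []string }

// budgetKey identifies a namespaced BudgetPolicy.
type budgetKey struct{ Namespace, Name string }

func has(list []string, want string) bool {
	for _, v := range list {
		if v == want {
			return true
		}
	}
	return false
}

// auditBudgetCompliance returns finding types for one Platform.
func auditBudgetCompliance(p Platform, budgets map[budgetKey]BudgetPolicy, tenants map[string]Tenant) []string {
	var findings []string

	// BudgetPolicy is looked up in the Platform's namespace.
	bp, found := budgets[budgetKey{Namespace: p.Namespace, Name: p.BudgetName}]
	switch {
	case p.BudgetName == "" || !found:
		findings = append(findings, "BUDGET_POLICY_MISSING")
	case has(p.Compliance, "soc2") && !bp.KillSwitchEnabled:
		findings = append(findings, "KILL_SWITCH_DISABLED")
	}

	// The Tenant CR is cluster-scoped, so it is keyed by name only.
	t, found := tenants[p.Tenant]
	if !found {
		return append(findings, "TENANT_MISSING")
	}
	for _, req := range []string{"soc2", "hipaa"} {
		if has(t.Compliance, req) && !has(p.Compliance, req) {
			findings = append(findings, "COMPLIANCE_WEAKER_THAN_TENANT")
			break // report the mismatch once per Platform
		}
	}
	return findings
}
```

Keying BudgetPolicy by namespace and name while keying Tenant by name alone mirrors the scoping stated above.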

─── Tests ───

- audit_test.go: fixtures corrected to the real API groups and given a
  matching BudgetPolicy + Tenant; new cases cover budget-missing,
  kill-switch-disabled, and compliance-weaker-than-tenant (10 tests total).

No new dependency. Verification: go build ./..., go test ./..., go vet
./..., and golangci-lint v2.12.2 (uncapped) all pass.

This completes Phase 2: the Platform auditor now spans the cluster, AWS
IRSA, and budget/compliance sides of the eks-agent-platform contract.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>