feat: platform audit — AWS-side IRSA role conformance#9

Merged
stxkxs merged 1 commit into main from platform-irsa-audit
May 30, 2026

stxkxs commented May 30, 2026

See the commit message for full detail. Second slice of the Platform-tenant auditor (the K8s side landed in #8) — adds the AWS half of the contract.

Summary

  • IRSA role conformance — for each Ready/Suspended Platform with a status.iamRoleArn, verifies the IAM role exists, trusts only system:serviceaccount:<ns>:tenant-runtime, has no inline policies, carries the declared extraPolicyArns, has a suspension tag consistent with status.suspendedAt, and (when active) has a baseline managed policy attached.
  • Plumbing: aws.GetRoleInfo (satisfies the new platform.RoleReader); Audit takes a role-reader (nil = skip AWS checks). The command and the platform_audit MCP tool build it from the AWS credential chain and degrade gracefully (the k8s-side audit still runs) when credentials are absent.
  • No new dependency — reuses aws-sdk-go-v2 IAM.
  • Tests — fake role-reader covers role-missing / drift / conformant; k8s-only tests pass nil.

Verification

go build, go test ./..., go vet, golangci-lint v2.12.2 (uncapped) all pass.

Remaining (slice 3)

Budget + compliance CR cross-refs: spec.budget.name resolves to a BudgetPolicy, SOC2 → kill-switch enabled, and Platform ≤ Tenant compliance strictness (needs the governance-group BudgetPolicy/Tenant GVRs).

Extends the Platform-tenant auditor with the AWS half of the contract: it
now verifies the per-tenant IRSA role behind Platform.status.iamRoleArn,
not just the cluster resources. Read-only; needs AWS credentials.

─── IRSA conformance ───

- internal/cloud/aws: GetRoleInfo reads a role's ARN, URL-decoded trust
  policy, tags, attached managed-policy ARNs, and inline-policy names via
  the IAM API (NoSuchEntity -> not found). Satisfies platform.RoleReader.
- internal/platform: Audit now takes a RoleReader (nil skips the AWS
  checks). For each Ready/Suspended platform with a status.iamRoleArn it
  flags:
    - role missing (status points at a non-existent role)
    - trust policy not constrained to
      system:serviceaccount:<ns>:tenant-runtime
    - inline policies present (the contract is managed-policy-only)
    - declared spec.identity.extraPolicyArns not attached
    - suspension tag (platform.nanohype.dev/suspended) disagreeing with
      status.suspendedAt
    - no managed policies attached on an active tenant (baseline expected)
- internal/cloud: IAMRoleInfo plus the six IRSA finding types.
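
The trust-policy check in the list above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's implementation: the names trustsOnly, subjects, and sampleTrust are hypothetical, the account ID and OIDC provider are placeholders, and the sketch assumes the policy's Statement is a JSON array and that the subject is pinned with a StringEquals condition on a key ending in ":sub".

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// trustDoc models only the parts of a trust policy this sketch inspects.
type trustDoc struct {
	Statement []struct {
		Effect    string
		Condition map[string]map[string]json.RawMessage
	}
}

// subjects accepts a condition value that is either a string or a list.
func subjects(raw json.RawMessage) []string {
	var one string
	if json.Unmarshal(raw, &one) == nil {
		return []string{one}
	}
	var many []string
	_ = json.Unmarshal(raw, &many)
	return many
}

// trustsOnly (hypothetical name) URL-decodes the policy document, then
// requires every Allow statement to pin a ":sub" key, via StringEquals,
// to exactly system:serviceaccount:<namespace>:tenant-runtime.
func trustsOnly(encoded, namespace string) (bool, error) {
	decoded, err := url.QueryUnescape(encoded)
	if err != nil {
		return false, err
	}
	var doc trustDoc
	if err := json.Unmarshal([]byte(decoded), &doc); err != nil {
		return false, err
	}
	if len(doc.Statement) == 0 {
		return false, nil
	}
	want := "system:serviceaccount:" + namespace + ":tenant-runtime"
	for _, st := range doc.Statement {
		if st.Effect != "Allow" {
			continue
		}
		pinned := false
		for key, raw := range st.Condition["StringEquals"] {
			if !strings.HasSuffix(key, ":sub") {
				continue
			}
			for _, s := range subjects(raw) {
				if s != want {
					return false, nil // trusts some other subject
				}
				pinned = true
			}
		}
		if !pinned {
			return false, nil // Allow statement without a subject pin
		}
	}
	return true, nil
}

// sampleTrust builds a URL-encoded demo policy; all values are placeholders.
func sampleTrust(sub string) string {
	doc := `{"Version":"2012-10-17","Statement":[{"Effect":"Allow",` +
		`"Principal":{"Federated":"arn:aws:iam::111122223333:oidc-provider/oidc.example/id/EXAMPLE"},` +
		`"Action":"sts:AssumeRoleWithWebIdentity",` +
		`"Condition":{"StringEquals":{"oidc.example/id/EXAMPLE:sub":"` + sub + `"}}}]}`
	return url.QueryEscape(doc)
}

func main() {
	ok, err := trustsOnly(sampleTrust("system:serviceaccount:team-a:tenant-runtime"), "team-a")
	fmt.Println(ok, err)
}
```

Under these assumptions a policy that uses StringLike, omits the subject condition, or names a different service account is reported as not constrained. A real check would probably also want to handle a single-object Statement and the audience condition.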

─── Wiring ───

- cmd/platform.go and the platform_audit MCP tool build the role-reader
  from the default AWS credential chain; when creds are absent the IRSA
  checks are skipped (with a note) so the k8s-side audit still runs.
- SARIF platform rules extended with the new finding types.
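
The "nil reader skips the AWS checks" wiring could look something like the sketch below. Everything here is illustrative: RoleReader's method set, awsReader, newAWSReader, buildReader, and audit are hypothetical names, and the credsPresent flag merely stands in for resolving the default AWS credential chain.

```go
package main

import (
	"errors"
	"fmt"
)

// RoleReader stands in for platform.RoleReader; this method set is an assumption.
type RoleReader interface {
	RoleExists(arn string) (bool, error)
}

// awsReader is a placeholder for the IAM-backed implementation.
type awsReader struct{}

func (*awsReader) RoleExists(string) (bool, error) { return true, nil }

var errNoCreds = errors.New("no AWS credentials found")

// newAWSReader is a hypothetical constructor; credsPresent simulates
// whether the credential chain resolved.
func newAWSReader(credsPresent bool) (*awsReader, error) {
	if !credsPresent {
		return nil, errNoCreds
	}
	return &awsReader{}, nil
}

// buildReader returns an untyped nil interface plus a note on failure,
// so callers can test reader != nil safely.
func buildReader(credsPresent bool) (RoleReader, string) {
	r, err := newAWSReader(credsPresent)
	if err != nil {
		return nil, "IRSA checks skipped: " + err.Error()
	}
	return r, ""
}

// audit lists which check groups would run for a given reader.
func audit(reader RoleReader) []string {
	checks := []string{"k8s"}
	if reader != nil {
		checks = append(checks, "irsa")
	}
	return checks
}

func main() {
	reader, note := buildReader(false)
	fmt.Println(audit(reader), note)
}
```

One design point worth noting: buildReader returns a literal nil rather than the nil *awsReader it got back. In Go, a nil pointer stored in an interface value makes the interface itself non-nil, which would defeat the reader != nil guard.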

─── Tests ───

- audit_test.go: a fake RoleReader covers role-missing, drift (inline +
  trust mismatch + no-baseline), and fully-conformant role cases; the
  existing k8s-only tests pass a nil reader.
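
A fake reader of the kind described above might be sketched as follows. This is an assumption-laden illustration rather than the code in audit_test.go: RoleInfo is a trimmed stand-in for IAMRoleInfo, the GetRoleInfo signature is guessed, the finding labels are invented for the example, and "suspension tag consistent with status.suspendedAt" is modelled simply as tag presence matching a boolean. The trust-policy check is left out for brevity.

```go
package main

import "fmt"

// RoleInfo is a hypothetical, trimmed stand-in for the PR's IAMRoleInfo.
type RoleInfo struct {
	Tags             map[string]string
	AttachedPolicies []string
	InlinePolicies   []string
}

// RoleReader's signature is an assumption; found=false models NoSuchEntity.
type RoleReader interface {
	GetRoleInfo(arn string) (info RoleInfo, found bool, err error)
}

// fakeReader serves canned roles keyed by ARN.
type fakeReader map[string]RoleInfo

func (f fakeReader) GetRoleInfo(arn string) (RoleInfo, bool, error) {
	info, ok := f[arn]
	return info, ok, nil
}

// Platform carries only the fields these checks need.
type Platform struct {
	RoleARN         string
	ExtraPolicyARNs []string
	Suspended       bool // true when status.suspendedAt is set
}

const suspendedTag = "platform.nanohype.dev/suspended"

// auditRole returns illustrative finding labels; a nil reader skips all checks.
func auditRole(r RoleReader, p Platform) ([]string, error) {
	if r == nil || p.RoleARN == "" {
		return nil, nil
	}
	info, found, err := r.GetRoleInfo(p.RoleARN)
	if err != nil {
		return nil, err
	}
	if !found {
		return []string{"role-missing"}, nil
	}
	var findings []string
	if len(info.InlinePolicies) > 0 {
		findings = append(findings, "inline-policies")
	}
	attached := map[string]bool{}
	for _, arn := range info.AttachedPolicies {
		attached[arn] = true
	}
	for _, want := range p.ExtraPolicyARNs {
		if !attached[want] {
			findings = append(findings, "extra-policy-missing")
			break
		}
	}
	if _, tagged := info.Tags[suspendedTag]; tagged != p.Suspended {
		findings = append(findings, "suspension-mismatch")
	}
	if !p.Suspended && len(info.AttachedPolicies) == 0 {
		findings = append(findings, "no-baseline")
	}
	return findings, nil
}

func main() {
	arn := "arn:aws:iam::111122223333:role/drifted" // placeholder ARN
	reader := fakeReader{arn: {InlinePolicies: []string{"adhoc"}}}
	findings, _ := auditRole(reader, Platform{RoleARN: arn})
	fmt.Println(findings)
}
```

Keeping the reader behind a small interface is what lets the k8s-only tests pass nil and the IRSA tests pass a map-backed fake, with no AWS calls in either case.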

No new dependency — reuses aws-sdk-go-v2 IAM. Verification: go build ./...,
go test ./..., go vet ./..., and golangci-lint v2.12.2 (uncapped) all pass.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
stxkxs merged commit c11992b into main on May 30, 2026 (4 checks passed)
stxkxs deleted the platform-irsa-audit branch on May 30, 2026 at 04:29
stxkxs added a commit that referenced this pull request on May 30, 2026
…I group

Completes the Platform-tenant auditor with the budget + compliance
cross-resource checks, and corrects the custom-resource API group that
slices #8/#9 hardcoded incorrectly.

─── Bug fix (affects the already-merged auditor) ───

The Platform and Tenant CRs live in the platform.nanohype.dev API group
and BudgetPolicy in governance.nanohype.dev — NOT agents.stxkxs.io, which
the auditor's platformGVR hardcoded. Against a real cluster, `platform
audit` listed zero Platforms (the GVR matched nothing). Corrected
platformGVR and added the tenant + budget GVRs. The earlier unit tests
passed only because their fixtures used the same wrong group; the fixtures
now use the real groups, so they actually exercise the shipping GVR.
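
Why the earlier tests could pass while the real command listed nothing can be illustrated with a toy model. The sketch below does not use client-go: GVR is a local struct shaped like apimachinery's GroupVersionResource, cluster is a hypothetical in-memory stand-in for the dynamic client, and the version "v1alpha1" and resource name "platforms" are assumptions, since the source does not state them.

```go
package main

import "fmt"

// GVR is a local stand-in shaped like schema.GroupVersionResource.
type GVR struct {
	Group, Version, Resource string
}

// cluster is a toy replacement for a dynamic client: objects keyed by GVR.
type cluster map[GVR][]string

// list returns the object names served under exactly this GVR.
func (c cluster) list(gvr GVR) []string { return c[gvr] }

var (
	// Version and resource name are assumed for illustration.
	wrongGVR = GVR{Group: "agents.stxkxs.io", Version: "v1alpha1", Resource: "platforms"}
	realGVR  = GVR{Group: "platform.nanohype.dev", Version: "v1alpha1", Resource: "platforms"}
)

func main() {
	// A fixture registered under the same wrong group agrees with the code.
	fixture := cluster{wrongGVR: {"tenant-a"}}
	// A live cluster serves the CR under its actual group only.
	live := cluster{realGVR: {"tenant-a"}}

	fmt.Println("fixture, wrong GVR:", len(fixture.list(wrongGVR)))
	fmt.Println("live, wrong GVR:   ", len(live.list(wrongGVR)))
	fmt.Println("live, real GVR:    ", len(live.list(realGVR)))
}
```

The lookup is an exact match on group, version, and resource, so a fixture that repeats the code's mistake hides it, while a live cluster returns an empty list instead of an error.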

─── Budget + compliance cross-references ───

- internal/platform: auditBudgetCompliance runs for every Platform (spec
  consistency, independent of phase) and reports:
    - spec.budget.name empty or pointing at a BudgetPolicy that doesn't
      exist (BUDGET_POLICY_MISSING)
    - a SOC2 platform whose referenced BudgetPolicy has
      killSwitchEnabled=false (KILL_SWITCH_DISABLED)
    - a Platform less strict than its owning Tenant — Tenant requires
      soc2/hipaa but the Platform doesn't (COMPLIANCE_WEAKER_THAN_TENANT)
    - spec.tenant pointing at a Tenant CR that doesn't exist (TENANT_MISSING)
  BudgetPolicy is looked up in the Platform's namespace; the Tenant CR is
  cluster-scoped. Reuses the dynamic client already threaded through Audit.
- internal/cloud: the four new finding types; SARIF rules extended.
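
The four cross-reference checks can be sketched as a pure function over already-fetched objects. This is an illustration under stated assumptions, not the PR's auditBudgetCompliance: the struct shapes, the Lookup type, the has helper, and the representation of compliance as a list of lowercase framework names are all hypothetical. Only the finding type names and the namespace-versus-cluster scoping come from the commit message.

```go
package main

import "fmt"

// Hypothetical, trimmed views of the three custom resources.
type BudgetPolicy struct{ KillSwitchEnabled bool }

type Tenant struct{ Compliance []string }

type Platform struct {
	Namespace, BudgetName, TenantName string
	Compliance                        []string
}

// Lookup holds pre-fetched objects in place of the dynamic client.
type Lookup struct {
	Budgets map[string]BudgetPolicy // keyed "namespace/name"
	Tenants map[string]Tenant       // cluster-scoped, keyed by name
}

// has reports whether list contains v.
func has(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// auditBudgetCompliance returns the finding types that apply to p.
func auditBudgetCompliance(p Platform, l Lookup) []string {
	var out []string

	// BudgetPolicy is resolved in the Platform's own namespace.
	bp, ok := l.Budgets[p.Namespace+"/"+p.BudgetName]
	if p.BudgetName == "" || !ok {
		out = append(out, "BUDGET_POLICY_MISSING")
	} else if has(p.Compliance, "soc2") && !bp.KillSwitchEnabled {
		out = append(out, "KILL_SWITCH_DISABLED")
	}

	// The Tenant is cluster-scoped, so no namespace in the key.
	t, ok := l.Tenants[p.TenantName]
	if !ok {
		return append(out, "TENANT_MISSING")
	}
	for _, req := range []string{"soc2", "hipaa"} {
		if has(t.Compliance, req) && !has(p.Compliance, req) {
			out = append(out, "COMPLIANCE_WEAKER_THAN_TENANT")
			break
		}
	}
	return out
}

func main() {
	l := Lookup{
		Budgets: map[string]BudgetPolicy{"team-a/default": {KillSwitchEnabled: false}},
		Tenants: map[string]Tenant{"acme": {Compliance: []string{"soc2", "hipaa"}}},
	}
	p := Platform{Namespace: "team-a", BudgetName: "default", TenantName: "acme", Compliance: []string{"soc2"}}
	fmt.Println(auditBudgetCompliance(p, l))
}
```

Because the checks depend only on spec fields and referenced objects, a function of this shape can run for every Platform regardless of phase, which matches the behaviour described above.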

─── Tests ───

- audit_test.go: fixtures corrected to the real API groups and given a
  matching BudgetPolicy + Tenant; new cases cover budget-missing,
  kill-switch-disabled, and compliance-weaker-than-tenant (10 tests total).

No new dependency. Verification: go build ./..., go test ./..., go vet
./..., and golangci-lint v2.12.2 (uncapped) all pass.

This completes Phase 2: the Platform auditor now spans the cluster, AWS
IRSA, and budget/compliance sides of the eks-agent-platform contract.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>