From a5493c91da4ac79a76082ff085ea42590d3a93ae Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Wed, 11 Feb 2026 19:04:22 +0000 Subject: [PATCH 01/32] chore: Planning a better ingress --- ...udit-ingress-first-steps-execution-plan.md | 411 ++++++++++++++++++ ...udit-ingress-separate-webserver-options.md | 314 +++++++++++++ docs/design/best-practices-webhook-ingress.md | 213 +++++++++ 3 files changed, 938 insertions(+) create mode 100644 docs/design/audit-ingress-first-steps-execution-plan.md create mode 100644 docs/design/audit-ingress-separate-webserver-options.md create mode 100644 docs/design/best-practices-webhook-ingress.md diff --git a/docs/design/audit-ingress-first-steps-execution-plan.md b/docs/design/audit-ingress-first-steps-execution-plan.md new file mode 100644 index 00000000..0999b270 --- /dev/null +++ b/docs/design/audit-ingress-first-steps-execution-plan.md @@ -0,0 +1,411 @@ +# Audit ingress first steps execution plan + +## Status + +Execution-focused handoff plan for implementation agent. + +Scope is fixed to: + +- single deployment +- extra in-binary audit webserver +- path-based cluster recognition +- Kind remains the e2e cluster target + +This document intentionally excludes alternative architecture discussion. + +--- + +## 1. Required outcome + +Implement an initial production-ready split where: + +- admission webhook keeps running on current webhook server path [`/process-validating-webhook`](cmd/main.go:191) +- audit ingress moves to a separate server in the same binary on a different port +- audit ingress is exposed via a dedicated Service +- cluster identity is derived from request path segment +- ingress TLS requirements for audit are independently configurable + +--- + +## 2. 
Current code and chart risks that must be addressed + +### 2.1 Coupling risks + +- Both admission and audit handlers are registered on one webhook server in [`cmd/main.go`](cmd/main.go:101) and [`cmd/main.go`](cmd/main.go:204) +- One service endpoint currently fronts this surface in [`charts/gitops-reverser/templates/services.yaml`](charts/gitops-reverser/templates/services.yaml:3) +- One cert lifecycle currently serves this surface in [`charts/gitops-reverser/templates/certificates.yaml`](charts/gitops-reverser/templates/certificates.yaml:16) + +### 2.2 TLS posture risks + +- e2e audit kubeconfig currently uses insecure skip verify in [`test/e2e/kind/audit/webhook-config.yaml`](test/e2e/kind/audit/webhook-config.yaml:14) +- cluster docs use insecure skip verify in [`docs/audit-setup/cluster/audit/webhook-config.yaml`](docs/audit-setup/cluster/audit/webhook-config.yaml:11) + +### 2.3 Audit ingress hardening gaps in code + +In [`internal/webhook/audit_handler.go`](internal/webhook/audit_handler.go:86): + +- no request body size limit before decode +- no explicit server-level timeouts +- no concurrency guard for burst traffic +- no path-based cluster ID parser or allowlist validation + +### 2.4 E2E and docs drift already visible + +- Kind README references DNS endpoint while config uses fixed IP and path in [`test/e2e/kind/README.md`](test/e2e/kind/README.md:31) vs [`test/e2e/kind/audit/webhook-config.yaml`](test/e2e/kind/audit/webhook-config.yaml:12) +- Helm README contains defaults not fully aligned with values in [`charts/gitops-reverser/README.md`](charts/gitops-reverser/README.md:183) and [`charts/gitops-reverser/values.yaml`](charts/gitops-reverser/values.yaml:6) + +--- + +## 3. 
Implementation contract for first step + +### 3.1 Runtime topology + +Implement two servers in one process: + +- admission server + - existing controller-runtime webhook server + - keeps current cert and service behavior +- audit server + - dedicated `http.Server` listener on separate port + - independent TLS config inputs + - serves audit paths with cluster path segment + +### 3.2 Path-based cluster recognition contract + +Accepted path format: + +- `/audit-webhook/{clusterID}` + +Rules: + +- reject requests without `{clusterID}` +- reject unknown `{clusterID}` not in allowlist +- emit structured logs with resolved `clusterID` +- add metric labels for `cluster_id` and `cluster_id_valid` + +### 3.3 TLS policy contract for first step + +For phase 1: + +- keep server TLS mandatory for audit ingress +- support strict CA verification by source cluster configuration +- do not require mTLS in this phase +- preserve option to add mTLS later without path changes + +--- + +## 4. Concrete code work items + +### 4.1 Add audit server config model in main + +Target file: [`cmd/main.go`](cmd/main.go:253) + +Add new app config fields for audit server, separate from webhook server fields: + +- audit listen address and port +- audit cert path, cert name, cert key +- audit max request body bytes +- audit read timeout +- audit write timeout +- audit idle timeout +- audit allowed cluster IDs +- audit path prefix default `/audit-webhook` + +Add flags in [`parseFlags()`](cmd/main.go:270) for above. + +### 4.2 Implement dedicated audit server bootstrap + +Target file: [`cmd/main.go`](cmd/main.go:77) + +Add functions to: + +- build audit `http.ServeMux` +- register handler pattern on `/{prefix}/` +- initialize TLS cert watcher for audit cert files +- construct dedicated `http.Server` with explicit timeouts +- add graceful shutdown using manager context + +Implementation note: + +- audit server should be started via manager runnable so lifecycle follows manager start and stop. 
+ +### 4.3 Extend audit handler with path identity and guardrails + +Target file: [`internal/webhook/audit_handler.go`](internal/webhook/audit_handler.go:50) + +Add config fields: + +- allowed cluster IDs set +- path prefix +- max request body bytes + +Add behavior: + +- parse and validate cluster ID from request path +- reject invalid path with `400` +- reject unknown cluster ID with `403` +- limit body read size before decode +- include cluster ID in all processing logs +- include cluster ID metric attribute for [`metrics.AuditEventsReceivedTotal`](internal/webhook/audit_handler.go:172) + +### 4.4 Keep admission webhook behavior untouched + +Do not change current validating webhook registration semantics in [`cmd/main.go`](cmd/main.go:190) and chart registration in [`charts/gitops-reverser/templates/validating-webhook.yaml`](charts/gitops-reverser/templates/validating-webhook.yaml:16). + +--- + +## 5. Helm chart work items + +### 5.1 Values schema additions + +Target file: [`charts/gitops-reverser/values.yaml`](charts/gitops-reverser/values.yaml:65) + +Add explicit `auditIngress` block with: + +- `enabled` +- `port` +- `pathPrefix` +- `allowedClusterIDs` +- `tls.certPath` +- `tls.certName` +- `tls.certKey` +- `tls.secretName` +- `timeouts.read` +- `timeouts.write` +- `timeouts.idle` +- `maxRequestBodyBytes` +- optional fixed `clusterIP` for Kind e2e compatibility + +Keep existing webhook block for admission as-is in first phase. + +### 5.2 Deployment args and ports + +Target file: [`charts/gitops-reverser/templates/deployment.yaml`](charts/gitops-reverser/templates/deployment.yaml:41) + +Add container args for audit server flags and add second named container port for audit ingress. + +Mount dedicated audit TLS secret path in addition to admission cert mount. 
+ +### 5.3 Dedicated audit service template + +Target file: [`charts/gitops-reverser/templates/services.yaml`](charts/gitops-reverser/templates/services.yaml:3) + +Add new service resource: + +- name suffix `-audit` +- port 443 to target audit container port +- optional fixed clusterIP setting +- selector consistent with leader-only routing requirement + +Keep current service for admission webhook unchanged. + +### 5.4 Dedicated audit certificate + +Target file: [`charts/gitops-reverser/templates/certificates.yaml`](charts/gitops-reverser/templates/certificates.yaml:16) + +Add second certificate resource for audit service DNS names and audit secret. + +Keep existing serving cert for admission webhook. + +### 5.5 Chart docs updates + +Target file: [`charts/gitops-reverser/README.md`](charts/gitops-reverser/README.md:177) + +Update: + +- config table with new `auditIngress` settings +- architecture section to show two service surfaces +- examples for per-cluster path URLs in audit kubeconfig +- fix stale defaults and names where currently inconsistent with values + +--- + +## 6. 
Kustomize and default manifests work items + +### 6.1 Add audit service and optional fixed IP patch + +Relevant files: + +- [`config/webhook/service.yaml`](config/webhook/service.yaml:1) +- [`config/default/webhook_service_fixed_ip_patch.yaml`](config/default/webhook_service_fixed_ip_patch.yaml:1) +- [`config/default/kustomization.yaml`](config/default/kustomization.yaml:44) + +Actions: + +- add separate audit service manifest +- add separate fixed IP patch for audit service for Kind startup constraints +- keep admission service patch independent + +### 6.2 Add audit certificate resource + +Relevant files: + +- [`config/certmanager/certificate-webhook.yaml`](config/certmanager/certificate-webhook.yaml:1) +- [`config/default/kustomization.yaml`](config/default/kustomization.yaml:126) + +Actions: + +- add second cert for audit service DNS names +- add replacement wiring for audit service name and namespace into cert DNS entries +- keep admission CA injection for validating webhook intact + +### 6.3 Add manager patch entries for audit server args and mounts + +Relevant file: [`config/default/manager_webhook_patch.yaml`](config/default/manager_webhook_patch.yaml:1) + +Actions: + +- add audit-specific args +- add audit TLS volume and mount +- add audit container port + +--- + +## 7. Test plan updates required + +### 7.1 Unit tests + +#### Audit handler tests + +Target file: [`internal/webhook/audit_handler_test.go`](internal/webhook/audit_handler_test.go:50) + +Add table-driven cases for: + +- valid path with known cluster ID +- missing cluster ID path +- unknown cluster ID +- body larger than configured max bytes +- non-POST path handling remains unchanged + +### 7.2 Main bootstrap tests + +Add new tests for config parsing and audit server bootstrap behavior. 
+ +Suggested new file: + +- `cmd/main_audit_server_test.go` + +Cover: + +- default flag values +- custom audit flag parsing +- invalid timeout parsing behavior if introduced +- audit server runnable registration + +### 7.3 E2E changes on Kind + +#### Keep Kind as the only cluster target + +Relevant files: + +- [`Makefile`](Makefile:69) +- [`test/e2e/kind/cluster-template.yaml`](test/e2e/kind/cluster-template.yaml:1) +- [`test/e2e/kind/audit/webhook-config.yaml`](test/e2e/kind/audit/webhook-config.yaml:1) +- [`test/e2e/e2e_test.go`](test/e2e/e2e_test.go:367) +- [`test/e2e/helpers.go`](test/e2e/helpers.go:159) + +Required changes: + +1. Audit webhook URL path must include cluster ID + - update to `/audit-webhook/` in Kind webhook config + +2. Audit service endpoint target + - point webhook config to the new dedicated audit service fixed IP + +3. Certificate readiness checks + - extend helper to wait for new audit cert secret in addition to existing secrets + +4. E2E validation assertions + - keep current audit metric checks + - add validation that unknown path cluster IDs are rejected + - add validation that known cluster IDs are accepted + +5. Optional strict TLS uplift for e2e + - phase 1 may keep insecure skip verify for bootstrap simplicity + - add plan note and TODO test for certificate-authority based verification once secret extraction is automated + +### 7.4 Smoke checks for service split + +Add e2e checks to verify: + +- admission service and audit service both exist +- audit service resolves to leader endpoint only +- audit ingress works on dedicated port and path + +--- + +## 8. Observability and operational requirements + +### 8.1 Logging + +Every accepted audit request must log: + +- cluster ID +- remote address +- request path +- event count +- processing outcome + +### 8.2 Metrics + +Extend audit metric labels to include cluster dimension. 
+ +Ensure cardinality protection: + +- enforce allowlist so cluster label remains bounded + +### 8.3 Error handling + +Audit server must return: + +- `400` for malformed path or body +- `403` for disallowed cluster ID +- `405` for method mismatch +- `500` only for internal processing errors + +--- + +## 9. Backward compatibility behavior + +Phase 1 behavior for cluster path migration: + +- no fallback to bare `/audit-webhook` endpoint +- configuration and docs must explicitly require `/audit-webhook/{clusterID}` + +Reason: + +- prevents ambiguous identity +- avoids silent insecure defaults + +--- + +## 10. Acceptance criteria for coding agent + +Implementation is complete only when all are true: + +1. Separate in-binary audit server is active on separate port with separate service exposure. +2. Audit endpoint requires path-based cluster ID and allowlist validation. +3. Admission webhook behavior remains unchanged. +4. Helm and kustomize manifests include independent audit TLS and service resources. +5. Kind e2e setup is updated and passing with new audit path contract. +6. Tests cover path validation and certificate readiness adjustments. +7. Documentation for setup and e2e reflects new service and URL contract. +8. Validation pipeline passes in this sequence: + - `make fmt` + - `make generate` + - `make manifests` + - `make vet` + - `make lint` + - `make test` + - `make test-e2e` + +--- + +## 11. 
Handoff checklist + +- Update docs for cluster audit config in [`docs/audit-setup/cluster/audit/webhook-config.yaml`](docs/audit-setup/cluster/audit/webhook-config.yaml:1) +- Update Kind docs in [`test/e2e/kind/README.md`](test/e2e/kind/README.md:1) +- Update chart docs in [`charts/gitops-reverser/README.md`](charts/gitops-reverser/README.md:1) +- Keep architecture alternatives in [`docs/design/audit-ingress-separate-webserver-options.md`](docs/design/audit-ingress-separate-webserver-options.md:1) and keep this document implementation-only + +This plan is ready to hand to a coding agent for direct execution. \ No newline at end of file diff --git a/docs/design/audit-ingress-separate-webserver-options.md b/docs/design/audit-ingress-separate-webserver-options.md new file mode 100644 index 00000000..78261a56 --- /dev/null +++ b/docs/design/audit-ingress-separate-webserver-options.md @@ -0,0 +1,314 @@ +# Audit ingress separation and cluster differentiation options + +## Status + +Design proposal only, updated with webhook ingress best-practice alignment. + +## Context + +Today both endpoints are served by the same controller-runtime webhook server and the same Service: + +- Admission webhook endpoint [`/process-validating-webhook`](cmd/main.go:191) +- Audit endpoint [`/audit-webhook`](cmd/main.go:204) +- Single leader-only Service on port [`443`](charts/gitops-reverser/templates/services.yaml:18) targeting one webhook server port [`9443`](charts/gitops-reverser/values.yaml:70) + +This coupling limits independent exposure and independent TLS policy for incoming audit traffic. + +## Objectives + +- Move audit ingress to a separate webserver and separate port. +- Allow explicit configuration of incoming TLS requirements for audit traffic. +- Support audit streaming from external or secondary clusters. +- Provide cluster differentiation options with trade-offs. +- Align ingress and webhook controls with production best practices. 
+ +## Non-goals + +- No implementation sequencing in this document. +- No API schema changes in this document. + +## Design principles + +- Isolate admission reliability from audit ingest throughput concerns. +- Make TLS posture explicit and configurable per ingress surface. +- Keep path-based cluster identity as the initial model, with hardening controls. +- Separate admission and audit operational knobs. +- Keep defaults safe while still operable in day-1 deployments. + +--- + +## Best-practice baseline to adopt + +From [`docs/design/best-practices-webhook-ingress.md`](docs/design/best-practices-webhook-ingress.md), these controls are most relevant: + +- Listener controls: `readTimeout`, `writeTimeout`, `idleTimeout`, `maxRequestBodyBytes` +- TLS controls: dedicated certs, CA trust, hot-reload support +- Registration controls for admission: `failurePolicy`, `timeoutSeconds`, selectors, tight rules +- Runtime controls: concurrency limit, metrics, request ID logging +- Audit-specific controls: queue, backpressure policy, dedup hinting, and separate endpoint or deployment + +--- + +## Separation options for audit webserver + +### Option A: Same pod, second HTTP server, separate Service and port + +Run a second server process inside the manager binary for audit ingest. + +Pros: + +- Lowest operational complexity. +- Independent port and TLS policy from admission. +- Minimal workload topology change. + +Cons: + +- Pod-level resource contention is still possible. +- Shared rollout and failure domain remains. + +### Option B: Same pod, sidecar audit gateway + +Add sidecar proxy for TLS termination and edge controls; manager receives internal traffic. + +Pros: + +- Mature L7 controls, rate limiting, and request-size guards. +- Useful when ingress policy complexity increases. + +Cons: + +- More config and operational surface in the same pod. + +### Option C: Separate Deployment for audit receiver + +Run audit receiver separately from controller manager. 
+ +Pros: + +- Strongest fault and scaling isolation. +- Cleanest model for multi-cluster ingest growth. +- Independent SLO tuning for admission and audit. + +Cons: + +- Highest release and operations complexity. + +### Recommended architectural target + +- Near-term: Option A. +- Long-term scalable target: Option C. + +This matches your request for simplicity now, while preserving a clean path to stronger isolation later. + +--- + +## Incoming TLS requirements for audit ingress + +### TLS policy modes + +1. `strict` + - Source cluster verifies server cert chain and hostname. + - Production baseline. + +2. `pinned-ca` + - Like strict, with explicit pinned CA and dedicated audit cert lifecycle. + +3. `insecure` + - Dev and isolated tests only. + +4. `mtls` + - Server verification plus client cert auth. + - Strongest identity, highest operational overhead. + +### Default profile for your current preference + +- Separate audit endpoint and separate port. +- Default `strict`. +- `mtls` optional hardening profile. +- Forbid `insecure` outside explicit non-prod environments. + +### Config surface recommendation + +Use separate top-level config blocks to avoid mixing concerns: + +- `admissionWebhooks` +- `auditIngress` + +Suggested `auditIngress` fields: + +- `enabled` +- `listenAddress` +- `port` +- `pathPrefix` +- `tls.mode` +- `tls.secretName` +- `tls.clientCASecretName` +- `timeouts.read` +- `timeouts.write` +- `timeouts.idle` +- `maxRequestBodyBytes` +- `concurrency.maxInFlight` +- `queue.enabled` +- `queue.size` +- `queue.durability` +- `backpressure.mode` +- `identity.mode` +- `identity.allowedClusters` +- `network.allowedCIDRs` + +--- + +## Cluster differentiation options + +### Option 1: Path-based identity + +Examples: + +- `/audit-webhook/cluster-a` +- `/audit-webhook/cluster-b` + +Pros: + +- Very simple and native to audit webhook URL setup. +- No client cert lifecycle required. + +Cons: + +- Path value is not strong identity on its own. 
+ +Required controls: + +- Strict allowlist for accepted cluster IDs. +- Reject unknown path IDs. +- Source network restrictions and logging. + +### Option 2: Header-based identity through trusted proxy + +Pros: + +- Centralized edge identity mapping. + +Cons: + +- Depends on trusted proxy boundary. + +### Option 3: Host or SNI based identity + +Pros: + +- Useful in DNS-centric ingress designs. + +Cons: + +- More cert and DNS complexity. + +### Option 4: mTLS subject-based identity + +Pros: + +- Strong cryptographic source identity. + +Cons: + +- Highest cert issuance and rotation burden. + +### Recommended cluster identity path + +Given your stated preference: + +1. Start with path-based identity. +2. Enforce strict allowlist. +3. Enforce network restrictions. +4. Keep mTLS available as a security profile switch. + +--- + +## Delivery and reliability notes for audit ingestion + +Audit ingest differs from admission webhook behavior and should assume: + +- bursts +- duplicates +- out-of-order arrival +- potential data loss under severe backpressure depending on source settings + +Minimum design controls: + +- bounded queue +- explicit full-queue behavior +- optional batching downstream +- dedup hint support using audit event metadata when available + +--- + +## Decision matrix + +| Dimension | Path identity no mTLS | Path identity optional mTLS | Mandatory mTLS | +|---|---|---|---| +| Operational simplicity | Highest | Medium | Lowest | +| Security assurance | Medium | Medium to high | Highest | +| Cluster onboarding friction | Lowest | Medium | Highest | +| Cert lifecycle burden | Low | Medium | High | +| Fit for your current goal | Best | Good next step | Too heavy initially | + +--- + +## Helm chart assessment: current state + +### What is good today + +- Leader-only webhook routing is already implemented in [`charts/gitops-reverser/templates/services.yaml`](charts/gitops-reverser/templates/services.yaml). 
+- TLS certificate automation exists through cert-manager in [`charts/gitops-reverser/templates/certificates.yaml`](charts/gitops-reverser/templates/certificates.yaml). +- Webhook cert mounting and runtime flags are wired in [`charts/gitops-reverser/templates/deployment.yaml`](charts/gitops-reverser/templates/deployment.yaml). +- Admission webhook settings expose useful controls in [`charts/gitops-reverser/values.yaml`](charts/gitops-reverser/values.yaml). + +### What should be improved for your target architecture + +1. Split config surfaces + - Current config mixes admission and audit under [`webhook`](charts/gitops-reverser/values.yaml:66). + - Introduce explicit `admissionWebhooks` and `auditIngress` blocks. + +2. Separate service exposure + - Today one leader-only service handles webhook traffic. + - Add dedicated audit service and port; keep admission service independent. + +3. Separate certificate lifecycle + - Current certificate SANs and secret are tied to leader-only service. + - Add separate cert and secret for audit service DNS names. + +4. Add ingress runtime safety knobs + - Missing explicit timeout and max-body controls for audit ingress. + - Add `maxInFlight` and queue settings for burst handling. + +5. Add audit identity and access controls + - Add path-prefix and allowlist settings in chart values. + - Add CIDR allowlist controls and corresponding policy templates. + +6. Improve docs consistency + - Chart README currently states defaults that differ from actual values in at least one place. + - Align values table and examples to the current chart behavior. + +### Risk notes in current chart + +- A single webhook port and service remains a coupling point for admission and audit traffic. +- No first-class audit ingress queue or backpressure knobs are represented in chart values. +- Security posture for cross-cluster audit traffic is not explicit as a separate concern. 
+ +--- + +## Reference architecture sketch + +```mermaid +graph TD + A[Source cluster A api server] --> P[/audit-webhook/cluster-a] + B[Source cluster B api server] --> Q[/audit-webhook/cluster-b] + P --> S[Audit webserver separate port] + Q --> S + S --> V[Cluster ID allowlist validator] + V --> R[Queue and backpressure controls] + R --> E[Event pipeline] +``` + +## Final position + +A separate audit webserver on a separate port with configurable incoming TLS policy is the correct direction. For cluster differentiation, path-based identity is a practical default when it is combined with strict allowlist and network restrictions. The chart should be evolved to treat audit ingress as its own product surface with dedicated TLS, exposure, identity, and reliability controls. diff --git a/docs/design/best-practices-webhook-ingress.md b/docs/design/best-practices-webhook-ingress.md new file mode 100644 index 00000000..de387415 --- /dev/null +++ b/docs/design/best-practices-webhook-ingress.md @@ -0,0 +1,213 @@ +1) Mutating webhook from a Kubernetes Service: minimal settings to support +A. Listener + routing + +listenAddress / port (default 8443) + +path (e.g. /mutate), and optionally multiple paths if you’ll have multiple webhooks + +readTimeout / writeTimeout / idleTimeout + +maxRequestBodyBytes (defensive; AdmissionReview can be big with certain objects) + +B. TLS (this is non-negotiable in real clusters) + +Kubernetes expects HTTPS for webhooks (service or URL). 
Minimally support: + +Provide TLS cert + key + +Either via: tls.secretName (mounted secret) + +Or direct file paths (less “Kubernetes-y”, but useful for dev) + +Provide CA bundle for the webhook configuration + +In practice you’ll set caBundle on the MutatingWebhookConfiguration (or let cert-manager inject it) + +Best practice: integrate with cert-manager and expose: + +certManager.enabled (bool) + +certManager.issuerRef (name/kind/group) + +dnsNames (at least service.namespace.svc and service.namespace.svc.cluster.local) + +rotation: rely on cert-manager renewal; your pod must reload certs (or restart on secret change) + +C. Webhook registration (what you control via config/helm values) + +Even if you generate the MutatingWebhookConfiguration from code/helm, you want these as configurable knobs: + +Per webhook: + +failurePolicy: Fail vs Ignore + +Default recommendation: Fail for security/consistency webhooks; Ignore only if mutation is “nice to have” + +timeoutSeconds: keep low (1–5s). Default 2–3s. + +sideEffects: usually None (and mean it) + +admissionReviewVersions: support v1 (and accept v1beta1 only if you must) + +matchPolicy: typically Equivalent + +reinvocationPolicy: consider IfNeeded if you mutate fields other mutators might touch + +Selectors + +namespaceSelector (exclude system namespaces by default) + +objectSelector (optional but great for opt-in via label) + +Rules + +resources + operations you mutate (keep tight) + +scope: cluster vs namespaced where relevant + +D. Runtime safety knobs + +Expose: + +concurrency (max in-flight) + +rateLimit (optional but helpful under thundering herd) + +metrics (Prometheus) + request duration histogram + +pprof optional (dev only) + +logLevel with request IDs and admission UID + +E. Leader election (only if you have shared mutable state) + +For pure stateless mutation, you can run multiple replicas with no leader election. 
+If you rely on a single writer (e.g., CRD-backed shared cache warmup, or you do coordinated external writes), support: + +leaderElection.enabled + +lease namespace/name + +2) Mutating webhook best practices (the stuff that prevents outages) + +Correctness & determinism + +Make patches deterministic (same input → same output). + +Be idempotent (if called twice, you don’t double-apply). + +Respect dryRun (don’t create external side effects). + +Don’t depend on “live GET” calls in the hot path unless cached; API calls add latency and can deadlock during API stress. + +Performance + +Keep p99 latency low; webhooks are on the API request path. + +Prefer fast local validation/mutation + cached lookups. + +Set tight timeoutSeconds and tune server timeouts accordingly. + +Safety + +Default namespaceSelector to exclude kube-system, kube-public, kube-node-lease, and your own operator namespace until you explicitly need them. + +Use objectSelector to allow opt-in (label) for risky mutations. + +Use failurePolicy=Fail only when you’re confident in HA + readiness + rollout strategy. + +Rollout strategy + +Run at least 2 replicas (or more, depending on API QPS). + +Use a PodDisruptionBudget. + +Ensure readinessProbe only goes ready when: + +certs are loaded + +any required caches are warm (if you depend on them) + +Prefer “versioned” webhook names/paths when doing breaking changes. + +Observability + +Log: admission UID, kind, namespace/name, userInfo, decision, latency + +Metrics: requests, rejections, patch size, errors, timeouts + +3) Should you “support the same settings” for audit webhook handling? + +Some overlap, yes (TLS/HA/observability), but don’t treat them as the same product surface. Audit has very different operational requirements. 
+ +What overlaps (you should support in both) + +HTTPS listener, cert management, rotation + +AuthN (ideally mTLS) and authorization/allowlisting + +Timeouts + max body size + +Concurrency limits and metrics + +What’s different (audit needs extra settings) + +Audit webhook backends can get a lot of traffic and the API server will retry under some failure modes, but you still need to assume: + +bursts + +duplicates + +out-of-order delivery + +occasional loss depending on audit config and backpressure + +So minimally for audit ingestion, add: + +queue.enabled + queue.size + +batching (optional, but very useful downstream) + +durability choice: + +memory queue (simple, lossy on restart) + +persistent queue (disk/DB/Kafka/etc.) + +Backpressure behavior + +what happens when full: drop / block / shed by priority + +Deduplication keying (best-effort): use audit event IDs if present + +Separate endpoint / separate Deployment strongly recommended + +Auth for audit + +For the audit webhook backend, the API server can be configured with a kubeconfig to talk to your endpoint, which makes mTLS client cert auth a clean approach. If you already have a public wildcard cert, that helps with server identity, but client auth is what prevents random in-cluster callers from spamming your audit ingest. + +Recommendation: + +Admission webhook: rely on in-cluster service + TLS + CA bundle (standard) + +Audit webhook: mTLS (client certs) and strict allowlisting/rate limits + +Practical recommendation on architecture + +Keep admission and audit as separate handlers, ideally separate deployments. + +Admission: optimized for latency + correctness + +Audit: optimized for throughput + buffering + durability + +Share libraries (TLS, metrics, logging), but do not share the same scaling knobs or failure modes. 
+ +If you want a simple “minimal config surface” that still scales, expose two top-level blocks: + +admissionWebhooks: (tls, selectors, failurePolicy, timeouts, concurrency) + +auditIngest: (tls, authn, queue/durability, backpressure, concurrency) + +That’s the line where you stay sane when traffic grows. + +If you want, paste your current helm values / flags structure and I’ll suggest a clean config schema (what should be values vs generated defaults) without blowing up the number of knobs. \ No newline at end of file From 208c95af52c902e50e3d0b11ca21686405f42263 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Wed, 11 Feb 2026 19:04:41 +0000 Subject: [PATCH 02/32] chore: Improvements in the plan --- ...udit-ingress-first-steps-execution-plan.md | 30 +++++++------------ 1 file changed, 11 insertions(+), 19 deletions(-) diff --git a/docs/design/audit-ingress-first-steps-execution-plan.md b/docs/design/audit-ingress-first-steps-execution-plan.md index 0999b270..5eef98c8 100644 --- a/docs/design/audit-ingress-first-steps-execution-plan.md +++ b/docs/design/audit-ingress-first-steps-execution-plan.md @@ -47,7 +47,7 @@ In [`internal/webhook/audit_handler.go`](internal/webhook/audit_handler.go:86): - no request body size limit before decode - no explicit server-level timeouts - no concurrency guard for burst traffic -- no path-based cluster ID parser or allowlist validation +- no path-based cluster ID parser ### 2.4 E2E and docs drift already visible @@ -78,10 +78,11 @@ Accepted path format: Rules: +- path prefix is fixed to `/audit-webhook` in this phase (not configurable) - reject requests without `{clusterID}` -- reject unknown `{clusterID}` not in allowlist +- accept any non-empty `{clusterID}` and handle newly seen cluster IDs - emit structured logs with resolved `clusterID` -- add metric labels for `cluster_id` and `cluster_id_valid` +- add metric label for `cluster_id` ### 3.3 TLS policy contract for first step @@ -108,8 +109,6 @@ Add new app config fields for 
audit server, separate from webhook server fields: - audit read timeout - audit write timeout - audit idle timeout -- audit allowed cluster IDs -- audit path prefix default `/audit-webhook` Add flags in [`parseFlags()`](cmd/main.go:270) for above. @@ -120,7 +119,7 @@ Target file: [`cmd/main.go`](cmd/main.go:77) Add functions to: - build audit `http.ServeMux` -- register handler pattern on `/{prefix}/` +- register handler pattern on fixed `/audit-webhook/` - initialize TLS cert watcher for audit cert files - construct dedicated `http.Server` with explicit timeouts - add graceful shutdown using manager context @@ -135,15 +134,12 @@ Target file: [`internal/webhook/audit_handler.go`](internal/webhook/audit_handle Add config fields: -- allowed cluster IDs set -- path prefix - max request body bytes Add behavior: - parse and validate cluster ID from request path - reject invalid path with `400` -- reject unknown cluster ID with `403` - limit body read size before decode - include cluster ID in all processing logs - include cluster ID metric attribute for [`metrics.AuditEventsReceivedTotal`](internal/webhook/audit_handler.go:172) @@ -164,8 +160,6 @@ Add explicit `auditIngress` block with: - `enabled` - `port` -- `pathPrefix` -- `allowedClusterIDs` - `tls.certPath` - `tls.certName` - `tls.certKey` @@ -271,9 +265,9 @@ Target file: [`internal/webhook/audit_handler_test.go`](internal/webhook/audit_h Add table-driven cases for: -- valid path with known cluster ID +- valid path with cluster ID - missing cluster ID path -- unknown cluster ID +- newly seen cluster ID is accepted - body larger than configured max bytes - non-POST path handling remains unchanged @@ -317,8 +311,7 @@ Required changes: 4. E2E validation assertions - keep current audit metric checks - - add validation that unknown path cluster IDs are rejected - - add validation that known cluster IDs are accepted + - add validation that cluster IDs from path are accepted, including newly seen IDs 5. 
Optional strict TLS uplift for e2e - phase 1 may keep insecure skip verify for bootstrap simplicity @@ -352,14 +345,13 @@ Extend audit metric labels to include cluster dimension. Ensure cardinality protection: -- enforce allowlist so cluster label remains bounded +- sanitize and constrain cluster ID format/length before labeling ### 8.3 Error handling Audit server must return: - `400` for malformed path or body -- `403` for disallowed cluster ID - `405` for method mismatch - `500` only for internal processing errors @@ -384,7 +376,7 @@ Reason: Implementation is complete only when all are true: 1. Separate in-binary audit server is active on separate port with separate service exposure. -2. Audit endpoint requires path-based cluster ID and allowlist validation. +2. Audit endpoint requires path-based cluster ID on fixed `/audit-webhook/{clusterID}` and accepts newly seen cluster IDs. 3. Admission webhook behavior remains unchanged. 4. Helm and kustomize manifests include independent audit TLS and service resources. 5. Kind e2e setup is updated and passing with new audit path contract. @@ -408,4 +400,4 @@ Implementation is complete only when all are true: - Update chart docs in [`charts/gitops-reverser/README.md`](charts/gitops-reverser/README.md:1) - Keep architecture alternatives in [`docs/design/audit-ingress-separate-webserver-options.md`](docs/design/audit-ingress-separate-webserver-options.md:1) and keep this document implementation-only -This plan is ready to hand to a coding agent for direct execution. \ No newline at end of file +This plan is ready to hand to a coding agent for direct execution. 
From b09930fb6d9ebc18cdf7c46bdc3232f1e63c884d Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Wed, 11 Feb 2026 19:46:17 +0000 Subject: [PATCH 03/32] feat: Let's give the audit webhook handling it's own webserver (so that TLS config can be different) --- charts/gitops-reverser/README.md | 26 +- .../templates/certificates.yaml | 32 ++- .../gitops-reverser/templates/deployment.yaml | 28 ++ .../gitops-reverser/templates/services.yaml | 27 +- charts/gitops-reverser/values.yaml | 20 +- cmd/main.go | 260 +++++++++++++++--- cmd/main_audit_server_test.go | 139 ++++++++++ .../certificate-audit-webhook.yaml | 19 ++ config/certmanager/kustomization.yaml | 1 + .../audit_webhook_service_fixed_ip_patch.yaml | 10 + config/default/kustomization.yaml | 43 ++- config/default/manager_webhook_patch.yaml | 50 ++++ config/webhook/audit-service.yaml | 17 ++ config/webhook/kustomization.yaml | 1 + .../cluster/audit/webhook-config.yaml | 2 +- internal/webhook/audit_handler.go | 110 +++++++- internal/webhook/audit_handler_test.go | 109 +++++++- test/e2e/e2e_test.go | 54 +++- test/e2e/helpers.go | 22 +- test/e2e/kind/README.md | 14 +- test/e2e/kind/audit/webhook-config.yaml | 4 +- 21 files changed, 905 insertions(+), 83 deletions(-) create mode 100644 cmd/main_audit_server_test.go create mode 100644 config/certmanager/certificate-audit-webhook.yaml create mode 100644 config/default/audit_webhook_service_fixed_ip_patch.yaml create mode 100644 config/webhook/audit-service.yaml diff --git a/charts/gitops-reverser/README.md b/charts/gitops-reverser/README.md index 800f1931..60ec0b1c 100644 --- a/charts/gitops-reverser/README.md +++ b/charts/gitops-reverser/README.md @@ -85,7 +85,12 @@ The chart deploys 2 replicas by default with leader election: ▼ ┌──────────────────────────────────────────┐ │ gitops-reverser-leader-only (Service) │ -│ Routes to: role=leader │ +│ Admission webhook: /process-validating-webhook │ +└──────────────┬───────────────────────────┘ + │ 
+┌──────────────────────────────────────────┐ +│ gitops-reverser-audit (Service) │ +│ Audit webhook: /audit-webhook/{clusterID} │ └──────────────┬───────────────────────────┘ │ ┌─────────┴─────────┐ @@ -99,6 +104,7 @@ The chart deploys 2 replicas by default with leader election: **Key Features:** - **Leader-only service**: Routes webhook traffic only to the active leader pod +- **Dedicated audit service**: Separates audit ingress from admission webhook traffic - **Automatic failover**: Standby pod takes over if leader fails - **Pod anti-affinity**: Pods spread across different nodes - **Pod disruption budget**: Ensures at least 1 pod available during maintenance @@ -178,12 +184,18 @@ webhook: | Parameter | Description | Default | |-----------|-------------|---------| -| `namespaceCreation.enabled` | Create namespace automatically | `true` | -| `replicaCount` | Number of controller replicas | `2` | +| `replicaCount` | Number of controller replicas | `1` | | `leaderOnlyService.enabled` | Create service routing to leader only | `true` | | `image.repository` | Container image repository | `ghcr.io/configbutler/gitops-reverser` | | `controllerManager.leaderElection` | Enable leader election | `true` | | `webhook.validating.failurePolicy` | Webhook failure policy (Ignore/Fail) | `Ignore` | +| `auditIngress.enabled` | Enable dedicated audit HTTPS ingress server | `true` | +| `auditIngress.port` | Dedicated audit container port | `9444` | +| `auditIngress.maxRequestBodyBytes` | Max accepted audit request size | `10485760` | +| `auditIngress.timeouts.read` | Audit server read timeout | `15s` | +| `auditIngress.timeouts.write` | Audit server write timeout | `30s` | +| `auditIngress.timeouts.idle` | Audit server idle timeout | `60s` | +| `auditIngress.tls.secretName` | Secret name for audit TLS cert/key | `-audit-server-tls-cert` | | `certificates.certManager.enabled` | Use cert-manager for certificates | `true` | | `podDisruptionBudget.enabled` | Enable PodDisruptionBudget | 
`true` |
| `resources.requests.cpu` | CPU request | `10m` |
@@ -193,6 +205,14 @@ webhook:
 
 See [`values.yaml`](values.yaml) for complete configuration options.
 
+### Audit Webhook URL Contract
+
+Source clusters must target:
+
+`https://<audit-service-host>:443/audit-webhook/<clusterID>`
+
+The bare path `/audit-webhook` is rejected. Use a non-empty cluster ID segment.
+
 ## Custom Resource Definitions (CRDs)
 
 This chart automatically manages the following CRDs:
diff --git a/charts/gitops-reverser/templates/certificates.yaml b/charts/gitops-reverser/templates/certificates.yaml
index 235e492a..a0321c99 100644
--- a/charts/gitops-reverser/templates/certificates.yaml
+++ b/charts/gitops-reverser/templates/certificates.yaml
@@ -20,10 +20,6 @@ metadata:
   labels:
     {{- include "gitops-reverser.labels" . | nindent 4 }}
 spec:
-{{- if .Values.audit.clusterIP }}
-  ipAddresses:
-    - {{ .Values.audit.clusterIP }}
-{{- end }}
   dnsNames:
   - {{ include "gitops-reverser.fullname" . }}-leader-only.{{ .Release.Namespace }}.svc
   - {{ include "gitops-reverser.fullname" . }}-leader-only.{{ .Release.Namespace }}.svc.cluster.local
@@ -37,4 +33,32 @@ spec:
   - server auth
   privateKey:
     rotationPolicy: Always
+{{- if .Values.auditIngress.enabled }}
+---
+apiVersion: cert-manager.io/v1
+kind: Certificate
+metadata:
+  name: {{ include "gitops-reverser.fullname" . }}-audit-serving-cert
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "gitops-reverser.labels" . | nindent 4 }}
+spec:
+{{- if .Values.auditIngress.clusterIP }}
+  ipAddresses:
+    - {{ .Values.auditIngress.clusterIP }}
+{{- end }}
+  dnsNames:
+  - {{ include "gitops-reverser.fullname" . }}-audit.{{ .Release.Namespace }}.svc
+  - {{ include "gitops-reverser.fullname" . 
}}-audit.{{ .Release.Namespace }}.svc.cluster.local + issuerRef: + kind: {{ .Values.certificates.certManager.issuer.kind }} + name: {{ .Values.certificates.certManager.issuer.name }} + secretName: {{ .Values.auditIngress.tls.secretName | default (printf "%s-audit-server-tls-cert" (include "gitops-reverser.fullname" .)) }} + usages: + - digital signature + - key encipherment + - server auth + privateKey: + rotationPolicy: Always +{{- end }} {{- end }} diff --git a/charts/gitops-reverser/templates/deployment.yaml b/charts/gitops-reverser/templates/deployment.yaml index 5adf3cf1..df8ec3ad 100644 --- a/charts/gitops-reverser/templates/deployment.yaml +++ b/charts/gitops-reverser/templates/deployment.yaml @@ -48,6 +48,18 @@ spec: - --webhook-cert-path={{ .Values.webhook.server.certPath }} - --webhook-cert-name={{ .Values.webhook.server.certName }} - --webhook-cert-key={{ .Values.webhook.server.certKey }} + - --audit-ingress-enabled={{ .Values.auditIngress.enabled }} + - --audit-listen-address=0.0.0.0 + - --audit-port={{ .Values.auditIngress.port }} + - --audit-max-request-body-bytes={{ .Values.auditIngress.maxRequestBodyBytes }} + - --audit-read-timeout={{ .Values.auditIngress.timeouts.read }} + - --audit-write-timeout={{ .Values.auditIngress.timeouts.write }} + - --audit-idle-timeout={{ .Values.auditIngress.timeouts.idle }} + {{- if .Values.auditIngress.enabled }} + - --audit-cert-path={{ .Values.auditIngress.tls.certPath }} + - --audit-cert-name={{ .Values.auditIngress.tls.certName }} + - --audit-cert-key={{ .Values.auditIngress.tls.certKey }} + {{- end }} {{- if .Values.logging.level }} - --zap-log-level={{ .Values.logging.level }} {{- end }} @@ -67,6 +79,11 @@ spec: - name: webhook-server containerPort: {{ .Values.webhook.server.port }} protocol: TCP + {{- if .Values.auditIngress.enabled }} + - name: audit-server + containerPort: {{ .Values.auditIngress.port }} + protocol: TCP + {{- end }} - name: metrics containerPort: {{ .Values.controllerManager.metrics.port }} 
protocol: TCP @@ -106,6 +123,11 @@ spec: - name: cert mountPath: {{ .Values.webhook.server.certPath }} readOnly: true + {{- if .Values.auditIngress.enabled }} + - name: audit-cert + mountPath: {{ .Values.auditIngress.tls.certPath }} + readOnly: true + {{- end }} {{- with .Values.volumeMounts }} {{- toYaml . | nindent 12 }} {{- end }} @@ -120,6 +142,12 @@ spec: secret: secretName: {{ include "gitops-reverser.fullname" . }}-webhook-server-tls-cert defaultMode: 420 + {{- if .Values.auditIngress.enabled }} + - name: audit-cert + secret: + secretName: {{ .Values.auditIngress.tls.secretName | default (printf "%s-audit-server-tls-cert" (include "gitops-reverser.fullname" .)) }} + defaultMode: 420 + {{- end }} {{- with .Values.volumes }} {{- toYaml . | nindent 8 }} {{- end }} diff --git a/charts/gitops-reverser/templates/services.yaml b/charts/gitops-reverser/templates/services.yaml index 76a6ab1b..1d6d204d 100644 --- a/charts/gitops-reverser/templates/services.yaml +++ b/charts/gitops-reverser/templates/services.yaml @@ -10,9 +10,6 @@ metadata: app.kubernetes.io/component: leader-only spec: type: ClusterIP - {{- if .Values.audit.clusterIP }} - clusterIP: {{ .Values.audit.clusterIP }} - {{- end }} ports: - name: webhook-server port: 443 @@ -22,6 +19,30 @@ spec: {{- include "gitops-reverser.selectorLabels" . | nindent 4 }} role: leader # Pods get this label from within the operator source code: the Kube API lease mechanism is used to always have one active leader. --- +{{- if .Values.auditIngress.enabled }} +apiVersion: v1 +kind: Service +metadata: + name: {{ include "gitops-reverser.fullname" . }}-audit + namespace: {{ .Release.Namespace }} + labels: + {{- include "gitops-reverser.labels" . 
| nindent 4 }} + app.kubernetes.io/component: audit-ingress +spec: + type: ClusterIP + {{- if .Values.auditIngress.clusterIP }} + clusterIP: {{ .Values.auditIngress.clusterIP }} + {{- end }} + ports: + - name: audit-server + port: 443 + targetPort: {{ .Values.auditIngress.port }} + protocol: TCP + selector: + {{- include "gitops-reverser.selectorLabels" . | nindent 4 }} + role: leader +--- +{{- end }} apiVersion: v1 kind: Service metadata: diff --git a/charts/gitops-reverser/values.yaml b/charts/gitops-reverser/values.yaml index ef42416d..a21615ce 100644 --- a/charts/gitops-reverser/values.yaml +++ b/charts/gitops-reverser/values.yaml @@ -110,9 +110,23 @@ certificates: kind: Issuer create: true -# Settings for receiving audit events from the kubernetes api (the best way to run this) -audit: - clusterIP: 10.43.200.200 # Make sure that it's free, most clusters won't have a problem with this (it's in the default range and it's a high number) +# Dedicated audit ingress server configuration +auditIngress: + enabled: true + # Dedicated audit HTTPS listener port in the controller container + port: 9444 + # Optional fixed ClusterIP (useful for Kind/bootstrap environments before DNS is ready) + clusterIP: "" + tls: + certPath: "/tmp/k8s-audit-webhook-server/serving-certs" + certName: "tls.crt" + certKey: "tls.key" + secretName: "" + timeouts: + read: "15s" + write: "30s" + idle: "60s" + maxRequestBodyBytes: 10485760 # RBAC configuration rbac: diff --git a/cmd/main.go b/cmd/main.go index 3892e0c3..71dce98f 100644 --- a/cmd/main.go +++ b/cmd/main.go @@ -20,9 +20,15 @@ package main import ( "context" "crypto/tls" + "errors" "flag" + "fmt" + "net" + "net/http" "os" "path/filepath" + "strconv" + "strings" "time" // Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.) @@ -62,8 +68,15 @@ var ( const ( // Correlation store configuration. 
- correlationMaxEntries = 10000 - correlationTTL = 5 * time.Minute + correlationMaxEntries = 10000 + correlationTTL = 5 * time.Minute + flagParseFailureExitCode = 2 + defaultAuditPort = 9444 + defaultAuditMaxBodyBytes = int64(10 * 1024 * 1024) + defaultAuditReadTimeout = 15 * time.Second + defaultAuditWriteTimeout = 30 * time.Second + defaultAuditIdleTimeout = 60 * time.Second + defaultAuditShutdownTimeout = 10 * time.Second ) func init() { @@ -192,17 +205,30 @@ func main() { // Register experimental audit webhook for metrics collection auditHandler, err := webhookhandler.NewAuditHandler(webhookhandler.AuditHandlerConfig{ - DumpDir: cfg.auditDumpPath, + DumpDir: cfg.auditDumpPath, + MaxRequestBodyBytes: cfg.auditMaxRequestBodyBytes, }) fatalIfErr(err, "unable to create audit handler") - mgr.GetWebhookServer().Register("/audit-webhook", auditHandler) - if cfg.auditDumpPath != "" { - setupLog.Info("Experimental audit webhook handler registered with file dumping", - "http-path", "/audit-webhook", - "dump-path", cfg.auditDumpPath) + + var auditCertWatcher *certwatcher.CertWatcher + if cfg.auditIngressEnabled { + auditRunnable, watcher, initErr := initAuditServerRunnable(cfg, tlsOpts, auditHandler) + fatalIfErr(initErr, "unable to initialize audit ingress server") + auditCertWatcher = watcher + fatalIfErr(mgr.Add(auditRunnable), "unable to add audit ingress server runnable") + + if cfg.auditDumpPath != "" { + setupLog.Info("Audit ingress server configured with file dumping", + "http-path", "/audit-webhook/{clusterID}", + "dump-path", cfg.auditDumpPath, + "address", buildAuditServerAddress(cfg.auditListenAddress, cfg.auditPort)) + } else { + setupLog.Info("Audit ingress server configured", + "http-path", "/audit-webhook/{clusterID}", + "address", buildAuditServerAddress(cfg.auditListenAddress, cfg.auditPort)) + } } else { - setupLog.Info("Experimental audit webhook handler registered (file dumping disabled)", - "http-path", "/audit-webhook") + setupLog.Info("Audit 
ingress server disabled by flag", "flag", "--audit-ingress-enabled=false") } // NOTE: Old git.Worker has been replaced by WorkerManager + BranchWorker architecture @@ -233,7 +259,7 @@ func main() { // +kubebuilder:scaffold:builder // Cert watchers - addCertWatchersToManager(mgr, metricsCertWatcher, webhookCertWatcher) + addCertWatchersToManager(mgr, metricsCertWatcher, webhookCertWatcher, auditCertWatcher) // Health checks addHealthChecks(mgr) @@ -245,64 +271,129 @@ func main() { // appConfig holds parsed CLI flags and logging options. type appConfig struct { - metricsAddr string - metricsCertPath string - metricsCertName string - metricsCertKey string - webhookCertPath string - webhookCertName string - webhookCertKey string - enableLeaderElection bool - probeAddr string - secureMetrics bool - enableHTTP2 bool - auditDumpPath string - zapOpts zap.Options + metricsAddr string + metricsCertPath string + metricsCertName string + metricsCertKey string + webhookCertPath string + webhookCertName string + webhookCertKey string + enableLeaderElection bool + probeAddr string + secureMetrics bool + enableHTTP2 bool + auditDumpPath string + auditIngressEnabled bool + auditListenAddress string + auditPort int + auditCertPath string + auditCertName string + auditCertKey string + auditMaxRequestBodyBytes int64 + auditReadTimeout time.Duration + auditWriteTimeout time.Duration + auditIdleTimeout time.Duration + zapOpts zap.Options } // parseFlags parses CLI flags and returns the application configuration. func parseFlags() appConfig { + cfg, err := parseFlagsWithArgs(flag.CommandLine, os.Args[1:]) + if err != nil { + setupLog.Error(err, "unable to parse flags") + os.Exit(flagParseFailureExitCode) + } + return cfg +} + +func parseFlagsWithArgs(fs *flag.FlagSet, args []string) (appConfig, error) { var cfg appConfig - flag.StringVar(&cfg.metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. 
"+ + fs.StringVar(&cfg.metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+ "Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.") - flag.StringVar(&cfg.probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.") - flag.BoolVar(&cfg.enableLeaderElection, "leader-elect", false, + fs.StringVar(&cfg.probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.") + fs.BoolVar(&cfg.enableLeaderElection, "leader-elect", false, "Enable leader election for controller manager. "+ "Enabling this will ensure there is only one active controller manager.") - flag.BoolVar(&cfg.secureMetrics, "metrics-secure", true, + fs.BoolVar(&cfg.secureMetrics, "metrics-secure", true, "If set, the metrics endpoint is served securely via HTTPS. Use --metrics-secure=false to use HTTP instead.") - flag.StringVar( + fs.StringVar( &cfg.webhookCertPath, "webhook-cert-path", "", "The directory that contains the webhook certificate.", ) - flag.StringVar(&cfg.webhookCertName, "webhook-cert-name", "tls.crt", "The name of the webhook certificate file.") - flag.StringVar(&cfg.webhookCertKey, "webhook-cert-key", "tls.key", "The name of the webhook key file.") - flag.StringVar(&cfg.metricsCertPath, "metrics-cert-path", "", + fs.StringVar(&cfg.webhookCertName, "webhook-cert-name", "tls.crt", "The name of the webhook certificate file.") + fs.StringVar(&cfg.webhookCertKey, "webhook-cert-key", "tls.key", "The name of the webhook key file.") + fs.StringVar(&cfg.metricsCertPath, "metrics-cert-path", "", "The directory that contains the metrics server certificate.") - flag.StringVar( + fs.StringVar( &cfg.metricsCertName, "metrics-cert-name", "tls.crt", "The name of the metrics server certificate file.", ) - flag.StringVar(&cfg.metricsCertKey, "metrics-cert-key", "tls.key", "The name of the metrics server key file.") - flag.BoolVar(&cfg.enableHTTP2, "enable-http2", false, + 
fs.StringVar(&cfg.metricsCertKey, "metrics-cert-key", "tls.key", "The name of the metrics server key file.") + fs.BoolVar(&cfg.enableHTTP2, "enable-http2", false, "If set, HTTP/2 will be enabled for the metrics and webhook servers") - flag.StringVar(&cfg.auditDumpPath, "audit-dump-path", "", + fs.StringVar(&cfg.auditDumpPath, "audit-dump-path", "", "Directory to write audit events for debugging. If empty, audit event file dumping is disabled.") + fs.BoolVar(&cfg.auditIngressEnabled, "audit-ingress-enabled", true, + "Enable the dedicated HTTPS audit ingress server.") + fs.StringVar(&cfg.auditListenAddress, "audit-listen-address", "0.0.0.0", + "IP address for the dedicated audit ingress HTTPS server.") + fs.IntVar(&cfg.auditPort, "audit-port", defaultAuditPort, "Port for the dedicated audit ingress HTTPS server.") + fs.StringVar(&cfg.auditCertPath, "audit-cert-path", "", + "The directory that contains the audit ingress TLS certificate and key.") + fs.StringVar(&cfg.auditCertName, "audit-cert-name", "tls.crt", + "The name of the audit ingress TLS certificate file.") + fs.StringVar(&cfg.auditCertKey, "audit-cert-key", "tls.key", + "The name of the audit ingress TLS key file.") + fs.Int64Var(&cfg.auditMaxRequestBodyBytes, "audit-max-request-body-bytes", defaultAuditMaxBodyBytes, + "Maximum request body size in bytes accepted by the audit ingress handler.") + fs.DurationVar(&cfg.auditReadTimeout, "audit-read-timeout", defaultAuditReadTimeout, + "Read timeout for the dedicated audit ingress HTTPS server.") + fs.DurationVar(&cfg.auditWriteTimeout, "audit-write-timeout", defaultAuditWriteTimeout, + "Write timeout for the dedicated audit ingress HTTPS server.") + fs.DurationVar(&cfg.auditIdleTimeout, "audit-idle-timeout", defaultAuditIdleTimeout, + "Idle timeout for the dedicated audit ingress HTTPS server.") cfg.zapOpts = zap.Options{ Development: true, // Enable more detailed logging for debugging Level: zapcore.InfoLevel, // Change to DebugLevel for even more verbose 
output } - cfg.zapOpts.BindFlags(flag.CommandLine) + cfg.zapOpts.BindFlags(fs) - flag.Parse() - return cfg + if err := fs.Parse(args); err != nil { + return appConfig{}, err + } + if cfg.auditPort <= 0 { + return appConfig{}, fmt.Errorf("audit-port must be > 0, got %d", cfg.auditPort) + } + if cfg.auditMaxRequestBodyBytes <= 0 { + return appConfig{}, fmt.Errorf("audit-max-request-body-bytes must be > 0, got %d", cfg.auditMaxRequestBodyBytes) + } + if cfg.auditReadTimeout <= 0 { + return appConfig{}, fmt.Errorf("audit-read-timeout must be > 0, got %s", cfg.auditReadTimeout) + } + if cfg.auditWriteTimeout <= 0 { + return appConfig{}, fmt.Errorf("audit-write-timeout must be > 0, got %s", cfg.auditWriteTimeout) + } + if cfg.auditIdleTimeout <= 0 { + return appConfig{}, fmt.Errorf("audit-idle-timeout must be > 0, got %s", cfg.auditIdleTimeout) + } + if cfg.auditCertPath == "" { + cfg.auditCertPath = cfg.webhookCertPath + } + if cfg.auditCertName == "" { + cfg.auditCertName = cfg.webhookCertName + } + if cfg.auditCertKey == "" { + cfg.auditCertKey = cfg.webhookCertKey + } + + return cfg, nil } // fatalIfErr logs and exits the process if err is not nil. 
@@ -403,6 +494,88 @@
 	return opts, metricsCertWatcher
 }
 
+type auditServerRunnable struct {
+	server *http.Server
+}
+
+func (r *auditServerRunnable) Start(ctx context.Context) error {
+	setupLog.Info("Starting dedicated audit ingress server", "address", r.server.Addr)
+
+	shutdownDone := make(chan struct{})
+	go func() {
+		defer close(shutdownDone)
+		<-ctx.Done()
+		shutdownCtx, cancel := context.WithTimeout(context.Background(), defaultAuditShutdownTimeout)
+		defer cancel()
+		if err := r.server.Shutdown(shutdownCtx); err != nil {
+			setupLog.Error(err, "Failed to shutdown dedicated audit ingress server")
+		}
+	}()
+
+	err := r.server.ListenAndServeTLS("", "")
+	if errors.Is(err, http.ErrServerClosed) {
+		// Graceful stop: wait for the shutdown goroutine to finish.
+		<-shutdownDone
+		return nil
+	}
+	// Startup or serve failure: return immediately so the manager sees the
+	// error, instead of blocking on shutdownDone until ctx is cancelled.
+	return fmt.Errorf("audit ingress server failed: %w", err)
+}
+
+func initAuditServerRunnable(
+	cfg appConfig,
+	baseTLS []func(*tls.Config),
+	handler http.Handler,
+) (*auditServerRunnable, *certwatcher.CertWatcher, error) {
+	if strings.TrimSpace(cfg.auditCertPath) == "" {
+		return nil, nil, errors.New("audit-cert-path is required when audit ingress is enabled")
+	}
+
+	certWatcher, err := certwatcher.New(
+		filepath.Join(cfg.auditCertPath, cfg.auditCertName),
+		filepath.Join(cfg.auditCertPath, cfg.auditCertKey),
+	)
+	if err != nil {
+		return nil, nil, fmt.Errorf("failed to initialize audit ingress certificate watcher: %w", err)
+	}
+
+	tlsOpts := append([]func(*tls.Config){}, baseTLS...) 
+ tlsOpts = append(tlsOpts, func(config *tls.Config) { + config.GetCertificate = certWatcher.GetCertificate + }) + + serverTLS := &tls.Config{ + MinVersion: tls.VersionTLS12, + } + for _, opt := range tlsOpts { + opt(serverTLS) + } + + mux := buildAuditServeMux(handler) + server := &http.Server{ + Addr: buildAuditServerAddress(cfg.auditListenAddress, cfg.auditPort), + Handler: mux, + TLSConfig: serverTLS, + ReadTimeout: cfg.auditReadTimeout, + WriteTimeout: cfg.auditWriteTimeout, + IdleTimeout: cfg.auditIdleTimeout, + } + + return &auditServerRunnable{server: server}, certWatcher, nil +} + +func buildAuditServeMux(handler http.Handler) *http.ServeMux { + mux := http.NewServeMux() + mux.Handle("/audit-webhook", handler) + mux.Handle("/audit-webhook/", handler) + return mux +} + +func buildAuditServerAddress(listenAddress string, port int) string { + if strings.TrimSpace(listenAddress) == "" { + return fmt.Sprintf(":%d", port) + } + return net.JoinHostPort(listenAddress, strconv.Itoa(port)) +} + // newManager creates a new controller-runtime Manager with common options. func newManager( metricsOptions metricsserver.Options, @@ -450,7 +623,10 @@ func addLeaderPodLabeler(mgr ctrl.Manager, enabled bool) { } // addCertWatchersToManager attaches optional certificate watchers to the manager. 
-func addCertWatchersToManager(mgr ctrl.Manager, metricsCertWatcher, webhookCertWatcher *certwatcher.CertWatcher) { +func addCertWatchersToManager( + mgr ctrl.Manager, + metricsCertWatcher, webhookCertWatcher, auditCertWatcher *certwatcher.CertWatcher, +) { if metricsCertWatcher != nil { setupLog.Info("Adding metrics certificate watcher to manager") fatalIfErr(mgr.Add(metricsCertWatcher), "unable to add metrics certificate watcher to manager") @@ -459,6 +635,10 @@ func addCertWatchersToManager(mgr ctrl.Manager, metricsCertWatcher, webhookCertW setupLog.Info("Adding webhook certificate watcher to manager") fatalIfErr(mgr.Add(webhookCertWatcher), "unable to add webhook certificate watcher to manager") } + if auditCertWatcher != nil { + setupLog.Info("Adding audit ingress certificate watcher to manager") + fatalIfErr(mgr.Add(auditCertWatcher), "unable to add audit ingress certificate watcher to manager") + } } // addHealthChecks registers health and readiness checks. diff --git a/cmd/main_audit_server_test.go b/cmd/main_audit_server_test.go new file mode 100644 index 00000000..e1a5915f --- /dev/null +++ b/cmd/main_audit_server_test.go @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: Apache-2.0 +// +// Copyright 2025 ConfigButler +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package main + +import ( + "flag" + "net/http" + "net/http/httptest" + "testing" + "time" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestParseFlagsWithArgs_Defaults(t *testing.T) { + fs := flag.NewFlagSet("test-defaults", flag.ContinueOnError) + + cfg, err := parseFlagsWithArgs(fs, []string{}) + require.NoError(t, err) + + assert.True(t, cfg.auditIngressEnabled) + assert.Equal(t, "0.0.0.0", cfg.auditListenAddress) + assert.Equal(t, 9444, cfg.auditPort) + assert.Equal(t, int64(10485760), cfg.auditMaxRequestBodyBytes) + assert.Equal(t, 15*time.Second, cfg.auditReadTimeout) + assert.Equal(t, 30*time.Second, cfg.auditWriteTimeout) + assert.Equal(t, 60*time.Second, cfg.auditIdleTimeout) +} + +func TestParseFlagsWithArgs_CustomAuditValues(t *testing.T) { + fs := flag.NewFlagSet("test-custom", flag.ContinueOnError) + args := []string{ + "--webhook-cert-path=/tmp/webhook-certs", + "--audit-listen-address=127.0.0.1", + "--audit-port=9555", + "--audit-cert-path=/tmp/audit-certs", + "--audit-cert-name=cert.pem", + "--audit-cert-key=key.pem", + "--audit-max-request-body-bytes=2048", + "--audit-read-timeout=5s", + "--audit-write-timeout=8s", + "--audit-idle-timeout=13s", + } + + cfg, err := parseFlagsWithArgs(fs, args) + require.NoError(t, err) + + assert.Equal(t, "127.0.0.1", cfg.auditListenAddress) + assert.Equal(t, 9555, cfg.auditPort) + assert.Equal(t, "/tmp/audit-certs", cfg.auditCertPath) + assert.Equal(t, "cert.pem", cfg.auditCertName) + assert.Equal(t, "key.pem", cfg.auditCertKey) + assert.Equal(t, int64(2048), cfg.auditMaxRequestBodyBytes) + assert.Equal(t, 5*time.Second, cfg.auditReadTimeout) + assert.Equal(t, 8*time.Second, cfg.auditWriteTimeout) + assert.Equal(t, 13*time.Second, cfg.auditIdleTimeout) +} + +func TestParseFlagsWithArgs_FallsBackToWebhookCertPath(t *testing.T) { + fs := flag.NewFlagSet("test-fallback", flag.ContinueOnError) + args := []string{ + "--webhook-cert-path=/tmp/webhook-certs", + } + + cfg, 
err := parseFlagsWithArgs(fs, args) + require.NoError(t, err) + assert.Equal(t, "/tmp/webhook-certs", cfg.auditCertPath) +} + +func TestParseFlagsWithArgs_InvalidAuditSettings(t *testing.T) { + tests := []struct { + name string + args []string + }{ + { + name: "invalid port", + args: []string{"--audit-port=0"}, + }, + { + name: "invalid body size", + args: []string{"--audit-max-request-body-bytes=0"}, + }, + { + name: "invalid read timeout", + args: []string{"--audit-read-timeout=0s"}, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + fs := flag.NewFlagSet("test-invalid", flag.ContinueOnError) + _, err := parseFlagsWithArgs(fs, tt.args) + require.Error(t, err) + }) + } +} + +func TestBuildAuditServeMux_RoutesAuditPaths(t *testing.T) { + handler := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusAccepted) + }) + + mux := buildAuditServeMux(handler) + + req := httptest.NewRequest(http.MethodPost, "/audit-webhook/cluster-a", nil) + w := httptest.NewRecorder() + mux.ServeHTTP(w, req) + assert.Equal(t, http.StatusAccepted, w.Code) + + req = httptest.NewRequest(http.MethodPost, "/audit-webhook", nil) + w = httptest.NewRecorder() + mux.ServeHTTP(w, req) + assert.Equal(t, http.StatusAccepted, w.Code) + + req = httptest.NewRequest(http.MethodPost, "/not-audit", nil) + w = httptest.NewRecorder() + mux.ServeHTTP(w, req) + assert.Equal(t, http.StatusNotFound, w.Code) +} + +func TestBuildAuditServerAddress(t *testing.T) { + assert.Equal(t, "0.0.0.0:9444", buildAuditServerAddress("0.0.0.0", 9444)) + assert.Equal(t, ":9444", buildAuditServerAddress("", 9444)) +} diff --git a/config/certmanager/certificate-audit-webhook.yaml b/config/certmanager/certificate-audit-webhook.yaml new file mode 100644 index 00000000..e62724ae --- /dev/null +++ b/config/certmanager/certificate-audit-webhook.yaml @@ -0,0 +1,19 @@ +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + labels: + app.kubernetes.io/name: 
gitops-reverser
+    app.kubernetes.io/managed-by: kustomize
+  name: audit-serving-cert
+  namespace: system
+spec:
+  # AUDIT_SERVICE_NAME and SERVICE_NAMESPACE are replaced in config/default/kustomization.yaml
+  dnsNames:
+    - AUDIT_SERVICE_NAME.SERVICE_NAMESPACE.svc
+    - AUDIT_SERVICE_NAME.SERVICE_NAMESPACE.svc.cluster.local
+  issuerRef:
+    kind: Issuer
+    name: selfsigned-issuer
+  secretName: audit-webhook-server-cert
+  privateKey:
+    rotationPolicy: Always
diff --git a/config/certmanager/kustomization.yaml b/config/certmanager/kustomization.yaml
index fcb7498e..c24e7097 100644
--- a/config/certmanager/kustomization.yaml
+++ b/config/certmanager/kustomization.yaml
@@ -1,6 +1,7 @@
 resources:
 - issuer.yaml
 - certificate-webhook.yaml
+- certificate-audit-webhook.yaml
 - certificate-metrics.yaml
 
 configurations:
diff --git a/config/default/audit_webhook_service_fixed_ip_patch.yaml b/config/default/audit_webhook_service_fixed_ip_patch.yaml
new file mode 100644
index 00000000..f7fe3197
--- /dev/null
+++ b/config/default/audit_webhook_service_fixed_ip_patch.yaml
@@ -0,0 +1,10 @@
+# Patch to set a fixed ClusterIP for the dedicated audit webhook service.
+# This is required because kube-apiserver starts before CoreDNS
+# and cannot rely on Service DNS resolution during bootstrap.
+apiVersion: v1
+kind: Service
+metadata:
+  name: audit-webhook-service
+  namespace: system
+spec:
+  clusterIP: 10.96.200.200
diff --git a/config/default/kustomization.yaml b/config/default/kustomization.yaml
index 7b47b5f8..a73d8138 100644
--- a/config/default/kustomization.yaml
+++ b/config/default/kustomization.yaml
@@ -41,11 +41,11 @@ patches:
   target:
     kind: Deployment
-# [AUDIT-WEBHOOK] Set fixed ClusterIP for webhook service so kube-apiserver can connect before CoreDNS is ready
-- path: webhook_service_fixed_ip_patch.yaml
+# [AUDIT-WEBHOOK] Set fixed ClusterIP for the dedicated audit webhook service so kube-apiserver can connect before CoreDNS is ready.
+- path: audit_webhook_service_fixed_ip_patch.yaml target: kind: Service - name: webhook-service + name: audit-webhook-service # Uncomment the patches line if you enable Metrics and CertManager # [METRICS-WITH-CERTS] To enable metrics protected with certManager, uncomment the following line. @@ -160,6 +160,43 @@ replacements: index: 1 create: true + - source: + kind: Service + version: v1 + name: audit-webhook-service + fieldPath: .metadata.name # Name of the dedicated audit service + targets: + - select: + kind: Certificate + group: cert-manager.io + version: v1 + name: audit-serving-cert + fieldPaths: + - .spec.dnsNames.0 + - .spec.dnsNames.1 + options: + delimiter: '.' + index: 0 + create: true + - source: + kind: Service + version: v1 + name: audit-webhook-service + fieldPath: .metadata.namespace # Namespace of the dedicated audit service + targets: + - select: + kind: Certificate + group: cert-manager.io + version: v1 + name: audit-serving-cert + fieldPaths: + - .spec.dnsNames.0 + - .spec.dnsNames.1 + options: + delimiter: '.' + index: 1 + create: true + - source: # Uncomment the following block if you have a ValidatingWebhook (--programmatic-validation) kind: Certificate group: cert-manager.io diff --git a/config/default/manager_webhook_patch.yaml b/config/default/manager_webhook_patch.yaml index 963c8a4c..eb372030 100644 --- a/config/default/manager_webhook_patch.yaml +++ b/config/default/manager_webhook_patch.yaml @@ -22,6 +22,48 @@ name: webhook-server protocol: TCP +# Add the dedicated audit ingress server arguments. 
+- op: add + path: /spec/template/spec/containers/0/args/- + value: --audit-ingress-enabled=true +- op: add + path: /spec/template/spec/containers/0/args/- + value: --audit-listen-address=0.0.0.0 +- op: add + path: /spec/template/spec/containers/0/args/- + value: --audit-port=9444 +- op: add + path: /spec/template/spec/containers/0/args/- + value: --audit-cert-path=/tmp/k8s-audit-webhook-server/serving-certs +- op: add + path: /spec/template/spec/containers/0/args/- + value: --audit-max-request-body-bytes=10485760 +- op: add + path: /spec/template/spec/containers/0/args/- + value: --audit-read-timeout=15s +- op: add + path: /spec/template/spec/containers/0/args/- + value: --audit-write-timeout=30s +- op: add + path: /spec/template/spec/containers/0/args/- + value: --audit-idle-timeout=60s + +# Add the volumeMount for dedicated audit webhook certificates. +- op: add + path: /spec/template/spec/containers/0/volumeMounts/- + value: + mountPath: /tmp/k8s-audit-webhook-server/serving-certs + name: audit-webhook-certs + readOnly: true + +# Add dedicated audit ingress container port. +- op: add + path: /spec/template/spec/containers/0/ports/- + value: + containerPort: 9444 + name: audit-server + protocol: TCP + # Add the volume configuration for the webhook certificates - op: add path: /spec/template/spec/volumes/- @@ -29,3 +71,11 @@ name: webhook-certs secret: secretName: webhook-server-cert + +# Add the volume configuration for the dedicated audit ingress certificates. 
+- op: add + path: /spec/template/spec/volumes/- + value: + name: audit-webhook-certs + secret: + secretName: audit-webhook-server-cert diff --git a/config/webhook/audit-service.yaml b/config/webhook/audit-service.yaml new file mode 100644 index 00000000..e984a91d --- /dev/null +++ b/config/webhook/audit-service.yaml @@ -0,0 +1,17 @@ +apiVersion: v1 +kind: Service +metadata: + labels: + app.kubernetes.io/name: gitops-reverser + app.kubernetes.io/managed-by: kustomize + name: audit-webhook-service + namespace: system +spec: + ports: + - port: 443 + protocol: TCP + targetPort: 9444 + selector: + control-plane: controller-manager + app.kubernetes.io/name: gitops-reverser + role: leader diff --git a/config/webhook/kustomization.yaml b/config/webhook/kustomization.yaml index 9cf26134..992a7085 100644 --- a/config/webhook/kustomization.yaml +++ b/config/webhook/kustomization.yaml @@ -1,6 +1,7 @@ resources: - manifests.yaml - service.yaml +- audit-service.yaml configurations: - kustomizeconfig.yaml diff --git a/docs/audit-setup/cluster/audit/webhook-config.yaml b/docs/audit-setup/cluster/audit/webhook-config.yaml index 413e3bd4..1c25c18b 100644 --- a/docs/audit-setup/cluster/audit/webhook-config.yaml +++ b/docs/audit-setup/cluster/audit/webhook-config.yaml @@ -6,7 +6,7 @@ clusters: - name: audit-webhook cluster: # Use the ClusterIP, but with HTTPS - server: https://10.43.200.200:443/audit-webhook + server: https://10.43.200.200:443/audit-webhook/my-cluster # We could also configure an network wide ingress with it's own cert, but it's better if it has a small surface insecure-skip-tls-verify: true # base64content = kubectl get secret -n -o jsonpath='{.data.ca\.crt}' diff --git a/internal/webhook/audit_handler.go b/internal/webhook/audit_handler.go index 436f8647..fd12aadb 100644 --- a/internal/webhook/audit_handler.go +++ b/internal/webhook/audit_handler.go @@ -27,6 +27,7 @@ import ( "os" "path/filepath" "strconv" + "strings" "go.opentelemetry.io/otel/attribute" 
"go.opentelemetry.io/otel/metric" @@ -45,6 +46,12 @@ import ( const ( // DefaultAuditDumpDir is the default directory for audit event dumps. DefaultAuditDumpDir = "/tmp/audit-events" + // DefaultAuditMaxRequestBodyBytes limits incoming audit payload size. + DefaultAuditMaxRequestBodyBytes = int64(10 * 1024 * 1024) + // MaxClusterIDMetricLabelLength constrains label cardinality impact. + MaxClusterIDMetricLabelLength = 63 + // UnknownClusterIDMetricValue is used when cluster ID cannot be labeled safely. + UnknownClusterIDMetricValue = "unknown" ) // AuditHandlerConfig contains configuration for the audit handler. @@ -52,6 +59,8 @@ type AuditHandlerConfig struct { // DumpDir is the directory where audit events are written for debugging. // If empty, defaults to DefaultAuditDumpDir. DumpDir string + // MaxRequestBodyBytes is the maximum accepted HTTP request body size. + MaxRequestBodyBytes int64 } // AuditHandler handles incoming audit events and collects metrics. @@ -64,6 +73,10 @@ type AuditHandler struct { // NewAuditHandler creates a new audit handler with the given configuration. // If config.DumpDir is empty, file dumping is disabled. func NewAuditHandler(config AuditHandlerConfig) (*AuditHandler, error) { + if config.MaxRequestBodyBytes <= 0 { + config.MaxRequestBodyBytes = DefaultAuditMaxRequestBodyBytes + } + scheme := runtime.NewScheme() if err := audit.AddToScheme(scheme); err != nil { return nil, fmt.Errorf("failed to initialize scheme: %w", err) @@ -85,50 +98,67 @@ func NewAuditHandler(config AuditHandlerConfig) (*AuditHandler, error) { // ServeHTTP implements http.Handler for audit event processing. 
func (h *AuditHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { ctx := r.Context() - log := logf.FromContext(ctx) + log := logf.Log.WithName("audit-handler") if r.Method != http.MethodPost { http.Error(w, "Method not allowed", http.StatusMethodNotAllowed) return } + clusterID, err := extractClusterID(r.URL.Path) + if err != nil { + http.Error(w, err.Error(), http.StatusBadRequest) + return + } + + reqLog := log.WithValues( + "clusterID", clusterID, + "remoteAddr", r.RemoteAddr, + "path", r.URL.Path, + ) + eventListV1, err := h.decodeEventList(r) if err != nil { - log.Error(err, "Failed to decode audit event list") + reqLog.Error(err, "Failed to decode audit event list") http.Error(w, err.Error(), http.StatusBadRequest) return } if len(eventListV1.Items) == 0 { - log.Info("Received empty audit event list") + reqLog.Info("Received empty audit event list", "eventCount", 0, "processingOutcome", "empty") w.WriteHeader(http.StatusOK) _, err = w.Write([]byte("Empty event list processed")) if err != nil { - log.Error(err, "Failed to write response") + reqLog.Error(err, "Failed to write response") } return } - if err := h.processEvents(ctx, eventListV1.Items); err != nil { - log.Error(err, "Failed to process audit events") + if err := h.processEvents(ctx, clusterID, eventListV1.Items); err != nil { + reqLog.Error(err, "Failed to process audit events") http.Error(w, err.Error(), http.StatusInternalServerError) return } + reqLog.Info("Processed audit request", "eventCount", len(eventListV1.Items), "processingOutcome", "success") w.WriteHeader(http.StatusOK) _, err = w.Write([]byte("Audit event processed")) if err != nil { - log.Error(err, "Failed to write response") + reqLog.Error(err, "Failed to write response") } } // decodeEventList reads and decodes the audit event list from the request. 
func (h *AuditHandler) decodeEventList(r *http.Request) (*auditv1.EventList, error) { - body, err := io.ReadAll(r.Body) + limited := io.LimitReader(r.Body, h.config.MaxRequestBodyBytes+1) + body, err := io.ReadAll(limited) if err != nil { return nil, fmt.Errorf("failed to read body: %w", err) } defer r.Body.Close() + if int64(len(body)) > h.config.MaxRequestBodyBytes { + return nil, fmt.Errorf("request body too large: max %d bytes", h.config.MaxRequestBodyBytes) + } var eventListV1 auditv1.EventList _, _, err = h.deserializer.Decode(body, nil, &eventListV1) @@ -140,8 +170,9 @@ func (h *AuditHandler) decodeEventList(r *http.Request) (*auditv1.EventList, err } // processEvents processes a list of audit events. -func (h *AuditHandler) processEvents(ctx context.Context, events []auditv1.Event) error { - log := logf.FromContext(ctx) +func (h *AuditHandler) processEvents(ctx context.Context, clusterID string, events []auditv1.Event) error { + log := logf.Log.WithName("audit-handler") + clusterIDMetric := sanitizeClusterIDForMetric(clusterID) for _, auditEventV1 := range events { var auditEvent audit.Event @@ -161,6 +192,8 @@ func (h *AuditHandler) processEvents(ctx context.Context, events []auditv1.Event if auditEvent.ImpersonatedUser != nil { log.Info( "Audit event impersonated", + "clusterID", + clusterID, "authUser", auditEvent.User.Username, "impersonatedUser", @@ -170,6 +203,7 @@ func (h *AuditHandler) processEvents(ctx context.Context, events []auditv1.Event } metrics.AuditEventsReceivedTotal.Add(ctx, 1, metric.WithAttributes( + attribute.String("cluster_id", clusterIDMetric), attribute.String("gvr", gvr), attribute.String("action", action), attribute.String("user", user), @@ -179,6 +213,7 @@ func (h *AuditHandler) processEvents(ctx context.Context, events []auditv1.Event if process { // For now we hardly do a thing log.Info("Processed audit event", + "clusterID", clusterID, "gvr", gvr, "action", action, "auditID", auditEvent.AuditID, @@ -193,6 +228,61 @@ func (h 
*AuditHandler) processEvents(ctx context.Context, events []auditv1.Event return nil } +func extractClusterID(path string) (string, error) { + const auditPrefix = "/audit-webhook/" + if path == "/audit-webhook" { + return "", errors.New("missing cluster ID in path; expected /audit-webhook/{clusterID}") + } + if !strings.HasPrefix(path, auditPrefix) { + return "", errors.New("invalid path; expected /audit-webhook/{clusterID}") + } + + clusterID := strings.TrimPrefix(path, auditPrefix) + clusterID = strings.TrimSuffix(clusterID, "/") + if clusterID == "" { + return "", errors.New("missing cluster ID in path; expected /audit-webhook/{clusterID}") + } + if strings.Contains(clusterID, "/") { + return "", errors.New("invalid path; expected single segment cluster ID in /audit-webhook/{clusterID}") + } + + return clusterID, nil +} + +func sanitizeClusterIDForMetric(clusterID string) string { + clusterID = strings.TrimSpace(clusterID) + if clusterID == "" { + return UnknownClusterIDMetricValue + } + + var builder strings.Builder + builder.Grow(len(clusterID)) + for _, r := range clusterID { + if isAllowedClusterIDRune(r) { + builder.WriteRune(r) + continue + } + builder.WriteByte('_') + } + + sanitized := builder.String() + if len(sanitized) > MaxClusterIDMetricLabelLength { + sanitized = sanitized[:MaxClusterIDMetricLabelLength] + } + if sanitized == "" { + return UnknownClusterIDMetricValue + } + + return sanitized +} + +func isAllowedClusterIDRune(r rune) bool { + return (r >= 'a' && r <= 'z') || + (r >= 'A' && r <= 'Z') || + (r >= '0' && r <= '9') || + r == '-' || r == '_' || r == '.' +} + // extractGVR constructs the Group/Version/Resource string from the audit event // using k8s.io/apimachinery/pkg/runtime/schema utilities. 
func (h *AuditHandler) extractGVR(event *audit.Event) string { diff --git a/internal/webhook/audit_handler_test.go b/internal/webhook/audit_handler_test.go index c51900bc..40a2c20a 100644 --- a/internal/webhook/audit_handler_test.go +++ b/internal/webhook/audit_handler_test.go @@ -25,6 +25,7 @@ import ( "net/http" "net/http/httptest" "os" + "strings" "testing" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" @@ -51,39 +52,66 @@ func TestAuditHandler_ServeHTTP(t *testing.T) { tests := []struct { name string method string + path string body string expectedStatus int }{ { name: "valid audit event - create configmap", method: http.MethodPost, + path: "/audit-webhook/cluster-a", body: `{"kind":"EventList","apiVersion":"audit.k8s.io/v1","items":[{"kind":"Event","level":"RequestResponse","auditID":"test-id","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/default/configmaps","verb":"create","user":{"username":"test-user"},"objectRef":{"resource":"configmaps","namespace":"default","name":"test-config","apiVersion":"v1"},"responseStatus":{"code":200}}]}`, expectedStatus: http.StatusOK, }, { name: "valid audit event - update deployment", method: http.MethodPost, + path: "/audit-webhook/cluster-a", body: `{"kind":"EventList","apiVersion":"audit.k8s.io/v1","items":[{"kind":"Event","level":"RequestResponse","auditID":"test-id","stage":"ResponseComplete","requestURI":"/apis/apps/v1/namespaces/default/deployments/test-deploy","verb":"update","user":{"username":"test-user"},"objectRef":{"resource":"deployments","namespace":"default","name":"test-deploy","apiVersion":"apps/v1"},"responseStatus":{"code":200}}]}`, expectedStatus: http.StatusOK, }, { name: "multiple events in batch", method: http.MethodPost, + path: "/audit-webhook/cluster-a", body: 
`{"kind":"EventList","apiVersion":"audit.k8s.io/v1","items":[{"kind":"Event","auditID":"batch-event-1","verb":"create","user":{"username":"test-user"},"objectRef":{"resource":"configmaps","apiVersion":"v1"}},{"kind":"Event","auditID":"batch-event-2","verb":"update","user":{"username":"test-user"},"objectRef":{"resource":"deployments","apiVersion":"apps/v1"}}]}`, expectedStatus: http.StatusOK, }, + { + name: "newly seen cluster ID is accepted", + method: http.MethodPost, + path: "/audit-webhook/new-cluster-42", + body: `{"kind":"EventList","apiVersion":"audit.k8s.io/v1","items":[{"kind":"Event","auditID":"new-cluster-test","verb":"create","user":{"username":"test-user"},"objectRef":{"resource":"configmaps","apiVersion":"v1"}}]}`, + expectedStatus: http.StatusOK, + }, { name: "invalid method", method: http.MethodGet, + path: "/audit-webhook/cluster-a", body: `{"kind":"EventList","apiVersion":"audit.k8s.io/v1","items":[{"kind":"Event","auditID":"invalid-method-test","verb":"create","user":{"username":"test-user"},"objectRef":{"resource":"configmaps","apiVersion":"v1"}}]}`, expectedStatus: http.StatusMethodNotAllowed, }, { name: "invalid JSON", method: http.MethodPost, + path: "/audit-webhook/cluster-a", body: "invalid json", expectedStatus: http.StatusBadRequest, }, + { + name: "missing cluster ID path", + method: http.MethodPost, + path: "/audit-webhook", + body: `{"kind":"EventList","apiVersion":"audit.k8s.io/v1","items":[{"kind":"Event","auditID":"missing-cluster","verb":"create","user":{"username":"test-user"},"objectRef":{"resource":"configmaps","apiVersion":"v1"}}]}`, + expectedStatus: http.StatusBadRequest, + }, + { + name: "extra path segments are rejected", + method: http.MethodPost, + path: "/audit-webhook/cluster-a/extra", + body: `{"kind":"EventList","apiVersion":"audit.k8s.io/v1","items":[{"kind":"Event","auditID":"missing-cluster","verb":"create","user":{"username":"test-user"},"objectRef":{"resource":"configmaps","apiVersion":"v1"}}]}`, + 
expectedStatus: http.StatusBadRequest, + }, } for _, tt := range tests { @@ -94,7 +122,7 @@ func TestAuditHandler_ServeHTTP(t *testing.T) { require.NoError(t, err) // Create request - req := httptest.NewRequest(tt.method, "/audit-webhook", bytes.NewReader([]byte(tt.body))) + req := httptest.NewRequest(tt.method, tt.path, bytes.NewReader([]byte(tt.body))) w := httptest.NewRecorder() // Call handler @@ -162,7 +190,7 @@ func TestAuditHandler_InvalidJSON(t *testing.T) { }) require.NoError(t, err) - req := httptest.NewRequest(http.MethodPost, "/audit-webhook", bytes.NewReader([]byte("invalid json"))) + req := httptest.NewRequest(http.MethodPost, "/audit-webhook/cluster-a", bytes.NewReader([]byte("invalid json"))) w := httptest.NewRecorder() handler.ServeHTTP(w, req) @@ -201,7 +229,7 @@ func TestAuditHandler_FileDump(t *testing.T) { body, err := json.Marshal(eventList) require.NoError(t, err) - req := httptest.NewRequest(http.MethodPost, "/audit-webhook", bytes.NewReader(body)) + req := httptest.NewRequest(http.MethodPost, "/audit-webhook/cluster-a", bytes.NewReader(body)) w := httptest.NewRecorder() // Call handler @@ -266,7 +294,7 @@ func TestAuditHandler_FileDump(t *testing.T) { eventJSON, err := json.Marshal(eventList) require.NoError(t, err) - req := httptest.NewRequest(http.MethodPost, "/audit-webhook", bytes.NewReader(eventJSON)) + req := httptest.NewRequest(http.MethodPost, "/audit-webhook/cluster-a", bytes.NewReader(eventJSON)) w := httptest.NewRecorder() // Call handler @@ -373,3 +401,76 @@ func TestAuditHandler_ReadYAMLToJSON(t *testing.T) { // Log the JSON for verification t.Logf("Converted JSON: %s", jsonString) } + +func TestAuditHandler_RejectsOversizedBody(t *testing.T) { + handler, err := NewAuditHandler(AuditHandlerConfig{ + DumpDir: "/tmp/audit-events", + MaxRequestBodyBytes: 32, + }) + require.NoError(t, err) + + oversizedBody := `{"kind":"EventList","apiVersion":"audit.k8s.io/v1","items":[]}` + req := httptest.NewRequest(http.MethodPost, 
"/audit-webhook/cluster-a", bytes.NewReader([]byte(oversizedBody))) + w := httptest.NewRecorder() + + handler.ServeHTTP(w, req) + + assert.Equal(t, http.StatusBadRequest, w.Code) + assert.Contains(t, w.Body.String(), "request body too large") +} + +func TestExtractClusterID(t *testing.T) { + tests := []struct { + name string + path string + expectedID string + expectError bool + }{ + { + name: "valid cluster ID", + path: "/audit-webhook/cluster-a", + expectedID: "cluster-a", + }, + { + name: "valid cluster ID with trailing slash", + path: "/audit-webhook/cluster-a/", + expectedID: "cluster-a", + }, + { + name: "missing cluster ID", + path: "/audit-webhook", + expectError: true, + }, + { + name: "extra segment", + path: "/audit-webhook/cluster-a/extra", + expectError: true, + }, + { + name: "invalid prefix", + path: "/wrong/cluster-a", + expectError: true, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + clusterID, err := extractClusterID(tt.path) + if tt.expectError { + require.Error(t, err) + return + } + require.NoError(t, err) + assert.Equal(t, tt.expectedID, clusterID) + }) + } +} + +func TestSanitizeClusterIDForMetric(t *testing.T) { + assert.Equal(t, "cluster-a", sanitizeClusterIDForMetric("cluster-a")) + assert.Equal(t, "cluster_a", sanitizeClusterIDForMetric("cluster/a")) + assert.Equal(t, "unknown", sanitizeClusterIDForMetric(" ")) + + longID := strings.Repeat("a", MaxClusterIDMetricLabelLength+5) + assert.Len(t, sanitizeClusterIDForMetric(longID), MaxClusterIDMetricLabelLength) +} diff --git a/test/e2e/e2e_test.go b/test/e2e/e2e_test.go index 726578e5..445dad26 100644 --- a/test/e2e/e2e_test.go +++ b/test/e2e/e2e_test.go @@ -263,6 +263,40 @@ var _ = Describe("Manager", Ordered, func() { Eventually(verifyWebhookService, 30*time.Second).Should(Succeed()) }) + It("should expose both admission and audit leader-only services", func() { + By("verifying admission webhook service exists") + cmd := exec.Command("kubectl", "get", 
"svc", "gitops-reverser-webhook-service", "-n", namespace) + _, err := utils.Run(cmd) + Expect(err).NotTo(HaveOccurred(), "Admission webhook service should exist") + + By("verifying audit webhook service exists") + cmd = exec.Command("kubectl", "get", "svc", "gitops-reverser-audit-webhook-service", "-n", namespace) + _, err = utils.Run(cmd) + Expect(err).NotTo(HaveOccurred(), "Audit webhook service should exist") + + By("verifying audit service routes only to leader pod") + Eventually(func(g Gomega) { + endpointsCmd := exec.Command("kubectl", "get", "endpoints", + "gitops-reverser-audit-webhook-service", "-n", namespace, + "-o", "jsonpath={.subsets[*].addresses[*].targetRef.name}") + output, endpointsErr := utils.Run(endpointsCmd) + g.Expect(endpointsErr).NotTo(HaveOccurred(), "Failed to get audit service endpoints") + + lines := utils.GetNonEmptyLines(output) + var podNames []string + for _, line := range lines { + if !strings.HasPrefix(line, "Warning:") && + !strings.Contains(line, "deprecated") && + strings.Contains(line, "controller-manager") { + podNames = append(podNames, line) + } + } + + g.Expect(podNames).To(HaveLen(1), "audit service should route to exactly 1 pod (leader)") + g.Expect(podNames[0]).To(Equal(controllerPodName), "audit service should route to leader pod") + }, 30*time.Second).Should(Succeed()) + }) + It("should have webhook registration configured", func() { By("verifying webhook registration for event handler") verifyWebhook := func(g Gomega) { @@ -369,6 +403,11 @@ var _ = Describe("Manager", Ordered, func() { baselineAuditEvents, err := queryPrometheus("sum(gitopsreverser_audit_events_received_total) or vector(0)") Expect(err).NotTo(HaveOccurred()) fmt.Printf("📊 Baseline audit events: %.0f\n", baselineAuditEvents) + baselineClusterAuditEvents, err := queryPrometheus( + "sum(gitopsreverser_audit_events_received_total{cluster_id='kind-e2e'}) or vector(0)", + ) + Expect(err).NotTo(HaveOccurred()) + fmt.Printf("📊 Baseline kind-e2e audit 
events: %.0f\n", baselineClusterAuditEvents) By("creating a ConfigMap to trigger audit events") cmd := exec.Command("kubectl", "create", "configmap", "audit-test-cm", @@ -376,11 +415,22 @@ var _ = Describe("Manager", Ordered, func() { "--from-literal=test=audit") _, err = utils.Run(cmd) Expect(err).NotTo(HaveOccurred(), "ConfigMap creation should succeed") + cmd = exec.Command("kubectl", "patch", "configmap", "audit-test-cm", + "--namespace", namespace, + "--type=merge", + "--patch", `{"data":{"test":"audit-updated"}}`) + _, err = utils.Run(cmd) + Expect(err).NotTo(HaveOccurred(), "ConfigMap update should succeed") By("waiting for audit event metric to increment") - waitForMetric("sum(gitopsreverser_audit_events_received_total) or vector(0)", + waitForMetricWithTimeout("sum(gitopsreverser_audit_events_received_total) or vector(0)", func(v float64) bool { return v > baselineAuditEvents }, - "audit events should increment") + "audit events should increment", 2*time.Minute) + waitForMetricWithTimeout( + "sum(gitopsreverser_audit_events_received_total{cluster_id='kind-e2e'}) or vector(0)", + func(v float64) bool { return v > baselineClusterAuditEvents }, + "audit events should increment for cluster_id=kind-e2e", 2*time.Minute, + ) By("verifying audit events were received") currentAuditEvents, err := queryPrometheus("sum(gitopsreverser_audit_events_received_total) or vector(0)") diff --git a/test/e2e/helpers.go b/test/e2e/helpers.go index b859140e..2bbd15f2 100644 --- a/test/e2e/helpers.go +++ b/test/e2e/helpers.go @@ -42,6 +42,7 @@ import ( // namespace where the project is deployed in. const namespace = "sut" +const metricWaitDefaultTimeout = 30 * time.Second // metricsServiceName is the name of the metrics service of the project. 
const metricsServiceName = "gitops-reverser-controller-manager-metrics-service" @@ -98,13 +99,23 @@ func queryPrometheus(query string) (float64, error) { // waitForMetric waits for a Prometheus metric query to satisfy a condition func waitForMetric(query string, condition func(float64) bool, description string) { + waitForMetricWithTimeout(query, condition, description, metricWaitDefaultTimeout) +} + +// waitForMetricWithTimeout waits for a Prometheus metric query with a custom timeout. +func waitForMetricWithTimeout( + query string, + condition func(float64) bool, + description string, + timeout time.Duration, +) { By(fmt.Sprintf("waiting for metric: %s", description)) Eventually(func(g Gomega) { value, err := queryPrometheus(query) g.Expect(err).NotTo(HaveOccurred(), "Failed to query Prometheus") g.Expect(condition(value)).To(BeTrue(), fmt.Sprintf("%s (query: %s, value: %.2f)", description, query, value)) - }, 30*time.Second, 2*time.Second).Should(Succeed()) //nolint:mnd // reasonable timeout and polling interval + }, timeout, 2*time.Second).Should(Succeed()) //nolint:mnd // reasonable polling interval } // getPrometheusURL returns the URL for accessing Prometheus UI @@ -177,6 +188,15 @@ func waitForCertificateSecrets() { g.Expect(err).NotTo(HaveOccurred(), "metrics-server-cert secret should exist") }, 60*time.Second, 2*time.Second).Should(Succeed()) //nolint:mnd // reasonable timeout for cert-manager + By("waiting for dedicated audit certificate secret to be created by cert-manager") + Eventually(func(g Gomega) { + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) //nolint:mnd // reasonable timeout + defer cancel() + cmd := exec.CommandContext(ctx, "kubectl", "get", "secret", "audit-webhook-server-cert", "-n", namespace) + _, err := utils.Run(cmd) + g.Expect(err).NotTo(HaveOccurred(), "audit-webhook-server-cert secret should exist") + }, 60*time.Second, 2*time.Second).Should(Succeed()) //nolint:mnd // reasonable timeout for cert-manager + 
By("✅ All certificate secrets are ready") } diff --git a/test/e2e/kind/README.md b/test/e2e/kind/README.md index f8a728db..df684afd 100644 --- a/test/e2e/kind/README.md +++ b/test/e2e/kind/README.md @@ -4,7 +4,7 @@ This directory contains configuration files to set up a Kind cluster with Kubern ## Overview -The gitops-reverser operator exposes an experimental audit webhook endpoint at `/audit-webhook` that receives audit events from the Kubernetes API server. This setup configures Kind to send audit events to this endpoint for testing and metrics collection. +The gitops-reverser operator exposes an experimental audit webhook endpoint at `/audit-webhook/{clusterID}` that receives audit events from the Kubernetes API server. This setup configures Kind to send audit events to this endpoint for testing and metrics collection. ## Files @@ -28,7 +28,7 @@ The [`cluster.yaml`](cluster.yaml:1) mounts the audit policy and webhook configu The [`webhook-config.yaml`](audit/webhook-config.yaml:1) configures the kube-apiserver to send audit events to: ``` -https://gitops-reverser-webhook-service.gitops-reverser-system.svc.cluster.local:443/audit-webhook +https://10.96.200.200:443/audit-webhook/kind-e2e ``` **Important**: Uses `insecure-skip-tls-verify: true` because the webhook service uses a self-signed certificate from cert-manager. @@ -114,9 +114,9 @@ The audit webhook tracks metrics with labels: docker exec gitops-reverser-test-e2e-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log ``` -2. **Verify webhook service exists**: +2. **Verify audit webhook service exists**: ```bash - kubectl get svc -n gitops-reverser-system gitops-reverser-webhook-service + kubectl get svc -n gitops-reverser-system gitops-reverser-audit-webhook-service ``` 3. 
**Check if kube-apiserver can reach the webhook**: @@ -143,13 +143,13 @@ The K3s setup in [`docs/audit-setup/cluster/`](../../../docs/audit-setup/cluster The Kind setup uses: -- Service DNS: `gitops-reverser-webhook-service.gitops-reverser-system.svc.cluster.local` +- Fixed ClusterIP: `10.96.200.200` +- Path-based cluster identity: `/audit-webhook/kind-e2e` - Config location: `/etc/kubernetes/` (Kind standard) -- Automatic service discovery via DNS ## References - [Kubernetes Audit Documentation](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/) - [Kind Extra Mounts](https://kind.sigs.k8s.io/docs/user/configuration/#extra-mounts) - [Kubeadm Config Patches](https://kind.sigs.k8s.io/docs/user/configuration/#kubeadm-config-patches) -- [Audit Handler Implementation](../../../internal/webhook/audit_handler.go:1) \ No newline at end of file +- [Audit Handler Implementation](../../../internal/webhook/audit_handler.go:1) diff --git a/test/e2e/kind/audit/webhook-config.yaml b/test/e2e/kind/audit/webhook-config.yaml index f8cdf15e..489e7d94 100644 --- a/test/e2e/kind/audit/webhook-config.yaml +++ b/test/e2e/kind/audit/webhook-config.yaml @@ -8,8 +8,8 @@ clusters: cluster: # IMPORTANT: Use fixed ClusterIP instead of DNS name # kube-apiserver starts before CoreDNS, so DNS resolution fails at startup - # The ClusterIP is set in test/e2e/manifests/webhook-service-fixed-ip.yaml - server: https://10.96.200.200:443/audit-webhook + # The ClusterIP is set in config/default/audit_webhook_service_fixed_ip_patch.yaml + server: https://10.96.200.200:443/audit-webhook/kind-e2e # Skip TLS verification for testing (webhook uses self-signed cert from cert-manager) insecure-skip-tls-verify: true contexts: From db26a58232f9876e86ef4c2ec50147141b91e2e4 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Wed, 11 Feb 2026 20:13:55 +0000 Subject: [PATCH 04/32] chore: Removing the leader election stuff, for now we will run 1 pod --- charts/gitops-reverser/README.md | 55 +- 
.../templates/certificates.yaml | 4 +- .../gitops-reverser/templates/configmap.yaml | 5 +- .../gitops-reverser/templates/deployment.yaml | 5 +- charts/gitops-reverser/templates/rbac.yaml | 55 -- .../gitops-reverser/templates/services.yaml | 6 +- .../templates/validating-webhook.yaml | 2 +- charts/gitops-reverser/values.yaml | 8 - cmd/main.go | 37 +- config/manager/manager.yaml | 3 +- config/rbac/kustomization.yaml | 3 - config/rbac/leader_election_role.yaml | 40 - config/rbac/leader_election_role_binding.yaml | 15 - config/rbac/role.yaml | 10 - config/webhook/audit-service.yaml | 1 - config/webhook/service.yaml | 1 - ...https-server-alignment-and-service-plan.md | 229 ++++++ internal/leader/leader.go | 137 ---- internal/leader/leader_test.go | 688 ------------------ test/e2e/e2e_test.go | 81 +-- 20 files changed, 274 insertions(+), 1111 deletions(-) delete mode 100644 config/rbac/leader_election_role.yaml delete mode 100644 config/rbac/leader_election_role_binding.yaml create mode 100644 docs/design/https-server-alignment-and-service-plan.md delete mode 100644 internal/leader/leader.go delete mode 100644 internal/leader/leader_test.go diff --git a/charts/gitops-reverser/README.md b/charts/gitops-reverser/README.md index 60ec0b1c..7268557b 100644 --- a/charts/gitops-reverser/README.md +++ b/charts/gitops-reverser/README.md @@ -72,9 +72,9 @@ kubectl apply -f https://github.com/ConfigButler/gitops-reverser/releases/latest ## Architecture -### High Availability Setup +### Deployment Topology -The chart deploys 2 replicas by default with leader election: +The chart deploys 1 replica by default: ``` ┌─────────────────────────────────────────┐ @@ -84,7 +84,7 @@ The chart deploys 2 replicas by default with leader election: │ webhook requests ▼ ┌──────────────────────────────────────────┐ -│ gitops-reverser-leader-only (Service) │ +│ gitops-reverser-webhook (Service) │ │ Admission webhook: /process-validating-webhook │ └──────────────┬───────────────────────────┘ │ @@ 
-93,19 +93,17 @@ The chart deploys 2 replicas by default with leader election: │ Audit webhook: /audit-webhook/{clusterID} │ └──────────────┬───────────────────────────┘ │ - ┌─────────┴─────────┐ - ▼ ▼ -┌─────────┐ ┌─────────┐ -│ Pod 1 │ │ Pod 2 │ -│ LEADER │◄────────┤ STANDBY │ -│ Active │ election│ Ready │ -└─────────┘ └─────────┘ + ▼ + ┌─────────────┐ + │ Pod 1 │ + │ Controller │ + │ Active │ + └─────────────┘ ``` **Key Features:** -- **Leader-only service**: Routes webhook traffic only to the active leader pod +- **Single-pod operation**: Minimal moving parts while HA work is deferred - **Dedicated audit service**: Separates audit ingress from admission webhook traffic -- **Automatic failover**: Standby pod takes over if leader fails - **Pod anti-affinity**: Pods spread across different nodes - **Pod disruption budget**: Ensures at least 1 pod available during maintenance @@ -115,13 +113,12 @@ The chart deploys 2 replicas by default with leader election: #### Minimal (Testing/Development) -Single replica, no HA: +Single replica: ```yaml # minimal-values.yaml replicaCount: 1 controllerManager: - leaderElection: false podDisruptionBudget: enabled: false affinity: {} @@ -137,15 +134,15 @@ helm install gitops-reverser \ #### Production (Recommended) -Enhanced HA with 3 replicas: +Hardened single-replica deployment: ```yaml # production-values.yaml -replicaCount: 3 +replicaCount: 1 podDisruptionBudget: enabled: true - minAvailable: 2 + minAvailable: 1 resources: requests: @@ -185,9 +182,7 @@ webhook: | Parameter | Description | Default | |-----------|-------------|---------| | `replicaCount` | Number of controller replicas | `1` | -| `leaderOnlyService.enabled` | Create service routing to leader only | `true` | | `image.repository` | Container image repository | `ghcr.io/configbutler/gitops-reverser` | -| `controllerManager.leaderElection` | Enable leader election | `true` | | `webhook.validating.failurePolicy` | Webhook failure policy (Ignore/Fail) | `Ignore` | | 
`auditIngress.enabled` | Enable dedicated audit HTTPS ingress server | `true` | | `auditIngress.port` | Dedicated audit container port | `9444` | @@ -241,7 +236,7 @@ kubectl delete crd gitrepoconfigs.configbutler.ai watchrules.configbutler.ai ### Verify Installation ```bash -# Check pods (should see 2 replicas) +# Check pods (should see 1 replica) kubectl get pods -n gitops-reverser-system # Check CRDs @@ -250,8 +245,6 @@ kubectl get crd | grep configbutler # Check webhook kubectl get validatingwebhookconfiguration -l app.kubernetes.io/name=gitops-reverser -# Check leader election -kubectl get lease -n gitops-reverser-system ``` ### View Logs @@ -260,8 +253,6 @@ kubectl get lease -n gitops-reverser-system # All pods kubectl logs -n gitops-reverser-system -l app.kubernetes.io/name=gitops-reverser -f -# Leader pod only -kubectl logs -n gitops-reverser-system -l role=leader -f ``` ### Access Metrics @@ -333,20 +324,6 @@ kubectl logs -n cert-manager -l app=cert-manager -f kubectl rollout restart deployment cert-manager -n cert-manager ``` -### Leader Election Issues - -Check which pod is the leader: - -```bash -# View lease -kubectl get lease -n gitops-reverser-system - -# View pod labels -kubectl get pods -n gitops-reverser-system --show-labels - -# Leader should have label: role=leader -``` - ### Pods Not Scheduling If pods are pending due to anti-affinity rules: @@ -355,7 +332,7 @@ If pods are pending due to anti-affinity rules: # Check node count kubectl get nodes -# If you have only 1 node, reduce replicas or disable affinity +# If you have only 1 node, keep a single replica or disable affinity helm upgrade gitops-reverser \ oci://ghcr.io/configbutler/charts/gitops-reverser \ --namespace gitops-reverser-system \ diff --git a/charts/gitops-reverser/templates/certificates.yaml b/charts/gitops-reverser/templates/certificates.yaml index a0321c99..3c147d96 100644 --- a/charts/gitops-reverser/templates/certificates.yaml +++ 
b/charts/gitops-reverser/templates/certificates.yaml @@ -21,8 +21,8 @@ metadata: {{- include "gitops-reverser.labels" . | nindent 4 }} spec: dnsNames: - - {{ include "gitops-reverser.fullname" . }}-leader-only.{{ .Release.Namespace }}.svc - - {{ include "gitops-reverser.fullname" . }}-leader-only.{{ .Release.Namespace }}.svc.cluster.local + - {{ include "gitops-reverser.fullname" . }}-webhook.{{ .Release.Namespace }}.svc + - {{ include "gitops-reverser.fullname" . }}-webhook.{{ .Release.Namespace }}.svc.cluster.local issuerRef: kind: {{ .Values.certificates.certManager.issuer.kind }} name: {{ .Values.certificates.certManager.issuer.name }} diff --git a/charts/gitops-reverser/templates/configmap.yaml b/charts/gitops-reverser/templates/configmap.yaml index 3d653265..d2b167b1 100644 --- a/charts/gitops-reverser/templates/configmap.yaml +++ b/charts/gitops-reverser/templates/configmap.yaml @@ -15,13 +15,10 @@ data: bindAddress: {{ .Values.controllerManager.metrics.bindAddress }} webhook: port: {{ .Values.webhook.server.port }} - leaderElection: - leaderElect: {{ .Values.controllerManager.leaderElection }} - resourceName: 9ed3440e.configbutler.ai {{- if .Values.logging }} logging: level: {{ .Values.logging.level | default "info" }} development: {{ .Values.logging.development | default false }} encoder: {{ .Values.logging.encoder | default "json" }} stacktraceLevel: {{ .Values.logging.stacktraceLevel | default "error" }} - {{- end }} \ No newline at end of file + {{- end }} diff --git a/charts/gitops-reverser/templates/deployment.yaml b/charts/gitops-reverser/templates/deployment.yaml index df8ec3ad..56be5c60 100644 --- a/charts/gitops-reverser/templates/deployment.yaml +++ b/charts/gitops-reverser/templates/deployment.yaml @@ -31,7 +31,7 @@ spec: serviceAccountName: {{ include "gitops-reverser.serviceAccountName" . 
}} securityContext: {{- toYaml .Values.podSecurityContext | nindent 8 }} - terminationGracePeriodSeconds: 20 # Shutting down the leaders requires leader transfer to be completed before we shut down the pod (can take some time). + terminationGracePeriodSeconds: 20 containers: - name: manager securityContext: @@ -39,9 +39,6 @@ spec: image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}" imagePullPolicy: {{ .Values.image.pullPolicy }} args: - {{- if .Values.controllerManager.leaderElection }} - - --leader-elect - {{- end }} - --health-probe-bind-address=:8081 - --metrics-bind-address=:8080 - --metrics-secure=false diff --git a/charts/gitops-reverser/templates/rbac.yaml b/charts/gitops-reverser/templates/rbac.yaml index 1ecde19f..a37fba09 100644 --- a/charts/gitops-reverser/templates/rbac.yaml +++ b/charts/gitops-reverser/templates/rbac.yaml @@ -72,61 +72,6 @@ roleRef: kind: ClusterRole name: {{ include "gitops-reverser.fullname" . }}-proxy-role subjects: -- kind: ServiceAccount - name: {{ include "gitops-reverser.serviceAccountName" . }} - namespace: {{ .Release.Namespace }} ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: Role -metadata: - name: {{ include "gitops-reverser.fullname" . }}-leader-election-role - namespace: {{ .Release.Namespace }} - labels: - {{- include "gitops-reverser.labels" . | nindent 4 }} -rules: -- apiGroups: - - "" - resources: - - configmaps - verbs: - - get - - list - - watch - - create - - update - - patch - - delete -- apiGroups: - - coordination.k8s.io - resources: - - leases - verbs: - - get - - list - - watch - - create - - update - - patch - - delete -- apiGroups: - - "" - resources: - - events - verbs: - - create ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: RoleBinding -metadata: - name: {{ include "gitops-reverser.fullname" . }}-leader-election-rolebinding - namespace: {{ .Release.Namespace }} - labels: - {{- include "gitops-reverser.labels" . 
| nindent 4 }} -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: Role - name: {{ include "gitops-reverser.fullname" . }}-leader-election-role -subjects: - kind: ServiceAccount name: {{ include "gitops-reverser.serviceAccountName" . }} namespace: {{ .Release.Namespace }} diff --git a/charts/gitops-reverser/templates/services.yaml b/charts/gitops-reverser/templates/services.yaml index 1d6d204d..25eb7b65 100644 --- a/charts/gitops-reverser/templates/services.yaml +++ b/charts/gitops-reverser/templates/services.yaml @@ -3,11 +3,11 @@ apiVersion: v1 kind: Service metadata: - name: {{ include "gitops-reverser.fullname" . }}-leader-only + name: {{ include "gitops-reverser.fullname" . }}-webhook namespace: {{ .Release.Namespace }} labels: {{- include "gitops-reverser.labels" . | nindent 4 }} - app.kubernetes.io/component: leader-only + app.kubernetes.io/component: webhook spec: type: ClusterIP ports: @@ -17,7 +17,6 @@ spec: protocol: TCP selector: {{- include "gitops-reverser.selectorLabels" . | nindent 4 }} - role: leader # Pods get this label from within the operator source code: the Kube API lease mechanism is used to always have one active leader. --- {{- if .Values.auditIngress.enabled }} apiVersion: v1 @@ -40,7 +39,6 @@ spec: protocol: TCP selector: {{- include "gitops-reverser.selectorLabels" . | nindent 4 }} - role: leader --- {{- end }} apiVersion: v1 diff --git a/charts/gitops-reverser/templates/validating-webhook.yaml b/charts/gitops-reverser/templates/validating-webhook.yaml index 7ea831f6..ae4807a1 100644 --- a/charts/gitops-reverser/templates/validating-webhook.yaml +++ b/charts/gitops-reverser/templates/validating-webhook.yaml @@ -14,7 +14,7 @@ webhooks: - v1 clientConfig: service: - name: {{ include "gitops-reverser.fullname" . }}-leader-only + name: {{ include "gitops-reverser.fullname" . 
}}-webhook namespace: {{ .Release.Namespace }} path: /process-validating-webhook {{- if not .Values.certificates.certManager.enabled }} diff --git a/charts/gitops-reverser/values.yaml b/charts/gitops-reverser/values.yaml index a21615ce..6f8998c3 100644 --- a/charts/gitops-reverser/values.yaml +++ b/charts/gitops-reverser/values.yaml @@ -5,12 +5,6 @@ # High Availability configuration - runs 1 replicas by default (HA support is not good enough yet) replicaCount: 1 -# Leader-only service configuration -leaderOnlyService: - # When true, creates a dedicated service that routes traffic only to the leader pod - # This is critical for HA deployments to ensure consistent processing of incomming API server events - enabled: true - image: repository: ghcr.io/configbutler/gitops-reverser pullPolicy: IfNotPresent @@ -50,8 +44,6 @@ securityContext: # Controller manager configuration controllerManager: - # Enable leader election for controller manager (required for HA) - leaderElection: true # Health probe configuration healthProbe: bindAddress: :8081 diff --git a/cmd/main.go b/cmd/main.go index 71dce98f..33fb4619 100644 --- a/cmd/main.go +++ b/cmd/main.go @@ -52,7 +52,6 @@ import ( "github.com/ConfigButler/gitops-reverser/internal/controller" "github.com/ConfigButler/gitops-reverser/internal/correlation" "github.com/ConfigButler/gitops-reverser/internal/git" - "github.com/ConfigButler/gitops-reverser/internal/leader" "github.com/ConfigButler/gitops-reverser/internal/metrics" "github.com/ConfigButler/gitops-reverser/internal/reconcile" "github.com/ConfigButler/gitops-reverser/internal/rulestore" @@ -115,10 +114,7 @@ func main() { ) // Manager - mgr := newManager(metricsServerOptions, webhookServer, cfg.probeAddr, cfg.enableLeaderElection) - - // Leader labeler (if elected) - addLeaderPodLabeler(mgr, cfg.enableLeaderElection) + mgr := newManager(metricsServerOptions, webhookServer, cfg.probeAddr) // Initialize rule store for watch rules ruleStore := rulestore.NewStore() @@ -278,7 
+274,6 @@ type appConfig struct { webhookCertPath string webhookCertName string webhookCertKey string - enableLeaderElection bool probeAddr string secureMetrics bool enableHTTP2 bool @@ -312,9 +307,6 @@ func parseFlagsWithArgs(fs *flag.FlagSet, args []string) (appConfig, error) { fs.StringVar(&cfg.metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+ "Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.") fs.StringVar(&cfg.probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.") - fs.BoolVar(&cfg.enableLeaderElection, "leader-elect", false, - "Enable leader election for controller manager. "+ - "Enabling this will ensure there is only one active controller manager.") fs.BoolVar(&cfg.secureMetrics, "metrics-secure", true, "If set, the metrics endpoint is served securely via HTTPS. Use --metrics-secure=false to use HTTP instead.") fs.StringVar( @@ -581,16 +573,12 @@ func newManager( metricsOptions metricsserver.Options, webhookServer webhook.Server, probeAddr string, - enableLeaderElection bool, ) ctrl.Manager { mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{ Scheme: scheme, Metrics: metricsOptions, WebhookServer: webhookServer, HealthProbeBindAddress: probeAddr, - LeaderElection: enableLeaderElection, - LeaderElectionID: "9ed3440e.configbutler.ai", - // LeaderElectionReleaseOnCancel: true, }) if err != nil { setupLog.Error(err, "unable to start manager") @@ -599,29 +587,6 @@ func newManager( return mgr } -// addLeaderPodLabeler adds the leader pod labeler runnable when leader election is enabled. 
-// +kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch;update;patch -func addLeaderPodLabeler(mgr ctrl.Manager, enabled bool) { - if !enabled { - return - } - - podName := leader.GetPodName() - podNamespace := leader.GetPodNamespace() - if podName != "" && podNamespace != "" { - setupLog.Info("Adding leader pod labeler", "pod", podName, "namespace", podNamespace) - podLabeler := &leader.PodLabeler{ - Client: mgr.GetClient(), - Log: ctrl.Log.WithName("leader-labeler"), - PodName: podName, - Namespace: podNamespace, - } - fatalIfErr(mgr.Add(podLabeler), "unable to add leader pod labeler") - } else { - setupLog.Info("POD_NAME or POD_NAMESPACE not set, skipping leader pod labeler") - } -} - // addCertWatchersToManager attaches optional certificate watchers to the manager. func addCertWatchersToManager( mgr ctrl.Manager, diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml index c94b390c..6437af9d 100644 --- a/config/manager/manager.yaml +++ b/config/manager/manager.yaml @@ -21,7 +21,7 @@ spec: matchLabels: control-plane: controller-manager app.kubernetes.io/name: gitops-reverser - replicas: 2 + replicas: 1 template: metadata: annotations: @@ -61,7 +61,6 @@ spec: - command: - /manager args: - - --leader-elect - --health-probe-bind-address=:8081 image: controller:latest name: manager diff --git a/config/rbac/kustomization.yaml b/config/rbac/kustomization.yaml index 86645ef5..e1fbf0f9 100644 --- a/config/rbac/kustomization.yaml +++ b/config/rbac/kustomization.yaml @@ -8,8 +8,6 @@ resources: - role.yaml - role_binding.yaml - test_user_role_binding.yaml - - leader_election_role.yaml - - leader_election_role_binding.yaml # The following RBAC configurations are used to protect # the metrics endpoint with authn/authz. 
These configurations # ensure that only authorized users and service accounts @@ -36,4 +34,3 @@ resources: - gitprovider_admin_role.yaml - gitprovider_editor_role.yaml - gitprovider_viewer_role.yaml - diff --git a/config/rbac/leader_election_role.yaml b/config/rbac/leader_election_role.yaml deleted file mode 100644 index a12b473f..00000000 --- a/config/rbac/leader_election_role.yaml +++ /dev/null @@ -1,40 +0,0 @@ -# permissions to do leader election. -apiVersion: rbac.authorization.k8s.io/v1 -kind: Role -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: leader-election-role -rules: -- apiGroups: - - "" - resources: - - configmaps - verbs: - - get - - list - - watch - - create - - update - - patch - - delete -- apiGroups: - - coordination.k8s.io - resources: - - leases - verbs: - - get - - list - - watch - - create - - update - - patch - - delete -- apiGroups: - - "" - resources: - - events - verbs: - - create - - patch diff --git a/config/rbac/leader_election_role_binding.yaml b/config/rbac/leader_election_role_binding.yaml deleted file mode 100644 index ca6debb0..00000000 --- a/config/rbac/leader_election_role_binding.yaml +++ /dev/null @@ -1,15 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1 -kind: RoleBinding -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: leader-election-rolebinding -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: Role - name: leader-election-role -subjects: -- kind: ServiceAccount - name: controller-manager - namespace: system diff --git a/config/rbac/role.yaml b/config/rbac/role.yaml index 0de108eb..c4e3993e 100644 --- a/config/rbac/role.yaml +++ b/config/rbac/role.yaml @@ -13,16 +13,6 @@ rules: - get - list - watch -- apiGroups: - - "" - resources: - - pods - verbs: - - get - - list - - patch - - update - - watch - apiGroups: - '*' resources: diff --git a/config/webhook/audit-service.yaml 
b/config/webhook/audit-service.yaml index e984a91d..60aa332e 100644 --- a/config/webhook/audit-service.yaml +++ b/config/webhook/audit-service.yaml @@ -14,4 +14,3 @@ spec: selector: control-plane: controller-manager app.kubernetes.io/name: gitops-reverser - role: leader diff --git a/config/webhook/service.yaml b/config/webhook/service.yaml index 3a39ec12..8f892bce 100644 --- a/config/webhook/service.yaml +++ b/config/webhook/service.yaml @@ -14,4 +14,3 @@ spec: selector: control-plane: controller-manager app.kubernetes.io/name: gitops-reverser - role: leader diff --git a/docs/design/https-server-alignment-and-service-plan.md b/docs/design/https-server-alignment-and-service-plan.md new file mode 100644 index 00000000..87606454 --- /dev/null +++ b/docs/design/https-server-alignment-and-service-plan.md @@ -0,0 +1,229 @@ +# HTTPS Server Alignment And Service Plan + +## Goal + +Improve consistency across the three HTTPS surfaces: + +1. admission webhook server +2. audit ingress server +3. metrics server + +With Service topology now simplified after removing the leader-only Service. + +## Current Operating Mode + +Run with **a single pod** for the current phase. + +## Single-Replica Checklist + +- [ ] `replicaCount: 1` is the chart default for this phase. +- [ ] HA-specific behavior is disabled/ignored by default. +- [ ] Leader-only Service has been removed from active topology. +- [ ] HA reintroduction is explicitly deferred to the planned rewrite. + +## Current Constraints + +- Leader-only routing is no longer part of active Service topology. +- Kubernetes Service selectors are per-Service, not per-port. + +## Decision On "One Service, Three Ports" + +### Can we do it technically? + +Yes, Kubernetes supports one Service exposing multiple ports. + +### Should we do it here? + +Yes, this is now the active direction for the single-pod phase. + +### Recommended topology (single pod) + +Use **one Service with three ports**: + +1. admission HTTPS +2. audit HTTPS +3. 
metrics HTTPS + +This minimizes moving parts for the interim single-pod phase. + +## Alignment Plan + +## 1. Unify server config model + +- Introduce a shared internal server config shape for: + - bind address/port + - cert path/name/key + - read/write/idle timeout + - TLS enabled/insecure mode guard +- Map flags into this model for all three servers. + +## 2. Unify TLS/cert watcher bootstrap + +- Add one helper that: + - validates cert config + - creates optional certwatcher + - wires `GetCertificate` + - applies shared TLS defaults (minimum version + HTTP/2 policy) +- Use same helper for metrics, admission, and audit. + +## 3. Unify server lifecycle wiring + +- Keep all servers manager-managed. +- Reuse one runnable pattern for startup/shutdown + timeout. +- Standardize startup/shutdown logs and error paths. + +## 4. Align Helm values and args + +- Keep existing keys for compatibility, but normalize structure: + - `webhook.server` + - `auditIngress` + - `controllerManager.metrics` +- Ensure timeout and cert naming is consistent across all three blocks. + +## 5. Simplify deployment model now + +- Default chart/config to single replica. +- Keep leader-only Service removed from this phase. +- Keep optional leader-election code path only if low-cost; otherwise disable in defaults. + +## 6. Service simplification (single service) + +- Merge admission, audit, and metrics onto one Service with three target ports. +- Update cert SANs and docs accordingly. + +## 7. Tests and rollout checks + +- Unit: + - shared server config parsing + - shared TLS helper behavior + - service template rendering for single Service with three ports +- E2E: + - admission and audit reachable on same Service + - metrics reachable on same Service +- Validation sequence: + - `make build` + - `make lint` + - `make test-e2e` + +## Target Settings Design (Markdown-Only) + +This section defines the intended end-state configuration model without implementing it yet. 
+
+### End-State Overview
+
+The chart should converge on:
+
+- One Pod replica by default for this phase.
+- One Service exposing three HTTPS ports (admission, audit, metrics).
+- One shared server settings shape reused by all three listeners.
+- Per-surface overrides only where behavior is genuinely different.
+- Per-server TLS can be enabled/disabled independently.
+
+### Proposed Helm Values Shape
+
+```yaml
+replicaCount: 1
+
+network:
+  service:
+    enabled: true
+    type: ClusterIP
+    ports:
+      admission: 443
+      audit: 8444
+      metrics: 8443
+
+servers:
+  defaults:
+    enableHTTP2: false
+    timeouts:
+      read: 15s
+      write: 30s
+      idle: 60s
+    tls:
+      enabled: true
+      certPath: ""
+      certName: tls.crt
+      certKey: tls.key
+      minVersion: VersionTLS12
+
+  admission:
+    enabled: true
+    bindAddress: :9443
+    timeouts: {} # optional override
+    tls:
+      enabled: true # may be set false for local/dev scenarios
+      secretName: "" # optional if cert-manager manages mount/secret
+
+  audit:
+    enabled: true
+    bindAddress: :9444
+    maxRequestBodyBytes: 10485760
+    timeouts: {}
+    tls:
+      enabled: true
+      secretName: ""
+
+  metrics:
+    enabled: true
+    bindAddress: :8080
+    secure: true
+    timeouts: {}
+    tls:
+      enabled: true
+      secretName: ""
+```
+
+If `servers.<server>.tls.enabled` is omitted, inherit from `servers.defaults.tls.enabled`.
+ +### Settings Responsibilities + +| Area | Purpose | Notes | +|---|---|---| +| `servers.defaults` | Shared defaults for all HTTPS listeners | Single source of truth for TLS + timeout defaults, including TLS default on/off | +| `servers.admission` | Admission-specific listener settings | Keeps webhook behavior settings separate under `webhook.validating` | +| `servers.audit` | Audit ingress listener settings | Retains audit payload controls like `maxRequestBodyBytes` | +| `servers.metrics` | Metrics listener settings | Supports secure metrics endpoint consistently, but can be intentionally downgraded per environment | +| `network.service` | Cluster Service exposure | Owns externally reachable ports only, not container bind ports | + +### Compatibility Mapping (Current -> Target) + +| Current key | Target key | Migration intent | +|---|---|---| +| `webhook.server.port` | `servers.admission.bindAddress` | Keep old key as compatibility alias initially | +| `webhook.server.certPath/certName/certKey` | `servers.admission.tls.*` (or inherited defaults) | Prefer inherited defaults unless explicitly overridden | +| `auditIngress.port` | `servers.audit.bindAddress` | Preserve behavior, normalize naming | +| `auditIngress.tls.*` | `servers.audit.tls.*` | Direct move | +| `auditIngress.timeouts.*` | `servers.audit.timeouts.*` | Direct move | +| `controllerManager.metrics.bindAddress` | `servers.metrics.bindAddress` | Unify metrics with same server model | +| `controllerManager.enableHTTP2` | `servers.defaults.enableHTTP2` | Single flag for all listeners in this phase | +| `controllerManager.metrics.secure` (if present) | `servers.metrics.tls.enabled` | Keep compatibility alias during migration | + +### CLI Args/Runtime Mapping Direction + +Desired runtime model: + +- Parse Helm values into one internal server settings struct per surface. +- Apply shared defaulting/validation once. +- Generate listener-specific runtime config from the same code path. 
+ +Resulting behavior goals: + +- Same TLS validation rules for all listeners. +- Same timeout parsing and error messages for all listeners. +- Same startup/shutdown lifecycle pattern for all listeners. +- If TLS is disabled for a listener, skip cert watcher/bootstrap for that listener and run plain HTTP on its bind address. + +### TLS Disable Guardrails + +- Keep TLS enabled by default for all listeners. +- Treat TLS-disabled mode as non-production convenience for local/dev/test. +- Emit a startup warning whenever any listener runs with TLS disabled. +- Admission/audit TLS disable should be opt-in only and clearly visible in rendered values. + +### Rollout Notes For Settings Refactor + +- Keep legacy keys supported during transition. +- Emit clear deprecation warnings when legacy keys are used. +- Switch docs/examples to target keys first; keep compatibility notes adjacent. +- Remove deprecated keys only after at least one stable release carrying warnings. diff --git a/internal/leader/leader.go b/internal/leader/leader.go deleted file mode 100644 index d106306d..00000000 --- a/internal/leader/leader.go +++ /dev/null @@ -1,137 +0,0 @@ -/* -SPDX-License-Identifier: Apache-2.0 - -Copyright 2025 ConfigButler - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -/* -Package leader provides leader election functionality for the GitOps Reverser controller. -It manages pod labeling to identify the active leader instance in a multi-replica deployment. 
-*/ -package leader - -// +kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch;update;patch - -import ( - "context" - "os" - - "github.com/go-logr/logr" - corev1 "k8s.io/api/core/v1" - "k8s.io/apimachinery/pkg/api/errors" - "k8s.io/apimachinery/pkg/types" - "sigs.k8s.io/controller-runtime/pkg/client" -) - -const ( - leaderLabelKey = "role" - leaderLabelValue = "leader" -) - -// PodLabeler is a Runnable that adds a label to the pod when it becomes the leader -// and removes it when it stops being the leader. -// It implements the LeaderElectionRunnable interface so it only runs on the leader. -type PodLabeler struct { - Client client.Client - Log logr.Logger - PodName string - Namespace string -} - -// NeedLeaderElection implements the LeaderElectionRunnable interface. -// This ensures the PodLabeler only runs on the elected leader. -func (p *PodLabeler) NeedLeaderElection() bool { - return true -} - -// Start adds the leader label to the pod and blocks until the context is canceled. -// This method is only called on the elected leader pod when NeedLeaderElection returns true. -func (p *PodLabeler) Start(ctx context.Context) error { - log := p.Log.WithValues("pod", p.PodName, "namespace", p.Namespace) - log.Info("🎯 PodLabeler.Start() called - This pod is the leader, adding leader label.") - - if err := p.addLabel(ctx, log); err != nil { - log.Error(err, "❌ Failed to add leader label") - return err - } - - log.Info("✅ Leader label added successfully") - - // The context is canceled when the manager stops. - <-ctx.Done() - - log.Info("Leader is shutting down, removing leader label.") - // Use a new context for the cleanup operation. - if err := p.removeLabel(context.Background(), log); err != nil { - log.Error(err, "failed to remove leader label on shutdown") - // Don't return error on shutdown, just log it. 
- } - return nil -} - -func (p *PodLabeler) addLabel(ctx context.Context, log logr.Logger) error { - pod, err := p.getPod(ctx) - if err != nil { - return err - } - - if pod.Labels == nil { - pod.Labels = make(map[string]string) - } - - if val, ok := pod.Labels[leaderLabelKey]; ok && val == leaderLabelValue { - log.Info("Pod already has leader label") - return nil - } - - pod.Labels[leaderLabelKey] = leaderLabelValue - return p.Client.Update(ctx, pod) -} - -func (p *PodLabeler) removeLabel(ctx context.Context, log logr.Logger) error { - pod, err := p.getPod(ctx) - if err != nil { - if errors.IsNotFound(err) { - log.Info("Pod not found, cannot remove leader label.") - return nil - } - return err - } - - if _, ok := pod.Labels[leaderLabelKey]; !ok { - log.Info("Pod does not have leader label, nothing to remove.") - return nil - } - - delete(pod.Labels, leaderLabelKey) - return p.Client.Update(ctx, pod) -} - -func (p *PodLabeler) getPod(ctx context.Context) (*corev1.Pod, error) { - pod := &corev1.Pod{} - key := types.NamespacedName{Name: p.PodName, Namespace: p.Namespace} - err := p.Client.Get(ctx, key, pod) - return pod, err -} - -// GetPodName returns the pod name from the POD_NAME environment variable. -func GetPodName() string { - return os.Getenv("POD_NAME") -} - -// GetPodNamespace returns the pod namespace from the POD_NAMESPACE environment variable. -func GetPodNamespace() string { - return os.Getenv("POD_NAMESPACE") -} diff --git a/internal/leader/leader_test.go b/internal/leader/leader_test.go deleted file mode 100644 index 40a2d7e8..00000000 --- a/internal/leader/leader_test.go +++ /dev/null @@ -1,688 +0,0 @@ -/* -SPDX-License-Identifier: Apache-2.0 - -Copyright 2025 ConfigButler - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. 
-You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package leader - -import ( - "context" - "testing" - "time" - - "github.com/stretchr/testify/assert" - "github.com/stretchr/testify/require" - corev1 "k8s.io/api/core/v1" - "k8s.io/apimachinery/pkg/api/errors" - metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/runtime" - "k8s.io/apimachinery/pkg/types" - "sigs.k8s.io/controller-runtime/pkg/client/fake" - "sigs.k8s.io/controller-runtime/pkg/log/zap" -) - -func TestPodLabeler_Start_AddLabel(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - pod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: map[string]string{}, - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(pod). 
- Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "test-pod", - Namespace: "test-namespace", - } - - // Create a context that will be canceled after a short time - ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond) - defer cancel() - - // Execute - err = labeler.Start(ctx) - require.NoError(t, err) - - // Verify the label was added - updatedPod := &corev1.Pod{} - err = client.Get(context.Background(), types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - - // The label should have been added and then removed during shutdown - // Since we can't easily test the intermediate state, we verify the cleanup happened - assert.NotContains(t, updatedPod.Labels, leaderLabelKey) -} - -func TestPodLabeler_Start_PodNotFound(t *testing.T) { - // Setup - no pod in the fake client - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - client := fake.NewClientBuilder().WithScheme(scheme).Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "non-existent-pod", - Namespace: "test-namespace", - } - - ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond) - defer cancel() - - // Execute - err = labeler.Start(ctx) - require.Error(t, err) - assert.True(t, errors.IsNotFound(err)) -} - -func TestPodLabeler_addLabel_NewLabel(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - pod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: map[string]string{}, - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(pod). 
- Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "test-pod", - Namespace: "test-namespace", - } - - // Execute - ctx := context.Background() - err = labeler.addLabel(ctx, logger) - require.NoError(t, err) - - // Verify - updatedPod := &corev1.Pod{} - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - - assert.Equal(t, leaderLabelValue, updatedPod.Labels[leaderLabelKey]) -} - -func TestPodLabeler_addLabel_ExistingLabel(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - pod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: map[string]string{ - leaderLabelKey: leaderLabelValue, // Already has the leader label - }, - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(pod). - Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "test-pod", - Namespace: "test-namespace", - } - - // Execute - ctx := context.Background() - err = labeler.addLabel(ctx, logger) - require.NoError(t, err) - - // Verify the label is still there (no error should occur) - updatedPod := &corev1.Pod{} - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - - assert.Equal(t, leaderLabelValue, updatedPod.Labels[leaderLabelKey]) -} - -func TestPodLabeler_addLabel_NilLabels(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - pod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: nil, // Nil labels map - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(pod). 
- Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "test-pod", - Namespace: "test-namespace", - } - - // Execute - ctx := context.Background() - err = labeler.addLabel(ctx, logger) - require.NoError(t, err) - - // Verify - updatedPod := &corev1.Pod{} - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - - assert.NotNil(t, updatedPod.Labels) - assert.Equal(t, leaderLabelValue, updatedPod.Labels[leaderLabelKey]) -} - -func TestPodLabeler_removeLabel_ExistingLabel(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - pod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: map[string]string{ - leaderLabelKey: leaderLabelValue, - "other-label": "other-value", - }, - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(pod). 
- Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "test-pod", - Namespace: "test-namespace", - } - - // Execute - ctx := context.Background() - err = labeler.removeLabel(ctx, logger) - require.NoError(t, err) - - // Verify - updatedPod := &corev1.Pod{} - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - - assert.NotContains(t, updatedPod.Labels, leaderLabelKey) - assert.Equal(t, "other-value", updatedPod.Labels["other-label"]) // Other labels preserved -} - -func TestPodLabeler_removeLabel_NoLabel(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - pod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: map[string]string{ - "other-label": "other-value", - }, - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(pod). 
- Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "test-pod", - Namespace: "test-namespace", - } - - // Execute - ctx := context.Background() - err = labeler.removeLabel(ctx, logger) - require.NoError(t, err) - - // Verify - should be no-op - updatedPod := &corev1.Pod{} - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - - assert.NotContains(t, updatedPod.Labels, leaderLabelKey) - assert.Equal(t, "other-value", updatedPod.Labels["other-label"]) -} - -func TestPodLabeler_removeLabel_PodNotFound(t *testing.T) { - // Setup - no pod in the fake client - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - client := fake.NewClientBuilder().WithScheme(scheme).Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "non-existent-pod", - Namespace: "test-namespace", - } - - // Execute - ctx := context.Background() - err = labeler.removeLabel(ctx, logger) - require.NoError(t, err) // Should not error when pod is not found during cleanup -} - -func TestPodLabeler_getPod_Success(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - expectedPod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: map[string]string{ - "test-label": "test-value", - }, - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(expectedPod). 
- Build() - - labeler := &PodLabeler{ - Client: client, - PodName: "test-pod", - Namespace: "test-namespace", - } - - // Execute - ctx := context.Background() - pod, err := labeler.getPod(ctx) - require.NoError(t, err) - assert.NotNil(t, pod) - assert.Equal(t, "test-pod", pod.Name) - assert.Equal(t, "test-namespace", pod.Namespace) - assert.Equal(t, "test-value", pod.Labels["test-label"]) -} - -func TestPodLabeler_getPod_NotFound(t *testing.T) { - // Setup - no pod in the fake client - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - client := fake.NewClientBuilder().WithScheme(scheme).Build() - - labeler := &PodLabeler{ - Client: client, - PodName: "non-existent-pod", - Namespace: "test-namespace", - } - - // Execute - ctx := context.Background() - pod, err := labeler.getPod(ctx) - require.Error(t, err) - assert.True(t, errors.IsNotFound(err)) - assert.NotNil(t, pod) // getPod always returns a Pod object, even when not found -} - -func TestGetPodName(t *testing.T) { - // Test with environment variable set - t.Setenv("POD_NAME", "test-pod-name") - - podName := GetPodName() - assert.Equal(t, "test-pod-name", podName) -} - -func TestGetPodName_Empty(t *testing.T) { - // Test with environment variable unset - t.Setenv("POD_NAME", "") - - podName := GetPodName() - assert.Empty(t, podName) -} - -func TestGetPodNamespace(t *testing.T) { - // Test with environment variable set - t.Setenv("POD_NAMESPACE", "test-namespace") - - podNamespace := GetPodNamespace() - assert.Equal(t, "test-namespace", podNamespace) -} - -func TestGetPodNamespace_Empty(t *testing.T) { - // Test with environment variable unset - t.Setenv("POD_NAMESPACE", "") - - podNamespace := GetPodNamespace() - assert.Empty(t, podNamespace) -} - -func TestLeaderLabelConstants(t *testing.T) { - // Verify the constants are set correctly - assert.Equal(t, "role", leaderLabelKey) - assert.Equal(t, "leader", leaderLabelValue) -} - -func 
TestPodLabeler_ConcurrentOperations(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - pod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: map[string]string{}, - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(pod). - Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "test-pod", - Namespace: "test-namespace", - } - - ctx := context.Background() - - // Execute concurrent add operations - done := make(chan error, 2) - - go func() { - done <- labeler.addLabel(ctx, logger) - }() - - go func() { - done <- labeler.addLabel(ctx, logger) - }() - - // Wait for both operations to complete - err1 := <-done - err2 := <-done - - // Both should succeed (or at least one should succeed) - assert.True(t, err1 == nil || err2 == nil, "At least one add operation should succeed") - - // Verify final state - updatedPod := &corev1.Pod{} - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - - assert.Equal(t, leaderLabelValue, updatedPod.Labels[leaderLabelKey]) -} - -func TestPodLabeler_AddRemoveCycle(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - pod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: map[string]string{}, - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(pod). 
- Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "test-pod", - Namespace: "test-namespace", - } - - ctx := context.Background() - - // Add label - err = labeler.addLabel(ctx, logger) - require.NoError(t, err) - - // Verify label was added - updatedPod := &corev1.Pod{} - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - assert.Equal(t, leaderLabelValue, updatedPod.Labels[leaderLabelKey]) - - // Remove label - err = labeler.removeLabel(ctx, logger) - require.NoError(t, err) - - // Verify label was removed - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - assert.NotContains(t, updatedPod.Labels, leaderLabelKey) - - // Add label again - err = labeler.addLabel(ctx, logger) - require.NoError(t, err) - - // Verify label was added again - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - assert.Equal(t, leaderLabelValue, updatedPod.Labels[leaderLabelKey]) -} - -func TestPodLabeler_WithExistingLabels(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - pod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: map[string]string{ - "app": "my-app", - "version": "v1.0.0", - "environment": "production", - }, - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(pod). 
- Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "test-pod", - Namespace: "test-namespace", - } - - ctx := context.Background() - - // Add leader label - err = labeler.addLabel(ctx, logger) - require.NoError(t, err) - - // Verify all labels are preserved - updatedPod := &corev1.Pod{} - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - - expectedLabels := map[string]string{ - "app": "my-app", - "version": "v1.0.0", - "environment": "production", - leaderLabelKey: leaderLabelValue, - } - - assert.Equal(t, expectedLabels, updatedPod.Labels) - - // Remove leader label - err = labeler.removeLabel(ctx, logger) - require.NoError(t, err) - - // Verify only leader label was removed - err = client.Get(ctx, types.NamespacedName{ - Name: "test-pod", - Namespace: "test-namespace", - }, updatedPod) - require.NoError(t, err) - - expectedLabelsAfterRemoval := map[string]string{ - "app": "my-app", - "version": "v1.0.0", - "environment": "production", - } - - assert.Equal(t, expectedLabelsAfterRemoval, updatedPod.Labels) - assert.NotContains(t, updatedPod.Labels, leaderLabelKey) -} - -func TestPodLabeler_ContextCancellation(t *testing.T) { - // Setup - scheme := runtime.NewScheme() - err := corev1.AddToScheme(scheme) - require.NoError(t, err) - - pod := &corev1.Pod{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-pod", - Namespace: "test-namespace", - Labels: map[string]string{}, - }, - } - - client := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(pod). 
- Build() - - logger := zap.New(zap.UseDevMode(true)) - labeler := &PodLabeler{ - Client: client, - Log: logger, - PodName: "test-pod", - Namespace: "test-namespace", - } - - // Create a context that gets canceled immediately - ctx, cancel := context.WithCancel(context.Background()) - cancel() // Cancel immediately - - // Execute - Start should handle the canceled context gracefully - err = labeler.Start(ctx) - require.NoError(t, err) // Should not error, just exit cleanly -} diff --git a/test/e2e/e2e_test.go b/test/e2e/e2e_test.go index 445dad26..bdb38daa 100644 --- a/test/e2e/e2e_test.go +++ b/test/e2e/e2e_test.go @@ -184,7 +184,7 @@ var _ = Describe("Manager", Ordered, func() { podOutput, err := utils.Run(cmd) g.Expect(err).NotTo(HaveOccurred(), "Failed to retrieve controller-manager pod information") podNames := utils.GetNonEmptyLines(podOutput) - g.Expect(podNames).To(HaveLen(2), "expected 2 controller pods running for HA") + g.Expect(podNames).To(HaveLen(1), "expected exactly 1 controller pod running") controllerPodName = podNames[0] // Use first pod for logging g.Expect(controllerPodName).To(ContainSubstring("controller-manager")) @@ -202,36 +202,8 @@ var _ = Describe("Manager", Ordered, func() { Eventually(verifyControllerUp).Should(Succeed()) }) - It("should identify leader pod with role=leader label", func() { - By("verifying that exactly one pod has the role=leader label") - verifyLeaderLabel := func(g Gomega) { - cmd := exec.Command("kubectl", "get", - "pods", "-l", "control-plane=controller-manager,role=leader", - "-o", "go-template={{ range .items }}"+ - "{{ if not .metadata.deletionTimestamp }}"+ - "{{ .metadata.name }}"+ - "{{ \"\\n\" }}{{ end }}{{ end }}", - "-n", namespace, - ) - - podOutput, err := utils.Run(cmd) - g.Expect(err).NotTo(HaveOccurred(), "Failed to retrieve leader pod information") - leaderPods := utils.GetNonEmptyLines(podOutput) - g.Expect(leaderPods).To(HaveLen(1), "expected exactly 1 leader pod") - - leaderPodName := 
leaderPods[0] - g.Expect(leaderPodName).To(ContainSubstring("controller-manager")) - - // Update controllerPodName to use the leader pod for subsequent tests - controllerPodName = leaderPodName - - By(fmt.Sprintf("Leader pod identified: %s", leaderPodName)) - } - Eventually(verifyLeaderLabel, 30*time.Second).Should(Succeed()) - }) - - It("should route webhook traffic only to leader pod", func() { - By("verifying webhook service selects only the leader pod") + It("should route webhook traffic to the running controller pod", func() { + By("verifying webhook service selects the running controller pod") verifyWebhookService := func(g Gomega) { // Get webhook service endpoints cmd := exec.Command("kubectl", "get", "endpoints", @@ -252,18 +224,16 @@ var _ = Describe("Manager", Ordered, func() { } } - // Should only have one endpoint (the leader pod) - g.Expect(podNames).To(HaveLen(1), "webhook service should route to exactly 1 pod (leader)") + // Should only have one endpoint in single-pod mode. 
+ g.Expect(podNames).To(HaveLen(1), "webhook service should route to exactly 1 pod") + g.Expect(podNames[0]).To(Equal(controllerPodName), "webhook should route to controller pod") - // Verify it's the leader pod - g.Expect(podNames[0]).To(Equal(controllerPodName), "webhook should route to leader pod") - - By(fmt.Sprintf("✅ Webhook service correctly routes to leader pod: %s", controllerPodName)) + By(fmt.Sprintf("✅ Webhook service correctly routes to pod: %s", controllerPodName)) } Eventually(verifyWebhookService, 30*time.Second).Should(Succeed()) }) - It("should expose both admission and audit leader-only services", func() { + It("should expose both admission and audit services", func() { By("verifying admission webhook service exists") cmd := exec.Command("kubectl", "get", "svc", "gitops-reverser-webhook-service", "-n", namespace) _, err := utils.Run(cmd) @@ -274,7 +244,7 @@ var _ = Describe("Manager", Ordered, func() { _, err = utils.Run(cmd) Expect(err).NotTo(HaveOccurred(), "Audit webhook service should exist") - By("verifying audit service routes only to leader pod") + By("verifying audit service routes to the controller pod") Eventually(func(g Gomega) { endpointsCmd := exec.Command("kubectl", "get", "endpoints", "gitops-reverser-audit-webhook-service", "-n", namespace, @@ -292,8 +262,8 @@ var _ = Describe("Manager", Ordered, func() { } } - g.Expect(podNames).To(HaveLen(1), "audit service should route to exactly 1 pod (leader)") - g.Expect(podNames[0]).To(Equal(controllerPodName), "audit service should route to leader pod") + g.Expect(podNames).To(HaveLen(1), "audit service should route to exactly 1 pod") + g.Expect(podNames[0]).To(Equal(controllerPodName), "audit service should route to controller pod") }, 30*time.Second).Should(Succeed()) }) @@ -346,10 +316,10 @@ var _ = Describe("Manager", Ordered, func() { func(v float64) bool { return v > 0 }, "process metrics should exist") - By("verifying metrics from both controller pods") + By("verifying metrics from 
the controller pod") podCount, err := queryPrometheus("count(up{job='gitops-reverser-metrics'})") Expect(err).NotTo(HaveOccurred()) - Expect(podCount).To(Equal(2.0), "Should scrape from 2 controller pods") + Expect(podCount).To(Equal(1.0), "Should scrape from 1 controller pod") fmt.Printf("✅ Metrics collection verified from %.0f pods\n", podCount) fmt.Printf("📊 Inspect metrics: %s\n", getPrometheusURL()) @@ -373,24 +343,13 @@ var _ = Describe("Manager", Ordered, func() { func(v float64) bool { return v > baselineEvents }, "webhook events should increment") - By("verifying only leader pod received webhook events") - leaderEvents, err := queryPrometheus( - "sum(gitopsreverser_events_received_total{role='leader'}) or vector(0)", - ) - Expect(err).NotTo(HaveOccurred()) - Expect(leaderEvents).To(BeNumerically(">", baselineEvents), - "Leader should have processed webhook events") - fmt.Printf("✅ Leader processed %.0f events\n", leaderEvents-baselineEvents) - - By("confirming follower pod has no new webhook events") - followerEvents, err := queryPrometheus( - "sum(gitopsreverser_events_received_total{role!='leader'}) or vector(0)", - ) + By("verifying webhook events were received") + currentEvents, err := queryPrometheus("sum(gitopsreverser_events_received_total) or vector(0)") Expect(err).NotTo(HaveOccurred()) - Expect(followerEvents).To(Equal(0.0), - "Follower should not process webhook events") + Expect(currentEvents).To(BeNumerically(">", baselineEvents), "Controller should process webhook events") + fmt.Printf("✅ Controller processed %.0f events\n", currentEvents-baselineEvents) - fmt.Printf("✅ Webhook routing validated - only leader receives events\n") + fmt.Printf("✅ Webhook routing validated\n") fmt.Printf("📊 Inspect metrics: %s\n", getPrometheusURL()) By("cleaning up webhook test resources") @@ -604,7 +563,7 @@ var _ = Describe("Manager", Ordered, func() { By("waiting for controller reconciliation of ConfigMap event") verifyReconciliationLogs := func(g Gomega) { 
- // Get controller logs from all pods (leader will have the reconciliation logs) + // Get controller logs from all pods (single-pod mode still uses label selector). cmd := exec.Command("kubectl", "logs", "-l", "control-plane=controller-manager", "-n", namespace, "--tail=500", "--prefix=true") output, err := utils.Run(cmd) @@ -612,7 +571,7 @@ var _ = Describe("Manager", Ordered, func() { // Check for git commit operation in logs g.Expect(output).To(ContainSubstring("git commit"), - "Should see git commit operation in logs from leader pod") + "Should see git commit operation in controller logs") } Eventually(verifyReconciliationLogs).Should(Succeed()) From 044f04cea03d7075740ee13f6759acdbf6b3839f Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Wed, 11 Feb 2026 22:21:54 +0000 Subject: [PATCH 05/32] feat: Simplify the services and server configurations --- charts/gitops-reverser/README.md | 45 +-- .../templates/certificates.yaml | 16 +- .../gitops-reverser/templates/configmap.yaml | 4 +- .../gitops-reverser/templates/deployment.yaml | 64 ++-- .../gitops-reverser/templates/services.yaml | 59 +-- .../templates/validating-webhook.yaml | 2 +- charts/gitops-reverser/values.yaml | 85 +++-- cmd/main.go | 338 +++++++++++------- cmd/main_audit_server_test.go | 15 +- .../audit_webhook_service_fixed_ip_patch.yaml | 10 - config/default/kustomization.yaml | 23 +- config/default/manager_metrics_patch.yaml | 4 - config/default/manager_webhook_patch.yaml | 23 +- config/default/metrics_service.yaml | 18 - .../webhook_service_fixed_ip_patch.yaml | 11 - config/manager/manager.yaml | 1 + config/webhook/audit-service.yaml | 16 - config/webhook/kustomization.yaml | 7 +- config/webhook/service.yaml | 14 +- .../webhook/webhook_service_name_patch.yaml | 9 + ...https-server-alignment-and-service-plan.md | 119 +++--- test/e2e/e2e_test.go | 61 ++-- test/e2e/helpers.go | 4 +- test/e2e/kind/README.md | 2 +- test/e2e/kind/audit/webhook-config.yaml | 2 +- test/e2e/kind/start-cluster.sh | 15 +- 
test/e2e/prometheus/deployment.yaml | 34 +- test/e2e/scripts/setup-prometheus.sh | 4 + 28 files changed, 522 insertions(+), 483 deletions(-) delete mode 100644 config/default/audit_webhook_service_fixed_ip_patch.yaml delete mode 100644 config/default/manager_metrics_patch.yaml delete mode 100644 config/default/metrics_service.yaml delete mode 100644 config/default/webhook_service_fixed_ip_patch.yaml delete mode 100644 config/webhook/audit-service.yaml create mode 100644 config/webhook/webhook_service_name_patch.yaml diff --git a/charts/gitops-reverser/README.md b/charts/gitops-reverser/README.md index 7268557b..ad6b8a1a 100644 --- a/charts/gitops-reverser/README.md +++ b/charts/gitops-reverser/README.md @@ -81,16 +81,11 @@ The chart deploys 1 replica by default: │ Kubernetes API Server │ └──────────────┬──────────────────────────┘ │ - │ webhook requests + │ webhook + audit + metrics requests ▼ ┌──────────────────────────────────────────┐ -│ gitops-reverser-webhook (Service) │ -│ Admission webhook: /process-validating-webhook │ -└──────────────┬───────────────────────────┘ - │ -┌──────────────────────────────────────────┐ -│ gitops-reverser-audit (Service) │ -│ Audit webhook: /audit-webhook/{clusterID} │ +│ gitops-reverser (Service) │ +│ Ports: admission(443), audit(9444), metrics(8080) | └──────────────┬───────────────────────────┘ │ ▼ @@ -103,7 +98,7 @@ The chart deploys 1 replica by default: **Key Features:** - **Single-pod operation**: Minimal moving parts while HA work is deferred -- **Dedicated audit service**: Separates audit ingress from admission webhook traffic +- **Single Service topology**: Admission, audit, and metrics on one Service - **Pod anti-affinity**: Pods spread across different nodes - **Pod disruption budget**: Ensures at least 1 pod available during maintenance @@ -184,13 +179,21 @@ webhook: | `replicaCount` | Number of controller replicas | `1` | | `image.repository` | Container image repository | `ghcr.io/configbutler/gitops-reverser` | | 
`webhook.validating.failurePolicy` | Webhook failure policy (Ignore/Fail) | `Ignore` | -| `auditIngress.enabled` | Enable dedicated audit HTTPS ingress server | `true` | -| `auditIngress.port` | Dedicated audit container port | `9444` | -| `auditIngress.maxRequestBodyBytes` | Max accepted audit request size | `10485760` | -| `auditIngress.timeouts.read` | Audit server read timeout | `15s` | -| `auditIngress.timeouts.write` | Audit server write timeout | `30s` | -| `auditIngress.timeouts.idle` | Audit server idle timeout | `60s` | -| `auditIngress.tls.secretName` | Secret name for audit TLS cert/key | `-audit-server-tls-cert` | +| `servers.admission.tls.enabled` | Serve admission webhook with TLS (disable only for local/testing) | `true` | +| `servers.audit.enabled` | Enable dedicated audit ingress listener | `true` | +| `servers.audit.port` | Audit container port | `9444` | +| `servers.audit.tls.enabled` | Serve audit ingress with TLS | `true` | +| `servers.audit.maxRequestBodyBytes` | Max accepted audit request size | `10485760` | +| `servers.audit.timeouts.read` | Audit server read timeout | `15s` | +| `servers.audit.timeouts.write` | Audit server write timeout | `30s` | +| `servers.audit.timeouts.idle` | Audit server idle timeout | `60s` | +| `servers.audit.tls.secretName` | Secret name for audit TLS cert/key | `-audit-server-tls-cert` | +| `servers.metrics.bindAddress` | Metrics listener bind address | `:8080` | +| `servers.metrics.tls.enabled` | Serve metrics with TLS | `true` | +| `service.clusterIP` | Optional fixed ClusterIP for single controller Service | `""` | +| `service.ports.admission` | Service port for admission webhook | `443` | +| `service.ports.audit` | Service port for audit ingress | `9444` | +| `service.ports.metrics` | Service port for metrics | `8080` | | `certificates.certManager.enabled` | Use cert-manager for certificates | `true` | | `podDisruptionBudget.enabled` | Enable PodDisruptionBudget | `true` | | `resources.requests.cpu` | CPU 
request | `10m` | @@ -204,7 +207,7 @@ See [`values.yaml`](values.yaml) for complete configuration options. Source clusters must target: -`https://:443/audit-webhook/` +`https://:9444/audit-webhook/` The bare path `/audit-webhook` is rejected. Use a non-empty cluster ID segment. @@ -258,8 +261,8 @@ kubectl logs -n gitops-reverser-system -l app.kubernetes.io/name=gitops-reverser ### Access Metrics ```bash -kubectl port-forward -n gitops-reverser-system svc/gitops-reverser-metrics-service 8080:8080 -curl http://localhost:8080/metrics +kubectl port-forward -n gitops-reverser-system svc/gitops-reverser 8080:8080 +curl -k https://localhost:8080/metrics ``` ## Upgrading @@ -285,7 +288,7 @@ helm upgrade gitops-reverser \ If upgrading from earlier chart versions: -- Default replicas changed from 1 to 2 (adjust `replicaCount` if needed) +- Single-replica is the default during the current simplified topology phase - Leader election now enabled by default (required for HA) - Health probe port changed to 8081 - Certificate secret names are auto-generated @@ -381,7 +384,7 @@ webhook: Create certificate secret manually: ```bash -kubectl create secret tls webhook-server-cert \ +kubectl create secret tls gitops-reverser-webhook-server-tls-cert \ --cert=path/to/tls.crt \ --key=path/to/tls.key \ -n gitops-reverser-system diff --git a/charts/gitops-reverser/templates/certificates.yaml b/charts/gitops-reverser/templates/certificates.yaml index 3c147d96..d36c2681 100644 --- a/charts/gitops-reverser/templates/certificates.yaml +++ b/charts/gitops-reverser/templates/certificates.yaml @@ -21,8 +21,8 @@ metadata: {{- include "gitops-reverser.labels" . | nindent 4 }} spec: dnsNames: - - {{ include "gitops-reverser.fullname" . }}-webhook.{{ .Release.Namespace }}.svc - - {{ include "gitops-reverser.fullname" . }}-webhook.{{ .Release.Namespace }}.svc.cluster.local + - {{ include "gitops-reverser.fullname" . }}.{{ .Release.Namespace }}.svc + - {{ include "gitops-reverser.fullname" . 
}}.{{ .Release.Namespace }}.svc.cluster.local issuerRef: kind: {{ .Values.certificates.certManager.issuer.kind }} name: {{ .Values.certificates.certManager.issuer.name }} @@ -33,7 +33,7 @@ spec: - server auth privateKey: rotationPolicy: Always -{{- if .Values.auditIngress.enabled }} +{{- if and .Values.servers.audit.enabled .Values.servers.audit.tls.enabled }} --- apiVersion: cert-manager.io/v1 kind: Certificate @@ -43,17 +43,17 @@ metadata: labels: {{- include "gitops-reverser.labels" . | nindent 4 }} spec: -{{- if .Values.auditIngress.clusterIP }} +{{- if .Values.service.clusterIP }} ipAddresses: - - {{ .Values.auditIngress.clusterIP }} + - {{ .Values.service.clusterIP }} {{- end }} dnsNames: - - {{ include "gitops-reverser.fullname" . }}-audit.{{ .Release.Namespace }}.svc - - {{ include "gitops-reverser.fullname" . }}-audit.{{ .Release.Namespace }}.svc.cluster.local + - {{ include "gitops-reverser.fullname" . }}.{{ .Release.Namespace }}.svc + - {{ include "gitops-reverser.fullname" . 
}}.{{ .Release.Namespace }}.svc.cluster.local
  issuerRef:
    kind: {{ .Values.certificates.certManager.issuer.kind }}
    name: {{ .Values.certificates.certManager.issuer.name }}
-  secretName: {{ .Values.auditIngress.tls.secretName | default (printf "%s-audit-server-tls-cert" (include "gitops-reverser.fullname" .)) }}
+  secretName: {{ .Values.servers.audit.tls.secretName | default (printf "%s-audit-server-tls-cert" (include "gitops-reverser.fullname" .)) }}
  usages:
    - digital signature
    - key encipherment
diff --git a/charts/gitops-reverser/templates/configmap.yaml b/charts/gitops-reverser/templates/configmap.yaml
index d2b167b1..619a24b9 100644
--- a/charts/gitops-reverser/templates/configmap.yaml
+++ b/charts/gitops-reverser/templates/configmap.yaml
@@ -12,9 +12,9 @@ data:
    health:
      healthProbeBindAddress: {{ .Values.controllerManager.healthProbe.bindAddress }}
    metrics:
-     bindAddress: {{ .Values.controllerManager.metrics.bindAddress }}
+     bindAddress: {{ .Values.servers.metrics.bindAddress }}
    webhook:
-     port: {{ .Values.webhook.server.port }}
+     port: {{ .Values.servers.admission.port }}
    {{- if .Values.logging }}
    logging:
      level: {{ .Values.logging.level | default "info" }}
diff --git a/charts/gitops-reverser/templates/deployment.yaml b/charts/gitops-reverser/templates/deployment.yaml
index 56be5c60..1a311a31 100644
--- a/charts/gitops-reverser/templates/deployment.yaml
+++ b/charts/gitops-reverser/templates/deployment.yaml
@@ -40,22 +40,32 @@ spec:
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          args:
            - --health-probe-bind-address=:8081
-           - --metrics-bind-address=:8080
-           - --metrics-secure=false
-           - --webhook-cert-path={{ .Values.webhook.server.certPath }}
-           - --webhook-cert-name={{ .Values.webhook.server.certName }}
-           - --webhook-cert-key={{ .Values.webhook.server.certKey }}
-           - --audit-ingress-enabled={{ .Values.auditIngress.enabled }}
-           - --audit-listen-address=0.0.0.0
-           - --audit-port={{ .Values.auditIngress.port }}
-           - --audit-max-request-body-bytes={{ .Values.auditIngress.maxRequestBodyBytes }}
-           - --audit-read-timeout={{ .Values.auditIngress.timeouts.read }}
-           - --audit-write-timeout={{ .Values.auditIngress.timeouts.write }}
-           - --audit-idle-timeout={{ .Values.auditIngress.timeouts.idle }}
-           {{- if .Values.auditIngress.enabled }}
-           - --audit-cert-path={{ .Values.auditIngress.tls.certPath }}
-           - --audit-cert-name={{ .Values.auditIngress.tls.certName }}
-           - --audit-cert-key={{ .Values.auditIngress.tls.certKey }}
+           - --metrics-bind-address={{ .Values.servers.metrics.bindAddress }}
+           {{- if not .Values.servers.metrics.tls.enabled }}
+           - --metrics-insecure
+           {{- end }}
+           - --metrics-cert-path={{ .Values.servers.metrics.tls.certPath }}
+           - --metrics-cert-name={{ .Values.servers.metrics.tls.certName }}
+           - --metrics-cert-key={{ .Values.servers.metrics.tls.certKey }}
+           {{- if not .Values.servers.admission.tls.enabled }}
+           - --webhook-insecure
+           {{- end }}
+           - --webhook-cert-path={{ .Values.servers.admission.tls.certPath }}
+           - --webhook-cert-name={{ .Values.servers.admission.tls.certName }}
+           - --webhook-cert-key={{ .Values.servers.admission.tls.certKey }}
+           {{- if and .Values.servers.audit.enabled (not .Values.servers.audit.tls.enabled) }}
+           - --audit-insecure
+           {{- end }}
+           - --audit-listen-address={{ .Values.servers.audit.listenAddress }}
+           - --audit-port={{ .Values.servers.audit.port }}
+           - --audit-max-request-body-bytes={{ .Values.servers.audit.maxRequestBodyBytes }}
+           - --audit-read-timeout={{ .Values.servers.audit.timeouts.read }}
+           - --audit-write-timeout={{ .Values.servers.audit.timeouts.write }}
+           - --audit-idle-timeout={{ .Values.servers.audit.timeouts.idle }}
+           {{- if and .Values.servers.audit.enabled .Values.servers.audit.tls.enabled }}
+           - --audit-cert-path={{ .Values.servers.audit.tls.certPath }}
+           - --audit-cert-name={{ .Values.servers.audit.tls.certName }}
+           - --audit-cert-key={{ .Values.servers.audit.tls.certKey }}
            {{- end }}
            {{- if .Values.logging.level }}
            - --zap-log-level={{ .Values.logging.level }}
@@ -74,15 +84,15 @@ spec:
          {{- end }}
          ports:
            - name: webhook-server
-             containerPort: {{ .Values.webhook.server.port }}
+             containerPort: {{ .Values.servers.admission.port }}
              protocol: TCP
-           {{- if .Values.auditIngress.enabled }}
+           {{- if .Values.servers.audit.enabled }}
            - name: audit-server
-             containerPort: {{ .Values.auditIngress.port }}
+             containerPort: {{ .Values.servers.audit.port }}
              protocol: TCP
            {{- end }}
            - name: metrics
-             containerPort: {{ .Values.controllerManager.metrics.port }}
+             containerPort: {{ .Values.servers.metrics.port }}
              protocol: TCP
          env:
            - name: POD_NAME
@@ -117,12 +127,14 @@ spec:
          {{- end }}
            - name: tmp-dir
              mountPath: /tmp
+           {{- if .Values.servers.admission.tls.enabled }}
            - name: cert
-             mountPath: {{ .Values.webhook.server.certPath }}
+             mountPath: {{ .Values.servers.admission.tls.certPath }}
              readOnly: true
-           {{- if .Values.auditIngress.enabled }}
+           {{- end }}
+           {{- if and .Values.servers.audit.enabled .Values.servers.audit.tls.enabled }}
            - name: audit-cert
-             mountPath: {{ .Values.auditIngress.tls.certPath }}
+             mountPath: {{ .Values.servers.audit.tls.certPath }}
              readOnly: true
            {{- end }}
          {{- with .Values.volumeMounts }}
@@ -135,14 +147,16 @@ spec:
        {{- end }}
          - name: tmp-dir
            emptyDir: {}
+       {{- if .Values.servers.admission.tls.enabled }}
          - name: cert
            secret:
              secretName: {{ include "gitops-reverser.fullname" . }}-webhook-server-tls-cert
              defaultMode: 420
-       {{- if .Values.auditIngress.enabled }}
+       {{- end }}
+       {{- if and .Values.servers.audit.enabled .Values.servers.audit.tls.enabled }}
          - name: audit-cert
            secret:
-             secretName: {{ .Values.auditIngress.tls.secretName | default (printf "%s-audit-server-tls-cert" (include "gitops-reverser.fullname" .)) }}
+             secretName: {{ .Values.servers.audit.tls.secretName | default (printf "%s-audit-server-tls-cert" (include "gitops-reverser.fullname" .)) }}
            defaultMode: 420
        {{- end }}
        {{- with .Values.volumes }}
diff --git a/charts/gitops-reverser/templates/services.yaml b/charts/gitops-reverser/templates/services.yaml
index 25eb7b65..ebedc56f 100644
--- a/charts/gitops-reverser/templates/services.yaml
+++ b/charts/gitops-reverser/templates/services.yaml
@@ -1,61 +1,32 @@
-# We don't have a 'normal' API yet (a very simple one is running for our health probing): so there is no need yet for a generic service
---
apiVersion: v1
kind: Service
metadata:
-  name: {{ include "gitops-reverser.fullname" . }}-webhook
+  name: {{ include "gitops-reverser.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "gitops-reverser.labels" . | nindent 4 }}
-   app.kubernetes.io/component: webhook
+   app.kubernetes.io/component: controller
+   prometheus.io/scrape: "true"
+   prometheus.io/port: "{{ .Values.service.ports.metrics }}"
spec:
  type: ClusterIP
+ {{- if .Values.service.clusterIP }}
+ clusterIP: {{ .Values.service.clusterIP }}
+ {{- end }}
  ports:
    - name: webhook-server
-     port: 443
-     targetPort: {{ .Values.webhook.server.port }}
+     port: {{ .Values.service.ports.admission }}
+     targetPort: {{ .Values.servers.admission.port }}
      protocol: TCP
-  selector:
-   {{- include "gitops-reverser.selectorLabels" . | nindent 4 }}
---
-{{- if .Values.auditIngress.enabled }}
-apiVersion: v1
-kind: Service
-metadata:
-  name: {{ include "gitops-reverser.fullname" . }}-audit
-  namespace: {{ .Release.Namespace }}
-  labels:
-   {{- include "gitops-reverser.labels" . | nindent 4 }}
-   app.kubernetes.io/component: audit-ingress
-spec:
-  type: ClusterIP
-  {{- if .Values.auditIngress.clusterIP }}
-  clusterIP: {{ .Values.auditIngress.clusterIP }}
-  {{- end }}
-  ports:
+   {{- if .Values.servers.audit.enabled }}
    - name: audit-server
-     port: 443
-     targetPort: {{ .Values.auditIngress.port }}
+     port: {{ .Values.service.ports.audit }}
+     targetPort: {{ .Values.servers.audit.port }}
      protocol: TCP
-  selector:
-   {{- include "gitops-reverser.selectorLabels" . | nindent 4 }}
---
-{{- end }}
-apiVersion: v1
-kind: Service
-metadata:
-  name: {{ include "gitops-reverser.fullname" . }}-metrics
-  namespace: {{ .Release.Namespace }}
-  labels:
-   {{- include "gitops-reverser.labels" . | nindent 4 }}
-   prometheus.io/scrape: "true"
-   prometheus.io/port: "{{ .Values.controllerManager.metrics.port }}"
-spec:
-  type: ClusterIP
-  ports:
+   {{- end }}
    - name: metrics
-     port: {{ .Values.controllerManager.metrics.port }}
-     targetPort: {{ .Values.controllerManager.metrics.port }}
+     port: {{ .Values.service.ports.metrics }}
+     targetPort: {{ .Values.servers.metrics.port }}
      protocol: TCP
  selector:
    {{- include "gitops-reverser.selectorLabels" . | nindent 4 }}
diff --git a/charts/gitops-reverser/templates/validating-webhook.yaml b/charts/gitops-reverser/templates/validating-webhook.yaml
index ae4807a1..b6253e1d 100644
--- a/charts/gitops-reverser/templates/validating-webhook.yaml
+++ b/charts/gitops-reverser/templates/validating-webhook.yaml
@@ -14,7 +14,7 @@ webhooks:
      - v1
    clientConfig:
      service:
-       name: {{ include "gitops-reverser.fullname" . }}-webhook
+       name: {{ include "gitops-reverser.fullname" . }}
        namespace: {{ .Release.Namespace }}
        path: /process-validating-webhook
    {{- if not .Values.certificates.certManager.enabled }}
diff --git a/charts/gitops-reverser/values.yaml b/charts/gitops-reverser/values.yaml
index 6f8998c3..3a13e677 100644
--- a/charts/gitops-reverser/values.yaml
+++ b/charts/gitops-reverser/values.yaml
@@ -47,23 +47,51 @@ controllerManager:
  # Health probe configuration
  healthProbe:
    bindAddress: :8081

- # Metrics configuration
- metrics:
-   port: 8080
-   bindAddress: 127.0.0.1:8080
  # Enable HTTP/2 (disabled by default for security)
  enableHTTP2: false

-# Webhook configuration
-webhook:
- enabled: true
- # Webhook server configuration
- server:
+# HTTPS servers
+servers:
+ admission:
+   enabled: true
    port: 9443
-   certPath: "/tmp/k8s-webhook-server/serving-certs"
-   certName: "tls.crt"
-   certKey: "tls.key"
+   tls:
+     # Controls webhook TLS wiring in the controller process.
+     # Keep enabled for normal Kubernetes webhook operation.
+     enabled: true
+     certPath: "/tmp/k8s-webhook-server/serving-certs"
+     certName: "tls.crt"
+     certKey: "tls.key"
+ audit:
+   enabled: true
+   listenAddress: 0.0.0.0
+   port: 9444
+   tls:
+     # Serve audit ingress over HTTPS when true, HTTP when false.
+     enabled: true
+     certPath: "/tmp/k8s-audit-webhook-server/serving-certs"
+     certName: "tls.crt"
+     certKey: "tls.key"
+     secretName: ""
+   timeouts:
+     read: "15s"
+     write: "30s"
+     idle: "60s"
+   maxRequestBodyBytes: 10485760
+
+ metrics:
+   bindAddress: :8080
+   port: 8080
+   tls:
+     # Serve metrics over HTTPS when true, HTTP when false.
+     enabled: true
+     certPath: ""
+     certName: "tls.crt"
+     certKey: "tls.key"
+
+# Webhook behavior
+webhook:
  audit:
    # Set to true if you want to write events to /var/run/audit-dumps
    debugDumps: false
@@ -102,24 +130,6 @@ certificates:
      kind: Issuer
      create: true

-# Dedicated audit ingress server configuration
-auditIngress:
- enabled: true
- # Dedicated audit HTTPS listener port in the controller container
- port: 9444
- # Optional fixed ClusterIP (useful for Kind/bootstrap environments before DNS is ready)
- clusterIP: ""
- tls:
-   certPath: "/tmp/k8s-audit-webhook-server/serving-certs"
-   certName: "tls.crt"
-   certKey: "tls.key"
-   secretName: ""
- timeouts:
-   read: "15s"
-   write: "30s"
-   idle: "60s"
- maxRequestBodyBytes: 10485760
-
# RBAC configuration
rbac:
  create: true
@@ -167,8 +177,19 @@ monitoring:
  path: /metrics
  # Port name on the metrics Service (see templates/services.yaml)
  port: metrics
- # Plain HTTP by default (--metrics-secure=false)
- scheme: http
+ # Must match the effective metrics transport:
+ # - https when servers.metrics.tls.enabled=true
+ # - http when servers.metrics.tls.enabled=false
+ scheme: https
+
+# Service exposure
+service:
+ # Optional fixed ClusterIP (useful for Kind/bootstrap environments before DNS is ready)
+ clusterIP: ""
+ ports:
+   admission: 443
+   audit: 9444
+   metrics: 8080

# Logging configuration
logging:
diff --git a/cmd/main.go b/cmd/main.go
index 33fb4619..6b2315c2 100644
--- a/cmd/main.go
+++ b/cmd/main.go
@@ -93,7 +93,9 @@ func main() {
    // Log metrics configuration for debugging
    setupLog.Info("Metrics configuration",
        "metrics-bind-address", cfg.metricsAddr,
-       "metrics-secure", cfg.secureMetrics)
+       "metrics-insecure", cfg.metricsInsecure,
+       "webhook-insecure", cfg.webhookInsecure,
+       "audit-insecure", cfg.auditInsecure)

    // Initialize metrics
    setupCtx := ctrl.SetupSignalHandler()
@@ -105,10 +107,11 @@ func main() {

    // Servers and cert watchers
    webhookServer, webhookCertWatcher := initWebhookServer(
+       !cfg.webhookInsecure,
        cfg.webhookCertPath, cfg.webhookCertName, cfg.webhookCertKey, tlsOpts,
    )
    metricsServerOptions, metricsCertWatcher := buildMetricsServerOptions(
-       cfg.metricsAddr, cfg.secureMetrics,
+       cfg.metricsAddr, !cfg.metricsInsecure,
        cfg.metricsCertPath, cfg.metricsCertName, cfg.metricsCertKey, tlsOpts,
    )
@@ -207,24 +210,21 @@ func main() {
    fatalIfErr(err, "unable to create audit handler")

    var auditCertWatcher *certwatcher.CertWatcher
-   if cfg.auditIngressEnabled {
-       auditRunnable, watcher, initErr := initAuditServerRunnable(cfg, tlsOpts, auditHandler)
-       fatalIfErr(initErr, "unable to initialize audit ingress server")
-       auditCertWatcher = watcher
-       fatalIfErr(mgr.Add(auditRunnable), "unable to add audit ingress server runnable")
-
-       if cfg.auditDumpPath != "" {
-           setupLog.Info("Audit ingress server configured with file dumping",
-               "http-path", "/audit-webhook/{clusterID}",
-               "dump-path", cfg.auditDumpPath,
-               "address", buildAuditServerAddress(cfg.auditListenAddress, cfg.auditPort))
-       } else {
-           setupLog.Info("Audit ingress server configured",
-               "http-path", "/audit-webhook/{clusterID}",
-               "address", buildAuditServerAddress(cfg.auditListenAddress, cfg.auditPort))
-       }
+
+   auditRunnable, watcher, initErr := initAuditServerRunnable(cfg, tlsOpts, auditHandler)
+   fatalIfErr(initErr, "unable to initialize audit ingress server")
+   auditCertWatcher = watcher
+   fatalIfErr(mgr.Add(auditRunnable), "unable to add audit ingress server runnable")
+
+   if cfg.auditDumpPath != "" {
+       setupLog.Info("Audit ingress server configured with file dumping",
+           "http-path", "/audit-webhook/{clusterID}",
+           "dump-path", cfg.auditDumpPath,
+           "address", buildAuditServerAddress(cfg.auditListenAddress, cfg.auditPort))
    } else {
-       setupLog.Info("Audit ingress server disabled by flag", "flag", "--audit-ingress-enabled=false")
+       setupLog.Info("Audit ingress server configured",
+           "http-path", "/audit-webhook/{clusterID}",
+           "address", buildAuditServerAddress(cfg.auditListenAddress, cfg.auditPort))
    }

    // NOTE: Old git.Worker has been replaced by WorkerManager + BranchWorker architecture
@@ -275,15 +275,16 @@ type appConfig struct {
    webhookCertName string
    webhookCertKey  string
    probeAddr       string
-   secureMetrics   bool
+   metricsInsecure bool
+   webhookInsecure bool
    enableHTTP2     bool

    auditDumpPath            string
-   auditIngressEnabled      bool
    auditListenAddress       string
    auditPort                int
    auditCertPath            string
    auditCertName            string
    auditCertKey             string
+   auditInsecure            bool
    auditMaxRequestBodyBytes int64
    auditReadTimeout         time.Duration
    auditWriteTimeout        time.Duration
@@ -307,40 +308,29 @@ func parseFlagsWithArgs(fs *flag.FlagSet, args []string) (appConfig, error) {
    fs.StringVar(&cfg.metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+
        "Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.")
    fs.StringVar(&cfg.probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
-   fs.BoolVar(&cfg.secureMetrics, "metrics-secure", true,
-       "If set, the metrics endpoint is served securely via HTTPS. Use --metrics-secure=false to use HTTP instead.")
-   fs.StringVar(
-       &cfg.webhookCertPath,
-       "webhook-cert-path",
-       "",
-       "The directory that contains the webhook certificate.",
-   )
-   fs.StringVar(&cfg.webhookCertName, "webhook-cert-name", "tls.crt", "The name of the webhook certificate file.")
-   fs.StringVar(&cfg.webhookCertKey, "webhook-cert-key", "tls.key", "The name of the webhook key file.")
-   fs.StringVar(&cfg.metricsCertPath, "metrics-cert-path", "",
-       "The directory that contains the metrics server certificate.")
-   fs.StringVar(
+   fs.BoolVar(&cfg.metricsInsecure, "metrics-insecure", false,
+       "If set, the metrics endpoint is served via HTTP instead of HTTPS.")
+   bindServerCertFlags(fs, "webhook", "webhook", &cfg.webhookCertPath, &cfg.webhookCertName, &cfg.webhookCertKey)
+   bindServerCertFlags(
+       fs,
+       "metrics",
+       "metrics server",
+       &cfg.metricsCertPath,
        &cfg.metricsCertName,
-       "metrics-cert-name",
-       "tls.crt",
-       "The name of the metrics server certificate file.",
+       &cfg.metricsCertKey,
    )
-   fs.StringVar(&cfg.metricsCertKey, "metrics-cert-key", "tls.key", "The name of the metrics server key file.")
+   fs.BoolVar(&cfg.webhookInsecure, "webhook-insecure", false,
+       "If set, webhook server certificate watching and TLS wiring are disabled for local test/play usage.")
    fs.BoolVar(&cfg.enableHTTP2, "enable-http2", false,
        "If set, HTTP/2 will be enabled for the metrics and webhook servers")
    fs.StringVar(&cfg.auditDumpPath, "audit-dump-path", "",
        "Directory to write audit events for debugging. If empty, audit event file dumping is disabled.")
-   fs.BoolVar(&cfg.auditIngressEnabled, "audit-ingress-enabled", true,
-       "Enable the dedicated HTTPS audit ingress server.")
    fs.StringVar(&cfg.auditListenAddress, "audit-listen-address", "0.0.0.0",
        "IP address for the dedicated audit ingress HTTPS server.")
    fs.IntVar(&cfg.auditPort, "audit-port", defaultAuditPort, "Port for the dedicated audit ingress HTTPS server.")
-   fs.StringVar(&cfg.auditCertPath, "audit-cert-path", "",
-       "The directory that contains the audit ingress TLS certificate and key.")
-   fs.StringVar(&cfg.auditCertName, "audit-cert-name", "tls.crt",
-       "The name of the audit ingress TLS certificate file.")
-   fs.StringVar(&cfg.auditCertKey, "audit-cert-key", "tls.key",
-       "The name of the audit ingress TLS key file.")
+   bindServerCertFlags(fs, "audit", "audit ingress TLS", &cfg.auditCertPath, &cfg.auditCertName, &cfg.auditCertKey)
+   fs.BoolVar(&cfg.auditInsecure, "audit-insecure", false,
+       "If set, the audit ingress endpoint is served via HTTP instead of HTTPS.")
    fs.Int64Var(&cfg.auditMaxRequestBodyBytes, "audit-max-request-body-bytes", defaultAuditMaxBodyBytes,
        "Maximum request body size in bytes accepted by the audit ingress handler.")
    fs.DurationVar(&cfg.auditReadTimeout, "audit-read-timeout", defaultAuditReadTimeout,
@@ -360,21 +350,29 @@ func parseFlagsWithArgs(fs *flag.FlagSet, args []string) (appConfig, error) {
    if err := fs.Parse(args); err != nil {
        return appConfig{}, err
    }
-   if cfg.auditPort <= 0 {
-       return appConfig{}, fmt.Errorf("audit-port must be > 0, got %d", cfg.auditPort)
-   }
-   if cfg.auditMaxRequestBodyBytes <= 0 {
-       return appConfig{}, fmt.Errorf("audit-max-request-body-bytes must be > 0, got %d", cfg.auditMaxRequestBodyBytes)
-   }
-   if cfg.auditReadTimeout <= 0 {
-       return appConfig{}, fmt.Errorf("audit-read-timeout must be > 0, got %s", cfg.auditReadTimeout)
-   }
-   if cfg.auditWriteTimeout <= 0 {
-       return appConfig{}, fmt.Errorf("audit-write-timeout must be > 0, got %s", cfg.auditWriteTimeout)
-   }
-   if cfg.auditIdleTimeout <= 0 {
-       return appConfig{}, fmt.Errorf("audit-idle-timeout must be > 0, got %s", cfg.auditIdleTimeout)
+   applyAuditCertFallbacks(&cfg)
+   if err := validateAuditConfig(cfg); err != nil {
+       return appConfig{}, err
    }
+
+   return cfg, nil
+}
+
+func bindServerCertFlags(
+   fs *flag.FlagSet,
+   prefix string,
+   component string,
+   certPath, certName, certKey *string,
+) {
+   fs.StringVar(certPath, fmt.Sprintf("%s-cert-path", prefix), "",
+       fmt.Sprintf("The directory that contains the %s certificate.", component))
+   fs.StringVar(certName, fmt.Sprintf("%s-cert-name", prefix), "tls.crt",
+       fmt.Sprintf("The name of the %s certificate file.", component))
+   fs.StringVar(certKey, fmt.Sprintf("%s-cert-key", prefix), "tls.key",
+       fmt.Sprintf("The name of the %s key file.", component))
+}
+
+func applyAuditCertFallbacks(cfg *appConfig) {
    if cfg.auditCertPath == "" {
        cfg.auditCertPath = cfg.webhookCertPath
    }
@@ -384,8 +382,25 @@ func parseFlagsWithArgs(fs *flag.FlagSet, args []string) (appConfig, error) {
    if cfg.auditCertKey == "" {
        cfg.auditCertKey = cfg.webhookCertKey
    }
+}

-   return cfg, nil
+func validateAuditConfig(cfg appConfig) error {
+   if cfg.auditPort <= 0 {
+       return fmt.Errorf("audit-port must be > 0, got %d", cfg.auditPort)
+   }
+   if cfg.auditMaxRequestBodyBytes <= 0 {
+       return fmt.Errorf("audit-max-request-body-bytes must be > 0, got %d", cfg.auditMaxRequestBodyBytes)
+   }
+   if cfg.auditReadTimeout <= 0 {
+       return fmt.Errorf("audit-read-timeout must be > 0, got %s", cfg.auditReadTimeout)
+   }
+   if cfg.auditWriteTimeout <= 0 {
+       return fmt.Errorf("audit-write-timeout must be > 0, got %s", cfg.auditWriteTimeout)
+   }
+   if cfg.auditIdleTimeout <= 0 {
+       return fmt.Errorf("audit-idle-timeout must be > 0, got %s", cfg.auditIdleTimeout)
+   }
+   return nil
}

// fatalIfErr logs and exits the process if err is not nil.
@@ -417,27 +432,16 @@ func buildTLSOptions(enableHTTP2 bool) []func(*tls.Config) {

// initWebhookServer initializes the webhook server and, if configured, a cert watcher.
func initWebhookServer(
+   tlsEnabled bool,
    certPath, certName, certKey string,
    baseTLS []func(*tls.Config),
) (webhook.Server, *certwatcher.CertWatcher) {
-   webhookTLSOpts := append([]func(*tls.Config){}, baseTLS...)
-   var webhookCertWatcher *certwatcher.CertWatcher
-
-   if len(certPath) > 0 {
-       setupLog.Info("Initializing webhook certificate watcher using provided certificates",
-           "webhook-cert-path", certPath, //nolint:lll // Structured log with many fields
-           "webhook-cert-name", certName, "webhook-cert-key", certKey)
-
-       var err error
-       webhookCertWatcher, err = certwatcher.New(
-           filepath.Join(certPath, certName),
-           filepath.Join(certPath, certKey),
-       )
-       fatalIfErr(err, "Failed to initialize webhook certificate watcher")
-
-       webhookTLSOpts = append(webhookTLSOpts, func(config *tls.Config) {
-           config.GetCertificate = webhookCertWatcher.GetCertificate
-       })
+   webhookTLSOpts, webhookCertWatcher, err := buildTLSRuntime(
+       tlsEnabled, false, "webhook", certPath, certName, certKey, baseTLS,
+   )
+   fatalIfErr(err, "failed to initialize webhook TLS runtime")
+   if !tlsEnabled {
+       setupLog.Info("Webhook insecure mode enabled; skipping webhook certificate watcher wiring")
    }

    server := webhook.NewServer(webhook.Options{TLSOpts: webhookTLSOpts})
@@ -451,10 +455,15 @@ func buildMetricsServerOptions(
    certPath, certName, certKey string,
    baseTLS []func(*tls.Config),
) (metricsserver.Options, *certwatcher.CertWatcher) {
+   tlsOpts, metricsCertWatcher, err := buildTLSRuntime(
+       secureMetrics, false, "metrics", certPath, certName, certKey, baseTLS,
+   )
+   fatalIfErr(err, "failed to initialize metrics TLS runtime")
+
    opts := metricsserver.Options{
        BindAddress:   metricsAddr,
        SecureServing: secureMetrics,
-       TLSOpts:       baseTLS,
+       TLSOpts:       tlsOpts,
    }

    if secureMetrics {
@@ -465,29 +474,18 @@
        opts.FilterProvider = filters.WithAuthenticationAndAuthorization
    }

-   var metricsCertWatcher *certwatcher.CertWatcher
-   if len(certPath) > 0 {
-       setupLog.Info("Initializing metrics certificate watcher using provided certificates",
-           "metrics-cert-path", certPath, //nolint:lll // Structured log with many fields
-           "metrics-cert-name", certName, "metrics-cert-key", certKey)
-
-       var err error
-       metricsCertWatcher, err = certwatcher.New(
-           filepath.Join(certPath, certName),
-           filepath.Join(certPath, certKey),
-       )
-       fatalIfErr(err, "to initialize metrics certificate watcher", "error", err)
-
-       opts.TLSOpts = append(opts.TLSOpts, func(config *tls.Config) {
-           config.GetCertificate = metricsCertWatcher.GetCertificate
-       })
-   }
-
    return opts, metricsCertWatcher
}

type auditServerRunnable struct {
-   server *http.Server
+   server     *http.Server
+   tlsEnabled bool
+}
+
+type serverTimeouts struct {
+   read  time.Duration
+   write time.Duration
+   idle  time.Duration
}

func (r *auditServerRunnable) Start(ctx context.Context) error {
@@ -504,7 +502,12 @@ func (r *auditServerRunnable) Start(ctx context.Context) error {
        }
    }()

-   err := r.server.ListenAndServeTLS("", "")
+   var err error
+   if r.tlsEnabled {
+       err = r.server.ListenAndServeTLS("", "")
+   } else {
+       err = r.server.ListenAndServe()
+   }
    <-shutdownDone
    if errors.Is(err, http.ErrServerClosed) {
        return nil
@@ -517,41 +520,34 @@ func initAuditServerRunnable(
    baseTLS []func(*tls.Config),
    handler http.Handler,
) (*auditServerRunnable, *certwatcher.CertWatcher, error) {
-   if strings.TrimSpace(cfg.auditCertPath) == "" {
-       return nil, nil, errors.New("audit-cert-path is required when audit ingress is enabled")
-   }
-
-   certWatcher, err := certwatcher.New(
-       filepath.Join(cfg.auditCertPath, cfg.auditCertName),
-       filepath.Join(cfg.auditCertPath, cfg.auditCertKey),
+   tlsEnabled := !cfg.auditInsecure
+   tlsOpts, certWatcher, err := buildTLSRuntime(
+       tlsEnabled, true, "audit ingress", cfg.auditCertPath, cfg.auditCertName, cfg.auditCertKey, baseTLS,
    )
    if err != nil {
-       return nil, nil, fmt.Errorf("failed to initialize audit ingress certificate watcher: %w", err)
+       return nil, nil, err
    }

-   tlsOpts := append([]func(*tls.Config){}, baseTLS...)
-   tlsOpts = append(tlsOpts, func(config *tls.Config) {
-       config.GetCertificate = certWatcher.GetCertificate
-   })
-
-   serverTLS := &tls.Config{
-       MinVersion: tls.VersionTLS12,
-   }
-   for _, opt := range tlsOpts {
-       opt(serverTLS)
+   var serverTLS *tls.Config
+   if tlsEnabled {
+       serverTLS = buildServerTLSConfig(tlsOpts)
+   } else {
+       setupLog.Info("Audit ingress TLS disabled; serving plain HTTP for audit ingress")
    }

    mux := buildAuditServeMux(handler)
-   server := &http.Server{
-       Addr:         buildAuditServerAddress(cfg.auditListenAddress, cfg.auditPort),
-       Handler:      mux,
-       TLSConfig:    serverTLS,
-       ReadTimeout:  cfg.auditReadTimeout,
-       WriteTimeout: cfg.auditWriteTimeout,
-       IdleTimeout:  cfg.auditIdleTimeout,
-   }
+   server := buildHTTPServer(
+       buildAuditServerAddress(cfg.auditListenAddress, cfg.auditPort),
+       mux,
+       serverTLS,
+       serverTimeouts{
+           read:  cfg.auditReadTimeout,
+           write: cfg.auditWriteTimeout,
+           idle:  cfg.auditIdleTimeout,
+       },
+   )

-   return &auditServerRunnable{server: server}, certWatcher, nil
+   return &auditServerRunnable{server: server, tlsEnabled: tlsEnabled}, certWatcher, nil
}

func buildAuditServeMux(handler http.Handler) *http.ServeMux {
@@ -568,6 +564,68 @@ func buildAuditServerAddress(listenAddress string, port int) string {
    return net.JoinHostPort(listenAddress, strconv.Itoa(port))
}

+func buildServerTLSConfig(tlsOpts []func(*tls.Config)) *tls.Config {
+   serverTLS := &tls.Config{MinVersion: tls.VersionTLS12}
+   for _, opt := range tlsOpts {
+       opt(serverTLS)
+   }
+   return serverTLS
+}
+
+func buildTLSRuntime(
+   tlsEnabled bool,
+   requireCert bool,
+   component string,
+   certPath, certName, certKey string,
+   baseTLS []func(*tls.Config),
+) ([]func(*tls.Config), *certwatcher.CertWatcher, error) {
+   tlsOpts := append([]func(*tls.Config){}, baseTLS...)
+   if !tlsEnabled {
+       return tlsOpts, nil, nil
+   }
+
+   if strings.TrimSpace(certPath) == "" {
+       if requireCert {
+           return nil, nil, fmt.Errorf("%s-cert-path is required when %s TLS is enabled", component, component)
+       }
+       return tlsOpts, nil, nil
+   }
+
+   setupLog.Info("Initializing certificate watcher using provided certificates",
+       "component", component,
+       "cert-path", certPath,
+       "cert-name", certName,
+       "cert-key", certKey)
+
+   certWatcher, err := newCertWatcher(certPath, certName, certKey)
+   if err != nil {
+       return nil, nil, fmt.Errorf("failed to initialize %s certificate watcher: %w", component, err)
+   }
+
+   tlsOpts = append(tlsOpts, func(config *tls.Config) {
+       config.GetCertificate = certWatcher.GetCertificate
+   })
+   return tlsOpts, certWatcher, nil
+}
+
+func buildHTTPServer(addr string, handler http.Handler, tlsConfig *tls.Config, timeouts serverTimeouts) *http.Server {
+   return &http.Server{
+       Addr:         addr,
+       Handler:      handler,
+       TLSConfig:    tlsConfig,
+       ReadTimeout:  timeouts.read,
+       WriteTimeout: timeouts.write,
+       IdleTimeout:  timeouts.idle,
+   }
+}
+
+func newCertWatcher(certPath, certName, certKey string) (*certwatcher.CertWatcher, error) {
+   return certwatcher.New(
+       filepath.Join(certPath, certName),
+       filepath.Join(certPath, certKey),
+   )
+}
+
// newManager creates a new controller-runtime Manager with common options.
func newManager(
    metricsOptions metricsserver.Options,
@@ -592,17 +650,21 @@ func addCertWatchersToManager(
    mgr ctrl.Manager,
    metricsCertWatcher, webhookCertWatcher, auditCertWatcher *certwatcher.CertWatcher,
) {
-   if metricsCertWatcher != nil {
-       setupLog.Info("Adding metrics certificate watcher to manager")
-       fatalIfErr(mgr.Add(metricsCertWatcher), "unable to add metrics certificate watcher to manager")
-   }
-   if webhookCertWatcher != nil {
-       setupLog.Info("Adding webhook certificate watcher to manager")
-       fatalIfErr(mgr.Add(webhookCertWatcher), "unable to add webhook certificate watcher to manager")
-   }
-   if auditCertWatcher != nil {
-       setupLog.Info("Adding audit ingress certificate watcher to manager")
-       fatalIfErr(mgr.Add(auditCertWatcher), "unable to add audit ingress certificate watcher to manager")
+   watchers := []struct {
+       component string
+       watcher   *certwatcher.CertWatcher
+   }{
+       {component: "metrics", watcher: metricsCertWatcher},
+       {component: "webhook", watcher: webhookCertWatcher},
+       {component: "audit ingress", watcher: auditCertWatcher},
+   }
+
+   for _, item := range watchers {
+       if item.watcher == nil {
+           continue
+       }
+       setupLog.Info("Adding certificate watcher to manager", "component", item.component)
+       fatalIfErr(mgr.Add(item.watcher), "unable to add certificate watcher to manager", "component", item.component)
    }
}
diff --git a/cmd/main_audit_server_test.go b/cmd/main_audit_server_test.go
index e1a5915f..a644eab9 100644
--- a/cmd/main_audit_server_test.go
+++ b/cmd/main_audit_server_test.go
@@ -33,7 +33,9 @@ func TestParseFlagsWithArgs_Defaults(t *testing.T) {
    cfg, err := parseFlagsWithArgs(fs, []string{})
    require.NoError(t, err)

-   assert.True(t, cfg.auditIngressEnabled)
+   assert.False(t, cfg.webhookInsecure)
+   assert.False(t, cfg.metricsInsecure)
+   assert.False(t, cfg.auditInsecure)
    assert.Equal(t, "0.0.0.0", cfg.auditListenAddress)
    assert.Equal(t, 9444, cfg.auditPort)
    assert.Equal(t, int64(10485760), cfg.auditMaxRequestBodyBytes)
@@ -42,6 +44,17 @@ func TestParseFlagsWithArgs_Defaults(t *testing.T) {
    assert.Equal(t, 60*time.Second, cfg.auditIdleTimeout)
}

+func TestParseFlagsWithArgs_AuditInsecure(t *testing.T) {
+   fs := flag.NewFlagSet("test-audit-insecure", flag.ContinueOnError)
+   args := []string{
+       "--audit-insecure",
+   }
+
+   cfg, err := parseFlagsWithArgs(fs, args)
+   require.NoError(t, err)
+   assert.True(t, cfg.auditInsecure)
+}
+
func TestParseFlagsWithArgs_CustomAuditValues(t *testing.T) {
    fs := flag.NewFlagSet("test-custom", flag.ContinueOnError)
    args := []string{
diff --git a/config/default/audit_webhook_service_fixed_ip_patch.yaml b/config/default/audit_webhook_service_fixed_ip_patch.yaml
deleted file mode 100644
index f7fe3197..00000000
--- a/config/default/audit_webhook_service_fixed_ip_patch.yaml
+++ /dev/null
@@ -1,10 +0,0 @@
-# Patch to set fixed ClusterIP for dedicated audit webhook service.
-# This is required because kube-apiserver starts before CoreDNS
-# and cannot rely on Service DNS resolution during bootstrap.
-apiVersion: v1
-kind: Service
-metadata:
-  name: audit-webhook-service
-  namespace: system
-spec:
-  clusterIP: 10.96.200.200
diff --git a/config/default/kustomization.yaml b/config/default/kustomization.yaml
index a73d8138..231903e2 100644
--- a/config/default/kustomization.yaml
+++ b/config/default/kustomization.yaml
@@ -25,27 +25,14 @@ resources:
- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus
-# [METRICS] Expose the controller manager metrics service.
-- metrics_service.yaml
# [NETWORK POLICY] Protect the /metrics endpoint and Webhook Server with NetworkPolicy.
# Only Pod(s) running a namespace labeled with 'metrics: enabled' will be able to gather the metrics.
# Only CR(s) which requires webhooks and are applied on namespaces labeled with 'webhooks: enabled' will
# be able to communicate with the Webhook Server.
#- ../network-policy

-# Uncomment the patches line if you enable Metrics
+# Patches are only kept for optional feature wiring.
patches:
-# [METRICS] The following patch will enable the metrics endpoint using HTTPS and the port :8443.
-# More info: https://book.kubebuilder.io/reference/metrics
-- path: manager_metrics_patch.yaml
-  target:
-    kind: Deployment
-
-# [AUDIT-WEBHOOK] Set fixed ClusterIP for dedicated audit webhook service so kube-apiserver can connect before CoreDNS.
-- path: audit_webhook_service_fixed_ip_patch.yaml
-  target:
-    kind: Service
-    name: audit-webhook-service

# Uncomment the patches line if you enable Metrics and CertManager
# [METRICS-WITH-CERTS] To enable metrics protected with certManager, uncomment the following line.
@@ -126,7 +113,7 @@ replacements:
  - source: # Uncomment the following block if you have any webhook
      kind: Service
      version: v1
-     name: webhook-service
+     name: service
      fieldPath: .metadata.name # Name of the service
    targets:
      - select:
@@ -144,7 +131,7 @@ replacements:
  - source:
      kind: Service
      version: v1
-     name: webhook-service
+     name: service
      fieldPath: .metadata.namespace # Namespace of the service
    targets:
      - select:
@@ -163,7 +150,7 @@ replacements:
  - source:
      kind: Service
      version: v1
-     name: audit-webhook-service
+     name: service
      fieldPath: .metadata.name # Name of the dedicated audit service
    targets:
      - select:
@@ -181,7 +168,7 @@ replacements:
  - source:
      kind: Service
      version: v1
-     name: audit-webhook-service
+     name: service
      fieldPath: .metadata.namespace # Namespace of the dedicated audit service
    targets:
      - select:
diff --git a/config/default/manager_metrics_patch.yaml b/config/default/manager_metrics_patch.yaml
deleted file mode 100644
index 2aaef653..00000000
--- a/config/default/manager_metrics_patch.yaml
+++ /dev/null
@@ -1,4 +0,0 @@
-# This patch adds the args to allow exposing the metrics endpoint using HTTPS
-- op: add
-  path: /spec/template/spec/containers/0/args/0
-  value: --metrics-bind-address=:8443
diff --git a/config/default/manager_webhook_patch.yaml b/config/default/manager_webhook_patch.yaml
index eb372030..2e716303 100644
--- a/config/default/manager_webhook_patch.yaml
+++ b/config/default/manager_webhook_patch.yaml
@@ -22,31 +22,10 @@
    name: webhook-server
    protocol: TCP

-# Add the dedicated audit ingress server arguments.
-- op: add
-  path: /spec/template/spec/containers/0/args/-
-  value: --audit-ingress-enabled=true
-- op: add
-  path: /spec/template/spec/containers/0/args/-
-  value: --audit-listen-address=0.0.0.0
-- op: add
-  path: /spec/template/spec/containers/0/args/-
-  value: --audit-port=9444
+# Add the dedicated audit ingress certificate path.
- op: add
  path: /spec/template/spec/containers/0/args/-
  value: --audit-cert-path=/tmp/k8s-audit-webhook-server/serving-certs
-- op: add
-  path: /spec/template/spec/containers/0/args/-
-  value: --audit-max-request-body-bytes=10485760
-- op: add
-  path: /spec/template/spec/containers/0/args/-
-  value: --audit-read-timeout=15s
-- op: add
-  path: /spec/template/spec/containers/0/args/-
-  value: --audit-write-timeout=30s
-- op: add
-  path: /spec/template/spec/containers/0/args/-
-  value: --audit-idle-timeout=60s

# Add the volumeMount for dedicated audit webhook certificates.
- op: add diff --git a/config/default/metrics_service.yaml b/config/default/metrics_service.yaml deleted file mode 100644 index 4dff8901..00000000 --- a/config/default/metrics_service.yaml +++ /dev/null @@ -1,18 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - labels: - control-plane: controller-manager - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: controller-manager-metrics-service - namespace: system -spec: - ports: - - name: https - port: 8443 - protocol: TCP - targetPort: 8443 - selector: - control-plane: controller-manager - app.kubernetes.io/name: gitops-reverser diff --git a/config/default/webhook_service_fixed_ip_patch.yaml b/config/default/webhook_service_fixed_ip_patch.yaml deleted file mode 100644 index 3e647bfe..00000000 --- a/config/default/webhook_service_fixed_ip_patch.yaml +++ /dev/null @@ -1,11 +0,0 @@ -# Patch to set fixed ClusterIP for webhook service -# This is required for audit webhook to work because kube-apiserver -# starts before CoreDNS, so DNS resolution (.svc.cluster.local) fails. -# Using a fixed IP allows kube-apiserver to connect on startup. 
-apiVersion: v1
-kind: Service
-metadata:
-  name: webhook-service
-  namespace: system
-spec:
-  clusterIP: 10.96.200.200
\ No newline at end of file
diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml
index 6437af9d..c9ee57d8 100644
--- a/config/manager/manager.yaml
+++ b/config/manager/manager.yaml
@@ -61,6 +61,7 @@ spec:
       - command:
         - /manager
         args:
+        - --metrics-bind-address=:8443
         - --health-probe-bind-address=:8081
         image: controller:latest
         name: manager
diff --git a/config/webhook/audit-service.yaml b/config/webhook/audit-service.yaml
deleted file mode 100644
index 60aa332e..00000000
--- a/config/webhook/audit-service.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-apiVersion: v1
-kind: Service
-metadata:
-  labels:
-    app.kubernetes.io/name: gitops-reverser
-    app.kubernetes.io/managed-by: kustomize
-  name: audit-webhook-service
-  namespace: system
-spec:
-  ports:
-  - port: 443
-    protocol: TCP
-    targetPort: 9444
-  selector:
-    control-plane: controller-manager
-    app.kubernetes.io/name: gitops-reverser
diff --git a/config/webhook/kustomization.yaml b/config/webhook/kustomization.yaml
index 992a7085..7bfe5442 100644
--- a/config/webhook/kustomization.yaml
+++ b/config/webhook/kustomization.yaml
@@ -1,7 +1,12 @@
 resources:
 - manifests.yaml
 - service.yaml
-- audit-service.yaml
 
 configurations:
 - kustomizeconfig.yaml
+
+patches:
+- path: webhook_service_name_patch.yaml
+  target:
+    kind: ValidatingWebhookConfiguration
+    name: validating-webhook-configuration
diff --git a/config/webhook/service.yaml b/config/webhook/service.yaml
index 8f892bce..4973d28e 100644
--- a/config/webhook/service.yaml
+++ b/config/webhook/service.yaml
@@ -4,13 +4,23 @@ metadata:
   labels:
     app.kubernetes.io/name: gitops-reverser
     app.kubernetes.io/managed-by: kustomize
-  name: webhook-service
+  name: service
   namespace: system
 spec:
+  clusterIP: 10.96.200.200 # This is required because kube-apiserver starts before CoreDNS
   ports:
-  - port: 443
+  - name: webhook-server
+    port: 443
     protocol: TCP
     targetPort: 9443
+  - name: audit-server
+    port: 9444
+    protocol: TCP
+    targetPort: 9444
+  - name: metrics
+    port: 8443
+    protocol: TCP
+    targetPort: 8443
   selector:
     control-plane: controller-manager
     app.kubernetes.io/name: gitops-reverser
diff --git a/config/webhook/webhook_service_name_patch.yaml b/config/webhook/webhook_service_name_patch.yaml
new file mode 100644
index 00000000..fc2aa479
--- /dev/null
+++ b/config/webhook/webhook_service_name_patch.yaml
@@ -0,0 +1,9 @@
+apiVersion: admissionregistration.k8s.io/v1
+kind: ValidatingWebhookConfiguration
+metadata:
+  name: validating-webhook-configuration
+webhooks:
+- name: gitops-reverser.configbutler.ai
+  clientConfig:
+    service:
+      name: service
diff --git a/docs/design/https-server-alignment-and-service-plan.md b/docs/design/https-server-alignment-and-service-plan.md
index 87606454..260f70b5 100644
--- a/docs/design/https-server-alignment-and-service-plan.md
+++ b/docs/design/https-server-alignment-and-service-plan.md
@@ -20,6 +20,7 @@ Run with **a single pod** for the current phase.
 - [ ] HA-specific behavior is disabled/ignored by default.
 - [ ] Leader-only Service has been removed from active topology.
 - [ ] HA reintroduction is explicitly deferred to the planned rewrite.
+- [ ] Service exposure is consolidated to a single Service named only `{{ include "gitops-reverser.fullname" . }}`.
 
 ## Current Constraints
 
@@ -46,6 +47,51 @@ Use **one Service with three ports**:
 
 This minimizes moving parts for the interim single-pod phase.
 
+### Service naming decision
+
+Use a single Service with the base release fullname only:
+
+- target name: `{{ include "gitops-reverser.fullname" . }}`
+- avoid suffixes such as `-webhook`, `-audit`, `-metrics` for the primary Service
+- keep distinct named ports for routing/monitoring clarity
+
+## Single Service Necessity Analysis
+
+### Is a single Service still needed?
+
+Yes for this phase, and there is no strong technical reason to keep separate Services right now.
+
+### Why this still makes sense now
+
+- Current services all select the same controller Pod labels, so they do not provide workload isolation.
+- Single-replica mode removes the previous leader-vs-all selector split that justified separate routing.
+- Operationally, one stable Service name simplifies client configuration and day-2 debugging.
+- The design already requires one endpoint surface with different ports, which matches Service named ports well.
+
+### What currently depends on split service names (implementation impact)
+
+- `charts/gitops-reverser/templates/validating-webhook.yaml` references the `-webhook` service name.
+- `charts/gitops-reverser/templates/certificates.yaml` SANs include `-webhook` and `-audit` DNS names.
+- `charts/gitops-reverser/templates/servicemonitor.yaml` and e2e checks currently expect a dedicated metrics service identity.
+- `test/e2e/e2e_test.go` asserts `gitops-reverser-webhook-service` and `gitops-reverser-audit-webhook-service`.
+
+### Conclusion
+
+- Keep the plan to converge to **one Service**.
+- Use one canonical Service name only (release fullname).
+- Keep multiple ports; do not keep multiple Services unless HA/service-isolation requirements return.
+
+## No-Compatibility Decision
+
+For this refactor, use a direct switch without migration compatibility measures.
+
+Rationale:
+
+- The old settings layout is already causing conceptual drift (`webhook.server`, `auditIngress`, `controllerManager.metrics`).
+- A compatibility layer would preserve that drift and increase implementation/testing complexity.
+- The project is intentionally converging on one topology and one config model for this phase.
+- A hard cut keeps behavior deterministic and easier to reason about during rapid iteration.
+
 ## Alignment Plan
 
 ## 1. Unify server config model
 
@@ -55,7 +101,9 @@ This minimizes moving parts for the interim single-pod phase.
 - cert path/name/key
 - read/write/idle timeout
 - TLS enabled/insecure mode guard
+- Define baseline defaults in source code, not in Helm values.
 - Map flags into this model for all three servers.
+- Keep one parser/defaulting path for all listeners (no per-listener parsing forks).
 
 ## 2. Unify TLS/cert watcher bootstrap
 
@@ -65,20 +113,19 @@ This minimizes moving parts for the interim single-pod phase.
   - wires `GetCertificate`
   - applies shared TLS defaults (minimum version + HTTP/2 policy)
 - Use same helper for metrics, admission, and audit.
+- Keep TLS-off behavior in the same helper path (no duplicate conditional logic per server).
 
 ## 3. Unify server lifecycle wiring
 
 - Keep all servers manager-managed.
 - Reuse one runnable pattern for startup/shutdown + timeout.
 - Standardize startup/shutdown logs and error paths.
+- Build servers through one reusable constructor/builder function that accepts a typed server config.
 
 ## 4. Align Helm values and args
 
-- Keep existing keys for compatibility, but normalize structure:
-  - `webhook.server`
-  - `auditIngress`
-  - `controllerManager.metrics`
-- Ensure timeout and cert naming is consistent across all three blocks.
+- Replace legacy split keys with one canonical settings structure.
+- Ensure timeout and cert naming is consistent across all three listeners.
 
 ## 5. Simplify deployment model now
 
@@ -89,7 +136,8 @@
 ## 6. Service simplification (single service)
 
 - Merge admission, audit, and metrics onto one Service with three target ports.
-- Update cert SANs and docs accordingly.
+- Name the Service as release fullname only (no role suffix).
+- Update validating webhook client config, cert SANs, ServiceMonitor selector/port, and docs accordingly.
 
 ## 7. Tests and rollout checks
 
@@ -118,6 +166,7 @@
 The chart should converge on:
 
 - One shared server settings shape reused by all three listeners.
 - Per-surface overrides only where behavior is genuinely different.
 - Per-server TLS can be enabled/disabled independently.
+- Defaults are centralized in source code; Helm values provide explicit overrides only.
 
 ### Proposed Helm Values Shape
 
@@ -127,6 +176,7 @@ replicaCount: 1
 network:
   service:
     enabled: true
+    name: "" # defaults to {{ include "gitops-reverser.fullname" . }}
     type: ClusterIP
     ports:
       admission: 443
@@ -134,23 +184,11 @@ network:
       metrics: 8443
 
 servers:
-  defaults:
-    enableHTTP2: false
-    timeouts:
-      read: 15s
-      write: 30s
-      idle: 60s
-    tls:
-      enabled: true
-      certPath: ""
-      certName: tls.crt
-      certKey: tls.key
-      minVersion: VersionTLS12
-
   admission:
     enabled: true
     bindAddress: :9443
-    timeouts: {} # optional override
+    enableHTTP2: false # optional override
+    timeouts: {} # optional override
     tls:
       enabled: true # may be set false for local/dev scenarios
       secretName: "" # optional if cert-manager manages mount/secret
@@ -159,6 +197,7 @@
     enabled: true
     bindAddress: :9444
     maxRequestBodyBytes: 10485760
+    enableHTTP2: false
     timeouts: {}
     tls:
       enabled: true
@@ -167,37 +206,35 @@
   metrics:
     enabled: true
     bindAddress: :8080
-    secure: true
+    enableHTTP2: false
     timeouts: {}
     tls:
       enabled: true
       secretName: ""
 ```
-
-If `servers..tls.enabled` is omitted, inherit from `servers.defaults.tls.enabled`.
+If `servers..tls.enabled` (or timeout/http2 overrides) is omitted, source-code defaults apply.
 ### Settings Responsibilities
 
 | Area | Purpose | Notes |
 |---|---|---|
-| `servers.defaults` | Shared defaults for all HTTPS listeners | Single source of truth for TLS + timeout defaults, including TLS default on/off |
 | `servers.admission` | Admission-specific listener settings | Keeps webhook behavior settings separate under `webhook.validating` |
 | `servers.audit` | Audit ingress listener settings | Retains audit payload controls like `maxRequestBodyBytes` |
 | `servers.metrics` | Metrics listener settings | Supports secure metrics endpoint consistently, but can be intentionally downgraded per environment |
-| `network.service` | Cluster Service exposure | Owns externally reachable ports only, not container bind ports |
+| `network.service` | Cluster Service exposure | Owns service name and externally reachable ports (not container bind ports) |
+| Source code defaults | Runtime baseline behavior | Holds canonical defaults for timeouts, TLS baseline, and HTTP/2 policy |
 
-### Compatibility Mapping (Current -> Target)
+### Key Mapping (Current -> Target, No Compatibility Layer)
 
-| Current key | Target key | Migration intent |
-|---|---|---|
-| `webhook.server.port` | `servers.admission.bindAddress` | Keep old key as compatibility alias initially |
-| `webhook.server.certPath/certName/certKey` | `servers.admission.tls.*` (or inherited defaults) | Prefer inherited defaults unless explicitly overridden |
-| `auditIngress.port` | `servers.audit.bindAddress` | Preserve behavior, normalize naming |
-| `auditIngress.tls.*` | `servers.audit.tls.*` | Direct move |
-| `auditIngress.timeouts.*` | `servers.audit.timeouts.*` | Direct move |
-| `controllerManager.metrics.bindAddress` | `servers.metrics.bindAddress` | Unify metrics with same server model |
-| `controllerManager.enableHTTP2` | `servers.defaults.enableHTTP2` | Single flag for all listeners in this phase |
-| `controllerManager.metrics.secure` (if present) | `servers.metrics.tls.enabled` | Keep compatibility alias during migration |
+| Current key | Target key |
+|---|---|
+| `webhook.server.port` | `servers.admission.bindAddress` |
+| `webhook.server.certPath/certName/certKey` | `servers.admission.tls.*` (or source-code defaults) |
+| `auditIngress.port` | `servers.audit.bindAddress` |
+| `auditIngress.tls.*` | `servers.audit.tls.*` |
+| `auditIngress.timeouts.*` | `servers.audit.timeouts.*` |
+| `controllerManager.metrics.bindAddress` | `servers.metrics.bindAddress` |
+| `controllerManager.enableHTTP2` | `servers..enableHTTP2` or source-code default |
 
 ### CLI Args/Runtime Mapping Direction
 
@@ -206,6 +243,7 @@ Desired runtime model:
 - Parse Helm values into one internal server settings struct per surface.
 - Apply shared defaulting/validation once.
 - Generate listener-specific runtime config from the same code path.
+- Construct `http.Server` instances via shared functions (for example `buildHTTPServer`, `buildTLSConfig`, `buildServerRunnable`) instead of per-listener copies.
 
 Resulting behavior goals:
 
@@ -213,6 +251,7 @@
 - Same timeout parsing and error messages for all listeners.
 - Same startup/shutdown lifecycle pattern for all listeners.
 - If TLS is disabled for a listener, skip cert watcher/bootstrap for that listener and run plain HTTP on its bind address.
+- No triple repetition of server setup code for admission/audit/metrics.
 
 ### TLS Disable Guardrails
 
@@ -223,7 +262,7 @@
 
 ### Rollout Notes For Settings Refactor
 
-- Keep legacy keys supported during transition.
-- Emit clear deprecation warnings when legacy keys are used.
-- Switch docs/examples to target keys first; keep compatibility notes adjacent.
-- Remove deprecated keys only after at least one stable release carrying warnings.
+- Use a clean-cut switch to the new settings model.
+- Do not ship compatibility aliases or legacy key fallbacks.
+- Update chart docs/examples and templates in the same change set.
+- Fail fast on invalid/unknown legacy settings to avoid ambiguous runtime behavior.
diff --git a/test/e2e/e2e_test.go b/test/e2e/e2e_test.go
index bdb38daa..9de90180 100644
--- a/test/e2e/e2e_test.go
+++ b/test/e2e/e2e_test.go
@@ -203,67 +203,70 @@
 		})
 
 		It("should route webhook traffic to the running controller pod", func() {
-			By("verifying webhook service selects the running controller pod")
+			By("verifying controller service selects the running controller pod")
 			verifyWebhookService := func(g Gomega) {
-				// Get webhook service endpoints
+				// Get controller service endpoints
 				cmd := exec.Command("kubectl", "get", "endpoints",
-					"gitops-reverser-webhook-service", "-n", namespace,
+					controllerServiceName, "-n", namespace,
 					"-o", "jsonpath={.subsets[*].addresses[*].targetRef.name}")
 				output, err := utils.Run(cmd)
-				g.Expect(err).NotTo(HaveOccurred(), "Failed to get webhook service endpoints")
+				g.Expect(err).NotTo(HaveOccurred(), "Failed to get controller service endpoints")
 
 				// Filter out kubectl deprecation warnings from output
 				lines := utils.GetNonEmptyLines(output)
-				var podNames []string
+				podSet := map[string]struct{}{}
 				for _, line := range lines {
 					// Skip warning lines
 					if !strings.HasPrefix(line, "Warning:") &&
 						!strings.Contains(line, "deprecated") &&
 						strings.Contains(line, "controller-manager") {
-						podNames = append(podNames, line)
+						podSet[line] = struct{}{}
 					}
 				}
+				var podNames []string
+				for podName := range podSet {
+					podNames = append(podNames, podName)
+				}
 
 				// Should only have one endpoint in single-pod mode.
-				g.Expect(podNames).To(HaveLen(1), "webhook service should route to exactly 1 pod")
-				g.Expect(podNames[0]).To(Equal(controllerPodName), "webhook should route to controller pod")
+				g.Expect(podNames).To(HaveLen(1), "controller service should route to exactly 1 pod")
+				g.Expect(podNames[0]).To(Equal(controllerPodName), "controller service should route to controller pod")
 
-				By(fmt.Sprintf("✅ Webhook service correctly routes to pod: %s", controllerPodName))
+				By(fmt.Sprintf("✅ Controller service correctly routes to pod: %s", controllerPodName))
 			}
 			Eventually(verifyWebhookService, 30*time.Second).Should(Succeed())
 		})
 
-		It("should expose both admission and audit services", func() {
-			By("verifying admission webhook service exists")
-			cmd := exec.Command("kubectl", "get", "svc", "gitops-reverser-webhook-service", "-n", namespace)
+		It("should expose admission and audit ports on one controller service", func() {
+			By("verifying controller service exists")
+			cmd := exec.Command("kubectl", "get", "svc", controllerServiceName, "-n", namespace)
 			_, err := utils.Run(cmd)
-			Expect(err).NotTo(HaveOccurred(), "Admission webhook service should exist")
-
-			By("verifying audit webhook service exists")
-			cmd = exec.Command("kubectl", "get", "svc", "gitops-reverser-audit-webhook-service", "-n", namespace)
-			_, err = utils.Run(cmd)
-			Expect(err).NotTo(HaveOccurred(), "Audit webhook service should exist")
+			Expect(err).NotTo(HaveOccurred(), "Controller service should exist")
 
-			By("verifying audit service routes to the controller pod")
+			By("verifying controller service routes to the controller pod")
 			Eventually(func(g Gomega) {
 				endpointsCmd := exec.Command("kubectl", "get", "endpoints",
-					"gitops-reverser-audit-webhook-service", "-n", namespace,
+					controllerServiceName, "-n", namespace,
 					"-o", "jsonpath={.subsets[*].addresses[*].targetRef.name}")
 				output, endpointsErr := utils.Run(endpointsCmd)
-				g.Expect(endpointsErr).NotTo(HaveOccurred(), "Failed to get audit service endpoints")
+				g.Expect(endpointsErr).NotTo(HaveOccurred(), "Failed to get controller service endpoints")
 
 				lines := utils.GetNonEmptyLines(output)
-				var podNames []string
+				podSet := map[string]struct{}{}
 				for _, line := range lines {
 					if !strings.HasPrefix(line, "Warning:") &&
 						!strings.Contains(line, "deprecated") &&
 						strings.Contains(line, "controller-manager") {
-						podNames = append(podNames, line)
+						podSet[line] = struct{}{}
 					}
 				}
+				var podNames []string
+				for podName := range podSet {
+					podNames = append(podNames, podName)
+				}
 
-				g.Expect(podNames).To(HaveLen(1), "audit service should route to exactly 1 pod")
-				g.Expect(podNames[0]).To(Equal(controllerPodName), "audit service should route to controller pod")
+				g.Expect(podNames).To(HaveLen(1), "controller service should route to exactly 1 pod")
+				g.Expect(podNames[0]).To(Equal(controllerPodName), "controller service should route to controller pod")
 			}, 30*time.Second).Should(Succeed())
 		})
 
@@ -282,14 +285,14 @@
 		})
 
 		It("should ensure the metrics endpoint is serving metrics", func() {
-			By("validating that the metrics service is available")
-			cmd := exec.Command("kubectl", "get", "service", metricsServiceName, "-n", namespace)
+			By("validating that the controller service is available for metrics")
+			cmd := exec.Command("kubectl", "get", "service", controllerServiceName, "-n", namespace)
 			_, err := utils.Run(cmd)
-			Expect(err).NotTo(HaveOccurred(), "Metrics service should exist")
+			Expect(err).NotTo(HaveOccurred(), "Controller service should exist")
 
 			By("waiting for the metrics endpoint to be ready")
 			verifyMetricsEndpointReady := func(g Gomega) {
-				cmd := exec.Command("kubectl", "get", "endpoints", metricsServiceName, "-n", namespace)
+				cmd := exec.Command("kubectl", "get", "endpoints", controllerServiceName, "-n", namespace)
 				output, err := utils.Run(cmd)
 				g.Expect(err).NotTo(HaveOccurred())
 				g.Expect(output).To(ContainSubstring("8443"), "Metrics endpoint is not ready")
diff --git a/test/e2e/helpers.go b/test/e2e/helpers.go
index 2bbd15f2..f37e2828 100644
--- a/test/e2e/helpers.go
+++ b/test/e2e/helpers.go
@@ -44,8 +44,8 @@
 const namespace = "sut"
 const metricWaitDefaultTimeout = 30 * time.Second
 
-// metricsServiceName is the name of the metrics service of the project.
-const metricsServiceName = "gitops-reverser-controller-manager-metrics-service"
+// controllerServiceName is the single Service name used by the controller.
+const controllerServiceName = "gitops-reverser-service"
 
 // promAPI is the Prometheus API client instance
 var promAPI v1.API //nolint:gochecknoglobals // Shared across test functions
diff --git a/test/e2e/kind/README.md b/test/e2e/kind/README.md
index df684afd..d63c922d 100644
--- a/test/e2e/kind/README.md
+++ b/test/e2e/kind/README.md
@@ -116,7 +116,7 @@ The audit webhook tracks metrics with labels:
 2. **Verify audit webhook service exists**:
 
    ```bash
-   kubectl get svc -n gitops-reverser-system gitops-reverser-audit-webhook-service
+   kubectl get svc -n gitops-reverser-system gitops-reverser-service
    ```
 
 3. **Check if kube-apiserver can reach the webhook**:
diff --git a/test/e2e/kind/audit/webhook-config.yaml b/test/e2e/kind/audit/webhook-config.yaml
index 489e7d94..4d3b4f2d 100644
--- a/test/e2e/kind/audit/webhook-config.yaml
+++ b/test/e2e/kind/audit/webhook-config.yaml
@@ -9,7 +9,7 @@ clusters:
     # IMPORTANT: Use fixed ClusterIP instead of DNS name
     # kube-apiserver starts before CoreDNS, so DNS resolution fails at startup
    # The ClusterIP is set in config/default/audit_webhook_service_fixed_ip_patch.yaml
-    server: https://10.96.200.200:443/audit-webhook/kind-e2e
+    server: https://10.96.200.200:9444/audit-webhook/kind-e2e
     # Skip TLS verification for testing (webhook uses self-signed cert from cert-manager)
     insecure-skip-tls-verify: true
 contexts:
diff --git a/test/e2e/kind/start-cluster.sh b/test/e2e/kind/start-cluster.sh
index eba395a1..11094a99 100755
--- a/test/e2e/kind/start-cluster.sh
+++ b/test/e2e/kind/start-cluster.sh
@@ -24,16 +24,17 @@ echo "✅ Generated configuration:"
 cat "$CONFIG_FILE"
 echo ""
 
-# Check if cluster already exists
+# Recreate cluster on every run so kube-apiserver always picks up current
+# audit webhook policy/config files from the mounted directory.
 if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
-  echo "✅ Cluster '$CLUSTER_NAME' already exists. Skipping creation."
-  kind export kubeconfig --name "$CLUSTER_NAME"
-else
-  echo "🚀 Creating Kind cluster '$CLUSTER_NAME' with audit webhook support..."
-  kind create cluster --name "$CLUSTER_NAME" --config "$CONFIG_FILE" --wait 5m
-  echo "✅ Kind cluster created successfully"
+  echo "♻️ Recreating existing Kind cluster '$CLUSTER_NAME' to refresh audit webhook config..."
+  kind delete cluster --name "$CLUSTER_NAME"
 fi
+echo "🚀 Creating Kind cluster '$CLUSTER_NAME' with audit webhook support..."
+kind create cluster --name "$CLUSTER_NAME" --config "$CONFIG_FILE" --wait 5m
+echo "✅ Kind cluster created successfully"
+
 echo "📋 Configuring kubeconfig for cluster '$CLUSTER_NAME'..."
 kind export kubeconfig --name "$CLUSTER_NAME"
diff --git a/test/e2e/prometheus/deployment.yaml b/test/e2e/prometheus/deployment.yaml
index afada271..c3614e01 100644
--- a/test/e2e/prometheus/deployment.yaml
+++ b/test/e2e/prometheus/deployment.yaml
@@ -17,39 +17,15 @@ data:
       scrape_timeout: 4s
 
     scrape_configs:
-      # Scrape gitops-reverser metrics from 'sut' namespace
+      # Scrape gitops-reverser metrics from the single controller Service in 'sut'
       - job_name: 'gitops-reverser-metrics'
         scheme: https
         tls_config:
           insecure_skip_verify: true  # Self-signed certs in e2e
         bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
-
-        kubernetes_sd_configs:
-          - role: endpoints
-            namespaces:
-              names:
-                - sut  # Target test namespace
-
-        relabel_configs:
-          # Keep only the metrics service
-          - source_labels: [__meta_kubernetes_service_name]
-            action: keep
-            regex: gitops-reverser-controller-manager-metrics-service
-
-          # Keep only HTTPS port
-          - source_labels: [__meta_kubernetes_endpoint_port_name]
-            action: keep
-            regex: https
-
-          # Add pod name label for per-pod metrics
-          - source_labels: [__meta_kubernetes_pod_name]
-            target_label: pod
-            action: replace
-
-          # Add role label from pod labels (leader/follower)
-          - source_labels: [__meta_kubernetes_pod_label_role]
-            target_label: role
-            action: replace
+        static_configs:
+          - targets:
+            - gitops-reverser-service.sut.svc.cluster.local:8443
 ---
 apiVersion: apps/v1
 kind: Deployment
@@ -127,4 +103,4 @@ spec:
   ports:
     - name: http
       port: 19090
-      targetPort: http
\ No newline at end of file
+      targetPort: http
diff --git a/test/e2e/scripts/setup-prometheus.sh b/test/e2e/scripts/setup-prometheus.sh
index 2aafb328..46565591 100755
--- a/test/e2e/scripts/setup-prometheus.sh
+++ b/test/e2e/scripts/setup-prometheus.sh
@@ -15,4 +15,8 @@ kubectl apply -f test/e2e/prometheus/rbac.yaml
 echo "Deploying Prometheus..."
 kubectl apply -f test/e2e/prometheus/deployment.yaml
 
+echo "Restarting Prometheus deployment to pick up ConfigMap changes..."
+kubectl rollout restart deployment/prometheus -n prometheus-e2e
+kubectl rollout status deployment/prometheus -n prometheus-e2e --timeout=120s
+
 echo "✅ Prometheus manifests deployed"

From c05d84f6f2b12353f16308ffa4407ddf397e1f90 Mon Sep 17 00:00:00 2001
From: Simon Koudijs
Date: Thu, 12 Feb 2026 08:32:49 +0000
Subject: [PATCH 06/32] feat: Spring cleaning of /config

---
 Makefile                                      |   6 +-
 .../templates/certificates.yaml               |   4 +-
 .../templates/validating-webhook.yaml         |   2 +-
 charts/gitops-reverser/values.yaml            |   4 +-
 config/README.md                              |  12 +
 .../certificate-audit-webhook.yaml            |  19 --
 config/certmanager/certificate-metrics.yaml   |  22 --
 config/certmanager/certificate-webhook.yaml   |  22 --
 config/certmanager/issuer.yaml                |  13 -
 config/certmanager/kustomization.yaml         |   8 -
 config/certmanager/kustomizeconfig.yaml       |   8 -
 config/certs/certificates.yaml                |  37 +++
 config/certs/issuer.yaml                      |  12 +
 config/certs/kustomization.yaml               |   5 +
 .../clusterwatchrules.configbutler.ai.yaml    | 253 +++++++++++++++++
 .../bases/gitproviders.configbutler.ai.yaml   | 168 +++++++++++
 .../crd/bases/gittargets.configbutler.ai.yaml | 148 ++++++++++
 .../crd/bases/watchrules.configbutler.ai.yaml | 232 +++++++++++++++
 config/crd/kustomization.yaml                 |  24 +-
 config/crd/kustomizeconfig.yaml               |  19 --
 .../default/cert_metrics_manager_patch.yaml   |  30 --
 config/default/kustomization.yaml             | 264 ------------------
 config/default/manager_webhook_patch.yaml     |  60 ----
 config/{manager => }/kustomization.yaml       |  10 +-
 config/manager.yaml                           |  99 +++++++
 config/manager/manager.yaml                   | 112 --------
 config/namespace.yaml                         |   8 +
 .../network-policy/allow-metrics-traffic.yaml |  27 --
 .../network-policy/allow-webhook-traffic.yaml |  27 --
 config/network-policy/kustomization.yaml      |   3 -
 config/prometheus/kustomization.yaml          |  11 -
 config/prometheus/monitor.yaml                |  27 --
 config/prometheus/monitor_tls_patch.yaml      |  19 --
 ...> gitops-reverser-controller-manager.yaml} |   6 +-
 ... => gitops-reverser-demo-jane-access.yaml} |   2 +-
 config/rbac/gitops-reverser-manager-role.yaml |  54 ++++
 ... gitops-reverser-manager-rolebinding.yaml} |  10 +-
 config/rbac/gitprovider_admin_role.yaml       |  27 --
 config/rbac/gitprovider_editor_role.yaml      |  33 ---
 config/rbac/gitprovider_viewer_role.yaml      |  29 --
 config/rbac/gittarget_admin_role.yaml         |  27 --
 config/rbac/gittarget_editor_role.yaml        |  33 ---
 config/rbac/gittarget_viewer_role.yaml        |  29 --
 config/rbac/kustomization.yaml                |  44 +--
 config/rbac/metrics_auth_role.yaml            |  17 --
 config/rbac/metrics_auth_role_binding.yaml    |  12 -
 config/rbac/metrics_reader_role.yaml          |   9 -
 config/rbac/watchrule_admin_role.yaml         |  27 --
 config/rbac/watchrule_editor_role.yaml        |  33 ---
 config/rbac/watchrule_viewer_role.yaml        |  29 --
 config/samples/clusterwatchrule.yaml          |  33 ---
 config/samples/gitprovider.yaml               |  16 --
 config/samples/gittarget.yaml                 |  13 -
 config/samples/kustomization.yaml             |   7 -
 config/samples/watchrule.yaml                 |  39 ---
 config/service.yaml                           |  27 ++
 config/webhook.yaml                           |  30 ++
 config/webhook/kustomization.yaml             |  12 -
 config/webhook/kustomizeconfig.yaml           |  22 --
 config/webhook/service.yaml                   |  26 --
 .../webhook/webhook_service_name_patch.yaml   |   9 -
 ...onfig-kustomize-simplification-findings.md | 202 ++++++++++++++
 test/e2e/helpers.go                           |   9 -
 test/e2e/prometheus/deployment.yaml           |   4 +-
 64 files changed, 1328 insertions(+), 1257 deletions(-)
 create mode 100644 config/README.md
 delete mode 100644 config/certmanager/certificate-audit-webhook.yaml
 delete mode 100644 config/certmanager/certificate-metrics.yaml
 delete mode 100644 config/certmanager/certificate-webhook.yaml
 delete mode 100644 config/certmanager/issuer.yaml
 delete mode 100644 config/certmanager/kustomization.yaml
 delete mode 100644 config/certmanager/kustomizeconfig.yaml
 create mode 100644 config/certs/certificates.yaml
 create mode 100644 config/certs/issuer.yaml
 create mode 100644 config/certs/kustomization.yaml
 create mode 100644 config/crd/bases/clusterwatchrules.configbutler.ai.yaml
 create mode 100644 config/crd/bases/gitproviders.configbutler.ai.yaml
 create mode 100644 config/crd/bases/gittargets.configbutler.ai.yaml
 create mode 100644 config/crd/bases/watchrules.configbutler.ai.yaml
 delete mode 100644 config/crd/kustomizeconfig.yaml
 delete mode 100644 config/default/cert_metrics_manager_patch.yaml
 delete mode 100644 config/default/kustomization.yaml
 delete mode 100644 config/default/manager_webhook_patch.yaml
 rename config/{manager => }/kustomization.yaml (71%)
 create mode 100644 config/manager.yaml
 delete mode 100644 config/manager/manager.yaml
 create mode 100644 config/namespace.yaml
 delete mode 100644 config/network-policy/allow-metrics-traffic.yaml
 delete mode 100644 config/network-policy/allow-webhook-traffic.yaml
 delete mode 100644 config/network-policy/kustomization.yaml
 delete mode 100644 config/prometheus/kustomization.yaml
 delete mode 100644 config/prometheus/monitor.yaml
 delete mode 100644 config/prometheus/monitor_tls_patch.yaml
 rename config/rbac/{service_account.yaml => gitops-reverser-controller-manager.yaml} (70%)
 rename config/rbac/{test_user_role_binding.yaml => gitops-reverser-demo-jane-access.yaml} (85%)
 create mode 100644 config/rbac/gitops-reverser-manager-role.yaml
 rename config/rbac/{role_binding.yaml => gitops-reverser-manager-rolebinding.yaml} (66%)
 delete mode 100644 config/rbac/gitprovider_admin_role.yaml
 delete mode 100644 config/rbac/gitprovider_editor_role.yaml
 delete mode 100644 config/rbac/gitprovider_viewer_role.yaml
 delete mode 100644 config/rbac/gittarget_admin_role.yaml
 delete mode 100644 config/rbac/gittarget_editor_role.yaml
 delete mode 100644 config/rbac/gittarget_viewer_role.yaml
 delete mode 100644 config/rbac/metrics_auth_role.yaml
 delete mode 100644 config/rbac/metrics_auth_role_binding.yaml
 delete mode 100644 config/rbac/metrics_reader_role.yaml
 delete mode 100644 config/rbac/watchrule_admin_role.yaml
 delete mode 100644 config/rbac/watchrule_editor_role.yaml
 delete mode 100644 config/rbac/watchrule_viewer_role.yaml
 delete mode 100644 config/samples/clusterwatchrule.yaml
 delete mode 100644 config/samples/gitprovider.yaml
 delete mode 100644 config/samples/gittarget.yaml
 delete mode 100644 config/samples/kustomization.yaml
 delete mode 100644 config/samples/watchrule.yaml
 create mode 100644 config/service.yaml
 create mode 100644 config/webhook.yaml
 delete mode 100644 config/webhook/kustomization.yaml
 delete mode 100644 config/webhook/kustomizeconfig.yaml
 delete mode 100644 config/webhook/service.yaml
 delete mode 100644 config/webhook/webhook_service_name_patch.yaml
 create mode 100644 docs/config-kustomize-simplification-findings.md

diff --git a/Makefile b/Makefile
index 6b9ed0e2..5e70363b 100644
--- a/Makefile
+++ b/Makefile
@@ -160,12 +160,12 @@ uninstall: manifests ## Uninstall CRDs from the K8s cluster specified in ~/.kube
 
 .PHONY: deploy
 deploy: manifests ## Deploy controller to the K8s cluster specified in ~/.kube/config.
-	cd config/manager && $(KUSTOMIZE) edit set image controller=${IMG}
-	$(KUSTOMIZE) build config/default | $(KUBECTL) apply -f -
+	cd config && $(KUSTOMIZE) edit set image controller=${IMG}
+	$(KUSTOMIZE) build config | $(KUBECTL) apply -f -
 
 .PHONY: undeploy
 undeploy: ## Undeploy controller from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
- $(KUSTOMIZE) build config/default | $(KUBECTL) delete --ignore-not-found=true -f - + $(KUSTOMIZE) build config | $(KUBECTL) delete --ignore-not-found=true -f - ##@ Dependencies diff --git a/charts/gitops-reverser/templates/certificates.yaml b/charts/gitops-reverser/templates/certificates.yaml index d36c2681..debf58ca 100644 --- a/charts/gitops-reverser/templates/certificates.yaml +++ b/charts/gitops-reverser/templates/certificates.yaml @@ -15,7 +15,7 @@ spec: apiVersion: cert-manager.io/v1 kind: Certificate metadata: - name: {{ include "gitops-reverser.fullname" . }}-serving-cert + name: {{ include "gitops-reverser.fullname" . }}-webhook-server-cert namespace: {{ .Release.Namespace }} labels: {{- include "gitops-reverser.labels" . | nindent 4 }} @@ -38,7 +38,7 @@ spec: apiVersion: cert-manager.io/v1 kind: Certificate metadata: - name: {{ include "gitops-reverser.fullname" . }}-audit-serving-cert + name: {{ include "gitops-reverser.fullname" . }}-audit-server-cert namespace: {{ .Release.Namespace }} labels: {{- include "gitops-reverser.labels" . | nindent 4 }} diff --git a/charts/gitops-reverser/templates/validating-webhook.yaml b/charts/gitops-reverser/templates/validating-webhook.yaml index b6253e1d..494bf1b3 100644 --- a/charts/gitops-reverser/templates/validating-webhook.yaml +++ b/charts/gitops-reverser/templates/validating-webhook.yaml @@ -7,7 +7,7 @@ metadata: {{- include "gitops-reverser.labels" . | nindent 4 }} {{- if .Values.certificates.certManager.enabled }} annotations: - cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "gitops-reverser.fullname" . }}-serving-cert + cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "gitops-reverser.fullname" . 
}}-webhook-server-cert {{- end }} webhooks: - admissionReviewVersions: diff --git a/charts/gitops-reverser/values.yaml b/charts/gitops-reverser/values.yaml index 3a13e677..ea8e2658 100644 --- a/charts/gitops-reverser/values.yaml +++ b/charts/gitops-reverser/values.yaml @@ -59,7 +59,7 @@ servers: # Controls webhook TLS wiring in the controller process. # Keep enabled for normal Kubernetes webhook operation. enabled: true - certPath: "/tmp/k8s-webhook-server/serving-certs" + certPath: "/tmp/k8s-webhook-server/webhook-server-certs" certName: "tls.crt" certKey: "tls.key" @@ -70,7 +70,7 @@ servers: tls: # Serve audit ingress over HTTPS when true, HTTP when false. enabled: true - certPath: "/tmp/k8s-audit-webhook-server/serving-certs" + certPath: "/tmp/k8s-audit-webhook-server/webhook-server-certs" certName: "tls.crt" certKey: "tls.key" secretName: "" diff --git a/config/README.md b/config/README.md new file mode 100644 index 00000000..1194fdf7 --- /dev/null +++ b/config/README.md @@ -0,0 +1,12 @@ +# config + +This folder is a static, rendered snapshot of `kustomize build config/default`. + +Goals: +- Keep manifests simple and explicit. +- Avoid patches/replacements/transformer indirection. +- Make side-by-side comparison with the previous `config/default` output easy. + +Notes: +- These files are intentionally environment-specific to the current render profile. +- Update by re-rendering from `config/default` when source config changes.
diff --git a/config/certmanager/certificate-audit-webhook.yaml b/config/certmanager/certificate-audit-webhook.yaml deleted file mode 100644 index e62724ae..00000000 --- a/config/certmanager/certificate-audit-webhook.yaml +++ /dev/null @@ -1,19 +0,0 @@ -apiVersion: cert-manager.io/v1 -kind: Certificate -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: audit-serving-cert - namespace: system -spec: - # SERVICE_NAME and SERVICE_NAMESPACE are replaced in config/default/kustomization.yaml - dnsNames: - - AUDIT_SERVICE_NAME.SERVICE_NAMESPACE.svc - - AUDIT_SERVICE_NAME.SERVICE_NAMESPACE.svc.cluster.local - issuerRef: - kind: Issuer - name: selfsigned-issuer - secretName: audit-webhook-server-cert - privateKey: - rotationPolicy: Always diff --git a/config/certmanager/certificate-metrics.yaml b/config/certmanager/certificate-metrics.yaml deleted file mode 100644 index fb45c2c7..00000000 --- a/config/certmanager/certificate-metrics.yaml +++ /dev/null @@ -1,22 +0,0 @@ -# The following manifests contain a self-signed issuer CR and a metrics certificate CR. -# More document can be found at https://docs.cert-manager.io -apiVersion: cert-manager.io/v1 -kind: Certificate -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: metrics-certs # this name should match the one appeared in kustomizeconfig.yaml - namespace: system -spec: - dnsNames: - # SERVICE_NAME and SERVICE_NAMESPACE will be substituted by kustomize - # replacements in the config/default/kustomization.yaml file. 
- - SERVICE_NAME.SERVICE_NAMESPACE.svc - - SERVICE_NAME.SERVICE_NAMESPACE.svc.cluster.local - issuerRef: - kind: Issuer - name: selfsigned-issuer - secretName: metrics-server-cert - privateKey: - rotationPolicy: Always \ No newline at end of file diff --git a/config/certmanager/certificate-webhook.yaml b/config/certmanager/certificate-webhook.yaml deleted file mode 100644 index 01f0a793..00000000 --- a/config/certmanager/certificate-webhook.yaml +++ /dev/null @@ -1,22 +0,0 @@ -# The following manifests contain a self-signed issuer CR and a certificate CR. -# More document can be found at https://docs.cert-manager.io -apiVersion: cert-manager.io/v1 -kind: Certificate -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: serving-cert # this name should match the one appeared in kustomizeconfig.yaml - namespace: system -spec: - # SERVICE_NAME and SERVICE_NAMESPACE will be substituted by kustomize - # replacements in the config/default/kustomization.yaml file. - dnsNames: - - SERVICE_NAME.SERVICE_NAMESPACE.svc - - SERVICE_NAME.SERVICE_NAMESPACE.svc.cluster.local - issuerRef: - kind: Issuer - name: selfsigned-issuer - secretName: webhook-server-cert - privateKey: - rotationPolicy: Always \ No newline at end of file diff --git a/config/certmanager/issuer.yaml b/config/certmanager/issuer.yaml deleted file mode 100644 index 52d4dc75..00000000 --- a/config/certmanager/issuer.yaml +++ /dev/null @@ -1,13 +0,0 @@ -# The following manifest contains a self-signed issuer CR. -# More information can be found at https://docs.cert-manager.io -# WARNING: Targets CertManager v1.0. Check https://cert-manager.io/docs/installation/upgrading/ for breaking changes. 
-apiVersion: cert-manager.io/v1 -kind: Issuer -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: selfsigned-issuer - namespace: system -spec: - selfSigned: {} diff --git a/config/certmanager/kustomization.yaml b/config/certmanager/kustomization.yaml deleted file mode 100644 index c24e7097..00000000 --- a/config/certmanager/kustomization.yaml +++ /dev/null @@ -1,8 +0,0 @@ -resources: -- issuer.yaml -- certificate-webhook.yaml -- certificate-audit-webhook.yaml -- certificate-metrics.yaml - -configurations: -- kustomizeconfig.yaml diff --git a/config/certmanager/kustomizeconfig.yaml b/config/certmanager/kustomizeconfig.yaml deleted file mode 100644 index cf6f89e8..00000000 --- a/config/certmanager/kustomizeconfig.yaml +++ /dev/null @@ -1,8 +0,0 @@ -# This configuration is for teaching kustomize how to update name ref substitution -nameReference: -- kind: Issuer - group: cert-manager.io - fieldSpecs: - - kind: Certificate - group: cert-manager.io - path: spec/issuerRef/name diff --git a/config/certs/certificates.yaml b/config/certs/certificates.yaml new file mode 100644 index 00000000..baa79522 --- /dev/null +++ b/config/certs/certificates.yaml @@ -0,0 +1,37 @@ +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + labels: + app.kubernetes.io/managed-by: kustomize + app.kubernetes.io/name: gitops-reverser + name: gitops-reverser-webhook-server-cert + namespace: sut +spec: + dnsNames: + - gitops-reverser-service.sut.svc + - gitops-reverser-service.sut.svc.cluster.local + issuerRef: + kind: Issuer + name: gitops-reverser-selfsigned-issuer + privateKey: + rotationPolicy: Always + secretName: webhook-server-cert +--- +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + labels: + app.kubernetes.io/managed-by: kustomize + app.kubernetes.io/name: gitops-reverser + name: gitops-reverser-audit-server-cert + namespace: sut +spec: + dnsNames: + - gitops-reverser-service.sut.svc + - 
gitops-reverser-service.sut.svc.cluster.local + issuerRef: + kind: Issuer + name: gitops-reverser-selfsigned-issuer + privateKey: + rotationPolicy: Always + secretName: audit-webhook-server-cert diff --git a/config/certs/issuer.yaml b/config/certs/issuer.yaml new file mode 100644 index 00000000..e760ddd4 --- /dev/null +++ b/config/certs/issuer.yaml @@ -0,0 +1,12 @@ +apiVersion: cert-manager.io/v1 +kind: Issuer +metadata: + labels: + app.kubernetes.io/managed-by: kustomize + app.kubernetes.io/name: gitops-reverser + name: gitops-reverser-selfsigned-issuer + namespace: sut +spec: + selfSigned: {} + + diff --git a/config/certs/kustomization.yaml b/config/certs/kustomization.yaml new file mode 100644 index 00000000..58816ce6 --- /dev/null +++ b/config/certs/kustomization.yaml @@ -0,0 +1,5 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization +resources: + - issuer.yaml + - certificates.yaml diff --git a/config/crd/bases/clusterwatchrules.configbutler.ai.yaml b/config/crd/bases/clusterwatchrules.configbutler.ai.yaml new file mode 100644 index 00000000..42eea57f --- /dev/null +++ b/config/crd/bases/clusterwatchrules.configbutler.ai.yaml @@ -0,0 +1,253 @@ +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: clusterwatchrules.configbutler.ai +spec: + group: configbutler.ai + names: + kind: ClusterWatchRule + listKind: ClusterWatchRuleList + plural: clusterwatchrules + singular: clusterwatchrule + scope: Cluster + versions: + - additionalPrinterColumns: + - jsonPath: .spec.destinationRef.name + name: Destination + type: string + - jsonPath: .status.conditions[?(@.type=="Ready")].status + name: Ready + type: string + - jsonPath: .metadata.creationTimestamp + name: Age + type: date + name: v1alpha1 + schema: + openAPIV3Schema: + description: |- + ClusterWatchRule watches resources across the entire cluster. 
+ It provides the ability to audit both cluster-scoped resources (Nodes, ClusterRoles, CRDs) + and namespaced resources across multiple namespaces with per-rule filtering. + + Security model: + - ClusterWatchRule is cluster-scoped and requires cluster-admin permissions + - Referenced GitRepoConfig must have accessPolicy.allowClusterRules set to true + - Each rule can independently specify Cluster or Namespaced scope + - Namespaced rules can optionally filter by namespace labels + + Use cases: + - Audit cluster infrastructure (Nodes, PersistentVolumes, StorageClasses) + - Audit RBAC changes (ClusterRoles, ClusterRoleBindings) + - Audit CRD installations and updates + - Audit resources across multiple namespaces (e.g., all production namespaces) + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of ClusterWatchRule. + properties: + rules: + description: |- + Rules define which resources to watch. + Multiple rules create a logical OR - a resource matching ANY rule is watched. + Each rule can specify cluster-scoped or namespaced resources. + items: + description: |- + ClusterResourceRule defines which resources to watch with scope control. 
+ Each rule can independently specify whether it watches cluster-scoped or + namespaced resources, with optional namespace filtering for namespaced resources. + properties: + apiGroups: + description: |- + APIGroups to match. Empty string ("") matches the core API group. + If empty, matches all API groups. + Wildcards supported: "*" matches all groups. + Examples: + - [""] matches core API (nodes, namespaces) + - ["rbac.authorization.k8s.io"] matches RBAC resources + - ["*"] or [] matches all groups + items: + type: string + type: array + apiVersions: + description: |- + APIVersions to match. If empty, matches all versions. + Wildcards supported: "*" matches all versions. + Examples: + - ["v1"] matches only v1 version + - ["*"] or [] matches all versions + items: + type: string + type: array + operations: + description: |- + Operations to watch. If empty, watches all operations (CREATE, UPDATE, DELETE). + Supports: CREATE, UPDATE, DELETE, or * (wildcard for all operations). + Examples: + - ["CREATE", "UPDATE"] watches only creation and updates + - ["*"] or [] watches all operations + items: + description: OperationType specifies the type of operation + that triggers a watch event. + enum: + - CREATE + - UPDATE + - DELETE + - '*' + type: string + type: array + resources: + description: |- + Resources to match (plural names like "nodes", "clusterroles"). + This field is required and determines which resource types trigger this rule. + Wildcard semantics follow Kubernetes admission webhook patterns: + - "*" matches all resources + - "nodes" matches exactly nodes + - "pods" matches exactly pods (for namespaced scope) + items: + type: string + minItems: 1 + type: array + scope: + allOf: + - enum: + - Cluster + - Namespaced + - enum: + - Cluster + - Namespaced + description: |- + Scope defines whether this rule watches Cluster-scoped or Namespaced resources. + - "Cluster": For cluster-scoped resources (Nodes, ClusterRoles, CRDs, etc.). 
+ The namespaceSelector field is ignored for cluster-scoped rules. + - "Namespaced": For namespaced resources (Pods, Deployments, Secrets, etc.). + Optionally filtered by namespaceSelector. + If namespaceSelector is omitted, watches resources in ALL namespaces. + type: string + required: + - resources + - scope + type: object + minItems: 1 + type: array + targetRef: + description: |- + TargetRef references the GitTarget to use. + Must specify namespace. + properties: + group: + default: configbutler.ai + description: |- + API Group of the referent. + Defaults to configbutler.ai. + enum: + - configbutler.ai + type: string + kind: + default: GitTarget + description: |- + Kind of the referent. + Optional because this reference currently only supports a single kind (GitTarget). + Keeping it optional allows users to omit it while still benefiting from CRD defaulting. + enum: + - GitTarget + type: string + name: + type: string + namespace: + description: Required because ClusterWatchRule has no namespace. + type: string + required: + - name + - namespace + type: object + required: + - rules + - targetRef + type: object + status: + description: status defines the observed state of ClusterWatchRule. + properties: + conditions: + description: Conditions represent the latest available observations + of the ClusterWatchRule's state. + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + lastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + message is a human readable message indicating details about the transition. + This may be an empty string.
+ maxLength: 32768 + type: string + observedGeneration: + description: |- + observedGeneration represents the .metadata.generation that the condition was set based upon. + For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date + with respect to the current state of the instance. + format: int64 + minimum: 0 + type: integer + reason: + description: |- + reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: status of the condition, one of True, False, Unknown. + enum: + - "True" + - "False" + - Unknown + type: string + type: + description: type of condition in CamelCase or in foo.example.com/CamelCase. 
+ maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} diff --git a/config/crd/bases/gitproviders.configbutler.ai.yaml b/config/crd/bases/gitproviders.configbutler.ai.yaml new file mode 100644 index 00000000..19c6d198 --- /dev/null +++ b/config/crd/bases/gitproviders.configbutler.ai.yaml @@ -0,0 +1,168 @@ +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: gitproviders.configbutler.ai +spec: + group: configbutler.ai + names: + kind: GitProvider + listKind: GitProviderList + plural: gitproviders + singular: gitprovider + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: GitProvider is the Schema for the gitproviders API. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of GitProvider + properties: + allowedBranches: + description: AllowedBranches restricts which branches can be written + to. 
+ items: + type: string + type: array + push: + description: Push defines the strategy for pushing commits (batching). + properties: + interval: + description: |- + Interval is the maximum time to wait before pushing queued commits. + Defaults to "1m". + type: string + maxCommits: + description: |- + MaxCommits is the maximum number of commits to queue before pushing. + Defaults to 20. + type: integer + type: object + secretRef: + description: SecretRef for authentication credentials (may be nil + for public repos) + properties: + group: + default: "" + description: Group of the referent. + type: string + kind: + default: Secret + description: Kind of the referent. + enum: + - Secret + type: string + name: + description: Name of the Secret. + minLength: 1 + type: string + required: + - name + type: object + url: + description: URL of the repository (HTTP/SSH) + type: string + required: + - allowedBranches + - url + type: object + status: + description: status defines the observed state of GitProvider + properties: + conditions: + description: |- + conditions represent the current state of the GitProvider resource. + Each condition has a unique type and reflects the status of a specific aspect of the resource. + + Standard condition types include: + - "Available": the resource is fully functional + - "Progressing": the resource is being created or updated + - "Degraded": the resource failed to reach or maintain its desired state + + The status of each condition is one of True, False, or Unknown. + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + lastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. 
+ format: date-time + type: string + message: + description: |- + message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + observedGeneration: + description: |- + observedGeneration represents the .metadata.generation that the condition was set based upon. + For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date + with respect to the current state of the instance. + format: int64 + minimum: 0 + type: integer + reason: + description: |- + reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: status of the condition, one of True, False, Unknown. + enum: + - "True" + - "False" + - Unknown + type: string + type: + description: type of condition in CamelCase or in foo.example.com/CamelCase. 
+ maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + x-kubernetes-list-map-keys: + - type + x-kubernetes-list-type: map + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} diff --git a/config/crd/bases/gittargets.configbutler.ai.yaml b/config/crd/bases/gittargets.configbutler.ai.yaml new file mode 100644 index 00000000..9dbb4d1c --- /dev/null +++ b/config/crd/bases/gittargets.configbutler.ai.yaml @@ -0,0 +1,148 @@ +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: gittargets.configbutler.ai +spec: + group: configbutler.ai + names: + kind: GitTarget + listKind: GitTargetList + plural: gittargets + singular: gittarget + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: GitTarget is the Schema for the gittargets API. + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of GitTarget + properties: + branch: + description: |- + Branch to use for this target. + Must be one of the allowed branches in the provider. + type: string + path: + description: Path within the repository to write resources to. + type: string + providerRef: + description: ProviderRef references the GitProvider or Flux GitRepository. + properties: + group: + default: configbutler.ai + description: API Group of the referent. + type: string + kind: + default: GitProvider + description: |- + Kind of the referent. + NOTE: Support for reading from Flux GitRepository is not yet implemented! + type: string + name: + description: Name of the referent. + type: string + required: + - name + type: object + required: + - branch + - providerRef + type: object + status: + description: status defines the observed state of GitTarget + properties: + conditions: + description: Conditions represent the latest available observations + of an object's state + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + lastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + observedGeneration: + description: |- + observedGeneration represents the .metadata.generation that the condition was set based upon. 
+ For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date + with respect to the current state of the instance. + format: int64 + minimum: 0 + type: integer + reason: + description: |- + reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: status of the condition, one of True, False, Unknown. + enum: + - "True" + - "False" + - Unknown + type: string + type: + description: type of condition in CamelCase or in foo.example.com/CamelCase. + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + lastCommit: + description: LastCommit is the SHA of the last commit processed. + type: string + lastPushTime: + description: LastPushTime is the timestamp of the last successful + push. 
+ format: date-time + type: string + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} diff --git a/config/crd/bases/watchrules.configbutler.ai.yaml b/config/crd/bases/watchrules.configbutler.ai.yaml new file mode 100644 index 00000000..6fc2e877 --- /dev/null +++ b/config/crd/bases/watchrules.configbutler.ai.yaml @@ -0,0 +1,232 @@ +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: watchrules.configbutler.ai +spec: + group: configbutler.ai + names: + kind: WatchRule + listKind: WatchRuleList + plural: watchrules + singular: watchrule + scope: Namespaced + versions: + - additionalPrinterColumns: + - jsonPath: .spec.destinationRef.name + name: Destination + type: string + - jsonPath: .status.conditions[?(@.type=="Ready")].status + name: Ready + type: string + - jsonPath: .metadata.creationTimestamp + name: Age + type: date + name: v1alpha1 + schema: + openAPIV3Schema: + description: |- + WatchRule watches namespaced resources within its own namespace. + It provides fine-grained control over which resources trigger Git commits, + with filtering by operation type, API group, version, and labels. + + Security model: + - WatchRule is namespace-scoped and can only watch resources in its own namespace + - Use ClusterWatchRule for watching cluster-scoped resources (Nodes, ClusterRoles, etc.) + - RBAC controls who can create/modify WatchRules per namespace + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. 
+ Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of WatchRule + properties: + rules: + description: |- + Rules define which resources to watch within this namespace. + Multiple rules create a logical OR - a resource matching ANY rule is watched. + Each rule can specify operations, API groups, versions, and resource types. + items: + description: |- + ResourceRule defines a set of namespaced resources to watch. + This follows Kubernetes admission control semantics but simplified for our use case. + All fields except Resources are optional and default to matching all when not specified. + properties: + apiGroups: + description: |- + APIGroups to match. Empty string ("") matches the core API group. + If empty, matches all API groups. + Wildcards supported: "*" matches all groups. + Examples: + - [""] matches core API (pods, services, configmaps) + - ["apps"] matches apps API group (deployments, statefulsets) + - ["", "apps"] matches both core and apps groups + - ["*"] or [] matches all groups + items: + type: string + type: array + apiVersions: + description: |- + APIVersions to match. If empty, matches all versions. + Wildcards supported: "*" matches all versions. + Examples: + - ["v1"] matches only v1 version + - ["v1", "v1beta1"] matches both versions + - ["*"] or [] matches all versions + items: + type: string + type: array + operations: + description: |- + Operations to watch. If empty, watches all operations (CREATE, UPDATE, DELETE). + Supports: CREATE, UPDATE, DELETE, or * (wildcard for all operations). 
+ Examples: + - ["CREATE", "UPDATE"] watches only creation and updates, ignoring deletions + - ["*"] or [] watches all operations + items: + description: OperationType specifies the type of operation + that triggers a watch event. + enum: + - CREATE + - UPDATE + - DELETE + - '*' + type: string + type: array + resources: + description: |- + Resources to match (plural names like "pods", "configmaps"). + This field is required and determines which resource types trigger this rule. + Wildcard semantics follow Kubernetes admission webhook patterns: + - "*" matches all resources + - "pods" matches exactly pods (case-insensitive) + - "pods/*" matches all pod subresources (e.g., pods/log, pods/status) + - "pods/log" matches specific subresource + + For custom resources, use exact group-qualified names: + - "myapps.example.com" matches MyApp CRD + + Note: Prefix/suffix wildcards like "pod*" or "*.example.com" are NOT supported. + Use exact matches or the "*" wildcard for broad matching. + items: + type: string + minItems: 1 + type: array + required: + - resources + type: object + minItems: 1 + type: array + targetRef: + description: |- + TargetRef references the GitTarget to use. + Must be in the same namespace. + properties: + group: + default: configbutler.ai + description: API Group of the referent. + type: string + kind: + default: GitTarget + description: |- + Kind of the referent. + Optional because this reference currently only supports a single kind (GitTarget). + Keeping it optional allows users to omit it while still benefiting from CRD defaulting. 
+ enum: + - GitTarget + type: string + name: + type: string + required: + - name + type: object + required: + - rules + - targetRef + type: object + status: + description: status defines the observed state of WatchRule + properties: + conditions: + description: Conditions represent the latest available observations + of an object's state + items: + description: Condition contains details for one aspect of the current + state of this API Resource. + properties: + lastTransitionTime: + description: |- + lastTransitionTime is the last time the condition transitioned from one status to another. + This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. + format: date-time + type: string + message: + description: |- + message is a human readable message indicating details about the transition. + This may be an empty string. + maxLength: 32768 + type: string + observedGeneration: + description: |- + observedGeneration represents the .metadata.generation that the condition was set based upon. + For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date + with respect to the current state of the instance. + format: int64 + minimum: 0 + type: integer + reason: + description: |- + reason contains a programmatic identifier indicating the reason for the condition's last transition. + Producers of specific condition types may define expected values and meanings for this field, + and whether the values are considered a guaranteed API. + The value should be a CamelCase string. + This field may not be empty. + maxLength: 1024 + minLength: 1 + pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ + type: string + status: + description: status of the condition, one of True, False, Unknown. 
+ enum: + - "True" + - "False" + - Unknown + type: string + type: + description: type of condition in CamelCase or in foo.example.com/CamelCase. + maxLength: 316 + pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ + type: string + required: + - lastTransitionTime + - message + - reason + - status + - type + type: object + type: array + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} + + diff --git a/config/crd/kustomization.yaml b/config/crd/kustomization.yaml index 378fe339..67a0ce69 100644 --- a/config/crd/kustomization.yaml +++ b/config/crd/kustomization.yaml @@ -1,19 +1,7 @@ -# This kustomization.yaml is not intended to be run by itself, -# since it depends on service name and namespace that are out of this kustomize package. -# It should be run by config/default +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization resources: - - bases/configbutler.ai_watchrules.yaml - - bases/configbutler.ai_clusterwatchrules.yaml - - bases/configbutler.ai_gitproviders.yaml - - bases/configbutler.ai_gittargets.yaml -# +kubebuilder:scaffold:crdkustomizeresource - -patches: [] -# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix. -# patches here are for enabling the conversion webhook for each CRD -# +kubebuilder:scaffold:crdkustomizewebhookpatch - -# [WEBHOOK] To enable webhook, uncomment the following section -# the following config is for teaching kustomize how to do kustomization for CRDs. 
-#configurations: -#- kustomizeconfig.yaml + - bases/clusterwatchrules.configbutler.ai.yaml + - bases/gitproviders.configbutler.ai.yaml + - bases/gittargets.configbutler.ai.yaml + - bases/watchrules.configbutler.ai.yaml diff --git a/config/crd/kustomizeconfig.yaml b/config/crd/kustomizeconfig.yaml deleted file mode 100644 index ec5c150a..00000000 --- a/config/crd/kustomizeconfig.yaml +++ /dev/null @@ -1,19 +0,0 @@ -# This file is for teaching kustomize how to substitute name and namespace reference in CRD -nameReference: -- kind: Service - version: v1 - fieldSpecs: - - kind: CustomResourceDefinition - version: v1 - group: apiextensions.k8s.io - path: spec/conversion/webhook/clientConfig/service/name - -namespace: -- kind: CustomResourceDefinition - version: v1 - group: apiextensions.k8s.io - path: spec/conversion/webhook/clientConfig/service/namespace - create: false - -varReference: -- path: metadata/annotations diff --git a/config/default/cert_metrics_manager_patch.yaml b/config/default/cert_metrics_manager_patch.yaml deleted file mode 100644 index d9750155..00000000 --- a/config/default/cert_metrics_manager_patch.yaml +++ /dev/null @@ -1,30 +0,0 @@ -# This patch adds the args, volumes, and ports to allow the manager to use the metrics-server certs. 
- -# Add the volumeMount for the metrics-server certs -- op: add - path: /spec/template/spec/containers/0/volumeMounts/- - value: - mountPath: /tmp/k8s-metrics-server/metrics-certs - name: metrics-certs - readOnly: true - -# Add the --metrics-cert-path argument for the metrics server -- op: add - path: /spec/template/spec/containers/0/args/- - value: --metrics-cert-path=/tmp/k8s-metrics-server/metrics-certs - -# Add the metrics-server certs volume configuration -- op: add - path: /spec/template/spec/volumes/- - value: - name: metrics-certs - secret: - secretName: metrics-server-cert - optional: false - items: - - key: ca.crt - path: ca.crt - - key: tls.crt - path: tls.crt - - key: tls.key - path: tls.key diff --git a/config/default/kustomization.yaml b/config/default/kustomization.yaml deleted file mode 100644 index 231903e2..00000000 --- a/config/default/kustomization.yaml +++ /dev/null @@ -1,264 +0,0 @@ -# Adds namespace to all resources. -namespace: sut - -# Value of this field is prepended to the -# names of all resources, e.g. a deployment named -# "wordpress" becomes "alices-wordpress". -# Note that it should also match with the prefix (text before '-') of the namespace -# field above. -namePrefix: gitops-reverser- - -# Labels to add to all resources and selectors. -#labels: -#- includeSelectors: true -# pairs: -# someName: someValue - -resources: -- ../crd -- ../rbac -- ../manager -# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in -# crd/kustomization.yaml -- ../webhook -# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required. -- ../certmanager -# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'. -#- ../prometheus -# [NETWORK POLICY] Protect the /metrics endpoint and Webhook Server with NetworkPolicy. -# Only Pod(s) running a namespace labeled with 'metrics: enabled' will be able to gather the metrics. 
-# Only CR(s) which requires webhooks and are applied on namespaces labeled with 'webhooks: enabled' will -# be able to communicate with the Webhook Server. -#- ../network-policy - -# Patches are only kept for optional feature wiring. -patches: - -# Uncomment the patches line if you enable Metrics and CertManager -# [METRICS-WITH-CERTS] To enable metrics protected with certManager, uncomment the following line. -# This patch will protect the metrics with certManager self-signed certs. -#- path: cert_metrics_manager_patch.yaml -# target: -# kind: Deployment - -# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in -# crd/kustomization.yaml -- path: manager_webhook_patch.yaml - target: - kind: Deployment - -# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix. -# Uncomment the following replacements to add the cert-manager CA injection annotations -replacements: -# - source: # Uncomment the following block to enable certificates for metrics -# kind: Service -# version: v1 -# name: controller-manager-metrics-service -# fieldPath: metadata.name -# targets: -# - select: -# kind: Certificate -# group: cert-manager.io -# version: v1 -# name: metrics-certs -# fieldPaths: -# - spec.dnsNames.0 -# - spec.dnsNames.1 -# options: -# delimiter: '.' -# index: 0 -# create: true -# - select: # Uncomment the following to set the Service name for TLS config in Prometheus ServiceMonitor -# kind: ServiceMonitor -# group: monitoring.coreos.com -# version: v1 -# name: controller-manager-metrics-monitor -# fieldPaths: -# - spec.endpoints.0.tlsConfig.serverName -# options: -# delimiter: '.' 
-# index: 0 -# create: true - -# - source: -# kind: Service -# version: v1 -# name: controller-manager-metrics-service -# fieldPath: metadata.namespace -# targets: -# - select: -# kind: Certificate -# group: cert-manager.io -# version: v1 -# name: metrics-certs -# fieldPaths: -# - spec.dnsNames.0 -# - spec.dnsNames.1 -# options: -# delimiter: '.' -# index: 1 -# create: true -# - select: # Uncomment the following to set the Service namespace for TLS in Prometheus ServiceMonitor -# kind: ServiceMonitor -# group: monitoring.coreos.com -# version: v1 -# name: controller-manager-metrics-monitor -# fieldPaths: -# - spec.endpoints.0.tlsConfig.serverName -# options: -# delimiter: '.' -# index: 1 -# create: true - - - source: # Uncomment the following block if you have any webhook - kind: Service - version: v1 - name: service - fieldPath: .metadata.name # Name of the service - targets: - - select: - kind: Certificate - group: cert-manager.io - version: v1 - name: serving-cert - fieldPaths: - - .spec.dnsNames.0 - - .spec.dnsNames.1 - options: - delimiter: '.' - index: 0 - create: true - - source: - kind: Service - version: v1 - name: service - fieldPath: .metadata.namespace # Namespace of the service - targets: - - select: - kind: Certificate - group: cert-manager.io - version: v1 - name: serving-cert - fieldPaths: - - .spec.dnsNames.0 - - .spec.dnsNames.1 - options: - delimiter: '.' - index: 1 - create: true - - - source: - kind: Service - version: v1 - name: service - fieldPath: .metadata.name # Name of the dedicated audit service - targets: - - select: - kind: Certificate - group: cert-manager.io - version: v1 - name: audit-serving-cert - fieldPaths: - - .spec.dnsNames.0 - - .spec.dnsNames.1 - options: - delimiter: '.' 
- index: 0 - create: true - - source: - kind: Service - version: v1 - name: service - fieldPath: .metadata.namespace # Namespace of the dedicated audit service - targets: - - select: - kind: Certificate - group: cert-manager.io - version: v1 - name: audit-serving-cert - fieldPaths: - - .spec.dnsNames.0 - - .spec.dnsNames.1 - options: - delimiter: '.' - index: 1 - create: true - - - source: # Uncomment the following block if you have a ValidatingWebhook (--programmatic-validation) - kind: Certificate - group: cert-manager.io - version: v1 - name: serving-cert # This name should match the one in certificate.yaml - fieldPath: .metadata.namespace # Namespace of the certificate CR - targets: - - select: - kind: ValidatingWebhookConfiguration - fieldPaths: - - .metadata.annotations.[cert-manager.io/inject-ca-from] - options: - delimiter: '/' - index: 0 - create: true - - source: - kind: Certificate - group: cert-manager.io - version: v1 - name: serving-cert - fieldPath: .metadata.name - targets: - - select: - kind: ValidatingWebhookConfiguration - fieldPaths: - - .metadata.annotations.[cert-manager.io/inject-ca-from] - options: - delimiter: '/' - index: 1 - create: true - - - source: # Uncomment the following block if you have a DefaultingWebhook (--defaulting ) - kind: Certificate - group: cert-manager.io - version: v1 - name: serving-cert - fieldPath: .metadata.namespace # Namespace of the certificate CR - targets: - - select: - kind: MutatingWebhookConfiguration - fieldPaths: - - .metadata.annotations.[cert-manager.io/inject-ca-from] - options: - delimiter: '/' - index: 0 - create: true - - source: - kind: Certificate - group: cert-manager.io - version: v1 - name: serving-cert - fieldPath: .metadata.name - targets: - - select: - kind: MutatingWebhookConfiguration - fieldPaths: - - .metadata.annotations.[cert-manager.io/inject-ca-from] - options: - delimiter: '/' - index: 1 - create: true - -# - source: # Uncomment the following block if you have a ConversionWebhook 
(--conversion) -# kind: Certificate -# group: cert-manager.io -# version: v1 -# name: serving-cert -# fieldPath: .metadata.namespace # Namespace of the certificate CR -# targets: # Do not remove or uncomment the following scaffold marker; required to generate code for target CRD. -# +kubebuilder:scaffold:crdkustomizecainjectionns -# - source: -# kind: Certificate -# group: cert-manager.io -# version: v1 -# name: serving-cert -# fieldPath: .metadata.name -# targets: # Do not remove or uncomment the following scaffold marker; required to generate code for target CRD. -# +kubebuilder:scaffold:crdkustomizecainjectionname diff --git a/config/default/manager_webhook_patch.yaml b/config/default/manager_webhook_patch.yaml deleted file mode 100644 index 2e716303..00000000 --- a/config/default/manager_webhook_patch.yaml +++ /dev/null @@ -1,60 +0,0 @@ -# This patch ensures the webhook certificates are properly mounted in the manager container. -# It configures the necessary arguments, volumes, volume mounts, and container ports. - -# Add the --webhook-cert-path argument for configuring the webhook certificate path -- op: add - path: /spec/template/spec/containers/0/args/- - value: --webhook-cert-path=/tmp/k8s-webhook-server/serving-certs - -# Add the volumeMount for the webhook certificates -- op: add - path: /spec/template/spec/containers/0/volumeMounts/- - value: - mountPath: /tmp/k8s-webhook-server/serving-certs - name: webhook-certs - readOnly: true - -# Add the port configuration for the webhook server -- op: add - path: /spec/template/spec/containers/0/ports/- - value: - containerPort: 9443 - name: webhook-server - protocol: TCP - -# Add the dedicated audit ingress certificate path. -- op: add - path: /spec/template/spec/containers/0/args/- - value: --audit-cert-path=/tmp/k8s-audit-webhook-server/serving-certs - -# Add the volumeMount for dedicated audit webhook certificates. 
-- op: add - path: /spec/template/spec/containers/0/volumeMounts/- - value: - mountPath: /tmp/k8s-audit-webhook-server/serving-certs - name: audit-webhook-certs - readOnly: true - -# Add dedicated audit ingress container port. -- op: add - path: /spec/template/spec/containers/0/ports/- - value: - containerPort: 9444 - name: audit-server - protocol: TCP - -# Add the volume configuration for the webhook certificates -- op: add - path: /spec/template/spec/volumes/- - value: - name: webhook-certs - secret: - secretName: webhook-server-cert - -# Add the volume configuration for the dedicated audit ingress certificates. -- op: add - path: /spec/template/spec/volumes/- - value: - name: audit-webhook-certs - secret: - secretName: audit-webhook-server-cert diff --git a/config/manager/kustomization.yaml b/config/kustomization.yaml similarity index 71% rename from config/manager/kustomization.yaml rename to config/kustomization.yaml index d2e4a38c..beb057a5 100644 --- a/config/manager/kustomization.yaml +++ b/config/kustomization.yaml @@ -1,7 +1,13 @@ -resources: -- manager.yaml apiVersion: kustomize.config.k8s.io/v1beta1 kind: Kustomization +resources: +- namespace.yaml +- crd +- rbac +- service.yaml +- manager.yaml +- certs +- webhook.yaml images: - name: controller newName: example.com/gitops-reverser diff --git a/config/manager.yaml b/config/manager.yaml new file mode 100644 index 00000000..e6823fbd --- /dev/null +++ b/config/manager.yaml @@ -0,0 +1,99 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + app.kubernetes.io/managed-by: kustomize + app.kubernetes.io/name: gitops-reverser + control-plane: controller-manager + name: gitops-reverser-controller-manager + namespace: sut +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: gitops-reverser + control-plane: controller-manager + template: + metadata: + annotations: + kubectl.kubernetes.io/default-container: manager + labels: + app.kubernetes.io/name: gitops-reverser + control-plane: 
controller-manager + spec: + containers: + - args: + - --metrics-bind-address=:8443 + - --metrics-insecure + - --health-probe-bind-address=:8081 + - --webhook-cert-path=/tmp/k8s-webhook-server/webhook-server-certs + - --audit-cert-path=/tmp/k8s-audit-webhook-server/webhook-server-certs + command: + - /manager + env: + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + - name: POD_NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace + image: example.com/gitops-reverser:v0.0.1 + livenessProbe: + httpGet: + path: /healthz + port: 8081 + initialDelaySeconds: 15 + periodSeconds: 20 + name: manager + ports: + - containerPort: 9443 + name: webhook-server + protocol: TCP + - containerPort: 9444 + name: audit-server + protocol: TCP + readinessProbe: + httpGet: + path: /readyz + port: 8081 + initialDelaySeconds: 5 + periodSeconds: 10 + resources: + limits: + cpu: 500m + memory: 128Mi + requests: + cpu: 10m + memory: 64Mi + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + volumeMounts: + - mountPath: /tmp + name: tmp-dir + - mountPath: /tmp/k8s-webhook-server/webhook-server-certs + name: webhook-certs + readOnly: true + - mountPath: /tmp/k8s-audit-webhook-server/webhook-server-certs + name: audit-webhook-certs + readOnly: true + securityContext: + runAsNonRoot: true + seccompProfile: + type: RuntimeDefault + serviceAccountName: gitops-reverser-controller-manager + terminationGracePeriodSeconds: 10 + volumes: + - emptyDir: {} + name: tmp-dir + - name: webhook-certs + secret: + secretName: webhook-server-cert + - name: audit-webhook-certs + secret: + secretName: audit-webhook-server-cert diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml deleted file mode 100644 index c9ee57d8..00000000 --- a/config/manager/manager.yaml +++ /dev/null @@ -1,112 +0,0 @@ -apiVersion: v1 -kind: Namespace -metadata: - labels: - control-plane: controller-manager - 
app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: system ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: controller-manager - namespace: system - labels: - control-plane: controller-manager - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize -spec: - selector: - matchLabels: - control-plane: controller-manager - app.kubernetes.io/name: gitops-reverser - replicas: 1 - template: - metadata: - annotations: - kubectl.kubernetes.io/default-container: manager - labels: - control-plane: controller-manager - app.kubernetes.io/name: gitops-reverser - spec: - # TODO(user): Uncomment the following code to configure the nodeAffinity expression - # according to the platforms which are supported by your solution. - # It is considered best practice to support multiple architectures. You can - # build your manager image using the makefile target docker-buildx. - # affinity: - # nodeAffinity: - # requiredDuringSchedulingIgnoredDuringExecution: - # nodeSelectorTerms: - # - matchExpressions: - # - key: kubernetes.io/arch - # operator: In - # values: - # - amd64 - # - arm64 - # - ppc64le - # - s390x - # - key: kubernetes.io/os - # operator: In - # values: - # - linux - securityContext: - # Projects are configured by default to adhere to the "restricted" Pod Security Standards. - # This ensures that deployments meet the highest security requirements for Kubernetes. 
- # For more details, see: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted - runAsNonRoot: true - seccompProfile: - type: RuntimeDefault - containers: - - command: - - /manager - args: - - --metrics-bind-address=:8443 - - --health-probe-bind-address=:8081 - image: controller:latest - name: manager - env: - - name: POD_NAME - valueFrom: - fieldRef: - fieldPath: metadata.name - - name: POD_NAMESPACE - valueFrom: - fieldRef: - fieldPath: metadata.namespace - ports: [] - securityContext: - readOnlyRootFilesystem: true - allowPrivilegeEscalation: false - capabilities: - drop: - - "ALL" - livenessProbe: - httpGet: - path: /healthz - port: 8081 - initialDelaySeconds: 15 - periodSeconds: 20 - readinessProbe: - httpGet: - path: /readyz - port: 8081 - initialDelaySeconds: 5 - periodSeconds: 10 - # TODO(user): Configure the resources accordingly based on the project requirements. - # More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ - resources: - limits: - cpu: 500m - memory: 128Mi - requests: - cpu: 10m - memory: 64Mi - volumeMounts: - - name: tmp-dir - mountPath: /tmp - volumes: - - name: tmp-dir - emptyDir: {} - serviceAccountName: controller-manager - terminationGracePeriodSeconds: 10 diff --git a/config/namespace.yaml b/config/namespace.yaml new file mode 100644 index 00000000..14e972db --- /dev/null +++ b/config/namespace.yaml @@ -0,0 +1,8 @@ +apiVersion: v1 +kind: Namespace +metadata: + labels: + app.kubernetes.io/managed-by: kustomize + app.kubernetes.io/name: gitops-reverser + control-plane: controller-manager + name: sut diff --git a/config/network-policy/allow-metrics-traffic.yaml b/config/network-policy/allow-metrics-traffic.yaml deleted file mode 100644 index 35673a36..00000000 --- a/config/network-policy/allow-metrics-traffic.yaml +++ /dev/null @@ -1,27 +0,0 @@ -# This NetworkPolicy allows ingress traffic -# with Pods running on namespaces labeled with 'metrics: enabled'. 
Only Pods on those -# namespaces are able to gather data from the metrics endpoint. -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: allow-metrics-traffic - namespace: system -spec: - podSelector: - matchLabels: - control-plane: controller-manager - app.kubernetes.io/name: gitops-reverser - policyTypes: - - Ingress - ingress: - # This allows ingress traffic from any namespace with the label metrics: enabled - - from: - - namespaceSelector: - matchLabels: - metrics: enabled # Only from namespaces with this label - ports: - - port: 8443 - protocol: TCP diff --git a/config/network-policy/allow-webhook-traffic.yaml b/config/network-policy/allow-webhook-traffic.yaml deleted file mode 100644 index 169327bf..00000000 --- a/config/network-policy/allow-webhook-traffic.yaml +++ /dev/null @@ -1,27 +0,0 @@ -# This NetworkPolicy allows ingress traffic to your webhook server running -# as part of the controller-manager from specific namespaces and pods. 
CR(s) which uses webhooks -# will only work when applied in namespaces labeled with 'webhook: enabled' -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: allow-webhook-traffic - namespace: system -spec: - podSelector: - matchLabels: - control-plane: controller-manager - app.kubernetes.io/name: gitops-reverser - policyTypes: - - Ingress - ingress: - # This allows ingress traffic from any namespace with the label webhook: enabled - - from: - - namespaceSelector: - matchLabels: - webhook: enabled # Only from namespaces with this label - ports: - - port: 443 - protocol: TCP diff --git a/config/network-policy/kustomization.yaml b/config/network-policy/kustomization.yaml deleted file mode 100644 index 0872bee1..00000000 --- a/config/network-policy/kustomization.yaml +++ /dev/null @@ -1,3 +0,0 @@ -resources: -- allow-webhook-traffic.yaml -- allow-metrics-traffic.yaml diff --git a/config/prometheus/kustomization.yaml b/config/prometheus/kustomization.yaml deleted file mode 100644 index fdc5481b..00000000 --- a/config/prometheus/kustomization.yaml +++ /dev/null @@ -1,11 +0,0 @@ -resources: -- monitor.yaml - -# [PROMETHEUS-WITH-CERTS] The following patch configures the ServiceMonitor in ../prometheus -# to securely reference certificates created and managed by cert-manager. -# Additionally, ensure that you uncomment the [METRICS WITH CERTMANAGER] patch under config/default/kustomization.yaml -# to mount the "metrics-server-cert" secret in the Manager Deployment. 
-#patches: -# - path: monitor_tls_patch.yaml -# target: -# kind: ServiceMonitor diff --git a/config/prometheus/monitor.yaml b/config/prometheus/monitor.yaml deleted file mode 100644 index c21839b3..00000000 --- a/config/prometheus/monitor.yaml +++ /dev/null @@ -1,27 +0,0 @@ -# Prometheus Monitor Service (Metrics) -apiVersion: monitoring.coreos.com/v1 -kind: ServiceMonitor -metadata: - labels: - control-plane: controller-manager - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: controller-manager-metrics-monitor - namespace: system -spec: - endpoints: - - path: /metrics - port: https # Ensure this is the name of the port that exposes HTTPS metrics - scheme: https - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token - tlsConfig: - # TODO(user): The option insecureSkipVerify: true is not recommended for production since it disables - # certificate verification, exposing the system to potential man-in-the-middle attacks. - # For production environments, it is recommended to use cert-manager for automatic TLS certificate management. - # To apply this configuration, enable cert-manager and use the patch located at config/prometheus/servicemonitor_tls_patch.yaml, - # which securely references the certificate from the 'metrics-server-cert' secret. 
- insecureSkipVerify: true - selector: - matchLabels: - control-plane: controller-manager - app.kubernetes.io/name: gitops-reverser diff --git a/config/prometheus/monitor_tls_patch.yaml b/config/prometheus/monitor_tls_patch.yaml deleted file mode 100644 index 5bf84ce0..00000000 --- a/config/prometheus/monitor_tls_patch.yaml +++ /dev/null @@ -1,19 +0,0 @@ -# Patch for Prometheus ServiceMonitor to enable secure TLS configuration -# using certificates managed by cert-manager -- op: replace - path: /spec/endpoints/0/tlsConfig - value: - # SERVICE_NAME and SERVICE_NAMESPACE will be substituted by kustomize - serverName: SERVICE_NAME.SERVICE_NAMESPACE.svc - insecureSkipVerify: false - ca: - secret: - name: metrics-server-cert - key: ca.crt - cert: - secret: - name: metrics-server-cert - key: tls.crt - keySecret: - name: metrics-server-cert - key: tls.key diff --git a/config/rbac/service_account.yaml b/config/rbac/gitops-reverser-controller-manager.yaml similarity index 70% rename from config/rbac/service_account.yaml rename to config/rbac/gitops-reverser-controller-manager.yaml index 62fc0094..d726578e 100644 --- a/config/rbac/service_account.yaml +++ b/config/rbac/gitops-reverser-controller-manager.yaml @@ -2,7 +2,7 @@ apiVersion: v1 kind: ServiceAccount metadata: labels: - app.kubernetes.io/name: gitops-reverser app.kubernetes.io/managed-by: kustomize - name: controller-manager - namespace: system + app.kubernetes.io/name: gitops-reverser + name: gitops-reverser-controller-manager + namespace: sut diff --git a/config/rbac/test_user_role_binding.yaml b/config/rbac/gitops-reverser-demo-jane-access.yaml similarity index 85% rename from config/rbac/test_user_role_binding.yaml rename to config/rbac/gitops-reverser-demo-jane-access.yaml index 5c0a390f..9d02eb7c 100644 --- a/config/rbac/test_user_role_binding.yaml +++ b/config/rbac/gitops-reverser-demo-jane-access.yaml @@ -1,7 +1,7 @@ apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: - name: 
demo-jane-access + name: gitops-reverser-demo-jane-access roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole diff --git a/config/rbac/gitops-reverser-manager-role.yaml b/config/rbac/gitops-reverser-manager-role.yaml new file mode 100644 index 00000000..ca36f738 --- /dev/null +++ b/config/rbac/gitops-reverser-manager-role.yaml @@ -0,0 +1,54 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: gitops-reverser-manager-role +rules: +- apiGroups: + - "" + resources: + - namespaces + - secrets + verbs: + - get + - list + - watch +- apiGroups: + - '*' + resources: + - '*' + verbs: + - get + - list + - watch +- apiGroups: + - configbutler.ai + resources: + - clusterwatchrules + - gitproviders + - gittargets + - watchrules + verbs: + - create + - delete + - get + - list + - patch + - update + - watch +- apiGroups: + - configbutler.ai + resources: + - clusterwatchrules/status + - gitproviders/status + - gittargets/status + - watchrules/status + verbs: + - get + - patch + - update +- apiGroups: + - configbutler.ai + resources: + - gitproviders/finalizers + verbs: + - update diff --git a/config/rbac/role_binding.yaml b/config/rbac/gitops-reverser-manager-rolebinding.yaml similarity index 66% rename from config/rbac/role_binding.yaml rename to config/rbac/gitops-reverser-manager-rolebinding.yaml index 75c9905c..5eb0f83e 100644 --- a/config/rbac/role_binding.yaml +++ b/config/rbac/gitops-reverser-manager-rolebinding.yaml @@ -2,14 +2,14 @@ apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: labels: - app.kubernetes.io/name: gitops-reverser app.kubernetes.io/managed-by: kustomize - name: manager-rolebinding + app.kubernetes.io/name: gitops-reverser + name: gitops-reverser-manager-rolebinding roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole - name: manager-role + name: gitops-reverser-manager-role subjects: - kind: ServiceAccount - name: controller-manager - namespace: system + name: 
gitops-reverser-controller-manager + namespace: sut diff --git a/config/rbac/gitprovider_admin_role.yaml b/config/rbac/gitprovider_admin_role.yaml deleted file mode 100644 index d4c77df4..00000000 --- a/config/rbac/gitprovider_admin_role.yaml +++ /dev/null @@ -1,27 +0,0 @@ -# This rule is not used by the project gitops-reverser itself. -# It is provided to allow the cluster admin to help manage permissions for users. -# -# Grants full permissions ('*') over configbutler.ai. -# This role is intended for users authorized to modify roles and bindings within the cluster, -# enabling them to delegate specific permissions to other users or groups as needed. - -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: gitprovider-admin-role -rules: -- apiGroups: - - configbutler.ai - resources: - - gitproviders - verbs: - - '*' -- apiGroups: - - configbutler.ai - resources: - - gitproviders/status - verbs: - - get diff --git a/config/rbac/gitprovider_editor_role.yaml b/config/rbac/gitprovider_editor_role.yaml deleted file mode 100644 index 69d0d35e..00000000 --- a/config/rbac/gitprovider_editor_role.yaml +++ /dev/null @@ -1,33 +0,0 @@ -# This rule is not used by the project gitops-reverser itself. -# It is provided to allow the cluster admin to help manage permissions for users. -# -# Grants permissions to create, update, and delete resources within the configbutler.ai. -# This role is intended for users who need to manage these resources -# but should not control RBAC or manage permissions for others. 
- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: gitprovider-editor-role -rules: -- apiGroups: - - configbutler.ai - resources: - - gitproviders - verbs: - - create - - delete - - get - - list - - patch - - update - - watch -- apiGroups: - - configbutler.ai - resources: - - gitproviders/status - verbs: - - get diff --git a/config/rbac/gitprovider_viewer_role.yaml b/config/rbac/gitprovider_viewer_role.yaml deleted file mode 100644 index 027012d4..00000000 --- a/config/rbac/gitprovider_viewer_role.yaml +++ /dev/null @@ -1,29 +0,0 @@ -# This rule is not used by the project gitops-reverser itself. -# It is provided to allow the cluster admin to help manage permissions for users. -# -# Grants read-only access to configbutler.ai resources. -# This role is intended for users who need visibility into these resources -# without permissions to modify them. It is ideal for monitoring purposes and limited-access viewing. - -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: gitprovider-viewer-role -rules: -- apiGroups: - - configbutler.ai - resources: - - gitproviders - verbs: - - get - - list - - watch -- apiGroups: - - configbutler.ai - resources: - - gitproviders/status - verbs: - - get diff --git a/config/rbac/gittarget_admin_role.yaml b/config/rbac/gittarget_admin_role.yaml deleted file mode 100644 index 0122ca4c..00000000 --- a/config/rbac/gittarget_admin_role.yaml +++ /dev/null @@ -1,27 +0,0 @@ -# This rule is not used by the project gitops-reverser itself. -# It is provided to allow the cluster admin to help manage permissions for users. -# -# Grants full permissions ('*') over configbutler.ai. 
-# This role is intended for users authorized to modify roles and bindings within the cluster, -# enabling them to delegate specific permissions to other users or groups as needed. - -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: gittarget-admin-role -rules: -- apiGroups: - - configbutler.ai - resources: - - gittargets - verbs: - - '*' -- apiGroups: - - configbutler.ai - resources: - - gittargets/status - verbs: - - get diff --git a/config/rbac/gittarget_editor_role.yaml b/config/rbac/gittarget_editor_role.yaml deleted file mode 100644 index 6adbedb5..00000000 --- a/config/rbac/gittarget_editor_role.yaml +++ /dev/null @@ -1,33 +0,0 @@ -# This rule is not used by the project gitops-reverser itself. -# It is provided to allow the cluster admin to help manage permissions for users. -# -# Grants permissions to create, update, and delete resources within the configbutler.ai. -# This role is intended for users who need to manage these resources -# but should not control RBAC or manage permissions for others. - -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: gittarget-editor-role -rules: -- apiGroups: - - configbutler.ai - resources: - - gittargets - verbs: - - create - - delete - - get - - list - - patch - - update - - watch -- apiGroups: - - configbutler.ai - resources: - - gittargets/status - verbs: - - get diff --git a/config/rbac/gittarget_viewer_role.yaml b/config/rbac/gittarget_viewer_role.yaml deleted file mode 100644 index b6285b70..00000000 --- a/config/rbac/gittarget_viewer_role.yaml +++ /dev/null @@ -1,29 +0,0 @@ -# This rule is not used by the project gitops-reverser itself. -# It is provided to allow the cluster admin to help manage permissions for users. 
-# -# Grants read-only access to configbutler.ai resources. -# This role is intended for users who need visibility into these resources -# without permissions to modify them. It is ideal for monitoring purposes and limited-access viewing. - -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: gittarget-viewer-role -rules: -- apiGroups: - - configbutler.ai - resources: - - gittargets - verbs: - - get - - list - - watch -- apiGroups: - - configbutler.ai - resources: - - gittargets/status - verbs: - - get diff --git a/config/rbac/kustomization.yaml b/config/rbac/kustomization.yaml index e1fbf0f9..f88dc735 100644 --- a/config/rbac/kustomization.yaml +++ b/config/rbac/kustomization.yaml @@ -1,36 +1,10 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization resources: - # All RBAC will be applied under this service account in - # the deployment namespace. You may comment out this resource - # if your manager will use a service account that exists at - # runtime. Be sure to update RoleBinding and ClusterRoleBinding - # subjects if changing service account names. - - service_account.yaml - - role.yaml - - role_binding.yaml - - test_user_role_binding.yaml - # The following RBAC configurations are used to protect - # the metrics endpoint with authn/authz. These configurations - # ensure that only authorized users and service accounts - # can access the metrics endpoint. Comment the following - # permissions if you want to disable this protection. - # More info: https://book.kubebuilder.io/reference/metrics.html - - metrics_auth_role.yaml - - metrics_auth_role_binding.yaml - - metrics_reader_role.yaml - # For each CRD, "Admin", "Editor" and "Viewer" roles are scaffolded by - # default, aiding admins in cluster management. Those roles are - # not used by the gitops-reverser itself. 
You can comment the following lines - # if you do not want those helpers be installed with your Project. - - watchrule_admin_role.yaml - - watchrule_editor_role.yaml - - watchrule_viewer_role.yaml - # For each CRD, "Admin", "Editor" and "Viewer" roles are scaffolded by - # default, aiding admins in cluster management. Those roles are - # not used by the gitops-reverser itself. You can comment the following lines - # if you do not want those helpers be installed with your Project. - - gittarget_admin_role.yaml - - gittarget_editor_role.yaml - - gittarget_viewer_role.yaml - - gitprovider_admin_role.yaml - - gitprovider_editor_role.yaml - - gitprovider_viewer_role.yaml + # Service account used by the controller manager deployment. + - gitops-reverser-controller-manager.yaml + # Main runtime permissions for the controller. + - gitops-reverser-manager-role.yaml + - gitops-reverser-manager-rolebinding.yaml + # E2E-only helper: allows impersonated user jane@acme.com writes in tests. + - gitops-reverser-demo-jane-access.yaml diff --git a/config/rbac/metrics_auth_role.yaml b/config/rbac/metrics_auth_role.yaml deleted file mode 100644 index 32d2e4ec..00000000 --- a/config/rbac/metrics_auth_role.yaml +++ /dev/null @@ -1,17 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: metrics-auth-role -rules: -- apiGroups: - - authentication.k8s.io - resources: - - tokenreviews - verbs: - - create -- apiGroups: - - authorization.k8s.io - resources: - - subjectaccessreviews - verbs: - - create diff --git a/config/rbac/metrics_auth_role_binding.yaml b/config/rbac/metrics_auth_role_binding.yaml deleted file mode 100644 index e775d67f..00000000 --- a/config/rbac/metrics_auth_role_binding.yaml +++ /dev/null @@ -1,12 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRoleBinding -metadata: - name: metrics-auth-rolebinding -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: ClusterRole - name: metrics-auth-role -subjects: -- kind: 
ServiceAccount - name: controller-manager - namespace: system diff --git a/config/rbac/metrics_reader_role.yaml b/config/rbac/metrics_reader_role.yaml deleted file mode 100644 index 51a75db4..00000000 --- a/config/rbac/metrics_reader_role.yaml +++ /dev/null @@ -1,9 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: metrics-reader -rules: -- nonResourceURLs: - - "/metrics" - verbs: - - get diff --git a/config/rbac/watchrule_admin_role.yaml b/config/rbac/watchrule_admin_role.yaml deleted file mode 100644 index c98c941f..00000000 --- a/config/rbac/watchrule_admin_role.yaml +++ /dev/null @@ -1,27 +0,0 @@ -# This rule is not used by the project gitops-reverser itself. -# It is provided to allow the cluster admin to help manage permissions for users. -# -# Grants full permissions ('*') over configbutler.ai. -# This role is intended for users authorized to modify roles and bindings within the cluster, -# enabling them to delegate specific permissions to other users or groups as needed. - -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: watchrule-admin-role -rules: -- apiGroups: - - configbutler.ai - resources: - - watchrules - verbs: - - '*' -- apiGroups: - - configbutler.ai - resources: - - watchrules/status - verbs: - - get diff --git a/config/rbac/watchrule_editor_role.yaml b/config/rbac/watchrule_editor_role.yaml deleted file mode 100644 index 02dd472f..00000000 --- a/config/rbac/watchrule_editor_role.yaml +++ /dev/null @@ -1,33 +0,0 @@ -# This rule is not used by the project gitops-reverser itself. -# It is provided to allow the cluster admin to help manage permissions for users. -# -# Grants permissions to create, update, and delete resources within the configbutler.ai. -# This role is intended for users who need to manage these resources -# but should not control RBAC or manage permissions for others. 
- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: watchrule-editor-role -rules: -- apiGroups: - - configbutler.ai - resources: - - watchrules - verbs: - - create - - delete - - get - - list - - patch - - update - - watch -- apiGroups: - - configbutler.ai - resources: - - watchrules/status - verbs: - - get diff --git a/config/rbac/watchrule_viewer_role.yaml b/config/rbac/watchrule_viewer_role.yaml deleted file mode 100644 index 48770b9e..00000000 --- a/config/rbac/watchrule_viewer_role.yaml +++ /dev/null @@ -1,29 +0,0 @@ -# This rule is not used by the project gitops-reverser itself. -# It is provided to allow the cluster admin to help manage permissions for users. -# -# Grants read-only access to configbutler.ai resources. -# This role is intended for users who need visibility into these resources -# without permissions to modify them. It is ideal for monitoring purposes and limited-access viewing. 
- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: watchrule-viewer-role -rules: -- apiGroups: - - configbutler.ai - resources: - - watchrules - verbs: - - get - - list - - watch -- apiGroups: - - configbutler.ai - resources: - - watchrules/status - verbs: - - get diff --git a/config/samples/clusterwatchrule.yaml b/config/samples/clusterwatchrule.yaml deleted file mode 100644 index d8d08a2b..00000000 --- a/config/samples/clusterwatchrule.yaml +++ /dev/null @@ -1,33 +0,0 @@ -apiVersion: configbutler.ai/v1alpha1 -kind: ClusterWatchRule -metadata: - name: clusterwatchrule-sample -spec: - gitProviderRef: - name: sample - namespace: gitops-reverser-system - rules: - # Rule 1: Watch cluster-scoped resources (Nodes) - - scope: Cluster - operations: [CREATE, UPDATE, DELETE] - apiGroups: [""] - resources: [nodes] - - # Rule 2: Watch cluster-scoped RBAC resources - - scope: Cluster - apiGroups: [rbac.authorization.k8s.io] - resources: [clusterroles, clusterrolebindings] - - # Rule 3: Watch Deployments in ALL namespaces - - scope: Namespaced - apiGroups: [apps] - resources: [deployments] - # No namespaceSelector = all namespaces - - # Rule 4: Watch Secrets only in PCI-compliant namespaces - - scope: Namespaced - apiGroups: [""] - resources: [secrets] - namespaceSelector: - matchLabels: - compliance: pci diff --git a/config/samples/gitprovider.yaml b/config/samples/gitprovider.yaml deleted file mode 100644 index e67c71ed..00000000 --- a/config/samples/gitprovider.yaml +++ /dev/null @@ -1,16 +0,0 @@ -apiVersion: configbutler.ai/v1alpha1 -kind: GitProvider -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: sample -spec: - repoUrl: "http://gitea-http.gitea-e2e.svc.cluster.local:13000/testorg/testrepo.git" - allowedBranches: - - "main" - secretRef: - name: "git-creds" - push: - interval: "1m" - 
maxCommits: 20 diff --git a/config/samples/gittarget.yaml b/config/samples/gittarget.yaml deleted file mode 100644 index f7a18024..00000000 --- a/config/samples/gittarget.yaml +++ /dev/null @@ -1,13 +0,0 @@ -apiVersion: configbutler.ai/v1alpha1 -kind: GitTarget -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: sample - namespace: default -spec: - gitProviderRef: - name: gitrepoconfig-sample - branch: main - baseFolder: clusters/default diff --git a/config/samples/kustomization.yaml b/config/samples/kustomization.yaml deleted file mode 100644 index eacc5f8d..00000000 --- a/config/samples/kustomization.yaml +++ /dev/null @@ -1,7 +0,0 @@ -## Append samples of your project ## -resources: - - clusterwatchrule.yaml - - watchrule.yaml - - gittarget.yaml - - gitprovider.yaml -# +kubebuilder:scaffold:manifestskustomizesamples diff --git a/config/samples/watchrule.yaml b/config/samples/watchrule.yaml deleted file mode 100644 index 116bdaa3..00000000 --- a/config/samples/watchrule.yaml +++ /dev/null @@ -1,39 +0,0 @@ -apiVersion: configbutler.ai/v1alpha1 -kind: WatchRule -metadata: - name: watchrule-sample - namespace: default -spec: - # Reference to GitRepoConfig - # If namespace is not specified, defaults to WatchRule's namespace - gitRepoConfigRef: - name: gitrepoconfig-sample - # namespace: default # Optional - defaults to WatchRule's namespace - - # Optional: Filter resources by labels - # This example includes resources with app=production and excludes those with ignore label - objectSelector: - matchExpressions: - - key: app - operator: In - values: [production] - - key: gitops-reverser.io/ignore - operator: DoesNotExist - - # Define which resources to watch (logical OR - matching ANY rule triggers watch) - rules: - # Watch config resources on CREATE and UPDATE (ignore DELETE) - - operations: [CREATE, UPDATE] - apiGroups: [""] # Core API group - apiVersions: ["v1"] - resources: [configmaps, secrets] - - # 
Watch all operations for app resources - - operations: [CREATE, UPDATE, DELETE] - apiGroups: [apps] - apiVersions: ["v1"] - resources: [deployments, statefulsets] - - # Watch custom resources (all operations, all versions) - - apiGroups: [example.com] - resources: [myapps] diff --git a/config/service.yaml b/config/service.yaml new file mode 100644 index 00000000..1915c93c --- /dev/null +++ b/config/service.yaml @@ -0,0 +1,27 @@ +apiVersion: v1 +kind: Service +metadata: + labels: + app.kubernetes.io/managed-by: kustomize + app.kubernetes.io/name: gitops-reverser + name: gitops-reverser-service + namespace: sut +spec: + clusterIP: 10.96.200.200 # This is required because kube-apiserver starts before CoreDNS (so we use a fixed address) + ports: + - name: webhook-server + port: 443 + protocol: TCP + targetPort: 9443 + - name: audit-server + port: 9444 + protocol: TCP + targetPort: 9444 + - name: metrics + port: 8443 + protocol: TCP + targetPort: 8443 + selector: + app.kubernetes.io/name: gitops-reverser + control-plane: controller-manager + diff --git a/config/webhook.yaml b/config/webhook.yaml new file mode 100644 index 00000000..6ab477e2 --- /dev/null +++ b/config/webhook.yaml @@ -0,0 +1,30 @@ +apiVersion: admissionregistration.k8s.io/v1 +kind: ValidatingWebhookConfiguration +metadata: + annotations: + cert-manager.io/inject-ca-from: sut/gitops-reverser-webhook-server-cert + name: gitops-reverser-validating-webhook-configuration +webhooks: +- admissionReviewVersions: + - v1 + clientConfig: + service: + name: gitops-reverser-service + namespace: sut + path: /process-validating-webhook + failurePolicy: Ignore + name: gitops-reverser.configbutler.ai + rules: + - apiGroups: + - '*' + apiVersions: + - '*' + operations: + - CREATE + - UPDATE + - DELETE + resources: + - '*' + sideEffects: None + + diff --git a/config/webhook/kustomization.yaml b/config/webhook/kustomization.yaml deleted file mode 100644 index 7bfe5442..00000000 --- a/config/webhook/kustomization.yaml +++ 
/dev/null @@ -1,12 +0,0 @@ -resources: -- manifests.yaml -- service.yaml - -configurations: -- kustomizeconfig.yaml - -patches: -- path: webhook_service_name_patch.yaml - target: - kind: ValidatingWebhookConfiguration - name: validating-webhook-configuration diff --git a/config/webhook/kustomizeconfig.yaml b/config/webhook/kustomizeconfig.yaml deleted file mode 100644 index 206316e5..00000000 --- a/config/webhook/kustomizeconfig.yaml +++ /dev/null @@ -1,22 +0,0 @@ -# the following config is for teaching kustomize where to look at when substituting nameReference. -# It requires kustomize v2.1.0 or newer to work properly. -nameReference: -- kind: Service - version: v1 - fieldSpecs: - - kind: MutatingWebhookConfiguration - group: admissionregistration.k8s.io - path: webhooks/clientConfig/service/name - - kind: ValidatingWebhookConfiguration - group: admissionregistration.k8s.io - path: webhooks/clientConfig/service/name - -namespace: -- kind: MutatingWebhookConfiguration - group: admissionregistration.k8s.io - path: webhooks/clientConfig/service/namespace - create: true -- kind: ValidatingWebhookConfiguration - group: admissionregistration.k8s.io - path: webhooks/clientConfig/service/namespace - create: true diff --git a/config/webhook/service.yaml b/config/webhook/service.yaml deleted file mode 100644 index 4973d28e..00000000 --- a/config/webhook/service.yaml +++ /dev/null @@ -1,26 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - labels: - app.kubernetes.io/name: gitops-reverser - app.kubernetes.io/managed-by: kustomize - name: service - namespace: system -spec: - clusterIP: 10.96.200.200 # This is required because kube-apiserver starts before CoreDNS - ports: - - name: webhook-server - port: 443 - protocol: TCP - targetPort: 9443 - - name: audit-server - port: 9444 - protocol: TCP - targetPort: 9444 - - name: metrics - port: 8443 - protocol: TCP - targetPort: 8443 - selector: - control-plane: controller-manager - app.kubernetes.io/name: gitops-reverser diff --git 
a/config/webhook/webhook_service_name_patch.yaml b/config/webhook/webhook_service_name_patch.yaml deleted file mode 100644 index fc2aa479..00000000 --- a/config/webhook/webhook_service_name_patch.yaml +++ /dev/null @@ -1,9 +0,0 @@ -apiVersion: admissionregistration.k8s.io/v1 -kind: ValidatingWebhookConfiguration -metadata: - name: validating-webhook-configuration -webhooks: -- name: gitops-reverser.configbutler.ai - clientConfig: - service: - name: service diff --git a/docs/config-kustomize-simplification-findings.md b/docs/config-kustomize-simplification-findings.md new file mode 100644 index 00000000..97836cdd --- /dev/null +++ b/docs/config-kustomize-simplification-findings.md @@ -0,0 +1,202 @@ +# Config Kustomize Review: What Is Needed vs. What Can Be Simpler + +## Scope reviewed +- `config/default/kustomization.yaml` +- `config/default/manager_webhook_patch.yaml` +- `config/default/cert_metrics_manager_patch.yaml` +- `config/webhook/*` +- `config/certmanager/*` +- `cmd/main.go` +- `test/e2e/*` (especially namespace/cert assumptions) + +## Executive summary +- The certs are **already rendered into `sut`**, not `system`, when deploying via `config/default`. +- For webhook TLS + cert-manager CA injection, some kustomize wiring is genuinely required. +- There is also clear scaffolding/legacy complexity that can be reduced (especially commented replacement blocks and currently-unused metrics cert flow). + +## What is definitely useful / required + +### 1. `manager_webhook_patch.yaml` is required for current runtime behavior +Why: +- Your manager needs `--webhook-cert-path` and `--audit-cert-path`, plus mounted secrets and container ports (`9443`, `9444`). +- Without this patch, the cert secrets are not mounted where `cmd/main.go` expects them. + +References: +- `config/default/manager_webhook_patch.yaml:5` +- `config/default/manager_webhook_patch.yaml:27` +- `cmd/main.go:365` +- `cmd/main.go:522` + +### 2. 
Webhook kustomize namespace/name rewriting is required +Why: +- `ValidatingWebhookConfiguration` is cluster-scoped, but it embeds `clientConfig.service.name/namespace` fields. +- Kustomize needs explicit field specs to rewrite those embedded fields with your prefix/namespace. + +References: +- `config/webhook/kustomizeconfig.yaml:1` +- `config/webhook/webhook_service_name_patch.yaml:1` + +### 3. CA injection annotation wiring for cert-manager is required (if using cert-manager) +Why: +- API server must trust the serving cert for the validating webhook. +- `cert-manager.io/inject-ca-from` annotation on `ValidatingWebhookConfiguration` is the mechanism you currently use. + +References: +- `config/default/kustomization.yaml:187` +- Rendered output includes: `cert-manager.io/inject-ca-from: sut/gitops-reverser-webhook-server-cert` + +### 4. `certmanager/kustomizeconfig.yaml` is required with `namePrefix` +Why: +- `namePrefix: gitops-reverser-` renames `Issuer` metadata.name. +- `Certificate.spec.issuerRef.name` must be rewritten to match, otherwise cert issuance breaks. + +References: +- `config/default/kustomization.yaml:9` +- `config/certmanager/kustomizeconfig.yaml:1` + +## What is currently over-complicated / likely removable + +### 1. Huge commented replacement blocks in `config/default/kustomization.yaml` +- Most of the metrics/servicemonitor replacement blocks are commented and unused in your current default/e2e flow. +- Keeping them bloats maintenance and confuses intent. + +Reference: +- `config/default/kustomization.yaml:53` + +### 2. Mutating webhook CA injection replacements appear unused +- You only have `ValidatingWebhookConfiguration` in `config/webhook/manifests.yaml`. +- Replacement entries targeting `MutatingWebhookConfiguration` look like kubebuilder scaffold leftovers. + +References: +- `config/default/kustomization.yaml:218` +- `config/webhook/manifests.yaml:3` + +### 3. 
Metrics certificate is created but not mounted by default +- `metrics-server-cert.yaml` is included in resources. +- But `cert_metrics_manager_patch.yaml` is commented out, so manager does not mount/use `metrics-server-cert` by default. +- E2E Prometheus scrape uses `insecure_skip_verify: true` anyway. + +References: +- `config/certmanager/kustomization.yaml:5` +- `config/default/kustomization.yaml:40` +- `test/e2e/prometheus/deployment.yaml:24` + +## Certificate flow (how certs are used today) + +### Admission webhook cert (`webhook-server-cert` secret) +1. `Certificate` resource requests cert for service DNS. +2. cert-manager writes secret `webhook-server-cert`. +3. Deployment mounts that secret and passes `--webhook-cert-path`. +4. webhook server serves TLS on `9443` using cert watcher. +5. cert-manager injects CA into `ValidatingWebhookConfiguration` annotation target. +6. kube-apiserver calls webhook via Service over TLS and trusts injected CA. + +Key refs: +- `config/certmanager/webhook-server-cert.yaml:18` +- `config/default/manager_webhook_patch.yaml:52` +- `config/default/kustomization.yaml:187` + +### Audit ingress cert (`audit-webhook-server-cert` secret) +1. Separate `Certificate` resource issues audit cert. +2. Secret `audit-webhook-server-cert` is mounted. +3. Manager serves HTTPS audit endpoint on `9444` using `--audit-cert-path`. +4. In e2e, kube-apiserver audit webhook config uses `insecure-skip-tls-verify: true` (so CA pinning is not enforced in test). + +Key refs: +- `config/certmanager/audit-server-cert.yaml:17` +- `config/default/manager_webhook_patch.yaml:60` +- `test/e2e/kind/audit/webhook-config.yaml:14` + +### Metrics cert (`metrics-server-cert` secret) +- Issued by cert-manager, but only actively used if you also enable metrics cert patch and corresponding monitor TLS config. 
+ +Refs: +- `config/certmanager/metrics-server-cert.yaml:20` +- `config/default/cert_metrics_manager_patch.yaml:12` +- `config/prometheus/monitor_tls_patch.yaml:1` + +## Your namespace question: `sut` vs `system` + +Short answer: +- `system` is **not required** for kube-api webhooks. +- Certs should live in the same namespace as the workload/service that uses them. +- In your current default deployment, that namespace is effectively `sut`. + +Important detail: +- Source files still show `namespace: system` in some places, but `config/default/kustomization.yaml` applies `namespace: sut` globally. +- Rendered manifests confirm certs, issuer, service, deployment are in `sut`. + +## Recommended simplification plan (test-focused) + +### Phase 1 (safe cleanup, behavior unchanged) +1. Remove large commented blocks in `config/default/kustomization.yaml` (keep only active replacements). +2. Remove unused mutating-webhook replacement entries if you do not plan mutating webhooks. +3. Add a short comment block at top: "test profile: single service + validating webhook + audit ingress". + +### Phase 2 (decide metrics cert strategy) +Choose one: +1. Keep metrics cert end-to-end: enable `cert_metrics_manager_patch.yaml` and proper monitor TLS usage. +2. Or simplify: remove `metrics-server-cert.yaml` from `config/certmanager/kustomization.yaml` and stop waiting for `metrics-server-cert` in e2e helper. + +Given current e2e (`insecure_skip_verify: true`), option 2 is simpler and consistent. + +### Phase 3 (optional bigger simplification) +If these manifests are truly test-only and namespace/prefix are fixed: +1. Replace dynamic cert DNS replacements with explicit static DNS names. +2. Replace dynamic `inject-ca-from` replacements with static annotation value. + +Tradeoff: +- Less kustomize complexity, but less reusable/generic. + +## Extra note on fixed ClusterIP +- The fixed service ClusterIP (`10.96.200.200`) is coupled to Kind audit webhook bootstrap (API server before DNS). 
+- Keep it if you depend on that startup behavior in e2e. + +Refs: +- `config/webhook/service.yaml:10` +- `test/e2e/kind/audit/webhook-config.yaml:12` + +## Bold strategy (essence-first): freeze rendered output, delete most kustomize machinery + +### What you mean in practice +1. Render today’s desired install profile once (`kustomize build config/default`). +2. Split that output into plain, human-owned files by concern (for example `namespace.yaml`, `crds.yaml`, `rbac.yaml`, `deployment.yaml`, `service.yaml`, `certificates.yaml`, `webhook.yaml`). +3. Remove the current deep transformer/replacement structure from `config/`. +4. Keep either: + - no kustomize at all (apply a folder of plain YAML in order), or + - one tiny `kustomization.yaml` that just lists resources with zero patches/replacements. + +This is a valid strategy if your goal is readability and low cognitive overhead over portability. + +### Why this can be good +1. You get back to essentials: explicit manifests, no hidden transformations. +2. Refactoring confidence improves because object names/refs are visible directly. +3. New contributors can reason about install behavior without learning kustomize tricks. +4. Debugging production/test drift is easier because rendered state is source of truth. + +### What you lose +1. Easy rebasing of namespace/namePrefix/env variants. +2. Automatic reference rewriting (`Issuer` name, webhook service references, CA injection path assembly). +3. Scaffold compatibility with future kubebuilder regeneration patterns. + +### Where this can hurt later +1. If you later need a second profile (for example non-e2e namespace or no fixed ClusterIP), you will duplicate YAML or re-introduce templating. +2. If cert naming/service naming changes, all references must be manually updated everywhere. +3. Large CRD/regenerated sections can become noisy unless you keep strict ownership boundaries. + +### Guardrails to keep this maintainable +1. 
Declare one supported raw-manifest profile explicitly (for example: `sut` test profile). +2. Keep clear file boundaries: + - `config/raw/00-namespace.yaml` + - `config/raw/10-crds.yaml` + - `config/raw/20-rbac.yaml` + - `config/raw/30-manager.yaml` + - `config/raw/40-service.yaml` + - `config/raw/50-certificates.yaml` + - `config/raw/60-webhook.yaml` +3. If you keep minimal kustomize, allow only `resources:` entries (no `patches`, no `replacements`, no `configurations`). +4. Add a lightweight validation target (for example `kubectl apply --dry-run=server -f config/raw` in CI). + +### Recommendation for your repo +- If these manifests are primarily for e2e and internal testing, this essence-first model is reasonable and likely worth it. +- If you want `config/` to be a broadly reusable install path, keep some kustomize composition and instead prune it aggressively (not fully remove it). diff --git a/test/e2e/helpers.go b/test/e2e/helpers.go index f37e2828..1b2a8bb9 100644 --- a/test/e2e/helpers.go +++ b/test/e2e/helpers.go @@ -179,15 +179,6 @@ func waitForCertificateSecrets() { g.Expect(err).NotTo(HaveOccurred(), "webhook-server-cert secret should exist") }, 60*time.Second, 2*time.Second).Should(Succeed()) //nolint:mnd // reasonable timeout for cert-manager - By("waiting for metrics certificate secret to be created by cert-manager") - Eventually(func(g Gomega) { - ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) //nolint:mnd // reasonable timeout - defer cancel() - cmd := exec.CommandContext(ctx, "kubectl", "get", "secret", "metrics-server-cert", "-n", namespace) - _, err := utils.Run(cmd) - g.Expect(err).NotTo(HaveOccurred(), "metrics-server-cert secret should exist") - }, 60*time.Second, 2*time.Second).Should(Succeed()) //nolint:mnd // reasonable timeout for cert-manager - By("waiting for dedicated audit certificate secret to be created by cert-manager") Eventually(func(g Gomega) { ctx, cancel := context.WithTimeout(context.Background(), 
5*time.Second) //nolint:mnd // reasonable timeout diff --git a/test/e2e/prometheus/deployment.yaml b/test/e2e/prometheus/deployment.yaml index c3614e01..22ced0de 100644 --- a/test/e2e/prometheus/deployment.yaml +++ b/test/e2e/prometheus/deployment.yaml @@ -19,9 +19,7 @@ data: scrape_configs: # Scrape gitops-reverser metrics from the single controller Service in 'sut' - job_name: 'gitops-reverser-metrics' - scheme: https - tls_config: - insecure_skip_verify: true # Self-signed certs in e2e + scheme: http bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token static_configs: - targets: From c2e7a8ac0e39da32ef377571e04c3ad34df40f3b Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 08:39:11 +0000 Subject: [PATCH 07/32] docs: Updating expectations --- README.md | 3 +++ .../templates/validate-replica-count.yaml | 3 +++ config/README.md | 23 +++++++++++-------- 3 files changed, 20 insertions(+), 9 deletions(-) create mode 100644 charts/gitops-reverser/templates/validate-replica-count.yaml diff --git a/README.md b/README.md index e2498971..1a8d7b6f 100644 --- a/README.md +++ b/README.md @@ -40,6 +40,8 @@ Reverse GitOps gives you both: the interactivity of the Kubernetes API with Git' 🚨 This is early stage software. CRDs and behavior may change; not recommended for production yet. Feedback and contributions are very welcome! +Current limitation: GitOps Reverser must run as a single pod (`replicas=1`). Multi-pod/HA operation is not supported yet. + ### Use of AI I have been thinking about the idea behind GitOps Reverser for several years (I've given up my fulltime job to work on it). Some of the hardest parts, especially writing to Git efficiently and safely under load, were designed and implemented manually. The rest is vibe coded, and needs more refinement before I would run it in production. 
@@ -158,6 +160,7 @@ Avoid infinite loops: Do not point GitOps (Argo CD/Flux) and GitOps Reverser at ## Known limitations +- GitOps Reverser currently supports only a single controller pod (no multi-pod/HA yet). - Avoid multiple GitProvider configurations pointing at the same repo to prevent queue collisions (see [`docs/TODO.md`](docs/TODO.md)). - Queue collisions are possible when multiple configs target the same repository; mitigation is planned. diff --git a/charts/gitops-reverser/templates/validate-replica-count.yaml b/charts/gitops-reverser/templates/validate-replica-count.yaml new file mode 100644 index 00000000..d12b2e34 --- /dev/null +++ b/charts/gitops-reverser/templates/validate-replica-count.yaml @@ -0,0 +1,3 @@ +{{- if gt (int .Values.replicaCount) 1 -}} +{{- fail "gitops-reverser does not support HA yet (Sorry I feel your pain: but it can't be perfect from the start). Set .Values.replicaCount to 1." -}} +{{- end -}} diff --git a/config/README.md b/config/README.md index 1194fdf7..961026ae 100644 --- a/config/README.md +++ b/config/README.md @@ -1,12 +1,17 @@ -# config_raw +# config -This folder is a static, rendered snapshot of `kustomize build config/default`. +This folder contains simplified raw manifests used primarily for local development and testing, +especially end-to-end (e2e) test workflows. -Goals: -- Keep manifests simple and explicit. -- Avoid patches/replacements/transformer indirection. -- Make side-by-side comparison with `config/` easy. +## Intended use +- Local cluster bring-up. +- E2E test deployments. +- Debugging and iteration with explicit manifests. -Notes: -- These files are intentionally environment-specific to the current render profile. -- Update by re-rendering from `config/default` when source config changes. +## Production guidance +For production deployments, use the Helm chart in `charts/gitops-reverser`. +The Helm chart is the recommended installation and lifecycle management path for production. 
+ +## Notes +- These manifests are opinionated toward the local/e2e setup. +- Keep them simple and explicit; avoid reintroducing heavy kustomize indirection here. From e9e1cf666e98d1338e3ce3260d6dd070245a042a Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 09:10:11 +0000 Subject: [PATCH 08/32] fix: Never ever commit secrets in their raw form --- docs/SOPS_ENCRYPTION_PLAN.md | 248 +++++++++++++++++++++++++ internal/watch/gvr.go | 4 + internal/watch/informers.go | 5 + internal/watch/manager.go | 14 +- internal/watch/resource_filter.go | 25 +++ internal/watch/resource_filter_test.go | 48 +++++ test/e2e/e2e_test.go | 58 ++++++ 7 files changed, 400 insertions(+), 2 deletions(-) create mode 100644 docs/SOPS_ENCRYPTION_PLAN.md create mode 100644 internal/watch/resource_filter.go create mode 100644 internal/watch/resource_filter_test.go diff --git a/docs/SOPS_ENCRYPTION_PLAN.md b/docs/SOPS_ENCRYPTION_PLAN.md new file mode 100644 index 00000000..45115c40 --- /dev/null +++ b/docs/SOPS_ENCRYPTION_PLAN.md @@ -0,0 +1,248 @@ +# SOPS Encryption Plan For Git Writes + +## Goal + +Encrypt sensitive Kubernetes resources (initially `Secret`) with SOPS before they are written to the Git worktree, so commits contain encrypted payloads instead of plaintext `data`/`stringData`. + +## Scope + +- In scope: + - Encrypt on write path (watch event -> sanitize -> git file write). + - Support SOPS execution strategy for encryption (external binary first iteration). + - Add runtime configuration for enablement, policy, and SOPS invocation. + - Support standard SOPS key backends through mounted credentials/config. + - Tests, docs, and Helm wiring. +- Out of scope (first iteration): + - Decryption in controller runtime. + - Re-encrypting existing historical commits. + - Complex per-namespace/per-rule encryption policies. + +## Current Baseline (Why this is needed) + +- `internal/sanitize/sanitize.go` preserves `data` and `binaryData`. 
+- `internal/watch/informers.go` enqueues sanitized objects as-is. +- `internal/git/git.go` writes YAML generated from event object directly to disk. +- Result: if a `WatchRule` includes `secrets` (or `*`), secret payloads are committed in plaintext. + +## High-Level Design + +### 1. Encryption Hook Point + +Add encryption at the final write stage in `internal/git/git.go` inside `handleCreateOrUpdateOperation`: + +1. Generate ordered YAML from sanitized object (existing behavior). +2. Apply encryption policy: + - If resource should be encrypted, run SOPS encryption. + - If not, keep plaintext YAML. +3. Continue with existing file compare/write/stage logic. + +This keeps upstream watch/sanitize flow unchanged and centralizes git-output guarantees. + +### 2. Encryption Policy + +Introduce explicit policy config (controller process-level first): + +- `disabled` (default for backward compatibility). +- `secretsOnly` (recommended default when enabled). +- `matchResources` (future): configurable list of `(group, version, resource)` patterns. + +Initial policy decision: +- Encrypt only Kubernetes `Secret` resources (`group=""`, `version="v1"`, `resource="secrets"`). + +### 3. SOPS Invocation Model + +Use external SOPS binary for first implementation, invoked by the manager process. + +Proposed approach: + +1. Write plaintext YAML to a secure temp file in `/tmp`. +2. Run SOPS command to produce encrypted YAML. +3. Read encrypted output and remove temp files. +4. Write encrypted output to repo path. + +Command strategy: +- Prefer `.sops.yaml`-driven encryption rules. +- Allow optional explicit args passthrough only through an allowlist (for example output/input type and config path), not arbitrary raw flags. + +Failure behavior (configurable): +- `failClosed` (recommended): do not write/commit if encryption fails. +- `failOpen` (optional): log error and write plaintext (not recommended for production). + +### 3a. 
Architecture Choice: External Binary vs Embedded Library + +This should be an explicit engineering decision, not a hidden assumption. + +Option A: External SOPS binary (first iteration) +- Pros: + - Reuses upstream SOPS behavior exactly (CLI parity with existing workflows). + - Faster implementation and lower maintenance in this codebase. + - Keeps cloud KMS/age/PGP backend behavior aligned with standard SOPS usage. +- Cons: + - Extra process spawn overhead per encrypted object. + - Runtime dependency management (binary presence, version pinning, CVE tracking). + +Option B: Embed encryption implementation directly in `gitops-reverser` +- Pros: + - No external process execution; simpler runtime dependency surface. + - Potentially better performance and tighter observability hooks. +- Cons: + - Higher implementation and long-term maintenance cost. + - Risk of behavior drift from upstream SOPS semantics and config handling. + - More complex support burden across key backends. + +Decision for this plan: +- Implement Option A first (external binary), behind feature flags. +- Keep abstraction boundary (`Encryptor` interface) so Option B can be added later without reworking git write flow. + +Revisit triggers: +- Encryption latency becomes a measurable bottleneck. +- Operational burden from binary distribution/versioning is high. +- There is a strong requirement for in-process crypto execution. + +### 4. Runtime Config Model + +Add manager flags + Helm values for encryption: + +- `--encryption-enabled` +- `--encryption-policy=secretsOnly|disabled` +- `--encryption-provider=sops` +- `--sops-binary-path=/usr/local/bin/sops` +- `--sops-config-path=/etc/sops/.sops.yaml` (optional) +- `--encryption-failure-policy=failClosed|failOpen` + +Configuration precedence: +- `--encryption-enabled=false` always disables encryption regardless of policy value. +- `--encryption-enabled=true` requires a non-`disabled` policy. 
+- Invalid combinations should fail startup with a clear validation error. + +Helm values section proposal: + +```yaml +encryption: + enabled: false + policy: secretsOnly + failurePolicy: failClosed + sops: + binaryPath: /usr/local/bin/sops + configPath: /etc/sops/.sops.yaml +``` + +### 5. Key Material / Backend Configuration + +Do not invent key management inside the operator. Reuse native SOPS backends: + +- `age` via mounted secret and `SOPS_AGE_KEY_FILE`. +- cloud KMS via workload identity / IAM env (AWS/GCP/Azure). +- PGP if needed (lower priority). + +Helm should support: + +- Extra volume mounts for key files and `.sops.yaml`. +- Extra env vars for SOPS backend configuration. + +## Implementation Phases + +## Phase 1: Core plumbing (code-only, no encryption yet) + +- Add `EncryptionConfig` struct and wire it from `cmd/main.go` into git worker path. +- Add policy evaluator utility (`shouldEncrypt(event)`). +- Add unit tests for policy decisions. + +Deliverable: +- Feature-flagged no-op framework merged. + +## Phase 2: SOPS binary integration + +- Implement `SOPSEncryptor` (interface + concrete implementation). +- Integrate into `handleCreateOrUpdateOperation` before file write. +- Implement temp-file execution with strict permissions. +- Add structured logging and metrics: + - encrypt attempts + - encrypt success/failure + - fail-open count + +Deliverable: +- Functional encryption when enabled and policy matches. + +## Phase 3: Packaging and Helm configuration + +- Update `Dockerfile` multi-stage build: + - Add stage to fetch pinned SOPS release binary. + - Copy binary into final distroless image (e.g. `/usr/local/bin/sops`). +- Update chart: + - New `encryption.*` values. + - Add manager args from values. + - Document volume/env examples for keys and `.sops.yaml`. + +Deliverable: +- Deployable encrypted workflow via Helm settings. + +## Phase 4: Test coverage + +- Unit tests: + - `Secret` gets encrypted. + - non-secret not encrypted (policy `secretsOnly`). 
+ - encryption failure with `failClosed` blocks write. + - encryption failure with `failOpen` writes plaintext and emits warning metric. + - invalid flag combinations are rejected at config validation time. +- Integration tests (git operations): + - verify resulting file contains SOPS envelope fields and no raw secret values. +- Optional e2e: + - run with local age key and assert encrypted commits. + +Deliverable: +- CI coverage for happy path and failure modes. + +## Phase 5: Documentation and migration + +- Update `README.md` and chart README with: + - enabling encryption + - key backend setup examples + - operational caveats +- Add migration note: + - existing plaintext history remains in git; requires manual history rewrite if needed. + +Deliverable: +- Operator docs for secure rollout. + +## Security Considerations + +- Default to `failClosed` when encryption is enabled. +- Treat `failOpen` as development-only or break-glass behavior. +- Ensure temp files are `0600` and cleaned up. +- Ensure temp-file cleanup runs on both success and failure paths. +- Avoid logging plaintext content. +- Prefer `age` or cloud KMS over static PGP workflows. +- Recommend separate repos/branches for encrypted outputs when integrating with downstream GitOps tools. + +## Operational Considerations + +- Performance: + - SOPS process spawn per encrypted object adds overhead. + - Mitigation: keep policy narrow (`secretsOnly`) and batch commit behavior unchanged. +- Determinism: + - SOPS metadata may vary; deduplication currently happens pre-write on sanitized plaintext. + - This is acceptable for first iteration but should be documented. +- Compatibility: + - Downstream consumers (Flux/Argo) must be configured for SOPS decryption if they deploy encrypted files. + +## Proposed Acceptance Criteria + +- When enabled with `secretsOnly`, committed Secret manifests are SOPS-encrypted and plaintext secret values never appear in repo files. 
+- Non-secret resources continue to be committed as before. +- If SOPS is missing or misconfigured: + - `failClosed`: write is rejected and error surfaced. + - `failOpen`: plaintext write proceeds with explicit warning/metric (non-production only). +- If invalid encryption configuration is provided, manager startup fails with actionable error output. +- Helm users can: + - enable encryption + - mount key/config material + - point to SOPS binary/config path without rebuilding chart templates manually. + +## Suggested Rollout + +1. Merge framework + binary integration behind feature flag (disabled by default). +2. Run in staging with `enabled=true`, `policy=secretsOnly`, `failClosed`. +3. Validate commit contents and operational metrics. +4. Roll to production. +5. Optionally extend policy beyond `Secret` after proving stability. diff --git a/internal/watch/gvr.go b/internal/watch/gvr.go index 1f981e70..46283cb6 100644 --- a/internal/watch/gvr.go +++ b/internal/watch/gvr.go @@ -178,6 +178,10 @@ func addGVR( out *[]GVR, seen map[string]struct{}, ) { + if shouldIgnoreResource(group, resource) { + return + } + key := group + "|" + version + "|" + resource + "|" + string(scope) if _, ok := seen[key]; ok { return diff --git a/internal/watch/informers.go b/internal/watch/informers.go index 261ac5dc..d7adba2f 100644 --- a/internal/watch/informers.go +++ b/internal/watch/informers.go @@ -64,6 +64,11 @@ func (m *Manager) handleEvent(obj interface{}, g GVR, op configv1alpha1.Operatio if u == nil { return } + if shouldIgnoreResource(g.Group, g.Resource) { + m.Log.V(1).Info("Skipping resource due to safety filter", + "group", g.Group, "version", g.Version, "resource", g.Resource) + return + } ctx := context.Background() diff --git a/internal/watch/manager.go b/internal/watch/manager.go index 42a6ba32..204e88cc 100644 --- a/internal/watch/manager.go +++ b/internal/watch/manager.go @@ -581,6 +581,12 @@ func (m *Manager) processListedObject( u *unstructured.Unstructured, g GVR, ) { 
+ if shouldIgnoreResource(g.Group, g.Resource) { + m.Log.V(1).Info("Skipping seeded resource due to safety filter", + "group", g.Group, "version", g.Version, "resource", g.Resource) + return + } + id := types.NewResourceIdentifier(g.Group, g.Version, g.Resource, u.GetNamespace(), u.GetName()) var nsLabels map[string]string @@ -693,7 +699,7 @@ func (m *Manager) addGVRsFromResourceRule( } for _, resource := range rr.Resources { normalized := normalizeResource(resource) - if normalized == "*" { + if normalized == "*" || shouldIgnoreResource(group, normalized) { continue // Skip wildcards } gvr := schema.GroupVersionResource{ @@ -731,7 +737,7 @@ func (m *Manager) addGVRsFromClusterResourceRule( } for _, resource := range rr.Resources { normalized := normalizeResource(resource) - if normalized == "*" { + if normalized == "*" || shouldIgnoreResource(group, normalized) { continue // Skip wildcards } gvr := schema.GroupVersionResource{ @@ -752,6 +758,10 @@ func (m *Manager) listResourcesForGVR( gvr schema.GroupVersionResource, gitTarget *configv1alpha1.GitTarget, ) ([]types.ResourceIdentifier, error) { + if shouldIgnoreResource(gvr.Group, gvr.Resource) { + return nil, nil + } + var resources []types.ResourceIdentifier // List resources (cluster-wide for now, namespace filtering would go here) diff --git a/internal/watch/resource_filter.go b/internal/watch/resource_filter.go new file mode 100644 index 00000000..90b0ebfa --- /dev/null +++ b/internal/watch/resource_filter.go @@ -0,0 +1,25 @@ +/* +SPDX-License-Identifier: Apache-2.0 + +Copyright 2025 ConfigButler + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package watch + +import "strings" + +func shouldIgnoreResource(group, resource string) bool { + return group == "" && strings.EqualFold(resource, "secrets") +} diff --git a/internal/watch/resource_filter_test.go b/internal/watch/resource_filter_test.go new file mode 100644 index 00000000..4c10d116 --- /dev/null +++ b/internal/watch/resource_filter_test.go @@ -0,0 +1,48 @@ +/* +SPDX-License-Identifier: Apache-2.0 + +Copyright 2025 ConfigButler + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+*/ + +package watch + +import "testing" + +func TestShouldIgnoreResource(t *testing.T) { + t.Parallel() + + tests := []struct { + name string + group string + resource string + want bool + }{ + {name: "core secrets", group: "", resource: "secrets", want: true}, + {name: "core secrets case insensitive", group: "", resource: "Secrets", want: true}, + {name: "core configmaps", group: "", resource: "configmaps", want: false}, + {name: "non-core secrets", group: "example.com", resource: "secrets", want: false}, + } + + for _, tt := range tests { + tt := tt + t.Run(tt.name, func(t *testing.T) { + t.Parallel() + got := shouldIgnoreResource(tt.group, tt.resource) + if got != tt.want { + t.Fatalf("shouldIgnoreResource(%q, %q) = %v, want %v", tt.group, tt.resource, got, tt.want) + } + }) + } +} diff --git a/test/e2e/e2e_test.go b/test/e2e/e2e_test.go index 9de90180..efca3949 100644 --- a/test/e2e/e2e_test.go +++ b/test/e2e/e2e_test.go @@ -521,6 +521,64 @@ var _ = Describe("Manager", Ordered, func() { cleanupGitTarget(destName, namespace) }) + It("should never commit Secret manifests even if WatchRule includes secrets", func() { + gitProviderName := "gitprovider-normal" + watchRuleName := "watchrule-secret-ignore-test" + secretName := "test-secret-ignore" + + By("creating WatchRule that includes secrets") + destName := watchRuleName + "-dest" + createGitTarget(destName, namespace, gitProviderName, "e2e/secret-ignore-test", "main") + + data := struct { + Name string + Namespace string + DestinationName string + }{ + Name: watchRuleName, + Namespace: namespace, + DestinationName: destName, + } + + err := applyFromTemplate("test/e2e/templates/watchrule.tmpl", data, namespace) + Expect(err).NotTo(HaveOccurred(), "Failed to apply WatchRule") + verifyResourceStatus("watchrule", watchRuleName, namespace, "True", "Ready", "") + + By("creating Secret in watched namespace") + _, _ = utils.Run(exec.Command("kubectl", "delete", "secret", secretName, + "-n", namespace, 
"--ignore-not-found=true")) + + cmd := exec.Command("kubectl", "create", "secret", "generic", secretName, + "--from-literal=password=do-not-commit", "-n", namespace) + _, err = utils.Run(cmd) + Expect(err).NotTo(HaveOccurred(), "Secret creation should succeed") + + By("verifying Secret file never appears in Git repository") + verifySecretNotCommitted := func(g Gomega) { + pullCmd := exec.Command("git", "pull") + pullCmd.Dir = checkoutDir + pullOutput, pullErr := pullCmd.CombinedOutput() + if pullErr != nil { + g.Expect(pullErr).NotTo(HaveOccurred(), + fmt.Sprintf("Should successfully pull latest changes. Output: %s", string(pullOutput))) + } + + expectedFile := filepath.Join(checkoutDir, + "e2e/secret-ignore-test", + fmt.Sprintf("v1/secrets/%s/%s.yaml", namespace, secretName)) + _, statErr := os.Stat(expectedFile) + g.Expect(statErr).To(HaveOccurred(), fmt.Sprintf("Secret file must NOT exist at %s", expectedFile)) + g.Expect(os.IsNotExist(statErr)).To(BeTrue(), "Error should be 'file does not exist'") + } + Consistently(verifySecretNotCommitted, "20s", "2s").Should(Succeed()) + + By("cleaning up test resources") + _, _ = utils.Run(exec.Command("kubectl", "delete", "secret", secretName, + "-n", namespace, "--ignore-not-found=true")) + cleanupWatchRule(watchRuleName, namespace) + cleanupGitTarget(destName, namespace) + }) + It("should create Git commit when ConfigMap is added via WatchRule", func() { gitProviderName := "gitprovider-normal" watchRuleName := "watchrule-configmap-test" From 471b75266dd8b09fe522d96c6fa103b09cd2114f Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 09:13:17 +0000 Subject: [PATCH 09/32] docs: Improving overview docs --- README.md | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index 1a8d7b6f..3a0491f9 100644 --- a/README.md +++ b/README.md @@ -158,17 +158,15 @@ Avoid infinite loops: Do not point GitOps (Argo CD/Flux) and GitOps Reverser at - Drift detection (use commits 
as alert inputs)
 - Hybrid (traditional GitOps for infra; Reverser for app/config changes)
 
-## Known limitations
+## Known limitations / design choices
 
 - GitOps Reverser currently supports only a single controller pod (no multi-pod/HA yet).
-- Avoid multiple GitProvider configurations pointing at the same repo to prevent queue collisions (see [`docs/TODO.md`](docs/TODO.md)).
-- Queue collisions are possible when multiple configs target the same repository; mitigation is planned.
+- `Secret` resources (`core/v1`, `secrets`) are intentionally ignored and never written to Git, even if a `WatchRule` includes `secrets` or `*`.
+- Avoid multiple GitProvider configurations pointing at the same repo to prevent queue collisions.
+- Queue collisions are possible when multiple configs target the same repository, so avoid that setup.
 
-## Monitoring
-Exposes basic OpenTelemetry metrics. See `config/prometheus/` for example manifests.
-
-## Other options to consider
+## Other tools to consider
 
 | **Tool** | **How it Works** | **Key Differences** |
 |---|---|---|

From f6ee4b9b89c78167cd246cc1e5bfce06a7c9a798 Mon Sep 17 00:00:00 2001
From: Simon Koudijs
Date: Thu, 12 Feb 2026 10:02:35 +0000
Subject: [PATCH 10/32] chore: Aligning http(s) server names and ports

---
 charts/gitops-reverser/README.md | 43 +++++--------------
 ...ng-webhook.yaml => admission-webhook.yaml} | 3 +-
 .../templates/certificates.yaml | 6 +--
 .../gitops-reverser/templates/deployment.yaml | 12 +++---
 .../gitops-reverser/templates/services.yaml | 4 +-
 charts/gitops-reverser/values.yaml | 8 ++--
 config/certs/certificates.yaml | 6 +--
 config/manager.yaml | 19 ++++----
 config/service.yaml | 7 ++-
 config/webhook.yaml | 4 +-
 config/webhook/manifests.yaml | 1 +
 ...onfig-kustomize-simplification-findings.md | 14 +++---
 ...udit-ingress-first-steps-execution-plan.md | 36 ++++++++--------
 ...
audit-ingress-separate-server-options.md} | 14 +++---
 ...https-server-alignment-and-service-plan.md | 6 +--
 test/e2e/helpers.go | 8 ++--
 16 files changed, 86 insertions(+), 105 deletions(-)
 rename charts/gitops-reverser/templates/{validating-webhook.yaml => admission-webhook.yaml} (93%)
 rename docs/design/{audit-ingress-separate-webserver-options.md => audit-ingress-separate-server-options.md} (90%)

diff --git a/charts/gitops-reverser/README.md b/charts/gitops-reverser/README.md
index ad6b8a1a..b8e3004f 100644
--- a/charts/gitops-reverser/README.md
+++ b/charts/gitops-reverser/README.md
@@ -85,7 +85,7 @@ The chart deploys 1 replica by default:
                ▼
 ┌──────────────────────────────────────────┐
 │ gitops-reverser (Service)                │
-│ Ports: admission(443), audit(9444), metrics(8080) |
+│ Ports: admission(9443), audit(9444), metrics(8080) │
 └──────────────┬───────────────────────────┘
                │
                ▼
@@ -98,9 +98,7 @@ The chart deploys 1 replica by default:
 **Key Features:**
 
 - **Single-pod operation**: Minimal moving parts while HA work is deferred
-- **Single Service topology**: Admission, audit, and metrics on one Service
-- **Pod anti-affinity**: Pods spread across different nodes
-- **Pod disruption budget**: Ensures at least 1 pod available during maintenance
+- **Single Service topology**: admission, audit, and metrics on one Service
 
 ## Configuration
 
@@ -176,7 +174,7 @@ webhook:
 
 | Parameter | Description | Default |
 |-----------|-------------|---------|
-| `replicaCount` | Number of controller replicas | `1` |
+| `replicaCount` | Number of controller replicas (must be `1` for now; HA is not supported yet) | `1` |
 | `image.repository` | Container image repository | `ghcr.io/configbutler/gitops-reverser` |
 | `webhook.validating.failurePolicy` | Webhook failure policy (Ignore/Fail) | `Ignore` |
 | `servers.admission.tls.enabled` | Serve admission webhook with TLS (disable only for local/testing) | `true` |
 | `servers.audit.port` | Audit container port | `9444` |
 |
`servers.audit.tls.enabled` | Serve audit ingress with TLS | `true` | | `servers.audit.maxRequestBodyBytes` | Max accepted audit request size | `10485760` | -| `servers.audit.timeouts.read` | Audit server read timeout | `15s` | -| `servers.audit.timeouts.write` | Audit server write timeout | `30s` | -| `servers.audit.timeouts.idle` | Audit server idle timeout | `60s` | -| `servers.audit.tls.secretName` | Secret name for audit TLS cert/key | `-audit-server-tls-cert` | +| `servers.audit.timeouts.read` | Audit-server read timeout | `15s` | +| `servers.audit.timeouts.write` | Audit-server write timeout | `30s` | +| `servers.audit.timeouts.idle` | Audit-server idle timeout | `60s` | +| `servers.audit.tls.secretName` | Secret name for audit TLS cert/key | `-audit-server-cert` | | `servers.metrics.bindAddress` | Metrics listener bind address | `:8080` | | `servers.metrics.tls.enabled` | Serve metrics with TLS | `true` | | `service.clusterIP` | Optional fixed ClusterIP for single controller Service | `""` | -| `service.ports.admission` | Service port for admission webhook | `443` | +| `service.ports.admission` | Service port for admission webhook | `9443` | | `service.ports.audit` | Service port for audit ingress | `9444` | | `service.ports.metrics` | Service port for metrics | `8080` | | `certificates.certManager.enabled` | Use cert-manager for certificates | `true` | @@ -314,7 +312,7 @@ Check certificate status: ```bash kubectl get certificate -n gitops-reverser-system -kubectl describe certificate gitops-reverser-webhook-server-tls-cert -n gitops-reverser-system +kubectl describe certificate gitops-reverser-admission-server-cert -n gitops-reverser-system ``` If cert-manager is not working: @@ -384,33 +382,12 @@ webhook: Create certificate secret manually: ```bash -kubectl create secret tls gitops-reverser-webhook-server-tls-cert \ +kubectl create secret tls gitops-reverser-admission-server-cert \ --cert=path/to/tls.crt \ --key=path/to/tls.key \ -n gitops-reverser-system 
``` -### Network Policies - -Enable network policies for additional security: - -```yaml -networkPolicy: - enabled: true - ingress: - - from: - - namespaceSelector: {} - ports: - - protocol: TCP - port: 9443 # webhook port - egress: - - to: - - namespaceSelector: {} - ports: - - protocol: TCP - port: 443 # Kubernetes API -``` - ### Custom Resource Limits For clusters with high resource usage: diff --git a/charts/gitops-reverser/templates/validating-webhook.yaml b/charts/gitops-reverser/templates/admission-webhook.yaml similarity index 93% rename from charts/gitops-reverser/templates/validating-webhook.yaml rename to charts/gitops-reverser/templates/admission-webhook.yaml index 494bf1b3..ec2b6a97 100644 --- a/charts/gitops-reverser/templates/validating-webhook.yaml +++ b/charts/gitops-reverser/templates/admission-webhook.yaml @@ -7,7 +7,7 @@ metadata: {{- include "gitops-reverser.labels" . | nindent 4 }} {{- if .Values.certificates.certManager.enabled }} annotations: - cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "gitops-reverser.fullname" . }}-webhook-server-cert + cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ include "gitops-reverser.fullname" . }}-admission-server-cert {{- end }} webhooks: - admissionReviewVersions: @@ -16,6 +16,7 @@ webhooks: service: name: {{ include "gitops-reverser.fullname" . }} namespace: {{ .Release.Namespace }} + port: {{ .Values.service.ports.admission }} path: /process-validating-webhook {{- if not .Values.certificates.certManager.enabled }} caBundle: {{ .Values.webhook.caBundle | b64enc }} diff --git a/charts/gitops-reverser/templates/certificates.yaml b/charts/gitops-reverser/templates/certificates.yaml index debf58ca..ecaece57 100644 --- a/charts/gitops-reverser/templates/certificates.yaml +++ b/charts/gitops-reverser/templates/certificates.yaml @@ -15,7 +15,7 @@ spec: apiVersion: cert-manager.io/v1 kind: Certificate metadata: - name: {{ include "gitops-reverser.fullname" . 
}}-webhook-server-cert + name: {{ include "gitops-reverser.fullname" . }}-admission-server-cert namespace: {{ .Release.Namespace }} labels: {{- include "gitops-reverser.labels" . | nindent 4 }} @@ -26,7 +26,7 @@ spec: issuerRef: kind: {{ .Values.certificates.certManager.issuer.kind }} name: {{ .Values.certificates.certManager.issuer.name }} - secretName: {{ include "gitops-reverser.fullname" . }}-webhook-server-tls-cert + secretName: {{ include "gitops-reverser.fullname" . }}-admission-server-cert usages: - digital signature - key encipherment @@ -53,7 +53,7 @@ spec: issuerRef: kind: {{ .Values.certificates.certManager.issuer.kind }} name: {{ .Values.certificates.certManager.issuer.name }} - secretName: {{ .Values.servers.audit.tls.secretName | default (printf "%s-audit-server-tls-cert" (include "gitops-reverser.fullname" .)) }} + secretName: {{ .Values.servers.audit.tls.secretName | default (printf "%s-audit-server-cert" (include "gitops-reverser.fullname" .)) }} usages: - digital signature - key encipherment diff --git a/charts/gitops-reverser/templates/deployment.yaml b/charts/gitops-reverser/templates/deployment.yaml index 1a311a31..eddd94c7 100644 --- a/charts/gitops-reverser/templates/deployment.yaml +++ b/charts/gitops-reverser/templates/deployment.yaml @@ -83,11 +83,11 @@ spec: - --audit-dump-path=/var/run/audit-dumps {{- end }} ports: - - name: webhook-server + - name: admission containerPort: {{ .Values.servers.admission.port }} protocol: TCP {{- if .Values.servers.audit.enabled }} - - name: audit-server + - name: audit containerPort: {{ .Values.servers.audit.port }} protocol: TCP {{- end }} @@ -128,7 +128,7 @@ spec: - name: tmp-dir mountPath: /tmp {{- if .Values.servers.admission.tls.enabled }} - - name: cert + - name: admission-cert mountPath: {{ .Values.servers.admission.tls.certPath }} readOnly: true {{- end }} @@ -148,15 +148,15 @@ spec: - name: tmp-dir emptyDir: {} {{- if .Values.servers.admission.tls.enabled }} - - name: cert + - name: 
admission-cert secret: - secretName: {{ include "gitops-reverser.fullname" . }}-webhook-server-tls-cert + secretName: {{ include "gitops-reverser.fullname" . }}-admission-server-cert defaultMode: 420 {{- end }} {{- if and .Values.servers.audit.enabled .Values.servers.audit.tls.enabled }} - name: audit-cert secret: - secretName: {{ .Values.servers.audit.tls.secretName | default (printf "%s-audit-server-tls-cert" (include "gitops-reverser.fullname" .)) }} + secretName: {{ .Values.servers.audit.tls.secretName | default (printf "%s-audit-server-cert" (include "gitops-reverser.fullname" .)) }} defaultMode: 420 {{- end }} {{- with .Values.volumes }} diff --git a/charts/gitops-reverser/templates/services.yaml b/charts/gitops-reverser/templates/services.yaml index ebedc56f..acb468f1 100644 --- a/charts/gitops-reverser/templates/services.yaml +++ b/charts/gitops-reverser/templates/services.yaml @@ -14,12 +14,12 @@ spec: clusterIP: {{ .Values.service.clusterIP }} {{- end }} ports: - - name: webhook-server + - name: admission port: {{ .Values.service.ports.admission }} targetPort: {{ .Values.servers.admission.port }} protocol: TCP {{- if .Values.servers.audit.enabled }} - - name: audit-server + - name: audit port: {{ .Values.service.ports.audit }} targetPort: {{ .Values.servers.audit.port }} protocol: TCP diff --git a/charts/gitops-reverser/values.yaml b/charts/gitops-reverser/values.yaml index ea8e2658..4e023357 100644 --- a/charts/gitops-reverser/values.yaml +++ b/charts/gitops-reverser/values.yaml @@ -59,7 +59,7 @@ servers: # Controls webhook TLS wiring in the controller process. # Keep enabled for normal Kubernetes webhook operation. enabled: true - certPath: "/tmp/k8s-webhook-server/webhook-server-certs" + certPath: "/tmp/k8s-admission-server/admission-server-certs" certName: "tls.crt" certKey: "tls.key" @@ -70,7 +70,7 @@ servers: tls: # Serve audit ingress over HTTPS when true, HTTP when false. 
enabled: true - certPath: "/tmp/k8s-audit-webhook-server/webhook-server-certs" + certPath: "/tmp/k8s-audit-server/audit-server-certs" certName: "tls.crt" certKey: "tls.key" secretName: "" @@ -85,7 +85,7 @@ servers: port: 8080 tls: # Serve metrics over HTTPS when true, HTTP when false. - enabled: true + enabled: false certPath: "" certName: "tls.crt" certKey: "tls.key" @@ -187,7 +187,7 @@ service: # Optional fixed ClusterIP (useful for Kind/bootstrap environments before DNS is ready) clusterIP: "" ports: - admission: 443 + admission: 9443 audit: 9444 metrics: 8080 diff --git a/config/certs/certificates.yaml b/config/certs/certificates.yaml index baa79522..c580689c 100644 --- a/config/certs/certificates.yaml +++ b/config/certs/certificates.yaml @@ -4,7 +4,7 @@ metadata: labels: app.kubernetes.io/managed-by: kustomize app.kubernetes.io/name: gitops-reverser - name: gitops-reverser-webhook-server-cert + name: gitops-reverser-admission-server-cert namespace: sut spec: dnsNames: @@ -15,7 +15,7 @@ spec: name: gitops-reverser-selfsigned-issuer privateKey: rotationPolicy: Always - secretName: webhook-server-cert + secretName: admission-server-cert --- apiVersion: cert-manager.io/v1 kind: Certificate @@ -34,4 +34,4 @@ spec: name: gitops-reverser-selfsigned-issuer privateKey: rotationPolicy: Always - secretName: audit-webhook-server-cert + secretName: audit-server-cert diff --git a/config/manager.yaml b/config/manager.yaml index e6823fbd..3d6319bc 100644 --- a/config/manager.yaml +++ b/config/manager.yaml @@ -26,8 +26,8 @@ spec: - --metrics-bind-address=:8443 - --metrics-insecure - --health-probe-bind-address=:8081 - - --webhook-cert-path=/tmp/k8s-webhook-server/webhook-server-certs - - --audit-cert-path=/tmp/k8s-audit-webhook-server/webhook-server-certs + - --webhook-cert-path=/tmp/k8s-admission-server/admission-server-certs + - --audit-cert-path=/tmp/k8s-audit-server/audit-server-certs command: - /manager env: @@ -49,10 +49,13 @@ spec: name: manager ports: - containerPort: 
9443 - name: webhook-server + name: admission protocol: TCP - containerPort: 9444 - name: audit-server + name: audit + protocol: TCP + - containerPort: 8443 + name: metrics protocol: TCP readinessProbe: httpGet: @@ -76,10 +79,10 @@ spec: volumeMounts: - mountPath: /tmp name: tmp-dir - - mountPath: /tmp/k8s-webhook-server/webhook-server-certs + - mountPath: /tmp/k8s-admission-server/admission-server-certs name: webhook-certs readOnly: true - - mountPath: /tmp/k8s-audit-webhook-server/webhook-server-certs + - mountPath: /tmp/k8s-audit-server/audit-server-certs name: audit-webhook-certs readOnly: true securityContext: @@ -93,7 +96,7 @@ spec: name: tmp-dir - name: webhook-certs secret: - secretName: webhook-server-cert + secretName: admission-server-cert - name: audit-webhook-certs secret: - secretName: audit-webhook-server-cert + secretName: audit-server-cert diff --git a/config/service.yaml b/config/service.yaml index 1915c93c..8636109e 100644 --- a/config/service.yaml +++ b/config/service.yaml @@ -9,11 +9,11 @@ metadata: spec: clusterIP: 10.96.200.200 # This is required because kube-apiserver starts before CoreDNS (so we use a fixed address) ports: - - name: webhook-server - port: 443 + - name: admission + port: 9443 protocol: TCP targetPort: 9443 - - name: audit-server + - name: audit port: 9444 protocol: TCP targetPort: 9444 @@ -24,4 +24,3 @@ spec: selector: app.kubernetes.io/name: gitops-reverser control-plane: controller-manager - diff --git a/config/webhook.yaml b/config/webhook.yaml index 6ab477e2..8d3afdb4 100644 --- a/config/webhook.yaml +++ b/config/webhook.yaml @@ -2,7 +2,7 @@ apiVersion: admissionregistration.k8s.io/v1 kind: ValidatingWebhookConfiguration metadata: annotations: - cert-manager.io/inject-ca-from: sut/gitops-reverser-webhook-server-cert + cert-manager.io/inject-ca-from: sut/gitops-reverser-admission-server-cert name: gitops-reverser-validating-webhook-configuration webhooks: - admissionReviewVersions: @@ -11,6 +11,7 @@ webhooks: service: 
name: gitops-reverser-service namespace: sut + port: 9443 path: /process-validating-webhook failurePolicy: Ignore name: gitops-reverser.configbutler.ai @@ -27,4 +28,3 @@ webhooks: - '*' sideEffects: None - diff --git a/config/webhook/manifests.yaml b/config/webhook/manifests.yaml index 26cdf5e5..a307864c 100644 --- a/config/webhook/manifests.yaml +++ b/config/webhook/manifests.yaml @@ -10,6 +10,7 @@ webhooks: service: name: webhook-service namespace: system + port: 9443 path: /process-validating-webhook failurePolicy: Ignore name: gitops-reverser.configbutler.ai diff --git a/docs/config-kustomize-simplification-findings.md b/docs/config-kustomize-simplification-findings.md index 97836cdd..e1875b05 100644 --- a/docs/config-kustomize-simplification-findings.md +++ b/docs/config-kustomize-simplification-findings.md @@ -43,7 +43,7 @@ Why: References: - `config/default/kustomization.yaml:187` -- Rendered output includes: `cert-manager.io/inject-ca-from: sut/gitops-reverser-webhook-server-cert` +- Rendered output includes: `cert-manager.io/inject-ca-from: sut/gitops-reverser-admission-server-cert` ### 4. `certmanager/kustomizeconfig.yaml` is required with `namePrefix` Why: @@ -83,22 +83,22 @@ References: ## Certificate flow (how certs are used today) -### Admission webhook cert (`webhook-server-cert` secret) +### Admission webhook cert (`admission-server-cert` secret) 1. `Certificate` resource requests cert for service DNS. -2. cert-manager writes secret `webhook-server-cert`. +2. cert-manager writes secret `admission-server-cert`. 3. Deployment mounts that secret and passes `--webhook-cert-path`. -4. webhook server serves TLS on `9443` using cert watcher. +4. admission-server listener serves TLS on `9443` using cert watcher. 5. cert-manager injects CA into `ValidatingWebhookConfiguration` annotation target. 6. kube-apiserver calls webhook via Service over TLS and trusts injected CA. 
Key refs: -- `config/certmanager/webhook-server-cert.yaml:18` +- `config/certmanager/admission-server-cert.yaml:18` - `config/default/manager_webhook_patch.yaml:52` - `config/default/kustomization.yaml:187` -### Audit ingress cert (`audit-webhook-server-cert` secret) +### Audit ingress cert (`audit-server-cert` secret) 1. Separate `Certificate` resource issues audit cert. -2. Secret `audit-webhook-server-cert` is mounted. +2. Secret `audit-server-cert` is mounted. 3. Manager serves HTTPS audit endpoint on `9444` using `--audit-cert-path`. 4. In e2e, kube-apiserver audit webhook config uses `insecure-skip-tls-verify: true` (so CA pinning is not enforced in test). diff --git a/docs/design/audit-ingress-first-steps-execution-plan.md b/docs/design/audit-ingress-first-steps-execution-plan.md index 5eef98c8..fd5e8282 100644 --- a/docs/design/audit-ingress-first-steps-execution-plan.md +++ b/docs/design/audit-ingress-first-steps-execution-plan.md @@ -7,7 +7,7 @@ Execution-focused handoff plan for implementation agent. Scope is fixed to: - single deployment -- extra in-binary audit webserver +- extra in-binary audit-server listener - path-based cluster recognition - Kind remains the e2e cluster target @@ -19,7 +19,7 @@ This document intentionally excludes alternative architecture discussion. 
Implement an initial production-ready split where: -- admission webhook keeps running on current webhook server path [`/process-validating-webhook`](cmd/main.go:191) +- admission webhook keeps running on current admission-server path [`/process-validating-webhook`](cmd/main.go:191) - audit ingress moves to a separate server in the same binary on a different port - audit ingress is exposed via a dedicated Service - cluster identity is derived from request path segment @@ -31,7 +31,7 @@ Implement an initial production-ready split where: ### 2.1 Coupling risks -- Both admission and audit handlers are registered on one webhook server in [`cmd/main.go`](cmd/main.go:101) and [`cmd/main.go`](cmd/main.go:204) +- Both admission and audit handlers are registered on one admission-server listener in [`cmd/main.go`](cmd/main.go:101) and [`cmd/main.go`](cmd/main.go:204) - One service endpoint currently fronts this surface in [`charts/gitops-reverser/templates/services.yaml`](charts/gitops-reverser/templates/services.yaml:3) - One cert lifecycle currently serves this surface in [`charts/gitops-reverser/templates/certificates.yaml`](charts/gitops-reverser/templates/certificates.yaml:16) @@ -62,10 +62,10 @@ In [`internal/webhook/audit_handler.go`](internal/webhook/audit_handler.go:86): Implement two servers in one process: -- admission server - - existing controller-runtime webhook server +- admission-server + - existing controller-runtime admission-server listener - keeps current cert and service behavior -- audit server +- audit-server - dedicated `http.Server` listener on separate port - independent TLS config inputs - serves audit paths with cluster path segment @@ -97,11 +97,11 @@ For phase 1: ## 4. 
Concrete code work items -### 4.1 Add audit server config model in main +### 4.1 Add audit-server config model in main Target file: [`cmd/main.go`](cmd/main.go:253) -Add new app config fields for audit server, separate from webhook server fields: +Add new app config fields for audit-server, separate from admission-server fields: - audit listen address and port - audit cert path, cert name, cert key @@ -112,7 +112,7 @@ Add new app config fields for audit server, separate from webhook server fields: Add flags in [`parseFlags()`](cmd/main.go:270) for above. -### 4.2 Implement dedicated audit server bootstrap +### 4.2 Implement dedicated audit-server bootstrap Target file: [`cmd/main.go`](cmd/main.go:77) @@ -126,7 +126,7 @@ Add functions to: Implementation note: -- audit server should be started via manager runnable so lifecycle follows manager start and stop. +- audit-server should be started via manager runnable so lifecycle follows manager start and stop. ### 4.3 Extend audit handler with path identity and guardrails @@ -146,7 +146,7 @@ Add behavior: ### 4.4 Keep admission webhook behavior untouched -Do not change current validating webhook registration semantics in [`cmd/main.go`](cmd/main.go:190) and chart registration in [`charts/gitops-reverser/templates/validating-webhook.yaml`](charts/gitops-reverser/templates/validating-webhook.yaml:16). +Do not change current validating webhook registration semantics in [`cmd/main.go`](cmd/main.go:190) and chart registration in [`charts/gitops-reverser/templates/admission-webhook.yaml`](charts/gitops-reverser/templates/admission-webhook.yaml:16). --- @@ -176,7 +176,7 @@ Keep existing webhook block for admission as-is in first phase. Target file: [`charts/gitops-reverser/templates/deployment.yaml`](charts/gitops-reverser/templates/deployment.yaml:41) -Add container args for audit server flags and add second named container port for audit ingress. 
+Add container args for audit-server flags and add second named container port for audit ingress. Mount dedicated audit TLS secret path in addition to admission cert mount. @@ -243,7 +243,7 @@ Actions: - add replacement wiring for audit service name and namespace into cert DNS entries - keep admission CA injection for validating webhook intact -### 6.3 Add manager patch entries for audit server args and mounts +### 6.3 Add manager patch entries for audit-server args and mounts Relevant file: [`config/default/manager_webhook_patch.yaml`](config/default/manager_webhook_patch.yaml:1) @@ -273,7 +273,7 @@ Add table-driven cases for: ### 7.2 Main bootstrap tests -Add new tests for config parsing and audit server bootstrap behavior. +Add new tests for config parsing and audit-server bootstrap behavior. Suggested new file: @@ -284,7 +284,7 @@ Cover: - default flag values - custom audit flag parsing - invalid timeout parsing behavior if introduced -- audit server runnable registration +- audit-server runnable registration ### 7.3 E2E changes on Kind @@ -349,7 +349,7 @@ Ensure cardinality protection: ### 8.3 Error handling -Audit server must return: +Audit-server must return: - `400` for malformed path or body - `405` for method mismatch @@ -375,7 +375,7 @@ Reason: Implementation is complete only when all are true: -1. Separate in-binary audit server is active on separate port with separate service exposure. +1. Separate in-binary audit-server is active on separate port with separate service exposure. 2. Audit endpoint requires path-based cluster ID on fixed `/audit-webhook/{clusterID}` and accepts newly seen cluster IDs. 3. Admission webhook behavior remains unchanged. 4. Helm and kustomize manifests include independent audit TLS and service resources. 
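The guardrails required above (fixed `/audit-webhook/{clusterID}` path, allowlist validation, a request body size cap, explicit server-level timeouts, and `400`/`405` error semantics) can be sketched as follows. This is a minimal sketch, not existing code in [`internal/webhook/audit_handler.go`](internal/webhook/audit_handler.go:86): the name `parseClusterID`, the DNS-label allowlist pattern, and the 10 MiB cap are illustrative assumptions the implementation agent may change.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
	"strings"
	"time"
)

// clusterIDPattern is an assumed allowlist shape: DNS-label-style IDs only.
var clusterIDPattern = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`)

// parseClusterID extracts the cluster ID from /audit-webhook/{clusterID},
// rejecting empty IDs, extra path segments, and non-allowlisted characters.
func parseClusterID(path string) (string, error) {
	rest, ok := strings.CutPrefix(path, "/audit-webhook/")
	if !ok || rest == "" || strings.Contains(rest, "/") {
		return "", fmt.Errorf("path must be /audit-webhook/{clusterID}: %q", path)
	}
	if !clusterIDPattern.MatchString(rest) {
		return "", fmt.Errorf("cluster ID %q fails allowlist pattern", rest)
	}
	return rest, nil
}

func auditHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed) // 405
		return
	}
	clusterID, err := parseClusterID(r.URL.Path)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest) // 400
		return
	}
	// Cap the body before decoding; reads beyond the limit fail, so
	// oversized requests never reach the event decoder.
	r.Body = http.MaxBytesReader(w, r.Body, 10<<20) // assumed 10 MiB limit
	_ = clusterID                                   // hand off to the real audit event decoder here
	w.WriteHeader(http.StatusOK)
}

// newAuditServer builds the dedicated audit-server listener with explicit
// server-level timeouts, separate from the controller-runtime admission server.
func newAuditServer(addr string) *http.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/audit-webhook/", auditHandler)
	return &http.Server{
		Addr:              addr,
		Handler:           mux,
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       30 * time.Second,
		WriteTimeout:      30 * time.Second,
	}
}

func main() {
	id, err := parseClusterID("/audit-webhook/cluster-a")
	fmt.Println(id, err) // cluster-a <nil>
	_ = newAuditServer(":9444")
}
```

Wiring this into the manager as a runnable (per item 4.2) then reduces to wrapping `newAuditServer` in a `manager.Runnable` whose `Start` calls `ListenAndServeTLS` and whose shutdown path calls `Shutdown` when the manager context is cancelled.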
@@ -398,6 +398,6 @@ Implementation is complete only when all are true: - Update docs for cluster audit config in [`docs/audit-setup/cluster/audit/webhook-config.yaml`](docs/audit-setup/cluster/audit/webhook-config.yaml:1) - Update Kind docs in [`test/e2e/kind/README.md`](test/e2e/kind/README.md:1) - Update chart docs in [`charts/gitops-reverser/README.md`](charts/gitops-reverser/README.md:1) -- Keep architecture alternatives in [`docs/design/audit-ingress-separate-webserver-options.md`](docs/design/audit-ingress-separate-webserver-options.md:1) and keep this document implementation-only +- Keep architecture alternatives in [`docs/design/audit-ingress-separate-server-options.md`](docs/design/audit-ingress-separate-server-options.md:1) and keep this document implementation-only This plan is ready to hand to a coding agent for direct execution. diff --git a/docs/design/audit-ingress-separate-webserver-options.md b/docs/design/audit-ingress-separate-server-options.md similarity index 90% rename from docs/design/audit-ingress-separate-webserver-options.md rename to docs/design/audit-ingress-separate-server-options.md index 78261a56..36720055 100644 --- a/docs/design/audit-ingress-separate-webserver-options.md +++ b/docs/design/audit-ingress-separate-server-options.md @@ -6,17 +6,17 @@ Design proposal only, updated with webhook ingress best-practice alignment. 
## Context -Today both endpoints are served by the same controller-runtime webhook server and the same Service: +Today both endpoints are served by the same controller-runtime admission-server listener and the same Service: - Admission webhook endpoint [`/process-validating-webhook`](cmd/main.go:191) - Audit endpoint [`/audit-webhook`](cmd/main.go:204) -- Single leader-only Service on port [`443`](charts/gitops-reverser/templates/services.yaml:18) targeting one webhook server port [`9443`](charts/gitops-reverser/values.yaml:70) +- Single leader-only Service on port [`9443`](charts/gitops-reverser/templates/services.yaml:18) targeting one admission-server port [`9443`](charts/gitops-reverser/values.yaml:70) This coupling limits independent exposure and independent TLS policy for incoming audit traffic. ## Objectives -- Move audit ingress to a separate webserver and separate port. +- Move audit ingress to a separate audit-server listener and separate port. - Allow explicit configuration of incoming TLS requirements for audit traffic. - Support audit streaming from external or secondary clusters. - Provide cluster differentiation options with trade-offs. @@ -49,11 +49,11 @@ From [`docs/design/best-practices-webhook-ingress.md`](docs/design/best-practice --- -## Separation options for audit webserver +## Separation options for audit-server ### Option A: Same pod, second HTTP server, separate Service and port -Run a second server process inside the manager binary for audit ingest. +Run a second server process inside the manager binary for audit-server ingest. 
Pros: @@ -302,7 +302,7 @@ Minimum design controls: graph TD A[Source cluster A api server] --> P[/audit-webhook/cluster-a] B[Source cluster B api server] --> Q[/audit-webhook/cluster-b] - P --> S[Audit webserver separate port] + P --> S[audit-server separate port] Q --> S S --> V[Cluster ID allowlist validator] V --> R[Queue and backpressure controls] @@ -311,4 +311,4 @@ graph TD ## Final position -A separate audit webserver on a separate port with configurable incoming TLS policy is the correct direction. For cluster differentiation, path-based identity is a practical default when it is combined with strict allowlist and network restrictions. The chart should be evolved to treat audit ingress as its own product surface with dedicated TLS, exposure, identity, and reliability controls. +A separate audit-server listener on a separate port with configurable incoming TLS policy is the correct direction. For cluster differentiation, path-based identity is a practical default when it is combined with strict allowlist and network restrictions. The chart should be evolved to treat audit ingress as its own product surface with dedicated TLS, exposure, identity, and reliability controls. diff --git a/docs/design/https-server-alignment-and-service-plan.md b/docs/design/https-server-alignment-and-service-plan.md index 260f70b5..460e5c87 100644 --- a/docs/design/https-server-alignment-and-service-plan.md +++ b/docs/design/https-server-alignment-and-service-plan.md @@ -4,9 +4,9 @@ Improve consistency across the three HTTPS surfaces: -1. admission webhook server +1. admission-server 2. audit ingress server -3. metrics server +3. metrics-server With Service topology now simplified after removing the leader-only Service. @@ -179,7 +179,7 @@ network: name: "" # defaults to {{ include "gitops-reverser.fullname" . 
}} type: ClusterIP ports: - admission: 443 + admission: 9443 audit: 8444 metrics: 8443 diff --git a/test/e2e/helpers.go b/test/e2e/helpers.go index 1b2a8bb9..c58798eb 100644 --- a/test/e2e/helpers.go +++ b/test/e2e/helpers.go @@ -174,18 +174,18 @@ func waitForCertificateSecrets() { Eventually(func(g Gomega) { ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) //nolint:mnd // reasonable timeout defer cancel() - cmd := exec.CommandContext(ctx, "kubectl", "get", "secret", "webhook-server-cert", "-n", namespace) + cmd := exec.CommandContext(ctx, "kubectl", "get", "secret", "admission-server-cert", "-n", namespace) _, err := utils.Run(cmd) - g.Expect(err).NotTo(HaveOccurred(), "webhook-server-cert secret should exist") + g.Expect(err).NotTo(HaveOccurred(), "admission-server-cert secret should exist") }, 60*time.Second, 2*time.Second).Should(Succeed()) //nolint:mnd // reasonable timeout for cert-manager By("waiting for dedicated audit certificate secret to be created by cert-manager") Eventually(func(g Gomega) { ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) //nolint:mnd // reasonable timeout defer cancel() - cmd := exec.CommandContext(ctx, "kubectl", "get", "secret", "audit-webhook-server-cert", "-n", namespace) + cmd := exec.CommandContext(ctx, "kubectl", "get", "secret", "audit-server-cert", "-n", namespace) _, err := utils.Run(cmd) - g.Expect(err).NotTo(HaveOccurred(), "audit-webhook-server-cert secret should exist") + g.Expect(err).NotTo(HaveOccurred(), "audit-server-cert secret should exist") }, 60*time.Second, 2*time.Second).Should(Succeed()) //nolint:mnd // reasonable timeout for cert-manager By("✅ All certificate secrets are ready") From 9cc2c2f6f48f4c8f817448e6ef22982bbdd0399e Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 10:05:10 +0000 Subject: [PATCH 11/32] chore: First steps in testing the helm chart as well --- .github/workflows/ci.yml | 68 +++++++- Makefile | 16 ++ 
docs/design/quickstart-flow-e2e-strategy.md | 184 ++++++++++++++++++++ test/e2e/scripts/install-smoke.sh | 100 +++++++++++ 4 files changed, 367 insertions(+), 1 deletion(-) create mode 100644 docs/design/quickstart-flow-e2e-strategy.md create mode 100755 test/e2e/scripts/install-smoke.sh diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index d91994e1..c9651f6d 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -292,12 +292,78 @@ jobs: make test-e2e " + e2e-install-smoke: + name: E2E Install Smoke (${{ matrix.scenario }}) + runs-on: ubuntu-latest + needs: [build-ci-container, docker-build] + strategy: + matrix: + scenario: [helm, manifest] + env: + PROJECT_IMAGE: ${{ needs.docker-build.outputs.image }} + KIND_CLUSTER: gitops-reverser-test-e2e-smoke-${{ matrix.scenario }} + CI_CONTAINER: ${{ needs.build-ci-container.outputs.image }} + + steps: + - name: Checkout code + uses: actions/checkout@v6 + + - name: Generate Kind cluster config from template + env: + HOST_PROJECT_PATH: ${{ github.workspace }} + run: | + echo "🔧 Generating cluster config with HOST_PROJECT_PATH=${HOST_PROJECT_PATH}" + envsubst < test/e2e/kind/cluster-template.yaml > test/e2e/kind/cluster.yaml + echo "✅ Generated configuration:" + cat test/e2e/kind/cluster.yaml + + - name: Set up Kind cluster with audit webhook support + uses: helm/kind-action@v1.13.0 + with: + cluster_name: ${{ env.KIND_CLUSTER }} + version: v0.30.0 + kubectl_version: v1.32.3 + config: test/e2e/kind/cluster.yaml + wait: 5m + + - name: Verify cluster setup + run: | + kubectl cluster-info + kubectl get nodes + echo "✅ Kind cluster is ready" + + - name: Login to Docker registry + run: | + echo "${{ secrets.GITHUB_TOKEN }}" | docker login ${{ env.REGISTRY }} -u ${{ github.actor }} --password-stdin + + - name: Pull and load image to Kind + run: | + echo "Pulling image: ${{ env.PROJECT_IMAGE }}" + docker pull ${{ env.PROJECT_IMAGE }} + kind load docker-image ${{ env.PROJECT_IMAGE }} --name ${{ 
env.KIND_CLUSTER }} + + - name: Run install smoke test in CI container + run: | + TARGET="test-e2e-install-${{ matrix.scenario }}" + docker run --rm \ + --network host \ + -v ${{ github.workspace }}:/workspace \ + -v $HOME/.kube:/root/.kube \ + -w /workspace \ + -e PROJECT_IMAGE=${{ env.PROJECT_IMAGE }} \ + -e KIND_CLUSTER=${{ env.KIND_CLUSTER }} \ + ${{ env.CI_CONTAINER }} \ + bash -c " + git config --global --add safe.directory /workspace + make ${TARGET} + " + # Release job only runs on push to main after tests pass release-please: name: Release Please runs-on: ubuntu-latest if: github.event_name == 'push' && github.ref == 'refs/heads/main' - needs: [lint-helm, lint, test, e2e-test, validate-devcontainer] + needs: [lint-helm, lint, test, e2e-test, e2e-install-smoke, validate-devcontainer] outputs: release_created: ${{ steps.release.outputs.release_created }} tag_name: ${{ steps.release.outputs.tag_name }} diff --git a/Makefile b/Makefile index 5e70363b..a1563bf6 100644 --- a/Makefile +++ b/Makefile @@ -246,3 +246,19 @@ cleanup-prometheus-e2e: ## Clean up Prometheus e2e environment .PHONY: setup-e2e setup-e2e: setup-cert-manager setup-gitea-e2e setup-prometheus-e2e ## Setup all e2e test infrastructure @echo "✅ E2E infrastructure initialized" + +.PHONY: wait-cert-manager +wait-cert-manager: ## Wait for cert-manager pods to become ready + @$(KUBECTL) wait --for=condition=ready pod -l app.kubernetes.io/instance=cert-manager -n cert-manager --timeout=300s + +.PHONY: test-e2e-install-helm +test-e2e-install-helm: setup-cluster cleanup-webhook setup-cert-manager wait-cert-manager manifests helm-sync ## Smoke test: install from local Helm chart and verify rollout + @bash test/e2e/scripts/install-smoke.sh helm + +.PHONY: test-e2e-install-manifest +test-e2e-install-manifest: setup-cluster cleanup-webhook setup-cert-manager wait-cert-manager ## Smoke test: install from generated dist/install.yaml and verify rollout + @bash test/e2e/scripts/install-smoke.sh manifest + 
+.PHONY: test-e2e-install-smoke +test-e2e-install-smoke: test-e2e-install-helm test-e2e-install-manifest ## Run all Layer 1 install smoke tests + @echo "✅ All install smoke tests passed" diff --git a/docs/design/quickstart-flow-e2e-strategy.md b/docs/design/quickstart-flow-e2e-strategy.md new file mode 100644 index 00000000..0e94b7b2 --- /dev/null +++ b/docs/design/quickstart-flow-e2e-strategy.md @@ -0,0 +1,184 @@ +# Quickstart Flow E2E Strategy + +## Why this document exists + +We currently validate core behavior with: + +- unit/integration tests (`make test`) +- full e2e tests (`make test-e2e`) with Kind + Gitea + Prometheus + +But we do **not** have a dedicated test focused on "new user install paths": + +- install from the raw/basic Helm chart +- install from generated `dist/install.yaml` (the path shown in quickstart) + +This document defines how to test those flows in CI and whether Gitea should be part of that validation. + +## Goals + +- Ensure first-time installation paths do not regress. +- Catch packaging/rendering/rollout failures before release. +- Keep runtime and maintenance cost reasonable. + +## Non-goals + +- Replacing existing full behavior e2e coverage. +- Re-testing every reconciliation scenario in this new flow. + +## Current gaps + +- Existing e2e deploys using `make install` + `make deploy` (kustomize path), not Helm chart install. +- `dist/install.yaml` is generated in release pipeline, but not validated as an install/rollout path in e2e. +- Quickstart user journey is not directly tested end-to-end. + +## Should we add Gitea here? + +Short answer: **yes, but as a second phase**. + +### Option A: Install smoke only (no Gitea) + +What it tests: + +- Kind cluster bootstrap +- cert-manager dependency +- Helm install from `charts/gitops-reverser` +- `kubectl apply -f dist/install.yaml` +- controller rollout readiness +- CRDs and webhook objects present + +Pros: + +- Fast, stable, low maintenance. +- Directly validates packaging and install UX. 
+- Best signal per minute for quickstart regressions. + +Cons: + +- Does not prove end-to-end "create resource -> commit appears in git" in this specific path. + +### Option B: Full quickstart flow with Gitea + +What it adds: + +- Create credentials secret +- Apply minimal `GitProvider` + `GitTarget` + `WatchRule` +- Create a ConfigMap +- Verify resulting commit/file in Git repo (Gitea) + +Pros: + +- Closest possible validation of "new user success" narrative. +- Strong confidence that install path + runtime behavior work together. + +Cons: + +- Higher runtime and flakiness surface. +- More setup/teardown complexity. +- Duplicates part of existing heavy e2e coverage. + +### Option C: Commit to a dedicated GitHub repository + +What it adds: + +- Use a purpose-built GitHub repository for e2e output validation. +- Create short-lived branch per run (for example `e2e/`). +- Configure `GitProvider` credentials for GitHub. +- Apply minimal quickstart CRs and assert commit/file appears in that branch. + +Pros: + +- Highest fidelity to real user setup from quickstart perspective. +- Validates network/auth/provider behavior against actual GitHub. +- Catches provider-specific issues that local Gitea cannot. + +Cons: + +- More operational overhead (token/key rotation, branch cleanup, rate limits). +- Higher flakiness due to external service dependency and internet variability. +- Secret handling is stricter in CI (especially for PRs from forks). + +Security/ops considerations: + +- Use a dedicated low-privilege bot account and repo. +- Scope credentials to one repo and minimal permissions. +- Never run secret-bearing jobs for untrusted fork PRs. +- Auto-clean old e2e branches with retention policy. + +## Recommendation + +Adopt a **three-layer strategy**: + +1. **Layer 1 (required in CI): install smoke tests without Gitea** +2. **Layer 2 (targeted quickstart journey with Gitea): one focused scenario** +3. 
**Layer 3 (external reality check): periodic quickstart run against dedicated GitHub repo** + +This balances reliability and confidence: + +- Layer 1 catches most breakages early (chart, manifest, certs, webhook, rollout). +- Layer 2 ensures we do not disappoint new users on the full "it commits to git" story. +- Layer 3 validates real hosted-provider behavior without making every PR depend on external systems. + +## Proposed test matrix + +### Layer 1: `install-smoke` + +Run on every PR: + +- Scenario 1: Helm chart install (raw/basic values) +- Scenario 2: Generated `dist/install.yaml` install + +Assertions: + +- Namespace/resources created +- Deployment available and pod ready +- CRDs installed +- Validating webhook configuration exists + +### Layer 2: `quickstart-e2e` + +Run on main and/or nightly at first (can be promoted to PR later): + +- Start fresh Kind cluster +- Install via `dist/install.yaml` (quickstart parity) +- Bring up lightweight local Git endpoint (Gitea as today) +- Apply minimal quickstart CRs +- Create test ConfigMap +- Assert git repo contains expected YAML/commit + +### Layer 3: `quickstart-e2e-github` + +Run on schedule (nightly) and on protected branches only: + +- Start fresh Kind cluster +- Install via `dist/install.yaml` +- Configure GitHub credentials from CI secrets +- Apply minimal quickstart CRs against dedicated e2e repo +- Create test ConfigMap +- Assert commit/file appears in dedicated branch +- Optionally delete branch at end (or rely on periodic cleanup job) + +## CI integration proposal + +- Add dedicated Make targets: + - `test-e2e-install-helm` + - `test-e2e-install-manifest` + - `test-e2e-quickstart` (includes Gitea) + - `test-e2e-quickstart-github` (external provider validation) +- Add a new workflow job for Layer 1 and keep it mandatory. +- Add Layer 2 as non-blocking initially; promote to required once stable. +- Add Layer 3 as scheduled/protected-branch only (non-blocking for PRs). 
+ +## Success criteria + +- PRs fail if Helm/basic install or `install.yaml` install cannot roll out cleanly. +- Quickstart flow test validates an actual commit path at least on main/nightly. +- External GitHub quickstart path is exercised regularly and alerts on failures. +- Runtime overhead remains acceptable and failures are actionable. + +## Rollout plan + +1. Implement Layer 1 install smoke tests first. +2. Land CI wiring and make it required. +3. Implement Layer 2 quickstart-with-Gitea scenario. +4. Observe flakiness for 1-2 weeks; then decide if Layer 2 should be required on PRs. +5. Add Layer 3 scheduled GitHub-repo validation with strict secret handling. diff --git a/test/e2e/scripts/install-smoke.sh b/test/e2e/scripts/install-smoke.sh new file mode 100755 index 00000000..d22e7f22 --- /dev/null +++ b/test/e2e/scripts/install-smoke.sh @@ -0,0 +1,100 @@ +#!/usr/bin/env bash +set -euo pipefail + +MODE="${1:-}" +NAMESPACE="${INSTALL_SMOKE_NAMESPACE:-gitops-reverser}" +RELEASE_NAME="${INSTALL_SMOKE_RELEASE:-gitops-reverser}" +DEPLOYMENT_NAME="${INSTALL_SMOKE_DEPLOYMENT:-gitops-reverser}" + +if [[ -z "${MODE}" ]]; then + echo "usage: $0 " + exit 1 +fi + +cleanup_install() { + helm uninstall "${RELEASE_NAME}" --namespace "${NAMESPACE}" >/dev/null 2>&1 || true + kubectl delete namespace "${NAMESPACE}" --wait=true --ignore-not-found=true >/dev/null 2>&1 || true +} + +configure_project_image() { + if [[ -z "${PROJECT_IMAGE:-}" ]]; then + return + fi + + if [[ "${PROJECT_IMAGE}" != *":"* ]]; then + echo "Skipping image override for PROJECT_IMAGE=${PROJECT_IMAGE} (no tag present)" + return + fi + + local image_repo image_tag + image_repo="${PROJECT_IMAGE%:*}" + image_tag="${PROJECT_IMAGE##*:}" + + helm upgrade --install "${RELEASE_NAME}" charts/gitops-reverser \ + --namespace "${NAMESPACE}" \ + --create-namespace \ + --set "image.repository=${image_repo}" \ + --set "image.tag=${image_tag}" +} + +install_helm() { + echo "Installing from Helm chart (mode=helm)" + 
cleanup_install + + if [[ -n "${PROJECT_IMAGE:-}" && "${PROJECT_IMAGE}" == *":"* ]]; then + configure_project_image + return + fi + + helm upgrade --install "${RELEASE_NAME}" charts/gitops-reverser \ + --namespace "${NAMESPACE}" \ + --create-namespace +} + +install_manifest() { + echo "Installing from generated dist/install.yaml (mode=manifest)" + cleanup_install + make build-installer + kubectl apply -f dist/install.yaml + + if [[ -n "${PROJECT_IMAGE:-}" ]]; then + kubectl -n "${NAMESPACE}" set image deployment/"${DEPLOYMENT_NAME}" manager="${PROJECT_IMAGE}" + fi +} + +verify_installation() { + echo "Waiting for deployment rollout" + kubectl -n "${NAMESPACE}" rollout status deployment/"${DEPLOYMENT_NAME}" --timeout=180s + + echo "Checking deployment availability" + kubectl -n "${NAMESPACE}" wait --for=condition=available deployment/"${DEPLOYMENT_NAME}" --timeout=120s + + echo "Checking pod readiness" + kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l control-plane=controller-manager --timeout=120s + + echo "Checking CRDs" + kubectl get crd \ + gitproviders.configbutler.ai \ + gittargets.configbutler.ai \ + watchrules.configbutler.ai \ + clusterwatchrules.configbutler.ai >/dev/null + + echo "Checking validating webhook configuration" + kubectl get validatingwebhookconfiguration gitops-reverser-validating-webhook-configuration >/dev/null +} + +case "${MODE}" in + helm) + install_helm + ;; + manifest) + install_manifest + ;; + *) + echo "unsupported mode: ${MODE} (expected helm or manifest)" + exit 1 + ;; +esac + +verify_installation +echo "Install smoke test passed (${MODE})" From a45a7435c4bce65c307368121192d6ca31798a87 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 14:19:34 +0000 Subject: [PATCH 12/32] chore: Testing the helm output --- .github/workflows/ci.yml | 49 +++++++++++++------ Makefile | 1 + .../gitops-reverser/templates/namespace.yaml | 8 +++ charts/gitops-reverser/values.yaml | 3 ++ test/e2e/scripts/install-smoke.sh | 8 
++- 5 files changed, 52 insertions(+), 17 deletions(-) create mode 100644 charts/gitops-reverser/templates/namespace.yaml diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index c9651f6d..57f072ed 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -155,9 +155,24 @@ jobs: --set image.repository=test/image \ --set image.tag=test - - name: Validate packaged chart + - name: Generate install.yaml from Helm chart run: | - helm package charts/gitops-reverser --destination /tmp + make build-installer + + - name: Package Helm chart + run: | + helm package charts/gitops-reverser --destination . + mv gitops-reverser-*.tgz gitops-reverser.tgz + + - name: Upload release bundle artifact + uses: actions/upload-artifact@v6 + with: + name: release-bundle + path: | + dist/install.yaml + gitops-reverser.tgz + if-no-files-found: error + retention-days: 1 lint: name: Lint Go Code @@ -256,8 +271,7 @@ jobs: uses: helm/kind-action@v1.13.0 with: cluster_name: ${{ env.KIND_CLUSTER }} - version: v0.30.0 - kubectl_version: v1.32.3 + version: v0.31.0 config: test/e2e/kind/cluster.yaml wait: 5m @@ -295,7 +309,7 @@ jobs: e2e-install-smoke: name: E2E Install Smoke (${{ matrix.scenario }}) runs-on: ubuntu-latest - needs: [build-ci-container, docker-build] + needs: [build-ci-container, docker-build, lint-helm] strategy: matrix: scenario: [helm, manifest] @@ -308,6 +322,12 @@ jobs: - name: Checkout code uses: actions/checkout@v6 + - name: Download tested release bundle artifact + uses: actions/download-artifact@v7 + with: + name: release-bundle + path: . 
+ - name: Generate Kind cluster config from template env: HOST_PROJECT_PATH: ${{ github.workspace }} @@ -321,8 +341,7 @@ jobs: uses: helm/kind-action@v1.13.0 with: cluster_name: ${{ env.KIND_CLUSTER }} - version: v0.30.0 - kubectl_version: v1.32.3 + version: v0.31.0 config: test/e2e/kind/cluster.yaml wait: 5m @@ -504,7 +523,7 @@ jobs: publish-helm: name: Publish Helm Chart runs-on: ubuntu-latest - needs: [build-ci-container, release-please] + needs: [build-ci-container, e2e-install-smoke, release-please] if: needs.release-please.outputs.release_created == 'true' container: image: ${{ needs.build-ci-container.outputs.image }} @@ -518,21 +537,19 @@ jobs: - name: Configure Git safe directory run: git config --global --add safe.directory /__w/gitops-reverser/gitops-reverser - - name: Generate install.yaml from Helm chart (also does helm-sync) - run: | - make build-installer + - name: Download tested release bundle artifact + uses: actions/download-artifact@v7 + with: + name: release-bundle + path: . 
- name: Login to GitHub Container Registry run: | echo "${{ secrets.GITHUB_TOKEN }}" | helm registry login ${{ env.REGISTRY }} --username ${{ github.actor }} --password-stdin - - name: Package Helm chart - run: | - helm package charts/gitops-reverser --destination .helm-charts - - name: Push Helm chart to GHCR run: | - helm push .helm-charts/gitops-reverser-${{ needs.release-please.outputs.version }}.tgz oci://${{ env.CHART_REGISTRY }} + helm push ./gitops-reverser.tgz oci://${{ env.CHART_REGISTRY }} - name: Upload install.yaml as release asset uses: softprops/action-gh-release@v2 diff --git a/Makefile b/Makefile index a1563bf6..dc64c141 100644 --- a/Makefile +++ b/Makefile @@ -145,6 +145,7 @@ build-installer: manifests helm-sync ## Generate a consolidated YAML from Helm c @$(HELM) template gitops-reverser charts/gitops-reverser \ --namespace gitops-reverser \ --set labels.managedBy=kubectl \ + --set createNamespace=true \ --include-crds > dist/install.yaml @echo "✅ Generated dist/install.yaml ($(shell wc -l < dist/install.yaml) lines)" diff --git a/charts/gitops-reverser/templates/namespace.yaml b/charts/gitops-reverser/templates/namespace.yaml new file mode 100644 index 00000000..54e579d7 --- /dev/null +++ b/charts/gitops-reverser/templates/namespace.yaml @@ -0,0 +1,8 @@ +{{- if .Values.createNamespace }} +apiVersion: v1 +kind: Namespace +metadata: + name: {{ .Release.Namespace }} + labels: + {{- include "gitops-reverser.labels" . | nindent 4 }} +{{- end }} diff --git a/charts/gitops-reverser/values.yaml b/charts/gitops-reverser/values.yaml index 4e023357..1056d294 100644 --- a/charts/gitops-reverser/values.yaml +++ b/charts/gitops-reverser/values.yaml @@ -13,6 +13,9 @@ image: imagePullSecrets: [] nameOverride: "" fullnameOverride: "" +# Create the release namespace as part of chart rendering (useful for install.yaml workflows). +# Keep false for standard Helm installs where --create-namespace is preferred. 
+createNamespace: false serviceAccount: # Specifies whether a service account should be created diff --git a/test/e2e/scripts/install-smoke.sh b/test/e2e/scripts/install-smoke.sh index d22e7f22..8dd3d31f 100755 --- a/test/e2e/scripts/install-smoke.sh +++ b/test/e2e/scripts/install-smoke.sh @@ -54,7 +54,13 @@ install_helm() { install_manifest() { echo "Installing from generated dist/install.yaml (mode=manifest)" cleanup_install - make build-installer + + if [[ ! -f dist/install.yaml ]]; then + echo "dist/install.yaml not found; generating locally" + make build-installer + fi + + kubectl create namespace "${NAMESPACE}" --dry-run=client -o yaml | kubectl apply -f - kubectl apply -f dist/install.yaml if [[ -n "${PROJECT_IMAGE:-}" ]]; then From 34f8294fc24b91dedeb27632b037d0dbac1e54db Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 14:34:20 +0000 Subject: [PATCH 13/32] Let's give it a try --- .github/workflows/ci.yml | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 57f072ed..d1f4da16 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -303,7 +303,8 @@ jobs: ${{ env.CI_CONTAINER }} \ bash -c " git config --global --add safe.directory /workspace - make test-e2e + test -n \"\$PROJECT_IMAGE\" + PROJECT_IMAGE=\"\$PROJECT_IMAGE\" KIND_CLUSTER=\"\$KIND_CLUSTER\" make test-e2e " e2e-install-smoke: @@ -374,7 +375,8 @@ jobs: ${{ env.CI_CONTAINER }} \ bash -c " git config --global --add safe.directory /workspace - make ${TARGET} + test -n \"\$PROJECT_IMAGE\" + PROJECT_IMAGE=\"\$PROJECT_IMAGE\" KIND_CLUSTER=\"\$KIND_CLUSTER\" make ${TARGET} " # Release job only runs on push to main after tests pass From 8a25db94845d5a652607a6cfd87c74426749a10f Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 15:00:04 +0000 Subject: [PATCH 14/32] Let's try it! 
--- Makefile | 4 ++-- config/{manager.yaml => deployment.yaml} | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) rename config/{manager.yaml => deployment.yaml} (98%) diff --git a/Makefile b/Makefile index dc64c141..6b695222 100644 --- a/Makefile +++ b/Makefile @@ -82,7 +82,7 @@ cleanup-cluster: ## Tear down the Kind cluster used for e2e tests .PHONY: test-e2e test-e2e: setup-cluster cleanup-webhook setup-e2e manifests setup-port-forwards ## Run end-to-end tests in Kind cluster, note that vet, fmt and generate are not run! - KIND_CLUSTER=$(KIND_CLUSTER) go test ./test/e2e/ -v -ginkgo.v + KIND_CLUSTER=$(KIND_CLUSTER) PROJECT_IMAGE=$(PROJECT_IMAGE) go test ./test/e2e/ -v -ginkgo.v .PHONY: cleanup-webhook cleanup-webhook: ## Preventive cleanup of ValidatingWebhookConfiguration potentially left by previous test runs @@ -161,7 +161,7 @@ uninstall: manifests ## Uninstall CRDs from the K8s cluster specified in ~/.kube .PHONY: deploy deploy: manifests ## Deploy controller to the K8s cluster specified in ~/.kube/config.
- cd config && $(KUSTOMIZE) edit set image controller=${IMG} + cd config && $(KUSTOMIZE) edit set image gitops-reverser=${IMG} $(KUSTOMIZE) build config | $(KUBECTL) apply -f - .PHONY: undeploy diff --git a/config/manager.yaml b/config/deployment.yaml similarity index 98% rename from config/manager.yaml rename to config/deployment.yaml index 3d6319bc..eddd54f5 100644 --- a/config/manager.yaml +++ b/config/deployment.yaml @@ -39,7 +39,7 @@ spec: valueFrom: fieldRef: fieldPath: metadata.namespace - image: example.com/gitops-reverser:v0.0.1 + image: gitops-reverser:latest livenessProbe: httpGet: path: /healthz From 7a485726b0e40f18ee5c150d801b26c0da009e9b Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 15:08:28 +0000 Subject: [PATCH 15/32] fix: That should resolve it --- .github/workflows/ci.yml | 6 ------ config/kustomization.yaml | 4 ++-- config/webhook/manifests.yaml | 1 - 3 files changed, 2 insertions(+), 9 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index d1f4da16..ff9dfadf 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -285,12 +285,6 @@ jobs: run: | echo "${{ secrets.GITHUB_TOKEN }}" | docker login ${{ env.REGISTRY }} -u ${{ github.actor }} --password-stdin - - name: Pull and load image to Kind - run: | - echo "Pulling image: ${{ env.PROJECT_IMAGE }}" - docker pull ${{ env.PROJECT_IMAGE }} - kind load docker-image ${{ env.PROJECT_IMAGE }} --name ${{ env.KIND_CLUSTER }} - - name: Run E2E tests in CI container run: | docker run --rm \ diff --git a/config/kustomization.yaml b/config/kustomization.yaml index beb057a5..38a204a4 100644 --- a/config/kustomization.yaml +++ b/config/kustomization.yaml @@ -5,10 +5,10 @@ resources: - crd - rbac - service.yaml -- manager.yaml +- deployment.yaml - certs - webhook.yaml images: -- name: controller +- name: gitops-reverser newName: example.com/gitops-reverser newTag: v0.0.1 diff --git a/config/webhook/manifests.yaml b/config/webhook/manifests.yaml 
index a307864c..26cdf5e5 100644 --- a/config/webhook/manifests.yaml +++ b/config/webhook/manifests.yaml @@ -10,7 +10,6 @@ webhooks: service: name: webhook-service namespace: system - port: 9443 path: /process-validating-webhook failurePolicy: Ignore name: gitops-reverser.configbutler.ai From 3a03f9e855211844623e9d0e65a6e9dcab2adba4 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 15:24:37 +0000 Subject: [PATCH 16/32] fix: Also make that part simpler please --- .github/workflows/ci.yml | 9 +---- Makefile | 4 --- test/e2e/scripts/install-smoke.sh | 57 +++---------------------------- 3 files changed, 6 insertions(+), 64 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index ff9dfadf..c3badd6a 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -350,12 +350,6 @@ jobs: run: | echo "${{ secrets.GITHUB_TOKEN }}" | docker login ${{ env.REGISTRY }} -u ${{ github.actor }} --password-stdin - - name: Pull and load image to Kind - run: | - echo "Pulling image: ${{ env.PROJECT_IMAGE }}" - docker pull ${{ env.PROJECT_IMAGE }} - kind load docker-image ${{ env.PROJECT_IMAGE }} --name ${{ env.KIND_CLUSTER }} - - name: Run install smoke test in CI container run: | TARGET="test-e2e-install-${{ matrix.scenario }}" @@ -369,8 +363,7 @@ jobs: ${{ env.CI_CONTAINER }} \ bash -c " git config --global --add safe.directory /workspace - test -n \"\$PROJECT_IMAGE\" - PROJECT_IMAGE=\"\$PROJECT_IMAGE\" KIND_CLUSTER=\"\$KIND_CLUSTER\" make ${TARGET} + make ${TARGET} " # Release job only runs on push to main after tests pass diff --git a/Makefile b/Makefile index 6b695222..df4d0d11 100644 --- a/Makefile +++ b/Makefile @@ -259,7 +259,3 @@ test-e2e-install-helm: setup-cluster cleanup-webhook setup-cert-manager wait-cer .PHONY: test-e2e-install-manifest test-e2e-install-manifest: setup-cluster cleanup-webhook setup-cert-manager wait-cert-manager ## Smoke test: install from generated dist/install.yaml and verify rollout @bash 
test/e2e/scripts/install-smoke.sh manifest - -.PHONY: test-e2e-install-smoke -test-e2e-install-smoke: test-e2e-install-helm test-e2e-install-manifest ## Run all Layer 1 install smoke tests - @echo "✅ All install smoke tests passed" diff --git a/test/e2e/scripts/install-smoke.sh b/test/e2e/scripts/install-smoke.sh index 8dd3d31f..7ace0a6d 100755 --- a/test/e2e/scripts/install-smoke.sh +++ b/test/e2e/scripts/install-smoke.sh @@ -2,81 +2,34 @@ set -euo pipefail MODE="${1:-}" -NAMESPACE="${INSTALL_SMOKE_NAMESPACE:-gitops-reverser}" -RELEASE_NAME="${INSTALL_SMOKE_RELEASE:-gitops-reverser}" -DEPLOYMENT_NAME="${INSTALL_SMOKE_DEPLOYMENT:-gitops-reverser}" +NAMESPACE="gitops-reverser" if [[ -z "${MODE}" ]]; then echo "usage: $0 " exit 1 fi -cleanup_install() { - helm uninstall "${RELEASE_NAME}" --namespace "${NAMESPACE}" >/dev/null 2>&1 || true - kubectl delete namespace "${NAMESPACE}" --wait=true --ignore-not-found=true >/dev/null 2>&1 || true -} - -configure_project_image() { - if [[ -z "${PROJECT_IMAGE:-}" ]]; then - return - fi - - if [[ "${PROJECT_IMAGE}" != *":"* ]]; then - echo "Skipping image override for PROJECT_IMAGE=${PROJECT_IMAGE} (no tag present)" - return - fi - - local image_repo image_tag - image_repo="${PROJECT_IMAGE%:*}" - image_tag="${PROJECT_IMAGE##*:}" - - helm upgrade --install "${RELEASE_NAME}" charts/gitops-reverser \ - --namespace "${NAMESPACE}" \ - --create-namespace \ - --set "image.repository=${image_repo}" \ - --set "image.tag=${image_tag}" -} - install_helm() { echo "Installing from Helm chart (mode=helm)" - cleanup_install - - if [[ -n "${PROJECT_IMAGE:-}" && "${PROJECT_IMAGE}" == *":"* ]]; then - configure_project_image - return - fi - - helm upgrade --install "${RELEASE_NAME}" charts/gitops-reverser \ + helm upgrade --install "cool-release" charts/gitops-reverser \ --namespace "${NAMESPACE}" \ --create-namespace } install_manifest() { echo "Installing from generated dist/install.yaml (mode=manifest)" - cleanup_install - - if [[ ! 
-f dist/install.yaml ]]; then - echo "dist/install.yaml not found; generating locally" - make build-installer - fi - - kubectl create namespace "${NAMESPACE}" --dry-run=client -o yaml | kubectl apply -f - kubectl apply -f dist/install.yaml - - if [[ -n "${PROJECT_IMAGE:-}" ]]; then - kubectl -n "${NAMESPACE}" set image deployment/"${DEPLOYMENT_NAME}" manager="${PROJECT_IMAGE}" - fi } verify_installation() { echo "Waiting for deployment rollout" - kubectl -n "${NAMESPACE}" rollout status deployment/"${DEPLOYMENT_NAME}" --timeout=180s + kubectl -n "${NAMESPACE}" rollout status deployment/gitops-reverser --timeout=30s echo "Checking deployment availability" - kubectl -n "${NAMESPACE}" wait --for=condition=available deployment/"${DEPLOYMENT_NAME}" --timeout=120s + kubectl -n "${NAMESPACE}" wait --for=condition=available deployment/gitops-reverser --timeout=30s echo "Checking pod readiness" - kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l control-plane=controller-manager --timeout=120s + kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l control-plane=controller-manager --timeout=30s echo "Checking CRDs" kubectl get crd \ From d0a8092fcb0ede16417896fe53d4ccb1ceeb4c5d Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 15:29:20 +0000 Subject: [PATCH 17/32] fix: That should fix it --- Makefile | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index df4d0d11..08a63eab 100644 --- a/Makefile +++ b/Makefile @@ -249,13 +249,17 @@ setup-e2e: setup-cert-manager setup-gitea-e2e setup-prometheus-e2e ## Setup all @echo "✅ E2E infrastructure initialized" .PHONY: wait-cert-manager -wait-cert-manager: ## Wait for cert-manager pods to become ready +wait-cert-manager: setup-cert-manager ## Wait for cert-manager pods to become ready @$(KUBECTL) wait --for=condition=ready pod -l app.kubernetes.io/instance=cert-manager -n cert-manager --timeout=300s +## Smoke test: install from local Helm chart and verify 
rollout +## Only tested in GH for now .PHONY: test-e2e-install-helm -test-e2e-install-helm: setup-cluster cleanup-webhook setup-cert-manager wait-cert-manager manifests helm-sync ## Smoke test: install from local Helm chart and verify rollout +test-e2e-install-helm: setup-e2e wait-cert-manager @bash test/e2e/scripts/install-smoke.sh helm +## Smoke test: install from generated dist/install.yaml and verify rollout +## Only tested in GH for now .PHONY: test-e2e-install-manifest -test-e2e-install-manifest: setup-cluster cleanup-webhook setup-cert-manager wait-cert-manager ## Smoke test: install from generated dist/install.yaml and verify rollout +test-e2e-install-manifest: setup-e2e wait-cert-manager @bash test/e2e/scripts/install-smoke.sh manifest From 5f2118a1230ee866dcc29a84773d9a55aa0e1457 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 15:35:46 +0000 Subject: [PATCH 18/32] fix: Would this now finally work? --- test/e2e/scripts/install-smoke.sh | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/test/e2e/scripts/install-smoke.sh b/test/e2e/scripts/install-smoke.sh index 7ace0a6d..ea0ab4da 100755 --- a/test/e2e/scripts/install-smoke.sh +++ b/test/e2e/scripts/install-smoke.sh @@ -11,9 +11,10 @@ fi install_helm() { echo "Installing from Helm chart (mode=helm)" - helm upgrade --install "cool-release" charts/gitops-reverser \ + helm upgrade --install "name-is-cool-but-not-relevant" charts/gitops-reverser \ --namespace "${NAMESPACE}" \ - --create-namespace + --create-namespace \ + --set fullnameOverride=gitops-reverser } install_manifest() { @@ -23,13 +24,13 @@ install_manifest() { verify_installation() { echo "Waiting for deployment rollout" - kubectl -n "${NAMESPACE}" rollout status deployment/gitops-reverser --timeout=30s + kubectl -n "${NAMESPACE}" rollout status deployment/gitops-reverser --timeout=10s echo "Checking deployment availability" - kubectl -n "${NAMESPACE}" wait --for=condition=available 
deployment/gitops-reverser --timeout=30s + kubectl -n "${NAMESPACE}" wait --for=condition=available deployment/gitops-reverser --timeout=10s echo "Checking pod readiness" - kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l control-plane=controller-manager --timeout=30s + kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l control-plane=controller-manager --timeout=10s echo "Checking CRDs" kubectl get crd \ From 3476237f26179ba6be389b6d9aaafd169f11828a Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 15:52:00 +0000 Subject: [PATCH 19/32] fix: linting issues --- .github/workflows/ci.yml | 1 + internal/watch/resource_filter_test.go | 1 - test/e2e/scripts/install-smoke.sh | 5 +++-- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index c3badd6a..72808c88 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -360,6 +360,7 @@ jobs: -w /workspace \ -e PROJECT_IMAGE=${{ env.PROJECT_IMAGE }} \ -e KIND_CLUSTER=${{ env.KIND_CLUSTER }} \ + -e HELM_CHART_SOURCE="./gitops-reverser.tgz" \ ${{ env.CI_CONTAINER }} \ bash -c " git config --global --add safe.directory /workspace diff --git a/internal/watch/resource_filter_test.go b/internal/watch/resource_filter_test.go index 4c10d116..76763f9f 100644 --- a/internal/watch/resource_filter_test.go +++ b/internal/watch/resource_filter_test.go @@ -36,7 +36,6 @@ func TestShouldIgnoreResource(t *testing.T) { } for _, tt := range tests { - tt := tt t.Run(tt.name, func(t *testing.T) { t.Parallel() got := shouldIgnoreResource(tt.group, tt.resource) diff --git a/test/e2e/scripts/install-smoke.sh b/test/e2e/scripts/install-smoke.sh index ea0ab4da..a04af5c3 100755 --- a/test/e2e/scripts/install-smoke.sh +++ b/test/e2e/scripts/install-smoke.sh @@ -3,6 +3,7 @@ set -euo pipefail MODE="${1:-}" NAMESPACE="gitops-reverser" +HELM_CHART_SOURCE="${HELM_CHART_SOURCE:-charts/gitops-reverser}" if [[ -z "${MODE}" ]]; then echo "usage: $0 " 
@@ -10,8 +11,8 @@ if [[ -z "${MODE}" ]]; then fi install_helm() { - echo "Installing from Helm chart (mode=helm)" - helm upgrade --install "name-is-cool-but-not-relevant" charts/gitops-reverser \ + echo "Installing from Helm chart (mode=helm, source=${HELM_CHART_SOURCE})" + helm upgrade --install "name-is-cool-but-not-relevant" "${HELM_CHART_SOURCE}" \ --namespace "${NAMESPACE}" \ --create-namespace \ --set fullnameOverride=gitops-reverser From fa69cb3d24d74afa005c1a6b3cd0653498c36a22 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 16:03:59 +0000 Subject: [PATCH 20/32] fix: Remove double crds --- .devcontainer/FINDINGS.md | 101 +++++++ Makefile | 2 + .../clusterwatchrules.configbutler.ai.yaml | 253 ------------------ .../bases/gitproviders.configbutler.ai.yaml | 168 ------------ .../crd/bases/gittargets.configbutler.ai.yaml | 148 ---------- .../crd/bases/watchrules.configbutler.ai.yaml | 232 ---------------- config/crd/kustomization.yaml | 8 +- test/e2e/scripts/install-smoke.sh | 6 +- 8 files changed, 110 insertions(+), 808 deletions(-) create mode 100644 .devcontainer/FINDINGS.md delete mode 100644 config/crd/bases/clusterwatchrules.configbutler.ai.yaml delete mode 100644 config/crd/bases/gitproviders.configbutler.ai.yaml delete mode 100644 config/crd/bases/gittargets.configbutler.ai.yaml delete mode 100644 config/crd/bases/watchrules.configbutler.ai.yaml diff --git a/.devcontainer/FINDINGS.md b/.devcontainer/FINDINGS.md new file mode 100644 index 00000000..45e9943e --- /dev/null +++ b/.devcontainer/FINDINGS.md @@ -0,0 +1,101 @@ +## Findings: `make lint`, cache behavior, and workspace paths + +### 1) `make lint` did not use a warm module cache in this run + +`make lint` executes: + +```make +lint: + $(GOLANGCI_LINT) run +``` + +There is no `GOMODCACHE` override in `Makefile`, so cache behavior depends on the runtime environment defaults. 
+ +Evidence collected during debugging: + +- `go env` reported: + - `GOMODCACHE=/go/pkg/mod` + - `GOCACHE=/home/vscode/.cache/go-build` +- Running `go list ./...` showed many `go: downloading ...` lines, which indicates cache misses (or unavailable cache entries) for current dependencies. +- In restricted execution, writes/access under `/go/pkg/mod/cache/...` were blocked, and module fetches to `proxy.golang.org` were also blocked, which prevented normal dependency resolution. +- This produced a misleading top-level `golangci-lint` error (`no go files to analyze`) even though Go files exist. + +Conclusion: in this environment, `make lint` did not have an effectively usable warm module cache path for dependency resolution. + +### 2) `/workspace` is valid in a devcontainer, but not always the active workspace mount + +Observed runtime paths: + +- Active repo path: `/workspaces/gitops-reverser2` +- `/workspace` exists, but only contains files copied during image build steps. + +Why this happens: + +- `Dockerfile` build steps create image-layer content (here under `/workspace`). +- VS Code Dev Containers then bind-mount your real host repo into the running container (commonly under `/workspaces/` unless overridden). + +Implication in this repo: + +- `.devcontainer/devcontainer.json` `postCreateCommand` currently runs `chown` on `/workspace`. +- That command is valid, but it does not affect the mounted repo at `/workspaces/gitops-reverser2` in this session. + +### 3) Main cause of the lint failure seen in Codex + +Primary cause was execution constraints in this Codex session (restricted network and restricted writable roots), not an intrinsic Go/lint config break in the repository. + +When lint was run with elevated permissions (normal module/network access), it completed and reported actionable issues. 
+ +### 4) Best-practice model: bind mounts for source, volumes for caches + +Use this mental model: + +- Source code: bind mount (live, editable, synced with host filesystem). +- Tool and dependency caches: Docker volumes (persistent across container rebuilds, independent of source tree). + +For Go specifically: + +- `GOMODCACHE` should map to `/go/pkg/mod` (module download cache). +- `GOCACHE` should map to `/home/vscode/.cache/go-build` (compiled package/build cache). + +### 5) Recommended improvements for this repo + +1. Make workspace targeting explicit (optional but reduces ambiguity). + +```json +{ + "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}" +} +``` + +2. Avoid hardcoding `/workspace` in post-create logic. + +Use `${containerWorkspaceFolder}` or relative paths: + +```json +{ + "postCreateCommand": "sudo chown -R vscode:vscode ${containerWorkspaceFolder} || true" +} +``` + +3. Persist Go caches via named volumes. + +```json +{ + "mounts": [ + "source=gomodcache,target=/go/pkg/mod,type=volume", + "source=gobuildcache,target=/home/vscode/.cache/go-build,type=volume", + "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind" + ] +} +``` + +Notes: + +- The cache targets above are intentionally mapped to Go defaults in this container. +- Earlier advice that swapped these two cache targets is incorrect. + +### 6) Practical balance (local machine vs container) + +- Keep source code on the host via bind mount for normal editor/Git workflow. +- Keep heavy generated caches and dependencies in container volumes for speed and reproducibility. +- Keep absolute paths out of scripts unless they are the canonical runtime paths for this specific devcontainer configuration. diff --git a/Makefile b/Makefile index 08a63eab..4375d61a 100644 --- a/Makefile +++ b/Makefile @@ -43,10 +43,12 @@ help: ## Display this help. .PHONY: manifests manifests: ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects. 
+ @rm -f config/crd/bases/*.yaml $(CONTROLLER_GEN) rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases .PHONY: helm-sync helm-sync: ## Sync CRDs and roles from config/crd/bases to Helm chart crds directory (for packaging) + @rm -f charts/gitops-reverser/crds/*.yaml @cp config/crd/bases/*.yaml charts/gitops-reverser/crds/ @cp config/rbac/role.yaml charts/gitops-reverser/config diff --git a/config/crd/bases/clusterwatchrules.configbutler.ai.yaml b/config/crd/bases/clusterwatchrules.configbutler.ai.yaml deleted file mode 100644 index 42eea57f..00000000 --- a/config/crd/bases/clusterwatchrules.configbutler.ai.yaml +++ /dev/null @@ -1,253 +0,0 @@ -apiVersion: apiextensions.k8s.io/v1 -kind: CustomResourceDefinition -metadata: - annotations: - controller-gen.kubebuilder.io/version: v0.19.0 - name: clusterwatchrules.configbutler.ai -spec: - group: configbutler.ai - names: - kind: ClusterWatchRule - listKind: ClusterWatchRuleList - plural: clusterwatchrules - singular: clusterwatchrule - scope: Cluster - versions: - - additionalPrinterColumns: - - jsonPath: .spec.destinationRef.name - name: Destination - type: string - - jsonPath: .status.conditions[?(@.type=="Ready")].status - name: Ready - type: string - - jsonPath: .metadata.creationTimestamp - name: Age - type: date - name: v1alpha1 - schema: - openAPIV3Schema: - description: |- - ClusterWatchRule watches resources across the entire cluster. - It provides the ability to audit both cluster-scoped resources (Nodes, ClusterRoles, CRDs) - and namespaced resources across multiple namespaces with per-rule filtering. 
- - Security model: - - ClusterWatchRule is cluster-scoped and requires cluster-admin permissions - - Referenced GitRepoConfig must have accessPolicy.allowClusterRules set to true - - Each rule can independently specify Cluster or Namespaced scope - - Namespaced rules can optionally filter by namespace labels - - Use cases: - - Audit cluster infrastructure (Nodes, PersistentVolumes, StorageClasses) - - Audit RBAC changes (ClusterRoles, ClusterRoleBindings) - - Audit CRD installations and updates - - Audit resources across multiple namespaces (e.g., all production namespaces) - properties: - apiVersion: - description: |- - APIVersion defines the versioned schema of this representation of an object. - Servers should convert recognized schemas to the latest internal value, and - may reject unrecognized values. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources - type: string - kind: - description: |- - Kind is a string value representing the REST resource this object represents. - Servers may infer this from the endpoint the client submits requests to. - Cannot be updated. - In CamelCase. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds - type: string - metadata: - type: object - spec: - description: spec defines the desired state of ClusterWatchRule. - properties: - rules: - description: |- - Rules define which resources to watch. - Multiple rules create a logical OR - a resource matching ANY rule is watched. - Each rule can specify cluster-scoped or namespaced resources. - items: - description: |- - ClusterResourceRule defines which resources to watch with scope control. - Each rule can independently specify whether it watches cluster-scoped or - namespaced resources, with optional namespace filtering for namespaced resources. - properties: - apiGroups: - description: |- - APIGroups to match. Empty string ("") matches the core API group. 
- If empty, matches all API groups. - Wildcards supported: "*" matches all groups. - Examples: - - [""] matches core API (nodes, namespaces) - - ["rbac.authorization.k8s.io"] matches RBAC resources - - ["*"] or [] matches all groups - items: - type: string - type: array - apiVersions: - description: |- - APIVersions to match. If empty, matches all versions. - Wildcards supported: "*" matches all versions. - Examples: - - ["v1"] matches only v1 version - - ["*"] or [] matches all versions - items: - type: string - type: array - operations: - description: |- - Operations to watch. If empty, watches all operations (CREATE, UPDATE, DELETE). - Supports: CREATE, UPDATE, DELETE, or * (wildcard for all operations). - Examples: - - ["CREATE", "UPDATE"] watches only creation and updates - - ["*"] or [] watches all operations - items: - description: OperationType specifies the type of operation - that triggers a watch event. - enum: - - CREATE - - UPDATE - - DELETE - - '*' - type: string - type: array - resources: - description: |- - Resources to match (plural names like "nodes", "clusterroles"). - This field is required and determines which resource types trigger this rule. - Wildcard semantics follow Kubernetes admission webhook patterns: - - "*" matches all resources - - "nodes" matches exactly nodes - - "pods" matches exactly pods (for namespaced scope) - items: - type: string - minItems: 1 - type: array - scope: - allOf: - - enum: - - Cluster - - Namespaced - - enum: - - Cluster - - Namespaced - description: |- - Scope defines whether this rule watches Cluster-scoped or Namespaced resources. - - "Cluster": For cluster-scoped resources (Nodes, ClusterRoles, CRDs, etc.). - The namespaceSelector field is ignored for cluster-scoped rules. - - "Namespaced": For namespaced resources (Pods, Deployments, Secrets, etc.). - Optionally filtered by namespaceSelector. - If namespaceSelector is omitted, watches resources in ALL namespaces. 
- type: string - required: - - resources - - scope - type: object - minItems: 1 - type: array - targetRef: - description: |- - TargetRef references the GitTarget to use. - Must specify namespace. - properties: - group: - default: configbutler.ai - description: |- - API Group of the referent. - Kind of the referrer. - enum: - - configbutler.ai - type: string - kind: - default: GitTarget - description: |- - Kind of the referrer. - Optional because this reference currently only supports a single kind (GitTarget). - Keeping it optional allows users to omit it while still benefiting from CRD defaulting. - enum: - - GitTarget - type: string - name: - type: string - namespace: - description: Required because ClusterWatchRule has no namespace. - type: string - required: - - name - - namespace - type: object - required: - - rules - - targetRef - type: object - status: - description: status defines the observed state of ClusterWatchRule. - properties: - conditions: - description: Conditions represent the latest available observations - of the ClusterWatchRule's state. - items: - description: Condition contains details for one aspect of the current - state of this API Resource. - properties: - lastTransitionTime: - description: |- - lastTransitionTime is the last time the condition transitioned from one status to another. - This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. - format: date-time - type: string - message: - description: |- - message is a human readable message indicating details about the transition. - This may be an empty string. - maxLength: 32768 - type: string - observedGeneration: - description: |- - observedGeneration represents the .metadata.generation that the condition was set based upon. 
- For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date - with respect to the current state of the instance. - format: int64 - minimum: 0 - type: integer - reason: - description: |- - reason contains a programmatic identifier indicating the reason for the condition's last transition. - Producers of specific condition types may define expected values and meanings for this field, - and whether the values are considered a guaranteed API. - The value should be a CamelCase string. - This field may not be empty. - maxLength: 1024 - minLength: 1 - pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ - type: string - status: - description: status of the condition, one of True, False, Unknown. - enum: - - "True" - - "False" - - Unknown - type: string - type: - description: type of condition in CamelCase or in foo.example.com/CamelCase. - maxLength: 316 - pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ - type: string - required: - - lastTransitionTime - - message - - reason - - status - - type - type: object - type: array - type: object - required: - - spec - type: object - served: true - storage: true - subresources: - status: {} diff --git a/config/crd/bases/gitproviders.configbutler.ai.yaml b/config/crd/bases/gitproviders.configbutler.ai.yaml deleted file mode 100644 index 19c6d198..00000000 --- a/config/crd/bases/gitproviders.configbutler.ai.yaml +++ /dev/null @@ -1,168 +0,0 @@ -apiVersion: apiextensions.k8s.io/v1 -kind: CustomResourceDefinition -metadata: - annotations: - controller-gen.kubebuilder.io/version: v0.19.0 - name: gitproviders.configbutler.ai -spec: - group: configbutler.ai - names: - kind: GitProvider - listKind: GitProviderList - plural: gitproviders - singular: gitprovider - scope: Namespaced - versions: - - name: v1alpha1 - schema: - openAPIV3Schema: - description: GitProvider is the Schema for the 
gitproviders API. - properties: - apiVersion: - description: |- - APIVersion defines the versioned schema of this representation of an object. - Servers should convert recognized schemas to the latest internal value, and - may reject unrecognized values. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources - type: string - kind: - description: |- - Kind is a string value representing the REST resource this object represents. - Servers may infer this from the endpoint the client submits requests to. - Cannot be updated. - In CamelCase. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds - type: string - metadata: - type: object - spec: - description: spec defines the desired state of GitProvider - properties: - allowedBranches: - description: AllowedBranches restricts which branches can be written - to. - items: - type: string - type: array - push: - description: Push defines the strategy for pushing commits (batching). - properties: - interval: - description: |- - Interval is the maximum time to wait before pushing queued commits. - Defaults to "1m". - type: string - maxCommits: - description: |- - MaxCommits is the maximum number of commits to queue before pushing. - Defaults to 20. - type: integer - type: object - secretRef: - description: SecretRef for authentication credentials (may be nil - for public repos) - properties: - group: - default: "" - description: Group of the referent. - type: string - kind: - default: Secret - description: Kind of the referent. - enum: - - Secret - type: string - name: - description: Name of the Secret. 
- minLength: 1 - type: string - required: - - name - type: object - url: - description: URL of the repository (HTTP/SSH) - type: string - required: - - allowedBranches - - url - type: object - status: - description: status defines the observed state of GitProvider - properties: - conditions: - description: |- - conditions represent the current state of the GitProvider resource. - Each condition has a unique type and reflects the status of a specific aspect of the resource. - - Standard condition types include: - - "Available": the resource is fully functional - - "Progressing": the resource is being created or updated - - "Degraded": the resource failed to reach or maintain its desired state - - The status of each condition is one of True, False, or Unknown. - items: - description: Condition contains details for one aspect of the current - state of this API Resource. - properties: - lastTransitionTime: - description: |- - lastTransitionTime is the last time the condition transitioned from one status to another. - This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. - format: date-time - type: string - message: - description: |- - message is a human readable message indicating details about the transition. - This may be an empty string. - maxLength: 32768 - type: string - observedGeneration: - description: |- - observedGeneration represents the .metadata.generation that the condition was set based upon. - For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date - with respect to the current state of the instance. - format: int64 - minimum: 0 - type: integer - reason: - description: |- - reason contains a programmatic identifier indicating the reason for the condition's last transition. 
- Producers of specific condition types may define expected values and meanings for this field, - and whether the values are considered a guaranteed API. - The value should be a CamelCase string. - This field may not be empty. - maxLength: 1024 - minLength: 1 - pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ - type: string - status: - description: status of the condition, one of True, False, Unknown. - enum: - - "True" - - "False" - - Unknown - type: string - type: - description: type of condition in CamelCase or in foo.example.com/CamelCase. - maxLength: 316 - pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ - type: string - required: - - lastTransitionTime - - message - - reason - - status - - type - type: object - type: array - x-kubernetes-list-map-keys: - - type - x-kubernetes-list-type: map - type: object - required: - - spec - type: object - served: true - storage: true - subresources: - status: {} diff --git a/config/crd/bases/gittargets.configbutler.ai.yaml b/config/crd/bases/gittargets.configbutler.ai.yaml deleted file mode 100644 index 9dbb4d1c..00000000 --- a/config/crd/bases/gittargets.configbutler.ai.yaml +++ /dev/null @@ -1,148 +0,0 @@ -apiVersion: apiextensions.k8s.io/v1 -kind: CustomResourceDefinition -metadata: - annotations: - controller-gen.kubebuilder.io/version: v0.19.0 - name: gittargets.configbutler.ai -spec: - group: configbutler.ai - names: - kind: GitTarget - listKind: GitTargetList - plural: gittargets - singular: gittarget - scope: Namespaced - versions: - - name: v1alpha1 - schema: - openAPIV3Schema: - description: GitTarget is the Schema for the gittargets API. - properties: - apiVersion: - description: |- - APIVersion defines the versioned schema of this representation of an object. - Servers should convert recognized schemas to the latest internal value, and - may reject unrecognized values. 
- More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources - type: string - kind: - description: |- - Kind is a string value representing the REST resource this object represents. - Servers may infer this from the endpoint the client submits requests to. - Cannot be updated. - In CamelCase. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds - type: string - metadata: - type: object - spec: - description: spec defines the desired state of GitTarget - properties: - branch: - description: |- - Branch to use for this target. - Must be one of the allowed branches in the provider. - type: string - path: - description: Path within the repository to write resources to. - type: string - providerRef: - description: ProviderRef references the GitProvider or Flux GitRepository. - properties: - group: - default: configbutler.ai - description: API Group of the referent. - type: string - kind: - default: GitProvider - description: |- - Kind of the referent. - NOTE: Support for reading from Flux GitRepository is not yet implemented! - type: string - name: - description: Name of the referent. - type: string - required: - - name - type: object - required: - - branch - - providerRef - type: object - status: - description: status defines the observed state of GitTarget - properties: - conditions: - description: Conditions represent the latest available observations - of an object's state - items: - description: Condition contains details for one aspect of the current - state of this API Resource. - properties: - lastTransitionTime: - description: |- - lastTransitionTime is the last time the condition transitioned from one status to another. - This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. 
- format: date-time - type: string - message: - description: |- - message is a human readable message indicating details about the transition. - This may be an empty string. - maxLength: 32768 - type: string - observedGeneration: - description: |- - observedGeneration represents the .metadata.generation that the condition was set based upon. - For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date - with respect to the current state of the instance. - format: int64 - minimum: 0 - type: integer - reason: - description: |- - reason contains a programmatic identifier indicating the reason for the condition's last transition. - Producers of specific condition types may define expected values and meanings for this field, - and whether the values are considered a guaranteed API. - The value should be a CamelCase string. - This field may not be empty. - maxLength: 1024 - minLength: 1 - pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ - type: string - status: - description: status of the condition, one of True, False, Unknown. - enum: - - "True" - - "False" - - Unknown - type: string - type: - description: type of condition in CamelCase or in foo.example.com/CamelCase. - maxLength: 316 - pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ - type: string - required: - - lastTransitionTime - - message - - reason - - status - - type - type: object - type: array - lastCommit: - description: LastCommit is the SHA of the last commit processed. - type: string - lastPushTime: - description: LastPushTime is the timestamp of the last successful - push. 
- format: date-time - type: string - type: object - required: - - spec - type: object - served: true - storage: true - subresources: - status: {} diff --git a/config/crd/bases/watchrules.configbutler.ai.yaml b/config/crd/bases/watchrules.configbutler.ai.yaml deleted file mode 100644 index 6fc2e877..00000000 --- a/config/crd/bases/watchrules.configbutler.ai.yaml +++ /dev/null @@ -1,232 +0,0 @@ -apiVersion: apiextensions.k8s.io/v1 -kind: CustomResourceDefinition -metadata: - annotations: - controller-gen.kubebuilder.io/version: v0.19.0 - name: watchrules.configbutler.ai -spec: - group: configbutler.ai - names: - kind: WatchRule - listKind: WatchRuleList - plural: watchrules - singular: watchrule - scope: Namespaced - versions: - - additionalPrinterColumns: - - jsonPath: .spec.destinationRef.name - name: Destination - type: string - - jsonPath: .status.conditions[?(@.type=="Ready")].status - name: Ready - type: string - - jsonPath: .metadata.creationTimestamp - name: Age - type: date - name: v1alpha1 - schema: - openAPIV3Schema: - description: |- - WatchRule watches namespaced resources within its own namespace. - It provides fine-grained control over which resources trigger Git commits, - with filtering by operation type, API group, version, and labels. - - Security model: - - WatchRule is namespace-scoped and can only watch resources in its own namespace - - Use ClusterWatchRule for watching cluster-scoped resources (Nodes, ClusterRoles, etc.) - - RBAC controls who can create/modify WatchRules per namespace - properties: - apiVersion: - description: |- - APIVersion defines the versioned schema of this representation of an object. - Servers should convert recognized schemas to the latest internal value, and - may reject unrecognized values. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources - type: string - kind: - description: |- - Kind is a string value representing the REST resource this object represents. 
- Servers may infer this from the endpoint the client submits requests to. - Cannot be updated. - In CamelCase. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds - type: string - metadata: - type: object - spec: - description: spec defines the desired state of WatchRule - properties: - rules: - description: |- - Rules define which resources to watch within this namespace. - Multiple rules create a logical OR - a resource matching ANY rule is watched. - Each rule can specify operations, API groups, versions, and resource types. - items: - description: |- - ResourceRule defines a set of namespaced resources to watch. - This follows Kubernetes admission control semantics but simplified for our use case. - All fields except Resources are optional and default to matching all when not specified. - properties: - apiGroups: - description: |- - APIGroups to match. Empty string ("") matches the core API group. - If empty, matches all API groups. - Wildcards supported: "*" matches all groups. - Examples: - - [""] matches core API (pods, services, configmaps) - - ["apps"] matches apps API group (deployments, statefulsets) - - ["", "apps"] matches both core and apps groups - - ["*"] or [] matches all groups - items: - type: string - type: array - apiVersions: - description: |- - APIVersions to match. If empty, matches all versions. - Wildcards supported: "*" matches all versions. - Examples: - - ["v1"] matches only v1 version - - ["v1", "v1beta1"] matches both versions - - ["*"] or [] matches all versions - items: - type: string - type: array - operations: - description: |- - Operations to watch. If empty, watches all operations (CREATE, UPDATE, DELETE). - Supports: CREATE, UPDATE, DELETE, or * (wildcard for all operations). 
- Examples: - - ["CREATE", "UPDATE"] watches only creation and updates, ignoring deletions - - ["*"] or [] watches all operations - items: - description: OperationType specifies the type of operation - that triggers a watch event. - enum: - - CREATE - - UPDATE - - DELETE - - '*' - type: string - type: array - resources: - description: |- - Resources to match (plural names like "pods", "configmaps"). - This field is required and determines which resource types trigger this rule. - Wildcard semantics follow Kubernetes admission webhook patterns: - - "*" matches all resources - - "pods" matches exactly pods (case-insensitive) - - "pods/*" matches all pod subresources (e.g., pods/log, pods/status) - - "pods/log" matches specific subresource - - For custom resources, use exact group-qualified names: - - "myapps.example.com" matches MyApp CRD - - Note: Prefix/suffix wildcards like "pod*" or "*.example.com" are NOT supported. - Use exact matches or the "*" wildcard for broad matching. - items: - type: string - minItems: 1 - type: array - required: - - resources - type: object - minItems: 1 - type: array - targetRef: - description: |- - TargetRef references the GitTarget to use. - Must be in the same namespace. - properties: - group: - default: configbutler.ai - description: API Group of the referent. - type: string - kind: - default: GitTarget - description: |- - Kind of the referent. - Optional because this reference currently only supports a single kind (GitTarget). - Keeping it optional allows users to omit it while still benefiting from CRD defaulting. 
- enum: - - GitTarget - type: string - name: - type: string - required: - - name - type: object - required: - - rules - - targetRef - type: object - status: - description: status defines the observed state of WatchRule - properties: - conditions: - description: Conditions represent the latest available observations - of an object's state - items: - description: Condition contains details for one aspect of the current - state of this API Resource. - properties: - lastTransitionTime: - description: |- - lastTransitionTime is the last time the condition transitioned from one status to another. - This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. - format: date-time - type: string - message: - description: |- - message is a human readable message indicating details about the transition. - This may be an empty string. - maxLength: 32768 - type: string - observedGeneration: - description: |- - observedGeneration represents the .metadata.generation that the condition was set based upon. - For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date - with respect to the current state of the instance. - format: int64 - minimum: 0 - type: integer - reason: - description: |- - reason contains a programmatic identifier indicating the reason for the condition's last transition. - Producers of specific condition types may define expected values and meanings for this field, - and whether the values are considered a guaranteed API. - The value should be a CamelCase string. - This field may not be empty. - maxLength: 1024 - minLength: 1 - pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ - type: string - status: - description: status of the condition, one of True, False, Unknown. 
- enum: - - "True" - - "False" - - Unknown - type: string - type: - description: type of condition in CamelCase or in foo.example.com/CamelCase. - maxLength: 316 - pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ - type: string - required: - - lastTransitionTime - - message - - reason - - status - - type - type: object - type: array - type: object - required: - - spec - type: object - served: true - storage: true - subresources: - status: {} - - diff --git a/config/crd/kustomization.yaml b/config/crd/kustomization.yaml index 67a0ce69..2b954c79 100644 --- a/config/crd/kustomization.yaml +++ b/config/crd/kustomization.yaml @@ -1,7 +1,7 @@ apiVersion: kustomize.config.k8s.io/v1beta1 kind: Kustomization resources: - - bases/clusterwatchrules.configbutler.ai.yaml - - bases/gitproviders.configbutler.ai.yaml - - bases/gittargets.configbutler.ai.yaml - - bases/watchrules.configbutler.ai.yaml + - bases/configbutler.ai_clusterwatchrules.yaml + - bases/configbutler.ai_gitproviders.yaml + - bases/configbutler.ai_gittargets.yaml + - bases/configbutler.ai_watchrules.yaml diff --git a/test/e2e/scripts/install-smoke.sh b/test/e2e/scripts/install-smoke.sh index a04af5c3..9cde5c87 100755 --- a/test/e2e/scripts/install-smoke.sh +++ b/test/e2e/scripts/install-smoke.sh @@ -25,13 +25,13 @@ install_manifest() { verify_installation() { echo "Waiting for deployment rollout" - kubectl -n "${NAMESPACE}" rollout status deployment/gitops-reverser --timeout=10s + kubectl -n "${NAMESPACE}" rollout status deployment/gitops-reverser --timeout=30s echo "Checking deployment availability" - kubectl -n "${NAMESPACE}" wait --for=condition=available deployment/gitops-reverser --timeout=10s + kubectl -n "${NAMESPACE}" wait --for=condition=available deployment/gitops-reverser --timeout=30s echo "Checking pod readiness" - kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l control-plane=controller-manager --timeout=10s + 
kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l control-plane=controller-manager --timeout=30s echo "Checking CRDs" kubectl get crd \ From 4626cc4550dcd81b2881c23f931d800ecf6ae939 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 16:27:39 +0000 Subject: [PATCH 21/32] ci: Hopefully improving my ci stuff with this --- .devcontainer/Dockerfile | 6 ++-- .devcontainer/FINDINGS.md | 38 +++++++++++++++++++++-- .devcontainer/devcontainer.json | 8 +++-- .github/workflows/ci.yml | 12 ++++---- Dockerfile | 4 +-- test/e2e/scripts/install-smoke.sh | 50 +++++++++++++++++++++++++++---- 6 files changed, 97 insertions(+), 21 deletions(-) diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile index 53e57386..44ba913d 100644 --- a/.devcontainer/Dockerfile +++ b/.devcontainer/Dockerfile @@ -54,7 +54,7 @@ RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/i | sh -s -- -b /usr/local/bin ${GOLANGCI_LINT_VERSION} # Set working directory -WORKDIR /workspace +WORKDIR /workspaces # Create godev group for shared Go development directory access # This allows both root (during build) and vscode user (during dev) to write to /go @@ -128,8 +128,8 @@ RUN groupadd -f docker && usermod -aG docker vscode # Ensure vscode user can write to workspace (empty, so fast) # Note: /go permissions are already set in CI stage and preserved here -RUN chown -R vscode:vscode /workspace && \ - chmod -R 755 /workspace +RUN chown -R vscode:vscode /workspaces && \ + chmod -R 755 /workspaces # Switch back to vscode user for development USER vscode diff --git a/.devcontainer/FINDINGS.md b/.devcontainer/FINDINGS.md index 45e9943e..21f7d98a 100644 --- a/.devcontainer/FINDINGS.md +++ b/.devcontainer/FINDINGS.md @@ -59,10 +59,11 @@ For Go specifically: ### 5) Recommended improvements for this repo -1. Make workspace targeting explicit (optional but reduces ambiguity). +1. Make workspace targeting explicit (recommended to remove ambiguity). 
```json { + "workspaceMount": "source=${localWorkspaceFolder},target=/workspaces/${localWorkspaceFolderBasename},type=bind", "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}" } ``` @@ -94,7 +95,40 @@ Notes: - The cache targets above are intentionally mapped to Go defaults in this container. - Earlier advice that swapped these two cache targets is incorrect. -### 6) Practical balance (local machine vs container) +### 6) Clean local-to-container path strategy + +Recommended baseline: + +- Host repo path: keep your normal path, for example `~/git/gitops-reverser2`. +- Container repo path: standardize on `/workspaces/gitops-reverser2`. +- Do not depend on `/workspace` for active development files. + +This gives: + +- Normal local Git workflow on host. +- Predictable in-container path for scripts/tools. +- Fewer permission/path surprises when onboarding or troubleshooting. + +### 7) Shared Dockerfile for devcontainer and CI: benefits and constraints + +Current setup uses `.devcontainer/Dockerfile` in both local Dev Container and GitHub Actions CI (`.github/workflows/ci.yml`), which is beneficial: + +- Single source of truth for tool versions. +- Less drift between local and CI behavior. + +But it requires discipline: + +- Keep stage intent clear (`ci` stage for CI runtime, `dev` stage for local extras). +- Avoid dev-only assumptions in shared base stages (for example hardcoded workspace paths). +- Keep runtime-mount concerns in `devcontainer.json` (workspace mount, post-create behavior), not in CI-oriented image logic. + +Practical rule: + +- Image should provide tools. +- `devcontainer.json` should define developer runtime ergonomics. +- CI workflow should choose the appropriate image stage and avoid relying on local-mount semantics. + +### 8) Practical balance (local machine vs container) - Keep source code on the host via bind mount for normal editor/Git workflow. 
- Keep heavy generated caches and dependencies in container volumes for speed and reproducibility. diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 15540575..2f4b1af3 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -5,6 +5,8 @@ "context": "..", "target": "dev" }, + "workspaceMount": "source=${localWorkspaceFolder},target=/workspaces/${localWorkspaceFolderBasename},type=bind", + "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}", "features": { "ghcr.io/devcontainers/features/common-utils:2": { "userUid": "automatic", @@ -62,13 +64,15 @@ ] } }, - "postCreateCommand": "sudo chmod 666 /var/run/docker.sock || true && docker network create -d=bridge --subnet=172.19.0.0/24 kind || true && sudo chown -R vscode:vscode /workspace || true", + "postCreateCommand": "sudo chmod 666 /var/run/docker.sock || true && docker network create -d=bridge --subnet=172.19.0.0/24 kind || true && sudo chown -R vscode:vscode ${containerWorkspaceFolder} || true", "remoteUser": "vscode", "mounts": [ + "source=gomodcache,target=/go/pkg/mod,type=volume", + "source=gobuildcache,target=/home/vscode/.cache/go-build,type=volume", "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind" ], "containerEnv": { "HOST_PROJECT_PATH": "${localWorkspaceFolder}", "DOCKER_API_VERSION": "1.44" } -} \ No newline at end of file +} diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 72808c88..6afca514 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -289,14 +289,14 @@ jobs: run: | docker run --rm \ --network host \ - -v ${{ github.workspace }}:/workspace \ + -v ${{ github.workspace }}:/workspaces \ -v $HOME/.kube:/root/.kube \ - -w /workspace \ + -w /workspaces \ -e PROJECT_IMAGE=${{ env.PROJECT_IMAGE }} \ -e KIND_CLUSTER=${{ env.KIND_CLUSTER }} \ ${{ env.CI_CONTAINER }} \ bash -c " - git config --global --add safe.directory /workspace + git config --global --add safe.directory 
/workspaces test -n \"\$PROJECT_IMAGE\" PROJECT_IMAGE=\"\$PROJECT_IMAGE\" KIND_CLUSTER=\"\$KIND_CLUSTER\" make test-e2e " @@ -355,15 +355,15 @@ jobs: TARGET="test-e2e-install-${{ matrix.scenario }}" docker run --rm \ --network host \ - -v ${{ github.workspace }}:/workspace \ + -v ${{ github.workspace }}:/workspaces \ -v $HOME/.kube:/root/.kube \ - -w /workspace \ + -w /workspaces \ -e PROJECT_IMAGE=${{ env.PROJECT_IMAGE }} \ -e KIND_CLUSTER=${{ env.KIND_CLUSTER }} \ -e HELM_CHART_SOURCE="./gitops-reverser.tgz" \ ${{ env.CI_CONTAINER }} \ bash -c " - git config --global --add safe.directory /workspace + git config --global --add safe.directory /workspaces make ${TARGET} " diff --git a/Dockerfile b/Dockerfile index 13a23307..a3f13fb3 100644 --- a/Dockerfile +++ b/Dockerfile @@ -5,7 +5,7 @@ FROM golang:1.25.6 AS builder ARG TARGETOS ARG TARGETARCH -WORKDIR /workspace +WORKDIR /workspaces # Copy the Go Modules manifests COPY go.mod go.sum ./ @@ -25,7 +25,7 @@ RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o manager cmd/ # Refer to https://github.com/GoogleContainerTools/distroless for more details FROM gcr.io/distroless/static:debug WORKDIR / -COPY --from=builder /workspace/manager . +COPY --from=builder /workspaces/manager . 
USER 65532:65532 ENTRYPOINT ["/manager"] diff --git a/test/e2e/scripts/install-smoke.sh b/test/e2e/scripts/install-smoke.sh index 9cde5c87..417007b9 100755 --- a/test/e2e/scripts/install-smoke.sh +++ b/test/e2e/scripts/install-smoke.sh @@ -4,6 +4,7 @@ set -euo pipefail MODE="${1:-}" NAMESPACE="gitops-reverser" HELM_CHART_SOURCE="${HELM_CHART_SOURCE:-charts/gitops-reverser}" +WAIT_TIMEOUT="${WAIT_TIMEOUT:-60s}" if [[ -z "${MODE}" ]]; then echo "usage: $0 " @@ -23,15 +24,52 @@ install_manifest() { kubectl apply -f dist/install.yaml } +print_debug_info() { + echo + echo "Install smoke test diagnostics (${MODE})" + echo "Namespace: ${NAMESPACE}" + echo "Deployment status:" + kubectl -n "${NAMESPACE}" get deployment gitops-reverser -o wide || true + echo + echo "Deployment describe:" + kubectl -n "${NAMESPACE}" describe deployment gitops-reverser || true + echo + echo "Pods:" + kubectl -n "${NAMESPACE}" get pods -o wide || true + echo + echo "Controller-manager pod describe:" + kubectl -n "${NAMESPACE}" describe pod -l control-plane=controller-manager || true + echo + echo "Controller-manager logs (last 200 lines):" + kubectl -n "${NAMESPACE}" logs -l control-plane=controller-manager --tail=200 --all-containers=true || true + echo + echo "Recent namespace events:" + kubectl -n "${NAMESPACE}" get events --sort-by=.metadata.creationTimestamp | tail -n 50 || true +} + +run_or_debug() { + local description="$1" + shift + echo "${description}" + if ! 
"$@"; then + echo "FAILED: ${description}" >&2 + print_debug_info + return 1 + fi +} + verify_installation() { - echo "Waiting for deployment rollout" - kubectl -n "${NAMESPACE}" rollout status deployment/gitops-reverser --timeout=30s + run_or_debug \ + "Waiting for deployment rollout (timeout=${WAIT_TIMEOUT})" \ + kubectl -n "${NAMESPACE}" rollout status deployment/gitops-reverser --timeout="${WAIT_TIMEOUT}" - echo "Checking deployment availability" - kubectl -n "${NAMESPACE}" wait --for=condition=available deployment/gitops-reverser --timeout=30s + run_or_debug \ + "Checking deployment availability (timeout=${WAIT_TIMEOUT})" \ + kubectl -n "${NAMESPACE}" wait --for=condition=available deployment/gitops-reverser --timeout="${WAIT_TIMEOUT}" - echo "Checking pod readiness" - kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l control-plane=controller-manager --timeout=30s + run_or_debug \ + "Checking pod readiness (timeout=${WAIT_TIMEOUT})" \ + kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l control-plane=controller-manager --timeout="${WAIT_TIMEOUT}" echo "Checking CRDs" kubectl get crd \ From 9dc1e456716c8f81bc8e0d2d95f6a2161488d514 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 16:30:35 +0000 Subject: [PATCH 22/32] ci: More alignment between local builds and remote builds --- .github/workflows/ci.yml | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 6afca514..e2ec67b7 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -13,6 +13,8 @@ env: IMAGE_NAME: configbutler/gitops-reverser IMAGE_TAG: ci-${{ github.sha }} CHART_REGISTRY: ghcr.io/configbutler/charts + REPO_NAME: ${{ github.event.repository.name }} + CI_WORKDIR: /workspaces/${{ github.event.repository.name }} permissions: contents: write @@ -289,22 +291,21 @@ jobs: run: | docker run --rm \ --network host \ - -v ${{ github.workspace }}:/workspaces \ - 
-v $HOME/.kube:/root/.kube \ - -w /workspaces \ + -v "${GITHUB_WORKSPACE}:${{ env.CI_WORKDIR }}" \ + -v "$HOME/.kube:/root/.kube" \ + -w "${{ env.CI_WORKDIR }}" \ -e PROJECT_IMAGE=${{ env.PROJECT_IMAGE }} \ -e KIND_CLUSTER=${{ env.KIND_CLUSTER }} \ ${{ env.CI_CONTAINER }} \ bash -c " - git config --global --add safe.directory /workspaces - test -n \"\$PROJECT_IMAGE\" - PROJECT_IMAGE=\"\$PROJECT_IMAGE\" KIND_CLUSTER=\"\$KIND_CLUSTER\" make test-e2e + git config --global --add safe.directory ${{ env.CI_WORKDIR }} + make test-e2e " e2e-install-smoke: name: E2E Install Smoke (${{ matrix.scenario }}) runs-on: ubuntu-latest - needs: [build-ci-container, docker-build, lint-helm] + needs: [lint-helm] strategy: matrix: scenario: [helm, manifest] @@ -355,15 +356,15 @@ jobs: TARGET="test-e2e-install-${{ matrix.scenario }}" docker run --rm \ --network host \ - -v ${{ github.workspace }}:/workspaces \ - -v $HOME/.kube:/root/.kube \ - -w /workspaces \ + -v "${GITHUB_WORKSPACE}:${{ env.CI_WORKDIR }}" \ + -v "$HOME/.kube:/root/.kube" \ + -w "${{ env.CI_WORKDIR }}" \ -e PROJECT_IMAGE=${{ env.PROJECT_IMAGE }} \ -e KIND_CLUSTER=${{ env.KIND_CLUSTER }} \ -e HELM_CHART_SOURCE="./gitops-reverser.tgz" \ ${{ env.CI_CONTAINER }} \ bash -c " - git config --global --add safe.directory /workspaces + git config --global --add safe.directory ${{ env.CI_WORKDIR }} make ${TARGET} " From ab663dde774c7e19a98de8a9730f9bb764bf77bb Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 16:40:48 +0000 Subject: [PATCH 23/32] ci: Hopefully fixing more issues now --- .devcontainer/devcontainer.json | 2 +- .devcontainer/post-create.sh | 19 +++++++++++++++++++ .github/workflows/ci.yml | 2 +- 3 files changed, 21 insertions(+), 2 deletions(-) create mode 100644 .devcontainer/post-create.sh diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 2f4b1af3..0a6cf4df 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -64,7 +64,7 @@ ] } 
}, - "postCreateCommand": "sudo chmod 666 /var/run/docker.sock || true && docker network create -d=bridge --subnet=172.19.0.0/24 kind || true && sudo chown -R vscode:vscode ${containerWorkspaceFolder} || true", + "postCreateCommand": "bash .devcontainer/post-create.sh", "remoteUser": "vscode", "mounts": [ "source=gomodcache,target=/go/pkg/mod,type=volume", diff --git a/.devcontainer/post-create.sh b/.devcontainer/post-create.sh new file mode 100644 index 00000000..7c486d0a --- /dev/null +++ b/.devcontainer/post-create.sh @@ -0,0 +1,19 @@ +#!/usr/bin/env bash + +set -euo pipefail + +# Keep docker socket usable in the devcontainer (best-effort) +sudo chmod 666 /var/run/docker.sock || true + +# Ensure kind network exists (best-effort) +docker network create -d=bridge --subnet=172.19.0.0/24 kind || true + +# Ensure Go-related caches exist and are writable by vscode +sudo mkdir -p \ + /home/vscode/.cache/go-build \ + /home/vscode/.cache/goimports \ + /home/vscode/.cache/golangci-lint + +# Fix ownership for workspace and cache roots used by tooling +sudo chown -R vscode:vscode "${containerWorkspaceFolder}" /home/vscode/.cache || true + diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index e2ec67b7..fda9ced4 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -305,7 +305,7 @@ jobs: e2e-install-smoke: name: E2E Install Smoke (${{ matrix.scenario }}) runs-on: ubuntu-latest - needs: [lint-helm] + needs: [build-ci-container, docker-build, lint-helm] strategy: matrix: scenario: [helm, manifest] From 37a11ffd215b6c992ae825adc0b27d9f094361d1 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 16:51:04 +0000 Subject: [PATCH 24/32] ci: Let's see if we can now rebuild without errors --- .devcontainer/devcontainer.json | 5 +++-- .devcontainer/post-create.sh | 28 ++++++++++++++++++++++++++-- 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 
0a6cf4df..0b87d945 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -60,15 +60,16 @@ "golang.go", "ms-kubernetes-tools.vscode-kubernetes-tools", "ms-azuretools.vscode-docker", - "kilocode.kilo-code" + "openai.chatgpt" ] } }, - "postCreateCommand": "bash .devcontainer/post-create.sh", + "postCreateCommand": "bash .devcontainer/post-create.sh '${containerWorkspaceFolder}'", "remoteUser": "vscode", "mounts": [ "source=gomodcache,target=/go/pkg/mod,type=volume", "source=gobuildcache,target=/home/vscode/.cache/go-build,type=volume", + "source=${localEnv:HOME}${localEnv:USERPROFILE}/.gitconfig,target=/home/vscode/.gitconfig,type=bind,consistency=cached", "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind" ], "containerEnv": { diff --git a/.devcontainer/post-create.sh b/.devcontainer/post-create.sh index 7c486d0a..acfad5a3 100644 --- a/.devcontainer/post-create.sh +++ b/.devcontainer/post-create.sh @@ -2,18 +2,42 @@ set -euo pipefail +log() { + echo "[post-create] $*" +} + # Keep docker socket usable in the devcontainer (best-effort) +log "Setting permissions on /var/run/docker.sock (best-effort)" sudo chmod 666 /var/run/docker.sock || true +# Resolve workspace path in a way that works both inside and outside +# VS Code-specific shell variable injection. +workspace_dir="${1:-${containerWorkspaceFolder:-${WORKSPACE_FOLDER:-$(pwd)}}}" +log "Using workspace directory: ${workspace_dir}" + # Ensure kind network exists (best-effort) -docker network create -d=bridge --subnet=172.19.0.0/24 kind || true +log "Checking docker network 'kind'" +if ! 
docker network inspect kind >/dev/null 2>&1; then + log "Creating docker network 'kind'" + docker network create -d=bridge --subnet=172.19.0.0/24 kind >/dev/null 2>&1 || true +else + log "Docker network 'kind' already exists" +fi # Ensure Go-related caches exist and are writable by vscode +log "Ensuring Go cache directories exist" sudo mkdir -p \ /home/vscode/.cache/go-build \ /home/vscode/.cache/goimports \ /home/vscode/.cache/golangci-lint # Fix ownership for workspace and cache roots used by tooling -sudo chown -R vscode:vscode "${containerWorkspaceFolder}" /home/vscode/.cache || true +if [ -d "${workspace_dir}" ]; then + log "Fixing ownership for workspace and cache directories" + sudo chown -R vscode:vscode "${workspace_dir}" /home/vscode/.cache || true +else + log "Workspace directory not found; fixing ownership for cache only" + sudo chown -R vscode:vscode /home/vscode/.cache || true +fi +log "post-create completed" From 0db7d6ff9e154a6419cd78488ee32849b11bf05e Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 17:05:27 +0000 Subject: [PATCH 25/32] ci: Rabbit hole --- .devcontainer/devcontainer.json | 1 + .devcontainer/post-create.sh | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 0b87d945..7bb1382e 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -70,6 +70,7 @@ "source=gomodcache,target=/go/pkg/mod,type=volume", "source=gobuildcache,target=/home/vscode/.cache/go-build,type=volume", "source=${localEnv:HOME}${localEnv:USERPROFILE}/.gitconfig,target=/home/vscode/.gitconfig,type=bind,consistency=cached", + "source=codexconfig,target=/home/vscode/.codex,type=volume", "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind" ], "containerEnv": { diff --git a/.devcontainer/post-create.sh b/.devcontainer/post-create.sh index acfad5a3..2ce1ef0e 100644 --- a/.devcontainer/post-create.sh +++
b/.devcontainer/post-create.sh @@ -34,10 +34,10 @@ sudo mkdir -p \ # Fix ownership for workspace and cache roots used by tooling if [ -d "${workspace_dir}" ]; then log "Fixing ownership for workspace and cache directories" - sudo chown -R vscode:vscode "${workspace_dir}" /home/vscode/.cache || true + sudo chown -R vscode:vscode "${workspace_dir}" /home/vscode || true else log "Workspace directory not found; fixing ownership for cache only" - sudo chown -R vscode:vscode /home/vscode/.cache || true + sudo chown -R vscode:vscode /home/vscode || true fi log "post-create completed" From 8785f027f6b196834d7194d144a1ec4cf216d727 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Thu, 12 Feb 2026 18:57:14 +0000 Subject: [PATCH 26/32] ci: It's nice if we can simplify this --- .devcontainer/Dockerfile | 2 +- .../SETUP_CLUSTER_TROUBLESHOOTING.md | 72 +++++++++++++++++++ .devcontainer/devcontainer.json | 4 +- .devcontainer/post-create.sh | 4 -- .github/workflows/ci.yml | 8 +-- test/e2e/kind/cluster-template.yaml | 4 +- 6 files changed, 81 insertions(+), 13 deletions(-) create mode 100644 .devcontainer/SETUP_CLUSTER_TROUBLESHOOTING.md diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile index 44ba913d..736e77c2 100644 --- a/.devcontainer/Dockerfile +++ b/.devcontainer/Dockerfile @@ -138,4 +138,4 @@ USER vscode ENV DEBIAN_FRONTEND=dialog # Default command -CMD ["/bin/bash"] \ No newline at end of file +CMD ["/bin/bash"] diff --git a/.devcontainer/SETUP_CLUSTER_TROUBLESHOOTING.md b/.devcontainer/SETUP_CLUSTER_TROUBLESHOOTING.md new file mode 100644 index 00000000..84977e37 --- /dev/null +++ b/.devcontainer/SETUP_CLUSTER_TROUBLESHOOTING.md @@ -0,0 +1,72 @@ +# Troubleshooting `make setup-cluster` in DevContainer + +## Symptom + +`make setup-cluster` fails and Kind waits for the control-plane API server, with logs like: + +``` +Get "https://172.19.0.2:6443/livez?timeout=10s": dial tcp 172.19.0.2:6443: connect: connection refused +``` + +## Root cause (current setup) + 
+`test/e2e/kind/start-cluster.sh` generates `test/e2e/kind/cluster.ignore.yaml` from `HOST_PROJECT_PATH`.
+
+In the current devcontainer config, `HOST_PROJECT_PATH` is set from `${localWorkspaceFolder}`.
+That produced:
+
+```
+hostPath: /home/simon/git/gitops-reverser2/test/e2e/kind/audit
+```
+
+But that mounted directory, while it exists in the container, is empty; the real audit files are under:
+
+```
+/workspaces/gitops-reverser2/test/e2e/kind/audit
+```
+
+Because the mount source is wrong (and therefore empty), kube-apiserver cannot read:
+
+- `/etc/kubernetes/audit/policy.yaml`
+- `/etc/kubernetes/audit/webhook-config.yaml`
+
+kube-apiserver then fails to start, and Kind reports the API server connection refused on `:6443`.
+
+## Why this happens
+
+The path strategy differs by Docker mode:
+
+- Host Docker socket mode: the daemon needs host-visible paths.
+- Docker-in-Docker mode: the daemon needs container-visible paths.
+
+Your current config mixes both modes and their path assumptions, so Kind mount path resolution is inconsistent.
+
+## Fix options
+
+1. Use Docker-in-Docker only (recommended)
+- Remove the host socket mount from `.devcontainer/devcontainer.json`.
+- Set `HOST_PROJECT_PATH` to the container workspace path (for example `/workspaces/${localWorkspaceFolderBasename}`).
+
+2. Use the host Docker socket only
+- Remove the `docker-in-docker` feature.
+- Keep `HOST_PROJECT_PATH` as the host path.
+
+## Quick verification
+
+Before running `make setup-cluster`, verify that the generated config points to a path that actually contains the audit files:
+
+```bash
+cat test/e2e/kind/cluster.ignore.yaml
+ls -la test/e2e/kind/audit
+```
+
+Expected: `policy.yaml` and `webhook-config.yaml` are present.
+ +## Immediate workaround + +Run setup with a container-visible path explicitly: + +```bash +HOST_PROJECT_PATH=/workspaces/$(basename "$PWD") make setup-cluster +``` + diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 7bb1382e..432fb720 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -70,11 +70,11 @@ "source=gomodcache,target=/go/pkg/mod,type=volume", "source=gobuildcache,target=/home/vscode/.cache/go-build,type=volume", "source=${localEnv:HOME}${localEnv:USERPROFILE}/.gitconfig,target=/home/vscode/.gitconfig,type=bind,consistency=cached", - "source=codexconfig,target=/home/vscode/.codex,type=volume", - "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind" + "source=codexconfig,target=/home/vscode/.codex,type=volume" ], "containerEnv": { "HOST_PROJECT_PATH": "${localWorkspaceFolder}", + "PROJECT_PATH": "/workspaces/${localWorkspaceFolderBasename}", "DOCKER_API_VERSION": "1.44" } } diff --git a/.devcontainer/post-create.sh b/.devcontainer/post-create.sh index 2ce1ef0e..09993b65 100644 --- a/.devcontainer/post-create.sh +++ b/.devcontainer/post-create.sh @@ -6,10 +6,6 @@ log() { echo "[post-create] $*" } -# Keep docker socket usable in the devcontainer (best-effort) -log "Setting permissions on /var/run/docker.sock (best-effort)" -sudo chmod 666 /var/run/docker.sock || true - # Resolve workspace path in a way that works both inside and outside # VS Code-specific shell variable injection. 
workspace_dir="${1:-${containerWorkspaceFolder:-${WORKSPACE_FOLDER:-$(pwd)}}}" diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index fda9ced4..b0593610 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -262,9 +262,9 @@ jobs: - name: Generate Kind cluster config from template env: - HOST_PROJECT_PATH: ${{ github.workspace }} # docker in docker, so we need to pass the host path + PROJECT_PATH: ${{ github.workspace }} run: | - echo "🔧 Generating cluster config with HOST_PROJECT_PATH=${HOST_PROJECT_PATH}" + echo "🔧 Generating cluster config with PROJECT_PATH=${PROJECT_PATH}" envsubst < test/e2e/kind/cluster-template.yaml > test/e2e/kind/cluster.yaml echo "✅ Generated configuration:" cat test/e2e/kind/cluster.yaml @@ -326,9 +326,9 @@ jobs: - name: Generate Kind cluster config from template env: - HOST_PROJECT_PATH: ${{ github.workspace }} + PROJECT_PATH: ${{ github.workspace }} run: | - echo "🔧 Generating cluster config with HOST_PROJECT_PATH=${HOST_PROJECT_PATH}" + echo "🔧 Generating cluster config with PROJECT_PATH=${PROJECT_PATH}" envsubst < test/e2e/kind/cluster-template.yaml > test/e2e/kind/cluster.yaml echo "✅ Generated configuration:" cat test/e2e/kind/cluster.yaml diff --git a/test/e2e/kind/cluster-template.yaml b/test/e2e/kind/cluster-template.yaml index 13154536..cb6be7b4 100644 --- a/test/e2e/kind/cluster-template.yaml +++ b/test/e2e/kind/cluster-template.yaml @@ -5,9 +5,9 @@ kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 nodes: - role: control-plane - # Mount the entire audit directory using HOST path for Docker-in-Docker + # Mount the entire audit directory using $PROJECT_PATH path for Docker-in-Docker (or $HOST_PROJECT_PATH if you use mount to the docker socket) extraMounts: - - hostPath: ${HOST_PROJECT_PATH}/test/e2e/kind/audit + - hostPath: ${PROJECT_PATH}/test/e2e/kind/audit containerPath: /etc/kubernetes/audit readOnly: false From 39cf2a72ba262c2fcbe269e938db0ab92042a75d Mon Sep 17 00:00:00 2001 From: Simon 
Koudijs
Date: Fri, 13 Feb 2026 08:07:24 +0000
Subject: [PATCH 27/32] ci: This is what DinD would look like (but it fails with networking on my machine).

---
 .devcontainer/KUBECTL_TLS_DEBUG_REPORT.md | 151 ++++++++++++++++++++++
 .devcontainer/devcontainer.json | 8 +-
 .devcontainer/post-create.sh | 9 --
 test/e2e/kind/cluster-template.yaml | 3 +
 4 files changed, 160 insertions(+), 11 deletions(-)
 create mode 100644 .devcontainer/KUBECTL_TLS_DEBUG_REPORT.md

diff --git a/.devcontainer/KUBECTL_TLS_DEBUG_REPORT.md b/.devcontainer/KUBECTL_TLS_DEBUG_REPORT.md
new file mode 100644
index 00000000..6d73edf3
--- /dev/null
+++ b/.devcontainer/KUBECTL_TLS_DEBUG_REPORT.md
@@ -0,0 +1,151 @@
+# kubectl TLS Debug Report (DevContainer)
+
+Date: 2026-02-12
+Scope: Debug `kubectl` failures inside the VS Code devcontainer.
+
+## Symptom
+
+Inside devcontainer:
+
+```bash
+kubectl get nodes
+```
+
+Output:
+
+```text
+tls: failed to verify certificate: x509: certificate signed by unknown authority
+```
+
+## Commands Run and Results
+
+### 1) Check kubectl context/config
+
+```bash
+kubectl config current-context
+kubectl config get-contexts
+kubectl config view --minify --raw
+```
+
+Result:
+- Current context was `kind-gitops-reverser-test-e2e`
+- Cluster server was `https://127.0.0.1:44431`
+
+---
+
+### 2) Attempt kubeconfig reset + re-export
+
+```bash
+kubectl config delete-context kind-gitops-reverser-test-e2e || true
+kubectl config delete-cluster kind-gitops-reverser-test-e2e || true
+kubectl config delete-user kind-gitops-reverser-test-e2e || true
+kind export kubeconfig --name gitops-reverser-test-e2e
+kubectl get nodes
+```
+
+Result:
+- Context/cluster/user entries were deleted and re-created successfully.
+- `kubectl get nodes` still failed with TLS verification error against `127.0.0.1:44431`.
+ +--- + +### 3) Compare kubeconfig CA vs Kind control-plane CA + +```bash +kubectl config view --raw -o jsonpath='{.clusters[?(@.name=="kind-gitops-reverser-test-e2e")].cluster.certificate-authority-data}' \ + | base64 -d | openssl x509 -noout -fingerprint -sha256 -subject -issuer -dates +``` + +```bash +docker exec gitops-reverser-test-e2e-control-plane \ + openssl x509 -in /etc/kubernetes/pki/ca.crt -noout -fingerprint -sha256 -subject -issuer -dates +``` + +Result: +- Both matched: + - `sha256 Fingerprint=E1:1E:2C:CC:76:B7:6E:A7:7F:A7:F2:EB:D4:54:3D:E9:29:7C:26:EA:69:A8:A9:58:12:86:BF:77:39:D7:67:36` + - `subject=CN = kubernetes` + +--- + +### 4) Inspect certificate actually served on localhost endpoint + +```bash +openssl s_client -connect 127.0.0.1:44431 -showcerts /dev/null \ + | awk '/-----BEGIN CERTIFICATE-----/{f=1} f{print} /-----END CERTIFICATE-----/{exit}' \ + | openssl x509 -noout -fingerprint -sha256 -subject -issuer -dates +``` + +Result: +- Served cert fingerprint was different: + - `sha256 Fingerprint=1C:66:DF:93:B3:D1:AE:57:CF:28:A6:31:48:77:84:9E:A3:A6:6D:E7:1F:E6:0C:15:54:F5:EB:72:55:DB:F7:4C` + - `subject=CN = kube-apiserver` + - `issuer=CN = kubernetes` + +Interpretation: +- kubeconfig trusts CA `E1:...:67:36` +- endpoint `127.0.0.1:44431` presented a chain anchored differently in this environment +- explains x509 verification failure + +--- + +### 5) Verify target container and published port + +```bash +docker ps -a --format 'table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.Ports}}' +docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}} {{json .NetworkSettings.Ports}}' gitops-reverser-test-e2e-control-plane +``` + +Result: +- Container: `gitops-reverser-test-e2e-control-plane` +- Port mapping showed: `127.0.0.1:44431->6443/tcp` +- Container IP: `172.19.0.2` + +--- + +### 6) Check cluster health from inside control-plane container + +```bash +docker exec gitops-reverser-test-e2e-control-plane \ + kubectl --kubeconfig 
/etc/kubernetes/admin.conf get nodes +``` + +Result: + +```text +NAME STATUS ROLES AGE VERSION +gitops-reverser-test-e2e-control-plane Ready control-plane ... v1.35.0 +``` + +Interpretation: +- Kind cluster itself is healthy. +- Failure is specific to devcontainer-local endpoint/trust path for `127.0.0.1:44431`. + +--- + +### 7) Additional connectivity test + +```bash +kubectl --insecure-skip-tls-verify=true get nodes -o wide +``` + +Result: + +```text +error: You must be logged in to the server (the server has asked for the client to provide credentials) +``` + +Interpretation: +- Endpoint is reachable. +- Credentials/cert trust path does not match what current kubeconfig expects. + +## Conclusion + +Inside the devcontainer, `kubectl` points to `https://127.0.0.1:44431`, but that endpoint presents a cert chain that does not validate with the CA currently in kubeconfig for this Kind cluster. +The cluster is healthy; issue is endpoint/cert mismatch in the devcontainer runtime networking path. + +## User Observation (Host CLI) + +From host machine CLI, access works and is served on a different port. +This is consistent with endpoint mapping differences between host runtime and devcontainer runtime. 
+ diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 432fb720..9b12a0ea 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -20,12 +20,12 @@ "ghcr.io/devcontainers/features/git:1": {} }, "runArgs": [ - "--network=host", "--group-add=docker" ], "forwardPorts": [ 13000, - 19090 + 19090, + 6443 ], "portsAttributes": { "13000": { @@ -35,6 +35,10 @@ "19090": { "label": "Prometheus", "onAutoForward": "notify" + }, + "6443": { + "label": "Kind API", + "onAutoForward": "notify" } }, "customizations": { diff --git a/.devcontainer/post-create.sh b/.devcontainer/post-create.sh index 09993b65..84f83146 100644 --- a/.devcontainer/post-create.sh +++ b/.devcontainer/post-create.sh @@ -11,15 +11,6 @@ log() { workspace_dir="${1:-${containerWorkspaceFolder:-${WORKSPACE_FOLDER:-$(pwd)}}}" log "Using workspace directory: ${workspace_dir}" -# Ensure kind network exists (best-effort) -log "Checking docker network 'kind'" -if ! docker network inspect kind >/dev/null 2>&1; then - log "Creating docker network 'kind'" - docker network create -d=bridge --subnet=172.19.0.0/24 kind >/dev/null 2>&1 || true -else - log "Docker network 'kind' already exists" -fi - # Ensure Go-related caches exist and are writable by vscode log "Ensuring Go cache directories exist" sudo mkdir -p \ diff --git a/test/e2e/kind/cluster-template.yaml b/test/e2e/kind/cluster-template.yaml index cb6be7b4..a8209ce9 100644 --- a/test/e2e/kind/cluster-template.yaml +++ b/test/e2e/kind/cluster-template.yaml @@ -3,6 +3,9 @@ # with the actual host path by the start-cluster.sh script kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 +networking: + apiServerAddress: "0.0.0.0" + apiServerPort: 6443 nodes: - role: control-plane # Mount the entire audit directory using $PROJECT_PATH path for Docker-in-Docker (or $HOST_PROJECT_PATH if you use mount to the docker socket) From 20dace6c6c654e239838850a3818f6033d2df712 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: 
Fri, 13 Feb 2026 09:09:34 +0000
Subject: [PATCH 28/32] ci: Back to docker-outside-of-docker

Let's just use docker on the host, but I did manage to avoid network=host,
which makes it all a bit more isolated.
---
 .devcontainer/FINDINGS.md | 135 -------------------
 .devcontainer/KUBECTL_TLS_DEBUG_REPORT.md | 151 ---------------------
 .devcontainer/devcontainer.json | 21 +--
 .devcontainer/post-create.sh | 12 ++
 docs/ci/CI_NON_ROOT_USER_ANALYSIS.md | 12 +-
 docs/ci/FINDINGS.md | 107 +++++++++++++++
 docs/ci/GIT_SAFE_DIRECTORY_EXPLAINED.md | 26 ++--
 docs/ci/GO_MODULE_PERMISSIONS.md | 6 +-
 docs/ci/KUBECTL_TLS_DEBUG_REPORT.md | 81 +++++++++++
 docs/ci/WINDOWS_DEVCONTAINER_SETUP.md | 156 ++++++----------------
 test/e2e/E2E_DEBUGGING.md | 4 +-
 test/e2e/kind/cluster-template.yaml | 5 +-
 test/e2e/kind/start-cluster.sh | 14 ++
 13 files changed, 288 insertions(+), 442 deletions(-)
 delete mode 100644 .devcontainer/FINDINGS.md
 delete mode 100644 .devcontainer/KUBECTL_TLS_DEBUG_REPORT.md
 create mode 100644 docs/ci/FINDINGS.md
 create mode 100644 docs/ci/KUBECTL_TLS_DEBUG_REPORT.md

diff --git a/.devcontainer/FINDINGS.md b/.devcontainer/FINDINGS.md
deleted file mode 100644
index 21f7d98a..00000000
--- a/.devcontainer/FINDINGS.md
+++ /dev/null
@@ -1,135 +0,0 @@
-## Findings: `make lint`, cache behavior, and workspace paths
-
-### 1) `make lint` did not use a warm module cache in this run
-
-`make lint` executes:
-
-```make
-lint:
-	$(GOLANGCI_LINT) run
-```
-
-There is no `GOMODCACHE` override in `Makefile`, so cache behavior depends on the runtime environment defaults.
-
-Evidence collected during debugging:
-
-- `go env` reported:
-  - `GOMODCACHE=/go/pkg/mod`
-  - `GOCACHE=/home/vscode/.cache/go-build`
-- Running `go list ./...` showed many `go: downloading ...` lines, which indicates cache misses (or unavailable cache entries) for current dependencies.
-- In restricted execution, writes/access under `/go/pkg/mod/cache/...` were blocked, and module fetches to `proxy.golang.org` were also blocked, which prevented normal dependency resolution. -- This produced a misleading top-level `golangci-lint` error (`no go files to analyze`) even though Go files exist. - -Conclusion: in this environment, `make lint` did not have an effectively usable warm module cache path for dependency resolution. - -### 2) `/workspace` is valid in a devcontainer, but not always the active workspace mount - -Observed runtime paths: - -- Active repo path: `/workspaces/gitops-reverser2` -- `/workspace` exists, but only contains files copied during image build steps. - -Why this happens: - -- `Dockerfile` build steps create image-layer content (here under `/workspace`). -- VS Code Dev Containers then bind-mount your real host repo into the running container (commonly under `/workspaces/` unless overridden). - -Implication in this repo: - -- `.devcontainer/devcontainer.json` `postCreateCommand` currently runs `chown` on `/workspace`. -- That command is valid, but it does not affect the mounted repo at `/workspaces/gitops-reverser2` in this session. - -### 3) Main cause of the lint failure seen in Codex - -Primary cause was execution constraints in this Codex session (restricted network and restricted writable roots), not an intrinsic Go/lint config break in the repository. - -When lint was run with elevated permissions (normal module/network access), it completed and reported actionable issues. - -### 4) Best-practice model: bind mounts for source, volumes for caches - -Use this mental model: - -- Source code: bind mount (live, editable, synced with host filesystem). -- Tool and dependency caches: Docker volumes (persistent across container rebuilds, independent of source tree). - -For Go specifically: - -- `GOMODCACHE` should map to `/go/pkg/mod` (module download cache). 
-- `GOCACHE` should map to `/home/vscode/.cache/go-build` (compiled package/build cache). - -### 5) Recommended improvements for this repo - -1. Make workspace targeting explicit (recommended to remove ambiguity). - -```json -{ - "workspaceMount": "source=${localWorkspaceFolder},target=/workspaces/${localWorkspaceFolderBasename},type=bind", - "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}" -} -``` - -2. Avoid hardcoding `/workspace` in post-create logic. - -Use `${containerWorkspaceFolder}` or relative paths: - -```json -{ - "postCreateCommand": "sudo chown -R vscode:vscode ${containerWorkspaceFolder} || true" -} -``` - -3. Persist Go caches via named volumes. - -```json -{ - "mounts": [ - "source=gomodcache,target=/go/pkg/mod,type=volume", - "source=gobuildcache,target=/home/vscode/.cache/go-build,type=volume", - "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind" - ] -} -``` - -Notes: - -- The cache targets above are intentionally mapped to Go defaults in this container. -- Earlier advice that swapped these two cache targets is incorrect. - -### 6) Clean local-to-container path strategy - -Recommended baseline: - -- Host repo path: keep your normal path, for example `~/git/gitops-reverser2`. -- Container repo path: standardize on `/workspaces/gitops-reverser2`. -- Do not depend on `/workspace` for active development files. - -This gives: - -- Normal local Git workflow on host. -- Predictable in-container path for scripts/tools. -- Fewer permission/path surprises when onboarding or troubleshooting. - -### 7) Shared Dockerfile for devcontainer and CI: benefits and constraints - -Current setup uses `.devcontainer/Dockerfile` in both local Dev Container and GitHub Actions CI (`.github/workflows/ci.yml`), which is beneficial: - -- Single source of truth for tool versions. -- Less drift between local and CI behavior. - -But it requires discipline: - -- Keep stage intent clear (`ci` stage for CI runtime, `dev` stage for local extras). 
-- Avoid dev-only assumptions in shared base stages (for example hardcoded workspace paths). -- Keep runtime-mount concerns in `devcontainer.json` (workspace mount, post-create behavior), not in CI-oriented image logic. - -Practical rule: - -- Image should provide tools. -- `devcontainer.json` should define developer runtime ergonomics. -- CI workflow should choose the appropriate image stage and avoid relying on local-mount semantics. - -### 8) Practical balance (local machine vs container) - -- Keep source code on the host via bind mount for normal editor/Git workflow. -- Keep heavy generated caches and dependencies in container volumes for speed and reproducibility. -- Keep absolute paths out of scripts unless they are the canonical runtime paths for this specific devcontainer configuration. diff --git a/.devcontainer/KUBECTL_TLS_DEBUG_REPORT.md b/.devcontainer/KUBECTL_TLS_DEBUG_REPORT.md deleted file mode 100644 index 6d73edf3..00000000 --- a/.devcontainer/KUBECTL_TLS_DEBUG_REPORT.md +++ /dev/null @@ -1,151 +0,0 @@ -# kubectl TLS Debug Report (DevContainer) - -Date: 2026-02-12 -Scope: Debug `kubectl` failures inside the VS Code devcontainer. 
- -## Symptom - -Inside devcontainer: - -```bash -kubectl get nodes -``` - -Output: - -```text -tls: failed to verify certificate: x509: certificate signed by unknown authority -``` - -## Commands Run and Results - -### 1) Check kubectl context/config - -```bash -kubectl config current-context -kubectl config get-contexts -kubectl config view --minify --raw -``` - -Result: -- Current context was `kind-gitops-reverser-test-e2e` -- Cluster server was `https://127.0.0.1:44431` - ---- - -### 2) Attempt kubeconfig reset + re-export - -```bash -kubectl config delete-context kind-gitops-reverser-test-e2e || true -kubectl config delete-cluster kind-gitops-reverser-test-e2e || true -kubectl config delete-user kind-gitops-reverser-test-e2e || true -kind export kubeconfig --name gitops-reverser-test-e2e -kubectl get nodes -``` - -Result: -- Context/cluster/user entries were deleted and re-created successfully. -- `kubectl get nodes` still failed with TLS verification error against `127.0.0.1:44431`. 
- ---- - -### 3) Compare kubeconfig CA vs Kind control-plane CA - -```bash -kubectl config view --raw -o jsonpath='{.clusters[?(@.name=="kind-gitops-reverser-test-e2e")].cluster.certificate-authority-data}' \ - | base64 -d | openssl x509 -noout -fingerprint -sha256 -subject -issuer -dates -``` - -```bash -docker exec gitops-reverser-test-e2e-control-plane \ - openssl x509 -in /etc/kubernetes/pki/ca.crt -noout -fingerprint -sha256 -subject -issuer -dates -``` - -Result: -- Both matched: - - `sha256 Fingerprint=E1:1E:2C:CC:76:B7:6E:A7:7F:A7:F2:EB:D4:54:3D:E9:29:7C:26:EA:69:A8:A9:58:12:86:BF:77:39:D7:67:36` - - `subject=CN = kubernetes` - ---- - -### 4) Inspect certificate actually served on localhost endpoint - -```bash -openssl s_client -connect 127.0.0.1:44431 -showcerts /dev/null \ - | awk '/-----BEGIN CERTIFICATE-----/{f=1} f{print} /-----END CERTIFICATE-----/{exit}' \ - | openssl x509 -noout -fingerprint -sha256 -subject -issuer -dates -``` - -Result: -- Served cert fingerprint was different: - - `sha256 Fingerprint=1C:66:DF:93:B3:D1:AE:57:CF:28:A6:31:48:77:84:9E:A3:A6:6D:E7:1F:E6:0C:15:54:F5:EB:72:55:DB:F7:4C` - - `subject=CN = kube-apiserver` - - `issuer=CN = kubernetes` - -Interpretation: -- kubeconfig trusts CA `E1:...:67:36` -- endpoint `127.0.0.1:44431` presented a chain anchored differently in this environment -- explains x509 verification failure - ---- - -### 5) Verify target container and published port - -```bash -docker ps -a --format 'table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.Ports}}' -docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}} {{json .NetworkSettings.Ports}}' gitops-reverser-test-e2e-control-plane -``` - -Result: -- Container: `gitops-reverser-test-e2e-control-plane` -- Port mapping showed: `127.0.0.1:44431->6443/tcp` -- Container IP: `172.19.0.2` - ---- - -### 6) Check cluster health from inside control-plane container - -```bash -docker exec gitops-reverser-test-e2e-control-plane \ - kubectl --kubeconfig 
/etc/kubernetes/admin.conf get nodes -``` - -Result: - -```text -NAME STATUS ROLES AGE VERSION -gitops-reverser-test-e2e-control-plane Ready control-plane ... v1.35.0 -``` - -Interpretation: -- Kind cluster itself is healthy. -- Failure is specific to devcontainer-local endpoint/trust path for `127.0.0.1:44431`. - ---- - -### 7) Additional connectivity test - -```bash -kubectl --insecure-skip-tls-verify=true get nodes -o wide -``` - -Result: - -```text -error: You must be logged in to the server (the server has asked for the client to provide credentials) -``` - -Interpretation: -- Endpoint is reachable. -- Credentials/cert trust path does not match what current kubeconfig expects. - -## Conclusion - -Inside the devcontainer, `kubectl` points to `https://127.0.0.1:44431`, but that endpoint presents a cert chain that does not validate with the CA currently in kubeconfig for this Kind cluster. -The cluster is healthy; issue is endpoint/cert mismatch in the devcontainer runtime networking path. - -## User Observation (Host CLI) - -From host machine CLI, access works and is served on a different port. -This is consistent with endpoint mapping differences between host runtime and devcontainer runtime. 
- diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 9b12a0ea..570704a6 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -8,24 +8,21 @@ "workspaceMount": "source=${localWorkspaceFolder},target=/workspaces/${localWorkspaceFolderBasename},type=bind", "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}", "features": { + "ghcr.io/devcontainers/features/docker-outside-of-docker:1": {}, "ghcr.io/devcontainers/features/common-utils:2": { "userUid": "automatic", "userGid": "automatic", "username": "vscode" }, - "ghcr.io/devcontainers/features/docker-in-docker:2": { - "moby": false, - "dockerDashComposeVersion": "v2" - }, "ghcr.io/devcontainers/features/git:1": {} }, "runArgs": [ - "--group-add=docker" + "--group-add=docker", + "--add-host=host.docker.internal:host-gateway" ], "forwardPorts": [ 13000, - 19090, - 6443 + 19090 ], "portsAttributes": { "13000": { @@ -35,10 +32,6 @@ "19090": { "label": "Prometheus", "onAutoForward": "notify" - }, - "6443": { - "label": "Kind API", - "onAutoForward": "notify" } }, "customizations": { @@ -71,14 +64,14 @@ "postCreateCommand": "bash .devcontainer/post-create.sh '${containerWorkspaceFolder}'", "remoteUser": "vscode", "mounts": [ + "source=ghconfig,target=/home/vscode/.config/gh,type=volume", + "source=${localEnv:HOME}${localEnv:USERPROFILE}/.gitconfig,target=/home/vscode/.gitconfig-host,type=bind,readonly,consistency=cached", "source=gomodcache,target=/go/pkg/mod,type=volume", "source=gobuildcache,target=/home/vscode/.cache/go-build,type=volume", - "source=${localEnv:HOME}${localEnv:USERPROFILE}/.gitconfig,target=/home/vscode/.gitconfig,type=bind,consistency=cached", "source=codexconfig,target=/home/vscode/.codex,type=volume" ], "containerEnv": { "HOST_PROJECT_PATH": "${localWorkspaceFolder}", - "PROJECT_PATH": "/workspaces/${localWorkspaceFolderBasename}", - "DOCKER_API_VERSION": "1.44" + "PROJECT_PATH": "/workspaces/${localWorkspaceFolderBasename}" } 
} diff --git a/.devcontainer/post-create.sh b/.devcontainer/post-create.sh index 84f83146..7f30b16d 100644 --- a/.devcontainer/post-create.sh +++ b/.devcontainer/post-create.sh @@ -11,6 +11,18 @@ log() { workspace_dir="${1:-${containerWorkspaceFolder:-${WORKSPACE_FOLDER:-$(pwd)}}}" log "Using workspace directory: ${workspace_dir}" +# Keep ~/.gitconfig writable inside the container while still importing host settings. +if [ -f /home/vscode/.gitconfig-host ]; then + log "Configuring git to include /home/vscode/.gitconfig-host" + touch /home/vscode/.gitconfig + if git config --global --get-all include.path | grep -Fxq "/home/vscode/.gitconfig-host"; then + log "Host gitconfig include already present" + else + git config --global --add include.path /home/vscode/.gitconfig-host + log "Added host gitconfig include" + fi +fi + # Ensure Go-related caches exist and are writable by vscode log "Ensuring Go cache directories exist" sudo mkdir -p \ diff --git a/docs/ci/CI_NON_ROOT_USER_ANALYSIS.md b/docs/ci/CI_NON_ROOT_USER_ANALYSIS.md index b2476fad..39e1366d 100644 --- a/docs/ci/CI_NON_ROOT_USER_ANALYSIS.md +++ b/docs/ci/CI_NON_ROOT_USER_ANALYSIS.md @@ -61,7 +61,7 @@ ```bash # Current CI workflow mounts: -v $HOME/.kube:/root/.kube # ← Would need to change to non-root home --v ${{ github.workspace }}:/workspace # ← Ownership mismatches +-v ${{ github.workspace }}:/__w/... 
# ← Ownership mismatches ``` ❌ **GitHub Actions Checkout Complications** @@ -71,14 +71,14 @@ ❌ **Docker-in-Docker Challenges** ```yaml -# E2E tests use Docker socket ---network host --v $HOME/.kube:/root/.kube # ← Root path assumptions +# E2E tests use Docker access and kubeconfig paths +-v /var/run/docker.sock:/var/run/docker.sock +-v $HOME/.kube:/root/.kube # ← Root-home assumptions ``` ❌ **Cache and Artifact Permissions** - Go module cache (`/go/pkg/mod`) -- Build artifacts in `/workspace` +- Build artifacts in mounted workspace paths - GitHub Actions cache restoration - All would need careful permission management @@ -291,4 +291,4 @@ Switch to non-root CI only if: - [`GO_MODULE_PERMISSIONS.md`](GO_MODULE_PERMISSIONS.md) - How we solved dev container permissions - [`WINDOWS_DEVCONTAINER_SETUP.md`](WINDOWS_DEVCONTAINER_SETUP.md) - Windows-specific permission handling - [`.devcontainer/Dockerfile`](../.devcontainer/Dockerfile) - Current implementation -- [`.github/workflows/ci.yml`](../.github/workflows/ci.yml) - CI pipeline configuration \ No newline at end of file +- [`.github/workflows/ci.yml`](../.github/workflows/ci.yml) - CI pipeline configuration diff --git a/docs/ci/FINDINGS.md b/docs/ci/FINDINGS.md new file mode 100644 index 00000000..2df67b0c --- /dev/null +++ b/docs/ci/FINDINGS.md @@ -0,0 +1,107 @@ +## CI/Devcontainer Findings (Current Baseline) + +Last updated: 2026-02-13 + +This folder documents why the repository uses its current devcontainer and CI behavior, especially around Go caches, workspace paths, and Kind access from inside the container. + +### 1) Workspace path model + +Current devcontainer intentionally uses: + +- `workspaceMount`: `source=${localWorkspaceFolder},target=/workspaces/${localWorkspaceFolderBasename},type=bind` +- `workspaceFolder`: `/workspaces/${localWorkspaceFolderBasename}` + +Implications: + +- Active source tree is `/workspaces/`. 
+- `/workspace` may exist in image layers, but it is not the active bind mount for day-to-day development in this repo.
+
+### 2) Post-create ownership model
+
+`devcontainer.json` runs:
+
+```json
+"postCreateCommand": "bash .devcontainer/post-create.sh '${containerWorkspaceFolder}'"
+```
+
+The script resolves the workspace path dynamically and fixes ownership for:
+
+- the mounted workspace
+- `/home/vscode` cache areas used by tools
+
+This avoids hardcoded path assumptions and keeps Linux/macOS/Windows setups more consistent.
+
+### 3) Go cache persistence model
+
+The repository persists heavy Go caches using named Docker volumes:
+
+- `/go/pkg/mod` (`gomodcache`)
+- `/home/vscode/.cache/go-build` (`gobuildcache`)
+
+Why:
+
+- Faster rebuild/reopen cycles
+- Stable module/build caching independent of repo bind mount
+- Fewer permission regressions than putting caches in the workspace tree
+
+### 4) Kind + kubectl access model inside devcontainer
+
+The current working model is:
+
+- Devcontainer does **not** use `--network=host`
+- Devcontainer run args include:
+  - `--group-add=docker`
+  - `--add-host=host.docker.internal:host-gateway`
+- Kind cluster config sets:
+  - `networking.apiServerAddress: "0.0.0.0"`
+- `test/e2e/kind/start-cluster.sh` rewrites kubeconfig server endpoints from
+  `127.0.0.1|localhost|0.0.0.0` to `host.docker.internal` plus the published
+  API server port, and sets `tls-server-name=localhost`
+
+Why this is required:
+
+- If Docker publishes Kind API server on host loopback only (`127.0.0.1`), it is not reachable via `host.docker.internal` from the container.
+- Binding on `0.0.0.0` plus kubeconfig rewrite makes in-container `kubectl` stable without host networking.
+ +### 5) CI root vs non-root stance + +Current recommendation remains: + +- CI build containers can run as root (ephemeral build context) +- Production runtime must run non-root (already implemented) + +Rationale: + +- Keeps CI simpler and less fragile +- Avoids unnecessary permission workarounds +- Preserves security boundary at runtime where it matters most + +### 6) Git safe.directory note + +`safe.directory` in CI is a normal response to UID mismatch between checkout ownership and container process user. This is not, by itself, evidence that CI must be non-root. + +### 7) Practical verification checklist + +After devcontainer rebuild/reopen: + +```bash +# 1) Kind setup +make setup-cluster + +# 2) Confirm API publish bind (expected 0.0.0.0 or ::) +docker inspect gitops-reverser-test-e2e-control-plane --format '{{json .NetworkSettings.Ports}}' + +# 3) Confirm kubeconfig server rewrite +kubectl config view --minify | sed -n '/server:/p;/tls-server-name:/p' + +# 4) Confirm cluster access +kubectl get nodes +``` + +### 8) Related docs in this folder + +- `KUBECTL_TLS_DEBUG_REPORT.md` - incident timeline and final fix +- `GO_MODULE_PERMISSIONS.md` - why `/go` permissions are managed with shared group + ACLs +- `WINDOWS_DEVCONTAINER_SETUP.md` - Windows-specific mount behavior and expected differences +- `CI_NON_ROOT_USER_ANALYSIS.md` - tradeoffs for CI user model +- `GIT_SAFE_DIRECTORY_EXPLAINED.md` - why `safe.directory` is required in containerized CI diff --git a/docs/ci/GIT_SAFE_DIRECTORY_EXPLAINED.md b/docs/ci/GIT_SAFE_DIRECTORY_EXPLAINED.md index 45c58e43..4d6726fb 100644 --- a/docs/ci/GIT_SAFE_DIRECTORY_EXPLAINED.md +++ b/docs/ci/GIT_SAFE_DIRECTORY_EXPLAINED.md @@ -168,7 +168,7 @@ git config --global --add safe.directory '*' **Better:** Explicitly list trusted paths ```bash -git config --global --add safe.directory /workspace +git config --global --add safe.directory /workspaces/ git config --global --add safe.directory /__w/gitops-reverser/gitops-reverser ``` 
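The same bookkeeping can be tried in isolation (repository paths here are illustrative; `GIT_CONFIG_GLOBAL` points git at a throwaway file so the real `~/.gitconfig` stays untouched):

```bash
# Demo of explicit safe.directory entries against a scratch global config.
# The /workspaces and /__w paths below are placeholders, not real checkouts.
export GIT_CONFIG_GLOBAL="$(mktemp)"
git config --global --add safe.directory /workspaces/demo-repo
git config --global --add safe.directory /__w/demo/demo
git config --global --get-all safe.directory   # lists both trusted paths
```

Unsetting `GIT_CONFIG_GLOBAL` (or starting a new shell) restores the normal global config lookup.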
@@ -290,12 +290,14 @@ jobs: **devcontainer.json:** ```json { - "remoteUser": "root", - "postCreateCommand": "git config --global --add safe.directory /workspace" + "remoteUser": "vscode", + "postCreateCommand": "git config --global --add safe.directory ${containerWorkspaceFolder}" } ``` -**Why:** VS Code mounts workspace (owned by host user) into container (running as root) +**Why:** VS Code mounts host workspace into container and ownership can differ from the active user. + +In this repository, local devcontainer flows usually run as `vscode` and often do not need manual `safe.directory`. CI container jobs are the primary place where this setting is required. ### Example 3: Docker Compose Development @@ -305,10 +307,10 @@ services: dev: image: golang:1.25 volumes: - - .:/workspace # Host files → container + - .:/workspaces/ # Host files → container command: | sh -c " - git config --global --add safe.directory /workspace + git config --global --add safe.directory /workspaces/ make test " ``` @@ -333,11 +335,11 @@ services: ```bash # Test in container -docker run --rm -v $(pwd):/workspace golang:1.25 sh -c " - cd /workspace +docker run --rm -v $(pwd):/workspaces/ golang:1.25 sh -c " + cd /workspaces/ git status # Should fail - git config --global --add safe.directory /workspace + git config --global --add safe.directory /workspaces/ git status # Should work " ``` @@ -349,7 +351,7 @@ docker run --rm -v $(pwd):/workspace golang:1.25 sh -c " git config --global --get-all safe.directory # Output example: -/workspace +/workspaces/ /__w/gitops-reverser/gitops-reverser ``` @@ -357,7 +359,7 @@ git config --global --get-all safe.directory ```bash # Remove specific directory -git config --global --unset-all safe.directory /workspace +git config --global --unset-all safe.directory /workspaces/ # Remove all git config --global --remove-section safe @@ -418,4 +420,4 @@ To add an exception for this directory, call: This tells Git: "I trust this specific repository despite the 
UID mismatch, because I know it's safe in this ephemeral CI container environment." -**It's a pragmatic security trade-off that makes sense in containerized workflows!** \ No newline at end of file +**It's a pragmatic security trade-off that makes sense in containerized workflows!** diff --git a/docs/ci/GO_MODULE_PERMISSIONS.md b/docs/ci/GO_MODULE_PERMISSIONS.md index 600fe7db..7badcbcc 100644 --- a/docs/ci/GO_MODULE_PERMISSIONS.md +++ b/docs/ci/GO_MODULE_PERMISSIONS.md @@ -11,11 +11,11 @@ **If you're on Windows and experiencing permission issues with the workspace directory**, see [`WINDOWS_DEVCONTAINER_SETUP.md`](WINDOWS_DEVCONTAINER_SETUP.md) for Windows-specific guidance. -The ACL solution described in this document works perfectly for the `/go` directory (container filesystem) but **does not apply to the `/workspace` directory when mounted from Windows**. Windows filesystems don't support Linux ACLs, so a different approach is needed for the workspace. +The ACL solution described in this document works perfectly for the `/go` directory (container filesystem) but **does not apply to the `/workspaces/` directory when mounted from Windows**. Windows filesystems don't support Linux ACLs, so a different approach is needed for the workspace. **TL;DR for Windows users:** - The `/go` directory (Go modules cache) works fine with ACLs ✅ -- The `/workspace` directory (your code) needs the `postCreateCommand` fix ✅ +- The `/workspaces/` directory (your code) relies on the post-create ownership fix ✅ - Best solution: Use WSL2 and clone the repo inside WSL2 for full Linux compatibility ## Correct Implementation Order @@ -302,4 +302,4 @@ drwxrwsr-x+ root godev /go/pkg/mod/newdir # Files created in newdir will now correctly inherit godev group ``` -This is why the solution requires **both** setgid and ACLs working together. \ No newline at end of file +This is why the solution requires **both** setgid and ACLs working together. 
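The setgid half of that pairing can be seen in a minimal sketch (run on a Linux filesystem, not a Windows bind mount; the `godev` group and ACL step are shown as comments because they need root plus an existing shared group):

```bash
# Minimal demo: entries created under a setgid directory inherit the
# directory's group, which is what keeps a shared cache group-consistent.
dir="$(mktemp -d)"
chmod g+s "$dir"     # setgid bit: new children inherit the directory's group
mkdir "$dir/pkg"
stat -c '%A %G' "$dir" "$dir/pkg"
# With root and a shared group, default ACLs would complete the setup:
#   chgrp godev "$dir"
#   setfacl -d -m g:godev:rwX "$dir"
```

Setgid alone covers group *ownership* of new entries; the default ACL is what guarantees group *write* permission, which is why both are needed.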
diff --git a/docs/ci/KUBECTL_TLS_DEBUG_REPORT.md b/docs/ci/KUBECTL_TLS_DEBUG_REPORT.md new file mode 100644 index 00000000..d5e07a99 --- /dev/null +++ b/docs/ci/KUBECTL_TLS_DEBUG_REPORT.md @@ -0,0 +1,81 @@ +# kubectl/Kind Connectivity Debug Report (Devcontainer) + +Date: 2026-02-13 +Scope: Why `kubectl get nodes` failed inside devcontainer after Kind cluster creation, and what fixed it. + +## Symptom + +Inside devcontainer, after `make setup-cluster`: + +```bash +kubectl get nodes +``` + +failed with connection errors to `host.docker.internal:`. + +## What We Observed + +1. Kind cluster creation succeeded and reported healthy control plane. +2. Kubeconfig rewrite logic changed server endpoint from `127.0.0.1:` to `host.docker.internal:`. +3. Docker port publish for Kind control-plane showed loopback-only host bind: + +```text +"6443/tcp":[{"HostIp":"127.0.0.1","HostPort":""}] +``` + +## Root Cause + +The kubeconfig rewrite alone was not enough. + +When Kind publishes API server on host loopback (`127.0.0.1`), that port is reachable from the host itself but not from another container via `host.docker.internal`. + +So the devcontainer tried to connect to `host.docker.internal:`, but host had that port bound only to loopback, resulting in connection refused. + +## Final Fix Applied + +### A) Devcontainer networking model + +In `.devcontainer/devcontainer.json`: + +- removed `--network=host` +- kept `--group-add=docker` +- added `--add-host=host.docker.internal:host-gateway` + +### B) Kind API server bind address + +In `test/e2e/kind/cluster-template.yaml`: + +```yaml +networking: + apiServerAddress: "0.0.0.0" +``` + +This ensures host publish is reachable from devcontainer via `host.docker.internal`. 
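The reachability check can be sketched as plain text processing over the `docker inspect` Ports output (the JSON samples below are illustrative, not captured from a real cluster):

```bash
# Sketch: classify a Kind control-plane port publish as loopback-only
# (unreachable from sibling containers via host.docker.internal) or not.
classify_bind() {
  local ports_json="$1"
  if echo "$ports_json" | grep -qF '"HostIp":"127.0.0.1"'; then
    echo "loopback-only"
  else
    echo "host-gateway-reachable"
  fi
}

classify_bind '{"6443/tcp":[{"HostIp":"127.0.0.1","HostPort":"39131"}]}'  # -> loopback-only
classify_bind '{"6443/tcp":[{"HostIp":"0.0.0.0","HostPort":"39131"}]}'    # -> host-gateway-reachable
```

In practice the JSON comes from `docker inspect <node> --format '{{json .NetworkSettings.Ports}}'`, as shown in the verification steps below.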
+ +### C) Kubeconfig rewrite in cluster setup script + +In `test/e2e/kind/start-cluster.sh`, after `kind export kubeconfig`: + +- detect kubeconfig server host in `{127.0.0.1, localhost, 0.0.0.0}` +- rewrite to `https://host.docker.internal:` +- set `tls-server-name=localhost` + +## Verification Steps + +```bash +make setup-cluster +docker inspect gitops-reverser-test-e2e-control-plane --format '{{json .NetworkSettings.Ports}}' +kubectl config view --minify | sed -n '/server:/p;/tls-server-name:/p' +kubectl get nodes +``` + +Expected: + +- Docker `HostIp` for `6443/tcp` is `0.0.0.0` or `::` +- kubeconfig server points to `https://host.docker.internal:` +- `tls-server-name: localhost` is set +- `kubectl get nodes` succeeds inside devcontainer + +## Why We Kept This Design + +This removes the need for `--network=host` while keeping Kind management from inside the devcontainer working reliably. It is easier to reason about, more explicit, and avoids host-network side effects. diff --git a/docs/ci/WINDOWS_DEVCONTAINER_SETUP.md b/docs/ci/WINDOWS_DEVCONTAINER_SETUP.md index cdb260e4..0d469cfc 100644 --- a/docs/ci/WINDOWS_DEVCONTAINER_SETUP.md +++ b/docs/ci/WINDOWS_DEVCONTAINER_SETUP.md @@ -1,148 +1,70 @@ -# Windows DevContainer Setup Guide +# Windows Devcontainer Setup -## Problem +Last updated: 2026-02-13 -On Windows, the devcontainer works differently than on Linux due to how Docker Desktop handles volume mounts: +## Why Windows behaves differently -1. **Container filesystem (`/go`)**: Full Linux filesystem with ACL support ✅ -2. **Mounted workspace (`/workspace`)**: Windows filesystem mounted via Docker, limited Unix permission support ❌ +When the repo is on the Windows filesystem and mounted into a Linux devcontainer, the mounted workspace does not behave exactly like a native Linux filesystem. 
-## Symptoms +Typical effects: -- Cannot write files in `/workspace` directory -- Permission denied errors when running `go mod tidy` or other commands -- ACL commands fail on mounted volumes +- ownership/permission friction on the mounted workspace +- slower file I/O than WSL-native storage +- Linux ACL-based fixes that work under `/go` do not fully apply to the mounted source tree -## Root Cause +## Current repo behavior -Windows uses NTFS/ReFS filesystems which don't support Linux ACLs. When Docker Desktop mounts a Windows directory into a Linux container, it uses a compatibility layer that: -- Simulates Unix permissions -- Cannot support `setfacl` or advanced ACLs -- May have permission mapping issues between Windows and Linux users +This repo uses: -## Solution +- active workspace path: `/workspaces/` +- `remoteUser`: `vscode` +- post-create hook: `.devcontainer/post-create.sh` (called with `${containerWorkspaceFolder}`) -The solution is to ensure the workspace directory has proper ownership and permissions for the `vscode` user, without relying on ACLs. +The post-create script attempts to normalize ownership for the mounted workspace and `/home/vscode` caches. -### Updated Dockerfile Approach +## Recommended setup (Windows) -The Dockerfile already handles this correctly: +1. Use WSL2. +2. Clone the repository inside the Linux filesystem (for example under `~/git/...` in Ubuntu). +3. Open from WSL in VS Code, then reopen in container. -```dockerfile -# In dev stage - ensure vscode user can write to workspace -RUN chown -R vscode:vscode /workspace && \ - chmod -R 755 /workspace -``` - -However, this only affects the **empty** `/workspace` directory in the image. When you mount your actual Windows workspace, it **overrides** this with the Windows filesystem. 
- -### Windows-Specific Configuration - -For Windows users, add this to your `.devcontainer/devcontainer.json`: - -```json -{ - "remoteUser": "vscode", - "containerEnv": { - "WORKSPACE_OWNER": "vscode" - }, - "postCreateCommand": "sudo chown -R vscode:vscode /workspace || true", - "mounts": [ - "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind" - ] -} -``` - -The `postCreateCommand` runs **after** the workspace is mounted, ensuring proper ownership. +This is the most reliable and fastest setup. -### Alternative: Use WSL2 Backend - -For the best experience on Windows, use WSL2: - -1. **Install WSL2** with Ubuntu or Debian -2. **Clone the repository inside WSL2** (not in Windows filesystem) -3. **Open in VSCode** using the WSL extension -4. **Use the devcontainer** - it will work exactly like on Linux - -This approach: -- ✅ Full Linux filesystem support -- ✅ ACLs work properly -- ✅ Better performance -- ✅ No permission mapping issues - -### Quick Fix for Existing Setup - -If you're already in the devcontainer and experiencing permission issues: +## Quick checks inside devcontainer ```bash -# Run this inside the devcontainer -sudo chown -R vscode:vscode /workspace -sudo chmod -R 755 /workspace - -# Verify -ls -la /workspace -# Should show vscode:vscode ownership +pwd +ls -ld . +id ``` -## Why `/go` Works But `/workspace` Doesn't +Expected: -| Directory | Location | ACL Support | Why | -|-----------|----------|-------------|-----| -| `/go` | Container filesystem | ✅ Yes | Part of the Linux container image | -| `/workspace` | Mounted from Windows | ❌ No | Windows filesystem mounted via Docker | +- current directory under `/workspaces/` +- effective user is `vscode` +- workspace is writable by `vscode` -The `/go` directory (where Go modules are cached) uses the container's Linux filesystem, so ACLs work perfectly. The `/workspace` directory is mounted from your Windows filesystem, so it doesn't support Linux ACLs. 
+## If workspace is still not writable -## Recommended Setup for Windows Users - -### Option 1: WSL2 (Recommended) +Run: ```bash -# In WSL2 terminal -cd ~ -git clone -cd gitops-reverser -code . # Opens in VSCode with WSL extension -# Then reopen in container +bash .devcontainer/post-create.sh "${containerWorkspaceFolder:-$(pwd)}" ``` -### Option 2: Windows with Post-Create Fix - -Update `.devcontainer/devcontainer.json`: - -```json -{ - "postCreateCommand": "sudo chown -R vscode:vscode /workspace && sudo chmod -R 755 /workspace || true" -} -``` - -### Option 3: Run as Root (Not Recommended) - -Change `remoteUser` to `root` in `devcontainer.json`, but this is not recommended for security reasons. - -## Verification - -After setup, verify permissions: +Then verify: ```bash -# Check workspace ownership -ls -la /workspace -# Should show: drwxr-xr-x vscode vscode - -# Check you can write files -touch /workspace/test.txt -# Should succeed without errors - -# Check Go operations work +touch .permission-check && rm .permission-check go mod tidy -# Should complete without permission errors - -# Clean up test file -rm /workspace/test.txt ``` +## Notes about `/go` vs workspace + +- `/go` is container filesystem and uses Linux semantics; ACL/setgid strategy documented in `GO_MODULE_PERMISSIONS.md` applies there. +- `/workspaces/` is a bind mount from host; behavior depends on host filesystem and Docker Desktop integration. 
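A quick probe along those lines can make the difference visible (output format is illustrative; `stat -f` here is the GNU coreutils form for filesystem status):

```bash
# Report filesystem type and writability of a directory, to help spot
# Windows bind-mount behavior vs native Linux storage.
probe_dir() {
  local d="$1"
  printf 'fstype=%s ' "$(stat -f -c '%T' "$d")"
  if touch "$d/.probe" 2>/dev/null; then
    rm -f "$d/.probe"
    echo "writable=yes"
  else
    echo "writable=no"
  fi
}

probe_dir /tmp
```

On a WSL2-native clone the workspace typically reports a regular Linux filesystem type; a Windows-hosted mount reports a Docker Desktop sharing filesystem instead.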
+ ## References -- [Docker Desktop WSL2 Backend](https://docs.docker.com/desktop/wsl/) -- [VSCode Remote - WSL](https://code.visualstudio.com/docs/remote/wsl) -- [Docker Volume Permissions](https://docs.docker.com/storage/bind-mounts/#configure-bind-propagation) \ No newline at end of file +- [Docker Desktop + WSL2](https://docs.docker.com/desktop/features/wsl/) +- [VS Code Remote - WSL](https://code.visualstudio.com/docs/remote/wsl) diff --git a/test/e2e/E2E_DEBUGGING.md b/test/e2e/E2E_DEBUGGING.md index b3c18615..ba3fa69a 100644 --- a/test/e2e/E2E_DEBUGGING.md +++ b/test/e2e/E2E_DEBUGGING.md @@ -61,7 +61,7 @@ go_goroutines{job="gitops-reverser-metrics"} ``` Host Machine (port 13000, 19090) - ↕ (exposed via --network=host) + ↕ (VS Code forwarded ports from devcontainer) DevContainer ↕ (kubectl port-forward) Kind Cluster @@ -117,4 +117,4 @@ make setup-port-forwards # Start port-forwards (Gitea:13000, Prometheus:19090 make cleanup-port-forwards # Stop all port-forwards make e2e-setup # Setup Gitea + Prometheus + port-forwards make test-e2e # Run e2e tests (includes port-forwards) -make e2e-cleanup # Clean up all infrastructure \ No newline at end of file +make e2e-cleanup # Clean up all infrastructure diff --git a/test/e2e/kind/cluster-template.yaml b/test/e2e/kind/cluster-template.yaml index a8209ce9..ce596cfe 100644 --- a/test/e2e/kind/cluster-template.yaml +++ b/test/e2e/kind/cluster-template.yaml @@ -4,13 +4,14 @@ kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 networking: + # Bind the API server on all host interfaces so devcontainer clients can + # reach it via host.docker.internal. 
apiServerAddress: "0.0.0.0" - apiServerPort: 6443 nodes: - role: control-plane # Mount the entire audit directory using $PROJECT_PATH path for Docker-in-Docker (or $HOST_PROJECT_PATH if you use mount to the docker socket) extraMounts: - - hostPath: ${PROJECT_PATH}/test/e2e/kind/audit + - hostPath: ${HOST_PROJECT_PATH}/test/e2e/kind/audit containerPath: /etc/kubernetes/audit readOnly: false diff --git a/test/e2e/kind/start-cluster.sh b/test/e2e/kind/start-cluster.sh index 11094a99..a46c0b31 100755 --- a/test/e2e/kind/start-cluster.sh +++ b/test/e2e/kind/start-cluster.sh @@ -38,4 +38,18 @@ echo "✅ Kind cluster created successfully" echo "📋 Configuring kubeconfig for cluster '$CLUSTER_NAME'..." kind export kubeconfig --name "$CLUSTER_NAME" +current_cluster_name="$(kubectl config view --minify -o jsonpath='{.clusters[0].name}')" +current_server="$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')" + +if [[ "$current_server" =~ ^https://(127\.0\.0\.1|localhost|0\.0\.0\.0):([0-9]+)$ ]]; then + apiserver_port="${BASH_REMATCH[2]}" + echo "🔁 Rewriting kubeconfig server endpoint to host.docker.internal:${apiserver_port}..." + kubectl config set-cluster "$current_cluster_name" \ + --server="https://host.docker.internal:${apiserver_port}" \ + --tls-server-name=localhost >/dev/null + echo "✅ kubeconfig endpoint updated for devcontainer networking" +else + echo "ℹ️ kubeconfig server is '$current_server' (no rewrite needed)" +fi + echo "✅ Cluster setup complete!" 
From c1456184832939fd12f5a0cca230bb39d4fc20d3 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Fri, 13 Feb 2026 09:17:52 +0000 Subject: [PATCH 29/32] ci: Also switch back here offcourse --- .github/workflows/ci.yml | 8 ++++---- test/e2e/kind/cluster-template.yaml | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index b0593610..453c733f 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -262,9 +262,9 @@ jobs: - name: Generate Kind cluster config from template env: - PROJECT_PATH: ${{ github.workspace }} + HOST_PROJECT_PATH: ${{ github.workspace }} run: | - echo "🔧 Generating cluster config with PROJECT_PATH=${PROJECT_PATH}" + echo "🔧 Generating cluster config with HOST_PROJECT_PATH=${HOST_PROJECT_PATH}" envsubst < test/e2e/kind/cluster-template.yaml > test/e2e/kind/cluster.yaml echo "✅ Generated configuration:" cat test/e2e/kind/cluster.yaml @@ -326,9 +326,9 @@ jobs: - name: Generate Kind cluster config from template env: - PROJECT_PATH: ${{ github.workspace }} + HOST_PROJECT_PATH: ${{ github.workspace }} run: | - echo "🔧 Generating cluster config with PROJECT_PATH=${PROJECT_PATH}" + echo "🔧 Generating cluster config with HOST_PROJECT_PATH=${HOST_PROJECT_PATH}" envsubst < test/e2e/kind/cluster-template.yaml > test/e2e/kind/cluster.yaml echo "✅ Generated configuration:" cat test/e2e/kind/cluster.yaml diff --git a/test/e2e/kind/cluster-template.yaml b/test/e2e/kind/cluster-template.yaml index ce596cfe..2b9619bc 100644 --- a/test/e2e/kind/cluster-template.yaml +++ b/test/e2e/kind/cluster-template.yaml @@ -9,7 +9,7 @@ networking: apiServerAddress: "0.0.0.0" nodes: - role: control-plane - # Mount the entire audit directory using $PROJECT_PATH path for Docker-in-Docker (or $HOST_PROJECT_PATH if you use mount to the docker socket) + # Mount the entire audit directory using $HOST_PROJECT_PATH. 
extraMounts: - hostPath: ${HOST_PROJECT_PATH}/test/e2e/kind/audit containerPath: /etc/kubernetes/audit From 34402f6a29d0e814003b0a5a6aab34adf8cafba0 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Fri, 13 Feb 2026 11:26:18 +0000 Subject: [PATCH 30/32] fix: Make "--metric-insecure" a reality and local testability of all flows --- Makefile | 75 ++++++++- charts/gitops-reverser/README.md | 10 +- .../templates/certificates.yaml | 28 +++- .../gitops-reverser/templates/deployment.yaml | 42 +++-- .../gitops-reverser/templates/services.yaml | 3 +- charts/gitops-reverser/values.yaml | 13 +- config/kustomization.yaml | 4 +- docs/ci/E2E_IMAGE_ARTIFACT_REUSE_DESIGN.md | 154 ++++++++++++++++++ test/e2e/e2e_suite_test.go | 14 +- test/e2e/kind/start-cluster.sh | 13 +- test/e2e/scripts/install-smoke.sh | 55 ++++++- 11 files changed, 352 insertions(+), 59 deletions(-) create mode 100644 docs/ci/E2E_IMAGE_ARTIFACT_REUSE_DESIGN.md diff --git a/Makefile b/Makefile index 4375d61a..326c4eb5 100644 --- a/Makefile +++ b/Makefile @@ -69,6 +69,7 @@ test: manifests generate fmt vet setup-envtest ## Run tests. KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(shell pwd)/bin -p path)" go test $$(go list ./... 
| grep -v /e2e) -coverprofile cover.out KIND_CLUSTER ?= gitops-reverser-test-e2e +E2E_LOCAL_IMAGE ?= gitops-reverser:e2e-local .PHONY: setup-cluster setup-cluster: ## Set up a Kind cluster for e2e tests if it does not exist @@ -80,11 +81,40 @@ setup-cluster: ## Set up a Kind cluster for e2e tests if it does not exist .PHONY: cleanup-cluster cleanup-cluster: ## Tear down the Kind cluster used for e2e tests - @$(KIND) delete cluster --name $(KIND_CLUSTER) + @if $(KIND) get clusters 2>/dev/null | grep -q "^$(KIND_CLUSTER)$$"; then \ + echo "🧹 Deleting Kind cluster '$(KIND_CLUSTER)'"; \ + $(KIND) delete cluster --name $(KIND_CLUSTER); \ + else \ + echo "ℹ️ Kind cluster '$(KIND_CLUSTER)' does not exist; skipping cleanup"; \ + fi + +.PHONY: e2e-build-load-image +e2e-build-load-image: ## Build local image and load it into the Kind cluster used by local e2e flows + @if [ -n "$(PROJECT_IMAGE)" ]; then \ + echo "🐳 Building local image $(PROJECT_IMAGE)"; \ + $(CONTAINER_TOOL) build -t $(PROJECT_IMAGE) .; \ + echo "📦 Loading image $(PROJECT_IMAGE) into Kind cluster $(KIND_CLUSTER)"; \ + $(KIND) load docker-image $(PROJECT_IMAGE) --name $(KIND_CLUSTER); \ + else \ + echo "🐳 Building local image $(E2E_LOCAL_IMAGE)"; \ + $(CONTAINER_TOOL) build -t $(E2E_LOCAL_IMAGE) .; \ + echo "📦 Loading image $(E2E_LOCAL_IMAGE) into Kind cluster $(KIND_CLUSTER)"; \ + $(KIND) load docker-image $(E2E_LOCAL_IMAGE) --name $(KIND_CLUSTER); \ + fi .PHONY: test-e2e test-e2e: setup-cluster cleanup-webhook setup-e2e manifests setup-port-forwards ## Run end-to-end tests in Kind cluster, note that vet, fmt and generate are not run! 
- KIND_CLUSTER=$(KIND_CLUSTER) PROJECT_IMAGE=$(PROJECT_IMAGE) go test ./test/e2e/ -v -ginkgo.v + @echo "ℹ️ test-e2e reuses the existing Kind cluster (no cluster cleanup in this target)"; \ + if [ -n "$(PROJECT_IMAGE)" ]; then \ + echo "ℹ️ Entry point selected pre-built image (CI-friendly): $(PROJECT_IMAGE)"; \ + echo "ℹ️ Skipping local image build/load for pre-built image path"; \ + KIND_CLUSTER=$(KIND_CLUSTER) PROJECT_IMAGE="$(PROJECT_IMAGE)" go test ./test/e2e/ -v -ginkgo.v; \ + else \ + echo "ℹ️ Entry point selected local fallback image: $(E2E_LOCAL_IMAGE)"; \ + echo "ℹ️ Building/loading local image into existing cluster"; \ + $(MAKE) e2e-build-load-image KIND_CLUSTER=$(KIND_CLUSTER); \ + KIND_CLUSTER=$(KIND_CLUSTER) PROJECT_IMAGE="$(E2E_LOCAL_IMAGE)" go test ./test/e2e/ -v -ginkgo.v; \ + fi .PHONY: cleanup-webhook cleanup-webhook: ## Preventive cleanup of ValidatingWebhookConfiguration potenially left by previous test runs @@ -186,7 +216,7 @@ ENVTEST_K8S_VERSION ?= $(shell go list -m -f "{{ .Version }}" k8s.io/api | awk - # Gitea E2E Configuration GITEA_NAMESPACE ?= gitea-e2e - GITEA_CHART_VERSION ?= 12.5.0 # https://gitea.com/gitea/helm-gitea +GITEA_CHART_VERSION ?= 12.5.0 # https://gitea.com/gitea/helm-gitea .PHONY: setup-envtest setup-envtest: ## Setup envtest binaries for unit tests @@ -255,13 +285,40 @@ wait-cert-manager: setup-cert-manager ## Wait for cert-manager pods to become re @$(KUBECTL) wait --for=condition=ready pod -l app.kubernetes.io/instance=cert-manager -n cert-manager --timeout=300s ## Smoke test: install from local Helm chart and verify rollout -## Only tested in GH for now +.PHONY: test-e2e-install +test-e2e-install: ## Smoke test install with E2E_INSTALL_MODE=helm|manifest + @MODE="$(E2E_INSTALL_MODE)"; \ + if [ "$$MODE" != "helm" ] && [ "$$MODE" != "manifest" ]; then \ + echo "❌ Invalid E2E_INSTALL_MODE='$$MODE' (expected: helm|manifest)"; \ + exit 1; \ + fi; \ + PROJECT_IMAGE_VALUE="$(PROJECT_IMAGE)"; \ + if [ -n "$$PROJECT_IMAGE_VALUE" 
]; then \ + echo "ℹ️ Entry point selected pre-built image (probably running in CI): $$PROJECT_IMAGE_VALUE"; \ + echo "ℹ️ Skipping cluster cleanup for pre-built image path"; \ + KIND_CLUSTER=$(KIND_CLUSTER) $(MAKE) setup-cluster setup-e2e wait-cert-manager; \ + else \ + PROJECT_IMAGE_VALUE="$(E2E_LOCAL_IMAGE)"; \ + echo "🧹 Local fallback path: cleaning cluster to test a clean install"; \ + KIND_CLUSTER=$(KIND_CLUSTER) $(MAKE) cleanup-cluster; \ + echo "ℹ️ Entry point selected local fallback image: $$PROJECT_IMAGE_VALUE"; \ + KIND_CLUSTER=$(KIND_CLUSTER) PROJECT_IMAGE="$$PROJECT_IMAGE_VALUE" $(MAKE) setup-cluster setup-e2e wait-cert-manager e2e-build-load-image; \ + fi; \ + echo "ℹ️ Running install smoke mode: $$MODE"; \ + PROJECT_IMAGE="$$PROJECT_IMAGE_VALUE" bash test/e2e/scripts/install-smoke.sh "$$MODE"; \ + +## Smoke test: install from local Helm chart and verify rollout .PHONY: test-e2e-install-helm -test-e2e-install-helm: setup-e2e wait-cert-manager - @bash test/e2e/scripts/install-smoke.sh helm +test-e2e-install-helm: + @$(MAKE) test-e2e-install E2E_INSTALL_MODE=helm PROJECT_IMAGE="$(PROJECT_IMAGE)" KIND_CLUSTER="$(KIND_CLUSTER)" ## Smoke test: install from generated dist/install.yaml and verify rollout -## Only tested in GH for now .PHONY: test-e2e-install-manifest -test-e2e-install-manifest: setup-e2e wait-cert-manager - @bash test/e2e/scripts/install-smoke.sh manifest +test-e2e-install-manifest: + @if [ -n "$(PROJECT_IMAGE)" ]; then \ + echo "ℹ️ test-e2e-install-manifest using existing artifact (PROJECT_IMAGE set, CI/pre-built path)"; \ + else \ + echo "ℹ️ test-e2e-install-manifest local path: regenerating dist/install.yaml via build-installer"; \ + $(MAKE) build-installer; \ + fi + @$(MAKE) test-e2e-install E2E_INSTALL_MODE=manifest PROJECT_IMAGE="$(PROJECT_IMAGE)" KIND_CLUSTER="$(KIND_CLUSTER)" diff --git a/charts/gitops-reverser/README.md b/charts/gitops-reverser/README.md index b8e3004f..d3cd37a9 100644 --- a/charts/gitops-reverser/README.md +++ 
b/charts/gitops-reverser/README.md @@ -178,7 +178,7 @@ webhook: | `image.repository` | Container image repository | `ghcr.io/configbutler/gitops-reverser` | | `webhook.validating.failurePolicy` | Webhook failure policy (Ignore/Fail) | `Ignore` | | `servers.admission.tls.enabled` | Serve admission webhook with TLS (disable only for local/testing) | `true` | -| `servers.audit.enabled` | Enable dedicated audit ingress listener | `true` | +| `servers.admission.tls.secretName` | Secret name for admission TLS cert/key | `-admission-server-cert` | | `servers.audit.port` | Audit container port | `9444` | | `servers.audit.tls.enabled` | Serve audit ingress with TLS | `true` | | `servers.audit.maxRequestBodyBytes` | Max accepted audit request size | `10485760` | @@ -187,7 +187,9 @@ webhook: | `servers.audit.timeouts.idle` | Audit-server idle timeout | `60s` | | `servers.audit.tls.secretName` | Secret name for audit TLS cert/key | `-audit-server-cert` | | `servers.metrics.bindAddress` | Metrics listener bind address | `:8080` | -| `servers.metrics.tls.enabled` | Serve metrics with TLS | `true` | +| `servers.metrics.tls.enabled` | Serve metrics with TLS | `false` | +| `servers.metrics.tls.certPath` | Metrics TLS certificate mount path | `/tmp/k8s-metrics-server/metrics-server-certs` | +| `servers.metrics.tls.secretName` | Secret name for metrics TLS cert/key | `-metrics-server-cert` | | `service.clusterIP` | Optional fixed ClusterIP for single controller Service | `""` | | `service.ports.admission` | Service port for admission webhook | `9443` | | `service.ports.audit` | Service port for audit ingress | `9444` | @@ -260,7 +262,9 @@ kubectl logs -n gitops-reverser-system -l app.kubernetes.io/name=gitops-reverser ```bash kubectl port-forward -n gitops-reverser-system svc/gitops-reverser 8080:8080 -curl -k https://localhost:8080/metrics +curl http://localhost:8080/metrics +# If metrics TLS is enabled: +# curl -k https://localhost:8080/metrics ``` ## Upgrading diff --git 
a/charts/gitops-reverser/templates/certificates.yaml b/charts/gitops-reverser/templates/certificates.yaml index ecaece57..8e6ad918 100644 --- a/charts/gitops-reverser/templates/certificates.yaml +++ b/charts/gitops-reverser/templates/certificates.yaml @@ -26,14 +26,14 @@ spec: issuerRef: kind: {{ .Values.certificates.certManager.issuer.kind }} name: {{ .Values.certificates.certManager.issuer.name }} - secretName: {{ include "gitops-reverser.fullname" . }}-admission-server-cert + secretName: {{ .Values.servers.admission.tls.secretName | default (printf "%s-admission-server-cert" (include "gitops-reverser.fullname" .)) }} usages: - digital signature - key encipherment - server auth privateKey: rotationPolicy: Always -{{- if and .Values.servers.audit.enabled .Values.servers.audit.tls.enabled }} +{{- if .Values.servers.audit.tls.enabled }} --- apiVersion: cert-manager.io/v1 kind: Certificate @@ -61,4 +61,28 @@ spec: privateKey: rotationPolicy: Always {{- end }} +{{- if .Values.servers.metrics.tls.enabled }} +--- +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: {{ include "gitops-reverser.fullname" . }}-metrics-server-cert + namespace: {{ .Release.Namespace }} + labels: + {{- include "gitops-reverser.labels" . | nindent 4 }} +spec: + dnsNames: + - {{ include "gitops-reverser.fullname" . }}.{{ .Release.Namespace }}.svc + - {{ include "gitops-reverser.fullname" . 
}}.{{ .Release.Namespace }}.svc.cluster.local + issuerRef: + kind: {{ .Values.certificates.certManager.issuer.kind }} + name: {{ .Values.certificates.certManager.issuer.name }} + secretName: {{ .Values.servers.metrics.tls.secretName | default (printf "%s-metrics-server-cert" (include "gitops-reverser.fullname" .)) }} + usages: + - digital signature + - key encipherment + - server auth + privateKey: + rotationPolicy: Always +{{- end }} {{- end }} diff --git a/charts/gitops-reverser/templates/deployment.yaml b/charts/gitops-reverser/templates/deployment.yaml index eddd94c7..49e8a379 100644 --- a/charts/gitops-reverser/templates/deployment.yaml +++ b/charts/gitops-reverser/templates/deployment.yaml @@ -11,6 +11,8 @@ metadata: control-plane: controller-manager spec: replicas: {{ .Values.replicaCount }} + strategy: + {{- toYaml .Values.deploymentStrategy | nindent 4 }} selector: matchLabels: {{- include "gitops-reverser.selectorLabels" . | nindent 6 }} @@ -43,30 +45,31 @@ spec: - --metrics-bind-address={{ .Values.servers.metrics.bindAddress }} {{- if not .Values.servers.metrics.tls.enabled }} - --metrics-insecure - {{- end }} + {{- else }} - --metrics-cert-path={{ .Values.servers.metrics.tls.certPath }} - --metrics-cert-name={{ .Values.servers.metrics.tls.certName }} - --metrics-cert-key={{ .Values.servers.metrics.tls.certKey }} - {{- if not .Values.servers.admission.tls.enabled }} - - --webhook-insecure {{- end }} + {{- if .Values.servers.admission.tls.enabled }} - --webhook-cert-path={{ .Values.servers.admission.tls.certPath }} - --webhook-cert-name={{ .Values.servers.admission.tls.certName }} - --webhook-cert-key={{ .Values.servers.admission.tls.certKey }} - {{- if and .Values.servers.audit.enabled (not .Values.servers.audit.tls.enabled) }} + {{- else }} + - --webhook-insecure + {{- end }} + {{- if .Values.servers.audit.tls.enabled }} + - --audit-cert-path={{ .Values.servers.audit.tls.certPath }} + - --audit-cert-name={{ .Values.servers.audit.tls.certName }} + - 
--audit-cert-key={{ .Values.servers.audit.tls.certKey }} + {{- else }} - --audit-insecure {{- end }} - --audit-listen-address={{ .Values.servers.audit.listenAddress }} - --audit-port={{ .Values.servers.audit.port }} - - --audit-max-request-body-bytes={{ .Values.servers.audit.maxRequestBodyBytes }} + - --audit-max-request-body-bytes={{ int64 .Values.servers.audit.maxRequestBodyBytes }} - --audit-read-timeout={{ .Values.servers.audit.timeouts.read }} - --audit-write-timeout={{ .Values.servers.audit.timeouts.write }} - --audit-idle-timeout={{ .Values.servers.audit.timeouts.idle }} - {{- if and .Values.servers.audit.enabled .Values.servers.audit.tls.enabled }} - - --audit-cert-path={{ .Values.servers.audit.tls.certPath }} - - --audit-cert-name={{ .Values.servers.audit.tls.certName }} - - --audit-cert-key={{ .Values.servers.audit.tls.certKey }} - {{- end }} {{- if .Values.logging.level }} - --zap-log-level={{ .Values.logging.level }} {{- end }} @@ -86,11 +89,9 @@ spec: - name: admission containerPort: {{ .Values.servers.admission.port }} protocol: TCP - {{- if .Values.servers.audit.enabled }} - name: audit containerPort: {{ .Values.servers.audit.port }} protocol: TCP - {{- end }} - name: metrics containerPort: {{ .Values.servers.metrics.port }} protocol: TCP @@ -132,7 +133,12 @@ spec: mountPath: {{ .Values.servers.admission.tls.certPath }} readOnly: true {{- end }} - {{- if and .Values.servers.audit.enabled .Values.servers.audit.tls.enabled }} + {{- if .Values.servers.metrics.tls.enabled }} + - name: metrics-cert + mountPath: {{ .Values.servers.metrics.tls.certPath }} + readOnly: true + {{- end }} + {{- if .Values.servers.audit.tls.enabled }} - name: audit-cert mountPath: {{ .Values.servers.audit.tls.certPath }} readOnly: true @@ -150,10 +156,16 @@ spec: {{- if .Values.servers.admission.tls.enabled }} - name: admission-cert secret: - secretName: {{ include "gitops-reverser.fullname" . 
}}-admission-server-cert + secretName: {{ .Values.servers.admission.tls.secretName | default (printf "%s-admission-server-cert" (include "gitops-reverser.fullname" .)) }} + defaultMode: 420 + {{- end }} + {{- if .Values.servers.metrics.tls.enabled }} + - name: metrics-cert + secret: + secretName: {{ .Values.servers.metrics.tls.secretName | default (printf "%s-metrics-server-cert" (include "gitops-reverser.fullname" .)) }} defaultMode: 420 {{- end }} - {{- if and .Values.servers.audit.enabled .Values.servers.audit.tls.enabled }} + {{- if .Values.servers.audit.tls.enabled }} - name: audit-cert secret: secretName: {{ .Values.servers.audit.tls.secretName | default (printf "%s-audit-server-cert" (include "gitops-reverser.fullname" .)) }} diff --git a/charts/gitops-reverser/templates/services.yaml b/charts/gitops-reverser/templates/services.yaml index acb468f1..063c787d 100644 --- a/charts/gitops-reverser/templates/services.yaml +++ b/charts/gitops-reverser/templates/services.yaml @@ -7,6 +7,7 @@ metadata: {{- include "gitops-reverser.labels" . 
| nindent 4 }} app.kubernetes.io/component: controller prometheus.io/scrape: "true" + prometheus.io/scheme: {{ ternary "https" "http" .Values.servers.metrics.tls.enabled | quote }} prometheus.io/port: "{{ .Values.service.ports.metrics }}" spec: type: ClusterIP @@ -18,12 +19,10 @@ spec: port: {{ .Values.service.ports.admission }} targetPort: {{ .Values.servers.admission.port }} protocol: TCP - {{- if .Values.servers.audit.enabled }} - name: audit port: {{ .Values.service.ports.audit }} targetPort: {{ .Values.servers.audit.port }} protocol: TCP - {{- end }} - name: metrics port: {{ .Values.service.ports.metrics }} targetPort: {{ .Values.servers.metrics.port }} diff --git a/charts/gitops-reverser/values.yaml b/charts/gitops-reverser/values.yaml index 1056d294..0a9f2fb9 100644 --- a/charts/gitops-reverser/values.yaml +++ b/charts/gitops-reverser/values.yaml @@ -4,6 +4,11 @@ # High Availability configuration - runs 1 replicas by default (HA support is not good enough yet) replicaCount: 1 +deploymentStrategy: + type: RollingUpdate + rollingUpdate: + maxSurge: 0 + maxUnavailable: 1 image: repository: ghcr.io/configbutler/gitops-reverser @@ -56,7 +61,6 @@ controllerManager: # HTTPS servers servers: admission: - enabled: true port: 9443 tls: # Controls webhook TLS wiring in the controller process. @@ -65,9 +69,9 @@ servers: certPath: "/tmp/k8s-admission-server/admission-server-certs" certName: "tls.crt" certKey: "tls.key" + secretName: "" audit: - enabled: true listenAddress: 0.0.0.0 port: 9444 tls: @@ -89,9 +93,10 @@ servers: tls: # Serve metrics over HTTPS when true, HTTP when false. 
enabled: false - certPath: "" + certPath: "/tmp/k8s-metrics-server/metrics-server-certs" certName: "tls.crt" certKey: "tls.key" + secretName: "" # Webhook behavior webhook: @@ -183,7 +188,7 @@ monitoring: # Must match the effective metrics transport: # - https when servers.metrics.tls.enabled=true # - http when servers.metrics.tls.enabled=false - scheme: https + scheme: http # Service exposure service: diff --git a/config/kustomization.yaml b/config/kustomization.yaml index 38a204a4..570f21c2 100644 --- a/config/kustomization.yaml +++ b/config/kustomization.yaml @@ -10,5 +10,5 @@ resources: - webhook.yaml images: - name: gitops-reverser - newName: example.com/gitops-reverser - newTag: v0.0.1 + newName: gitops-reverser + newTag: e2e-local diff --git a/docs/ci/E2E_IMAGE_ARTIFACT_REUSE_DESIGN.md b/docs/ci/E2E_IMAGE_ARTIFACT_REUSE_DESIGN.md new file mode 100644 index 00000000..d4c8857d --- /dev/null +++ b/docs/ci/E2E_IMAGE_ARTIFACT_REUSE_DESIGN.md @@ -0,0 +1,154 @@ +# E2E Image and Artifact Reuse Design + +## Status +- State: Implemented +- Scope: CI e2e, CI install smoke (helm + manifest), devcontainer/local e2e, IDE direct e2e runs + +## Problem Statement +We need one behavior model that satisfies these constraints: +- CI must reuse artifacts built earlier in the pipeline (image, packaged Helm chart, generated `dist/install.yaml`). +- Local/devcontainer runs should be easy (`make test-e2e`) and should auto-build a local image when no prebuilt image is provided. +- IDE/debugger runs (`go test ./test/e2e/...`) should remain usable without manual pre-steps. +- Image selection logic should be centralized and avoid duplicated implementation. + +## Goals +- Use a single decision input: `PROJECT_IMAGE`. +- If `PROJECT_IMAGE` is set: reuse it and do not rebuild. +- If `PROJECT_IMAGE` is not set: build/load local image once per run path. +- Keep orchestration primarily in `Makefile`. +- Keep Go `BeforeSuite` as IDE fallback only. 
+- Keep cluster behavior explicit: + - `test-e2e` reuses existing cluster state (fast path). + - install smoke local fallback path performs clean install validation. + +## Non-Goals +- No digest-aware image override logic for Helm values beyond repository/tag split. +- No new CI jobs or artifact formats. +- No changes to release publishing. + +## Decision Model +`PROJECT_IMAGE` is the source of truth: +- `PROJECT_IMAGE` present: + - Treat as prebuilt image. + - Skip local image build/load steps. + - Skip cluster cleanup in install smoke. + - Inject into test/install flows. +- `PROJECT_IMAGE` absent: + - Use local fallback image `$(E2E_LOCAL_IMAGE)` (`gitops-reverser:e2e-local` by default). + - Build image locally and load it into Kind. + - For install smoke, clean cluster first to validate clean install behavior. + - Use that image for test/install flows. + +## Execution Flows + +### 1) CI: e2e test suite +- Workflow passes `PROJECT_IMAGE` from `docker-build` output. +- `make test-e2e` sees `PROJECT_IMAGE` and skips rebuild/load. +- Go tests run with that exact image. +- Cluster is reused (no cleanup in this target). + +Outcome: +- No duplicate image builds in CI. + +### 2) CI: install smoke (`helm` and `manifest`) +- Workflow reuses release bundle artifact (`gitops-reverser.tgz`, `dist/install.yaml`). +- Workflow passes the same prebuilt `PROJECT_IMAGE`. +- `make test-e2e-install-helm` and `make test-e2e-install-manifest` skip cluster cleanup and local image rebuild/load. +- Helm mode injects repository/tag via `--set image.repository` and `--set image.tag`. +- Manifest mode applies `dist/install.yaml`, then overrides deployment image via `kubectl set image`. +- `test-e2e-install-manifest` does not regenerate `dist/install.yaml` when `PROJECT_IMAGE` is set. + +Outcome: +- Reuse of both chart/manifest artifacts and prebuilt image in CI. + +### 3) Devcontainer/local: full e2e via Make +- Run `make test-e2e` with no `PROJECT_IMAGE`. +- Make reuses existing cluster. 
+- Make auto-builds and Kind-loads `$(E2E_LOCAL_IMAGE)`, then runs tests with it. + +Outcome: +- Single command, no manual image prep. + +### 4) Devcontainer/local: install smoke via Make +- Run `make test-e2e-install-helm` or `make test-e2e-install-manifest` with no `PROJECT_IMAGE`. +- Make cleans cluster first (clean-install validation), then sets up e2e infra. +- Make auto-builds and Kind-loads `$(E2E_LOCAL_IMAGE)`, then runs smoke install using that image. +- For `test-e2e-install-manifest`, `build-installer` is run first to regenerate `dist/install.yaml`. + +Outcome: +- Automatic local behavior with explicit clean-install validation for smoke tests. + +### 5) IDE/debugger direct Go run +- Run `go test ./test/e2e/...` directly (no Make entrypoint). +- `BeforeSuite` checks `PROJECT_IMAGE`. +- If missing, it calls Make targets to prepare cluster + local image. + +Outcome: +- IDE path works without requiring developers to remember pre-steps. + +## Implementation Mapping + +### Makefile +- `E2E_LOCAL_IMAGE`: single local fallback image variable. +- `e2e-build-load-image`: local image build + Kind load. +- `test-e2e`: reuses cluster; branches image behavior based on `PROJECT_IMAGE`. +- `test-e2e-install`: shared install-smoke entry with `PROJECT_IMAGE` branching: + - prebuilt image path: skip cleanup. + - local fallback path: cleanup cluster, setup infra, build/load local image. +- `test-e2e-install-helm`: wrapper to `test-e2e-install`. +- `test-e2e-install-manifest`: + - local path: run `build-installer` first. + - prebuilt path: use existing manifest artifact. + +### Go (`test/e2e/e2e_suite_test.go`) +- `BeforeSuite`: + - if `PROJECT_IMAGE` is set: no prep + - else: call Make for cluster/image prep (IDE fallback) + +### Kind cluster bootstrap (`test/e2e/kind/start-cluster.sh`) +- Reuses existing Kind cluster if present (no delete/recreate in script). +- Creates cluster only when missing. +- Still exports/re-writes kubeconfig endpoint for devcontainer networking. 
+ +### Smoke script (`test/e2e/scripts/install-smoke.sh`) +- Helm mode: + - parse `PROJECT_IMAGE` into repo/tag and override chart values. +- Manifest mode: + - apply `dist/install.yaml` + - if `PROJECT_IMAGE` set, patch deployment image with `kubectl set image`. +- Readiness/diagnostics selector: + - derive pod selector dynamically from `deployment/gitops-reverser` `.spec.selector.matchLabels`. + - avoid hardcoded label assumptions across helm/manifest paths. + +## Why This Split +- Makefile remains the main orchestration layer. +- Go keeps a minimal safety-net role for IDE/direct execution. +- CI avoids redundant work by honoring prebuilt artifacts and prebuilt image. + +## Tradeoffs +- We keep a small amount of orchestration in two places (Make + Go fallback), but avoid duplicated image build logic. +- Manifest image override happens post-apply (`kubectl set image`) rather than regenerating `dist/install.yaml` per image. + +## Command Matrix +- CI e2e: `PROJECT_IMAGE= make test-e2e` +- CI smoke helm: `PROJECT_IMAGE= make test-e2e-install-helm` +- CI smoke manifest: `PROJECT_IMAGE= make test-e2e-install-manifest` +- Local full e2e: `make test-e2e` +- Local smoke helm: `make test-e2e-install-helm` +- Local smoke manifest: `make test-e2e-install-manifest` +- IDE direct: `go test ./test/e2e/...` + +## Failure Modes and Diagnostics +- Wrong image in pods: + - Check deployment image: `kubectl -n gitops-reverser get deploy gitops-reverser -o yaml | rg image:` +- Image pull failures in Kind: + - Ensure local build/load ran or `PROJECT_IMAGE` points to reachable registry. +- Manifest smoke using stale image: + - Local path: verify `build-installer` ran before smoke target. + - CI/prebuilt path: verify artifact source and `kubectl set image` override message. +- Pod readiness says "no matching resources": + - Verify selector in smoke logs (`Pod selector: ...`) and deployment selector labels. 
+ +## Future Improvements +- Add a small shared Make macro/helper to reduce repeated `PROJECT_IMAGE` branching across e2e entrypoints. +- Optionally add an explicit `E2E_AUTO_PREPARE_IMAGE=false` switch for strict mode in advanced local workflows. diff --git a/test/e2e/e2e_suite_test.go b/test/e2e/e2e_suite_test.go index 4112476c..bdd973ad 100644 --- a/test/e2e/e2e_suite_test.go +++ b/test/e2e/e2e_suite_test.go @@ -42,7 +42,7 @@ func getProjectImage() string { if img := os.Getenv("PROJECT_IMAGE"); img != "" { return img } - return "example.com/gitops-reverser:v0.0.1" + return "gitops-reverser:e2e-local" } // TestE2E runs the end-to-end (e2e) test suite for the project. These tests execute in an isolated, @@ -63,15 +63,11 @@ var _ = BeforeSuite(func() { return } - // Local testing: ALWAYS rebuild to ensure latest code changes are included - By("building the manager(Operator) image for local testing (forcing rebuild)") - cmd := exec.Command("make", "docker-build", fmt.Sprintf("IMG=%s", projectImage)) + // IDE/direct go test path: ensure cluster exists and local image is built+loaded via Makefile. 
+ By("PROJECT_IMAGE is not set; preparing cluster/image through Makefile for local run") + cmd := exec.Command("make", "setup-cluster", "e2e-build-load-image", fmt.Sprintf("PROJECT_IMAGE=%s", projectImage)) _, err := utils.Run(cmd) - ExpectWithOffset(1, err).NotTo(HaveOccurred(), "Failed to build the manager(Operator) image") - - By("loading the manager(Operator) image on Kind (forcing reload)") - err = utils.LoadImageToKindClusterWithName(projectImage) - ExpectWithOffset(1, err).NotTo(HaveOccurred(), "Failed to load the manager(Operator) image into Kind") + ExpectWithOffset(1, err).NotTo(HaveOccurred(), "Failed to build/load manager image via Makefile") }) var _ = AfterSuite(func() { diff --git a/test/e2e/kind/start-cluster.sh b/test/e2e/kind/start-cluster.sh index a46c0b31..42963085 100755 --- a/test/e2e/kind/start-cluster.sh +++ b/test/e2e/kind/start-cluster.sh @@ -24,17 +24,14 @@ echo "✅ Generated configuration:" cat "$CONFIG_FILE" echo "" -# Recreate cluster on every run so kube-apiserver always picks up current -# audit webhook policy/config files from the mounted directory. if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then - echo "♻️ Recreating existing Kind cluster '$CLUSTER_NAME' to refresh audit webhook config..." - kind delete cluster --name "$CLUSTER_NAME" + echo "♻️ Reusing existing Kind cluster '$CLUSTER_NAME' (no delete/recreate)" +else + echo "🚀 Creating Kind cluster '$CLUSTER_NAME' with audit webhook support..." + kind create cluster --name "$CLUSTER_NAME" --config "$CONFIG_FILE" --wait 5m + echo "✅ Kind cluster created successfully" fi -echo "🚀 Creating Kind cluster '$CLUSTER_NAME' with audit webhook support..." -kind create cluster --name "$CLUSTER_NAME" --config "$CONFIG_FILE" --wait 5m -echo "✅ Kind cluster created successfully" - echo "📋 Configuring kubeconfig for cluster '$CLUSTER_NAME'..." 
kind export kubeconfig --name "$CLUSTER_NAME" diff --git a/test/e2e/scripts/install-smoke.sh b/test/e2e/scripts/install-smoke.sh index 417007b9..c6a8c87c 100755 --- a/test/e2e/scripts/install-smoke.sh +++ b/test/e2e/scripts/install-smoke.sh @@ -5,29 +5,71 @@ MODE="${1:-}" NAMESPACE="gitops-reverser" HELM_CHART_SOURCE="${HELM_CHART_SOURCE:-charts/gitops-reverser}" WAIT_TIMEOUT="${WAIT_TIMEOUT:-60s}" +PROJECT_IMAGE="${PROJECT_IMAGE:-}" if [[ -z "${MODE}" ]]; then echo "usage: $0 <helm|manifest>" exit 1 fi +get_controller_pod_selector() { + local selector + selector="$(kubectl -n "${NAMESPACE}" get deployment gitops-reverser \ + -o go-template='{{range $k, $v := .spec.selector.matchLabels}}{{$k}}={{$v}},{{end}}' 2>/dev/null || true)" + selector="${selector%,}" + + if [[ -z "${selector}" ]]; then + # Fallback selector used by chart/manifests if deployment query is not available yet. + selector="app.kubernetes.io/name=gitops-reverser" + fi + + printf '%s' "${selector}" +} + install_helm() { + local helm_image_args=() + + if [[ -n "${PROJECT_IMAGE}" ]]; then + # Helm chart image is repository + tag. For smoke tests, parse PROJECT_IMAGE and override both.
+ local image_no_digest image_repo image_tag + image_no_digest="${PROJECT_IMAGE%%@*}" + if [[ "${image_no_digest##*/}" == *:* ]]; then + image_repo="${image_no_digest%:*}" + image_tag="${image_no_digest##*:}" + else + image_repo="${image_no_digest}" + image_tag="latest" + fi + helm_image_args+=(--set "image.repository=${image_repo}" --set "image.tag=${image_tag}") + echo "Overriding chart image from PROJECT_IMAGE=${PROJECT_IMAGE}" + fi + echo "Installing from Helm chart (mode=helm, source=${HELM_CHART_SOURCE})" helm upgrade --install "name-is-cool-but-not-relevant" "${HELM_CHART_SOURCE}" \ --namespace "${NAMESPACE}" \ --create-namespace \ - --set fullnameOverride=gitops-reverser + --set fullnameOverride=gitops-reverser \ + "${helm_image_args[@]}" } install_manifest() { echo "Installing from generated dist/install.yaml (mode=manifest)" kubectl apply -f dist/install.yaml + + if [[ -n "${PROJECT_IMAGE}" ]]; then + echo "Overriding manifest deployment image from PROJECT_IMAGE=${PROJECT_IMAGE}" + kubectl -n "${NAMESPACE}" set image deployment/gitops-reverser manager="${PROJECT_IMAGE}" + fi } print_debug_info() { + local pod_selector + pod_selector="$(get_controller_pod_selector)" + echo echo "Install smoke test diagnostics (${MODE})" echo "Namespace: ${NAMESPACE}" + echo "Pod selector: ${pod_selector}" echo "Deployment status:" kubectl -n "${NAMESPACE}" get deployment gitops-reverser -o wide || true echo @@ -38,10 +80,10 @@ print_debug_info() { kubectl -n "${NAMESPACE}" get pods -o wide || true echo echo "Controller-manager pod describe:" - kubectl -n "${NAMESPACE}" describe pod -l control-plane=controller-manager || true + kubectl -n "${NAMESPACE}" describe pod -l "${pod_selector}" || true echo echo "Controller-manager logs (last 200 lines):" - kubectl -n "${NAMESPACE}" logs -l control-plane=controller-manager --tail=200 --all-containers=true || true + kubectl -n "${NAMESPACE}" logs -l "${pod_selector}" --tail=200 --all-containers=true || true echo echo "Recent 
namespace events:" kubectl -n "${NAMESPACE}" get events --sort-by=.metadata.creationTimestamp | tail -n 50 || true @@ -59,6 +101,9 @@ run_or_debug() { } verify_installation() { + local pod_selector + pod_selector="$(get_controller_pod_selector)" + run_or_debug \ "Waiting for deployment rollout (timeout=${WAIT_TIMEOUT})" \ kubectl -n "${NAMESPACE}" rollout status deployment/gitops-reverser --timeout="${WAIT_TIMEOUT}" @@ -68,8 +113,8 @@ verify_installation() { kubectl -n "${NAMESPACE}" wait --for=condition=available deployment/gitops-reverser --timeout="${WAIT_TIMEOUT}" run_or_debug \ - "Checking pod readiness (timeout=${WAIT_TIMEOUT})" \ - kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l control-plane=controller-manager --timeout="${WAIT_TIMEOUT}" + "Checking pod readiness (selector=${pod_selector}, timeout=${WAIT_TIMEOUT})" \ + kubectl -n "${NAMESPACE}" wait --for=condition=ready pod -l "${pod_selector}" --timeout="${WAIT_TIMEOUT}" echo "Checking CRDs" kubectl get crd \ From 7c24aef849ddd0b487241495fa2a01c24de31252 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Fri, 13 Feb 2026 11:39:31 +0000 Subject: [PATCH 31/32] chore: Updating docs and CI labels --- .github/workflows/ci.yml | 3 +- docs/ci/PROMETHEUS_E2E_HELM_EVALUATION.md | 75 +++++++++++++++++++++++ 2 files changed, 76 insertions(+), 2 deletions(-) create mode 100644 docs/ci/PROMETHEUS_E2E_HELM_EVALUATION.md diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 453c733f..d8c1d0f4 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -127,7 +127,7 @@ jobs: " lint-helm: - name: Lint Helm Chart + name: Lint and build Helm Chart (and generate single-file installer) runs-on: ubuntu-latest needs: build-ci-container container: @@ -174,7 +174,6 @@ jobs: dist/install.yaml gitops-reverser.tgz if-no-files-found: error - retention-days: 1 lint: name: Lint Go Code diff --git a/docs/ci/PROMETHEUS_E2E_HELM_EVALUATION.md b/docs/ci/PROMETHEUS_E2E_HELM_EVALUATION.md new file 
mode 100644 index 00000000..4d607b35 --- /dev/null +++ b/docs/ci/PROMETHEUS_E2E_HELM_EVALUATION.md @@ -0,0 +1,75 @@ +# Prometheus E2E Helm/ServiceMonitor Evaluation + +## Status +- Date: 2026-02-13 +- Decision: **Do not migrate now** +- Scope: `test/e2e` Prometheus setup only + +## Context +Current e2e Prometheus setup is manifest/script based: +- setup entrypoint: `Makefile` target `setup-prometheus-e2e` +- deploy script: `test/e2e/scripts/setup-prometheus.sh` +- manifests: `test/e2e/prometheus/deployment.yaml`, `test/e2e/prometheus/rbac.yaml` + +Current tests also assume specific Prometheus naming/labels and scrape job naming: +- pod label wait: `app=prometheus` +- service port-forward target: `svc/prometheus:19090` +- PromQL assertions: `job='gitops-reverser-metrics'` + +## Evaluated Plan + +### Option A: Move to standalone Prometheus Helm chart +Use `prometheus-community/prometheus` with pinned chart version, e2e values file, and Helm lifecycle (`upgrade`, revision history, rollback). + +Required changes: +- Replace manifest apply/delete flow in `Makefile` Prometheus targets with Helm install/uninstall. +- Replace `setup-prometheus.sh` behavior with Helm-driven setup. +- Add e2e values file and pin chart version. +- Update port-forward/pod-ready checks that currently assume manual names/labels. + +### Option B: Add ServiceMonitor-based scraping +Use an operator-based chart (typically `kube-prometheus-stack`) because ServiceMonitor discovery is provided by Prometheus Operator, not by standalone Prometheus chart. + +Required changes (in addition to Option A-level migration): +- Install operator CRDs/controllers via Helm in e2e. +- Ensure ServiceMonitor exists for both install paths: + - Helm install smoke path (chart templated ServiceMonitor already exists behind values flag). + - Kustomize e2e path (`make deploy`) requires separate ServiceMonitor manifest. +- Update PromQL tests to avoid hardcoded `job='gitops-reverser-metrics'` assumptions. 
+ +## Pros and Cons + +### Pros of migrating to Helm +- Native `helm upgrade` workflow and revision/rollback history. +- Better consistency with existing e2e dependency setup style (similar to Gitea). +- Centralized values-based configuration. + +### Pros of adding ServiceMonitor path +- Cleaner scrape target management than static scrape config. +- Better alignment with common Kubernetes monitoring practices. +- Reuses chart-level `monitoring.serviceMonitor` support where applicable. + +### Cons / Risks +- Standalone Prometheus chart does **not** provide ServiceMonitor consumption. +- ServiceMonitor requires operator stack, increasing e2e complexity and startup time. +- Existing e2e scripts/tests are coupled to current names/labels/job-name; migration requires non-trivial refactors. +- Adds another chart/version dependency surface in CI and local flows. + +## Decision +We decided to **not do this now**. + +Rationale: +- Current setup is stable and intentionally minimal for e2e signal validation. +- Migration introduces meaningful complexity (especially for ServiceMonitor support). +- The value is primarily operational ergonomics rather than test coverage expansion. + +## Revisit Criteria +Re-open this migration when at least one of the following becomes a priority: +- Need Helm revision/rollback behavior for routine e2e debugging. +- Need ServiceMonitor-driven discovery parity with production environments. +- Need more dynamic scrape target management across multiple test topologies. + +## Next Step (Deferred) +If revisited, prefer a two-phase approach: +1. Phase 1: standalone Prometheus Helm chart migration (no ServiceMonitor). +2. Phase 2: operator-based monitoring stack + ServiceMonitor migration and test query normalization. 
From 0f7d48cab89d0be11a3729f690f36b1f9c77a9f8 Mon Sep 17 00:00:00 2001 From: Simon Koudijs Date: Fri, 13 Feb 2026 11:46:23 +0000 Subject: [PATCH 32/32] docs: Cleaning up --- ...udit-ingress-first-steps-execution-plan.md | 403 ------------------ .../audit-ingress-separate-server-options.md | 314 -------------- 2 files changed, 717 deletions(-) delete mode 100644 docs/design/audit-ingress-first-steps-execution-plan.md delete mode 100644 docs/design/audit-ingress-separate-server-options.md diff --git a/docs/design/audit-ingress-first-steps-execution-plan.md b/docs/design/audit-ingress-first-steps-execution-plan.md deleted file mode 100644 index fd5e8282..00000000 --- a/docs/design/audit-ingress-first-steps-execution-plan.md +++ /dev/null @@ -1,403 +0,0 @@ -# Audit ingress first steps execution plan - -## Status - -Execution-focused handoff plan for implementation agent. - -Scope is fixed to: - -- single deployment -- extra in-binary audit-server listener -- path-based cluster recognition -- Kind remains the e2e cluster target - -This document intentionally excludes alternative architecture discussion. - ---- - -## 1. Required outcome - -Implement an initial production-ready split where: - -- admission webhook keeps running on current admission-server path [`/process-validating-webhook`](cmd/main.go:191) -- audit ingress moves to a separate server in the same binary on a different port -- audit ingress is exposed via a dedicated Service -- cluster identity is derived from request path segment -- ingress TLS requirements for audit are independently configurable - ---- - -## 2. 
Current code and chart risks that must be addressed - -### 2.1 Coupling risks - -- Both admission and audit handlers are registered on one admission-server listener in [`cmd/main.go`](cmd/main.go:101) and [`cmd/main.go`](cmd/main.go:204) -- One service endpoint currently fronts this surface in [`charts/gitops-reverser/templates/services.yaml`](charts/gitops-reverser/templates/services.yaml:3) -- One cert lifecycle currently serves this surface in [`charts/gitops-reverser/templates/certificates.yaml`](charts/gitops-reverser/templates/certificates.yaml:16) - -### 2.2 TLS posture risks - -- e2e audit kubeconfig currently uses insecure skip verify in [`test/e2e/kind/audit/webhook-config.yaml`](test/e2e/kind/audit/webhook-config.yaml:14) -- cluster docs use insecure skip verify in [`docs/audit-setup/cluster/audit/webhook-config.yaml`](docs/audit-setup/cluster/audit/webhook-config.yaml:11) - -### 2.3 Audit ingress hardening gaps in code - -In [`internal/webhook/audit_handler.go`](internal/webhook/audit_handler.go:86): - -- no request body size limit before decode -- no explicit server-level timeouts -- no concurrency guard for burst traffic -- no path-based cluster ID parser - -### 2.4 E2E and docs drift already visible - -- Kind README references DNS endpoint while config uses fixed IP and path in [`test/e2e/kind/README.md`](test/e2e/kind/README.md:31) vs [`test/e2e/kind/audit/webhook-config.yaml`](test/e2e/kind/audit/webhook-config.yaml:12) -- Helm README contains defaults not fully aligned with values in [`charts/gitops-reverser/README.md`](charts/gitops-reverser/README.md:183) and [`charts/gitops-reverser/values.yaml`](charts/gitops-reverser/values.yaml:6) - ---- - -## 3. 
Implementation contract for first step - -### 3.1 Runtime topology - -Implement two servers in one process: - -- admission-server - - existing controller-runtime admission-server listener - - keeps current cert and service behavior -- audit-server - - dedicated `http.Server` listener on separate port - - independent TLS config inputs - - serves audit paths with cluster path segment - -### 3.2 Path-based cluster recognition contract - -Accepted path format: - -- `/audit-webhook/{clusterID}` - -Rules: - -- path prefix is fixed to `/audit-webhook` in this phase (not configurable) -- reject requests without `{clusterID}` -- accept any non-empty `{clusterID}` and handle newly seen cluster IDs -- emit structured logs with resolved `clusterID` -- add metric label for `cluster_id` - -### 3.3 TLS policy contract for first step - -For phase 1: - -- keep server TLS mandatory for audit ingress -- support strict CA verification by source cluster configuration -- do not require mTLS in this phase -- preserve option to add mTLS later without path changes - ---- - -## 4. Concrete code work items - -### 4.1 Add audit-server config model in main - -Target file: [`cmd/main.go`](cmd/main.go:253) - -Add new app config fields for audit-server, separate from admission-server fields: - -- audit listen address and port -- audit cert path, cert name, cert key -- audit max request body bytes -- audit read timeout -- audit write timeout -- audit idle timeout - -Add flags in [`parseFlags()`](cmd/main.go:270) for above. 
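The section 3.2 path contract above can be sketched as a small parser. This is a hedged illustration: the function and pattern names are ours, and the DNS-label-style regex is an assumed cardinality guard that goes beyond the plan's minimum "non-empty" rule:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

const auditPathPrefix = "/audit-webhook/"

// clusterIDPattern constrains the ID for metric-label safety (assumption:
// DNS-label-like names, max 63 chars; the contract only requires non-empty).
var clusterIDPattern = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`)

// clusterIDFromPath implements the /audit-webhook/{clusterID} contract:
// fixed prefix, exactly one non-empty trailing segment, no fallback path.
func clusterIDFromPath(path string) (string, error) {
	rest, ok := strings.CutPrefix(path, auditPathPrefix)
	if !ok || rest == "" || strings.Contains(rest, "/") {
		return "", errors.New("expected /audit-webhook/{clusterID}")
	}
	if !clusterIDPattern.MatchString(rest) {
		return "", errors.New("invalid clusterID")
	}
	return rest, nil
}

func main() {
	id, err := clusterIDFromPath("/audit-webhook/prod-eu-1")
	fmt.Println(id, err) // prod-eu-1 <nil>
}
```

A handler using this would map a parse error to `400`, matching the error-handling contract in section 8.3.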
- -### 4.2 Implement dedicated audit-server bootstrap - -Target file: [`cmd/main.go`](cmd/main.go:77) - -Add functions to: - -- build audit `http.ServeMux` -- register handler pattern on fixed `/audit-webhook/` -- initialize TLS cert watcher for audit cert files -- construct dedicated `http.Server` with explicit timeouts -- add graceful shutdown using manager context - -Implementation note: - -- audit-server should be started via manager runnable so lifecycle follows manager start and stop. - -### 4.3 Extend audit handler with path identity and guardrails - -Target file: [`internal/webhook/audit_handler.go`](internal/webhook/audit_handler.go:50) - -Add config fields: - -- max request body bytes - -Add behavior: - -- parse and validate cluster ID from request path -- reject invalid path with `400` -- limit body read size before decode -- include cluster ID in all processing logs -- include cluster ID metric attribute for [`metrics.AuditEventsReceivedTotal`](internal/webhook/audit_handler.go:172) - -### 4.4 Keep admission webhook behavior untouched - -Do not change current validating webhook registration semantics in [`cmd/main.go`](cmd/main.go:190) and chart registration in [`charts/gitops-reverser/templates/admission-webhook.yaml`](charts/gitops-reverser/templates/admission-webhook.yaml:16). - ---- - -## 5. Helm chart work items - -### 5.1 Values schema additions - -Target file: [`charts/gitops-reverser/values.yaml`](charts/gitops-reverser/values.yaml:65) - -Add explicit `auditIngress` block with: - -- `enabled` -- `port` -- `tls.certPath` -- `tls.certName` -- `tls.certKey` -- `tls.secretName` -- `timeouts.read` -- `timeouts.write` -- `timeouts.idle` -- `maxRequestBodyBytes` -- optional fixed `clusterIP` for Kind e2e compatibility - -Keep existing webhook block for admission as-is in first phase. 
- -### 5.2 Deployment args and ports - -Target file: [`charts/gitops-reverser/templates/deployment.yaml`](charts/gitops-reverser/templates/deployment.yaml:41) - -Add container args for audit-server flags and add second named container port for audit ingress. - -Mount dedicated audit TLS secret path in addition to admission cert mount. - -### 5.3 Dedicated audit service template - -Target file: [`charts/gitops-reverser/templates/services.yaml`](charts/gitops-reverser/templates/services.yaml:3) - -Add new service resource: - -- name suffix `-audit` -- port 443 to target audit container port -- optional fixed clusterIP setting -- selector consistent with leader-only routing requirement - -Keep current service for admission webhook unchanged. - -### 5.4 Dedicated audit certificate - -Target file: [`charts/gitops-reverser/templates/certificates.yaml`](charts/gitops-reverser/templates/certificates.yaml:16) - -Add second certificate resource for audit service DNS names and audit secret. - -Keep existing serving cert for admission webhook. - -### 5.5 Chart docs updates - -Target file: [`charts/gitops-reverser/README.md`](charts/gitops-reverser/README.md:177) - -Update: - -- config table with new `auditIngress` settings -- architecture section to show two service surfaces -- examples for per-cluster path URLs in audit kubeconfig -- fix stale defaults and names where currently inconsistent with values - ---- - -## 6. 
Kustomize and default manifests work items - -### 6.1 Add audit service and optional fixed IP patch - -Relevant files: - -- [`config/webhook/service.yaml`](config/webhook/service.yaml:1) -- [`config/default/webhook_service_fixed_ip_patch.yaml`](config/default/webhook_service_fixed_ip_patch.yaml:1) -- [`config/default/kustomization.yaml`](config/default/kustomization.yaml:44) - -Actions: - -- add separate audit service manifest -- add separate fixed IP patch for audit service for Kind startup constraints -- keep admission service patch independent - -### 6.2 Add audit certificate resource - -Relevant files: - -- [`config/certmanager/certificate-webhook.yaml`](config/certmanager/certificate-webhook.yaml:1) -- [`config/default/kustomization.yaml`](config/default/kustomization.yaml:126) - -Actions: - -- add second cert for audit service DNS names -- add replacement wiring for audit service name and namespace into cert DNS entries -- keep admission CA injection for validating webhook intact - -### 6.3 Add manager patch entries for audit-server args and mounts - -Relevant file: [`config/default/manager_webhook_patch.yaml`](config/default/manager_webhook_patch.yaml:1) - -Actions: - -- add audit-specific args -- add audit TLS volume and mount -- add audit container port - ---- - -## 7. Test plan updates required - -### 7.1 Unit tests - -#### Audit handler tests - -Target file: [`internal/webhook/audit_handler_test.go`](internal/webhook/audit_handler_test.go:50) - -Add table-driven cases for: - -- valid path with cluster ID -- missing cluster ID path -- newly seen cluster ID is accepted -- body larger than configured max bytes -- non-POST path handling remains unchanged - -### 7.2 Main bootstrap tests - -Add new tests for config parsing and audit-server bootstrap behavior. 
- -Suggested new file: - -- `cmd/main_audit_server_test.go` - -Cover: - -- default flag values -- custom audit flag parsing -- invalid timeout parsing behavior if introduced -- audit-server runnable registration - -### 7.3 E2E changes on Kind - -#### Keep Kind as the only cluster target - -Relevant files: - -- [`Makefile`](Makefile:69) -- [`test/e2e/kind/cluster-template.yaml`](test/e2e/kind/cluster-template.yaml:1) -- [`test/e2e/kind/audit/webhook-config.yaml`](test/e2e/kind/audit/webhook-config.yaml:1) -- [`test/e2e/e2e_test.go`](test/e2e/e2e_test.go:367) -- [`test/e2e/helpers.go`](test/e2e/helpers.go:159) - -Required changes: - -1. Audit webhook URL path must include cluster ID - - update to `/audit-webhook/` in Kind webhook config - -2. Audit service endpoint target - - point webhook config to the new dedicated audit service fixed IP - -3. Certificate readiness checks - - extend helper to wait for new audit cert secret in addition to existing secrets - -4. E2E validation assertions - - keep current audit metric checks - - add validation that cluster IDs from path are accepted, including newly seen IDs - -5. Optional strict TLS uplift for e2e - - phase 1 may keep insecure skip verify for bootstrap simplicity - - add plan note and TODO test for certificate-authority based verification once secret extraction is automated - -### 7.4 Smoke checks for service split - -Add e2e checks to verify: - -- admission service and audit service both exist -- audit service resolves to leader endpoint only -- audit ingress works on dedicated port and path - ---- - -## 8. Observability and operational requirements - -### 8.1 Logging - -Every accepted audit request must log: - -- cluster ID -- remote address -- request path -- event count -- processing outcome - -### 8.2 Metrics - -Extend audit metric labels to include cluster dimension. 
- -Ensure cardinality protection: - -- sanitize and constrain cluster ID format/length before labeling - -### 8.3 Error handling - -Audit-server must return: - -- `400` for malformed path or body -- `405` for method mismatch -- `500` only for internal processing errors - ---- - -## 9. Backward compatibility behavior - -Phase 1 behavior for cluster path migration: - -- no fallback to bare `/audit-webhook` endpoint -- configuration and docs must explicitly require `/audit-webhook/{clusterID}` - -Reason: - -- prevents ambiguous identity -- avoids silent insecure defaults - ---- - -## 10. Acceptance criteria for coding agent - -Implementation is complete only when all are true: - -1. Separate in-binary audit-server is active on separate port with separate service exposure. -2. Audit endpoint requires path-based cluster ID on fixed `/audit-webhook/{clusterID}` and accepts newly seen cluster IDs. -3. Admission webhook behavior remains unchanged. -4. Helm and kustomize manifests include independent audit TLS and service resources. -5. Kind e2e setup is updated and passing with new audit path contract. -6. Tests cover path validation and certificate readiness adjustments. -7. Documentation for setup and e2e reflects new service and URL contract. -8. Validation pipeline passes in this sequence: - - `make fmt` - - `make generate` - - `make manifests` - - `make vet` - - `make lint` - - `make test` - - `make test-e2e` - ---- - -## 11. 
Handoff checklist - -- Update docs for cluster audit config in [`docs/audit-setup/cluster/audit/webhook-config.yaml`](docs/audit-setup/cluster/audit/webhook-config.yaml:1) -- Update Kind docs in [`test/e2e/kind/README.md`](test/e2e/kind/README.md:1) -- Update chart docs in [`charts/gitops-reverser/README.md`](charts/gitops-reverser/README.md:1) -- Keep architecture alternatives in [`docs/design/audit-ingress-separate-server-options.md`](docs/design/audit-ingress-separate-server-options.md:1) and keep this document implementation-only - -This plan is ready to hand to a coding agent for direct execution. diff --git a/docs/design/audit-ingress-separate-server-options.md b/docs/design/audit-ingress-separate-server-options.md deleted file mode 100644 index 36720055..00000000 --- a/docs/design/audit-ingress-separate-server-options.md +++ /dev/null @@ -1,314 +0,0 @@ -# Audit ingress separation and cluster differentiation options - -## Status - -Design proposal only, updated with webhook ingress best-practice alignment. - -## Context - -Today both endpoints are served by the same controller-runtime admission-server listener and the same Service: - -- Admission webhook endpoint [`/process-validating-webhook`](cmd/main.go:191) -- Audit endpoint [`/audit-webhook`](cmd/main.go:204) -- Single leader-only Service on port [`9443`](charts/gitops-reverser/templates/services.yaml:18) targeting one admission-server port [`9443`](charts/gitops-reverser/values.yaml:70) - -This coupling limits independent exposure and independent TLS policy for incoming audit traffic. - -## Objectives - -- Move audit ingress to a separate audit-server listener and separate port. -- Allow explicit configuration of incoming TLS requirements for audit traffic. -- Support audit streaming from external or secondary clusters. -- Provide cluster differentiation options with trade-offs. -- Align ingress and webhook controls with production best practices. 
- -## Non-goals - -- No implementation sequencing in this document. -- No API schema changes in this document. - -## Design principles - -- Isolate admission reliability from audit ingest throughput concerns. -- Make TLS posture explicit and configurable per ingress surface. -- Keep path-based cluster identity as the initial model, with hardening controls. -- Separate admission and audit operational knobs. -- Keep defaults safe while still operable in day-1 deployments. - ---- - -## Best-practice baseline to adopt - -From [`docs/design/best-practices-webhook-ingress.md`](docs/design/best-practices-webhook-ingress.md), these controls are most relevant: - -- Listener controls: `readTimeout`, `writeTimeout`, `idleTimeout`, `maxRequestBodyBytes` -- TLS controls: dedicated certs, CA trust, hot-reload support -- Registration controls for admission: `failurePolicy`, `timeoutSeconds`, selectors, tight rules -- Runtime controls: concurrency limit, metrics, request ID logging -- Audit-specific controls: queue, backpressure policy, dedup hinting, and separate endpoint or deployment - ---- - -## Separation options for audit-server - -### Option A: Same pod, second HTTP server, separate Service and port - -Run a second server process inside the manager binary for audit-server ingest. - -Pros: - -- Lowest operational complexity. -- Independent port and TLS policy from admission. -- Minimal workload topology change. - -Cons: - -- Pod-level resource contention is still possible. -- Shared rollout and failure domain remains. - -### Option B: Same pod, sidecar audit gateway - -Add sidecar proxy for TLS termination and edge controls; manager receives internal traffic. - -Pros: - -- Mature L7 controls, rate limiting, and request-size guards. -- Useful when ingress policy complexity increases. - -Cons: - -- More config and operational surface in the same pod. - -### Option C: Separate Deployment for audit receiver - -Run audit receiver separately from controller manager. 
- -Pros: - -- Strongest fault and scaling isolation. -- Cleanest model for multi-cluster ingest growth. -- Independent SLO tuning for admission and audit. - -Cons: - -- Highest release and operations complexity. - -### Recommended architectural target - -- Near-term: Option A. -- Long-term scalable target: Option C. - -This matches your request for simplicity now, while preserving a clean path to stronger isolation later. - ---- - -## Incoming TLS requirements for audit ingress - -### TLS policy modes - -1. `strict` - - Source cluster verifies server cert chain and hostname. - - Production baseline. - -2. `pinned-ca` - - Like strict, with explicit pinned CA and dedicated audit cert lifecycle. - -3. `insecure` - - Dev and isolated tests only. - -4. `mtls` - - Server verification plus client cert auth. - - Strongest identity, highest operational overhead. - -### Default profile for your current preference - -- Separate audit endpoint and separate port. -- Default `strict`. -- `mtls` optional hardening profile. -- Forbid `insecure` outside explicit non-prod environments. - -### Config surface recommendation - -Use separate top-level config blocks to avoid mixing concerns: - -- `admissionWebhooks` -- `auditIngress` - -Suggested `auditIngress` fields: - -- `enabled` -- `listenAddress` -- `port` -- `pathPrefix` -- `tls.mode` -- `tls.secretName` -- `tls.clientCASecretName` -- `timeouts.read` -- `timeouts.write` -- `timeouts.idle` -- `maxRequestBodyBytes` -- `concurrency.maxInFlight` -- `queue.enabled` -- `queue.size` -- `queue.durability` -- `backpressure.mode` -- `identity.mode` -- `identity.allowedClusters` -- `network.allowedCIDRs` - ---- - -## Cluster differentiation options - -### Option 1: Path-based identity - -Examples: - -- `/audit-webhook/cluster-a` -- `/audit-webhook/cluster-b` - -Pros: - -- Very simple and native to audit webhook URL setup. -- No client cert lifecycle required. - -Cons: - -- Path value is not strong identity on its own. 
- -Required controls: - -- Strict allowlist for accepted cluster IDs. -- Reject unknown path IDs. -- Source network restrictions and logging. - -### Option 2: Header-based identity through trusted proxy - -Pros: - -- Centralized edge identity mapping. - -Cons: - -- Depends on trusted proxy boundary. - -### Option 3: Host or SNI based identity - -Pros: - -- Useful in DNS-centric ingress designs. - -Cons: - -- More cert and DNS complexity. - -### Option 4: mTLS subject-based identity - -Pros: - -- Strong cryptographic source identity. - -Cons: - -- Highest cert issuance and rotation burden. - -### Recommended cluster identity path - -Given your stated preference: - -1. Start with path-based identity. -2. Enforce strict allowlist. -3. Enforce network restrictions. -4. Keep mTLS available as a security profile switch. - ---- - -## Delivery and reliability notes for audit ingestion - -Audit ingest differs from admission webhook behavior and should assume: - -- bursts -- duplicates -- out-of-order arrival -- potential data loss under severe backpressure depending on source settings - -Minimum design controls: - -- bounded queue -- explicit full-queue behavior -- optional batching downstream -- dedup hint support using audit event metadata when available - ---- - -## Decision matrix - -| Dimension | Path identity no mTLS | Path identity optional mTLS | Mandatory mTLS | -|---|---|---|---| -| Operational simplicity | Highest | Medium | Lowest | -| Security assurance | Medium | Medium to high | Highest | -| Cluster onboarding friction | Lowest | Medium | Highest | -| Cert lifecycle burden | Low | Medium | High | -| Fit for your current goal | Best | Good next step | Too heavy initially | - ---- - -## Helm chart assessment: current state - -### What is good today - -- Leader-only webhook routing is already implemented in [`charts/gitops-reverser/templates/services.yaml`](charts/gitops-reverser/templates/services.yaml). 
-- TLS certificate automation exists through cert-manager in [`charts/gitops-reverser/templates/certificates.yaml`](charts/gitops-reverser/templates/certificates.yaml). -- Webhook cert mounting and runtime flags are wired in [`charts/gitops-reverser/templates/deployment.yaml`](charts/gitops-reverser/templates/deployment.yaml). -- Admission webhook settings expose useful controls in [`charts/gitops-reverser/values.yaml`](charts/gitops-reverser/values.yaml). - -### What should be improved for your target architecture - -1. Split config surfaces - - Current config mixes admission and audit under [`webhook`](charts/gitops-reverser/values.yaml:66). - - Introduce explicit `admissionWebhooks` and `auditIngress` blocks. - -2. Separate service exposure - - Today one leader-only service handles webhook traffic. - - Add dedicated audit service and port; keep admission service independent. - -3. Separate certificate lifecycle - - Current certificate SANs and secret are tied to leader-only service. - - Add separate cert and secret for audit service DNS names. - -4. Add ingress runtime safety knobs - - Missing explicit timeout and max-body controls for audit ingress. - - Add `maxInFlight` and queue settings for burst handling. - -5. Add audit identity and access controls - - Add path-prefix and allowlist settings in chart values. - - Add CIDR allowlist controls and corresponding policy templates. - -6. Improve docs consistency - - Chart README currently states defaults that differ from actual values in at least one place. - - Align values table and examples to the current chart behavior. - -### Risk notes in current chart - -- A single webhook port and service remains a coupling point for admission and audit traffic. -- No first-class audit ingress queue or backpressure knobs are represented in chart values. -- Security posture for cross-cluster audit traffic is not explicit as a separate concern. 
-
----
-
-## Reference architecture sketch
-
-```mermaid
-graph TD
-    A[Source cluster A api server] --> P["/audit-webhook/cluster-a"]
-    B[Source cluster B api server] --> Q["/audit-webhook/cluster-b"]
-    P --> S[audit-server separate port]
-    Q --> S
-    S --> V[Cluster ID allowlist validator]
-    V --> R[Queue and backpressure controls]
-    R --> E[Event pipeline]
-```
-
-## Final position
-
-A separate audit-server listener on a separate port with configurable incoming TLS policy is the correct direction. For cluster differentiation, path-based identity is a practical default when it is combined with strict allowlist and network restrictions. The chart should be evolved to treat audit ingress as its own product surface with dedicated TLS, exposure, identity, and reliability controls.
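To make the bounded-queue and explicit full-queue controls concrete, the ingest buffer behind the validator could be a non-blocking bounded queue. A sketch under assumed names; the event payload type, capacity, and drop policy are all illustrative:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// auditQueue is a bounded ingest buffer with an explicit full-queue
// policy: Enqueue never blocks the HTTP path, it drops and counts instead.
type auditQueue struct {
	ch      chan []byte
	dropped atomic.Int64
}

func newAuditQueue(capacity int) *auditQueue {
	return &auditQueue{ch: make(chan []byte, capacity)}
}

// Enqueue reports whether the event was accepted; false means the queue
// was full and the event was dropped (and counted for a drop metric).
func (q *auditQueue) Enqueue(event []byte) bool {
	select {
	case q.ch <- event:
		return true
	default:
		q.dropped.Add(1)
		return false
	}
}

func main() {
	q := newAuditQueue(1)
	fmt.Println(q.Enqueue([]byte("e1")), q.Enqueue([]byte("e2")), q.dropped.Load())
	// Prints: true false 1
}
```

A drop-and-count default matches the reliability notes above: sources must already tolerate loss under severe backpressure, so the receiver's job is to make that loss bounded, explicit, and measurable.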