Skip to content

feat: add two-phase extension upgrade with spec.schemaVersion#328

Open
WentingWu666666 wants to merge 22 commits intodocumentdb:mainfrom
WentingWu666666:developer/two-phase-extension-upgrade
Open

feat: add two-phase extension upgrade with spec.schemaVersion#328
WentingWu666666 wants to merge 22 commits intodocumentdb:mainfrom
WentingWu666666:developer/two-phase-extension-upgrade

Conversation

@WentingWu666666
Copy link
Copy Markdown
Collaborator

Summary

Add spec.schemaVersion field to DocumentDBSpec to decouple binary (image) upgrades from schema (ALTER EXTENSION) upgrades. This provides a rollback-safe window between deploying a new binary and committing the schema change.

Closes #271

Three Modes

spec.schemaVersion Behavior Use Case
Not set (default) Two-phase: schema stays at current version until user explicitly sets schemaVersion Production: manual control, rollback safety
"auto" Auto-finalize: schema updates to match binary version automatically Dev/test: simple, no rollback safety
Explicit version (e.g. "0.112.0") Schema updates to exactly that version (must be <= binary) Pin to specific version

User Flow (Two-Phase)

# Step 1: Update binary only (rollback-safe)
spec:
  documentDBVersion: "0.112.0"
  # schemaVersion: not set -> schema stays at old version

# Step 2: After validation, finalize schema
spec:
  documentDBVersion: "0.112.0"
  schemaVersion: "0.112.0"   # triggers ALTER EXTENSION UPDATE

Changes

  • api/preview/documentdb_types.go: Add SchemaVersion to DocumentDBSpec with validation pattern
  • internal/controller/documentdb_controller.go: Add determineSchemaTarget(), modify upgradeDocumentDBIfNeeded() to gate ALTER EXTENSION on spec.schemaVersion
  • internal/utils/util.go: Add SemverToExtensionVersion() inverse conversion
  • CRDs: Regenerated in both config/crd/ and helm-chart/crds/
  • Tests: 5 new tests for two-phase modes + 3 updated existing tests
  • Docs: Updated upgrades.md with Schema Version Control section

Backward Compatibility

Breaking change: Default behavior changes from auto-update to two-phase. Existing users upgrading to this operator version will need to set schemaVersion: "auto" to restore previous behavior, or adopt two-phase upgrades. This is intentional -- safe by default for a database operator.

@WentingWu666666 WentingWu666666 force-pushed the developer/two-phase-extension-upgrade branch 4 times, most recently from c0d35b1 to 525700b Compare March 26, 2026 18:08
wentingwu000 and others added 18 commits March 31, 2026 12:10
Add spec.schemaVersion field to DocumentDBSpec to decouple binary (image)
upgrades from schema (ALTER EXTENSION) upgrades. This provides a rollback-safe
window between deploying a new binary and committing the schema change.

Three modes:
- Empty (default): Two-phase mode. Schema stays at current version until user
  explicitly sets schemaVersion. Safe by default for production.
- "auto": Auto-finalize. Schema updates to match binary version automatically.
  Simple mode for development and testing.
- Explicit version: Schema updates to exactly that version. Must be <= binary.

Changes:
- api/preview/documentdb_types.go: Add SchemaVersion to DocumentDBSpec
- internal/controller/documentdb_controller.go: Add determineSchemaTarget()
  function, modify upgradeDocumentDBIfNeeded() to gate ALTER EXTENSION on
  spec.schemaVersion value
- internal/utils/util.go: Add SemverToExtensionVersion() inverse conversion
- Regenerated CRDs (config/crd + helm chart)
- Added unit tests for all three modes and edge cases
- Created public upgrade documentation (docs/operator-public-documentation/)
- Added Upgrading DocumentDB page to mkdocs.yml navigation

Closes documentdb#271

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Add a ValidatingWebhookConfiguration that enforces:
- schemaVersion must be <= binary version (on create and update)
- Image rollback below installed schema version is blocked (on update)

Components added:
- internal/webhook/documentdb_webhook.go: ValidateCreate/ValidateUpdate handlers
- internal/webhook/documentdb_webhook_test.go: 18 unit tests
- Helm template 10_documentdb_webhook.yaml: Issuer, Certificate, Service,
  ValidatingWebhookConfiguration with cert-manager CA injection
- Updated 09_documentdb_operator.yaml: webhook port, cert volume mount, args
- Updated cmd/main.go: webhook registration

The webhook runs inside the existing operator process on port 9443 using
cert-manager for TLS (same pattern as the sidecar injector). failurePolicy
is set to Fail for database safety. The controller retains defense-in-depth
checks as a secondary safety net.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Restructure webhook to use two validation function slices following the
CNPG ClusterCustomValidator pattern:

- validate(db)  spec-level rules run on both create and update
  Contains: validateSchemaVersionNotExceedsBinary
- validateChanges(new, old)  update-only rules comparing old vs new
  Contains: validateImageRollback

This makes it easy to add new validation rules  just append a function
to the appropriate slice. Each validator has a consistent signature
returning field.ErrorList, and errors from all validators are aggregated.

Also adds var _ webhook.CustomValidator = &DocumentDBValidator{} compile
check and uses apierrors.NewInvalid for proper Kubernetes error format.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
- Remove Step 1b (image rollback blocking) from controller  webhook handles it
- Simplify determineSchemaTarget: keep lightweight defense-in-depth guard
- Rename Pg-suffixed variables to full names (schemaExtensionVersion, etc.)
- Refactor webhook tests to Ginkgo/Gomega (matching CNPG and controller patterns)
- Add suite_test.go for webhook package

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
- Reframe upgrade types as Operator (control plane) vs DocumentDB (data plane)
- Explain documentDBVersion vs schemaVersion relationship clearly
- Reorganize data plane upgrade as step-by-step walkthrough with production/dev tabs
- Add Multi-Region Upgrades section with coordination guidance
- Move Advanced Image Overrides to its own section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
CNPG default is in-place restart. Switchover promotes a replica first,
then restarts the old primary as replica, minimizing write downtime.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
- Move CNPG callout from overview to operator upgrade section
- Fix overview table header and clarify optional schemaVersion
- Reword cluster upgrade intro for clarity
- Reorder multi-region steps: backup first, binary, then schema
- Minor wording improvements throughout

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Add Steps 5-8 to the upgrade & rollback CI workflow:
- Step 5: Re-upgrade binary after rollback tests
- Step 6: Schema finalization via spec.schemaVersion
- Step 7: Webhook rejection for rollback below schema
- Step 8: Webhook rejection for schema exceeding binary

Also strengthen Step 2 to assert schema version is unchanged
after binary upgrade (validates two-phase default behavior).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
@WentingWu666666 WentingWu666666 force-pushed the developer/two-phase-extension-upgrade branch from 6c308c6 to 1ba5f67 Compare March 31, 2026 16:11
Add tests for uncovered error branches:
- Webhook: type assertion errors in ValidateCreate/ValidateUpdate
- Webhook: empty binary version in validateImageRollback
- Webhook: unparseable versions in validateSchemaVersionNotExceedsBinary
- Webhook: unparseable versions in validateImageRollback
- Controller: unparseable binary version in determineSchemaTarget

Patch coverage: 86.1% -> 91.7% (99/108 statements)

Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
@WentingWu666666 WentingWu666666 marked this pull request as ready for review March 31, 2026 17:00
Copilot AI review requested due to automatic review settings March 31, 2026 17:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds spec.schemaVersion to decouple DocumentDB binary (image) upgrades from irreversible schema upgrades (ALTER EXTENSION ... UPDATE), enabling a two-phase, rollback-safer upgrade flow. This is implemented end-to-end across API/CRDs, controller logic, admission validation, tests, docs, Helm, and CI.

Changes:

  • Introduces spec.schemaVersion with modes: default two-phase (manual finalize), "auto" (auto-finalize), and explicit pin (<= binary).
  • Updates controller upgrade logic to gate ALTER EXTENSION based on spec.schemaVersion, and adds conversion helpers for version formats.
  • Adds a validating webhook to enforce schema<=binary and block image rollbacks below installed schema; updates Helm manifests and CI/docs accordingly.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
operator/src/internal/webhook/suite_test.go Adds Ginkgo suite entrypoint for webhook tests.
operator/src/internal/webhook/documentdb_webhook.go Implements validating webhook for schema/binary constraints and rollback blocking.
operator/src/internal/webhook/documentdb_webhook_test.go Adds unit tests for webhook validation behaviors.
operator/src/internal/utils/util.go Adds SemverToExtensionVersion() conversion helper.
operator/src/internal/utils/util_test.go Adds unit tests for semver→extension version conversion.
operator/src/internal/controller/documentdb_controller.go Adds determineSchemaTarget() and gates ALTER EXTENSION on spec.schemaVersion.
operator/src/internal/controller/documentdb_controller_test.go Adds/updates controller tests for two-phase schema upgrade behaviors.
operator/src/internal/cnpg/cnpg_cluster.go Sets CNPG PrimaryUpdateMethod to Switchover.
operator/src/config/crd/bases/documentdb.io_dbs.yaml Regenerates CRD with new spec.schemaVersion field and description.
operator/src/cmd/main.go Registers the validating webhook with the controller manager.
operator/src/api/preview/documentdb_types.go Adds SchemaVersion to DocumentDBSpec with validation pattern and docs.
operator/documentdb-helm-chart/templates/10_documentdb_webhook.yaml Adds cert-manager TLS + Service + ValidatingWebhookConfiguration resources.
operator/documentdb-helm-chart/templates/09_documentdb_operator.yaml Mounts webhook TLS secret and configures webhook cert path/port in the operator deployment.
operator/documentdb-helm-chart/crds/documentdb.io_dbs.yaml Regenerates Helm-packaged CRD with spec.schemaVersion.
docs/operator-public-documentation/preview/operations/upgrades.md Documents two-phase schema upgrades and rollback rules.
.github/workflows/test-upgrade-and-rollback.yml Extends upgrade/rollback CI flow to validate two-phase behavior + webhook enforcement.

- Webhook: reject (not silently allow) when version comparison fails
  with explicit schemaVersion set or during rollback validation
- resolveBinaryVersion: handle digest-only image references (@sha256:)
  and strip architecture suffixes from tags (e.g., 0.112.0-amd64)
- CI: extract semver from image tag, stripping arch suffix
- Controller: reduce SchemaUpdateAvailable event spam by only emitting
  when schema upgrade is newly detected, and downgrade log to V(1)
- Add tests for new extractSemver helper, digest refs, and arch suffixes

Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
When spec.schemaVersion is set to an explicit version but neither
spec.documentDBVersion nor spec.documentDBImage is specified, the
webhook cannot validate the schemaVersion <= binary invariant.

Previously this was silently allowed (controller handled it at
runtime). Now the webhook rejects with a clear error message
telling the user to also set documentDBVersion or documentDBImage.

Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Copy link
Copy Markdown
Collaborator

@xgerman xgerman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #328feat: add two-phase extension upgrade with spec.schemaVersion

Files changed: 16 (+1761, -65) | Closes: #271


Overview

The two-phase approach is the right design — "safe by default" for a database operator. Architecture is sound. The webhook, controller changes, and test coverage are well-structured. Below are findings organized by severity.


🔴 Critical (1)

C1. failurePolicy: Fail blocks all DocumentDB operations if cert-manager is unhealthy

File: operator/documentdb-helm-chart/templates/10_documentdb_webhook.yaml, line 77
File: operator/src/internal/webhook/documentdb_webhook.go, line 25

The webhook uses failurePolicy: Fail. If cert-manager has not issued the certificate yet (e.g., during initial Helm install, cert-manager pod restart, or quota issues), the webhook Service has no valid TLS cert → the API server cannot reach it → all DocumentDB CREATE and UPDATE operations are rejected cluster-wide.

This is particularly dangerous on fresh installs: the Helm chart creates the ValidatingWebhookConfiguration, Certificate, and DocumentDB CRD in one release — but cert-manager Certificate issuance is asynchronous. There is a race window where the webhook is registered but not yet serving.

Recommendation: Either:

  • (a) Add a values.yaml toggle (webhook.enabled: true / webhook.failurePolicy: Fail|Ignore) with a default of Ignore during preview, switching to Fail at GA, OR
  • (b) Document the cert-manager dependency explicitly in the upgrade guide and README prerequisites, AND add a Helm pre-install/pre-upgrade hook that waits for the Certificate to be Ready before proceeding, OR
  • (c) At minimum, add a comment in the template explaining why Fail is chosen and the cert-manager timing dependency.

🟠 Major (5)

M1. Breaking change needs CHANGELOG entry and migration section

The PR body acknowledges this is a breaking change — default behavior shifts from auto-update to two-phase. However:

  • No CHANGELOG.md entry is included
  • The upgrade docs mention "safe-by-default" but do not have a dedicated "Migrating from pre-0.X to 0.X" section
  • Existing users who run helm upgrade will find their schema version frozen

Recommendation: Add a CHANGELOG entry under ## [Unreleased] with a ### Breaking Changes section. Add a migration callout box in upgrades.md explaining: "If you are upgrading from operator version < X.Y.Z, add spec.schemaVersion: "auto" to your DocumentDB resources to preserve previous behavior."

M2. PrimaryUpdateMethod: cnpgv1.PrimaryUpdateMethodSwitchover is unrelated to schema versioning

File: operator/src/internal/cnpg/cnpg_cluster.go, line 885

This sets the CNPG cluster primary update strategy to switchover (vs. restart). While likely a good change, it is orthogonal to two-phase schema upgrades and should be in a separate PR or at minimum called out in the PR description under a separate "Also includes" section.

Risk: If this PR needs to be reverted for a schema-version bug, the switchover change gets reverted too.

M3. SQL construction uses fmt.Sprintf — add safety comment

File: operator/src/internal/controller/documentdb_controller.go, line ~1055

return targetPgVersion, fmt.Sprintf("ALTER EXTENSION documentdb UPDATE TO '%s'", targetPgVersion)

The input is CRD-validated (pattern: ^(auto|[0-9]+\.[0-9]+\.[0-9]+)?$) and then transformed by SemverToExtensionVersion (digits + dots + hyphen only), so SQL injection risk is effectively zero. However, this pattern (string formatting SQL) will get flagged by security scanners and could concern future contributors.

Recommendation: Add a comment explaining why string formatting is safe here:

// Safe: targetPgVersion is derived from CRD-validated spec.schemaVersion
// (pattern: ^(auto|[0-9]+\.[0-9]+\.[0-9]+)?$) via SemverToExtensionVersion.
// ALTER EXTENSION ... TO does not support parameterized queries in PostgreSQL.

M4. resolveBinaryVersion returns empty for digest-only images without documentDBVersion

File: operator/src/internal/webhook/documentdb_webhook.go, lines ~191-207

if strings.Contains(ref, "@sha256:") {
    return db.Spec.DocumentDBVersion
}

If a user specifies documentDBImage: "registry.io/image@sha256:abc..." without setting documentDBVersion, the function returns "". The webhook then compares "" against the schema version, which may produce confusing validation errors.

Recommendation: Add a defensive check in the webhook: if resolveBinaryVersion returns "", reject with a clear message like "cannot determine binary version from image reference; set spec.documentDBVersion explicitly when using digest-only image references."

M5. Webhook test suite uses full envtest — consider lighter approach

File: operator/src/internal/webhook/suite_test.go

The test suite sets up a full envtest environment (API server + etcd) for webhook tests that are primarily unit-testing admission logic. This adds significant test execution time and CI resource usage.

Recommendation: Consider whether the admission handler tests truly need envtest, or if they can use a fake client. The extractSemver, resolveBinaryVersion, and validation functions are pure logic and should be tested without envtest.


🟡 Minor (6)

m1. Event emission guard may fire confusingly on brand-new clusters

File: operator/src/internal/controller/documentdb_controller.go, line ~1035

if r.Recorder != nil && documentdb.Status.SchemaVersion != util.ExtensionVersionToSemver(installedVersion) {

On initial cluster creation, status.schemaVersion is empty and installedVersion is the just-installed version. The condition "" != "0.109.0" is true, so the event fires — creating a confusing "Schema update available" event on brand-new clusters where there is nothing to upgrade.

Recommendation: Add a check: only emit the event when installedVersion != binaryVersion (i.e., when the binary is actually ahead of the schema).

m2. extractSemver does not validate that parts[0] and parts[1] are numeric

File: operator/src/internal/webhook/documentdb_webhook.go, lines ~215-231

The function validates that parts[2] starts with digits, but parts[0] and parts[1] could be non-numeric (e.g., tag "abc.def.123" would return "abc.def.123"). While unlikely in practice, it is a gap in defensive validation.

Recommendation: Add a quick strconv.Atoi check for parts[0] and parts[1], or use a semver regex.

m3. Two-phase mode test does not verify the "SchemaUpdateAvailable" event

File: operator/src/internal/controller/documentdb_controller_test.go

The test "should skip schema upgrade when schemaVersion is not set (two-phase mode)" verifies no SQL is called and status is correct, but does not assert that a SchemaUpdateAvailable event was recorded. This is a key user-visible behavior of two-phase mode.

Recommendation: Add an assertion on the fake recorder to verify the event message and reason.

m4. Helm template hardcodes webhook service name

File: operator/documentdb-helm-chart/templates/10_documentdb_webhook.yaml

The service name documentdb-webhook-service is hardcoded. Other templates in the chart use {{ .Release.Name }} for naming. Consider using a consistent naming pattern (e.g., {{ .Release.Name }}-webhook) for multi-instance deployments.

m5. SemverToExtensionVersion does not validate input format

File: operator/src/internal/utils/util.go

The function uses strings.LastIndex(version, ".") to replace the last dot with a hyphen. If the input does not contain a dot (e.g., "auto" or garbage input), LastIndex returns -1 and the function returns the input unchanged. While the caller validates input, a defensive check would improve robustness.

Recommendation: Add input validation or document the expected format in the function comment.

m6. Webhook Go file kubebuilder marker duplicates Helm template

File: operator/src/internal/webhook/documentdb_webhook.go, line 25

The //+kubebuilder:webhook marker generates a kustomize webhook config, but the actual deployment uses the Helm template (10_documentdb_webhook.yaml). The marker serves as documentation but could lead to confusion about the source of truth.

Recommendation: Add a comment clarifying that the Helm template is the authoritative webhook configuration.


🟢 Nitpick (4)

n1. upgrades.md uses future tense in multiple places

Per Microsoft Style Guide, prefer present tense:

  • "the operator will wait""the operator waits"
  • "you will need""you need"

n2. Content tabs use inconsistent heading levels

The === "Two-phase" tabs contain #### headings inside, which may not render in mkdocs nav. Consider bold text instead of headings inside tabs.

n3. CI workflow steps have redundant echo statements

File: .github/workflows/test-upgrade-and-rollback.yml

Steps 5-8 have descriptive names, but the echo statements inside repeat the step name. The redundant echo "=== Step N: ..." lines can be removed since GitHub Actions shows the step name.

n4. admissionReviewVersions only lists v1

Acceptable for K8s 1.30+ target per project compatibility matrix. No action needed unless earlier K8s versions need support.


✅ Positive Feedback

  • Two-phase design is excellent. This is exactly the right pattern for a database operator — schema changes are irreversible, so decoupling them from binary upgrades provides a critical safety net.
  • Test coverage is thorough. 6 new controller tests + 20+ webhook tests cover all three modes and edge cases. Existing tests were updated to add SchemaVersion: "auto" to maintain backward compatibility — good attention to detail.
  • Webhook implementation is clean. resolveBinaryVersion handles multiple image reference formats (tag, digest, arch suffix) gracefully. The extractSemver function is well-focused.
  • Documentation is well-structured. The upgrades.md rewrite with content tabs for Production/Rolling/Development modes is excellent UX.
  • PR description is exemplary. Clear summary, mode table, user flow example, and explicit backward compatibility callout.

Summary

Severity Count Action
🔴 Critical 1 Address webhook failurePolicy race with cert-manager
🟠 Major 5 CHANGELOG, PrimaryUpdateMethod separation, SQL comment, digest fallback, test weight
�� Minor 6 Event guard, validation gaps, naming consistency
🟢 Nitpick 4 Documentation style

Overall assessment: Well-designed feature with high code quality and solid test coverage. The main concerns are operational (C1: cert-manager timing) and communication (M1: breaking change). Recommend addressing C1 and M1 before merge; remaining items can be follow-ups.

/request-changes

- C1: Add readiness/liveness/startup probes + optional cert secret to
  operator deployment (CNPG pattern). Keep failurePolicy: Fail with
  webhook StartedChecker() readyz gate.
- M1: Add [Unreleased] CHANGELOG entry + migration callout in upgrades.md
- M3: Add SQL safety comment for ALTER EXTENSION fmt.Sprintf
- m1: Fix event guard to check binaryVersion != installedVersion
  (avoids firing on new clusters with empty status)
- m2: Add strconv.Atoi validation for major/minor in extractSemver
- m3: Add SchemaUpdateAvailable event assertion to two-phase test
- m6: Add comment clarifying kubebuilder marker vs Helm template
- n1: Verified docs already use present tense  no changes needed
- Added single-instance comments to webhook Service and config

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Copy link
Copy Markdown
Collaborator Author

@WentingWu666666 WentingWu666666 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review! Here's how I addressed each finding in commit 9c4822b:

Fixed

ID Finding Resolution
C1 FailurePolicy: Fail blocks ops if cert-manager unhealthy Kept Fail (aligns with CNPG pattern). Added readiness/liveness/startup probes + optional: true on cert secret + webhookServer.StartedChecker() readyz gate. Pod stays out of Service endpoints until TLS cert is loaded.
M1 Breaking change needs CHANGELOG Added [Unreleased] section with feature + breaking change entries. Added migration callout in upgrades.md.
M3 SQL construction safety Added comment explaining why fmt.Sprintf is safe - input is CRD-regex-validated + webhook-parsed, contains only digits/dots/hyphens.
M4 Digest fallback returns empty version Already fixed in prior commit (288f5c4) webhook rejects explicit schemaVersion when binary version is unresolvable.
m1 Event fires on new clusters Changed guard to binaryVersion != installedVersion instead of status mismatch.
m2 ExtractSemver missing numeric validation Added strconv.Atoi checks for major and minor parts.
m3 No event assertion in test Added SchemaUpdateAvailable event assertion to two-phase mode test.
m6 Kubebuilder marker confusion Added comment clarifying Helm template is authoritative for deployments; marker is for make run.
n1 Future tense in docs Verified docs already use present tense throughout - no changes needed.

Won't fix (with rationale)

ID Finding Rationale
M2 PrimaryUpdateMethod in separate PR Switchover is directly related - it enables rolling restarts without downtime during binary phase. Keeping it in this PR.
M5 Webhook tests should use envtest Incorrect premise - suite_test.go is a plain Ginkgo suite with no envtest/API server/etcd. The webhook tests are pure unit tests.
m4 Hardcoded webhook service name Acceptable for current single-instance design. Added comments noting what to template if multi-instance is needed in the future.
m5 SemverToExtensionVersion input validation Callers already validate via CRD regex pattern. Adding redundant validation inside would violate single-responsibility.
n2 Heading levels in tabs mkdocs-material handles tab-scoped headings correctly.
n3 Redundant echo in CI Useful for CI log readability when debugging step boundaries.
n4 admissionReviewVersions should include v1beta1 Our minimum is K8s 1.30+ where v1 is the standard. v1beta1 was removed in K8s 1.22.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement safety-gap pattern for rollback safety by decoupling image update from schema update

4 participants