Open
22 commits
47c25e1
feat: add two-phase extension upgrade with spec.schemaVersion
wentingwu000 Mar 26, 2026
85b99e9
feat: add validating webhook for DocumentDB version safety
wentingwu000 Mar 26, 2026
af34418
refactor: adopt CNPG validation function list pattern
wentingwu000 Mar 26, 2026
b936f9c
refactor: simplify controller by moving validations to webhook
wentingwu000 Mar 26, 2026
ddf3f9d
docs: fix release tag format in CRD upgrade command (remove v prefix)
wentingwu000 Mar 27, 2026
1d859d6
docs: improve upgrade documentation with control/data plane framing
wentingwu000 Mar 27, 2026
9cf200d
docs: replace control/data plane with operator/clusters framing
wentingwu000 Mar 27, 2026
f23885e
docs: add rolling safety gap upgrade pattern to walkthrough
wentingwu000 Mar 27, 2026
ab49d4c
docs: use sequential versions in rolling safety gap example
wentingwu000 Mar 27, 2026
761f081
docs: clarify rollback rules schema is permanent, version floor enfo…
wentingwu000 Mar 27, 2026
71d48ac
feat: set primaryUpdateMethod to switchover for safer rolling updates
wentingwu000 Mar 28, 2026
9b11bd5
docs: improve rollback tab titles with clearer scenarios
wentingwu000 Mar 28, 2026
291459f
docs: reframe multi-region upgrades as binary-first-then-schema
wentingwu000 Mar 28, 2026
0e2fef6
docs: remove How It Works Internally section from upgrade docs
wentingwu000 Mar 31, 2026
e8b4f28
docs: improve upgrade docs fluency and structure
wentingwu000 Mar 31, 2026
6965e8b
docs: add links to CHANGELOG and release notes
wentingwu000 Mar 31, 2026
488ca6d
docs: add health check commands to pre-upgrade checklist
wentingwu000 Mar 31, 2026
1ba5f67
ci: add two-phase schema upgrade and webhook validation tests
wentingwu000 Mar 31, 2026
2173a84
test: improve patch coverage to meet 90% threshold
wentingwu000 Mar 31, 2026
2e87863
fix: address PR review comments
wentingwu000 Mar 31, 2026
288f5c4
fix: reject explicit schemaVersion when binary version unknown
wentingwu000 Mar 31, 2026
9c4822b
fix: address xgerman's PR review (C1, M1, M3, m1-m3, m6, n1)
wentingwu000 Apr 2, 2026
269 changes: 267 additions & 2 deletions .github/workflows/test-upgrade-and-rollback.yml
@@ -623,13 +623,18 @@ jobs:
fi
echo "✓ Gateway image upgraded to $NEW_GATEWAY"

- # Verify DocumentDB schema version updated
+ # Verify DocumentDB schema version unchanged (two-phase default: schema stays at old version)
VERSION_AFTER=$(kubectl get documentdb $DB_NAME -n $DB_NS -o jsonpath='{.status.schemaVersion}')
echo "DocumentDB schema version after upgrade: $VERSION_AFTER"
if [[ -z "$VERSION_AFTER" ]]; then
echo "❌ status.schemaVersion is empty after upgrade"
exit 1
fi
if [[ "$VERSION_AFTER" != "$VERSION_BEFORE" ]]; then
echo "❌ Schema version changed from $VERSION_BEFORE to $VERSION_AFTER — expected unchanged (two-phase default)"
exit 1
fi
echo "✓ Schema version unchanged after binary upgrade: $VERSION_AFTER (two-phase default validated)"

# Verify status fields
echo ""
@@ -979,7 +984,263 @@ jobs:
fi
rm -f /tmp/pf_output.log

- name: Collect comprehensive logs on failure
# ============================================================
# Steps 5-8: Two-phase schema upgrade and webhook validation
# ============================================================

- name: "Step 5: Re-upgrade binary (setup for schema tests)"
run: |
echo "=== Step 5: Re-upgrade Binary ==="
echo "Re-upgrading extension and gateway images to new version for schema tests..."

NEW_EXTENSION="${{ env.DOCUMENTDB_IMAGE }}"
NEW_GATEWAY="${{ env.GATEWAY_IMAGE }}"

# Patch both images back to new version
echo "Patching images to new versions..."
echo " Extension: → $NEW_EXTENSION"
echo " Gateway: → $NEW_GATEWAY"
kubectl patch documentdb $DB_NAME -n $DB_NS --type='merge' \
-p "{\"spec\":{\"documentDBImage\":\"$NEW_EXTENSION\",\"gatewayImage\":\"$NEW_GATEWAY\"}}"

echo "Waiting for cluster to be healthy with new images..."
timeout 600 bash -c '
while true; do
DB_STATUS=$(kubectl get documentdb "$1" -n "$2" -o jsonpath="{.status.status}" 2>/dev/null)
CLUSTER_STATUS=$(kubectl get cluster "$1" -n "$2" -o jsonpath="{.status.phase}" 2>/dev/null)
echo "DocumentDB status: $DB_STATUS, CNPG phase: $CLUSTER_STATUS"
if [[ "$DB_STATUS" == "Cluster in healthy state" && "$CLUSTER_STATUS" == "Cluster in healthy state" ]]; then
HEALTHY_PODS=$(kubectl get cluster "$1" -n "$2" -o jsonpath="{.status.instancesStatus.healthy}" 2>/dev/null | jq length 2>/dev/null || echo "0")
if [[ "$HEALTHY_PODS" -ge "1" ]]; then
POD_IMAGES=$(kubectl get pods -n "$2" -l cnpg.io/cluster="$1" -o jsonpath="{.items[*].spec.volumes[*].image.reference}" 2>/dev/null)
if echo "$POD_IMAGES" | grep -qF -- "$3"; then
echo "✓ Cluster healthy with $HEALTHY_PODS pods running new images"
break
else
echo "Pods not yet running new extension image, waiting..."
fi
fi
fi
sleep 10
done
' -- "$DB_NAME" "$DB_NS" "$NEW_EXTENSION"

# Record the current schema version as the baseline for the schema tests
VERSION_CURRENT=$(kubectl get documentdb $DB_NAME -n $DB_NS -o jsonpath='{.status.schemaVersion}')
echo "Schema version after re-upgrade: $VERSION_CURRENT"
echo "SCHEMA_BASELINE=$VERSION_CURRENT" >> "$GITHUB_ENV"

echo ""
echo "✅ Step 5 passed: Binary re-upgraded for schema tests"
echo " Extension: $NEW_EXTENSION"
echo " Gateway: $NEW_GATEWAY"
echo " Schema: $VERSION_CURRENT (baseline)"

- name: Setup port forwarding for re-upgrade verification
uses: ./.github/actions/setup-port-forwarding
with:
namespace: ${{ env.DB_NS }}
cluster-name: ${{ env.DB_NAME }}
port: ${{ env.DB_PORT }}
architecture: ${{ matrix.architecture }}
test-type: 'comprehensive'

- name: Verify data persistence after re-upgrade
run: |
echo "=== Data Persistence: Verifying after re-upgrade ==="
mongosh 127.0.0.1:$DB_PORT \
-u "$DB_USERNAME" \
-p "$DB_PASSWORD" \
--authenticationMechanism SCRAM-SHA-256 \
--tls \
--tlsAllowInvalidCertificates \
--eval '
db = db.getSiblingDB("upgrade_test_db");
var count = db.test_collection.countDocuments();
assert(count === 2, "Expected 2 documents but found " + count + " after re-upgrade");
print("✓ All " + count + " documents persisted through re-upgrade");
'
echo "✓ Data persistence verified after re-upgrade"

- name: Cleanup port forwarding after re-upgrade verification
if: always()
run: |
if [ -f /tmp/pf_pid ]; then
PF_PID=$(cat /tmp/pf_pid)
kill $PF_PID 2>/dev/null || true
rm -f /tmp/pf_pid
fi
rm -f /tmp/pf_output.log

- name: "Step 6: Schema Finalization (two-phase commit)"
run: |
echo "=== Step 6: Schema Finalization ==="
echo "Setting spec.schemaVersion to finalize the schema migration..."

NEW_EXTENSION="${{ env.DOCUMENTDB_IMAGE }}"
SCHEMA_BASELINE="${{ env.SCHEMA_BASELINE }}"

# Determine the new schema version from the new extension image tag
# Strip any architecture suffix (e.g., "0.112.0-amd64" → "0.112.0")
RAW_TAG=$(echo "$NEW_EXTENSION" | sed 's/.*://')
NEW_SCHEMA_VERSION=$(echo "$RAW_TAG" | grep -oP '^\d+\.\d+\.\d+')
if [[ -z "$NEW_SCHEMA_VERSION" ]]; then
echo "✗ Could not extract semver from image tag: $RAW_TAG"
exit 1
fi
echo "Baseline schema version: $SCHEMA_BASELINE"
echo "Target schema version: $NEW_SCHEMA_VERSION"

# Set spec.schemaVersion to trigger ALTER EXTENSION UPDATE
kubectl patch documentdb $DB_NAME -n $DB_NS --type='merge' \
-p "{\"spec\":{\"schemaVersion\":\"$NEW_SCHEMA_VERSION\"}}"

echo "Waiting for schema version to update..."
timeout 300 bash -c '
while true; do
STATUS_SCHEMA=$(kubectl get documentdb "$1" -n "$2" -o jsonpath="{.status.schemaVersion}" 2>/dev/null)
DB_STATUS=$(kubectl get documentdb "$1" -n "$2" -o jsonpath="{.status.status}" 2>/dev/null)
echo "status.schemaVersion: $STATUS_SCHEMA, status: $DB_STATUS"
if [[ "$STATUS_SCHEMA" == "$3" && "$DB_STATUS" == "Cluster in healthy state" ]]; then
echo "✓ Schema version updated to $STATUS_SCHEMA"
break
fi
sleep 10
done
' -- "$DB_NAME" "$DB_NS" "$NEW_SCHEMA_VERSION"

# Verify schema version changed
FINAL_SCHEMA=$(kubectl get documentdb $DB_NAME -n $DB_NS -o jsonpath='{.status.schemaVersion}')
if [[ "$FINAL_SCHEMA" == "$NEW_SCHEMA_VERSION" ]]; then
echo "✓ Schema finalized: $SCHEMA_BASELINE → $FINAL_SCHEMA"
else
echo "❌ Schema version should be $NEW_SCHEMA_VERSION but is $FINAL_SCHEMA"
exit 1
fi

echo "NEW_SCHEMA_VERSION=$NEW_SCHEMA_VERSION" >> "$GITHUB_ENV"
echo ""
echo "✅ Step 6 passed: Schema finalized to $NEW_SCHEMA_VERSION"

- name: Setup port forwarding for schema finalization verification
uses: ./.github/actions/setup-port-forwarding
with:
namespace: ${{ env.DB_NS }}
cluster-name: ${{ env.DB_NAME }}
port: ${{ env.DB_PORT }}
architecture: ${{ matrix.architecture }}
test-type: 'comprehensive'

- name: Verify data persistence after schema finalization
run: |
echo "=== Data Persistence: Verifying after schema finalization ==="
mongosh 127.0.0.1:$DB_PORT \
-u "$DB_USERNAME" \
-p "$DB_PASSWORD" \
--authenticationMechanism SCRAM-SHA-256 \
--tls \
--tlsAllowInvalidCertificates \
--eval '
db = db.getSiblingDB("upgrade_test_db");
var count = db.test_collection.countDocuments();
assert(count === 2, "Expected 2 documents but found " + count + " after schema finalization");
print("✓ All " + count + " documents persisted through schema finalization");
'
echo "✓ Data persistence verified after schema finalization"

- name: Cleanup port forwarding after schema finalization verification
if: always()
run: |
if [ -f /tmp/pf_pid ]; then
PF_PID=$(cat /tmp/pf_pid)
kill $PF_PID 2>/dev/null || true
rm -f /tmp/pf_pid
fi
rm -f /tmp/pf_output.log

- name: "Step 7: Webhook — Reject Rollback Below Schema"
run: |
echo "=== Step 7: Webhook — Reject Rollback Below Schema ==="
echo "Attempting to roll back documentDBImage below status.schemaVersion..."

OLD_EXTENSION="${{ env.DOCUMENTDB_OLD_IMAGE }}"
CURRENT_SCHEMA="${{ env.NEW_SCHEMA_VERSION }}"

echo "Current schema version: $CURRENT_SCHEMA"
echo "Attempting rollback to: $OLD_EXTENSION"

# This SHOULD fail — the webhook must reject rollback below schema version
PATCH_OUTPUT=$(kubectl patch documentdb $DB_NAME -n $DB_NS --type='merge' \
-p "{\"spec\":{\"documentDBImage\":\"$OLD_EXTENSION\"}}" 2>&1) && {
echo "❌ Webhook did NOT reject the rollback — patch succeeded unexpectedly"
echo "Output: $PATCH_OUTPUT"
exit 1
}

echo "Patch rejected (expected). Output:"
echo "$PATCH_OUTPUT"

# Verify the error message mentions rollback blocking
if echo "$PATCH_OUTPUT" | grep -qiE "rollback blocked|older than installed schema"; then
echo "✓ Webhook correctly rejected rollback with expected error message"
else
echo "⚠️ Patch was rejected but error message doesn't match expected pattern"
echo " (Still passing — the important thing is the rejection)"
fi

# Verify cluster state is unchanged
CURRENT_IMAGE=$(kubectl get documentdb $DB_NAME -n $DB_NS -o jsonpath='{.spec.documentDBImage}')
NEW_EXTENSION="${{ env.DOCUMENTDB_IMAGE }}"
if [[ "$CURRENT_IMAGE" == "$NEW_EXTENSION" ]]; then
echo "✓ Cluster state unchanged — documentDBImage still at $CURRENT_IMAGE"
else
echo "❌ documentDBImage changed unexpectedly to $CURRENT_IMAGE"
exit 1
fi

echo ""
echo "✅ Step 7 passed: Webhook correctly blocked rollback below schema version"

- name: "Step 8: Webhook — Reject Schema Exceeds Binary"
run: |
echo "=== Step 8: Webhook — Reject Schema Exceeds Binary ==="
echo "Attempting to set schemaVersion higher than binary version..."

# Use an artificially high version that exceeds any binary
INVALID_SCHEMA="99.999.0"
echo "Attempting schemaVersion: $INVALID_SCHEMA"

# This SHOULD fail — the webhook must reject schema > binary
PATCH_OUTPUT=$(kubectl patch documentdb $DB_NAME -n $DB_NS --type='merge' \
-p "{\"spec\":{\"schemaVersion\":\"$INVALID_SCHEMA\"}}" 2>&1) && {
echo "❌ Webhook did NOT reject the invalid schema version — patch succeeded unexpectedly"
echo "Output: $PATCH_OUTPUT"
exit 1
}

echo "Patch rejected (expected). Output:"
echo "$PATCH_OUTPUT"

# Verify the error message mentions schema exceeding binary
if echo "$PATCH_OUTPUT" | grep -qi "exceeds.*binary"; then
echo "✓ Webhook correctly rejected schema version with expected error message"
else
echo "⚠️ Patch was rejected but error message doesn't match expected pattern"
echo " (Still passing — the important thing is the rejection)"
fi

# Verify schema version is unchanged
CURRENT_SCHEMA=$(kubectl get documentdb $DB_NAME -n $DB_NS -o jsonpath='{.status.schemaVersion}')
EXPECTED_SCHEMA="${{ env.NEW_SCHEMA_VERSION }}"
if [[ "$CURRENT_SCHEMA" == "$EXPECTED_SCHEMA" ]]; then
echo "✓ Schema version unchanged: $CURRENT_SCHEMA"
else
echo "❌ Schema version changed unexpectedly to $CURRENT_SCHEMA"
exit 1
fi

echo ""
echo "✅ Step 8 passed: Webhook correctly blocked schema version exceeding binary"
if: failure()
uses: ./.github/actions/collect-logs
with:
@@ -1008,6 +1269,10 @@ jobs:
echo "- Step 2: Upgrade both extension and gateway images" >> $GITHUB_STEP_SUMMARY
echo "- Step 3: Rollback extension image" >> $GITHUB_STEP_SUMMARY
echo "- Step 4: Rollback gateway image" >> $GITHUB_STEP_SUMMARY
echo "- Step 5: Re-upgrade binary (setup for schema tests)" >> $GITHUB_STEP_SUMMARY
echo "- Step 6: Schema finalization (two-phase commit)" >> $GITHUB_STEP_SUMMARY
echo "- Step 7: Webhook — reject rollback below schema" >> $GITHUB_STEP_SUMMARY
echo "- Step 8: Webhook — reject schema exceeds binary" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY

if [[ "${{ job.status }}" == "success" ]]; then
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# Changelog

## [Unreleased]

### Major Features
- **Two-Phase Extension Upgrade**: New `spec.schemaVersion` field separates binary upgrades (`spec.documentDBVersion`) from irreversible schema migrations (`ALTER EXTENSION UPDATE`). By default, a binary upgrade leaves the installed schema at its previous version, which gives you a rollback-safe window: update the binary first, validate, then set `spec.schemaVersion` to finalize the schema. Set `schemaVersion: "auto"` for single-step upgrades in development environments. See the [upgrade guide](docs/operator-public-documentation/preview/operations/upgrades.md) for details.
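
A rough sketch of the two-phase flow described in the entry above, modelled on the `kubectl patch` calls in this PR's upgrade workflow. The resource name `my-documentdb`, the namespace `documentdb-ns`, and the version `0.112.0` are placeholders, and the helpers `binary_patch`, `schema_patch`, and `two_phase_plan` are hypothetical names introduced only for illustration. It assumes `spec.documentDBVersion` and `spec.schemaVersion` take plain version strings, as the entry suggests. The script only prints the commands so the plan can be reviewed before anything is applied.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the merge-patch payload for each phase.
binary_patch() { printf '{"spec":{"documentDBVersion":"%s"}}' "$1"; }
schema_patch() { printf '{"spec":{"schemaVersion":"%s"}}' "$1"; }

# Print (do not run) the commands for a two-phase upgrade.
two_phase_plan() {
  local name="$1" ns="$2" version="$3"
  # Phase 1: upgrade the binary only; the schema stays at its old version.
  echo "kubectl patch documentdb $name -n $ns --type=merge -p '$(binary_patch "$version")'"
  # Validation window: inspect the installed schema before committing.
  echo "kubectl get documentdb $name -n $ns -o jsonpath='{.status.schemaVersion}'"
  # Phase 2: finalize the schema (irreversible).
  echo "kubectl patch documentdb $name -n $ns --type=merge -p '$(schema_patch "$version")'"
}

two_phase_plan my-documentdb documentdb-ns 0.112.0
```

Keeping the two patches separate is the point of the design: a rollback of the binary remains possible until the second patch is applied.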

### Breaking Changes
- **Validating webhook added**: A new `ValidatingWebhookConfiguration` enforces that `spec.schemaVersion` never exceeds the binary version and blocks `spec.documentDBVersion` rollbacks below the committed schema version. This requires [cert-manager](https://cert-manager.io/) to be installed in the cluster (it is already a prerequisite for the sidecar injector). Existing clusters upgrading to this release will have the webhook activated automatically via `helm upgrade`.
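
The two rules the webhook enforces can be illustrated with a small shell sketch. This is not the operator's implementation, only a model of the checks as the entry describes them. The function names `version_le` and `check_update` and the rejection wording are assumptions, the comparison relies on `sort -V` (available in GNU coreutils), and the `"auto"` value of `schemaVersion` is not handled here.

```shell
#!/usr/bin/env bash
# version_le A B: succeeds when version A <= version B.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# check_update BINARY REQUESTED_SCHEMA INSTALLED_SCHEMA
# Prints "allowed" or an illustrative rejection reason.
check_update() {
  local binary="$1" requested="$2" installed="$3"
  if ! version_le "$requested" "$binary"; then
    # Rule 1: the schema must never be ahead of the binary.
    echo "rejected: schemaVersion $requested exceeds binary $binary"
  elif ! version_le "$installed" "$binary"; then
    # Rule 2: the binary must not roll back below the committed schema.
    echo "rejected: binary $binary is older than installed schema $installed"
  else
    echo "allowed"
  fi
}

check_update 0.112.0 0.112.0 0.111.0
check_update 0.112.0 99.999.0 0.111.0
check_update 0.110.0 0.110.0 0.111.0
```

The three calls correspond to a normal schema finalization, the "schema exceeds binary" case exercised in Step 8 of the workflow, and the "rollback below schema" case exercised in Step 7.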

## [0.2.0] - 2026-03-25

### Major Features