[SAP] Implement graceful shutdown for cinder services#314
hemna wants to merge 1 commit into
Scsabiii previously approved these changes (May 15, 2026)
Three-phase graceful shutdown that allows in-flight volume and backup
operations to complete before the pod exits during Kubernetes rolling
updates. Covers both cinder-volume and cinder-backup services.
Phase 1: Skip consumer cancel (relying on pool.waitall in Phase 2).
Previous approach (Basic.Cancel) caused eventlet socket races
that disrupted outbound HTTP/RPC connections during drain.
The scheduler stops routing new work within service_down_time.
Phase 2: Block in pool.waitall() until all in-flight RPC handler
greenthreads in the GreenPool complete their operations.
Phase 3: Skip rpcserver.stop()/wait() (hangs on dead AMQP socket).
Process exits cleanly after stop() returns.
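The three phases can be sketched as follows. This is a toy model under stated assumptions, not the actual cinder code: `Service`, `FakePool`, `ServiceDraining`, and `create_volume` are illustrative stand-ins, and a plain semaphore stands in for the guard against concurrent stop() calls described below.

```python
import functools
import threading

class ServiceDraining(Exception):
    """Illustrative error used to refuse new RPC calls while draining."""

def reject_if_draining(fn):
    # Sketch of a reject_if_draining-style decorator: new RPC calls are
    # refused during shutdown so the scheduler retries elsewhere.
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        if self.draining:
            raise ServiceDraining(fn.__name__)
        return fn(self, *args, **kwargs)
    return wrapper

class Service:
    """Toy three-phase stop(); names and structure are assumptions."""

    def __init__(self, pool):
        self.pool = pool                           # stand-in for the GreenPool
        self.draining = False
        self._stop_guard = threading.Semaphore(1)  # one stop() at a time

    @reject_if_draining
    def create_volume(self):
        return "queued"

    def stop(self):
        if not self._stop_guard.acquire(blocking=False):
            return False   # another stop() is already in progress
        # Phase 1: no consumer cancel -- just flag ourselves as draining.
        self.draining = True
        # Phase 2: block until all in-flight handler greenthreads finish.
        self.pool.waitall()
        # Phase 3: skip rpcserver.stop()/wait(); just return so the
        # process can exit cleanly.
        return True

class FakePool:
    """Test double: records that waitall() was reached."""
    def __init__(self):
        self.drained = False
    def waitall(self):
        self.drained = True
```

Once `stop()` has flagged the service as draining, the decorator bounces any new call while Phase 2 waits out the in-flight ones.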
Additional mechanisms:
- Worker entry heartbeat in set_workers decorator: touches worker DB
  entries every 10s during operations, preventing the new pod's
  init_host/_do_cleanup from resetting in-flight volumes to 'error'.
- do_cleanup freshness check: skips worker entries updated within
service_down_time (60s), only cleans up truly stale/crashed entries.
- Backup restore heartbeat: touches backup.updated_at every 10s during
restore, preventing new backup pod init_host from resetting the
backup status and triggering BackupRestoreCancel.
- Backup _cleanup_one_backup freshness check: skips backups in
creating/restoring state if updated_at is recent.
- Backup _detach_device no-reraise: if detach fails during shutdown
(RPC timeout), log error but continue finalization. Data integrity
is preserved; dangling export cleaned up on next startup.
- Semaphore guard: prevents concurrent stop() calls on same Service.
- Heartbeat continues during drain: service stays up in DB.
- reject_if_draining decorator: rejects new RPC calls during shutdown
so scheduler routes to healthy backends.
Requires:
- dumb-init --single-child (Helm chart change in separate commit)
- terminationGracePeriodSeconds: 900 on pod spec
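The pod-level settings above might look like the following fragment. It is illustrative only; the container name and command are assumptions, not the actual Helm chart change.

```yaml
# Illustrative pod spec fragment, not the real SAP Helm template.
spec:
  terminationGracePeriodSeconds: 900
  containers:
    - name: cinder-volume
      command: ["dumb-init", "--single-child", "--", "cinder-volume"]
```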
Tested operations surviving pod termination (qa-de-1):
- Volume create from image (41s to 8min drains)
- Volume delete (with driver delay)
- Volume extend (16->32GB with driver delay)
- Volume clone
- Snapshot create, snapshot delete
- Multiple concurrent operations (4 ops on same pod)
- Backup create (kill backup pod during Swift upload)
- Backup restore (kill backup pod during data transfer)
- Backup (kill volume pod during snapshot prep)
- Migration same-vCenter (vc-a-0 -> vc-a-1, metadata re-home)
- Migration cross-datastore (16GB FCD relocate between NFS datastores)
- Scheduler rerouting during drain
- Idle shutdown (clean exit <1s)
Change-Id: Icdd28affc73fd34491b656a68410dce8e46264d4
Graceful Shutdown for Cinder Volume & Backup Services
Implements graceful shutdown that allows in-flight volume and backup operations to complete before the pod exits during Kubernetes rolling updates. Covers both cinder-volume and cinder-backup services.
How It Works
Phase 1 — Skip consumer cancel:
- We no longer call `conn.stop_consuming()`. Doing so causes eventlet socket races ("simultaneous read on fileno") that disrupt outbound HTTP/RPC connections used by in-flight operations (e.g., Swift reads during backup restore).
- In-flight work is instead drained by `pool.waitall()` in Phase 2.
- The `_runner` greenthread remains blocked in `drain_events()` at 0% CPU — harmless.
- The scheduler stops routing new work within `service_down_time` since we stop heartbeating.

Phase 2 — Wait for in-flight operations:
- `GreenPool.waitall()` blocks until all RPC handler greenthreads finish.

Phase 3 — Clean exit:
- Skip `rpcserver.stop()`/`rpcserver.wait()` (hangs on dead AMQP socket).
- Process exits cleanly after `stop()` returns.

Additional Mechanisms
- Worker entry heartbeat (`cinder/objects/cleanable.py`): the `set_workers` decorator spawns a greenthread that touches worker DB entries every 10s during operations. Prevents the new pod's `init_host` → `_do_cleanup` from resetting in-flight volumes to 'error'.
- `do_cleanup` freshness check (`cinder/manager.py`): skips worker entries updated within `service_down_time` (60s). Only cleans up truly stale/crashed entries.
- Backup restore heartbeat (`cinder/backup/manager.py`): touches `backup.updated_at` every 10s during restore. Prevents the new backup pod's `init_host` from resetting the backup status and triggering `BackupRestoreCancel`.
- `_cleanup_one_backup` freshness check: skips backups in creating/restoring state if `updated_at` is recent.
- `_detach_device` no-reraise: if detach fails during shutdown (RPC timeout), log the error but continue finalization. Data integrity is preserved; the dangling export is cleaned up on next startup.
- `reject_if_draining` decorator: rejects new RPC calls during shutdown so the scheduler routes to healthy backends.
- Semaphore guard: prevents concurrent `stop()` calls on the same Service instance.

Requirements (separate changes)
- `dumb-init --single-child` on cinder-volume AND cinder-backup container commands — ensures the ProcessLauncher parent waits for all children before exit.
- `terminationGracePeriodSeconds: 900` on pod spec.

Test Results (qa-de-1, 2026-05-14 to 2026-05-15)
All tests use artificial delays in the FCD driver (not committed) to keep fast operations in-flight long enough for the pod kill to catch them.
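A delay of that kind can be injected with a simple wrapper. This is a hypothetical sketch (the actual driver patch was not committed); `with_delay` and `extend_volume` are illustrative names only.

```python
import functools
import time

def with_delay(seconds):
    """Wrap a driver method so the operation stays in flight longer,
    giving the pod kill a window to land mid-operation."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            time.sleep(seconds)   # artificial driver delay
            return fn(*args, **kwargs)
        return wrapper
    return decorator

# Hypothetical driver call used only to show the wrapper in action.
@with_delay(0.01)
def extend_volume(size_gb):
    return size_gb * 2
```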
- `test_idle_shutdown`
- `test_inflight_volume_create`
- `test_inflight_backup`
- `test_scheduler_reroutes`
- `test_inflight_volume_delete`
- `test_inflight_volume_clone`
- `test_inflight_snapshot_create`
- `test_inflight_snapshot_delete`
- `test_inflight_backup_kill_volume_pod`
- `test_inflight_volume_extend`
- `test_inflight_multiple_operations`
- `test_inflight_restore_kill_backup_pod`
- `test_inflight_migrate_same_vc`
- `test_inflight_migrate_cross_vc`

Key Finding: Cross-Datastore Migration
The cross-datastore migration test confirms that even operations initiated during drain (after SIGTERM) complete successfully:
- `RelocateVStorageObject_Task` issued to vCenter 13s AFTER the kill.

Files Changed
- `cinder/service.py`
- `cinder/manager.py`: `do_cleanup` freshness check for worker entries
- `cinder/objects/cleanable.py`: `set_workers` decorator
- `cinder/volume/manager.py`
- `cinder/volume/flows/manager/create_volume.py`
- `cinder/backup/manager.py`
- `cinder/opts.py`: `graceful_shutdown_timeout` config option
- `cinder/tests/unit/test_manager.py`
- `cinder/tests/unit/test_service.py`
- `doc/source/admin/graceful-shutdown-race-condition.rst`
- `sap-doc/graceful-shutdown-test-results.md`

No oslo.messaging source changes required
All changes are self-contained in cinder. The shutdown mechanism relies on `pool.waitall()` — no manipulation of oslo.messaging internals is needed.

Debugging Findings
Full debugging notes: https://github.wdf.sap.corp/gist/09ac921da78820047bdea06651b32205