
[SAP] Implement graceful shutdown for cinder services #314

Open: hemna wants to merge 1 commit into stable/2023.1-m3 from graceful-shutdown

[SAP] Implement graceful shutdown for cinder services#314
hemna wants to merge 1 commit into
stable/2023.1-m3from
graceful-shutdown

Conversation

hemna commented Feb 20, 2026

Graceful Shutdown for Cinder Volume & Backup Services

Implements graceful shutdown that allows in-flight volume and backup operations to complete before the pod exits during Kubernetes rolling updates. Covers both cinder-volume and cinder-backup services.

How It Works

Phase 1 — Skip consumer cancel:

  • We intentionally do NOT send Basic.Cancel or call conn.stop_consuming(). Doing so causes eventlet socket races ("simultaneous read on fileno") that disrupt outbound HTTP/RPC connections used by in-flight operations (e.g., Swift reads during backup restore).
  • Instead, we rely solely on pool.waitall() in Phase 2.
  • The _runner greenthread remains blocked in drain_events() at 0% CPU — harmless.
  • The scheduler stops routing new work within service_down_time.

Phase 2 — Wait for in-flight operations:

  • GreenPool.waitall() blocks until all RPC handler greenthreads finish
  • Worker entry heartbeat keeps entries fresh (prevents new pod cleanup interference)
  • Heartbeats continue (service stays "up" in DB)

Phase 3 — Clean exit:

  • Skip rpcserver.stop()/rpcserver.wait() (hangs on dead AMQP socket)
  • Process exits cleanly after stop() returns (see the sketch below)
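
Taken together, the three phases amount to a stop() of roughly the following shape. This is a minimal sketch of the flow described above, not the actual cinder/service.py code; attribute names such as _stop_semaphore, _draining, and _pool are assumptions.

```python
import eventlet
from eventlet import semaphore


class Service:
    """Sketch only: names are illustrative, not real cinder internals."""

    def __init__(self):
        self._stop_semaphore = semaphore.Semaphore()
        self._draining = False
        self._pool = eventlet.GreenPool()

    def stop(self):
        with self._stop_semaphore:   # semaphore guard: one stop() at a time
            if self._draining:
                return
            self._draining = True

        # Phase 1: deliberately no Basic.Cancel / conn.stop_consuming();
        # cancelling consumers races with eventlet sockets used by
        # in-flight operations. The _runner greenthread stays parked in
        # drain_events() until the process exits.

        # Phase 2: block until every in-flight RPC handler greenthread
        # in the GreenPool has finished its operation.
        self._pool.waitall()

        # Phase 3: skip rpcserver.stop()/rpcserver.wait(), which would
        # hang on the dead AMQP socket; returning here lets the process
        # exit cleanly.
```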

Additional Mechanisms

  • Worker entry heartbeat (cinder/objects/cleanable.py): set_workers decorator spawns a greenthread that touches worker DB entries every 10s during operations. Prevents new pod's init_host_do_cleanup from resetting in-flight volumes to 'error' (sketched below).
  • do_cleanup freshness check (cinder/manager.py): Skips worker entries updated within service_down_time (60s). Only cleans up truly stale/crashed entries.
  • Backup restore heartbeat (cinder/backup/manager.py): Touches backup.updated_at every 10s during restore. Prevents new backup pod's init_host from resetting the backup status and triggering BackupRestoreCancel.
  • Backup _cleanup_one_backup freshness check: Skips backups in creating/restoring state if updated_at is recent.
  • Backup _detach_device no-reraise: If detach fails during shutdown (RPC timeout), log error but continue finalization. Data integrity preserved; dangling export cleaned up on next startup.
  • reject_if_draining decorator: Rejects new RPC calls during shutdown so the scheduler routes new work to healthy backends (sketched below).
  • Semaphore guard: Prevents concurrent stop() calls on same Service instance.
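
The heartbeat-plus-freshness pairing and the reject_if_draining decorator can be sketched as follows. This is an illustration under assumed names (_touch_periodically, _is_stale, and the placeholder exception are hypothetical); the real code in cinder/objects/cleanable.py, cinder/manager.py, and the backup manager differs in detail, and the same pattern applies analogously to backup.updated_at during restore.

```python
import functools

import eventlet
from oslo_utils import timeutils

HEARTBEAT_INTERVAL = 10   # seconds, per the description above
SERVICE_DOWN_TIME = 60    # seconds


def _touch_periodically(worker, done):
    """Hypothetical heartbeat: refresh worker.updated_at while the
    operation is in flight (done is an eventlet.event.Event)."""
    while not done.ready():
        worker.updated_at = timeutils.utcnow()
        worker.save()                # assumes an OVO-style save()
        eventlet.sleep(HEARTBEAT_INTERVAL)


def _is_stale(worker):
    """Freshness check: do_cleanup only touches entries whose
    heartbeat has lapsed for longer than service_down_time."""
    age = timeutils.utcnow() - worker.updated_at
    return age.total_seconds() > SERVICE_DOWN_TIME


def reject_if_draining(func):
    """Refuse new RPC work once drain has begun, so the scheduler
    retries on a healthy backend. The exception type here is a
    placeholder; cinder raises its own exception."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if getattr(self, '_draining', False):
            raise RuntimeError('service is draining')
        return func(self, *args, **kwargs)
    return wrapper
```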

Requirements (separate changes)

  • dumb-init --single-child on cinder-volume AND cinder-backup container commands — ensures ProcessLauncher parent waits for all children before exit
  • terminationGracePeriodSeconds: 900 on pod spec

Test Results (qa-de-1, 2026-05-14 to 2026-05-15)

All tests use artificial delays in the FCD driver (not committed) to keep fast operations in-flight long enough for the pod kill to catch them.

| # | Test | Operation | Kill Target | Result |
|---|------|-----------|-------------|--------|
| 1 | test_idle_shutdown | Clean exit | Volume pod | ✅ <1s |
| 2 | test_inflight_volume_create | Volume from image (16GB) | Volume pod | ✅ (41s to 8min) |
| 3 | test_inflight_backup | Backup to Swift | Backup pod | |
| 4 | test_scheduler_reroutes | New work during drain | Volume pod | |
| 5 | test_inflight_volume_delete | Delete (with 30s delay) | Volume pod | |
| 6 | test_inflight_volume_clone | Clone volume | Volume pod | |
| 7 | test_inflight_snapshot_create | Snapshot create | Volume pod | |
| 8 | test_inflight_snapshot_delete | Snapshot delete | Volume pod | |
| 9 | test_inflight_backup_kill_volume_pod | Backup (kill volume pod) | Volume pod | |
| 10 | test_inflight_volume_extend | Extend 16→32GB | Volume pod | |
| 11 | test_inflight_multiple_operations | 4 concurrent ops | Volume pod | |
| 12 | test_inflight_restore_kill_backup_pod | Backup restore | Backup pod | |
| 13 | test_inflight_migrate_same_vc | Migration (same vCenter) | Volume pod | |
| 14 | test_inflight_migrate_cross_vc | Migration (cross-datastore 16GB) | Volume pod | |

Key Finding: Cross-Datastore Migration

The cross-datastore migration test confirms that even operations initiated during drain (after SIGTERM) complete successfully:

  • Pod killed while in 30s pre-relocate delay
  • FCD RelocateVStorageObject_Task issued to vCenter 13s AFTER kill
  • 16GB copied between NFS datastores in ~14s (~1.1 GB/s, NetApp server-side copy)
  • Migration completed 29s after SIGTERM

Files Changed

| File | Purpose |
|------|---------|
| cinder/service.py | Three-phase shutdown, skip consumer cancel, pool.waitall |
| cinder/manager.py | do_cleanup freshness check for worker entries |
| cinder/objects/cleanable.py | Worker heartbeat greenthread in set_workers decorator |
| cinder/volume/manager.py | Direct flow execution, GS-DEBUG logging |
| cinder/volume/flows/manager/create_volume.py | CreateVolumeOnFinishTask unconditional write |
| cinder/backup/manager.py | Backup restore heartbeat + freshness check + no-reraise |
| cinder/opts.py | graceful_shutdown_timeout config option |
| cinder/tests/unit/test_manager.py | Unit tests |
| cinder/tests/unit/test_service.py | Unit tests |
| doc/source/admin/graceful-shutdown-race-condition.rst | Race condition documentation |
| sap-doc/graceful-shutdown-test-results.md | Test results |

No oslo.messaging source changes required

All changes are self-contained in cinder. The shutdown mechanism relies on pool.waitall() — no manipulation of oslo.messaging internals needed.
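
For reference, the primitive everything hinges on is plain eventlet: GreenPool.waitall() returns only once every spawned greenthread has finished, which is exactly the drain semantics Phase 2 needs. A standalone demo:

```python
import eventlet

pool = eventlet.GreenPool(size=4)
for i in range(4):
    # eventlet.sleep stands in for long-running RPC handler greenthreads
    pool.spawn(eventlet.sleep, i * 0.1)
pool.waitall()   # returns only after all in-flight "operations" complete
print("drained")
```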

Debugging Findings

Full debugging notes: https://github.wdf.sap.corp/gist/09ac921da78820047bdea06651b32205

hemna force-pushed the graceful-shutdown branch 3 times, most recently from 5a72074 to 1f69e00 on February 23, 2026 14:24
hemna force-pushed the graceful-shutdown branch 3 times, most recently from 4c183ec to d331087 on April 30, 2026 12:37
hemna force-pushed the graceful-shutdown branch 2 times, most recently from 1dd055a to d2fddd7 on May 14, 2026 22:25
hemna changed the title from "[SAP] Try graceful shutdown" to "[SAP] Implement graceful shutdown for cinder services" on May 14, 2026
hemna force-pushed the graceful-shutdown branch 4 times, most recently from a235a58 to b77438d on May 14, 2026 22:41
Scsabiii previously approved these changes May 15, 2026

Three-phase graceful shutdown that allows in-flight volume and backup
operations to complete before the pod exits during Kubernetes rolling
updates. Covers both cinder-volume and cinder-backup services.

Phase 1: Skip consumer cancel (relying on pool.waitall in Phase 2).
         Previous approach (Basic.Cancel) caused eventlet socket races
         that disrupted outbound HTTP/RPC connections during drain.
         The scheduler stops routing new work within service_down_time.

Phase 2: Block in pool.waitall() until all in-flight RPC handler
         greenthreads in the GreenPool complete their operations.

Phase 3: Skip rpcserver.stop()/wait() (hangs on dead AMQP socket).
         Process exits cleanly after stop() returns.

Additional mechanisms:
- Worker entry heartbeat in set_workers decorator: touches worker DB
  entries every 10s during operations, preventing new pod init_host
  _do_cleanup from resetting in-flight volumes to error.
- do_cleanup freshness check: skips worker entries updated within
  service_down_time (60s), only cleans up truly stale/crashed entries.
- Backup restore heartbeat: touches backup.updated_at every 10s during
  restore, preventing new backup pod init_host from resetting the
  backup status and triggering BackupRestoreCancel.
- Backup _cleanup_one_backup freshness check: skips backups in
  creating/restoring state if updated_at is recent.
- Backup _detach_device no-reraise: if detach fails during shutdown
  (RPC timeout), log error but continue finalization. Data integrity
  is preserved; dangling export cleaned up on next startup.
- Semaphore guard: prevents concurrent stop() calls on same Service.
- Heartbeat continues during drain: service stays up in DB.
- reject_if_draining decorator: rejects new RPC calls during shutdown
  so scheduler routes to healthy backends.

Requires:
- dumb-init --single-child (Helm chart change in separate commit)
- terminationGracePeriodSeconds: 900 on pod spec

Tested operations surviving pod termination (qa-de-1):
- Volume create from image (41s to 8min drains)
- Volume delete (with driver delay)
- Volume extend (16->32GB with driver delay)
- Volume clone
- Snapshot create, snapshot delete
- Multiple concurrent operations (4 ops on same pod)
- Backup create (kill backup pod during Swift upload)
- Backup restore (kill backup pod during data transfer)
- Backup (kill volume pod during snapshot prep)
- Migration same-vCenter (vc-a-0 -> vc-a-1, metadata re-home)
- Migration cross-datastore (16GB FCD relocate between NFS datastores)
- Scheduler rerouting during drain
- Idle shutdown (clean exit <1s)

Change-Id: Icdd28affc73fd34491b656a68410dce8e46264d4
hemna force-pushed the graceful-shutdown branch from 81fe034 to 399ad35 on May 15, 2026 22:23