Add periodic cleanup for orphaned Celery pidbox queues (#3085)
Open
majamassarini wants to merge 1 commit into packit:main
Conversation
Force-pushed from 98dfdbf to 69a55e2
Contributor
Code Review
This pull request implements a maintenance task to clean up orphaned Celery pidbox reply queues in Redis by assigning a TTL to keys without one. The changes include a centralized Redis configuration utility, a new Prometheus metric for monitoring total Redis keys, and corresponding unit tests. Review feedback identifies a typo in a Redis environment variable and suggests using Redis pipelines to optimize the cleanup process by batching network operations.
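The reviewer's pipeline suggestion could look roughly like the sketch below. This is not code from the PR: the function names are hypothetical, and `client` is assumed to be a redis-py style client whose `pipeline()` buffers commands and sends them to Redis in a single round trip.

```python
# Hypothetical sketch of the reviewer's pipeline suggestion: batch EXPIRE
# commands so the cleanup pays one network round trip per batch instead of
# one per key. Function names are illustrative, not taken from this PR.

def expire_keys_in_batches(client, keys, ttl_seconds=3600, batch_size=100):
    """Set a TTL on every key, flushing EXPIRE commands in batches."""
    expired = 0
    batch = []
    for key in keys:
        batch.append(key)
        if len(batch) >= batch_size:
            expired += _flush_expires(client, batch, ttl_seconds)
            batch = []
    if batch:  # flush any remainder smaller than a full batch
        expired += _flush_expires(client, batch, ttl_seconds)
    return expired

def _flush_expires(client, batch, ttl_seconds):
    pipe = client.pipeline()
    for key in batch:
        pipe.expire(key, ttl_seconds)
    # execute() sends all buffered commands at once and returns their replies.
    results = pipe.execute()
    return sum(1 for ok in results if ok)
```

With 1,693 keys and a batch size of 100, this would cut the EXPIRE traffic from ~1,700 round trips to 17.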
Problem: Celery workers create pidbox (control) reply queues for worker management commands (inspect, ping, stats, etc.). These queues accumulate when workers crash or restart improperly, leading to:

- 1,693+ orphaned `*.reply.celery.pidbox` keys in production
- Keys with no TTL (TTL = -1) that persist indefinitely

Root cause: Celery's Redis transport does not provide a native way to set a TTL on pidbox reply queues when they are created. These queues are internal implementation details of Celery's broadcast/control mechanism, and there is no configuration option to automatically expire them.

Solution: heartbeat cleanup task. Since we cannot tell Celery to natively set a TTL on pidbox messages, we implement a periodic heartbeat task that:

- Runs nightly at 12:30 AM via Celery beat
- Scans for `*.reply.celery.pidbox` keys without a TTL
- Sets a 1-hour expiration on orphaned queues
- Tracks total Redis keys via Prometheus for monitoring

Related to: packit/deployment#701
Should fix: packit#2983

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Assisted-By: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
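The heartbeat task described in the commit message could be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the function name, the `count` hint, and the duck-typed `client` (assumed to be a redis-py style client exposing `scan_iter`, `ttl`, and `expire`) are all assumptions.

```python
# Hypothetical sketch of the cleanup described above: find pidbox reply
# queues that have no expiration and give them a 1-hour TTL so Redis can
# evict them. Names and parameters are illustrative, not the PR's code.

PIDBOX_PATTERN = "*.reply.celery.pidbox"

def cleanup_orphaned_pidbox_queues(client, ttl_seconds=3600):
    """Assign a TTL to orphaned pidbox reply queues.

    Returns the number of keys that received a new expiration.
    """
    expired = 0
    # SCAN iterates the keyspace incrementally, so it does not block
    # Redis the way a single KEYS command would.
    for key in client.scan_iter(match=PIDBOX_PATTERN, count=100):
        # TTL == -1 means the key exists but has no expiration set.
        if client.ttl(key) == -1:
            client.expire(key, ttl_seconds)
            expired += 1
    return expired
```

In a real deployment this function would be registered as a Celery beat task; a schedule such as `crontab(minute=30, hour=0)` would match the nightly 12:30 AM run described above.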
Force-pushed from dc8a923 to 14d3b83
Contributor
Build succeeded. ✔️ pre-commit: SUCCESS in 1m 48s