Add periodic cleanup for orphaned Celery pidbox queues #3085

Open
majamassarini wants to merge 1 commit into packit:main from majamassarini:prevent-valkey-filling-up
@majamassarini majamassarini commented Apr 1, 2026

Problem: Celery workers create pidbox (control) reply queues for worker management commands (inspect, ping, stats, etc.). These queues accumulate when workers crash or restart improperly, leading to:

  • 1,693+ orphaned *.reply.celery.pidbox keys in production
  • Keys with no TTL (TTL = -1) that persist indefinitely

Root cause: Celery's Redis transport does not provide a native way to set TTL on pidbox reply queues when they're created. These are internal implementation details of Celery's broadcast/control mechanism, and there's no configuration option to automatically expire them.

Solution: Heartbeat cleanup task. Since we cannot tell Celery to set a TTL on pidbox reply queues natively, we implement a periodic heartbeat task that:

  • Runs nightly at 12:30 AM via Celery beat
  • Scans for *.reply.celery.pidbox keys without TTL
  • Sets 1-hour expiration on orphaned queues
  • Tracks total Redis keys via Prometheus for monitoring
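The scan-and-expire steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `expire_orphaned_pidbox_keys` and the constants are hypothetical, and `client` is assumed to be a redis-py style client exposing `scan_iter`, `ttl`, and `expire`.

```python
from typing import Any

# Key pattern and 1-hour expiration taken from the PR description.
PIDBOX_PATTERN = "*.reply.celery.pidbox"
ORPHAN_TTL_SECONDS = 60 * 60


def expire_orphaned_pidbox_keys(client: Any, ttl_seconds: int = ORPHAN_TTL_SECONDS) -> int:
    """Set an expiration on pidbox reply keys that have none.

    `client` is assumed to behave like `redis.Redis` (hypothetical wiring).
    Returns the number of keys that received an expiration.
    """
    updated = 0
    # SCAN walks the keyspace incrementally instead of blocking the server like KEYS.
    for key in client.scan_iter(match=PIDBOX_PATTERN, count=1000):
        # TTL reports -1 for a key that exists but has no expiration set.
        if client.ttl(key) == -1:
            client.expire(key, ttl_seconds)
            updated += 1
    return updated
```

In a deployment along these lines, the function would presumably be wrapped in a Celery task and registered in the beat schedule for 12:30 AM, with the client built from the service's own Redis connection settings. Keys that already carry a TTL are left untouched, so repeated runs do not keep extending their lifetime.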

Related to: packit/deployment#701
Should fix: #2983

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request implements a maintenance task to clean up orphaned Celery pidbox reply queues in Redis by assigning a TTL to keys without one. The changes include a centralized Redis configuration utility, a new Prometheus metric for monitoring total Redis keys, and corresponding unit tests. Review feedback identifies a typo in a Redis environment variable and suggests using Redis pipelines to optimize the cleanup process by batching network operations.

Problem:
Celery workers create pidbox (control) reply queues for worker management
commands (inspect, ping, stats, etc.). These queues accumulate when workers
crash or restart improperly, leading to:
- 1,693+ orphaned *.reply.celery.pidbox keys in production
- Keys with no TTL (TTL = -1) that persist indefinitely

Root cause:
Celery's Redis transport does not provide a native way to set TTL on pidbox
reply queues when they're created. These are internal implementation details
of Celery's broadcast/control mechanism, and there's no configuration option
to automatically expire them.

Solution: Heartbeat cleanup task
Since we cannot tell Celery to natively set TTL on pidbox messages, we
implement a periodic heartbeat task that:
- Runs nightly at 12:30 AM via Celery beat
- Scans for *.reply.celery.pidbox keys without TTL
- Sets 1-hour expiration on orphaned queues
- Tracks total Redis keys via Prometheus for monitoring

Related to: packit/deployment#701
Should fix: packit#2983

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Assisted-By: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@majamassarini majamassarini force-pushed the prevent-valkey-filling-up branch from dc8a923 to 14d3b83 Compare April 1, 2026 08:20
@centosinfra-prod-github-app
Development

Successfully merging this pull request may close these issues.

valkey-pvc requires periodic increases