feat: Resilient Background Job Retry & Monitoring (#130) #466
Open
qiridigital wants to merge 1 commit into rohitdash08:main from
- Model: BackgroundJob with status (PENDING/RUNNING/SUCCEEDED/DEAD), attempts, max_attempts, last_error, next_run_at, exponential backoff
- Service: services/jobs.py
  - register_job_handler() decorator for named handlers
  - enqueue() to persist jobs
  - Exponential backoff: BASE_DELAY * 2^(attempt-1) seconds between retries
  - MAX_ATTEMPTS=5 before marking job DEAD
  - Background polling thread with configurable POLL_INTERVAL_SECONDS
  - Built-in send_reminder handler wired to existing reminder delivery
  - start_scheduler() / stop_scheduler() for lifecycle management
- Routes: GET /jobs (list), GET /jobs/stats (counts), POST /jobs/enqueue, POST /jobs/:id/retry (reset DEAD jobs); all admin-only
- Schema: background_jobs table with status+next_run_at composite index
- App: scheduler auto-starts on app creation (skipped in TESTING mode)
- Tests: test_jobs.py, 13 tests covering enqueue, success, retry, dead-letter, unknown handler, admin-only access, stats, manual enqueue, retry endpoint, auth

Closes rohitdash08#130
Pull request overview
Implements a DB-backed background job queue with retry/backoff, plus admin endpoints to monitor/enqueue/retry jobs, and adds corresponding schema + tests (per issue #130).
Changes:
- Added BackgroundJob model + background_jobs table/index in schema.sql.
- Introduced app.services.jobs for enqueueing, running, retry/backoff, and a polling scheduler thread.
- Added /jobs admin endpoints and a new test_jobs.py suite covering unit + API behaviors.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| packages/backend/app/models.py | Adds BackgroundJob ORM model. |
| packages/backend/app/db/schema.sql | Adds background_jobs table + index. |
| packages/backend/app/services/jobs.py | Implements enqueue/run/retry logic and scheduler thread; registers built-in send_reminder handler. |
| packages/backend/app/routes/jobs.py | Adds admin-only monitoring/enqueue/retry endpoints. |
| packages/backend/app/routes/__init__.py | Registers the new jobs blueprint. |
| packages/backend/app/__init__.py | Starts the scheduler on app startup (non-testing). |
| packages/backend/tests/test_jobs.py | Adds unit + API tests for the job system. |
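
To make the purpose of the status + next_run_at index concrete, here is a minimal sketch of the kind of polling query such an index would serve. It uses an in-memory SQLite database purely so the example is self-contained; the column types, the index name `idx_jobs_status_next_run`, the `due_job_ids` helper, and the ISO-8601 text timestamps are assumptions, and the project's real schema.sql may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE background_jobs (
        id INTEGER PRIMARY KEY,
        job_type TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'PENDING',
        attempts INTEGER NOT NULL DEFAULT 0,
        max_attempts INTEGER NOT NULL DEFAULT 5,
        last_error TEXT,
        next_run_at TEXT NOT NULL
    );
    -- Composite index: equality on status first, then a range on next_run_at.
    CREATE INDEX idx_jobs_status_next_run
        ON background_jobs (status, next_run_at);
    """
)

conn.executemany(
    "INSERT INTO background_jobs (job_type, status, next_run_at) VALUES (?, ?, ?)",
    [
        ("send_reminder", "PENDING", "2024-01-01T00:00:00"),  # already due
        ("send_reminder", "PENDING", "2024-01-01T00:10:00"),  # due later
        ("send_reminder", "DEAD", "2024-01-01T00:00:00"),     # never polled
    ],
)


def due_job_ids(connection, now_iso):
    """Return ids of PENDING jobs whose next_run_at is at or before now_iso."""
    cursor = connection.execute(
        "SELECT id FROM background_jobs "
        "WHERE status = 'PENDING' AND next_run_at <= ? "
        "ORDER BY next_run_at",
        (now_iso,),
    )
    return [row[0] for row in cursor.fetchall()]


due = due_job_ids(conn, "2024-01-01T00:05:00")
```

Putting status first in the index matches a query that filters on one exact status and then scans a time range, which is the access pattern a polling loop would be expected to use.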
Comment on lines +18 to +19

    from .extensions import db
    from .models import BackgroundJob

    import json
    import logging
    import threading
    import time
Comment on lines +6 to +8

    - APScheduler-based scheduler that polls and dispatches jobs
    - Prometheus metrics for monitoring
    - Admin monitoring endpoint at GET /jobs
Comment on lines +55 to +58

    # Start background job scheduler (skip in test environments)
    if not app.config.get("TESTING"):
        from .services.jobs import start_scheduler
        start_scheduler(app)
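
For context on what a start/stop pair like this typically wraps, the following is a minimal sketch of a daemon polling thread controlled by a `threading.Event`. The `PollingScheduler` class and its `poll_once` callback are hypothetical names for this example; the PR exposes module-level start_scheduler(app) and stop_scheduler() functions whose internals are not shown in this excerpt.

```python
import logging
import threading

logger = logging.getLogger(__name__)


class PollingScheduler:
    """Call poll_once repeatedly, sleeping interval_seconds between calls."""

    def __init__(self, poll_once, interval_seconds=30.0):
        self._poll_once = poll_once
        self._interval = interval_seconds
        self._stop_event = threading.Event()
        self._thread = None

    def start(self):
        # Starting twice must not create a second poller.
        if self._thread is not None and self._thread.is_alive():
            return
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._stop_event.is_set():
            try:
                self._poll_once()
            except Exception:
                # One failing poll should not kill the scheduler thread.
                logger.exception("poll failed")
            # wait() returns early when stop() sets the event.
            self._stop_event.wait(self._interval)

    def stop(self, timeout=5.0):
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join(timeout)
```

Waiting on the event instead of calling `time.sleep` lets shutdown interrupt the pause between polls. Note that a thread started at app creation exists once per process that creates the app, so a multi-process deployment would run several pollers unless something else coordinates them.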
@@ -0,0 +1,118 @@

    """Background job monitoring endpoints (admin + self-service)."""

    import json

Comment on lines +3 to +4

    import json
    from datetime import datetime, timedelta
Comment on lines +25 to +27

    MAX_ATTEMPTS = 5
    BASE_DELAY_SECONDS = 10  # first retry waits 10 s, doubles each time
    POLL_INTERVAL_SECONDS = 30
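
The constants above imply a retry schedule that can be worked out directly. The sketch below applies the formula from the PR description, delay = BASE_DELAY_SECONDS * 2^(attempt-1), and shows one way a failure could move a job back to PENDING or on to DEAD. The `Job` dataclass, `backoff_delay`, and `record_failure` are hypothetical names, and marking the job DEAD as soon as attempts reaches max_attempts is an assumption; the excerpt does not show whether the PR waits out a final delay first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_ATTEMPTS = 5
BASE_DELAY_SECONDS = 10


def backoff_delay(attempt: int) -> int:
    """Seconds to wait after the given 1-based attempt has failed."""
    return BASE_DELAY_SECONDS * 2 ** (attempt - 1)


@dataclass
class Job:
    status: str = "PENDING"
    attempts: int = 0
    max_attempts: int = MAX_ATTEMPTS
    last_error: Optional[str] = None
    next_run_at: Optional[datetime] = None


def record_failure(job: Job, error: str, now: datetime) -> Job:
    """Count the failed attempt, then reschedule the job or dead-letter it."""
    job.attempts += 1
    job.last_error = error
    if job.attempts >= job.max_attempts:
        job.status = "DEAD"  # assumption: no further retries are scheduled
    else:
        job.status = "PENDING"
        job.next_run_at = now + timedelta(seconds=backoff_delay(job.attempts))
    return job


schedule = [backoff_delay(n) for n in range(1, MAX_ATTEMPTS + 1)]
```

Here `schedule` evaluates to 10, 20, 40, 80, and 160 seconds, matching the sequence listed in the PR description. Because the poller runs every POLL_INTERVAL_SECONDS, a retry would actually start up to one poll interval after its next_run_at.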
Comment on lines +166 to +167

    from .services.reminders import send_reminder
    from .models import Reminder as ReminderModel
Summary
Implements Resilient Background Job Retry & Monitoring as described in #130.
Acceptance Criteria
Model: BackgroundJob
Persistent job queue entry stored in the background_jobs table.

Service: services/jobs.py
- @register_job_handler("name") decorator: register any callable as a handler for a named job type
- enqueue(job_type, payload): persist a new PENDING job (with optional run_at scheduling)
- Exponential backoff: delay = 10s * 2^(attempt-1) (10s, 20s, 40s, 80s, 160s)
- start_scheduler(app): launches a daemon thread polling every 30s for runnable PENDING jobs
- stop_scheduler(): signals graceful shutdown
- Built-in send_reminder handler wired to existing email/WhatsApp delivery

Routes: /jobs (admin-only)
- GET /jobs (list)
- GET /jobs/stats (counts)
- POST /jobs/enqueue
- POST /jobs/:id/retry (reset DEAD jobs)

Tests (test_jobs.py): 13 tests
/claim #130