feat: add webhook delivery persistence, idempotency keys, and retry queue #34

Open
brightcli-stack wants to merge 1 commit into grantFoxin:main from brightcli-stack:feat/webhook-delivery-guarantees

@brightcli-stack

Fixes #30

What changed

  • Added eventId (UUID v4) to NotificationPayload — auto-generated if not provided
  • Added webhook_deliveries table to track every delivery attempt with status, attempt count, and timestamps
  • WebhookProvider.send() now persists delivery state before and after sending
  • Added BullMQ webhook-delivery queue with exponential backoff retries (4 attempts: 10s → 20s → 40s → 80s)
  • Created webhookDeliveryWorker with rate limiting (10 deliveries/min) and concurrency of 3
  • Included X-SentientFi-Event-Id header in webhook requests for consumer-side deduplication
  • Updated HMAC signature input to include eventId, so the event ID is covered by the signature and consumers can rely on it as an idempotency key
  • Added migration 005_add_webhook_deliveries (up + down)
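A minimal sketch of the sender side, assuming Node's `node:crypto`, HMAC-SHA256, and a signing input of `${eventId}.${body}`. The PR only states that eventId is part of the HMAC input; the exact input format, the `X-SentientFi-Signature` header name, and the payload fields other than `eventId` are hypothetical.

```typescript
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

// Hypothetical payload shape; only eventId is named in this PR.
interface NotificationPayload {
  eventId?: string;
  type: string;
  data: Record<string, unknown>;
}

// Assumed signing input: `${eventId}.${body}`, so the event ID is authenticated too.
function computeSignature(eventId: string, body: string, secret: string): Buffer {
  return createHmac("sha256", secret).update(`${eventId}.${body}`).digest();
}

function signWebhook(payload: NotificationPayload, secret: string) {
  // Reuse an existing eventId (e.g. on retry); otherwise generate a UUID v4.
  const eventId = payload.eventId ?? randomUUID();
  const body = JSON.stringify({ ...payload, eventId });
  return {
    eventId,
    body,
    headers: {
      "Content-Type": "application/json",
      "X-SentientFi-Event-Id": eventId,
      // Hypothetical header name for the signature.
      "X-SentientFi-Signature": computeSignature(eventId, body, secret).toString("hex"),
    },
  };
}

// Consumer-side check; fails if either the body or the event ID was altered.
function verifyWebhook(eventId: string, body: string, signatureHex: string, secret: string): boolean {
  const expected = computeSignature(eventId, body, secret);
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Because retries in this sketch re-sign with the stored `eventId`, the `X-SentientFi-Event-Id` header stays the same across attempts for one event.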

Why

Webhook notifications were delivered fire-and-forget with no persistence. If the consumer endpoint timed out, returned 5xx, or the backend restarted mid-delivery, the event was permanently lost. There was also no idempotency key, so consumers couldn't tell whether two deliveries were a retry of the same event or two distinct events.
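With an event ID on every request, a consumer can apply an event the first time its ID is seen and acknowledge later deliveries of the same ID without repeating side effects. A minimal sketch, assuming a Node consumer (Node lowercases incoming header names); the class name is hypothetical, and the in-memory `Set` stands in for durable storage such as a table with a unique constraint on the event ID.

```typescript
// Hypothetical consumer-side handler keyed on X-SentientFi-Event-Id.
class IdempotentWebhookConsumer {
  private readonly processed = new Set<string>();
  private readonly effects: string[] = [];

  // Returns true if the event was applied now, false if it was a repeat delivery.
  handle(headers: Record<string, string | undefined>, body: string): boolean {
    const eventId = headers["x-sentientfi-event-id"];
    if (!eventId) throw new Error("missing X-SentientFi-Event-Id header");
    if (this.processed.has(eventId)) return false; // retry of an event already applied
    this.effects.push(body); // stand-in for the consumer's real side effect
    // Mark as processed only after the effect succeeds, so a failed effect can be retried.
    this.processed.add(eventId);
    return true;
  }

  get effectCount(): number {
    return this.effects.length;
  }
}
```

In this design a duplicate still gets a 2xx response, so the sender records it as delivered and stops retrying.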

How to test

  1. Configure a webhook endpoint that returns 500 on first attempt
  2. Trigger a rebalance notification
  3. Verify the delivery is recorded in webhook_deliveries as pending, then as failed after the 500 response
  4. Verify the BullMQ retry queue attempts delivery with exponential backoff
  5. Verify X-SentientFi-Event-Id header is present on webhook requests
  6. Verify the same eventId is consistent across retries for the same event
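Steps 1 to 4 exercise the "persist before and after sending" behaviour. A minimal sketch of one delivery attempt against an in-memory stand-in for webhook_deliveries; the row fields, the `delivered` status value, and all function and type names are assumptions (the PR names only status, attempt count, and timestamps, and the test plan mentions pending and failed).

```typescript
// "pending" and "failed" appear in this PR's test plan; "delivered" is assumed.
type DeliveryStatus = "pending" | "delivered" | "failed";

// Hypothetical row shape for webhook_deliveries.
interface DeliveryRow {
  eventId: string;
  status: DeliveryStatus;
  attemptCount: number;
  lastAttemptAt?: Date;
  deliveredAt?: Date;
  lastError?: string;
}

// In-memory stand-in for the table; stores and returns copies, like a database would.
class InMemoryDeliveryStore {
  private readonly rows = new Map<string, DeliveryRow>();
  upsert(row: DeliveryRow): void {
    this.rows.set(row.eventId, { ...row });
  }
  get(eventId: string): DeliveryRow | undefined {
    const row = this.rows.get(eventId);
    return row ? { ...row } : undefined;
  }
}

// Performs the HTTP request and resolves to the response status code (hypothetical).
type SendFn = () => Promise<number>;

async function attemptDelivery(store: InMemoryDeliveryStore, eventId: string, send: SendFn): Promise<DeliveryRow> {
  const prev = store.get(eventId);
  const row: DeliveryRow = {
    ...prev,
    eventId,
    status: "pending",
    attemptCount: (prev?.attemptCount ?? 0) + 1,
    lastAttemptAt: new Date(),
  };
  // Persist before sending: a crash during send() leaves a visible "pending" row.
  store.upsert(row);

  try {
    const code = await send();
    if (code >= 200 && code < 300) {
      row.status = "delivered";
      row.deliveredAt = new Date();
    } else {
      row.status = "failed"; // e.g. the 500 from test step 1
      row.lastError = `HTTP ${code}`;
    }
  } catch (err) {
    // Timeouts and network errors also count as failed attempts.
    row.status = "failed";
    row.lastError = err instanceof Error ? err.message : String(err);
  }
  // Persist the outcome after sending.
  store.upsert(row);
  return row;
}
```

Inside a BullMQ job processor, the caller would throw when the returned status is `failed`; a rejected processor is what makes BullMQ schedule the next attempt.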


Webhook notifications were delivered fire-and-forget with no persistence,
no retry, and no idempotency key. If the consumer endpoint timed out,
returned 5xx, or the backend restarted mid-delivery, the event was
permanently lost.

Changes:
- Add eventId (UUID v4) to NotificationPayload for consumer-side
  deduplication and include it in the HMAC signature input
- Add webhook_deliveries table to track delivery attempts with status
- Persist delivery attempts before and after sending
- Add BullMQ webhook-delivery queue with exponential backoff retries
  (4 attempts: 10s → 20s → 40s → 80s)
- Create webhookDeliveryWorker with rate limiting (10/min)
- Include X-SentientFi-Event-Id header in webhook requests
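The retry schedule can be checked with BullMQ's documented built-in `exponential` backoff, which waits `2^(attemptsMade - 1) * delay` ms before the next try. Assuming `delay: 10000`, the sketch below reproduces the listed delays. BullMQ's `attempts` option counts the first try, so four waits between tries (10s, 20s, 40s, 80s) would correspond to `attempts: 5`, while `attempts: 4` gives 10s, 20s, 40s. The function names are hypothetical.

```typescript
// Delay before the next try, given how many attempts have already been made,
// following BullMQ's built-in exponential strategy (without jitter).
function exponentialBackoffDelay(attemptsMade: number, baseDelayMs: number): number {
  return Math.round(Math.pow(2, attemptsMade - 1) * baseDelayMs);
}

// `attempts` total tries (first try included) produce `attempts - 1` waits.
function retrySchedule(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let made = 1; made < attempts; made++) {
    delays.push(exponentialBackoffDelay(made, baseDelayMs));
  }
  return delays;
}

// Corresponding BullMQ options, for reference (not executed here):
//   job:    { attempts: 5, backoff: { type: "exponential", delay: 10_000 } }
//   worker: { concurrency: 3, limiter: { max: 10, duration: 60_000 } }  // 10 jobs per minute
```

For example, `retrySchedule(5, 10_000)` yields the four waits listed in this commit, and `retrySchedule(4, 10_000)` yields only the first three.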

Fixes grantFoxin#30

Development

Successfully merging this pull request may close these issues.

Webhook notification system has no delivery guarantee, deduplication, or retry idempotency — events are silently dropped under transient failures
