Feature Description
Build a reusable outbound webhook delivery system that handles HMAC payload signing, retry with exponential backoff, dead letter queue, SSRF protection, and delivery logging. Existing alert notification code migrates onto it. The same primitive becomes available for future features that need to deliver events to external systems.
Problem/Use Case
Outbound HTTP delivery exists today inside the alert notification path, but it's specific to that one feature. Several upcoming features will need the same plumbing: digest reports (#155), workflow integrations, custom event subscriptions. Building each one independently would duplicate retry logic, signing, error handling, and SSRF defense, and quality would vary from feature to feature (the first one done well, the rest "good enough").
There's also a security dimension: outbound HTTP from a server-side application is an SSRF vector if not handled carefully. Centralizing it in one well-tested module is much safer than scattering it across features.
Proposed Solution
A webhookDispatcher module:
webhookDispatcher.enqueue({
  url: string,
  payload: unknown,
  organizationId: string,
  eventType: string,
  signingSecret?: string,
  headers?: Record<string, string>,
  metadata?: Record<string, unknown>,
})
What it does:
- SSRF protection — full DNS resolution of the target host, reject if it resolves to a private/loopback/link-local address. Re-resolve at request time to defend against DNS rebinding.
- HMAC signing: when a signing secret is provided, sign the body with HMAC-SHA256 and add X-Logtide-Signature and X-Logtide-Timestamp headers. Standard, documented, easy for receivers to verify.
- Retry with backoff — exponential backoff (e.g. 1s, 5s, 25s, 2m, 10m), max attempts configurable, only retry on transient failures (5xx, network errors, timeouts).
- Dead letter queue — after final failure, the delivery lands in a DLQ table with full request/response info for inspection and manual replay.
- Delivery log — every attempt (success or failure) is recorded with timestamp, status code, duration, response excerpt. Visible in a dashboard view.
- Per-organization concurrency limit — prevent one tenant's slow webhook receiver from saturating the worker pool.
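To make the SSRF bullet above concrete, here is a minimal sketch of the address check only. All function names are hypothetical, and the range list is a conservative starting point rather than an exhaustive one. It does not perform DNS resolution or IP pinning; it assumes the caller passes in the addresses it has already resolved. For IPv6 it takes the deliberately strict approach of allowing only global unicast (2000::/3) and rejecting everything else.

```typescript
// Parse a dotted-quad IPv4 string into four octets, or null if malformed.
function parseIPv4(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// True when an IPv4 address falls in a range the dispatcher should refuse.
function isBlockedIPv4(ip: string): boolean {
  const octets = parseIPv4(ip);
  if (octets === null) return true; // fail closed on anything unparseable
  const a = octets[0] ?? 0;
  const b = octets[1] ?? 0;
  return (
    a === 0 || // 0.0.0.0/8 ("this network")
    a === 10 || // 10.0.0.0/8 (RFC 1918)
    a === 127 || // 127.0.0.0/8 (loopback)
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 (shared address space)
    (a === 169 && b === 254) || // 169.254.0.0/16 (link-local)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 (RFC 1918)
    (a === 192 && b === 168) || // 192.168.0.0/16 (RFC 1918)
    a >= 224 // 224.0.0.0/4 multicast and 240.0.0.0/4 reserved
  );
}

// True when an IPv6 address should be refused. Assumption: only global
// unicast (2000::/3) is allowed, which also rejects loopback, ULA,
// link-local, and multicast without listing each range.
function isBlockedIPv6(ip: string): boolean {
  const addr = ip.toLowerCase();
  // IPv4-mapped form (::ffff:a.b.c.d): classify the embedded IPv4 address.
  const mapped = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/.exec(addr);
  if (mapped) return isBlockedIPv4(mapped[1] ?? "");
  const first = addr.split(":")[0] ?? "";
  if (!/^[0-9a-f]{1,4}$/.test(first)) return true; // "::..." or malformed
  return (parseInt(first, 16) & 0xe000) !== 0x2000;
}

function isBlockedAddress(ip: string): boolean {
  return ip.includes(":") ? isBlockedIPv6(ip) : isBlockedIPv4(ip);
}

// A target is deliverable only if every resolved address passes the check,
// unless the operator has opted in to private networks.
function isDeliverable(resolved: string[], allowPrivateNetworks = false): boolean {
  if (resolved.length === 0) return false;
  if (allowPrivateNetworks) return true;
  return resolved.every((ip) => !isBlockedAddress(ip));
}
```

Checking every resolved record, rather than just the first, matters because a hostname can return a public and a private address side by side.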
Backed by BullMQ with deterministic job IDs to make idempotency easy on the consumer side. Each event has a stable id; receivers can deduplicate.
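The retry policy and deterministic job IDs can be sketched as pure helpers, independent of the queue library. The function names, the 10-minute cap, and the `separator` parameter are assumptions for illustration. The schedule treats the "2m" and "10m" steps as rounded values of a factor-5 progression (125s and 625s). One caveat worth verifying against the BullMQ version in use: its job ID guide advises against `:` in custom job IDs, so the separator is left configurable here.

```typescript
const BASE_DELAY_MS = 1_000;
const BACKOFF_FACTOR = 5;

// Delay before retry number `attempt` (1-based): 1s, 5s, 25s, 125s, then
// capped at maxDelayMs (assumed default: 10 minutes).
function backoffDelayMs(attempt: number, maxDelayMs = 10 * 60 * 1000): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  return Math.min(BASE_DELAY_MS * BACKOFF_FACTOR ** (attempt - 1), maxDelayMs);
}

// Outcome of a single delivery attempt.
type AttemptOutcome =
  | { kind: "response"; status: number }
  | { kind: "network_error" }
  | { kind: "timeout" };

// Only transient failures are retried: 5xx responses, network errors, timeouts.
function isRetryable(outcome: AttemptOutcome): boolean {
  if (outcome.kind !== "response") return true;
  return outcome.status >= 500 && outcome.status <= 599;
}

// Deterministic job ID: the same event always maps to the same ID, so a
// duplicate enqueue for that event can be detected.
function buildJobId(
  organizationId: string,
  eventType: string,
  eventId: string,
  separator = ":",
): string {
  return ["webhook", organizationId, eventType, eventId].join(separator);
}
```

For example, `buildJobId("org1", "alert.fired", "evt9")` yields `webhook:org1:alert.fired:evt9`, matching the format given under Implementation Details.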
Alternatives Considered
- Continue scattering webhook delivery per feature. Worse SSRF posture, inconsistent retry behavior, no central observability. Rejected.
- Use an external service (e.g. Hookdeck, Svix). Adds a third-party dependency that breaks self-hosted/air-gapped deployments and contradicts the privacy-first philosophy of the project.
- Build minimal version now, add observability/DLQ later. Tempting, but the migration cost of moving alert notifications onto the new dispatcher is paid once. Better to land it complete.
Implementation Details (Optional)
- SSRF guard: resolve DNS, check that no resolved A/AAAA record is in private space (RFC 1918, ULA, link-local, loopback, multicast, reserved). For the actual HTTP request, pin to the resolved IP to prevent TOCTOU rebinding. Allow an opt-in allowPrivateNetworks flag for trusted on-prem deployments.
- HMAC signing: X-Logtide-Signature: t=<unix>,v1=<hex>, where the signed string is <unix>.<body>. Document the verification snippet for receivers.
- Job IDs: webhook:${organizationId}:${eventType}:${eventId} (deterministic; deduplicates retries triggered by upstream errors).
- DLQ: a separate table webhook_deliveries_failed with the full job, last response, last error. A dashboard view lists DLQ entries per org with a "retry" button (which re-enqueues with a fresh job).
- Delivery log: capacity-bounded — keep last N attempts per webhook (configurable, default 1000). Deeper history would need a separate storage decision.
- Existing alert notification code becomes the first consumer. No behavioral regression — same delivery semantics, just centralized.
- Coordinates with the lifecycle hooks issue: beforeWebhookDispatch is the right place for downstream platforms to inspect or reject deliveries.
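As a starting point for the receiver-facing verification snippet, here is a minimal sketch of signing and verifying the `t=<unix>,v1=<hex>` header format described above, using Node's built-in `crypto` module. The function names and the 300-second tolerance window are assumptions, not part of the proposal. The signed string is `<unix>.<body>`, and the body must be the exact raw bytes that were sent, not a re-serialized copy.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: build the X-Logtide-Signature header value.
function signPayload(body: string, secret: string, unixSeconds: number): string {
  const mac = createHmac("sha256", secret)
    .update(`${unixSeconds}.${body}`)
    .digest("hex");
  return `t=${unixSeconds},v1=${mac}`;
}

// Receiver side: check the header against the raw request body.
// toleranceSeconds (assumed default: 300) bounds how old a signature may be,
// which limits the window for replaying a captured request.
function verifySignature(
  header: string,
  body: string,
  secret: string,
  nowSeconds: number,
  toleranceSeconds = 300,
): boolean {
  const match = /^t=(\d+),v1=([0-9a-f]{64})$/.exec(header);
  if (!match) return false;
  const timestamp = match[1] ?? "";
  const receivedHex = match[2] ?? "";
  if (Math.abs(nowSeconds - Number(timestamp)) > toleranceSeconds) return false;

  // Recompute the MAC over the timestamp exactly as it appeared in the header.
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${body}`)
    .digest();
  const received = Buffer.from(receivedHex, "hex");
  // Constant-time comparison avoids leaking how many leading bytes matched.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

A receiver would call `verifySignature(headerValue, rawBody, secret, Math.floor(Date.now() / 1000))` before parsing the payload.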
Priority
Target Users
- Operators integrating Logtide alerts with external systems (PagerDuty, Slack, custom internal tools)
- Teams building automation around Logtide events (CI triggers, ticket creation, custom workflows)
- Future features requiring reliable event delivery (digest reports, custom event subscriptions)
Contribution