feat(outbox): implement transactional outbox and async event relay #3
Merged
Eliminates the dual-write problem in the player service by collapsing business logic and event writes into a single Postgres transaction. Events are asynchronously polled and dispatched to NATS via a dedicated `outbox-relay` service.

- **Atomicity**: Injected `OutboxPublisher` to route player events into the `outbox_events` table within the same ACID transaction.
- **Relay Architecture**: Implemented safe concurrent polling using `FOR UPDATE SKIP LOCKED` with a bounded worker pool in `outbox-relay`.
- **Observability**: Captured and replayed W3C trace contexts inside JSONB `carrier` to ensure unbroken distributed tracing spans.
- **Resilience**: Assured at-least-once delivery; consumers de-duplicate on stable `Event.ID`. Added integration tests for rollbacks and retries.
- **Scope Limitation**: Documented that Redis-backed services (Matchmaking/Leaderboard) remain direct-publish due to lack of shared ACID transactions.
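Because delivery is at-least-once, a consumer can see the same event more than once. The sketch below shows one way a consumer might de-duplicate on the stable `Event.ID`. It is an illustration only: the `Event` fields shown, the `Deduper` type, the subject string, and the in-memory map are all hypothetical and are not taken from this PR. A real consumer would need a durable record of processed IDs.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical minimal shape; only the stable ID matters here.
type Event struct {
	ID      string
	Subject string
}

// Deduper remembers processed event IDs. In-memory for illustration only;
// it forgets everything on restart, which a production consumer cannot afford.
type Deduper struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[string]struct{})}
}

// Handle runs fn at most once per Event.ID and reports whether fn ran.
// The ID is recorded only after fn succeeds, so a failed attempt can be
// redelivered. Holding the mutex during fn serializes handlers, which keeps
// the sketch simple at the cost of throughput.
func (d *Deduper) Handle(ev Event, fn func(Event) error) (bool, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, dup := d.seen[ev.ID]; dup {
		return false, nil // duplicate delivery: skip silently
	}
	if err := fn(ev); err != nil {
		return false, err
	}
	d.seen[ev.ID] = struct{}{}
	return true, nil
}

func main() {
	d := NewDeduper()
	applied := 0
	apply := func(Event) error { applied++; return nil }

	// Hypothetical subject name, used only to make the example concrete.
	ev := Event{ID: "evt-1", Subject: "events.player.registered"}
	for i := 0; i < 3; i++ { // simulate the relay redelivering the same event
		_, _ = d.Handle(ev, apply)
	}
	fmt.Println("applied:", applied)
}
```

The side effect runs once even though the event is delivered three times, which is the property the relay's retry behavior relies on.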
…vent

Eliminates redundant exported name stuttering in the `outbox` package by renaming `OutboxEvent` to `Event`, adhering to idiomatic Go naming conventions.

- Updated type definition, TableName, and model refs in `store.go`.
- Refactored internal channels and helper signatures in `relay.go`.
- Updated GORM `AutoMigrate` call in `cmd/player/main.go`.
- Refactored `relay_test.go`, `postgres_test.go`, and `outbox_test.go`.
- Kept `pkg/metrics` fields unchanged as they do not stutter.
Guarantee that a committed Postgres transaction can never lose its domain event. The player service now writes business rows and the event to outbox_events in one transaction; a dedicated outbox-relay process polls committed rows and publishes them to NATS JetStream, retrying until success (at-least-once).

- outbox_events table (migration 0003 + schema.sql + GORM model)
- OutboxPublisher implements events.Publisher, inserting on the business tx instead of publishing inline
- Relay: bounded worker pool, FOR UPDATE SKIP LOCKED batching, graceful shutdown, PENDING->PUBLISHED lifecycle with attempt_count
- New PlayerRegistered/PlayerUpdated events on the events.player stream
- Trace continuity via stored W3C carrier; outbox.* spans
- Prometheus metrics: pending, published_total, failures_total, publish_duration_seconds
- OUTBOX_* config, docker-compose service, Prometheus scrape target
- Unit tests (relay publish/retry/replay/trace) + integration tests (atomic write, rollback, RunBatch)