- Add delivery state store for tracking per-consumer message state separately from the immutable log
- Add subscriber heartbeat store for liveness detection and fair partition leasing
- Refactor message store to append-only immutable log with contiguous watermark tracking
- Refactor partition lease store with heartbeat-based fair leasing
- Add batch query support for offset and delivery state operations
- Add GC throttling and dead code removal
- Fix make fmt to skip read-only pkg/mod directory
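The fair-leasing bullet above does not say what "fair" means. One common reading, assumed here rather than taken from the lease store, is that each live subscriber holds at most ceil(partitions / live subscribers) leases and releases the excess when peers appear. `fairShare` and `excessLeases` are hypothetical helpers for illustration only.

```go
package main

import "fmt"

// fairShare returns the maximum number of partitions one subscriber should
// hold when partitions are spread evenly across liveSubscribers.
// Assumption: "fair" means ceil(partitions / liveSubscribers).
func fairShare(partitions, liveSubscribers int) int {
	if liveSubscribers <= 0 {
		return partitions // no live peers known: one subscriber may take everything
	}
	return (partitions + liveSubscribers - 1) / liveSubscribers
}

// excessLeases reports how many leases a subscriber should release during
// rebalancing so that peers can acquire them.
func excessLeases(held, partitions, liveSubscribers int) int {
	if extra := held - fairShare(partitions, liveSubscribers); extra > 0 {
		return extra
	}
	return 0
}

func main() {
	// 10 partitions, 3 live subscribers: at most 4 each.
	fmt.Println(fairShare(10, 3))        // 4
	fmt.Println(excessLeases(10, 10, 3)) // 6
}
```

Under this rule a subscriber that started alone and grabbed all 10 partitions gives back 6 once two more heartbeats show up.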
```go
cfg.BatchSize = 1 // One message in-flight at a time
cfg.VisibilityTimeoutMs = 120000 // Must exceed max processing time
```

This guarantees no concurrent processing within a partition. See the [RFC](../../doc/rfc/sql-queue-rfc.md#ordering-and-serialization) for details on ordering semantics and non-blocking nack behavior.

| `queue_subscriber_heartbeats` | Active subscriber tracking | `(consumer_group, topic, subscriber_name)` |

See `schema/` for full SQL definitions. See the [RFC](../../doc/rfc/sql-queue-rfc.md#database-schema) for field-level documentation.

### Store Architecture

Each table is backed by an internal store interface defined in `stores.go`. Stores:
- Query only their own table (no cross-table JOINs)
- Return errors via `fmt.Errorf` (no logging, no error classification)
- Use `metrics.Begin`/`Complete` for latency and success/failure tracking

The subscriber layer orchestrates cross-store operations (e.g., watermark advancement queries both `messageStore` and `deliveryStateStore`) and owns all logging and error classification.
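Watermark advancement is the cross-store case called out here: the message log supplies offsets and the delivery-state store supplies acks. A sketch of contiguous advancement, assuming offsets are increasing integers that may have gaps and that an offset is finished once acked (`advanceWatermark` is a hypothetical name):

```go
package main

import "fmt"

// advanceWatermark moves the watermark forward over the longest run of
// logOffsets (sorted offsets, as a message-store query might return them)
// that are all acked. It stops at the first unacked offset, so nothing at or
// below the returned watermark is left unprocessed.
// Assumption: acked comes from a per-consumer delivery-state lookup.
func advanceWatermark(watermark int64, logOffsets []int64, acked map[int64]bool) int64 {
	for _, off := range logOffsets {
		if off <= watermark {
			continue // already covered by the current watermark
		}
		if !acked[off] {
			break
		}
		watermark = off
	}
	return watermark
}

func main() {
	log := []int64{11, 12, 14, 15} // 13 is absent from the log, not pending
	acked := map[int64]bool{11: true, 12: true, 14: true}
	fmt.Println(advanceWatermark(10, log, acked)) // 14: stops before unacked 15
}
```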

### Goroutine Model

Each subscription has a **supervisor goroutine** (`managePartitions`) that discovers partitions, acquires leases, sends heartbeats, rebalances, and reconciles per-partition worker goroutines.

```
Subscribe()
└── managePartitions (supervisor)  ← tracked by sub.wg
    ├── partitionWorker("part-1")  ← tracked by sub.workerWg
    ├── partitionWorker("part-2")
    └── partitionWorker("part-3")
```

Each partition worker runs independently: it polls the DB on a ticker, checks deliverability via `GetDeliveryState` per message, and sends deliveries to the shared channel. A slow or blocked partition does not affect other partitions.

### Shutdown Sequence

When `Close()` is called:
1. Subscription context is cancelled
2. `managePartitions` calls `stopAllWorkers`, which cancels each worker's context and waits up to 30s
3. Partition leases are released (using a fresh context, not the cancelled one)
4. Subscriber heartbeat is deregistered
5. `workerWg.Wait()` blocks until all workers have fully exited
6. `deliveryCh` is closed, which is safe because no senders remain after step 5

Stores do not log errors; they return them. The subscriber propagates all errors to the top call site (`managePartitions` or `run`), which logs once with full context (`topic`, `consumer_group`, `subscriber_name`).

## Testing

### Unit Tests

```bash
bazel test //extension/queue/mysql:mysql_test --test_output=streamed
bazel test //extension/queue/mysql/ctl/...:all --test_output=streamed
```

### Integration Tests

Requires Docker running:

```bash
bazel test //test/integration/extension/queue/... --test_output=streamed
```