Problem
pkg/ethrpc/submitter/submitter.go polls mempool rows in pending status and pushes each one through erc20.TransferFrom on Canton, then transitions the row to completed / failed. The drain query is a plain SELECT … WHERE status = 'pending' with no row-level locking, no claim/lease, and no leader election around Submitter.Start.
When the API server runs as more than one instance (the expected deployment shape once we scale horizontally), every instance:
- Selects the same batch of pending rows on each tick.
- Issues a Canton TransferFrom gRPC call for every one of them in parallel with the other instances.
- Races to write the outcome back to Postgres.
Current safety net (why nothing is corrupted today)
- CompleteMempoolEntry and FailMempoolEntry both guard with WHERE status = 'pending', so the second writer is a silent no-op and a row can't be flipped from completed to failed or vice versa.
- Canton command-id idempotency on the tx hash means duplicate submissions resolve to one Canton commit.
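The status guard behaves like a compare-and-set on the row. A minimal in-memory sketch of that behaviour follows; the Store type, the transition helper, and the tx_hash column named in the comment are hypothetical stand-ins, not the actual CompleteMempoolEntry / FailMempoolEntry code.

```go
package main

import "sync"

// Status mirrors the mempool row states named above.
type Status string

const (
	Pending   Status = "pending"
	Completed Status = "completed"
	Failed    Status = "failed"
)

// Store is a hypothetical in-memory stand-in for the mempool table.
type Store struct {
	mu   sync.Mutex
	rows map[string]Status // tx hash -> status
}

// transition models
//   UPDATE ... SET status = <next> WHERE tx_hash = <hash> AND status = 'pending'
// (column name assumed). It reports whether a row was actually changed.
func (s *Store) transition(hash string, next Status) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.rows[hash] != Pending {
		return false // guard failed: another writer already moved the row
	}
	s.rows[hash] = next
	return true
}
```

The first caller to transition a pending row wins; any later caller gets false and leaves the stored status untouched.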
Why it still matters
- Wasted Canton RPCs. N instances ⇒ ~N× the gRPC traffic and ~N× the load on Canton for the same backlog.
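The amplification can be modelled in a few lines. This is only an illustrative sketch: Ledger is a hypothetical stand-in for Canton that counts every call but deduplicates commits by command ID, and drainTick models one tick in which every instance sees the same pending batch.

```go
package main

// Ledger is a hypothetical stand-in for Canton. It counts every submission
// but records at most one commit per command ID.
type Ledger struct {
	calls   int
	commits map[string]bool
}

// Submit stands in for one TransferFrom gRPC call keyed by command ID.
func (l *Ledger) Submit(commandID string) {
	l.calls++
	l.commits[commandID] = true
}

// drainTick models one tick with no row claiming: each instance reads the
// same pending rows and submits every one of them.
func drainTick(l *Ledger, instances int, pending []string) {
	for i := 0; i < instances; i++ {
		for _, hash := range pending {
			l.Submit(hash)
		}
	}
}
```

In this model, 3 instances draining 4 pending rows produce 12 submissions and 4 commits: the result is correct, but two thirds of the calls are wasted.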
Goal
Each pending row should be processed by exactly one submitter instance per attempt. Transient failures must remain retryable, and a retry must stay idempotent under the Canton command-id.