feat(submitter): make pending-entry drain safe and efficient across multiple instances #282

@dhyaniarun1993

Problem

pkg/ethrpc/submitter/submitter.go polls mempool rows in pending status and pushes each one through erc20.TransferFrom on Canton, then transitions the row to completed / failed. The drain query is a plain SELECT … WHERE status = 'pending' with no row-level locking, no claim/lease, and no leader election around Submitter.Start.

When the API server runs as more than one instance (the expected deployment shape once we scale horizontally), every instance:

  • Selects the same batch of pending rows on each tick.
  • Issues a Canton TransferFrom gRPC call for every one of them in parallel with the other instances.
  • Races to write the outcome back to Postgres.

Current safety net (why nothing is corrupted today)

  • CompleteMempoolEntry and FailMempoolEntry both guard their updates with WHERE status = 'pending', so the second writer is a silent no-op and a row can't be flipped from completed to failed or vice versa.
  • Canton command-id idempotency on the tx hash means duplicate submissions resolve to a single Canton commit.
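
The guarded write described above can be illustrated with a small in-memory model. This is only a sketch: memStore and transition are hypothetical names standing in for the mempool table and for the conditional UPDATE behind CompleteMempoolEntry / FailMempoolEntry, not the actual code in the repository.

```go
package main

import (
	"fmt"
	"sync"
)

type Status string

const (
	Pending   Status = "pending"
	Completed Status = "completed"
	Failed    Status = "failed"
)

// memStore is a hypothetical in-memory stand-in for the mempool table.
type memStore struct {
	mu   sync.Mutex
	rows map[string]Status
}

// transition mimics an UPDATE guarded by WHERE status = 'pending':
// it changes the row only if it is still pending and reports whether
// this caller was the one that changed it.
func (s *memStore) transition(id string, to Status) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.rows[id] != Pending {
		return false // another writer already finalized the row (or it does not exist)
	}
	s.rows[id] = to
	return true
}

func main() {
	s := &memStore{rows: map[string]Status{"0xabc": Pending}}
	fmt.Println(s.transition("0xabc", Completed)) // true: first writer wins
	fmt.Println(s.transition("0xabc", Failed))    // false: second writer is a no-op
	fmt.Println(s.rows["0xabc"])                  // completed
}
```

Note that the guard protects the final status only; it does nothing to stop the second instance from doing the Canton call before it loses the write.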

Why it still matters

  • Wasted Canton RPCs. N instances ⇒ ~N× the gRPC traffic and ~N× the load on Canton for the same backlog.
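
To make the N× claim concrete, here is a rough simulation of uncoordinated instances draining the same backlog. It assumes the worst case where every instance sees the full batch; ledger, submit, and drainAll are hypothetical names, and the ledger only imitates command-id idempotency by committing each ID once.

```go
package main

import (
	"fmt"
	"sync"
)

// ledger is a hypothetical stand-in for Canton: it counts every call
// but records each command ID as committed only once.
type ledger struct {
	mu        sync.Mutex
	calls     int
	committed map[string]bool
}

func (l *ledger) submit(commandID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.calls++
	l.committed[commandID] = true // duplicates collapse onto the same ID
}

// drainAll runs `instances` uncoordinated submitters over the same backlog
// and returns the total number of calls and the number of distinct commits.
func drainAll(instances int, backlog []string) (calls, commits int) {
	l := &ledger{committed: map[string]bool{}}
	var wg sync.WaitGroup
	for i := 0; i < instances; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, id := range backlog {
				l.submit(id)
			}
		}()
	}
	wg.Wait()
	return l.calls, len(l.committed)
}

func main() {
	calls, commits := drainAll(3, []string{"0x01", "0x02", "0x03", "0x04"})
	fmt.Printf("calls=%d commits=%d\n", calls, commits) // calls=12 commits=4
}
```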

Goal

Each pending row should be processed by exactly one submitter instance per attempt. Transient failures must remain retryable (idempotent under command-id).
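
One possible shape for this, sketched in memory rather than against Postgres, is a claim with a time-bounded lease: an instance takes ownership of a row for a limited period, and the row becomes claimable again once the lease expires, which keeps transient failures retryable. Everything here is an assumption for illustration (entry, queue, claim, leaseTTL, and the claimedBy / leaseUntil fields); it is not the schema or the approach chosen for this issue. In Postgres, the same effect is commonly achieved by selecting candidate rows with FOR UPDATE SKIP LOCKED inside the claiming statement.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// leaseTTL is an assumed lease duration; a real value would need tuning.
const leaseTTL = 30 * time.Second

// entry is a hypothetical pending mempool row with assumed lease columns.
type entry struct {
	id         string
	claimedBy  string
	leaseUntil time.Time
}

// queue models the table; the mutex plays the role of row-level locking.
type queue struct {
	mu      sync.Mutex
	entries []*entry
}

func newQueue(n int) *queue {
	q := &queue{}
	for i := 0; i < n; i++ {
		q.entries = append(q.entries, &entry{id: fmt.Sprintf("tx-%d", i)})
	}
	return q
}

// claim hands up to limit rows to owner. A row is claimable if nobody
// holds it or if the previous holder's lease has expired by `now`.
func (q *queue) claim(owner string, limit int, now time.Time) []string {
	q.mu.Lock()
	defer q.mu.Unlock()
	var ids []string
	for _, e := range q.entries {
		if len(ids) == limit {
			break
		}
		if e.claimedBy == "" || !now.Before(e.leaseUntil) {
			e.claimedBy = owner
			e.leaseUntil = now.Add(leaseTTL)
			ids = append(ids, e.id)
		}
	}
	return ids
}

func main() {
	q := newQueue(6)
	now := time.Now()

	var wg sync.WaitGroup
	var mu sync.Mutex
	seen := map[string]int{} // row ID -> number of instances that claimed it

	for _, owner := range []string{"a", "b", "c"} {
		wg.Add(1)
		go func(owner string) {
			defer wg.Done()
			for _, id := range q.claim(owner, 4, now) {
				mu.Lock()
				seen[id]++
				mu.Unlock()
			}
		}(owner)
	}
	wg.Wait()

	maxClaims := 0
	for _, n := range seen {
		if n > maxClaims {
			maxClaims = n
		}
	}
	fmt.Printf("rows claimed=%d max claims per row=%d\n", len(seen), maxClaims)
	// prints: rows claimed=6 max claims per row=1
}
```

A lease rather than a permanent claim is used so that a crashed instance does not strand its rows; the trade-off is that a slow instance can overrun its lease, in which case the status guard and command-id idempotency still act as the backstop.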


Labels: Type: Enhancement
