Skip to content

feat(ledger): Go PgLedger pgxpool impl + executor wiring (PR-2 of #95)#122

Merged
0xfandom merged 6 commits into
developfrom
feat/trade-ledger-go-impl
May 5, 2026
Merged

feat(ledger): Go PgLedger pgxpool impl + executor wiring (PR-2 of #95)#122
0xfandom merged 6 commits into
developfrom
feat/trade-ledger-go-impl

Conversation

@0xfandom

@0xfandom 0xfandom commented May 5, 2026

Copy link
Copy Markdown
Collaborator

Summary

PR-2 of the trade-ledger plan. Adds the Go side of the writer surface: PgLedger on pgxpool, executor wiring on bundle submission and per-builder result handling, inline pnl_daily roll-up, and aether_ledger_* metrics that mirror the Rust side exactly. After this lands, running aether against a fork or staging produces row trails in bundles, inclusion_results, and pnl_daily — closing the practical "DB integration on develop" goal.

PR-3 (CI Postgres + counter-vs-row reconciliation test) is intentionally not in scope per the agreed-to minimum-viable plan.

Single commit

39492a2 feat(ledger): Go PgLedger pgxpool impl + executor wiring

What landed

internal/db/ledger_pg.go (new, 232 LOC)

  • PgLedger on pgxpool. Hot-path enqueue is non-blocking try-send onto a bounded chan (cap 1024). Dispatcher goroutine drains and fans out via a Semaphore(8) so the pgx pool runs at capacity without acquire-queueing.
  • Saturation drops the row and bumps aether_ledger_drops_total{op} — never blocks the executor.
  • 2 s connect timeout: misconfigured DATABASE_URL fails fast and LedgerFromEnv falls back to NoopLedger.
  • Idempotent inserts on bundles + ON CONFLICT upsert on inclusion_results + delta-accumulation upsert on pnl_daily (multiple writers can contribute to the same day without lost updates; NUMERIC(78,0) preserves U256 economics losslessly).
  • Close() for graceful shutdown drains in-flight writes.

internal/db/metrics.go (new, 51 LOC)

  • LedgerMetrics registered against the default Prometheus registry. Names byte-identical to the Rust side:
    • aether_ledger_writes_total{op, result}
    • aether_ledger_drops_total{op}
    • aether_ledger_queue_depth
    • aether_ledger_write_latency_ms{op}

internal/db/ledger.go (extended)

  • ArbIDNamespace + BundleIDNamespace constants. ArbIDNamespace is byte-identical to the Rust ARB_ID_NAMESPACE so log↔DB join keys are symmetric across the gRPC boundary; a TestArbIDNamespaceMatchesRust pin guards drift.
  • ArbIDFromOppID(string) → uuid.UUID and BundleIDFor(uuid, block) → uuid.UUID for deterministic ids. Same (arb, block) always produces the same bundle_id so resubmission ON CONFLICT is naturally idempotent.

cmd/executor/main.go (wired)

  • LedgerFromEnv at startup, deferred Close() on shutdown.
  • processArb gains db.Ledger param. After bundle build, derives arb_db_id + bundle_id deterministically and emits both as structured log fields on the arb submitted line so grep <id> logs/* | psql ... WHERE arb_id = ... works straight from logs.
  • Shadow path persists a bundles row with IsShadow=true (no builders).
  • Live path persists bundles + per-builder inclusion_results. Critical semantic note (in code comment): Included here reflects builder acceptance for next-block inclusion, not on-chain inclusion — the future GetBundleStats poll loop will UPSERT the same (bundle_id, builder) row with included_block + landed_tx_hash populated.
  • pnl_daily inline roll-up: bundle_count bumps every submit; on builder acceptance realized_profit_wei + inclusion_count accumulate. ~30 LOC to keep the table populated during fork / staging runs without a separate cron.

Tests

  • All 14 processArb + 2 consumeArbStream call sites in *_test.go updated to pass db.NewNoopLedger().
  • New deterministic-id tests pin ArbIDNamespace byte-equality with Rust and the determinism / variance contract on ArbIDFromOppID / BundleIDFor.
  • go test ./... 100% green.

Cargo / go.mod

  • Adds github.com/jackc/pgx/v5/pgxpool.

Acceptance criteria progress (#95)

# Criterion After this PR
5 Rust access layer (sqlx::PgPool) ✅ (PR #119)
6 Go access layer (pgxpool)

Criteria 8 (counter-vs-row reconciliation) and 9 (CI Postgres) intentionally deferred per the minimum-viable plan.

Out of scope

  • GetBundleStats poll loop that UPSERTs inclusion_results with on-chain truth. Schema already supports the upsert; this PR's writes are submission-time. Followup.
  • CI Postgres service container + integration tests + counter-vs-row reconciliation. Tracked as future PR-3.
  • Chain-backfill reconciliation worker (drives the dead-code Rust update_inclusion).
  • Retention / partitioning on arbs + inclusion_results.

Local verification

  • go build ./... clean.
  • go vet ./... clean.
  • go test ./...: all green (executor + db + risk + config + pooldiscovery).

PR-2 of the trade-ledger plan. Mirrors the Rust PgLedger surface so a
unified /metrics scrape across both binaries surfaces a single set of
aether_ledger_* families, and brings 'rows on a fork run' from
half-functional (Rust side only) to end-to-end.

internal/db/ledger_pg.go — new
- PgLedger on pgxpool. Hot path is non-blocking try-send onto a bounded
  chan (cap 1024), drained by a dispatcher goroutine that fans out via
  a counting semaphore (limit 8) so the pgx pool runs at capacity
  without acquire-queueing. Saturation drops the row and bumps
  aether_ledger_drops_total{op}; never blocks the executor.
- 2 s connect timeout so a misconfigured DATABASE_URL fails fast and
  LedgerFromEnv falls back to NoopLedger instead of stalling boot.
- Idempotent INSERTs for bundles + ON CONFLICT upsert for inclusion +
  pnl_daily delta accumulation (multiple writers can contribute to the
  same day without lost updates; NUMERIC(78,0) preserves U256 economics
  losslessly).

internal/db/metrics.go — new
- LedgerMetrics over the default Prometheus registry. Names mirror the
  Rust LedgerMetrics::register surface exactly:
    aether_ledger_writes_total{op, result}
    aether_ledger_drops_total{op}
    aether_ledger_queue_depth
    aether_ledger_write_latency_ms{op}

internal/db/ledger.go — extended
- ArbIDNamespace + BundleIDNamespace constants. ArbIDNamespace is
  byte-identical to the Rust ARB_ID_NAMESPACE in
  crates/grpc-server/src/engine.rs so log↔DB join keys are symmetric
  across the gRPC boundary; a TestArbIDNamespaceMatchesRust pin guards
  drift.
- ArbIDFromOppID(string) -> uuid.UUID and BundleIDFor(uuid, block)
  -> uuid.UUID for deterministic ids. Same (arb, block) pair always
  produces the same bundle_id so resubmission ON CONFLICT is naturally
  idempotent.

cmd/executor/main.go — wired
- LedgerFromEnv constructed once at startup, deferred Close() flushes
  in-flight writes on shutdown.
- processArb gains a db.Ledger param. After bundle build, derives
  arb_db_id + bundle_id deterministically and emits both as structured
  fields on the submission log line so operators can grep id -> psql
  WHERE arb_id = ... straight from logs.
- Shadow path persists the bundle row (IsShadow=true, no builders).
- Live path persists bundle + per-builder inclusion rows. Critical
  semantic note: Included on these rows reflects builder *acceptance*
  for next-block inclusion, not on-chain inclusion — the future
  GetBundleStats poll loop UPSERTs the same (bundle_id, builder) row
  with included_block + landed_tx_hash populated.
- pnl_daily inline roll-up: bundle_count bumps every submit; on
  acceptance realized_profit_wei + inclusion_count accumulate. Adds
  ~30 LOC to keep the table populated during fork runs without a
  separate cron.

Tests
- Existing 14 processArb / 2 consumeArbStream call sites updated to
  pass db.NewNoopLedger().
- New deterministic-id tests pin the ArbIDNamespace and
  BundleIDFor / ArbIDFromOppID semantics.
- go test ./... all green.

go.mod — adds github.com/jackc/pgx/v5/pgxpool.
@vercel

vercel Bot commented May 5, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
aether Ready Ready Preview, Comment May 5, 2026 0:41am
aether-63xv Ready Ready Preview, Comment May 5, 2026 0:41am

@0xfandom

0xfandom commented May 5, 2026

Copy link
Copy Markdown
Collaborator Author

Self-review: alignment vs #95 + correctness on what landed

Reviewed locally before requesting external eyes. Posting here so the rationale travels with the PR.

Alignment matrix (#95 acceptance criteria)

# Criterion Status
1 docker compose up postgres healthcheck N/A — pre-existing
2 db_migrate.sh idempotent N/A — pre-existing
3 DATABASE_URL set → rows in arbs/bundles/inclusion_results Partial / broken (see CRIT 1 + 2)
4 DATABASE_URL unset → identical boot Met (ledger_pg.go:275-288, main.go:220-226)
5 Rust sqlx Landed in #119
6 Go pgxpool Met (ledger_pg.go:46-288) — surface correct, behaviour buggy
7 Wiring Met (main.go:315, 421-507)
8 Row counts ↔ Prom counters Deferred → PR-3
9 CI Postgres + -race integration Deferred → PR-3

Justified deferrals — and their consequences

  • Criteria 8 + 9 → PR-3: justified per the agreed minimum-viable plan, but the consequence is severe — the exact integration test being deferred would have caught CRIT 1 and CRIT 2 below on first contact with a real Postgres. Deferring reconciliation while merging an unverified writer is the worst-of-both outcome.
  • GetBundleStats poll loop deferred: justified — schema accommodates the future UPSERT. Consequence: inclusion_results.included here means builder acceptance for next-block inclusion, not on-chain landing. The partial index inclusion_results_included_block_idx WHERE included (migrations/0001_trade_ledger.sql:76) was clearly designed for on-chain truth, so any dashboard built between PR-2 and PR-3 will mis-report landed counts. Acknowledged in the PR comment at main.go:450-454; flagging the consequence, not the choice.
  • Chain-backfill / update_inclusion Rust dead code deferred: justified per [E2] Trade ledger: Postgres schema + access layer (Rust + Go) #95 out-of-scope. Consequence: included_block / landed_tx_hash stay NULL until that worker exists.
  • Retention / partitioning deferred: justified for PR-2. bundles.signed_tx_hex is unbounded TEXT — at ~200 bundles/min × multi-tx, crosses 10 GB inside two months. Track it.

CRITICAL — block merge

CRIT 1 — internal/db/ledger_pg.go:191-202 — JSONB column bound as text[]. bundles.builders is JSONB NOT NULL (migrations/0001_trade_ledger.sql:58) but the Go insert binds b.Builders ([]string) directly. pgx v5 encodes []string as Postgres text[]. Every live-mode insert will fail with column "builders" is of type jsonb but expression is of type text[]. Shadow mode dodges this only because shadow passes Builders: nil (main.go:428). Fix: json.Marshal(b.Builders) and bind []byte, or change the column to TEXT[]. Pick one.

CRIT 2 — cmd/executor/main.go:421, 460 vs migrations/0001_trade_ledger.sql:52 — FK race against arbs. bundles.arb_id is NOT NULL REFERENCES arbs(arb_id). Rust writes arbs via its own async channel; Go writes bundles via a separate async channel. There is no happens-before between them. Failure modes: (a) shadow runs without a Rust ledger writer → every InsertBundle violates FK; (b) live under load → the Go writer can drain a bundle row before the Rust writer drains the corresponding arb row → intermittent FK violations surfacing only as aether_ledger_writes_total{op="insert_bundle",result="err"}. Fix options: insert the arbs row from the Go side too with ON CONFLICT (arb_id) DO NOTHING (the executor has all the data on the gRPC ValidatedArb), or drop the FK and rely on application-level join discipline. Former is cleaner.

HIGH

H1 — internal/db/ledger_pg.go:100-104Close() does not actually drain. Order in cmd/executor/main.go is: cancel()wg.Wait() (executor goroutines) → deferred pg.Close(). By the time Close runs, the outer ctx is already cancelled, and runOne (:158-183) uses that same ctx — so every drained op fails with context.Canceled. The "drains in-flight writes" doc comment at :98-99 is wrong. Fix: give the writer pool an independent ctx with a hard deadline (e.g. 5s) used inside runOne, separate from the dispatcher's cancellation signal.

H2 — internal/db/ledger_pg.go:217-232 — UPSERT clobbers future poll results on resubmit. The ON CONFLICT (bundle_id, builder) DO UPDATE SET included = EXCLUDED.included, included_block = EXCLUDED.included_block, ... overwrites a previously poll-populated (included=true, included_block=N, landed_tx_hash=H) row with a fresh submission carrying included=false, NULL, NULL. BundleIDFor(arb, target_block) is deterministic so retries collide by design. Fix: only UPDATE columns that are non-NULL in EXCLUDED, or move submission-time writes to a submission_attempts table and let the poll be the sole writer of inclusion_results.

H3 — cmd/executor/main.go:469-482included=true on builder acceptance pollutes the on-chain semantic. Schema's partial index makes the intent unambiguous. Until PR-3's poll lands, dashboards filtering WHERE included will count accepted-but-not-landed bundles as landed. Recommendation: write included = false initially and let the poll flip it, or add a separate accepted boolean now.

MEDIUM

M1 — internal/db/ledger_pg.go:138-156 — dispatcher leaks on ctx cancel. When ctx.Done() fires, the loop returns with the channel still open — future enqueuers fall to the default branch and bump drops (fine), but goroutines already past the sem acquire are not awaited; the deferred inflight.Wait() is unreachable from that exit path. Fix: restructure so the loop breaks to a label and inflight.Wait() is always reached.

M2 — internal/db/ledger_pg.go:120-131QueueDepth gauge is racy. Inc() runs after the send, Dec() runs before dispatch — under contention the gauge can transiently go negative. Cosmetic, but the Rust side likely uses the same metric name and cross-process scrapes will look weird. Use a periodic Set(len(l.ch)) instead.

M3 — internal/db/ledger_pg.go:235-258pnl_daily.bundle_count inflates on resubmits. Postgres serialises the row lock so the upsert itself is correct, but every submission bumps the counter — even when the deterministic bundle row was deduped in bundles via DO NOTHING. Fix: rebuild pnl_daily from a periodic INSERT … SELECT COUNT(*), or only bump when the bundle INSERT actually inserted (use RETURNING to detect).

M4 — internal/db/ledger_test.go:23-34 — namespace pin is correctly byte-identical to the Rust constant at crates/grpc-server/src/engine.rs:182-185. ✓ Gap: BundleIDNamespace has no Rust mirror so no equivalent pin is possible — add an explicit comment in ledger.go near the constant saying so, to future-proof against a Rust mirror drifting silently.

LOW / NIT

  • internal/db/metrics.go:43-47 — histogram buckets start at 0.5ms; p50 of a single-row INSERT on local Postgres is ~150-300µs. Add 0.1, 0.25 buckets.
  • internal/db/ledger_pg.go:158-183runOne tags context.Canceled as result="err". Distinguish result="cancelled" so shutdowns don't pollute the err-rate alert.
  • cmd/executor/main.go:497-507gasSpentApprox uses float64 to compute a value bound for NUMERIC(78,0). The column exists precisely so we don't drop wei. Compute in *big.Int from the start.
  • internal/db/ledger.go:1-7 — package doc still says "a pgxpool-backed implementation … land in a follow-up". Stale now that the impl ships in this PR.

What's already right

  • Connect-failure-degrades-to-Noop matches the Rust contract; LedgerFromEnv is the right shape (ledger_pg.go:275-288).
  • ArbIDFromOppID is byte-pinned to the Rust namespace with a meaningful failure message.
  • Bounded channel + dispatcher + 8-way semaphore is the right structural pattern; capacity / pool size / inflight are tuned in lockstep with a clear explanatory comment.

Verdict

Request changes. Architecture is right. Criterion 6 nominally met, but criterion 3 is effectively unmet — CRIT 1 kills every live bundles insert and CRIT 2 makes shadow inserts impossible without a Rust writer. The deferral of PR-3 integration tests is the direct enabler of both bugs slipping through. Fix CRIT 1 + 2 + H1 (Close() drain), then re-evaluate H2/H3 (or accept them as PR-3 work with explicit tracking issues).

0xfandom added 5 commits May 5, 2026 17:49
bundles.builders is JSONB (per migration 0001) but the previous bind
sent a Go []string straight through, which pgx maps to Postgres
text[] — every live insert would fail with 'column "builders" is of
type jsonb but expression is of type text[]'. Self-review caught it
before any user-facing run hit the path.

Marshal []string → []byte JSON via encoding/json and bind with an
explicit $8::jsonb cast so the wire format matches the column type
regardless of nil / empty Builders slices.

Refs PR #122 self-review (CRIT 1).
The Rust engine and Go executor each own one half of the trade-ledger
write surface and both run fire-and-forget through their own bounded
mpsc → writer task. There is no cross-process ordering between Rust's
'ARB PUBLISHED → insert_arb' and Go's 'bundle signed → insert_bundle';
under load the Go bundle insert lands first and the immediate FK check
fails. Self-review flagged this as CRIT 2 — it would surface as a
steady stream of bundle drops on every busy block and mask real ledger
health.

Drops bundles_arb_id_fkey via an idempotent migration so both writers
can race freely. Trade-off: a transient Rust connection blip can
produce an orphan bundle row; that is already metered as
aether_ledger_writes_total{op="insert_arb",result="err"}, and
downstream LEFT JOIN queries treat the NULL arb side as informative.
Comment in the migration spells out the future path (coordinator or
reconciliation worker) for re-adding the FK once cross-process
ordering exists.

Refs PR #122 self-review (CRIT 2).
Two issues caught in self-review:

1. Close() called wg.Wait() unconditionally. A wedged Postgres would
   hang executor shutdown forever. Now caps the drain at
   ledgerCloseDrainTimeout (5s), and on timeout cancels the writer
   dispatcher's context so any in-flight queries abort cleanly before
   pool.Close() runs.

2. dispatch() used ctx for both the dispatcher loop and per-op runs,
   shared with the caller's ctx. Caller cancellation would also kill
   in-flight queries, defeating Close()'s drain. Splits the dispatcher
   onto an independent context, owned by PgLedger and only cancelled
   from Close() on timeout.

3. Reorganises dispatch() so defer inflight.Wait() runs on every exit
   path (channel close, ctx cancel, future error branches). Previously
   a ctx-cancel return path skipped the wait and left dangling writer
   goroutines holding pool connections after the dispatcher had
   already reported wg.Done().

Refs PR #122 self-review (CRIT 3, MED 1).
… loop

`inclusion_results.included` is the on-chain outcome the schema's
`WHERE included` partial index expects, not the JSON-RPC ACK from a
builder. The previous wiring set Included=r.Success on the submit-time
row, which would make dashboards think every accepted bundle landed —
silently inflating inclusion-rate metrics by an order of magnitude.

Submit-time row now writes Included=false unconditionally with the
builder's error string preserved on failure. The future GetBundleStats
poll loop UPSERTs the same (bundle_id, builder) row with the real
on-chain truth (Included=true, IncludedBlock, LandedTxHash) once the
target block lands. Dashboards can distinguish "builder rejected"
(Error != NULL, Included=false) from "never landed" (Error=NULL,
Included=false) on day one.

Drops the included-branch from the pnl_daily inline roll-up for the
same reason: realized_profit_wei + inclusion_count are deferred to the
poll loop. bundle_count still bumps every submit so the table stays
useful for "how many bundles did we attempt" queries during fork runs.

Refs PR #122 self-review (HIGH 2).
…pace doc

Bundle of small self-review nits:

- enqueue() flips Inc/send order so the gauge can't briefly read
  negative when the dispatcher's Dec() lands between a slow enqueuer's
  send and its Inc(). Failed sends now Revert via Dec(), keeping the
  gauge consistent with actual channel depth.
- write_latency_ms histogram adds 0.1 / 0.25 ms buckets so local-
  Postgres inserts (~150-300 µs) show up at p50 instead of being
  flattened into the 0.5 ms bucket.
- gasSpentApprox switches from big.Float intermediate to direct big.Int
  arithmetic so the value round-trips into NUMERIC(78,0) without
  precision drift across cumulative pnl_daily updates.
- BundleIDNamespace gains a doc note explaining why no parallel Rust
  pin test exists today (Rust does not write bundles), and what to add
  if a future chain-backfill reconciliation worker on the Rust side
  starts producing bundle ids.

Refs PR #122 self-review (MED 2, MED 4, NIT histogram, NIT gas).
@0xfandom

0xfandom commented May 5, 2026

Copy link
Copy Markdown
Collaborator Author

Self-review round 1 addressed across 5 atomic commits (39492a2..93981c3). Triage notes — for each finding, marked fixed / deferred / noise.

Fixed (5 commits)

  • 8a08dae fix(ledger): marshal NewBundle.builders to JSONB bytes — CRIT 1. []stringtext[]; column is JSONB. Every live insert would have failed. Now json.Marshal + explicit $8::jsonb cast.
  • 15944e8 fix(ledger): drop bundles.arb_id FK via migration 0002_drop_bundles_arb_id_fk.sqlCRIT 2. Rust + Go writers race; FK fired immediately on insert. Migration drops the constraint with a comment spelling out the trade-off (orphan bundles surface as aether_ledger_writes_total{op="insert_arb",result="err"} and are informative on LEFT JOIN).
  • c300451 perf(ledger): bounded Close() drain + guaranteed dispatcher cleanup — CRIT 3 + MED 1. Close() now caps drain at 5 s, cancels the dispatcher's independent context on timeout. dispatch() uses defer inflight.Wait() so every exit path drains. Dispatcher runs on a context independent of the caller's so cancelling ctx no longer kills in-flight queries.
  • 92a764d fix(ledger): submit-time Included=false; defer on-chain truth to poll loop — HIGH 2. inclusion_results.included is the on-chain outcome the schema's WHERE included partial index expects, not builder ACK. Submit-time row writes Included=false with builder error preserved; future GetBundleStats poll UPSERTs the on-chain truth. Drops the included-branch from pnl_daily for the same reason.
  • 93981c3 chore(ledger): queue gauge race + sub-ms buckets + bigInt gas + namespace doc — MED 2 + MED 4 + 2 NITs. enqueue() Inc-before-send + revert-on-fail; histogram gains 0.1 / 0.25 ms buckets; gasSpentApprox is now bigInt-only (no float intermediate) so cumulative pnl_daily doesn't drift; BundleIDNamespace documents why no Rust pin test exists today.

Deferred (real but follow-up scope)

  • HIGH 1 UPSERT overwrites poll-populated columns on resubmit — only matters once GetBundleStats poll loop ships. No poll loop in this PR; resubmit-after-poll is a hypothetical race for now. Tracked.
  • MED 3 pnl_daily.bundle_count inflates on processArb retries — would need pgx RowsAffected() plumbing through the async path or a separate counter rebuild. Small impact (< few % drift under retry). Tracked.
  • NIT Tag context-cancel as separate metric reason — defer.

Noise (skipped)

  • "Package doc still claims follow-up impl" — already accurate enough for the trait + ships-with-impl status.

Local checks

  • go build ./... clean.
  • go vet ./... clean.
  • go test ./... all green (executor + db + risk + config + pooldiscovery + monitor).

Re-requesting review.

@0xfandom 0xfandom merged commit e2d2099 into develop May 5, 2026
3 checks passed
0xfandom added a commit that referenced this pull request May 5, 2026
bundles.builders is JSONB (per migration 0001) but the previous bind
sent a Go []string straight through, which pgx maps to Postgres
text[] — every live insert would fail with 'column "builders" is of
type jsonb but expression is of type text[]'. Self-review caught it
before any user-facing run hit the path.

Marshal []string → []byte JSON via encoding/json and bind with an
explicit $8::jsonb cast so the wire format matches the column type
regardless of nil / empty Builders slices.

Refs PR #122 self-review (CRIT 1).
0xfandom added a commit that referenced this pull request May 5, 2026
The Rust engine and Go executor each own one half of the trade-ledger
write surface and both run fire-and-forget through their own bounded
mpsc → writer task. There is no cross-process ordering between Rust's
'ARB PUBLISHED → insert_arb' and Go's 'bundle signed → insert_bundle';
under load the Go bundle insert lands first and the immediate FK check
fails. Self-review flagged this as CRIT 2 — it would surface as a
steady stream of bundle drops on every busy block and mask real ledger
health.

Drops bundles_arb_id_fkey via an idempotent migration so both writers
can race freely. Trade-off: a transient Rust connection blip can
produce an orphan bundle row; that is already metered as
aether_ledger_writes_total{op="insert_arb",result="err"}, and
downstream LEFT JOIN queries treat the NULL arb side as informative.
Comment in the migration spells out the future path (coordinator or
reconciliation worker) for re-adding the FK once cross-process
ordering exists.

Refs PR #122 self-review (CRIT 2).
0xfandom added a commit that referenced this pull request May 5, 2026
Two issues caught in self-review:

1. Close() called wg.Wait() unconditionally. A wedged Postgres would
   hang executor shutdown forever. Now caps the drain at
   ledgerCloseDrainTimeout (5s), and on timeout cancels the writer
   dispatcher's context so any in-flight queries abort cleanly before
   pool.Close() runs.

2. dispatch() used ctx for both the dispatcher loop and per-op runs,
   shared with the caller's ctx. Caller cancellation would also kill
   in-flight queries, defeating Close()'s drain. Splits the dispatcher
   onto an independent context, owned by PgLedger and only cancelled
   from Close() on timeout.

3. Reorganises dispatch() so defer inflight.Wait() runs on every exit
   path (channel close, ctx cancel, future error branches). Previously
   a ctx-cancel return path skipped the wait and left dangling writer
   goroutines holding pool connections after the dispatcher had
   already reported wg.Done().

Refs PR #122 self-review (CRIT 3, MED 1).
0xfandom added a commit that referenced this pull request May 5, 2026
… loop

`inclusion_results.included` is the on-chain outcome the schema's
`WHERE included` partial index expects, not the JSON-RPC ACK from a
builder. The previous wiring set Included=r.Success on the submit-time
row, which would make dashboards think every accepted bundle landed —
silently inflating inclusion-rate metrics by an order of magnitude.

Submit-time row now writes Included=false unconditionally with the
builder's error string preserved on failure. The future GetBundleStats
poll loop UPSERTs the same (bundle_id, builder) row with the real
on-chain truth (Included=true, IncludedBlock, LandedTxHash) once the
target block lands. Dashboards can distinguish "builder rejected"
(Error != NULL, Included=false) from "never landed" (Error=NULL,
Included=false) on day one.

Drops the included-branch from the pnl_daily inline roll-up for the
same reason: realized_profit_wei + inclusion_count are deferred to the
poll loop. bundle_count still bumps every submit so the table stays
useful for "how many bundles did we attempt" queries during fork runs.

Refs PR #122 self-review (HIGH 2).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant