Durable cross-instance idempotency + wait-then-replay + concurrency tests by boymak · Pull Request #26 · niceberginc/tollgate

boymak · 2026-06-22T13:22:49Z

Why

Toolgate's pitch is execution-layer idempotency: payment rails answer "did payment happen?", Toolgate answers "what should happen to the paid action?". The code already had the right bones — an atomic claim/lease/complete/fail state machine (idempotency.ts) and a rail-aware recovery state machine (recovery.ts). But three gaps meant the runtime couldn't prove the headline property ("N parallel duplicates → 1 charge, 1 provider call") on a real deployment:

The atomic claim was process-local only. InMemoryIdempotencyStore defeats the double-charge race inside one Node process — but the API ships on Cloudflare Workers (api/), which is multi-instance by nature. Two workers each claim on their own in-memory map → both execute → double charge. The one thing that separates us from "just use Stripe idempotency" was the piece not yet built (our own KNOWN_LIMITATIONS.md said "Durable idempotency is future work.").
Concurrent duplicates were rejected, not replayed. A duplicate that hit an active lease returned an "in progress" error — so the "99 replay/wait" story was actually "99 rejections".
No concurrency test, and the idempotency integration test exercised an inline racy re-implementation (get-then-set), not the real atomic claim() path.

What this PR does

1. DbIdempotencyStore — durable, cross-instance (src/db-idempotency.ts)
SQL-backed (Cloudflare D1 / Turso / SQLite) via the same DbClient abstraction as DbLedger. Atomicity uses the exact idioms the codebase already trusts: INSERT OR IGNORE (claim-if-absent) + a single conditional UPDATE … WHERE (reclaim-if-stale), whose changes count decides the winner. Concurrent claimers across any number of workers collapse to one owner with no check-then-act race. Ships with schema + runMigrations(), exported from the index.

2. Wait-then-replay (src/toolgate.ts)
On a concurrent duplicate with an active lease, the runtime now waits for the in-flight execution and replays its cached result instead of erroring. This is what collapses N parallel identical calls into one provider call and one charge. Tunable via waitForInProgressMs (default 5000; set 0 to restore the previous fail-fast behaviour).

3. Concurrency tests against the real compiled runtime (src/__tests__/concurrency.test.mjs)
Imported from dist/ (not inlined), so they exercise the production claim() path:

50 parallel identical calls → 1 handler call, 1 charge, 50 identical results
two DbIdempotencyStore instances racing on one shared DB → exactly one winner
claim → complete → replay, fail → reclaim, non-owner complete() rejected, and lease-expiry semantics

Plus a .gitignore (node_modules/, dist/, logs) — these were untracked and easy to commit by accident.

Tests

New concurrency suite: 8/8 passing
Full suite: 208/208 passing (npm test), no regressions
npm run build / typecheck clean

Notes / follow-ups

DbIdempotencyStore is unit-tested against a faithful in-memory DbClient mock that models the atomic primitives; a real-DB integration test (D1/Turso) against live serialisation is a sensible next step.
Backwards compatible: defaults preserve existing single-process behaviour except that concurrent duplicates now replay instead of erroring (opt out with waitForInProgressMs: 0).

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added support for durable idempotency across multi-instance deployments using a SQL database backend.
- Implemented wait-and-replay behavior for concurrent duplicate requests, with configurable timeout.
Documentation
- Updated production guidance for durable idempotency, including setup instructions and deployment recommendations.
Chores
- Updated .gitignore to exclude common build artifacts and logs.

…concurrency tests Closes the three gaps between the "paid-action runtime" thesis and what the code actually proves under concurrency: 1. Durable idempotency (DbIdempotencyStore) InMemoryIdempotencyStore only coordinates within one process — which means the atomic claim does NOT defeat the double-charge race on multi-instance / serverless deployments (e.g. the Cloudflare Workers API). Adds a SQL-backed store (D1 / Turso / SQLite) using the same atomic primitives as DbLedger (INSERT OR IGNORE + conditional UPDATE … WHERE), so concurrent claimers across any number of workers collapse to a single owner with no check-then-act race. Includes schema + runMigrations(). 2. Wait-then-replay for in-flight duplicates Previously a concurrent duplicate that hit an active lease was rejected with an "in progress" error. Now the runtime waits for the in-flight execution to finish and replays its cached result, so N parallel identical calls collapse to ONE provider call and ONE charge. Tunable via waitForInProgressMs (default 5000ms; 0 restores the old fail-fast behaviour). 3. Concurrency tests against the real compiled runtime New src/__tests__/concurrency.test.mjs (imported from dist/, not an inline re-implementation) proves: 50 parallel identical calls → 1 handler call, 1 charge, 50 identical results; two instances racing on one DB → exactly one winner; claim→complete→replay, fail→reclaim, and lease-expiry semantics. Also adds a .gitignore (node_modules/, dist/, logs) — these were untracked and easy to commit by accident. Full suite: 208/208 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-22T13:27:43Z

Warning

Review limit reached

@boymak, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 42 minutes and 39 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9f55145-fc22-4e55-9139-675fa2f0119a

📥 Commits

Reviewing files that changed from the base of the PR and between 150dfd0 and 9aa867b.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (5)

docs/KNOWN_LIMITATIONS.md
package.json
src/__tests__/_idempotency-claim-worker.mjs
src/__tests__/_sqlite-client.mjs
src/__tests__/db-idempotency.integration.test.mjs

📝 Walkthrough

Walkthrough

Adds DbIdempotencyStore, a SQL-backed durable idempotency store for multi-instance deployments using atomic INSERT OR IGNORE / conditional UPDATE semantics. Extends ToolGate with a waitForInProgressMs option and a waitForCompletion polling loop that replays stored results when an in-flight duplicate resolves. Exports the new store publicly and updates documentation and tests.

Changes

Durable Idempotency and ToolGate Wait-and-Replay

Layer / File(s)	Summary
Config option, DB schema, and row mapping `src/types.ts`, `src/db-idempotency.ts`	`ToolGateConfig` gains `waitForInProgressMs?: number`; `DB_IDEMPOTENCY_SCHEMA` defines the `tg_idempotency` table and `expires_at` index; `IdempotencyRow` interface and `rowToRecord` map raw SQL rows to `IdempotencyRecord`.
`DbIdempotencyStore` implementation `src/db-idempotency.ts`	Implements atomic three-phase `claim` (INSERT OR IGNORE → conditional UPDATE reclaim → SELECT), plus `peek`, owner-gated `heartbeat`, terminal `complete`/`fail`, internal `read` helper, and `runMigrations` static method.
`ToolGate` wait-and-replay for in-progress collisions `src/toolgate.ts`	Initializes `waitForInProgressMs` (default `5000`) in the private config; changes `in_progress` collision path in `executeTool` to poll via `waitForCompletion` and replay with `handleDuplicate` on completion; adds `sleep` helper.
Public exports, docs, and build config `src/index.ts`, `docs/KNOWN_LIMITATIONS.md`, `package.json`, `.gitignore`	Re-exports `DbIdempotencyStore` and `DB_IDEMPOTENCY_SCHEMA`; updates `KNOWN_LIMITATIONS.md` with multi-instance SQL guidance and migration snippet; updates test script and adds standard `.gitignore` entries.
MockDb, MockStmt, and concurrency test suite `src/__tests__/concurrency.test.mjs`	Adds `MockDb`/`MockStmt` simulating atomic SQL primitives in-memory; tests 50 parallel in-memory calls (one handler, one charge); unit tests `DbIdempotencyStore` claim/complete/fail/reclaim/lease semantics; end-to-end test through `ToolGate` with 25 parallel calls collapsing to one charge.

Sequence Diagram

sequenceDiagram
  participant Client
  participant ToolGate
  participant DbIdempotencyStore
  participant SQL as tg_idempotency (SQL)

  rect rgba(100, 149, 237, 0.5)
    note over Client,SQL: First instance claims the key
    Client->>ToolGate: executeTool(key, input)
    ToolGate->>DbIdempotencyStore: claim(key, ownerId, leaseMs)
    DbIdempotencyStore->>SQL: INSERT OR IGNORE
    SQL-->>DbIdempotencyStore: 1 row inserted
    DbIdempotencyStore-->>ToolGate: claimed
    ToolGate->>ToolGate: run handler → complete(key, result)
    ToolGate->>DbIdempotencyStore: complete(key, ownerId, result)
    DbIdempotencyStore->>SQL: UPDATE status=completed
  end

  rect rgba(144, 238, 144, 0.5)
    note over Client,SQL: Second instance hits in_progress, waits and replays
    Client->>ToolGate: executeTool(key, input)
    ToolGate->>DbIdempotencyStore: claim(key, ownerId2, leaseMs)
    DbIdempotencyStore->>SQL: INSERT OR IGNORE → 0 rows, UPDATE fails
    DbIdempotencyStore->>SQL: SELECT → in_progress
    DbIdempotencyStore-->>ToolGate: in_progress
    loop poll until completed or deadline
      ToolGate->>DbIdempotencyStore: peek(key)
      DbIdempotencyStore->>SQL: SELECT
      SQL-->>DbIdempotencyStore: completed record
      DbIdempotencyStore-->>ToolGate: completed record
    end
    ToolGate->>ToolGate: handleDuplicate(record)
    ToolGate-->>Client: replayed result
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

tkorkmazeth/toolgate#15: Prior refactor of idempotency to lease-based claim/peek semantics and ToolGate duplicate handling — the direct precursor to this PR's DbIdempotencyStore and wait-and-replay implementation.

Poem

🐰 Hop, hop, the store is durable now,
No more single-instance vow!
SQL claims the key with care,
INSERT OR IGNORE — oh so fair.
Wait and replay, no duplicate woe,
Atomic leaps wherever you go! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the three main deliverables: durable cross-instance idempotency, wait-then-replay behavior, and concurrency tests.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (3)

docs/KNOWN_LIMITATIONS.md (1)
22-30: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add waitForInProgressMs configuration to the example or document its behavior.

The example code does not show the waitForInProgressMs option mentioned in the PR objectives ("tunable via waitForInProgressMs, default 5000ms"). Users following this example won't learn about the wait-and-replay behavior or how to customize it (e.g., set to 0 to restore fail-fast behavior).

Consider adding a note like:

Show waitForInProgressMs: 5000 as an optional example in the ToolGate config, or

Add a brief explanation that duplicate in-flight calls are replayed by default after a configurable wait.
📝 Proposed addition to clarify behavior

Append a paragraph after the code example (after line 30):
By default, when a duplicate request arrives while an identical call is in progress,
ToolGate waits up to 5 seconds for the in-flight execution to complete, then replays
its result. To restore the previous fail-fast behavior, set `waitForInProgressMs: 0`
when constructing `ToolGate`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/KNOWN_LIMITATIONS.md` around lines 22 - 30, The ToolGate configuration
example does not document the waitForInProgressMs option that controls how long
to wait for in-flight duplicate requests before replaying results. Add a
clarifying paragraph after the code example explaining that by default
waitForInProgressMs is set to 5000ms and will wait for identical in-flight calls
to complete and replay their result, and that users can set waitForInProgressMs
to 0 to restore the previous fail-fast behavior where duplicate requests
immediately fail.
src/toolgate.ts (1)
180-205: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider a more accurate error message for non-timeout cases.

The error message always says "still in progress" but waitForCompletion returns null for three distinct cases: timeout, record failed, or record vanished. The message is accurate only for timeouts.

This is minor since the retry-safe behavior is correct in all cases, and the caller (or agent) can safely retry regardless.
💡 Optional: Differentiate error messages
-      // Timed out, or the in-flight execution failed/vanished. Surface a
-      // clear signal; the caller (or agent) can safely retry.
+      // Timed out, or the in-flight execution failed/vanished.
+      // Surface a clear signal; the caller (or agent) can safely retry.
       return {
         success: false,
         output: {
-          error:
-            "Duplicate request: a previous execution is still in progress.",
+          error: "Duplicate request: the previous execution did not complete successfully. Please retry.",
           idempotencyKey,
         },
       };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/toolgate.ts` around lines 180 - 205, The error message returned when
waitForCompletion returns a falsy value is not accurate for all scenarios since
this method can return null for three distinct reasons: timeout, the previous
execution failed, or the record vanished. Instead of always stating "a previous
execution is still in progress", modify the code to differentiate between these
cases and provide more accurate error messages for each scenario. You may need
to examine what waitForCompletion returns and add logic to determine which case
occurred, then return the appropriate error message accordingly while
maintaining the existing retry-safe behavior.
src/__tests__/concurrency.test.mjs (1)
242-357: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider adding a heartbeat test.

The MockDb implements the heartbeat tag (lines 127-133), and the IdempotencyStore contract includes heartbeat() for lease extension. Adding a test case would improve coverage of the lease-extension flow.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/__tests__/concurrency.test.mjs` around lines 242 - 357, Add a new test
case within the "Concurrency: DbIdempotencyStore atomic claim" describe block to
verify the heartbeat lease-extension functionality. The test should create a
DbIdempotencyStore instance with MockDb, claim a record with a short lease
duration, call the heartbeat() method to extend the lease, and verify that the
lease is successfully extended and active (preventing reclaim by another
claimer). This will provide test coverage for the heartbeat() method which is
part of the IdempotencyStore contract but is not currently tested in the
concurrency test suite.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@docs/KNOWN_LIMITATIONS.md`:
- Around line 22-30: The ToolGate configuration example does not document the
waitForInProgressMs option that controls how long to wait for in-flight
duplicate requests before replaying results. Add a clarifying paragraph after
the code example explaining that by default waitForInProgressMs is set to 5000ms
and will wait for identical in-flight calls to complete and replay their result,
and that users can set waitForInProgressMs to 0 to restore the previous
fail-fast behavior where duplicate requests immediately fail.

In `@src/__tests__/concurrency.test.mjs`:
- Around line 242-357: Add a new test case within the "Concurrency:
DbIdempotencyStore atomic claim" describe block to verify the heartbeat
lease-extension functionality. The test should create a DbIdempotencyStore
instance with MockDb, claim a record with a short lease duration, call the
heartbeat() method to extend the lease, and verify that the lease is
successfully extended and active (preventing reclaim by another claimer). This
will provide test coverage for the heartbeat() method which is part of the
IdempotencyStore contract but is not currently tested in the concurrency test
suite.

In `@src/toolgate.ts`:
- Around line 180-205: The error message returned when waitForCompletion returns
a falsy value is not accurate for all scenarios since this method can return
null for three distinct reasons: timeout, the previous execution failed, or the
record vanished. Instead of always stating "a previous execution is still in
progress", modify the code to differentiate between these cases and provide more
accurate error messages for each scenario. You may need to examine what
waitForCompletion returns and add logic to determine which case occurred, then
return the appropriate error message accordingly while maintaining the existing
retry-safe behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4aa5c7f7-3cb1-4a3b-86af-f2c2c3b4c267

📥 Commits

Reviewing files that changed from the base of the PR and between 70f7bff and 150dfd0.

📒 Files selected for processing (8)

.gitignore
docs/KNOWN_LIMITATIONS.md
package.json
src/__tests__/concurrency.test.mjs
src/db-idempotency.ts
src/index.ts
src/toolgate.ts
src/types.ts

…SQLite; Turso-bound) Adds integration coverage that runs DbIdempotencyStore against a REAL SQL engine (better-sqlite3), complementing the in-memory mock in concurrency.test: - Level 1 — lifecycle against a real query planner: claim→complete→replay, INSERT OR IGNORE blocks an active lease, failed records reclaimable, non-owner complete rejected, lease-expiry semantics. - Level 2 — true multi-instance race: 8 SEPARATE OS processes open one shared SQLite file (WAL) and race the same key; asserts exactly one winner. The guarantee is decided by the DB write lock, not JS event-loop ordering — this is the actual "multi-instance safe" proof. SQLite is the local/CI engine; the same DbClient adapter shape maps to Turso/libsql (@libsql/client), the intended production target (documented in KNOWN_LIMITATIONS). better-sqlite3 added as a devDependency; the suite skips gracefully if the native module is unavailable. Full suite: 213/213 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

boymak · 2026-06-22T13:42:42Z

Follow-up: real-DB integration tests (the follow-up noted above is now done)

Added src/__tests__/db-idempotency.integration.test.mjs — runs DbIdempotencyStore against a real SQL engine (better-sqlite3), not just the in-memory mock:

Level 1 — real query planner: claim→complete→replay, INSERT OR IGNORE blocks an active lease, failed records reclaimable, non-owner complete() rejected, lease-expiry semantics.
Level 2 — true multi-instance race: 8 separate OS processes open one shared SQLite file (WAL) and race the same key → exactly one winner. The guarantee is decided by the DB write lock, not JS event-loop ordering — this is the actual "multi-instance safe" proof.

SQLite is the local/CI engine; the same DbClient adapter maps to Turso/libsql (@libsql/client) as the intended production target (documented in KNOWN_LIMITATIONS.md). better-sqlite3 added as a devDependency; the suite skips gracefully if the native module is unavailable.

Full suite now 213/213 passing.

tkorkmazeth · 2026-06-22T14:09:56Z

thanks for also adding test cases tho, nice pr

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

tkorkmazeth merged commit ae3427b into niceberginc:main Jun 22, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Durable cross-instance idempotency + wait-then-replay + concurrency tests#26

Durable cross-instance idempotency + wait-then-replay + concurrency tests#26
tkorkmazeth merged 2 commits into
niceberginc:mainfrom
boymak:feat/durable-idempotency-and-concurrency

boymak commented Jun 22, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

boymak commented Jun 22, 2026

Uh oh!

tkorkmazeth commented Jun 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

boymak commented Jun 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What this PR does

Tests

Notes / follow-ups

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

boymak commented Jun 22, 2026

Follow-up: real-DB integration tests (the follow-up noted above is now done)

Uh oh!

tkorkmazeth commented Jun 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

boymak commented Jun 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading