Skip to content

Durable cross-instance idempotency + wait-then-replay + concurrency tests#26

Merged
tkorkmazeth merged 2 commits into
niceberginc:mainfrom
boymak:feat/durable-idempotency-and-concurrency
Jun 22, 2026
Merged

Durable cross-instance idempotency + wait-then-replay + concurrency tests#26
tkorkmazeth merged 2 commits into
niceberginc:mainfrom
boymak:feat/durable-idempotency-and-concurrency

Conversation

@boymak

@boymak boymak commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Why

Toolgate's pitch is execution-layer idempotency: payment rails answer "did payment happen?", Toolgate answers "what should happen to the paid action?". The code already had the right bones — an atomic claim/lease/complete/fail state machine (idempotency.ts) and a rail-aware recovery state machine (recovery.ts). But three gaps meant the runtime couldn't prove the headline property ("N parallel duplicates → 1 charge, 1 provider call") on a real deployment:

  1. The atomic claim was process-local only. InMemoryIdempotencyStore defeats the double-charge race inside one Node process — but the API ships on Cloudflare Workers (api/), which is multi-instance by nature. Two workers each claim on their own in-memory map → both execute → double charge. The one thing that separates us from "just use Stripe idempotency" was the piece not yet built (our own KNOWN_LIMITATIONS.md said "Durable idempotency is future work.").
  2. Concurrent duplicates were rejected, not replayed. A duplicate that hit an active lease returned an "in progress" error — so the "99 replay/wait" story was actually "99 rejections".
  3. No concurrency test, and the idempotency integration test exercised an inline racy re-implementation (get-then-set), not the real atomic claim() path.

What this PR does

1. DbIdempotencyStore — durable, cross-instance (src/db-idempotency.ts)
SQL-backed (Cloudflare D1 / Turso / SQLite) via the same DbClient abstraction as DbLedger. Atomicity uses the exact idioms the codebase already trusts: INSERT OR IGNORE (claim-if-absent) + a single conditional UPDATE … WHERE (reclaim-if-stale), whose changes count decides the winner. Concurrent claimers across any number of workers collapse to one owner with no check-then-act race. Ships with schema + runMigrations(), exported from the index.

2. Wait-then-replay (src/toolgate.ts)
On a concurrent duplicate with an active lease, the runtime now waits for the in-flight execution and replays its cached result instead of erroring. This is what collapses N parallel identical calls into one provider call and one charge. Tunable via waitForInProgressMs (default 5000; set 0 to restore the previous fail-fast behaviour).

3. Concurrency tests against the real compiled runtime (src/__tests__/concurrency.test.mjs)
Imported from dist/ (not inlined), so they exercise the production claim() path:

  • 50 parallel identical calls → 1 handler call, 1 charge, 50 identical results
  • two DbIdempotencyStore instances racing on one shared DB → exactly one winner
  • claim → complete → replay, fail → reclaim, non-owner complete() rejected, and lease-expiry semantics

Plus a .gitignore (node_modules/, dist/, logs) — these were untracked and easy to commit by accident.

Tests

  • New concurrency suite: 8/8 passing
  • Full suite: 208/208 passing (npm test), no regressions
  • npm run build / typecheck clean

Notes / follow-ups

  • DbIdempotencyStore is unit-tested against a faithful in-memory DbClient mock that models the atomic primitives; a real-DB integration test (D1/Turso) against live serialisation is a sensible next step.
  • Backwards compatible: defaults preserve existing single-process behaviour except that concurrent duplicates now replay instead of erroring (opt out with waitForInProgressMs: 0).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added support for durable idempotency across multi-instance deployments using a SQL database backend.
    • Implemented wait-and-replay behavior for concurrent duplicate requests, with configurable timeout.
  • Documentation

    • Updated production guidance for durable idempotency, including setup instructions and deployment recommendations.
  • Chores

    • Updated .gitignore to exclude common build artifacts and logs.

…concurrency tests

Closes the three gaps between the "paid-action runtime" thesis and what the
code actually proves under concurrency:

1. Durable idempotency (DbIdempotencyStore)
   InMemoryIdempotencyStore only coordinates within one process — which means
   the atomic claim does NOT defeat the double-charge race on multi-instance /
   serverless deployments (e.g. the Cloudflare Workers API). Adds a SQL-backed
   store (D1 / Turso / SQLite) using the same atomic primitives as DbLedger
   (INSERT OR IGNORE + conditional UPDATE … WHERE), so concurrent claimers
   across any number of workers collapse to a single owner with no
   check-then-act race. Includes schema + runMigrations().

2. Wait-then-replay for in-flight duplicates
   Previously a concurrent duplicate that hit an active lease was rejected with
   an "in progress" error. Now the runtime waits for the in-flight execution to
   finish and replays its cached result, so N parallel identical calls collapse
   to ONE provider call and ONE charge. Tunable via waitForInProgressMs
   (default 5000ms; 0 restores the old fail-fast behaviour).

3. Concurrency tests against the real compiled runtime
   New src/__tests__/concurrency.test.mjs (imported from dist/, not an inline
   re-implementation) proves: 50 parallel identical calls → 1 handler call,
   1 charge, 50 identical results; two instances racing on one DB → exactly one
   winner; claim→complete→replay, fail→reclaim, and lease-expiry semantics.

Also adds a .gitignore (node_modules/, dist/, logs) — these were untracked and
easy to commit by accident.

Full suite: 208/208 passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@boymak, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 42 minutes and 39 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9f55145-fc22-4e55-9139-675fa2f0119a

📥 Commits

Reviewing files that changed from the base of the PR and between 150dfd0 and 9aa867b.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (5)
  • docs/KNOWN_LIMITATIONS.md
  • package.json
  • src/__tests__/_idempotency-claim-worker.mjs
  • src/__tests__/_sqlite-client.mjs
  • src/__tests__/db-idempotency.integration.test.mjs
📝 Walkthrough

Walkthrough

Adds DbIdempotencyStore, a SQL-backed durable idempotency store for multi-instance deployments using atomic INSERT OR IGNORE / conditional UPDATE semantics. Extends ToolGate with a waitForInProgressMs option and a waitForCompletion polling loop that replays stored results when an in-flight duplicate resolves. Exports the new store publicly and updates documentation and tests.

Changes

Durable Idempotency and ToolGate Wait-and-Replay

Layer / File(s) Summary
Config option, DB schema, and row mapping
src/types.ts, src/db-idempotency.ts
ToolGateConfig gains waitForInProgressMs?: number; DB_IDEMPOTENCY_SCHEMA defines the tg_idempotency table and expires_at index; IdempotencyRow interface and rowToRecord map raw SQL rows to IdempotencyRecord.
DbIdempotencyStore implementation
src/db-idempotency.ts
Implements atomic three-phase claim (INSERT OR IGNORE → conditional UPDATE reclaim → SELECT), plus peek, owner-gated heartbeat, terminal complete/fail, internal read helper, and runMigrations static method.
ToolGate wait-and-replay for in-progress collisions
src/toolgate.ts
Initializes waitForInProgressMs (default 5000) in the private config; changes in_progress collision path in executeTool to poll via waitForCompletion and replay with handleDuplicate on completion; adds sleep helper.
Public exports, docs, and build config
src/index.ts, docs/KNOWN_LIMITATIONS.md, package.json, .gitignore
Re-exports DbIdempotencyStore and DB_IDEMPOTENCY_SCHEMA; updates KNOWN_LIMITATIONS.md with multi-instance SQL guidance and migration snippet; updates test script and adds standard .gitignore entries.
MockDb, MockStmt, and concurrency test suite
src/__tests__/concurrency.test.mjs
Adds MockDb/MockStmt simulating atomic SQL primitives in-memory; tests 50 parallel in-memory calls (one handler, one charge); unit tests DbIdempotencyStore claim/complete/fail/reclaim/lease semantics; end-to-end test through ToolGate with 25 parallel calls collapsing to one charge.

Sequence Diagram

sequenceDiagram
  participant Client
  participant ToolGate
  participant DbIdempotencyStore
  participant SQL as tg_idempotency (SQL)

  rect rgba(100, 149, 237, 0.5)
    note over Client,SQL: First instance claims the key
    Client->>ToolGate: executeTool(key, input)
    ToolGate->>DbIdempotencyStore: claim(key, ownerId, leaseMs)
    DbIdempotencyStore->>SQL: INSERT OR IGNORE
    SQL-->>DbIdempotencyStore: 1 row inserted
    DbIdempotencyStore-->>ToolGate: claimed
    ToolGate->>ToolGate: run handler → complete(key, result)
    ToolGate->>DbIdempotencyStore: complete(key, ownerId, result)
    DbIdempotencyStore->>SQL: UPDATE status=completed
  end

  rect rgba(144, 238, 144, 0.5)
    note over Client,SQL: Second instance hits in_progress, waits and replays
    Client->>ToolGate: executeTool(key, input)
    ToolGate->>DbIdempotencyStore: claim(key, ownerId2, leaseMs)
    DbIdempotencyStore->>SQL: INSERT OR IGNORE → 0 rows, UPDATE fails
    DbIdempotencyStore->>SQL: SELECT → in_progress
    DbIdempotencyStore-->>ToolGate: in_progress
    loop poll until completed or deadline
      ToolGate->>DbIdempotencyStore: peek(key)
      DbIdempotencyStore->>SQL: SELECT
      SQL-->>DbIdempotencyStore: completed record
      DbIdempotencyStore-->>ToolGate: completed record
    end
    ToolGate->>ToolGate: handleDuplicate(record)
    ToolGate-->>Client: replayed result
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • tkorkmazeth/toolgate#15: Prior refactor of idempotency to lease-based claim/peek semantics and ToolGate duplicate handling — the direct precursor to this PR's DbIdempotencyStore and wait-and-replay implementation.

Poem

🐰 Hop, hop, the store is durable now,
No more single-instance vow!
SQL claims the key with care,
INSERT OR IGNORE — oh so fair.
Wait and replay, no duplicate woe,
Atomic leaps wherever you go! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the three main deliverables: durable cross-instance idempotency, wait-then-replay behavior, and concurrency tests.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
docs/KNOWN_LIMITATIONS.md (1)

22-30: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add waitForInProgressMs configuration to the example or document its behavior.

The example code does not show the waitForInProgressMs option mentioned in the PR objectives ("tunable via waitForInProgressMs, default 5000ms"). Users following this example won't learn about the wait-and-replay behavior or how to customize it (e.g., set to 0 to restore fail-fast behavior).

Consider adding a note like:

  • Show waitForInProgressMs: 5000 as an optional example in the ToolGate config, or
  • Add a brief explanation that duplicate in-flight calls are replayed by default after a configurable wait.
📝 Proposed addition to clarify behavior

Append a paragraph after the code example (after line 30):

By default, when a duplicate request arrives while an identical call is in progress,
ToolGate waits up to 5 seconds for the in-flight execution to complete, then replays
its result. To restore the previous fail-fast behavior, set `waitForInProgressMs: 0`
when constructing `ToolGate`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/KNOWN_LIMITATIONS.md` around lines 22 - 30, The ToolGate configuration
example does not document the waitForInProgressMs option that controls how long
to wait for in-flight duplicate requests before replaying results. Add a
clarifying paragraph after the code example explaining that by default
waitForInProgressMs is set to 5000ms and will wait for identical in-flight calls
to complete and replay their result, and that users can set waitForInProgressMs
to 0 to restore the previous fail-fast behavior where duplicate requests
immediately fail.
src/toolgate.ts (1)

180-205: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider a more accurate error message for non-timeout cases.

The error message always says "still in progress" but waitForCompletion returns null for three distinct cases: timeout, record failed, or record vanished. The message is accurate only for timeouts.

This is minor since the retry-safe behavior is correct in all cases, and the caller (or agent) can safely retry regardless.

💡 Optional: Differentiate error messages
-      // Timed out, or the in-flight execution failed/vanished. Surface a
-      // clear signal; the caller (or agent) can safely retry.
+      // Timed out, or the in-flight execution failed/vanished.
+      // Surface a clear signal; the caller (or agent) can safely retry.
       return {
         success: false,
         output: {
-          error:
-            "Duplicate request: a previous execution is still in progress.",
+          error: "Duplicate request: the previous execution did not complete successfully. Please retry.",
           idempotencyKey,
         },
       };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/toolgate.ts` around lines 180 - 205, The error message returned when
waitForCompletion returns a falsy value is not accurate for all scenarios since
this method can return null for three distinct reasons: timeout, the previous
execution failed, or the record vanished. Instead of always stating "a previous
execution is still in progress", modify the code to differentiate between these
cases and provide more accurate error messages for each scenario. You may need
to examine what waitForCompletion returns and add logic to determine which case
occurred, then return the appropriate error message accordingly while
maintaining the existing retry-safe behavior.
src/__tests__/concurrency.test.mjs (1)

242-357: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider adding a heartbeat test.

The MockDb implements the heartbeat tag (lines 127-133), and the IdempotencyStore contract includes heartbeat() for lease extension. Adding a test case would improve coverage of the lease-extension flow.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/__tests__/concurrency.test.mjs` around lines 242 - 357, Add a new test
case within the "Concurrency: DbIdempotencyStore atomic claim" describe block to
verify the heartbeat lease-extension functionality. The test should create a
DbIdempotencyStore instance with MockDb, claim a record with a short lease
duration, call the heartbeat() method to extend the lease, and verify that the
lease is successfully extended and active (preventing reclaim by another
claimer). This will provide test coverage for the heartbeat() method which is
part of the IdempotencyStore contract but is not currently tested in the
concurrency test suite.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@docs/KNOWN_LIMITATIONS.md`:
- Around line 22-30: The ToolGate configuration example does not document the
waitForInProgressMs option that controls how long to wait for in-flight
duplicate requests before replaying results. Add a clarifying paragraph after
the code example explaining that by default waitForInProgressMs is set to 5000ms
and will wait for identical in-flight calls to complete and replay their result,
and that users can set waitForInProgressMs to 0 to restore the previous
fail-fast behavior where duplicate requests immediately fail.

In `@src/__tests__/concurrency.test.mjs`:
- Around line 242-357: Add a new test case within the "Concurrency:
DbIdempotencyStore atomic claim" describe block to verify the heartbeat
lease-extension functionality. The test should create a DbIdempotencyStore
instance with MockDb, claim a record with a short lease duration, call the
heartbeat() method to extend the lease, and verify that the lease is
successfully extended and active (preventing reclaim by another claimer). This
will provide test coverage for the heartbeat() method which is part of the
IdempotencyStore contract but is not currently tested in the concurrency test
suite.

In `@src/toolgate.ts`:
- Around line 180-205: The error message returned when waitForCompletion returns
a falsy value is not accurate for all scenarios since this method can return
null for three distinct reasons: timeout, the previous execution failed, or the
record vanished. Instead of always stating "a previous execution is still in
progress", modify the code to differentiate between these cases and provide more
accurate error messages for each scenario. You may need to examine what
waitForCompletion returns and add logic to determine which case occurred, then
return the appropriate error message accordingly while maintaining the existing
retry-safe behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4aa5c7f7-3cb1-4a3b-86af-f2c2c3b4c267

📥 Commits

Reviewing files that changed from the base of the PR and between 70f7bff and 150dfd0.

📒 Files selected for processing (8)
  • .gitignore
  • docs/KNOWN_LIMITATIONS.md
  • package.json
  • src/__tests__/concurrency.test.mjs
  • src/db-idempotency.ts
  • src/index.ts
  • src/toolgate.ts
  • src/types.ts

…SQLite; Turso-bound)

Adds integration coverage that runs DbIdempotencyStore against a REAL SQL
engine (better-sqlite3), complementing the in-memory mock in concurrency.test:

- Level 1 — lifecycle against a real query planner: claim→complete→replay,
  INSERT OR IGNORE blocks an active lease, failed records reclaimable,
  non-owner complete rejected, lease-expiry semantics.
- Level 2 — true multi-instance race: 8 SEPARATE OS processes open one shared
  SQLite file (WAL) and race the same key; asserts exactly one winner. The
  guarantee is decided by the DB write lock, not JS event-loop ordering — this
  is the actual "multi-instance safe" proof.

SQLite is the local/CI engine; the same DbClient adapter shape maps to
Turso/libsql (@libsql/client), the intended production target (documented in
KNOWN_LIMITATIONS). better-sqlite3 added as a devDependency; the suite skips
gracefully if the native module is unavailable.

Full suite: 213/213 passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@boymak

boymak commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up: real-DB integration tests (the follow-up noted above is now done)

Added src/__tests__/db-idempotency.integration.test.mjs — runs DbIdempotencyStore against a real SQL engine (better-sqlite3), not just the in-memory mock:

  • Level 1 — real query planner: claim→complete→replay, INSERT OR IGNORE blocks an active lease, failed records reclaimable, non-owner complete() rejected, lease-expiry semantics.
  • Level 2 — true multi-instance race: 8 separate OS processes open one shared SQLite file (WAL) and race the same key → exactly one winner. The guarantee is decided by the DB write lock, not JS event-loop ordering — this is the actual "multi-instance safe" proof.

SQLite is the local/CI engine; the same DbClient adapter maps to Turso/libsql (@libsql/client) as the intended production target (documented in KNOWN_LIMITATIONS.md). better-sqlite3 added as a devDependency; the suite skips gracefully if the native module is unavailable.

Full suite now 213/213 passing.

@tkorkmazeth

Copy link
Copy Markdown
Collaborator

thanks for also adding test cases tho, nice pr

@tkorkmazeth tkorkmazeth merged commit ae3427b into niceberginc:main Jun 22, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants