Durable cross-instance idempotency + wait-then-replay + concurrency tests#26
Conversation
…concurrency tests Closes the three gaps between the "paid-action runtime" thesis and what the code actually proves under concurrency: 1. Durable idempotency (DbIdempotencyStore) InMemoryIdempotencyStore only coordinates within one process — which means the atomic claim does NOT defeat the double-charge race on multi-instance / serverless deployments (e.g. the Cloudflare Workers API). Adds a SQL-backed store (D1 / Turso / SQLite) using the same atomic primitives as DbLedger (INSERT OR IGNORE + conditional UPDATE … WHERE), so concurrent claimers across any number of workers collapse to a single owner with no check-then-act race. Includes schema + runMigrations(). 2. Wait-then-replay for in-flight duplicates Previously a concurrent duplicate that hit an active lease was rejected with an "in progress" error. Now the runtime waits for the in-flight execution to finish and replays its cached result, so N parallel identical calls collapse to ONE provider call and ONE charge. Tunable via waitForInProgressMs (default 5000ms; 0 restores the old fail-fast behaviour). 3. Concurrency tests against the real compiled runtime New src/__tests__/concurrency.test.mjs (imported from dist/, not an inline re-implementation) proves: 50 parallel identical calls → 1 handler call, 1 charge, 50 identical results; two instances racing on one DB → exactly one winner; claim→complete→replay, fail→reclaim, and lease-expiry semantics. Also adds a .gitignore (node_modules/, dist/, logs) — these were untracked and easy to commit by accident. Full suite: 208/208 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 42 minutes and 39 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (5)
📝 WalkthroughWalkthroughAdds ChangesDurable Idempotency and ToolGate Wait-and-Replay
Sequence DiagramsequenceDiagram
participant Client
participant ToolGate
participant DbIdempotencyStore
participant SQL as tg_idempotency (SQL)
rect rgba(100, 149, 237, 0.5)
note over Client,SQL: First instance claims the key
Client->>ToolGate: executeTool(key, input)
ToolGate->>DbIdempotencyStore: claim(key, ownerId, leaseMs)
DbIdempotencyStore->>SQL: INSERT OR IGNORE
SQL-->>DbIdempotencyStore: 1 row inserted
DbIdempotencyStore-->>ToolGate: claimed
ToolGate->>ToolGate: run handler → complete(key, result)
ToolGate->>DbIdempotencyStore: complete(key, ownerId, result)
DbIdempotencyStore->>SQL: UPDATE status=completed
end
rect rgba(144, 238, 144, 0.5)
note over Client,SQL: Second instance hits in_progress, waits and replays
Client->>ToolGate: executeTool(key, input)
ToolGate->>DbIdempotencyStore: claim(key, ownerId2, leaseMs)
DbIdempotencyStore->>SQL: INSERT OR IGNORE → 0 rows, UPDATE fails
DbIdempotencyStore->>SQL: SELECT → in_progress
DbIdempotencyStore-->>ToolGate: in_progress
loop poll until completed or deadline
ToolGate->>DbIdempotencyStore: peek(key)
DbIdempotencyStore->>SQL: SELECT
SQL-->>DbIdempotencyStore: completed record
DbIdempotencyStore-->>ToolGate: completed record
end
ToolGate->>ToolGate: handleDuplicate(record)
ToolGate-->>Client: replayed result
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
docs/KNOWN_LIMITATIONS.md (1)
22-30: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winAdd
waitForInProgressMsconfiguration to the example or document its behavior.The example code does not show the
waitForInProgressMsoption mentioned in the PR objectives ("tunable viawaitForInProgressMs, default 5000ms"). Users following this example won't learn about the wait-and-replay behavior or how to customize it (e.g., set to0to restore fail-fast behavior).Consider adding a note like:
- Show
waitForInProgressMs: 5000as an optional example in the ToolGate config, or- Add a brief explanation that duplicate in-flight calls are replayed by default after a configurable wait.
📝 Proposed addition to clarify behavior
Append a paragraph after the code example (after line 30):
By default, when a duplicate request arrives while an identical call is in progress, ToolGate waits up to 5 seconds for the in-flight execution to complete, then replays its result. To restore the previous fail-fast behavior, set `waitForInProgressMs: 0` when constructing `ToolGate`.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/KNOWN_LIMITATIONS.md` around lines 22 - 30, The ToolGate configuration example does not document the waitForInProgressMs option that controls how long to wait for in-flight duplicate requests before replaying results. Add a clarifying paragraph after the code example explaining that by default waitForInProgressMs is set to 5000ms and will wait for identical in-flight calls to complete and replay their result, and that users can set waitForInProgressMs to 0 to restore the previous fail-fast behavior where duplicate requests immediately fail.src/toolgate.ts (1)
180-205: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueConsider a more accurate error message for non-timeout cases.
The error message always says "still in progress" but
waitForCompletionreturnsnullfor three distinct cases: timeout, record failed, or record vanished. The message is accurate only for timeouts.This is minor since the retry-safe behavior is correct in all cases, and the caller (or agent) can safely retry regardless.
💡 Optional: Differentiate error messages
- // Timed out, or the in-flight execution failed/vanished. Surface a - // clear signal; the caller (or agent) can safely retry. + // Timed out, or the in-flight execution failed/vanished. + // Surface a clear signal; the caller (or agent) can safely retry. return { success: false, output: { - error: - "Duplicate request: a previous execution is still in progress.", + error: "Duplicate request: the previous execution did not complete successfully. Please retry.", idempotencyKey, }, };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/toolgate.ts` around lines 180 - 205, The error message returned when waitForCompletion returns a falsy value is not accurate for all scenarios since this method can return null for three distinct reasons: timeout, the previous execution failed, or the record vanished. Instead of always stating "a previous execution is still in progress", modify the code to differentiate between these cases and provide more accurate error messages for each scenario. You may need to examine what waitForCompletion returns and add logic to determine which case occurred, then return the appropriate error message accordingly while maintaining the existing retry-safe behavior.src/__tests__/concurrency.test.mjs (1)
242-357: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueConsider adding a heartbeat test.
The
MockDbimplements theheartbeattag (lines 127-133), and theIdempotencyStorecontract includesheartbeat()for lease extension. Adding a test case would improve coverage of the lease-extension flow.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/__tests__/concurrency.test.mjs` around lines 242 - 357, Add a new test case within the "Concurrency: DbIdempotencyStore atomic claim" describe block to verify the heartbeat lease-extension functionality. The test should create a DbIdempotencyStore instance with MockDb, claim a record with a short lease duration, call the heartbeat() method to extend the lease, and verify that the lease is successfully extended and active (preventing reclaim by another claimer). This will provide test coverage for the heartbeat() method which is part of the IdempotencyStore contract but is not currently tested in the concurrency test suite.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@docs/KNOWN_LIMITATIONS.md`:
- Around line 22-30: The ToolGate configuration example does not document the
waitForInProgressMs option that controls how long to wait for in-flight
duplicate requests before replaying results. Add a clarifying paragraph after
the code example explaining that by default waitForInProgressMs is set to 5000ms
and will wait for identical in-flight calls to complete and replay their result,
and that users can set waitForInProgressMs to 0 to restore the previous
fail-fast behavior where duplicate requests immediately fail.
In `@src/__tests__/concurrency.test.mjs`:
- Around line 242-357: Add a new test case within the "Concurrency:
DbIdempotencyStore atomic claim" describe block to verify the heartbeat
lease-extension functionality. The test should create a DbIdempotencyStore
instance with MockDb, claim a record with a short lease duration, call the
heartbeat() method to extend the lease, and verify that the lease is
successfully extended and active (preventing reclaim by another claimer). This
will provide test coverage for the heartbeat() method which is part of the
IdempotencyStore contract but is not currently tested in the concurrency test
suite.
In `@src/toolgate.ts`:
- Around line 180-205: The error message returned when waitForCompletion returns
a falsy value is not accurate for all scenarios since this method can return
null for three distinct reasons: timeout, the previous execution failed, or the
record vanished. Instead of always stating "a previous execution is still in
progress", modify the code to differentiate between these cases and provide more
accurate error messages for each scenario. You may need to examine what
waitForCompletion returns and add logic to determine which case occurred, then
return the appropriate error message accordingly while maintaining the existing
retry-safe behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4aa5c7f7-3cb1-4a3b-86af-f2c2c3b4c267
📒 Files selected for processing (8)
.gitignoredocs/KNOWN_LIMITATIONS.mdpackage.jsonsrc/__tests__/concurrency.test.mjssrc/db-idempotency.tssrc/index.tssrc/toolgate.tssrc/types.ts
…SQLite; Turso-bound) Adds integration coverage that runs DbIdempotencyStore against a REAL SQL engine (better-sqlite3), complementing the in-memory mock in concurrency.test: - Level 1 — lifecycle against a real query planner: claim→complete→replay, INSERT OR IGNORE blocks an active lease, failed records reclaimable, non-owner complete rejected, lease-expiry semantics. - Level 2 — true multi-instance race: 8 SEPARATE OS processes open one shared SQLite file (WAL) and race the same key; asserts exactly one winner. The guarantee is decided by the DB write lock, not JS event-loop ordering — this is the actual "multi-instance safe" proof. SQLite is the local/CI engine; the same DbClient adapter shape maps to Turso/libsql (@libsql/client), the intended production target (documented in KNOWN_LIMITATIONS). better-sqlite3 added as a devDependency; the suite skips gracefully if the native module is unavailable. Full suite: 213/213 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up: real-DB integration tests (the follow-up noted above is now done)Added
SQLite is the local/CI engine; the same Full suite now 213/213 passing. |
|
thanks for also adding test cases tho, nice pr |
Why
Toolgate's pitch is execution-layer idempotency: payment rails answer "did payment happen?", Toolgate answers "what should happen to the paid action?". The code already had the right bones — an atomic
claim/lease/complete/failstate machine (idempotency.ts) and a rail-aware recovery state machine (recovery.ts). But three gaps meant the runtime couldn't prove the headline property ("N parallel duplicates → 1 charge, 1 provider call") on a real deployment:InMemoryIdempotencyStoredefeats the double-charge race inside one Node process — but the API ships on Cloudflare Workers (api/), which is multi-instance by nature. Two workers each claim on their own in-memory map → both execute → double charge. The one thing that separates us from "just use Stripe idempotency" was the piece not yet built (our ownKNOWN_LIMITATIONS.mdsaid "Durable idempotency is future work.")."in progress"error — so the "99 replay/wait" story was actually "99 rejections".get-then-set), not the real atomicclaim()path.What this PR does
1.
DbIdempotencyStore— durable, cross-instance (src/db-idempotency.ts)SQL-backed (Cloudflare D1 / Turso / SQLite) via the same
DbClientabstraction asDbLedger. Atomicity uses the exact idioms the codebase already trusts:INSERT OR IGNORE(claim-if-absent) + a single conditionalUPDATE … WHERE(reclaim-if-stale), whosechangescount decides the winner. Concurrent claimers across any number of workers collapse to one owner with no check-then-act race. Ships with schema +runMigrations(), exported from the index.2. Wait-then-replay (
src/toolgate.ts)On a concurrent duplicate with an active lease, the runtime now waits for the in-flight execution and replays its cached result instead of erroring. This is what collapses N parallel identical calls into one provider call and one charge. Tunable via
waitForInProgressMs(default5000; set0to restore the previous fail-fast behaviour).3. Concurrency tests against the real compiled runtime (
src/__tests__/concurrency.test.mjs)Imported from
dist/(not inlined), so they exercise the productionclaim()path:DbIdempotencyStoreinstances racing on one shared DB → exactly one winnerclaim → complete → replay,fail → reclaim, non-ownercomplete()rejected, and lease-expiry semanticsPlus a
.gitignore(node_modules/,dist/, logs) — these were untracked and easy to commit by accident.Tests
npm test), no regressionsnpm run build/typecheckcleanNotes / follow-ups
DbIdempotencyStoreis unit-tested against a faithful in-memoryDbClientmock that models the atomic primitives; a real-DB integration test (D1/Turso) against live serialisation is a sensible next step.waitForInProgressMs: 0).🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Chores
.gitignoreto exclude common build artifacts and logs.