COW-1013: orderbook 429/backoff resilience + non-root container #103
Merged
jeffersonBastos merged 2 commits on Jun 17, 2026
Orderbook API resilience (429/5xx):
- Add `fetchOrderbook()`: bounded retry/backoff around `fetchWithTimeout` that honors `Retry-After` (capped) on 429 and applies exponential backoff on 5xx, failing fast within a wall-clock budget so it never holds a block-handler TX open.
- Add `OrderbookUnavailableError` + `ob:unavailable` / `ob:retry` log codes so a rate-limited or down API is distinguishable from "order not on API yet" (previously both surfaced as a silent missing UID).
- Wire into `fetchAccountOrders` and `fetchOrdersByUids`; caller control flow and return shapes are unchanged (promotion still safely defers, now observably).

Container hardening:
- Run as the non-root `node` user (uid 1000) with chown of the workdir and /pnpm store. Verified: image builds, runs as uid 1000, workdir writable.

Tests: 429-then-success, persistent 429 → `ob:unavailable` (bounded retries), 5xx retry, empty-200 stays "absent", HTTP-date `Retry-After` parsing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
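The `Retry-After` handling this commit describes (delta-seconds or HTTP-date, capped, with exponential backoff otherwise) can be sketched as a pure function. This is a minimal sketch, not the PR's code: `parseRetryAfterMs` and `retryDelayMs` are hypothetical names, the constant values are the ones the PR description lists for `src/constants.ts`, and capping the exponential backoff at the same limit is an assumption.

```typescript
// Hypothetical helpers illustrating the Retry-After handling; names and
// signatures are assumptions, not the PR's actual code.
// Values mirror the src/constants.ts entries listed in the PR description.
const ORDERBOOK_RETRY_BASE_MS = 250;
const ORDERBOOK_RETRY_MAX_DELAY_MS = 2000;

/**
 * Parse a Retry-After header into milliseconds. Accepts delta-seconds ("3")
 * or an HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT"); returns undefined when
 * the header is missing or unparseable.
 */
export function parseRetryAfterMs(header: string | null, nowMs: number = Date.now()): number | undefined {
  if (header === null) return undefined;
  const value = header.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs); // a date in the past means "retry now"
}

/**
 * Delay before retry number `attempt` (0-based): the server's Retry-After hint
 * when present, otherwise exponential backoff; capped at
 * ORDERBOOK_RETRY_MAX_DELAY_MS either way.
 */
export function retryDelayMs(attempt: number, retryAfter: string | null, nowMs: number = Date.now()): number {
  const hinted = parseRetryAfterMs(retryAfter, nowMs);
  return Math.min(hinted ?? ORDERBOOK_RETRY_BASE_MS * 2 ** attempt, ORDERBOOK_RETRY_MAX_DELAY_MS);
}
```

Keeping the calculation pure, with `nowMs` injectable, lets the HTTP-date case be tested against a fixed clock instead of real time.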
Keep retry-then-succeed (`Retry-After` honored) and persistent-429 → `ob:unavailable` (bounded). Drop the 5xx, empty-200, and HTTP-date cases: 5xx already flows through the pre-existing 500 tests, and empty-200 is covered by the existing empty-array test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
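The two kept cases map onto a bounded retry loop like the minimal sketch below. It is illustrative, not the PR's `fetchOrderbook()`: `fetchOrderbookSketch`, the `HttpResponse` shape, and the injected `doFetch`/`sleep`/`now`/`log` hooks are assumptions made so the loop can be driven by fakes. It follows the classification the PR describes (only `429` and `5xx` are retried; any other non-OK status, such as a WAF `403`, fails immediately), reads only delta-seconds `Retry-After` to stay short, and checks the wall-clock budget against an injectable clock.

```typescript
// Hypothetical sketch; only the log codes, error name, and constant values
// come from the PR description.
const ORDERBOOK_MAX_RETRIES = 2;
const ORDERBOOK_RETRY_BASE_MS = 250;
const ORDERBOOK_RETRY_MAX_DELAY_MS = 2000;
const ORDERBOOK_RETRY_BUDGET_MS = 4000;

export interface HttpResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
}

export class OrderbookUnavailableError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`orderbook unavailable (HTTP ${status})`);
    this.name = "OrderbookUnavailableError";
    this.status = status;
  }
}

interface RetryDeps {
  doFetch: () => Promise<HttpResponse>;
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
  log?: (level: "warn" | "error", code: string, data: Record<string, unknown>) => void;
}

// Only 429 and 5xx are treated as transient.
const isTransient = (status: number) => status === 429 || status >= 500;

function retryDelay(res: HttpResponse, attempt: number): number {
  // Delta-seconds Retry-After only; HTTP-date parsing is omitted for brevity.
  const header = res.headers.get("Retry-After")?.trim();
  const hinted = header !== undefined && /^\d+$/.test(header) ? Number(header) * 1000 : undefined;
  return Math.min(hinted ?? ORDERBOOK_RETRY_BASE_MS * 2 ** attempt, ORDERBOOK_RETRY_MAX_DELAY_MS);
}

export async function fetchOrderbookSketch({
  doFetch,
  sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  now = Date.now,
  log = () => {},
}: RetryDeps): Promise<HttpResponse> {
  const start = now();
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    if (res.ok) return res;
    const delay = retryDelay(res, attempt);
    const canRetry =
      isTransient(res.status) &&
      attempt < ORDERBOOK_MAX_RETRIES &&
      now() - start + delay <= ORDERBOOK_RETRY_BUDGET_MS; // next sleep must fit the budget
    if (!canRetry) {
      log("error", "ob:unavailable", { status: res.status });
      throw new OrderbookUnavailableError(res.status);
    }
    log("warn", "ob:retry", { status: res.status, attempt, delay });
    await sleep(delay);
  }
}
```

With the listed constants, a persistent 429 without `Retry-After` makes three requests, sleeping 250 ms and then 500 ms, before throwing.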
Closes COW-1013.
Two production-hardening items surfaced in the COW-1012 QA review (PR #97), each with tests.
1. Orderbook API resilience (429 / 5xx)
Problem.
`orderbookClient.ts` treated every non-OK response identically: `log("warn", ...)` then `break`/`continue`, returning a partial/empty result. That empty result is indistinguishable downstream from "order genuinely absent" (`blockHandler.ts`: `if (!orderbookEntry) continue; // not on API yet`). So a `429`/`5xx` caused candidate orders to silently never promote, with only `warn` logs, while the status enums still looked healthy.

Change.
- `fetchOrderbook()` wraps `fetchWithTimeout` with bounded retry/backoff: it honors `Retry-After` (delta-seconds and HTTP-date, capped at `ORDERBOOK_RETRY_MAX_DELAY_MS`) on `429` and uses exponential backoff on `5xx`. It stops at `ORDERBOOK_MAX_RETRIES`, or when the next sleep would exceed `ORDERBOOK_RETRY_BUDGET_MS`, and then throws, so it never holds a block-handler DB transaction open past the budget (the key constraint per `withTimeout.ts`).
- `OrderbookUnavailableError` + distinct log codes:
  - `ob:unavailable` (level `error`, carries `status`): the API refused to answer. Alarm on this.
  - `ob:retry` (level `warn`): a retry is happening.
- Wired into `fetchAccountOrders` and `fetchOrdersByUids`. Caller control flow and return shapes are unchanged: promotion still safely defers on unavailability (retrying on the next poll, ~`ORDERBOOK_POLL_INTERVAL` blocks later); it's just now observable.

Constants (`src/constants.ts`): `ORDERBOOK_MAX_RETRIES=2`, `ORDERBOOK_RETRY_BASE_MS=250`, `ORDERBOOK_RETRY_MAX_DELAY_MS=2000`, `ORDERBOOK_RETRY_BUDGET_MS=4000`.

Tests (`tests/helpers/orderbookClient.test.ts`): two focused cases: 429-then-success (`Retry-After` honored) and persistent-429 → `ob:unavailable` after bounded retries. 5xx is already exercised by the pre-existing `500` tests (they now route through the retry/classify path), and empty-200 by the existing empty-array test.

2. Run the container as non-root
`Dockerfile` never set `USER` (the container ran as uid 0). It now runs `chown -R node:node /usr/src/app /pnpm` after install, then `USER node` before `HEALTHCHECK`/`CMD` (`node:22-alpine` ships a `node` user, uid 1000).

Verified: image builds; `docker run` → `uid=1000(node)`; workdir owned by `node` and writable (Ponder cache). A full `docker compose --profile deploy up` end-to-end was not run here (it needs RPC secrets/network); worth a sanity check in deploy.

Live-probe finding: rate limiting may surface as `403`, not `429`

A controlled probe against the real `api.cow.fi` (826 requests to a live mainnet order, in bursts of up to 300 concurrent) returned all 200s and no 429: the endpoint is CDN/WAF-fronted and absorbs single-IP bursts. Separately, restricted endpoints returned `403` from the edge nginx. Implication: real-world throttling may arrive as a WAF `403` rather than an app-level `429`. Our code handles this correctly: a non-429/non-5xx response is a non-retryable `OrderbookUnavailableError`, logged as `ob:unavailable` (so it is never mistaken for "order absent"), with no retry (a `403` is not transient). No `Retry-After` was sent on the successful responses; the retry path is defensive (honor it if present, else back off).

Scope decisions (per discussion)
`blockHandler.ts` falls back to writing `cancelled` for orphans the API doesn't return, so an unavailable API (429/5xx, or a timeout) can mislabel a filled order as cancelled. This is pre-existing behavior (it already happens on timeout); the retry here reduces its likelihood. Possible follow-up: skip the `cancelled` write specifically on `OrderbookUnavailableError`.

Checks
`pnpm test` → 108 passed · `pnpm lint` clean · `tsc --noEmit` clean
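The follow-up floated under the scope decisions (skip the `cancelled` write specifically on `OrderbookUnavailableError`) amounts to one extra branch where orphans are resolved. A minimal sketch, assuming a hypothetical `decideOrphan` helper and a simplified `fetchOrdersByUids` callback that resolves to the orders the API returned and lets `OrderbookUnavailableError` propagate; since this PR keeps return shapes unchanged, the real client may need a change to surface the error. Timeouts, which the scope note also mentions, would need the same treatment if they surface as a different error type.

```typescript
// Hypothetical sketch: only the OrderbookUnavailableError name comes from the PR.
class OrderbookUnavailableError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`orderbook unavailable (HTTP ${status})`);
    this.name = "OrderbookUnavailableError";
    this.status = status;
    // Keeps `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

type OrphanDecision = "found" | "mark-cancelled" | "defer";

// Assumed shape: resolves to the orders the API returned for the requested UIDs.
type FetchOrdersByUids = (uids: string[]) => Promise<Array<{ uid: string }>>;

export async function decideOrphan(uid: string, fetchOrdersByUids: FetchOrdersByUids): Promise<OrphanDecision> {
  try {
    const orders = await fetchOrdersByUids([uid]);
    // Only a successful answer that omits the UID counts as "absent".
    return orders.some((o) => o.uid === uid) ? "found" : "mark-cancelled";
  } catch (err) {
    // The API refused to answer: write nothing and let the next poll retry.
    if (err instanceof OrderbookUnavailableError) return "defer";
    throw err; // anything else is unexpected, so surface it
  }
}
```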