retry transient Durable Object errors at the routing boundary #118

Closed: FredKSchott wants to merge 1 commit into main from retry-transient-do-errors

@FredKSchott

Member

On the Cloudflare target, Flue calls routeAgentRequest to hand a request off to the per-agent Durable Object. Workerd can throw transient infrastructure errors from this RPC (e.g. "Internal error in Durable Object storage caused object to be reset.") before any user code runs inside the DO. These errors bypass the agents SDK's _tryCatch / Agent::onError entirely, so they can only be handled at the routing boundary.

This PR wraps routeAgentRequest with a small routeWithDoRetry helper that applies exponential backoff per Cloudflare's documented guidance:

  • Up to 3 attempts, ~100–800ms jittered backoff (worst-case added latency ~1s).
  • Retries only when err.retryable === true and err.overloaded !== true.
  • Clones the request body per attempt so the stream stays replayable; the DO stub is implicitly recreated on each call.
  • Logs each retry with the agent name and instance id for traceability.
  • Falls through to the existing app.onError → canonical 500 envelope if all attempts fail.
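The diff itself is not shown in this conversation, so the following is only a minimal sketch of how the behavior listed above could be written, not the PR's actual code. The only names taken from the description are routeWithDoRetry, routeAgentRequest, and the retryable / overloaded error properties. The option names, the helper functions, and the exact jitter schedule are assumptions made for illustration.

```typescript
// Sketch only: apart from `retryable` and `overloaded`, every name below is
// hypothetical and chosen for illustration.

// Anything that can be cloned, such as a Fetch API Request.
interface Cloneable<T> {
  clone(): T;
}

interface RetryOptions {
  maxAttempts?: number; // total attempts, including the first
  baseDelayMs?: number; // lower bound of each backoff delay
  maxDelayMs?: number; // upper bound of each backoff delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
  log?: (message: string) => void;
  label?: string; // e.g. agent name and instance id, used in log lines
}

// Retry only errors flagged retryable and not flagged overloaded.
function isRetryableDoError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const flags = err as { retryable?: unknown; overloaded?: unknown };
  return flags.retryable === true && flags.overloaded !== true;
}

// Exponential backoff with jitter (assumed schedule): a delay between baseMs
// and min(maxMs, baseMs * 2^attempt), where attempt counts from 1.
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.round(baseMs + random() * (ceiling - baseMs));
}

async function routeWithDoRetry<Req extends Cloneable<Req>, Res>(
  request: Req,
  route: (request: Req) => Promise<Res>,
  options: RetryOptions = {},
): Promise<Res> {
  const {
    maxAttempts = 3,
    baseDelayMs = 100,
    maxDelayMs = 800,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
    log = (message: string) => console.warn(message),
    label = "unknown agent",
  } = options;

  for (let attempt = 1; ; attempt++) {
    try {
      // Each attempt gets its own clone, so the original body is never consumed.
      return await route(request.clone());
    } catch (err) {
      // Out of attempts, or not a transient error: let the caller's
      // error handler deal with it.
      if (attempt >= maxAttempts || !isRetryableDoError(err)) throw err;
      const delay = backoffDelayMs(attempt, baseDelayMs, maxDelayMs);
      log(`[${label}] transient Durable Object error, retry ${attempt} in ${delay}ms`);
      await sleep(delay);
    }
  }
}
```

At a Cloudflare call site this might be used as routeWithDoRetry(request, (req) => routeAgentRequest(req, env), { label }), with the rethrown error after the final attempt reaching app.onError as before. Taking the route function as a parameter keeps the sketch independent of the agents SDK and makes the retry loop easy to test with a stub.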

Scope is intentionally narrow: only the two Cloudflare-only routeAgentRequest call sites in flue-app.ts. The Node target is untouched.

cc @threepointone for review.

@threepointone

I think this should be in partyserver; I'll fix it there tonight. But I'd be curious what error is being thrown that's not being caught. That usually happens in an unprotected hook or something. Could you ask your agent to try to analyze it and figure it out? Point it at our agents codebase as well if it'll help.

@threepointone

Landed in cloudflare/partykit#399; it should go out tomorrow in agents.

@FredKSchott

Member Author

amazing thanks!
