Skip to content

Clean up mailbox state, attachment blobs, and agent history on mailbox delete#1

Open
stevenobiajulu wants to merge 1 commit intocloudflare:mainfrom
stevenobiajulu:fix/delete-mailbox-cleanup
Open

Clean up mailbox state, attachment blobs, and agent history on mailbox delete#1
stevenobiajulu wants to merge 1 commit intocloudflare:mainfrom
stevenobiajulu:fix/delete-mailbox-cleanup

Conversation

@stevenobiajulu
Copy link
Copy Markdown

Addresses the existing TODO at workers/index.ts:140:

await c.env.BUCKET.delete(key); // TODO: also delete DO data and R2 attachment blobs

The current DELETE /api/v1/mailboxes/:mailboxId handler removes only the R2 settings JSON. Three categories of state are leaked:

  1. MailboxDO SQLite storage — all emails, folders, threads, attachment rows, and the d1_migrations tracker.
  2. R2 attachment blobs — every object under attachments/{emailId}/{attId}/{filename} from every email ever received or sent.
  3. EmailAgent DO state — the chat table (cf_ai_chat_agent_messages), the runtime tables (cf_agents_state, cf_agents_schedules, cf_agents_queues, cf_agents_workflows), plus any scheduled alarms.

Because idFromName(email) is deterministic, a mailbox re-created with the same address inherits the old DO state from both Durable Objects. Over time R2 usage grows without bound.

Change

Two files touched. workers/agent/index.ts is unchanged — EmailAgent inherits the upstream AIChatAgent.destroy() method, which already does the right thing.

workers/durableObject/index.ts — new destroyMailbox() RPC on MailboxDO:

  • Enumerates attachment rows (SELECT email_id, id, filename FROM attachments).
  • Batch-deletes the corresponding R2 keys in chunks of 1000 (R2 batch-delete cap), guarding the empty-array case.
  • Calls this.ctx.storage.deleteAll() to wipe SQLite (emails, folders, attachments, threading, migrations tracker).
  • Defers this.ctx.abort("destroyed") via setTimeout(fn, 0) so the RPC response flushes before the uncatchable abort fires. The abort ensures a subsequent request hits a fresh DO instance (which re-runs migrations on a new DB) instead of reusing the live instance with an empty schema.

Attachment deletion happens inside the DO rather than shipping a string[] back over RPC, so the operation is atomic from the caller's perspective and scales to mailboxes with many attachments.

workers/index.ts — DELETE handler now:

  1. HEAD-checks the settings JSON (returns 404 if the mailbox doesn't exist).
  2. Calls mailboxStub.destroyMailbox().
  3. Calls agentStub.destroy() — the upstream AIChatAgent.destroy() (see cloudflare/agents packages/ai-chat/src/index.ts) drops cf_agents_* / cf_ai_chat_agent_messages, runs deleteAlarm(), disposes resources, calls deleteAll(), and aborts the DO.
  4. Deletes the settings JSON last, so a partial failure leaves the HEAD check passing and the handler is safe to retry.

Why not hand-roll agent cleanup

An earlier draft of this change proposed adding a /destroy route on EmailAgent that called this.ctx.storage.deleteAll(). A review of the upstream agents package source showed that would have been incorrect:

  • AIChatAgent.destroy() does more than a raw storage wipe — it also aborts the DO, disposes resources, and cancels scheduled alarms explicitly.
  • The repo's compatibility_date is 2025-11-28. Cloudflare made deleteAll() implicitly clear alarms only on 2026-02-24+. A hand-rolled path on this compat date would have leaked alarms.

Delegating to the upstream method sidesteps both issues.

Scope — best-effort, not linearizable

This is best-effort cleanup, not full deletion linearizability. receiveEmail() checks BUCKET.head() once at workers/index.ts:367 and then writes attachments + DO rows. A concurrent inbound delivery that passed its HEAD check before we delete the settings JSON can still land in the freshly-aborted-then-re-instantiated DO. Closing that race requires coordination (e.g. a deleting: true flag in settings JSON that receiveEmail reads) and is intentionally left for a follow-up — happy to take that on as a separate PR if wanted.

Verification

Full E2E locally via npm run dev:

  1. Created a mailbox test@example.com.
  2. POSTed an email with a 30-byte attachment → HTTP 200, size 30 when fetching the attachment blob confirms it's in R2.
  3. DELETE /api/v1/mailboxes/test@example.comHTTP 204.
  4. Mailbox list returns [].
  5. Direct inspection of the miniflare R2 SQLite showed the old attachment key is gone; after re-creating the mailbox, only the new settings JSON is present.
  6. Re-created the same mailbox → HTTP 201 (would have been 409 Mailbox already exists if settings had leaked).
  7. Re-listed the SENT folder on the recreated mailbox → {"emails":[],"totalCount":0}, proving the DO was drained and the new instance re-ran migrations from scratch.

Also deployed the branch to my own Cloudflare account via wrangler deploy with one-click Access enabled; worker deploys cleanly and Access gating is functional (unauthenticated requests return 302 to the login redirect). The code path is identical to the local run — the fix exercises DO storage primitives + R2 batch delete + upstream AIChatAgent.destroy(), none of which differ between miniflare and production.

Out of scope (intentional)

  • The receiveEmail() race described above.
  • DraftBody.to email validation at workers/index.ts:34 — happy to submit separately.
  • Test harness — the repo has no existing test infrastructure (no *.test.ts, no tests/ dir, no vitest.config.*). Adding Vitest + Miniflare would expand the review surface and open tooling debates. Happy to follow up if a test harness PR would be welcome.

DCO

I certify that this contribution is made under the terms of the Developer Certificate of Origin 1.1 (https://developercertificate.org). The commit is signed off with Signed-off-by: Steven Obiajulu <steven@usejunior.com>.

…x delete

The DELETE /api/v1/mailboxes/:mailboxId handler only removed the R2 settings
JSON, leaving the MailboxDO SQLite storage, all R2 attachment blobs under
attachments/{emailId}/{attId}/{filename}, and the EmailAgent DO state
(cf_ai_chat_agent_messages, cf_agents_* tables, alarms) behind. This is the
existing TODO at workers/index.ts:140. Over time the leak grows without bound;
a recreated mailbox with the same email address inherits the old DO state
because idFromName() is deterministic.

This change adds a destroyMailbox() RPC method on MailboxDO that enumerates
attachments, batch-deletes their R2 blobs (1000 keys per call), wipes SQLite
via ctx.storage.deleteAll(), and aborts the DO so the next access
re-instantiates with a fresh migration run. Attachment deletion happens
inside the DO so the operation is atomic from the caller and avoids
shipping a potentially large string[] over RPC.

For the agent side, the handler now calls the existing AIChatAgent.destroy()
RPC method from the upstream agents package. Upstream destroy() drops
cf_agents_* and cf_ai_chat_agent_messages tables, deletes alarms, runs
dispose hooks, wipes storage, and aborts the DO. The compatibility_date of
2025-11-28 predates the change that made deleteAll() implicitly clear
alarms, so delegating to upstream destroy() is both simpler and more
correct than a hand-rolled path.

The settings JSON is deleted last so partial failures leave the HEAD check
passing and the handler can be safely retried. This is best-effort cleanup,
not linearizable deletion -- a concurrent receiveEmail() that passed its
own HEAD check can still repopulate the mailbox after drain. Closing that
race requires coordination (e.g. a deleting-flag in settings read by the
receive path) and is intentionally left for a follow-up.

Verified locally via npm run dev: created a mailbox, posted an email with
an attachment blob, verified the blob in R2, issued DELETE, and confirmed
(a) the attachment blob is gone, (b) the settings JSON is gone, (c)
recreating the same mailbox succeeds with an empty DO.

Signed-off-by: Steven Obiajulu <steven@usejunior.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants