feat: attachment retrieval via metadata + signed download URL (§6a #5)#259

Open
jiashuoz wants to merge 6 commits into main from feat/attachment-retrieval
@jiashuoz

Member

Implements §6a #5: retrieve attachments via metadata + a short-lived signed download URL instead of base64 through the agent's context. No external dependency (native adapter); the 2 MB inline wall is gone.

Slices

  1. internal/mailparse — Go MIME attachment extractor (Attachments/AttachmentAt), the authoritative attachment index.
  2. APIMessageView.attachments[] metadata parsed server-side (+ spec/SDK regen).
  3. AttachmentStore port + native adapter + endpoints: a bearer metadata+mint endpoint (GET …/attachments/{index} → {…, download_url, expires_at}; ?inline=true ≤256 KB) and a capability-token streaming route (…/attachments/{index}/download?token=, no bearer, HMAC-SHA256 token bound to message+index, 15-min TTL).
  4. SDK + MCP: messages.getAttachment; get_attachment returns the URL by default (inline:true → base64); get_message reads the server attachments[]; client-side MIME re-parse + 2 MB wall removed.

Security model

The download URL is a capability (no bearer): the token binds message_id|index|expiry; the path {email} binds the message to its owning agent (GetMessage is agent-keyed), so a token streams only what it minted. Fails closed: bad/expired/tampered/index-mismatched token → 403; index OOB → 404; draft → 404.
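
For readers who want the token binding spelled out, here is a minimal sketch of an HMAC-SHA256 capability token over `message_id|index|expiry`. It is not the code in internal/httpapi/attachments.go: the names `MintToken` and `VerifyToken`, the `<expiry>.<signature>` wire format, and the use of plain Unix seconds are all assumptions made for illustration, and it assumes message IDs never contain `|`.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// ttlSeconds mirrors the 15-minute TTL stated in this PR.
const ttlSeconds = 15 * 60

// sign computes HMAC-SHA256 over "message_id|index|expiry".
func sign(secret []byte, messageID string, index int, expiry int64) []byte {
	mac := hmac.New(sha256.New, secret)
	fmt.Fprintf(mac, "%s|%d|%d", messageID, index, expiry)
	return mac.Sum(nil)
}

// MintToken returns "<expiry>.<base64url signature>" (hypothetical format).
// It refuses to mint when the secret is empty, so a miswired deployment
// cannot produce tokens signed with a guessable key.
func MintToken(secret []byte, messageID string, index int, nowUnix int64) (string, error) {
	if len(secret) == 0 {
		return "", errors.New("attachment token: empty signing secret")
	}
	expiry := nowUnix + ttlSeconds
	sig := sign(secret, messageID, index, expiry)
	return strconv.FormatInt(expiry, 10) + "." + base64.RawURLEncoding.EncodeToString(sig), nil
}

// VerifyToken reports whether token was minted for exactly this message and
// index and has not expired. Any parse or signature failure returns false.
func VerifyToken(secret []byte, token, messageID string, index int, nowUnix int64) bool {
	if len(secret) == 0 {
		return false // fail closed on an empty secret
	}
	expStr, sigStr, ok := strings.Cut(token, ".")
	if !ok {
		return false
	}
	expiry, err := strconv.ParseInt(expStr, 10, 64)
	if err != nil {
		return false
	}
	sig, err := base64.RawURLEncoding.DecodeString(sigStr)
	if err != nil {
		return false
	}
	// hmac.Equal compares in constant time.
	if !hmac.Equal(sig, sign(secret, messageID, index, expiry)) {
		return false
	}
	// Strictly before expiry: the exact expiry second is already rejected.
	return nowUnix < expiry
}
```

Because the expiry is part of the signed payload, a caller cannot extend a token by editing the prefix; the recomputed signature would no longer match. The token carries no agent identity, so the agent binding still has to come from the agent-keyed message lookup on the route.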

Verification

  • Unit (mailparse), httpapi over a real httptest server (metadata, inline, 413, 404, cross-agent 403, download stream + headers, token negatives), SDK + MCP shape/forwarding/error-surfacing.
  • Local-service e2e (real binary + Postgres): seeded an inbound message with a real MIME attachment; over the wire confirmed get_message attachments[], metadata + download_url, the capability-token download stream (exact bytes + Content-Type/Disposition/nosniff), inline base64, and the 404/403 negatives. Logs clean.
  • Go unit suite + spec/SDK drift gates green; SDK 97 / MCP 134.

Deferred (designed seams)

Object-storage/AgentDrive adapter, outbound presigned upload, large-file attach-by-reference. See docs/design/attachment-retrieval.md.

🤖 Generated with Claude Code

jiashuoz and others added 6 commits June 20, 2026 23:26
…lice 1)

Attachments(raw) / AttachmentAt(raw, i) walk the MIME tree and return attachment
parts in stable document order, decoding Content-Transfer-Encoding to bytes
(binary-safe). An attachment = a leaf part with a filename or an explicit
attachment disposition; body text parts and multipart containers are excluded;
named inline parts (cid images) are included. This index is the authoritative
attachment index for the read view + the download route (slices 2-3) — the
backend, not the MCP's client-side parse, now owns it.

Tests: order + base64/quoted-printable decode, binary integrity, bounds,
plain-message (no attachments), malformed input → empty.
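
The walk described above can be sketched with the Go standard library. This is an illustrative approximation, not the internal/mailparse source: `Attachments` and `AttachmentAt` are named in this commit, but their signatures, the `Attachment` struct, and the `walk`/`decode` helpers here are assumptions.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"io"
	"mime"
	"mime/multipart"
	"mime/quotedprintable"
	"net/mail"
	"strings"
)

// Attachment is a hypothetical result type for this sketch.
type Attachment struct {
	Index       int
	Filename    string
	ContentType string
	Data        []byte
}

// Attachments returns attachment parts in depth-first document order.
// Malformed input yields an empty result rather than an error.
func Attachments(raw []byte) []Attachment {
	msg, err := mail.ReadMessage(bytes.NewReader(raw))
	if err != nil {
		return nil
	}
	var out []Attachment
	h := msg.Header
	walk(h.Get("Content-Type"), h.Get("Content-Disposition"),
		h.Get("Content-Transfer-Encoding"), msg.Body, &out)
	return out
}

// AttachmentAt returns the i-th attachment, or false when i is out of range.
func AttachmentAt(raw []byte, i int) (Attachment, bool) {
	atts := Attachments(raw)
	if i < 0 || i >= len(atts) {
		return Attachment{}, false
	}
	return atts[i], true
}

func walk(ctype, disp, cte string, body io.Reader, out *[]Attachment) {
	mediaType, params, _ := mime.ParseMediaType(ctype)
	if mediaType == "" {
		mediaType = "text/plain" // default when the header is missing or unparseable
	}
	if strings.HasPrefix(mediaType, "multipart/") {
		if params["boundary"] == "" {
			return
		}
		mr := multipart.NewReader(body, params["boundary"])
		for {
			// NextRawPart leaves Content-Transfer-Encoding untouched,
			// so decoding stays in one place (decode below).
			p, err := mr.NextRawPart()
			if err != nil {
				return // io.EOF or a malformed container
			}
			walk(p.Header.Get("Content-Type"), p.Header.Get("Content-Disposition"),
				p.Header.Get("Content-Transfer-Encoding"), p, out)
		}
	}
	dispType, dparams, _ := mime.ParseMediaType(disp)
	name := dparams["filename"]
	if name == "" {
		name = params["name"]
	}
	// A leaf counts as an attachment if it is named or explicitly
	// marked "attachment"; unnamed body text parts are skipped.
	if name == "" && dispType != "attachment" {
		return
	}
	data, err := io.ReadAll(decode(cte, body))
	if err != nil {
		return
	}
	*out = append(*out, Attachment{
		Index: len(*out), Filename: name, ContentType: mediaType, Data: data,
	})
}

// decode wraps r according to the Content-Transfer-Encoding header.
func decode(cte string, r io.Reader) io.Reader {
	switch strings.ToLower(strings.TrimSpace(cte)) {
	case "base64":
		return base64.NewDecoder(base64.StdEncoding, r)
	case "quoted-printable":
		return quotedprintable.NewReader(r)
	}
	return r
}
```

Because the index is simply the position in the depth-first result, any two callers that parse the same raw message get the same numbering, which is the property the read view and the download route rely on.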

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d (slice 2)

MessageView gains attachments[] — per-attachment metadata {index, filename,
content_type, size_bytes}, parsed server-side from raw_message via
mailparse.Attachments (slice 1). Never the bytes. Always an array (empty when
none / held drafts). This makes the attachment INDEX authoritative on the backend
so the upcoming download route and the agent agree on which part is which (the
MCP's prior client-side TS parse can't drive a server route consistently).

Parsed for any direction with a raw_message (inbound + sent outbound).

Regenerated api/openapi.yaml (new AttachmentMetaView schema + MessageView field)
and the TS + Python SDK bases (make spec + make generate-sdk). httpapi tests
green; spec-golden + drift gates pass once committed.
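
One detail behind "always an array" is worth illustrating: in Go, a nil slice marshals to JSON `null`, so the view has to substitute an empty slice. The sketch below uses the field names from this commit; the struct layout, the `id` field, and the `NewMessageView` and `encode` helpers are assumptions for illustration only.

```go
package main

import "encoding/json"

// AttachmentMeta carries metadata only, never the bytes.
type AttachmentMeta struct {
	Index       int    `json:"index"`
	Filename    string `json:"filename"`
	ContentType string `json:"content_type"`
	SizeBytes   int64  `json:"size_bytes"`
}

// MessageView is trimmed to the fields relevant to this sketch.
type MessageView struct {
	ID          string           `json:"id"`
	Attachments []AttachmentMeta `json:"attachments"`
}

// NewMessageView normalizes a nil slice so the field encodes as [] and
// clients can iterate without a null check.
func NewMessageView(id string, metas []AttachmentMeta) MessageView {
	if metas == nil {
		metas = []AttachmentMeta{}
	}
	return MessageView{ID: id, Attachments: metas}
}

// encode is a small helper that returns the JSON form, or "" on error.
func encode(v MessageView) string {
	b, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(b)
}
```

With this normalization, a message without attachments and a held draft both serialize with `"attachments":[]`.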

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ts (slice 3)

§6a #5: retrieve attachments via metadata + a short-lived signed download URL
instead of base64 through the agent's context.

- internal/httpapi/attachments.go:
  - AttachmentStore port (DownloadURL / VerifyDownload) — the seam an
    object-storage adapter (deferred) implements later.
  - nativeAttachmentStore (default, zero-dep): mints an HMAC-SHA256 capability
    token bound to message_id+index+expiry over the deployment signing secret,
    pointing at e2a's own streaming route. (Self-contained signer — not the
    HITL approvaltoken, whose action allowlist + payload are approve/reject
    specific; same crypto family.)
  - GET /v1/agents/{email}/messages/{id}/attachments/{index} (Huma, bearer):
    {index, filename, content_type, size_bytes, download_url, expires_at};
    ?inline=true adds base64 data for <=256 KB (413 over the cap).
  - GET …/attachments/{index}/download?token= (raw chi, capability-token, no
    bearer): streams bytes with Content-Type/Disposition/Length + nosniff. The
    token binds message+index; the path {email} binds the message to its owning
    agent (GetMessage is agent-keyed), so a token only streams what it minted.
- Deps.AttachmentStore + route registration; apiserver wires the native store
  from SigningSecret+PublicURL (nil when unset → endpoints unwired); main passes
  cfg.Signing.HMACSecret. TTL 15m, inline cap 256 KB.
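
The `?inline=true` behavior described above can be sketched as a small pure function. The response field names come from this commit; the `withInline` helper, the struct layout, the reading of 256 KB as 256 × 1024 bytes, and applying the cap to the decoded size are assumptions.

```go
package main

import (
	"encoding/base64"
	"net/http"
)

// inlineCapBytes assumes "256 KB" means 256 * 1024 bytes.
const inlineCapBytes = 256 * 1024

// AttachmentView mirrors the response fields listed in this commit.
type AttachmentView struct {
	Index       int    `json:"index"`
	Filename    string `json:"filename"`
	ContentType string `json:"content_type"`
	SizeBytes   int    `json:"size_bytes"`
	DownloadURL string `json:"download_url"`
	ExpiresAt   string `json:"expires_at"`
	Data        string `json:"data,omitempty"` // base64, only when inline
}

// withInline returns the view plus the HTTP status to send. Without inline
// the view is returned unchanged; with inline, payloads over the cap are
// refused with 413 instead of being truncated.
func withInline(v AttachmentView, decoded []byte, inline bool) (AttachmentView, int) {
	if !inline {
		return v, http.StatusOK
	}
	if len(decoded) > inlineCapBytes {
		return AttachmentView{}, http.StatusRequestEntityTooLarge
	}
	v.Data = base64.StdEncoding.EncodeToString(decoded)
	return v, http.StatusOK
}
```

A 413 on the inline path does not affect the signed URL: a client that gets 413 can repeat the request without `inline` and use `download_url` instead.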

Spec + SDK bases regenerated (getAttachment op + AttachmentView). Tests: metadata
+ signed URL, inline small, inline-too-large 413 (+ URL still works), index OOB
404, agent-scope cross-agent 403, download happy path streams bytes, bad token
403, index-mismatched token 403. httpapi green; drift gates pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ce 4)

§6a #5 client surface. SDK hand-layer: messages.getAttachment(email, id, index,
{inline}) → AttachmentView. MCP:
- McpClient.getAttachment wrapper.
- get_attachment tool now returns {index, filename, content_type, size_bytes,
  download_url, expires_at} by default; `inline: true` adds base64 `data` for
  small files. The 2 MB hard wall and the client-side MIME re-parse are gone —
  index-out-of-range (404) and inline-too-large (413) are server concerns now,
  surfaced via the structured error code (§6a #4).
- get_message now reads MessageView.attachments (server-authoritative metadata)
  instead of parsing rawMessage client-side; the mailparser dependency and the
  parseAttachments helper are removed from the tool path.

Tests: SDK getAttachment maps the view + URL-encodes the path + forwards inline;
MCP get_attachment default (download_url, no data) / inline (base64) / error
surfaced as isError; get_message attachments now sourced from the server field.
SDK 97 + MCP 134 green. No spec/SDK-base change (getAttachment was generated in
slice 3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- api-v1-redesign.md: flip the §6a as-built banner + recommendation #5 to ✅ done
  (native adapter; AgentDrive/object-storage + outbound upload + attach-by-reference
  recorded as deferred seams).
- docs/design/attachment-retrieval.md: new as-built design (port, native adapter,
  endpoints, invariants, edge cases, deferred seams, verification).
- mcp/README.md: add the get_attachment tool row (download_url + inline).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Independent review: pass-with-risks. Adversarial: SAFE (proved the capability
token can't be forged or cross-tenant; index in get_message == bytes streamed).
No blockers. Findings applied, each with a regression test:

- [both, #1] Cross-agent token-replay: the download route's claimed defense
  (token has no agent; binding rests on GetMessage keyed by the path agent) was
  untested. Added TestAttachment_DownloadTokenCannotCrossAgents — token minted for
  support's message replayed against other's path → 404, no bytes.
- [adversarial L1] Empty-secret fail-closed: nativeAttachmentStore.DownloadURL now
  errors and VerifyDownload returns false when the HMAC secret is empty (prod
  config already blocks it; defense-in-depth so a future miswiring can't open the
  route). + test.
- [adversarial L3] Expiry off-by-one: reject at the exact expiry second
  (!Before instead of After). + boundary test.
- [independent] Python SDK parity: added MessagesResource.get_attachment mirroring
  the TS ergonomic method (+ test); the §6a "SDK done" claim now holds for both.
- [independent] MIME walker coverage: nested multipart (depth-first index
  stability) + unnamed attachment-disposition part.
- [independent] Dead code: removed the unused mailparser import from the MCP test
  and dropped mailparser/@types/mailparser from mcp/package.json (the client-side
  re-parse is fully gone now).
- [both] Doc: noted the message/rfc822-nested-attachment limitation.

Deferred: L2 (full MIME decode per call + post-decode inline cap), which is bounded by
the 10 MB inbound cap and is an optimization, not a correctness gap. Also deferred: a
live contract test, since the over-the-wire httptest endpoint tests, the documented
real-binary e2e, and the new cross-tenant unit test already cover the route.

Go (mailparse+httpapi) + SDK 97 + MCP 134 + Python 147 green; drift gates pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>