feat(parser): read Claude/Codex sessions from S3-compatible object storage #650
DanielMao1 wants to merge 2 commits into
…orage When a claude_project_dirs / codex_sessions_dirs entry is an s3:// URI, sessions are listed and fetched from object storage (AWS S3, MinIO, Aliyun OSS, Cloudflare R2) via minio-go — pure Go, no cgo. Each object is downloaded, buffered to a transient temp file so the existing path-based parsers run unchanged, then removed; no persistent local mirror. The parsed session records the original s3:// URI as its source path. This is a push-based alternative to SSH remote sync (kenn-io#412): each machine pushes its sessions to its own S3 prefix on its own schedule and a central agentsview reads them, with no inbound SSH to each machine required. The source machine is derived from the .../<machine>/raw/ path layout, mirroring the host prefix SSH sync attaches. Credentials come from standard AWS env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, AWS_S3_ENDPOINT).
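The credential lookup this commit message describes can be sketched as follows. This is a minimal illustration, not the PR's code: `s3Config`, `loadS3Config`, and `defaultEndpoint` are hypothetical names, treating missing keys as an error is an assumption, and so is the default AWS endpoint host used when `AWS_S3_ENDPOINT` is empty.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// s3Config is a hypothetical holder for the four settings named in the PR.
type s3Config struct {
	AccessKey string
	SecretKey string
	Region    string
	Endpoint  string
}

// defaultEndpoint is an assumption: the host used when AWS_S3_ENDPOINT is empty.
const defaultEndpoint = "s3.amazonaws.com"

// loadS3Config reads the standard AWS variables through getenv, which is
// injected so the lookup can be exercised without touching the real
// process environment.
func loadS3Config(getenv func(string) string) (s3Config, error) {
	cfg := s3Config{
		AccessKey: getenv("AWS_ACCESS_KEY_ID"),
		SecretKey: getenv("AWS_SECRET_ACCESS_KEY"),
		Region:    getenv("AWS_REGION"),
		Endpoint:  getenv("AWS_S3_ENDPOINT"),
	}
	// Assumption: static keys are required; the real code may allow other
	// credential sources.
	if cfg.AccessKey == "" || cfg.SecretKey == "" {
		return s3Config{}, errors.New("AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set")
	}
	// An empty endpoint means plain AWS S3.
	if cfg.Endpoint == "" {
		cfg.Endpoint = defaultEndpoint
	}
	return cfg, nil
}

func main() {
	cfg, err := loadS3Config(os.Getenv)
	if err != nil {
		fmt.Println("s3 source unavailable:", err)
		return
	}
	fmt.Println("endpoint:", cfg.Endpoint, "region:", cfg.Region)
}
```

Pointing `AWS_S3_ENDPOINT` at a MinIO or R2 host would switch stores without any change to the `s3://` entries themselves.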
roborev: Combined Review (
…okup

- processS3Session derives the temp filename from path.Base and rejects names containing a path separator or "." / "..", so an S3 object key can never write outside the temp dir (e.g. a key embedding a backslash on Windows).
- FetchS3Object returns a streaming reader; processS3Session copies it straight to the temp file with io.Copy instead of buffering the whole object in memory.
- FindSourceFile returns stored s3:// paths as-is (no local file to stat) so single-session resync routes them back through processS3Session.
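The first two points of this commit can be illustrated with a short sketch. It is not the code from the PR: `safeTempName` and `copyToTemp` are hypothetical helpers that only mirror the described behaviour (file name taken from `path.Base`, separators and dot names rejected, object streamed with `io.Copy`), and the sample keys are made up.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path"
	"path/filepath"
	"strings"
)

// safeTempName (hypothetical) reduces an S3 object key to a bare file name
// and rejects anything that could escape the temp directory.
func safeTempName(key string) (string, error) {
	name := path.Base(key) // S3 keys use "/" regardless of the host OS
	if name == "." || name == ".." || strings.ContainsAny(name, `/\`) {
		return "", fmt.Errorf("unsafe object key %q", key)
	}
	return name, nil
}

// copyToTemp (hypothetical) streams r into dir/name without holding the
// whole object in memory and returns the path it wrote.
func copyToTemp(r io.Reader, dir, name string) (string, error) {
	dst := filepath.Join(dir, name)
	f, err := os.Create(dst)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(f, r); err != nil {
		f.Close()
		os.Remove(dst)
		return "", err
	}
	if err := f.Close(); err != nil {
		os.Remove(dst)
		return "", err
	}
	return dst, nil
}

func main() {
	dir, err := os.MkdirTemp("", "s3session-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir) // the temp copy is transient, as in the PR

	name, err := safeTempName("laptop/raw/projects/demo/session.jsonl")
	if err != nil {
		panic(err)
	}
	// strings.NewReader stands in for the streaming reader of a fetched object.
	p, err := copyToTemp(strings.NewReader("{}\n"), dir, name)
	if err != nil {
		panic(err)
	}
	fmt.Println("wrote", filepath.Base(p))

	_, err = safeTempName(`laptop/raw/evil\..\session.jsonl`)
	fmt.Println("backslash key rejected:", err != nil)
}
```

Checking the name after `path.Base` rather than cleaning the full key keeps the rule simple: whatever the key looks like, only a single plain file name ever reaches the filesystem.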
The issues reviewed are addressed in 598e6bd.
roborev: Combined Review (
Summary
Lets `claude_project_dirs` / `codex_sessions_dirs` entries be `s3://` URIs.
When a configured root is an `s3://` URI, Claude and Codex sessions are listed and fetched directly from an S3-compatible object store (AWS S3, MinIO, Aliyun OSS, Cloudflare R2) and parsed into the local SQLite DB like any other source.
Pure Go via `minio-go`: no cgo, no change to the default build or cross-compilation.
Motivation
I'm a heavy Claude Code (and Codex) user and a big fan of this project — it's
become how I actually review what my agents did. But my sessions are scattered:
some on my Mac, some on a personal EC2 box, a lot more on my company's GPU
cluster. What I really want is one agentsview that gives me an integrated
view and analytics across all of them, not a separate dashboard per machine.
`sync --host` and PG push already cover part of this, but both assume the central instance can reach each machine: SSH needs inbound access, and PG push needs a daemon + Postgres on every box. The cluster nodes I can't SSH into from home, and ephemeral cloud instances are often gone by the time I'd want to pull from them.
So this goes the other direction: push, not pull. Each machine drops its own `~/.claude/projects` / `~/.codex/sessions` into its own prefix of a shared bucket (a plain `aws s3 sync` / `rclone` cron), and one agentsview reads them all from object storage. Nothing has to reach back into the source machines; they just push before they disappear. It sits alongside `sync --host` and PG push rather than replacing either.
How it works
The two touch points are discovery and reading bytes; the per-agent
parsers are untouched.
- `DiscoverClaudeProjects` / `DiscoverCodexSessions` detect an `s3://` root and list objects, reconstructing the same project / subagent layout the local walkers produce.
- `processS3Session` downloads each object, buffers it to a transient temp file so the existing path-based parsers (incremental offsets, subagent paths) run unchanged, then deletes it; there is no persistent local mirror. The parsed session records the original `s3://` URI as its source path.
- The source machine is derived from the `.../<machine>/raw/...` layout, mirroring the `host:` prefix that SSH sync already attaches to pulled sessions.
- Credentials come from standard AWS env vars (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `AWS_S3_ENDPOINT`). `AWS_S3_ENDPOINT` selects the S3-compatible endpoint (empty = AWS).
Where to look
- `internal/parser/s3source.go`: the S3 client, listing, fetch, and discovery (new)
- `internal/parser/discovery.go`: `s3://` dispatch + `DiscoveredFile.Machine`
- `internal/sync/engine.go`: `processS3Session` and the `processFile` short-circuit
Tradeoffs / limitations
- … natural follow-up using the metadata `ListObjects` already returns.
- `session export <id>` is local-only and doesn't fetch from S3 yet.
- `github.com/minio/minio-go/v7`.
Open question
Happy to adjust the shape. If you'd rather this be a dedicated `[[remotes]]`-style config block (in the spirit of #412) instead of overloading the existing `*_dirs` keys with `s3://`, I can rework it that way. I wanted to keep the diff minimal first and get your read on whether an object-store source is something you'd want in tree at all.