feat: PostgreSQL push sync and read-only server mode#167
Conversation
roborev: Combined Review (
|
|
Picking up the conversation from #108, I asked Claude to compare against roborev: |
roborev: Combined Review (
|
roborev: Combined Review (
|
roborev: Combined Review (
|
roborev: Combined Review (
|
roborev: Combined Review (
|
roborev: Combined Review (
|
roborev: Combined Review (
|
roborev: Combined Review (
|
roborev: Combined Review (
|
roborev: Combined Review (
|
|
I'll get to this within a few days, just bear with me |
No rush, I'm still iterating and trying to converge with a passing roborev verdict. I'll follow up here and tag you when ready. |
roborev: Combined Review (
|
|
Finding 1 (boundary fingerprints): Fixed in df26d91. Finding 2 (late arrival at cutoff): Already handled by the boundary replay mechanism. When |
roborev: Combined Review (
|
…h boundary state Address three findings from PR wesm#167 combined review: 1. CheckSSL rejects insecure sslmode for non-loopback PG hosts at startup (pgdb.New and pgsync.New). AllowInsecurePG config option falls back to the existing warning for local dev setups. 2. pgdb.New now uses PingContext with a 10s timeout, matching pgsync.New. 3. readPushedFingerprints reads boundary state fingerprints without cutoff validation, so partial failures on the first push (when lastPush is empty) are still deduplicated on the next cycle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
roborev: Combined Review (
|
Addressed: Combined Review (
|
roborev: Combined Review (
|
|
Addressed the combined review ( Fixed:
Not applicable:
|
|
@tlmaloney since this is a large PR it's going to be hard to get a clean review, so it might make sense to pause here so I can have a chance to review some of the high level details |
Agreed, I am not having luck converging on a passing verdict. I'll pause and happy to squash / resubmit / or whichever direction makes sense. |
|
I'm going to cut 0.14.0 before getting to this because there's a good amount of work to do here, mainly around making sure the design / ergonomics are right. I have a lot going on right now so just bear with me! |
No worries, I went down a bit of a review-fix rabbit hole and should have stopped earlier to align on approach. Happy to wait for you to free up. |
|
@wesm If it's any easier, I can split this PR over a few, here's a suggested partition. By the first commit, there was already 7k new additions. Either way, we need to make sure the approach is right. |
|
I'll get there, just please bear with me! I am working on a bunch of parallel projects and want to make sure this PR doesn't get rushed. I have a postgres db that I already use for roborev so I can go through the e2e testing motion |
- feat: add push-only PostgreSQL sync - fix: address review findings for pg push sync - fix: remove hardcoded PG credentials from integration tests - feat: add PostgreSQL read-only server mode - fix: address review findings for PG read-only mode - feat: add machine filter to session list sidebar - fix: address PR review findings for PG sync - fix: address second round of PR review findings - fix: address third round of PR review findings - fix: address fourth round of PR review findings - fix: warn about ignored options in pg-read mode - fix: add missing space in PG schema updated_at DEFAULT clause - fix: address sixth round of PR review findings - docs: add PostgreSQL sync and read-only mode documentation - fix: address seventh round of PR review findings - fix: address eighth round of PR review findings - test: add missing coverage for PG sync config and error handling - fix: address ninth round of PR review findings - fix: address tenth round of PR review findings - fix: improve PG session version signal and DSN parsing - fix: run schema migration in pg-read startup - fix: return empty slices instead of nil in PG read-only stubs - fix: make schema migration non-fatal in pg-read mode - fix: only suppress read-only PG errors in pg-read migration - fix: add schema compat probe and typed SQLSTATE detection for pg-read - fix: close rows in CheckSchemaCompat to prevent connection leak - refactor: extract shared DSN utilities into internal/pgutil - fix: gate local-only handlers in pg-read mode and remove dead code - fix: address branch review findings for pg-read mode - fix: add ESCAPE clause to ILIKE and fix top sessions time filtering - fix: use E-string syntax for ILIKE ESCAPE clause - fix: address third round of PR review findings - fix: compute duration ranking in Go instead of SQL ::timestamp cast - fix: use exact duration for ranking, skip unparseable timestamps - test: add unit tests for top sessions duration ranking - fix: add missing created_at column to pgdb test DDL - fix: drop and recreate tables in pgdb test DDL - fix: check DROP TABLE errors in pgdb test DDL - fix: use DROP SCHEMA CASCADE in pgdb test DDL - fix: address full-branch review findings - fix: add test coverage and make push message errors per-session - fix: exclude failed sessions from push boundary state - fix: do not advance last_push_at when sessions fail - fix: make pushSession errors per-session and add ctx check in analytics - fix: make all push loop errors per-session skip - fix: surface PushResult.Errors in CLI and periodic sync output - fix: defer push error exit until after pg-status output - fix: reject empty search term in PG ILIKE search - fix: fingerprint all pushed sessions to avoid redundant re-pushes - fix: enforce SSL for non-loopback PG, add ping timeout, fix first-push boundary state - fix: address review findings for SSL enforcement and push dedup - fix: hostaddr bypass, fingerprint merge on repeated failures, error propagation - test: add coverage for fingerprint merge and loopback hostaddr override - fix: use pgconn.ParseConfig for DSN parsing in SSL enforcement - test: re-add default sslmode test case for remote PG hosts - fix: add missing allowInsecure arg to pgtest-tagged test files - fix: add missing allowInsecure arg to remaining pgsync test calls - fix: add missing allowInsecure arg to TestNewRejectsEmptyURL
|
I'm working on refactoring this to make some structural changes, so don't push any more work. We need containerized integration tests so I'm working on that too |
Sounds good! Thank you. I spun up a Dockerized postgres, but didn't add that to the repo. |
- Use min() builtin instead of if-clamp in messages.go - Use maps.Copy instead of manual loop in push.go Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- pg.url (was postgres_url) in bare env var warning - pg.machine_name (was pg_sync.machine_name) in reserved name error - allow_insecure under [pg] (was allow_insecure_pg) in SSL error - Rename misleading "legacy" test names to describe actual behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Force UTC timezone on all PG connections via DSN parameter, and use AT TIME ZONE 'UTC' in DATE() filters for defense in depth - Replace COALESCE(timestamp, NOW()) with NULLS LAST for stable search result ordering - Use COALESCE(updated_at, created_at) instead of NOW() for stable session version detection - Disable remote_access in pg serve (read-only, has own --host flag) - Fix --full flag help text to describe both local resync and PG push Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- CLAUDE.md: update config description (TOML, not legacy migration) - CLAUDE.md: architecture diagram says "pg serve" not "pg-read mode" - README.md: pg serve examples use CLI flags, not positional DSN arg - README.md: clarify loopback is a default, not enforced Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
roborev: Combined Review (
|
- When pg serve binds to a non-loopback host, enable remote access and auto-generate an auth token (printed to stdout). Keeps remote access off for localhost binding. - Add session_id and ordinal tie-breakers to search ORDER BY for deterministic pagination. - Update README to document pg serve auth behavior accurately. - CLAUDE.md: mention JSON migration in config description. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reuse the existing isLoopbackHost helper (which uses net.ParseIP().IsLoopback()) instead of comparing three string literals. Handles 127.0.0.2 and other loopback addresses correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Write SchemaVersion (currently 1) to sync_metadata on EnsureSchema. Only bumps forward, never downgrades. Add GetSchemaVersion helper for future migration detection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
roborev: Combined Review (
|
pg serve now accepts --base-path (e.g. --base-path /agentsview) for deployment behind a reverse proxy on a subpath. When set: - http.StripPrefix removes the prefix before routing, so all existing routes work unchanged - index.html is rewritten at serve time to prefix asset paths (href="/assets/..." -> href="/agentsview/assets/...") and inject a <base href="/agentsview/"> tag - Frontend derives API base from document.baseURI instead of hardcoding /api/v1, so API calls go through the proxy path Hash-based SPA routing (#/sessions, #/analytics) is unaffected since the hash fragment never hits the server. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Group sessions into batches of 50 per transaction instead of one transaction per session - Use multi-row VALUES for message inserts (100 per batch) and tool_call inserts (50 per batch) instead of row-by-row - Fix batch error handling: track per-batch pushed/message counts separately and undo correctly on rollback or commit failure - Add integration tests for batch push (75 sessions across 2 batches) and bulk message insert (250 messages with tool calls) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Guard StripPrefix with full path-segment check so /basepathFOO returns 404 instead of being handled as a sibling path - Only derive frontend API base from document.baseURI when a real <base href> tag exists; fall back to /api/v1 otherwise so SPA fallback pages on non-root URLs produce correct API paths - Add TestBasePath_RejectsSiblingPath test Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rite - Extract pushBatch method from inline batch loop - On batch failure, retry each session individually so one bad session doesn't block up to 49 neighbors in the same batch - Add explicit UPDATE sessions SET updated_at = NOW() after pushMessages rewrites rows, ensuring updated_at is bumped even when pushSession's upsert is a no-op (metadata unchanged) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
buildAllowedHosts only read from public_url (singular), so public_origins whitelisted CORS but the host check middleware still rejected requests with the origin's Host header. This broke reverse-proxy deployments (e.g. Caddy on a tailnet) where the proxy forwards the external hostname. Now buildAllowedHosts also iterates over PublicOrigins and calls addHostHeadersFromOrigin for each, so any origin trusted for CORS is also trusted as a Host header value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PostgreSQL strictly enforces valid UTF-8. Tool call content from parsed sessions can contain null bytes (0x00) and truncated multi-byte sequences (e.g. an ellipsis character cut after 1 of 3 bytes), causing bulk insert failures on 8 sessions. Three fixes: - Add sanitizePG() that strips null bytes and replaces invalid UTF-8 via strings.ToValidUTF8; apply it to all text fields going into PG (nilStr, nilIfEmpty, message content) - Fix truncate() in parser and truncateStr() in export to operate on rune boundaries instead of byte offsets, preventing creation of invalid UTF-8 at truncation points - Clarify watermark behavior: advance only on zero errors, keep at lastPush on errors so failed sessions are retried (already-pushed sessions are fingerprint-skipped cheaply) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review #10271: pushBatch no longer increments result.Errors — errors are only counted for sessions that fail their individual retry after a batch rollback. Extracted batchResult struct to separate concerns. Review #10277: HTTPS host header test now covers both bare hostname (viewer.example.test) and explicit port (:443) since browsers omit the default port. Review #10279: nilStr and nilIfEmpty now sanitize before checking emptiness, so inputs like "\x00" correctly return nil instead of "". Sanitization scoped to content-carrying fields (project, message content, nullable text) — IDs, machine names, roles, and agent names are tool-generated ASCII. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Getting close here. I ran into a cascade of problems pushing my agent session data to postgres and running |
Thank you!! Looking at the first commits is super informative, I see the spec you created and refined. I am curious about your process. I ran the integration tests and all pass. I also ran roborev branch reviews with Claude/Codex and they surfaced some high/medium issues. Is CI just a HEAD review? I'll share more tomorrow as you're still working on the frontend issues and it's getting late. |
When the local watermark says we've pushed before but PG has zero sessions for this machine, the PG side was reset (schema dropped, DB recreated, etc.). Push now detects this mismatch and automatically forces a full resync instead of only pushing recently modified sessions. Add TestPushDetectsSchemaReset integration test that pushes a session, drops and recreates the PG schema, then verifies an incremental push auto-detects the reset and re-pushes everything. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add sanitizePG to tool_name, category, and tool_use_id in bulkInsertToolCalls to handle null bytes and invalid UTF-8 - Clear schemaDone memoization in coherence check so Push can recreate the schema after it has been dropped externally - Call normalizeSyncTimestamps after schema reset detection to ensure timestamps are set up before schema recreation - Fix TestPushDetectsSchemaReset to exercise the real reset path instead of manually calling EnsureSchema (which was memoized) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When Push(ctx, true) is called after the PG schema has been dropped, schemaDone was still memoized from the initial push, causing EnsureSchema to skip and inserts to fail against missing tables. Clear schemaDone and re-run normalizeSyncTimestamps when full is true, not only in the coherence check path. Add TestPushFullAfterSchemaDropRecreatesSchema regression test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
UpsertSession takes no context parameter. Fixes compile error in TestPushFullAfterSchemaDropRecreatesSchema. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
I took a look at the API and the code structure and proposed some changes that I preferred, and used obra/superpowers to create and execute the spec, then used roborev to address bugs and problems left behind by the superpowers implementation pass (because superpowers doesn't do much review aside from evaluating spec conformance, so they two systems actually work very well together). the CI reviews are just branch reviews with roborev (2 reviews with codex and 1 with gemini) |
roborev: Combined Review (
|
- Use relative paths in index.html (./favicon.svg, ./src/main.ts) so assets resolve correctly under reverse-proxy subpath deployments - Set Vite base to "./" so built output uses relative asset paths - CI stub: create index.html (not stub.html) so base-path tests that serve the SPA root find a valid HTML document - Add unreachable return after t.Fatal in openclaw_test.go and statefile_test.go to silence staticcheck SA5011 false positives Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Merging this! thanks @tlmaloney |
|
I'll plan a release and start working on documentation for this tomorrow (I had my agent do a write-up on how I deployed this behind caddy on my tailnet at a path like |
Summary
Adds PostgreSQL support as an optional remote backend for multi-machine shared access to session data. Three new commands under
pg:pg push— one-way sync from local SQLite to PostgreSQL. Incremental by default (only sessions modified since last push), with--fullto re-push everything. Batched transactions (50 sessions/tx) with per-session retry fallback on batch failure. Multi-row INSERT for messages and tool calls. Fingerprint-based change detection skips unchanged sessions on incremental pushes.pg serve— read-only HTTP server backed by PostgreSQL instead of SQLite. Same API and frontend, no local SQLite dependency. Write operations (stars, pins, rename, etc.) returnErrReadOnly; the frontend adapts accordingly.pg status— show sync state (last push time, session/message counts).Implementation details
internal/postgres/package: connection management, SSL enforcement, schema DDL with version tracking, push sync, read-only store with sessions/messages/search/analyticsdb.Storeinterface extracted fromdb.DBso the HTTP server works identically against SQLite or PostgreSQL~/.agentsview/config.toml), with automatic migration of existing JSON configsallow_insecure = trueunder[pg]overrides with a warningTimeZone=UTCfor consistent date filtering--base-pathflag for reverse-proxy subpath deployment with<base href>injectionpublic_originsconfig expands the host allowlist for trusted reverse-proxy originsNot in scope (follow-up issues)