feat: add recall export with JSON, Markdown, SQL dump, and SQLite backup formats#59
Conversation
…Lite backup formats - recall export --format json|markdown|sql|sqlite with file/dir/stdout output contract - explicit provenance on every provenance-bearing row; NULL exported as 'unknown' - manifest with version, schema, counts, provenance counts incl. unknown - --backup: timestamped SQL dump to ~/.agents/Recall/backups/, never overwrites - keyset-paginated reads sized by shared SQLITE_SAFE_CHUNK_SIZE for large exports
…qlite dir-mode paths - hoist the provenance-bearing table list to types/index.ts (was duplicated between lib/export.ts and the provenance backfill's BACKFILL_TABLES) - name manifest.json path once in the sqlite directory branch
…s; unify text-format rendering Review findings from the pre-PR pass: - a failed VACUUM INTO no longer leaves a manifest.json describing a backup that was never created - renderAppExport/renderSqlExport collapsed into renderTextExport; ARTIFACT_EXT narrowed to the formats that reach it - document why --format has no Commander default and the id-PK invariant behind keyset pagination; docs note trailing-slash rule for new dirs
Review — PR #59
|
| Contract | Result |
|---|---|
| Provenance portability | ✅ Verified in real output: every provenance-bearing row exports with an explicit provenance field; legacy NULL renders as literal unknown in JSON ({"decision":"legacy decision","provenance":"unknown"}) and Markdown (3 **provenance:** unknown lines = exactly the 3 seeded legacy rows); sessions gets no invented field. |
| NULL-in-SQL-dump decision | ✅ Endorsed — it is not just defensible, it is required. Empirically proven: INSERT … provenance='unknown' is rejected by the schema CHECK (CHECK constraint failed: provenance IN ('verbatim', 'user_authored', 'extracted', 'derived')), while the actual dump restores cleanly through sqlite3 into a fresh DB with counts matching (1200/2/1), NULL preserved, and quote/newline content round-tripping. renderSqlDump correctly bypasses toExportRow (src/lib/export.ts:266). |
| stdout purity | ✅ No --output: exactly one JSON value on stdout (jq -s 'length' → 1), summary on stderr. All error paths verified to leave stdout at 0 bytes with exit 1. |
| Backup safety | ✅/c40733a fix works). Overwrite refusal and sequential -N suffixing verified. But see Blocking #1: the no-overwrite guarantee fails under concurrency. |
| Chunking convention | ✅ Keyset pagination WHERE id > ? ORDER BY id LIMIT ? — fixed 2-param bind; verified 1,200 rows export complete and ordered across 3 batches. All six EXPORT_TABLES confirmed INTEGER PRIMARY KEY AUTOINCREMENT (schema.ts). Audit-note convention from src/lib/chunk.ts followed in both new files. No bind list anywhere scales with row count. |
| PROVENANCE_TABLES hoist | ✅ Single source in src/types/index.ts:14; src/commands/provenance.ts:34 and src/lib/export.ts:40-41 both consume it; the old BACKFILL_TABLES duplicate is removed. Backfill import verified correct (table validation + error message both use the canonical list). |
Blocking
1. --backup's "never overwrites" guarantee is check-then-act — concurrent backups silently destroy each other. Reproduced: 4 concurrent recall export --backup invocations in the same second produced 1 surviving file — all four processes resolved the same path via existsSync (src/lib/export.ts:74-83) and writeFileSync with default flag 'w' truncated three of the writes, each reporting success with exit 0. This violates the issue #43 acceptance criterion in the one mode whose documented contract is never overwrite, and concurrent agent sessions are this tool's normal operating environment. Fix is small: write with { flag: 'wx' } and advance to the next -N candidate on EEXIST (applies to --backup at src/commands/export.ts:96 and dir-mode artifacts at :162).
2. A failed VACUUM INTO leaves a corrupt partial file, and the retry error then actively misleads. SQLite documents that an interrupted VACUUM INTO can leave an incomplete output file the application must delete. Scenario: disk fills mid-backup → error + exit 1 (correct so far), but a corrupt partial backup.db remains; the user frees space, retries, and hits Refusing to overwrite existing database backup at … (src/commands/export.ts:130-131) — implying a valid backup exists where a corrupt remnant sits. For a disaster-recovery tool that is the worst failure shape available. Fix: wrap the VACUUM (src/commands/export.ts:136), unlinkSync(target) on failure, rethrow.
3. No test pins the SQL-dump-keeps-NULL behavior — the PR's central deliberate decision is unguarded. Mutation check: if renderSqlDump were refactored to reuse the normalized collectExportData (a plausible change given this repo's DRY mandate), every existing test still passes — and restores would then silently corrupt legacy NULLs into the literal string 'unknown'… except they wouldn't restore at all, because the CHECK constraint rejects it. One round-trip test covers both the NULL fidelity and the "restorable into an empty database" claim (src/lib/export.ts:244): execute the dump into a fresh Database(':memory:'), assert legacy provenance restores as null and the dump contains no 'unknown' literal.
4. CHANGELOG. ## [Unreleased] has no entry for a major user-facing CLI surface; docs/releasing.md makes the changelog part of the release flow. Two lines under ### Added.
Non-blocking (recommended)
- Repeat dir-mode exports clobber
manifest.jsonwhile artifacts accumulate viaresolveNonClobbering— N artifacts, 1 manifest, andExportManifesthas no field naming the artifact it describes (src/commands/export.ts:141, 161-163). If "manifest describes the latest export" is the intent, document it in the header comment and docs/cli-reference.md, and consider adding the artifact filename(s) to the manifest. - Latent silent truncation in
collectTableRows(src/lib/export.ts:113): a future EXPORT_TABLES entry with a TEXT/UUID pk would ship partial exports with exit 0 — the only guard is a comment. Cheap insurance: validatelastIdis a finite number, or cross-checkmanifest.counts[t]againstdata[t].length(both already computed). - Two comment mis-citations: src/lib/export.ts:11 claims a "search/recent display contract" —
recent.tsrenders no provenance at all (search.ts does); src/lib/export.ts:8 cites ADR-0001 for the NULL→'unknown'export rendering rule, but ADR-0001 only covers write-path stamping — the rendering rule comes from issue Add mem export with JSON, Markdown, SQL dump, and SQLite backup formats #43 (and Add Record Provenance across memory records #42's storage semantics). - Minor: error catch in src/index.ts:655 discards stack/error-class with no debug escape hatch; ENOSPC on the dir-mode manifest write orphans an artifact without naming it; empty-DB export and FTS-shadow-exclusion (
listPhysicalTables) have no test;sqlQuotesilently maps non-finite numbers to NULL (tested but undocumented).
Strengths
Genuinely strong PR. Real-DB tests at the right altitude throughout (the SQLite backup test opens the copy and counts rows; the chunking test spans 3 real batches via SQLITE_SAFE_CHUNK_SIZE * 2 + 50; the no-overwrite test re-runs with an identical timestamp). The --format-has-no-Commander-default comment is load-bearing and exactly right. Every deliberate error message names the conflicting flags and the rule. No SQL injection surface — every interpolated table name derives from hardcoded constants. The manifest-after-VACUUM ordering fix in c40733a is correct and verified under forced failure.
Verdict: REQUEST CHANGES — items 1 and 2 are small fixes in the disaster-recovery path (both empirically demonstrated or SQLite-documented), item 3 is one test guarding this PR's defining decision, item 4 is two lines. Everything else is follow-up material. Happy to re-review immediately on push.
…UM cleanup, round-trip test, changelog - --backup and dir-mode text artifacts now claim their filename atomically (writeFileSync flag 'wx', advance to next -N candidate on EEXIST) — four concurrent same-second backups verified to produce four distinct files; resolveNonClobbering retained only for the VACUUM INTO target, where SQLite itself refuses an existing file - a failed VACUUM INTO now unlinks its partial output so a retry is not blocked by the refuse-to-overwrite guard on a corrupt remnant - round-trip test pins that SQL dumps keep NULL provenance (never the 'unknown' display literal) and restore cleanly into an empty database - CHANGELOG: Unreleased entry for recall export - fix two comment mis-citations (search-only display contract; rendering rule comes from #43, not ADR-0001)
|
Review fixes pushed in 320ee04 — all four blocking items addressed:
Also took the allowed trivial fix: both comment mis-citations corrected (search-only display contract; rendering rule attributed to #43 rather than ADR-0001). Non-blocking items left untouched as directed. Gate: |
Re-review — delta
|
Closes #43
Summary
Adds
recall export— portable and disaster-recovery export paths for Recall memory, built on the merged #41 (chunking) and #42 (Record Provenance) foundations.--format json | markdown— app-level export of the six durable tables (sessions,messages,decisions,learnings,breadcrumbs,loa_entries). Every row of a provenance-bearing table carries an explicitprovenancefield; legacyNULLis exported as the literalunknown— never omitted, never guessed.sessions(no provenance column) gets no invented field. Embeddings excluded.--format sql— textual SQL dump (CREATE TABLE fromsqlite_master+ per-row INSERTs) of the same durable tables. Restorable into an empty DB; one-command restore intentionally out of scope per the issue.--format sqlite— full database backup viaVACUUM INTO(includes embeddings and internal tables). Binary, so it always requires--output; refuses to overwrite.--backup— timestamped SQL dump to~/.agents/Recall/backups/, creating the directory if needed, never overwriting (-Nsuffix on collision), printing the output path.Output contract
--output: data → stdout, manifest summary → stderr (stdout stays pipeable — verified by a test thatJSON.parses captured stdout)--output <file>: single export file, manifest summary → stdout--output <dir>: artifact +manifest.json(always written in dir mode)Manifest carries Recall version, timestamp,
PRAGMA user_version, format, included tables, per-table counts, per-table provenance counts includingunknown(explicit even at zero), and the embeddings flag.Design notes
WHERE id > ? LIMIT ?) sized by the sharedSQLITE_SAFE_CHUNK_SIZE, so no bind list ever scales with row count — documented in the module header following thechunk.tsaudit-note convention. A test exports >2 batches and asserts no dropped/duplicated rows.NULLprovenance deliberately: the schema CHECK constraint only admits the four real values or NULL; dumping the string'unknown'would make the dump unrestorable. The explicit-unknowncontract applies to JSON/Markdown per the issue.PROVENANCE_TABLEShoisted tosrc/types/index.tsas the single source of truth (was about to be duplicated against the backfill'sBACKFILL_TABLESlist insrc/commands/provenance.ts— both now import it).src/lib/export.tsare pure functions over an open DB handle + plain data, exported for the upcoming property suites (not written here, per scope).Testing
tests/commands/export.test.ts,tests/lib/export.test.ts) covering every issue-mandated scenario: provenance presence, explicit-unknown, manifest counts/provenance counts, SQL dump creation, SQLite backup creation + embeddings inclusion, backup no-overwrite, file-vs-dir behavior, stdout piping, large-export pagination.bun run lintclean.VACUUM INTOsucceeds (a failed vacuum can no longer leave a manifest describing a nonexistent backup).Known follow-ups (pre-existing, intentionally untouched)
src/commands/migrate.tsandsrc/commands/doctor.tscarry their own inline timestamp-slug and~/.agents/Recall/backupspath constructions; they could adopt the newtimestampSlug()/defaultBackupDir()helpers in a follow-up.Post-Deploy Monitoring & Validation
CLI-only feature, no server/runtime impact. After release: run
recall export | headandrecall export --backupon a real install and confirm the backup file lands under~/.agents/Recall/backups/. Failure signal: anyExport failed:stderr output on a healthy DB. No additional operational monitoring required.