docs(corpus): #65 PR 4 of 4. spec/34 LOCKED + arc closer + Eleven status flip #316

Merged
dep0we merged 6 commits into main from corpus-pr4-arc-closer on Jun 1, 2026

dep0we commented Jun 1, 2026

Summary

PR 4 of 4 of the CorpusBackend arc (#65). The arc closer. Doc-heavy lock pass.
Eleven backend protocols shipped; only MCPServerRegistryBackend (#201)
remains for v1.0 close.

6 commits, all bisectable:

  1. 5ab1950 spec/34 RFC to LOCKED + spec/24 Decision 7 addendum naming CorpusBackend as the source of truth for wiki/ and raw/.
  2. afa2575 refresh CLAUDE.md (canonical 11th lock-paragraph + ASCII architecture diagram flip from Corpus 🟡 to Corpus ✅ (locked at #65 PR 4)) + README.md (backend-protocols table row flips to ✅ Shipped) + repo-root ROADMAP.md (Ten to Eleven of twelve, #65 row removed).
  3. 121eb8a cross-spec CorpusBackend references (spec/01 anatomy, spec/02 atomic memory, spec/04 runtime assembly step [7] note, spec/26 future-tense to present-tense, spec/31 protocol list) + reference impl docstring scrub (corpus/__init__.py, types.py, backend.py + 1 test file) per PersonaBackend PR 4 commit 93dad48 precedent.
  4. e725dcc CHANGELOG arc-closer (3 bullets under [Unreleased] mirroring PersonaBackend PR 4 shape) + non-spec doc test counts and protocol counts refreshed (programmatic.md, methodology.md, CONTRIBUTING.md).
  5. d149382 Round 2 convergence: fold tests/test_corpus_registry.py into the canonical PR 4 record (Round 1 adversarial caught the inverse-phantom omission).
  6. 9fd650b Round 3 convergence: doc-release sweep landed 5 stale-marker fixes (3 in spec/34 body, 2 in corpus reference impl docstrings) that the Stream A and Stream F sweeps both missed.

The status flip is the PR 4 deliverable. PR 3 (3d82c84 / #304) explicitly preserved the "Ten" wording so this PR's CHANGELOG arc-close bullet, ASCII diagram, status block, and lock-paragraph all flip together.

What landed

Test Coverage

No new code paths; doc-only PR (apart from docstring scrubs in 3 reference impl modules + 1 test file). Test suite count unchanged: 2937 collected (2889 passing + 48 skipped, 41 warnings) on Python 3.11. Zero regressions.

uv run pytest -q ran after Stream F (docstring scrubs) completed and again on the 8 corpus test modules after Round 2 + Round 3 fix commits. Result: identical counts, identical warnings, zero perturbation.
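The count arithmetic above (2889 passing + 48 skipped = 2937 collected) can be checked mechanically rather than by eye. A minimal sketch, assuming the final `pytest -q` summary line has the usual "N passed, M skipped, K warnings" shape; the helper name `parse_summary` is hypothetical and not part of this repo.

```python
import re


def parse_summary(line: str) -> dict[str, int]:
    """Extract outcome counts from a pytest summary line (assumed format)."""
    # Each "<count> <label>" pair becomes one dict entry, e.g. {"passed": 2889}.
    return {label: int(count) for count, label in re.findall(r"(\d+) (\w+)", line)}


# Counts taken from the Test Coverage paragraph above.
summary = parse_summary("2889 passed, 48 skipped, 41 warnings")
collected = summary["passed"] + summary["skipped"]
```

Comparing `collected` against the number quoted in each doc is one way to catch the cross-file count drift that the prep pass reported.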

Pre-Landing Review

Pre-impl prep pass (/plan-subagent methodology) ran 5 parallel Sonnet subagents covering: (1) spec/34 N-MUST count audit from shipped surface; (2) spec/34 LOCK readiness + per-PR marker enumeration; (3) cross-spec parity + status-flip surfaces audit; (4) stale-marker scrub candidates in reference impls + tests; (5) CHANGELOG arc-closer drafting + 10 follow-up issue templates.

Findings rolled up: 0 SEVERE + 1 HIGH internal-consistency in the doc-heavy lock scope (Subagent 4 found 10 stale temporal markers in shipped Python code; folded into the implementation as commit 3). 3 MEDIUM cross-file count drifts (test count drifted across CONTRIBUTING.md, methodology.md, programmatic.md by varying amounts; folded into commit 4). Plus 1 load-bearing decision: final N for the Implementer Contract was 9 (not 8 as Subagent 1 initially recommended). The 8-vs-9 decision was surfaced to the maintainer; recommendation was for 9 to mirror PersonaBackend spec/33's structural pattern exactly with close() documented at the Protocol surface rather than elevated to a numbered MUST.

Track record extended: 22+ SEVERE + 30+ HIGH across 14 prep passes in the post-#285-revert streak.

Single-round Opus adversarial review post-implementation (Sonnet adversarial doc-consistency review per CLAUDE.md taste rule 11; minimal review army for doc-only PR per the project methodology).

Round 1 (adversarial): 0 P0 + 1 P1 + 0 P2.

  • P1 (inverse phantom): CLAUDE.md 11th lock-paragraph and spec/34's PR 1 test coverage section both omitted tests/test_corpus_registry.py, which is a real file with 4 tests shipped in PR 1 and cited in CHANGELOG's PR 1 bullet. Mirrors PersonaBackend PR 4 Round 1's phantom-file risk in the opposite direction. Fixed in commit d149382.
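The phantom and inverse-phantom shapes are two set differences between the files a doc cites and the files on disk. A minimal sketch of such an audit, not the review tooling actually used here: `audit_cited_tests` and `test_corpus_ghost.py` are hypothetical names, and the `test_corpus_*.py` naming pattern is an assumption drawn from the file names quoted in this PR.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming pattern for corpus test modules.
TEST_FILE_RE = re.compile(r"\btest_corpus_\w+\.py\b")


def audit_cited_tests(doc_text: str, tests_dir: Path) -> tuple[set[str], set[str]]:
    """Return (phantom, inverse_phantom) file names.

    phantom: cited in the doc but missing on disk.
    inverse_phantom: present on disk but never cited.
    """
    cited = set(TEST_FILE_RE.findall(doc_text))
    on_disk = {p.name for p in tests_dir.glob("test_corpus_*.py")}
    return cited - on_disk, on_disk - cited


# Demo against a throwaway directory; test_corpus_ghost.py is a made-up name.
with tempfile.TemporaryDirectory() as tmp:
    tests_dir = Path(tmp)
    for name in ("test_corpus_registry.py", "test_corpus_composition.py"):
        (tests_dir / name).touch()
    doc = "Locked with test_corpus_composition.py and test_corpus_ghost.py."
    phantom, inverse_phantom = audit_cited_tests(doc, tests_dir)
```

In the demo, the uncited registry file lands in `inverse_phantom`, which is the shape Round 1 caught.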

Round 2 (doc-release sweep, Step 18): 0 P0 + 5 FIX_NOW + 1 FOLLOW_UP.

  • FIX_NOW: 3 stale future-tense PR markers in spec/34 body (line 54 module-layout # SQLite ships in PR 2: header; line 233 (PR 3) migration-target callout in Protocol surface; line 818 "PR 3's call-site migration scope" in implementation notes).
  • FIX_NOW: 2 missed items in corpus reference impl docstrings (corpus/types.py used the word "provisional" in a locked spec's reference impl; corpus/backend.py used future-tense "the PR 3 call-site migration" language; Stream F's scrub missed both).
  • All 5 FIX_NOW items landed inline as commit 9fd650b BEFORE PR creation rather than fix-forward post-creation (matches PersonaBackend PR 4's ad6723b + e1d05cf pattern but tighter).
  • FOLLOW_UP: TENSIONS.md T9 carries pre-landing predictive language ("21 spec docs today," "expect ~26 spec docs," "Around spec doc #25 (~CorpusBackend land)"). The numbers and the prediction are stale; the substantive tension is still valid. Filed as follow-up #315 ([docs] TENSIONS.md T9 stale predictive language about spec surface count).

Round 3 not run separately; the doc-release commit IS the convergence step. Full pytest re-run after R3 fix on the 8 corpus test modules: 196 passed, zero regressions. Full suite expected stable at 2889 + 48 skipped (no executable Python changed; only docstrings + a comment line).

Plan Completion

All deliverables in the original brief landed (DONE):

| Item | Status |
| --- | --- |
| Open corpus-pr4-arc-closer branch off main | DONE |
| /plan-subagent prep pass (5 parallel Sonnet subagents) | DONE |
| File 10 follow-up issues at #305 through #314 | DONE |
| spec/34 LOCK (drop RFC banner, finalize Implementer Contract MUSTs) | DONE (N=9 locked) |
| CLAUDE.md 11th lock-paragraph + diagram flip + count bumps | DONE |
| README backend-protocols table row + spec list + Status block | DONE |
| Both ROADMAPs refreshed (repo-root + vault) | DONE |
| spec/24 Decision 7 addendum | DONE |
| Cross-spec refs (spec/01, spec/02, spec/04, spec/26, spec/31) | DONE |
| spec/27 corpus-backend doctor entry | VERIFIED (PR 3 already added) |
| Reference impl docstring scrub (corpus modules + 1 test) | DONE |
| Non-spec doc count refreshes (programmatic.md, methodology.md, CONTRIBUTING.md) | DONE |
| CHANGELOG arc-closer (3 bullets) | DONE |
| Single-round Opus adversarial sanity pass | DONE (R1: 1 P1, fixed) |
| Step 18 doc-release subagent sweep | DONE (5 FIX_NOW + 1 FOLLOW_UP) |
| /ship end-to-end with the full pipeline | DONE |

Plus 1 additional follow-up issue surfaced by doc-release (TENSIONS.md T9 stale predictive language): filed as #315, not blocking PR 4.

Documentation

docs/spec/34-corpus-backend.md flipped from RFC to LOCKED. docs/spec/24-agent-profile-backend.md Decision 7 received the CorpusBackend ownership addendum. docs/spec/01-anatomy.md, docs/spec/02-atomic-memory.md, docs/spec/04-runtime-assembly.md, docs/spec/26-cascade-bundle.md, and docs/spec/31-llm-backend.md gained CorpusBackend cross-references. docs/deployment/programmatic.md, docs/methodology.md, and CONTRIBUTING.md test counts and shipped-backend counts refreshed.

spec/27 (doctor catalogue) corpus-backend entry confirmed present at line 384 (PR 3 inline status flip already added it).

Documentation debt (deferred, filed as follow-up)

Surfaced during the Step 18 doc-release sweep; not in PR 4 scope:

  • TENSIONS.md T9 stale predictive language about spec surface count: filed as #315.

Test plan

  • uv run pytest -q passes: 2889 passed + 48 skipped, zero regressions
  • uv run pytest -q re-run after R2 + R3 fix commits on 8 corpus modules: 196 passed, zero regressions
  • spec/34 internally consistent at LOCKED status (no RFC banner, no "(PR 3 -- implemented)" subheaders, no "(to be updated at PR 4)" annotations, the 9-MUST Implementer Contract is final)
  • CLAUDE.md + README.md + repo-root ROADMAP.md + vault ROADMAP.md all uniformly say "Eleven of twelve" / "11 of 12" / "eleven backend protocols shipped"
  • 11th CorpusBackend lock-paragraph cites real test files (all 8 verified present on disk after R2 fix)
  • All 10 follow-up issues filed via gh issue create with backend + bug / polish / v0.1-followup labels matching project convention
  • Branch protection on main honored (PR-only merge path)

After this PR lands, the CorpusBackend arc CLOSES. Eleven of twelve backend protocols shipped for v1.0; only MCPServerRegistryBackend (#201) remains.

Closes #65.

🤖 Generated with Claude Code

Dan Powers and others added 6 commits June 1, 2026 13:55
…ndum

spec/34 flips from RFC to LOCKED. RFC banner and 4-PR shipping-plan provenance
block replaced with a single locked-status line. Per-PR temporal markers
throughout the body consolidated to present-tense lock prose (capability
declarations, "Per-runner kwargs (PR 3 -- implemented)" subheaders, "SQLite
hybrid layout (PR 2)" header, "Call-site migration reference (PR 3 --
implemented in #65 PR 3 of 4)" section title, "Follow-up issue filed at PR 4"
deferral markers). The "PR 4 documentation-update checklist" section (lines
847-864) deleted; self-referential scaffolding has no place in a locked spec.

Implementer Contract finalized at 9 normative MUSTs, mirroring PersonaBackend
spec/33's shape exactly with one extra MUST for the query() capability
precedence rule that CorpusBackend has via the FTS5 / semantic / substring
fallback ladder: (1) name and corpus charset validation at API boundary,
(2) side-effect-free construction, (3) capability honesty including
embedding_provider=None invariant, (4) query() capability precedence rule,
(5) write_page() 4-case behavior table, (6) URL credential redaction across
all operator-facing error paths, (7) cross-corpus isolation at storage layer,
(8) snapshot id determinism + cross-page isolation, (9) backend_id property
stability + close() idempotency. The merge in MUST 9 is honest: backend_id
is name-identity and close() is lifecycle-identity, both backend-identity
contracts.
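MUST 3 and MUST 4 interact: a backend may only climb to a rung of the query() ladder that its declared capabilities honestly support. A minimal sketch under stated assumptions: the `Capabilities` class, its field names, and `pick_query_strategy` are hypothetical, and the FTS5, then semantic, then substring order simply follows the order in which this commit message lists the ladder. The normative precedence lives in spec/34 and may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical capability declaration; field names are illustrative."""

    fts5: bool = False
    embedding_provider: Optional[str] = None  # None means no semantic search

    @property
    def semantic(self) -> bool:
        # Capability honesty: never advertise semantic search without a provider.
        return self.embedding_provider is not None


def pick_query_strategy(caps: Capabilities) -> str:
    """Walk the ladder in the assumed order, falling through to substring."""
    if caps.fts5:
        return "fts5"
    if caps.semantic:
        return "semantic"
    return "substring"  # assumed always-available floor
```

Deriving `semantic` from `embedding_provider` instead of storing a second flag keeps the two from disagreeing, which is one way to make the `embedding_provider=None` invariant hard to violate.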

spec/24 Decision 7 receives the CorpusBackend ownership addendum. The
existing "Why" paragraph previously said MemoryBackend owned wiki/, memory/,
and journal/. With CorpusBackend locked, the addendum clarifies: MemoryBackend
retains exclusive ownership of memory/ and journal/; CorpusBackend, when
registered, owns wiki/ and raw/. The two backends compose at prompt assembly
(agent.py:_load_indexes() reads from both).

18 distinct edits across 11 line ranges in spec/34. The file shrinks from 881 to 855 lines.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…to "eleven shipped"

CLAUDE.md adds the canonical CorpusBackend lock-paragraph (the 11th, mirroring
the 10 prior shipped-protocol bullets), flips the ASCII architecture diagram
from "Corpus 🟡" to "Corpus ✅ (locked at #65 PR 4)", bumps the spec-doc count
from "30 locked + 2 drafts" to "31 locked + 2 drafts", refreshes the live test
count to 2,937 collected (2,889 passing + 48 skipped) at 2026-06-01, and flips
the Status block from "Ten backend protocols shipped" to "Eleven backend
protocols shipped". Status tail flips from "remaining two protocols (Corpus /
MCPServerRegistry)" to "remaining protocol (MCPServerRegistry)".

README.md adds CorpusBackend to the shipped list in the Current limits
paragraph (replacing "filesystem-default-only today" with the locked
CorpusBackend summary including FTS5 + page-count cliff WARN + CLI + env-var
override), bumps the comparison-matrix locked-docs count from 30 to 31, adds
spec/34 to the spec list, flips the backend-protocols table row for
CorpusBackend from "Planned" to "✅ Shipped" with the locked summary cell,
flips the v1 direction sentence from "those two land" to "MCPServerRegistry
lands", bumps the repo-structure test count to 2937 collected (2889 passing),
and flips the Status block from "Ten of twelve" to "Eleven of twelve".

ROADMAP.md (repo root, public strategic narrative) flips line 11 from "Ten of
twelve" to "Eleven of twelve" with CorpusBackend appended to the shipped list
and "Two remain" to "One remains", removes the now-shipped #65 row from the
remaining-protocols table, and flips the ship-when sentence from "both
remaining backends" to "the remaining backend".

7 + 10 + 3 = 20 edits across 3 files. The vault ROADMAP at ~/ObsidianVault/
Atomic Agents/ROADMAP.md is refreshed out-of-band (not in the git repo).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e impl docstring scrub

Cross-spec cross-references propagate the CorpusBackend locked status to
adjacent spec docs:
- spec/01 (anatomy): adds a CorpusBackend cross-reference paragraph after the
  wiki/raw section explaining the Protocol seam and the SQLiteCorpusBackend
  GB-scale benefit.
- spec/02 (atomic memory): adds a CorpusBackend cross-reference paragraph
  after the "Why two layers" section naming the wiki/ + raw/ vs memory/ +
  journal/ ownership split.
- spec/04 (runtime assembly): adds an integration note after the canonical
  load order describing how step [7] routes through corpus_backend.
  render_index_summary("wiki") when CorpusBackend is registered.
- spec/26 (cascade bundle DRAFT): flips two future-tense references ("when
  CorpusBackend ships") to present-tense ("now that CorpusBackend has shipped,
  locked at #65 PR 4 of 4") and updates the composition table row to cite
  the specific render_index_summary("wiki") method.
- spec/31 (LLMBackend): appends "(spec/34)" link to the Corpus entry in the
  protocol-pattern list.
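The spec/04 integration note describes a conditional route: use the registered backend when there is one, otherwise keep the legacy path. A minimal sketch of that shape, not the code in agent.py: `load_wiki_index`, `legacy_loader`, `SupportsIndexSummary`, and `StubBackend` are hypothetical names, and the `render_index_summary(corpus) -> str` signature is an assumption inferred from the `render_index_summary("wiki")` call quoted above.

```python
from typing import Callable, Optional, Protocol


class SupportsIndexSummary(Protocol):
    """Assumed slice of the backend surface needed by step [7]."""

    def render_index_summary(self, corpus: str) -> str: ...


def load_wiki_index(
    corpus_backend: Optional[SupportsIndexSummary],
    legacy_loader: Callable[[], str],
) -> str:
    """Route through the backend when registered, else use the legacy loader."""
    if corpus_backend is not None:
        return corpus_backend.render_index_summary("wiki")
    return legacy_loader()


class StubBackend:
    """Stand-in backend for the demo; returns a fake summary string."""

    def render_index_summary(self, corpus: str) -> str:
        return f"[{corpus}] 2 pages"


with_backend = load_wiki_index(StubBackend(), lambda: "legacy index")
without_backend = load_wiki_index(None, lambda: "legacy index")
```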

spec/27 (doctor catalogue) already has the corpus-backend entry from PR 3
inline status flip; no edit needed (verified).

Reference impl docstring scrub completes the per-PR-marker consolidation
sweep across shipped Python code:
- corpus/__init__.py: drops "scaffolding PR -- no behavior change today" and
  "in PR 3" temporals; rewrites the PRE-PR-3 wiring contract block to a
  present-tense locked-status block (the SQLiteCorpusBackend "DEFERRED"
  bullet is now FALSE since SQLite shipped in PR 2; only semantic search
  remains deferred to v1.1); drops "(wired in PR 3)" from
  get_default_corpus_backend docstring.
- corpus/types.py: drops "PR 1 of 4" + "PR 1, File 2 of 3" parentheticals;
  deletes the "Scaffolding PR (#65 PR 1 of 4)" paragraph entirely; drops
  "in PR 1 / PR 2 respectively" temporal.
- corpus/backend.py: replaces the 4-bullet per-PR shipping plan with a
  single locked-status line.
- test_corpus_sqlite_backend.py: drops "PR 2 of 4" from the module docstring.

Mirrors PersonaBackend PR 4 commit 93dad48's stale-marker scrub pattern.
All edits are docstring/comment only; no executable Python changed. 158
tests on the affected corpus modules continue to pass; full suite still
2889 passing + 48 skipped (zero regressions).

7 + 10 = 17 edits across 9 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… + protocol counts

CHANGELOG.md adds 3 bullets under [Unreleased] mirroring PersonaBackend
PR 4's arc-summary shape:
- ### Changed: framework status flip from "ten of twelve" to "eleven of
  twelve backend protocols shipped" with operator-user outcome lead
  (pin SQLite via one env var for indexed FTS5 query at GB scale; doctor
  surfaces page-count cliff WARN; CLI honors env var; legacy paths
  soft-degrade gracefully on UnicodeDecodeError + OSError; IRON RULE
  byte-identity preserved). Cites all 4 PRs (#297, #298, #304, this PR)
  and all 10 follow-up issues (#305-#314).
- ### Changed: spec/24 Decision 7 addendum naming CorpusBackend as the
  source of truth for wiki/ and raw/ (cross-spec ownership propagation).
- ### Documentation: spec/34 LOCKED + doc-release sweep landed. Names
  the 9-MUST Implementer Contract finalization, the per-PR marker scrub
  across spec body + reference impls + tests + strategic docs, and the
  cross-spec cross-references.

docs/deployment/programmatic.md: protocol-pattern paragraph flipped from
"Ten backend protocols have shipped" to "Eleven backend protocols have
shipped"; CorpusBackend added to the enumerated list; spec/34 added to
the spec doc list; "two remain" flipped to "one remains" with
MCPServerRegistryBackend the only remaining protocol.

docs/methodology.md: "today ten are shipped" flipped to "today eleven
are shipped" with CorpusBackend appended; test count bumped from 2686+
to 2937+.

CONTRIBUTING.md: stale "2401 tests today" (drifted across multiple arcs)
refreshed to "2937 tests today".

3 + 1 + 1 + 1 = 6 edits across 4 files.

Test suite stable at 2889 passing + 48 skipped (Python 3.11/3.12).
This is the PR 4 of 4 arc closer. After merge, the CorpusBackend arc
CLOSES. 11 of 12 backend protocols shipped for v1.0; only
MCPServerRegistryBackend (#201) remains.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… locked record

Adversarial Round 1 caught an inverse-phantom in the CLAUDE.md 11th
CorpusBackend lock-paragraph: the canonical "locked at PR 4 with..."
test file list omitted `tests/test_corpus_registry.py`, which is a real
file with 4 tests shipped in PR 1 of the arc and cited in CHANGELOG's
PR 1 bullet. spec/34's §"Test coverage" PR 1 section also did not
enumerate it. The omission is small but it creates inconsistency
between the canonical PR 4 record (CLAUDE.md lock-paragraph) and the
actual locked test surface.

This is the PersonaBackend PR 4 Round 1 phantom-file failure shape in
the opposite direction: a real-but-uncited file rather than a
cited-but-nonexistent file. Same risk surface; same fix discipline.

Fixes:
- CLAUDE.md line 15 lock-paragraph: insert `tests/test_corpus_registry.py`
  between `test_corpus_sqlite_backend.py` and `test_corpus_composition.py`
  in the locked-at-PR-4 test file list.
- spec/34 §"Test coverage" PR 1 section: add a 4-bullet sub-list under
  the `tests/test_corpus_registry.py` heading naming the registry
  primitives the tests cover (register / unregister round-trip and
  collision-replace; get_corpus_backend raises on unknown id;
  list_corpus_backends ordering; get_default_corpus_backend env var).
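The four registry behaviors named in that sub-list can be illustrated with a toy registry. This is a sketch, not the shipped module: `get_corpus_backend`, `list_corpus_backends`, and `get_default_corpus_backend` are names from this commit message, while `register_corpus_backend`, `unregister_corpus_backend`, the `EXAMPLE_CORPUS_BACKEND` env var, the `"filesystem"` default id, the `KeyError` exception type, and the sorted ordering are all assumptions made for illustration.

```python
import os

_REGISTRY: dict[str, object] = {}


def register_corpus_backend(backend_id: str, backend: object) -> None:
    # Collision-replace: a second registration under the same id wins.
    _REGISTRY[backend_id] = backend


def unregister_corpus_backend(backend_id: str) -> None:
    _REGISTRY.pop(backend_id, None)


def get_corpus_backend(backend_id: str) -> object:
    try:
        return _REGISTRY[backend_id]
    except KeyError:
        # Exception type is an assumption; the real registry may raise another.
        raise KeyError(f"unknown corpus backend id: {backend_id!r}") from None


def list_corpus_backends() -> list[str]:
    # Sorted order is an assumption; the real ordering rule is in spec/34.
    return sorted(_REGISTRY)


def get_default_corpus_backend(env_var: str = "EXAMPLE_CORPUS_BACKEND") -> object:
    # Hypothetical env var name and default id.
    return get_corpus_backend(os.environ.get(env_var, "filesystem"))


# Plain strings stand in for backend instances in this demo.
register_corpus_backend("sqlite", "first")
register_corpus_backend("sqlite", "second")  # replaces "first"
register_corpus_backend("filesystem", "fs")
os.environ["EXAMPLE_CORPUS_BACKEND"] = "sqlite"
default_backend = get_default_corpus_backend()
```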

Round 1 finding count: 0 P0, 1 P1, 0 P2. This commit lands the Round 2
convergence; full pytest still 2889 passing + 48 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 18 doc-release subagent caught 5 stale per-PR temporal markers that
the Stream A spec/34 LOCK sweep and the Stream F reference-impl docstring
sweep both missed. PersonaBackend PR 4's doc-release subagent caught the
exact same shape (2 finds folded into commits ad6723b + e1d05cf); for
PR 4 of #65 the equivalent finds are landed here BEFORE PR creation
rather than fix-forward post-creation.

Three spec/34 body edits:
- Line 54 module-layout code block: drop "# SQLite ships in PR 2:"
  comment header (stale future-tense; SQLite shipped in PR 2 of this arc).
- Line 233 Protocol surface docstring: drop "(PR 3)" parenthetical from
  the render_index_summary migration-target comment.
- Line 818 implementation notes: rewrite "PR 3's call-site migration scope
  is writes of render_index_summary only" to present-tense "The call-site
  migration scope is reads through render_index_summary only"; rewrite
  "The PR 3 IRON RULE regression suite" to "The IRON RULE regression
  suite".

Two corpus reference-impl docstring edits:
- corpus/types.py lines 190-195: replace "(Subagent 2 HIGH H4 ... is a
  design assumption until real raw sample data is added at PR 1 prep or
  contributed by operators). Accept as provisional for v1.0."
  with "the raw-side field shape is locked at v1.0 against issue #65's
  stated schema. Operator-contributed raw sample data could surface
  refinements for v1.1." The word "provisional" in a locked spec's
  reference impl contradicts the spec/34 LOCKED status; the v1.1
  refinement framing matches the corpus_backend bundle.py:_source_paths
  v1.1 migration pattern at #314.
- corpus/backend.py lines 145-156: drop "the PR 3 call-site migration"
  and "(PR 3)" temporals from the render_index_summary Protocol method
  docstring. The migration is historical; the docstring describes the
  Protocol contract today.
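Both manual sweeps missed these five markers, which suggests the patterns could also be scanned for by script. A minimal sketch, assuming a handful of regexes modeled on the markers quoted in this commit; `find_stale_markers` and the pattern list are hypothetical, would need tuning to avoid false positives, and are not the doc-release subagent's actual checks.

```python
import re

# Patterns modeled on the stale markers quoted in this PR (illustrative only).
STALE_MARKER_RE = re.compile(
    r"\(PR \d\)"            # bare "(PR 3)" parentheticals
    r"|\bPR \d of \d\b"     # "PR 1 of 4"
    r"|\bships in PR \d\b"  # "SQLite ships in PR 2"
    r"|\bthe PR \d\b"       # "the PR 3 call-site migration"
    r"|\bprovisional\b",    # contradicts a LOCKED status
    re.IGNORECASE,
)


def find_stale_markers(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, matched marker) for every hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in STALE_MARKER_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = (
    "# SQLite ships in PR 2:\n"
    "Accept as provisional for v1.0.\n"
    "The IRON RULE regression suite covers reads.\n"
)
hits = find_stale_markers(sample)
```

A scan like this only flags candidates; whether a hit is stale or a legitimate historical reference (for example in CHANGELOG) remains a judgment call.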

Sixth finding (doc-release Check 3, TENSIONS.md T9): classified as
FOLLOW_UP, not FIX_NOW. T9 carries pre-landing predictive language
("expect ~26 spec docs," "Around spec doc #25 (~CorpusBackend land)").
Worth a follow-up issue to update the count and tense; not blocking PR 4.
(Tension itself, "spec surface grows with code surface," is still active.)

Full pytest re-run on the 8 corpus test modules after these edits: 196
tests pass, zero regressions. Full suite expected to remain at 2889 + 48
skipped (no executable Python changed; only docstrings + a comment line).

This completes the per-PR-marker consolidation sweep across the full
locked surface: spec body, reference impls, tests, strategic docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dep0we merged commit e91286c into main on Jun 1, 2026
5 checks passed
dep0we deleted the corpus-pr4-arc-closer branch June 1, 2026 19:09