Purpose: central index of every Layer-4 matrix in this repo plus the gaps where features ship without an L4 anchor. Read this before declaring "all experiments done." The list is never finished — new features should land with their matrix and update this index.
Companion:
findings-log.md— every latent production bug AND methodology gotcha surfaced while building these matrices. Read both when assessing mykb's correctness state.
These ship with full EXPERIMENT.md + at least one scenarios/*.sh and have been RED→GREEN-proven against a real Pi runtime.
| Feature | Matrix | Scenarios | Status |
|---|---|---|---|
kb work handoff |
experiments/handoff/ |
continuity, overwrite, clear, no-active-workspace | ✅ implemented — <mykb-workspace>-not-visible regression (issue #5) fixed in commit ab2c22b; continuity re-verified GREEN post-fix |
| Per-turn journal injection | experiments/journal-auto-inject/ |
resume-continuity, stale-filter, mid-session-append, no-active-workspace | ✅ implemented — same regression fixed (ab2c22b); resume-continuity re-verified GREEN post-fix |
| Area scoring (v1 + v2 + v3) | experiments/area-scoring/ |
keyword-match-loads, off-topic-no-leak, no-workspace-still-loads, init-area-tags, kb-list-shows-tags, scoring-without-tools, scoring-isolated, file-path-signal, workspace-boost, sticky-area-persistence, token-budget-eviction | ✅ implemented (full matrix — all 11 rows) |
kb_search tool + FTS area-metadata |
experiments/kb-search/ |
tool-direct-text-match, tool-finds-via-area-metadata, tool-no-match-no-fabrication | ✅ implemented |
kb_load tool contract |
experiments/kb-load/ |
basic-load, discover-via-area-index, unknown-area-no-fabrication | ✅ implemented |
kb_list tool contract |
experiments/kb-list/ |
basic-list, lists-tags-suffix, no-match-no-fabrication | ✅ implemented |
kb work checkpoint (LLM-as-extractor) |
experiments/work-checkpoint/ |
journal-extraction, knowledge-extraction, empty-conversation-no-fabrication | ✅ implemented |
| Claude Code runtime | experiments/claude-code/ |
bare-runs, hook-injects-handoff | ✅ implemented |
tool-gating hook |
experiments/tool-gating/ |
blocks-write-to-brain, blocks-edit-by-pattern, allows-non-knowledge-writes, block-then-retry-via-kb-add, bash-bypass-known-gap | ✅ implemented (one known-fail documenting a security gap — see matrix) |
kb_work_* tools (journal, state, note) |
experiments/kb-work-tools/ |
journal-tool, state-tool, note-tool, no-active-workspace | ✅ implemented — the streaming workspace-mutation path (per-tool); state-tool is the cross-step <mykb-workspace> re-injection anchor |
kb_add tool |
experiments/kb-add/ |
add-fact, add-decision-with-why, add-then-search-roundtrip, add-to-unknown-area | ✅ implemented — the LLM-mutates-area path (per-entry-type); add-then-search-roundtrip is the write → FTS index → retrieve closed-loop anchor; add-to-unknown-area pins the auto-create policy |
kb_verify tool |
experiments/kb-verify/ |
verify-by-id, add-then-verify-roundtrip, verify-unknown-id | ✅ implemented — the trust-decay promote-half; add-then-verify-roundtrip is the create-and-attest anchor; pins the NEGATIVE that verifyEntry records no source |
/kb slash command |
experiments/kb-command/ |
load-marks-area-loaded, usage-on-empty-args, unknown-area-no-load | 🟡 partial — these 3 single-step scenarios anchor dispatch + the pi.sendMessage content + markAreaLoaded persistence (and surfaced the never-worked-against-real-Pi /kb bug — gotcha QbRwqLxJ). The load-and-cite row (LLM cites a /kb-loaded marker) is blocked on harness support — issue #7. |
Total: 13 matrices, 52 scenarios (including 1 documented known-fail). Every Layer-4 feature now has ≥1 scenario — the "Scaffolded matrices" backlog is empty.
None. Every experiments/<feature>/ ships at least one scenarios/*.sh. (If a future feature lands without its matrix, add a row here AND scaffold its EXPERIMENT.md so this doc stays linkable.)
These belong to existing matrices but the matrix's behavior table flags them as not-yet-covered.
| Matrix | Gap | Notes |
|---|---|---|
area-scoring |
Sticky-area persistence across turns | An area loaded in turn N gets a sticky-boost in turn N+1. Scenario attempted; blocked — issue #6. |
area-scoring |
Token-budget eviction order | When the 2000-token budget is exceeded, which areas keep their entries? Highest-scoring — selectEntriesForInjection now has a deterministic area-id tie-break, but the end-to-end scenario is blocked on the same investigation (issue #6). |
kb-command |
load-and-cite — LLM cites a /kb-loaded marker |
/kb triggers no LLM turn; step can't deliver /kb <area> + a follow-up prompt in one Pi process (vfa run = one --prompt, no Pi-level session continuity). Blocked — issue #7. The LLM-reads-a-role:'custom'-message leg itself is already exercised by area-scoring/scoring-isolated.sh. |
kb-command |
multiple-areas — /kb a b injects both |
Single-step-testable (a 2-area variant of load-marks-area-loaded); not yet written. |
- Compaction interaction: when Pi auto-compacts a long session, does the kb extension survive? Specifically: do persisted signals / loaded-areas survive the compaction event? No scenario.
- Multi-turn injection coherence: turn N injects area A; turn N+1 injects area B. Does the LLM see both, just B, or get confused? No scenario.
- Concurrent session contention: two
KB_SESSION_IDs writing to the same workspace. Atomicity is L1-tested for individual operations; the multi-process pattern at L4 isn't.
- Adding a feature: ship its matrix at the same time. Add a row to the Implemented matrices table.
- Closing a scaffold: implement scenarios; move the row from Scaffolded to Implemented. Delete from Sub-behavior gaps if it covered one.
- Discovering a gap: add a row to Scaffolded matrices AND scaffold an
EXPERIMENT.mdso the doc stays linkable.
The methodology says experiments accumulate as the regression suite. That's correct as a steady-state goal; in practice, shipping an L4 matrix lags shipping the feature. Without an explicit gap list, "feature X has no L4" silently fades from collective awareness — until a scenario surfaces a latent bug (cycle 8 was the load-bearing example: 3-layer context-event bug latent for the entire history of mykb because every prior matrix had a fallback path that masked it).
Tracking gaps is cheap. The cost of a gap that goes unnoticed for months is high.