
#274 Slice A: skip harness-injected pseudo-turns in the consolidation sweep #359

Merged
StarshipSuperjam merged 2 commits into main from claude/s274-consolidation-sweep-skip
Jun 30, 2026

StarshipSuperjam commented Jun 30, 2026


Part of #274 (the memory anchor decision). Slice A of three — the small, independent piece; #274 closes with the consolidated-raw erasure slice, not here. Folds in the #333 sub-concern.

Purpose

Stop the consolidation sweep from tidying the engine's own injected notifications as if the operator had written them — while keeping every byte resident and recoverable.

  • Claude Code injects non-conversational blocks as user-role turns: a background-agent <task-notification>, and the /compact continuation summary (This session is being continued from a previous conversation…). They land in the ledger as turn-delta records and are already excluded from recall by kind (#332, "Memory recall is dominated by raw turn-notes; curated summaries never surface (locked-spec tension)"), but consolidate.read_deltas read the raw ledger unfiltered, so the in-context AI consolidated them into the operator's episodic record as junk.

Impact: consolidation now reads only genuine conversation as its fuel; the injected notes stay in the ledger, recoverable, and recall is unchanged.

Scope

Recognise an injected pseudo-turn at capture and tag it (resident, never dropped); skip a tagged/injected record in the consolidation sweep.

  • records.py: INJECTED_TAG plus is_injected_pseudo_turn_text(text) (start-anchored on the two ground-truthed standalone sentinels) and is_injected_record(record) (the durable tag path, with a text-prefix fallback for records captured before tagging).
  • capture.py: _make_record(..., injected=) appends INJECTED_TAG; _capture decides injectedness on the whole message before chunking, so every chunk of a >4 KB continuation summary is tagged — not just the first.
  • consolidate.py: read_deltas and _scan_sessions skip injected records. Applying it in detection as well as the read keeps a session whose turn-deltas are all injected from being flagged pending forever (read and detection must agree, or the sweep loops). A demo part (PART 6) shows the notice + banner skipped as fuel while all three notes stay in the cabinet.
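
The two predicates described above might look roughly like the sketch below. It is not the code in records.py: the value of INJECTED_TAG and the record shape (a dict with "text" and "tags" keys) are assumptions made for illustration. Only the two sentinel prefixes come from this description.

```python
# Sketch only: the tag value and the record shape are assumptions.
INJECTED_TAG = "injected"  # hypothetical value

# The two standalone sentinels named in this PR.
_SENTINEL_PREFIXES = (
    "<task-notification>",
    "This session is being continued from a previous conversation",
)


def is_injected_pseudo_turn_text(text):
    """True only when the text starts with a sentinel (start-anchored)."""
    if not isinstance(text, str):
        return False  # malformed input returns False instead of raising
    return text.startswith(_SENTINEL_PREFIXES)


def is_injected_record(record):
    """Prefer the durable tag; fall back to the text prefix for older records."""
    if not isinstance(record, dict):
        return False
    tags = record.get("tags")
    if isinstance(tags, (list, tuple, set)) and INJECTED_TAG in tags:
        return True
    return is_injected_pseudo_turn_text(record.get("text"))


# A marker mentioned mid-sentence is kept as genuine conversation.
print(is_injected_pseudo_turn_text("why did I get a <task-notification> earlier?"))
# A later chunk of a tagged message is recognised by its tag, not its text.
print(is_injected_record({"text": "tail of a long summary", "tags": [INJECTED_TAG]}))
```

The tag path matters for every chunk after the first, whose text no longer starts with a sentinel; the text fallback covers records written before tagging existed.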

Impact: the fix is read-side for consolidation and a tag at capture; no record is ever deleted, and recall (already turn-delta-excluded) is untouched.

Out of scope

The rest of #274, <system-reminder>, and any pre-ledger drop.

  • The Q1 disposition (construction knowledge stays in Claude Code memory) and the Q2 consolidated-raw batched erasure (the later slice that resolves the anchor issue — not this one) are separate pieces.
  • <system-reminder> is deliberately not a marker: ground-truth over live transcripts shows it fuses with a human prompt in the same captured turn (8 confirmed cases), so a start-anchored drop would lose real content. [SYSTEM NOTIFICATION…] and <user-prompt-submit-hook> are omitted as inert — they anchor 0 stored records.
  • No pre-ledger drop: #333 ("Memory capture: also skip injected context blocks (system-reminder / hook context), safely") settled that deleting these at capture removes recoverable content, against the durability law.
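
To make the <system-reminder> exclusion concrete, here is a small illustration of the fusion problem. The turn contents are invented for the example (real transcript text is not shown in this PR), and starts_with / human_remainder are hypothetical helpers, not functions from the memory tools.

```python
# Hypothetical turn shapes; the wording inside the tags is invented.
STANDALONE = "<task-notification>background agent finished</task-notification>"
FUSED = (
    "<system-reminder>todo list is empty</system-reminder>\n"
    "Please rename the config loader."
)


def starts_with(text, marker):
    """Start-anchored check, the same shape of test used for the two sentinels."""
    return text.startswith(marker)


def human_remainder(text, closing_tag):
    """Whatever follows the injected block inside the same captured turn."""
    _, _, rest = text.partition(closing_tag)
    return rest.strip()


# The standalone sentinel is the whole message: skipping it withholds nothing human.
print(starts_with(STANDALONE, "<task-notification>"),
      repr(human_remainder(STANDALONE, "</task-notification>")))

# The fused turn also matches at the start, but skipping it would hide a real prompt.
print(starts_with(FUSED, "<system-reminder>"),
      repr(human_remainder(FUSED, "</system-reminder>")))
```

Both turns pass a start-anchored test, but only the fused one carries operator text after the injected block, which is why that marker is left out of the set.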

Impact: the change is bounded to the two distinctive standalone sentinels and never touches the recall path or the erasure machinery.

Risk

Low — a read-side consolidation filter plus a capture-time tag, both behaviour-preserving for genuine conversation; the failure mode is at most slightly noisier consolidation fuel, never a lost or hidden turn.

  • False-drop guard: the predicate is start-anchored, so a real turn that merely mentions a marker mid-sentence is kept (covered by tests), and the two markers were verified against the live ledger to never carry trailing human content. The one shape that can fuse with a prompt (<system-reminder>) is excluded.
  • Locked-design note (owed clarification, tracked as #360): this introduces a filter into the consolidation sweep, which the memory design describes as reading "the raw ledger unfiltered" (systems/cognitive/memory/README.md:113). That clause's guarantee (recall-exclusion never shrinks the sweep, and recovery keys on the consolidation marker, not on fuel content) is fully preserved here: the sweep still reads every resident record; only harness scaffolding is withheld from the AI. The literal phrasing becomes inexact, so an engine-planning clarification distinguishing "reads every resident record" from "feeds every record to the AI" is owed (a maintainer act; not edited from this build). It is filed as #360 (Memory design: the "sweep reads the raw ledger unfiltered" wording is now inexact (a non-recall filter was added)) so it is tracked, not lost.
  • Guarded surface: .engine/tools/memory/*.py edits trip engine-guard, so this carries a guardrail-ack; no guardrail is weakened.

Impact: the worst case is an injected note that escapes the filter and is summarised as before; each tool's own test suite plus the demo is the regression catch.

Validation

Full suite and the CI validator green; the demo exercises the real filter.

  • python -m unittest discover -s tools -p 'test_*.py' → 2755 tests OK (2 documented offline skips), run from the worktree. New legs: the two predicates (start-anchor, tag/text, <system-reminder> excluded); capture tagging incl. a multi-chunk continuation summary where every chunk is tagged; read_deltas skip-but-keep; the all-injected session never flagged pending.
  • validate.py --suite CI → the sole hard finding is the known local no-token state of disposition-issue-resolution (it needs a token to bite; CI is its real witness); the change does not touch it.
  • graph.json regenerated and in sync; self-map.md unchanged (declaration-derived). Demo (consolidate.py demo, PART 6) shows the notification + banner skipped as fuel while all three notes remain resident.
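
The multi-chunk leg can be pictured with the sketch below. It is an illustration under assumptions, not the code in capture.py: the chunk size, the tag value, the record shape, and the unprefixed names capture / make_record are all hypothetical; only the continuation-summary prefix and the "decide before chunking" rule come from this PR.

```python
INJECTED_TAG = "injected"  # hypothetical tag value
CHUNK_CHARS = 4096         # assumed chunk size, suggested by the ">4 KB" wording

CONTINUATION_PREFIX = "This session is being continued from a previous conversation"


def is_injected_pseudo_turn_text(text):
    """Start-anchored match on the two standalone sentinels."""
    return isinstance(text, str) and text.startswith(
        ("<task-notification>", CONTINUATION_PREFIX)
    )


def make_record(text, injected=False):
    """Build one turn-delta record, appending the tag when the message is injected."""
    tags = [INJECTED_TAG] if injected else []
    return {"kind": "turn-delta", "text": text, "tags": tags}


def capture(message):
    """Decide injectedness once, on the whole message, then chunk and tag every chunk."""
    injected = is_injected_pseudo_turn_text(message)
    chunks = [
        message[i:i + CHUNK_CHARS] for i in range(0, len(message), CHUNK_CHARS)
    ]
    return [make_record(chunk, injected=injected) for chunk in chunks]


summary = CONTINUATION_PREFIX + " " + "x" * 10000
records = capture(summary)

# Every chunk carries the tag, including chunks whose own text has no sentinel.
print(len(records), all(INJECTED_TAG in r["tags"] for r in records))
print(is_injected_pseudo_turn_text(records[1]["text"]))
```

Deciding per chunk instead would tag only the first record, because later chunks no longer begin with the sentinel; that is the failure the mutation test is described as catching.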

Impact: the behaviour-preserving claim rests on the existing memory suites staying green plus the new injected-turn legs and the falsifiable demo.

Review

A four-lens cold plan gate and a four-lens cold deliverable gate both ran; findings were ground-truthed against source and folded. No blocking or serious findings remain.

  • The plan gate caught a blocking re-detection loop (an all-injected session flagged pending forever) — fixed by applying the predicate in _scan_sessions; a blocking false-drop (the <system-reminder> fusion) and the inert/over-broad markers — fixed by paring the list to the two ground-truthed sentinels (verified against the live ledger, 206 task-notifications and 20 continuation summaries, 0 fused); and the chunking gap — fixed by tagging at capture before chunking.
  • The deliverable gate (spec-conformance, technical-integrity, security/governance, usability) read the committed diff and ran the suite in throwaway copies: no blocking, no serious findings. Spec-conformance mutation-tested the multi-chunk "every chunk tagged" guarantee (it goes red if only the first chunk is tagged); technical-integrity probed the predicates against malformed input (none raises) and confirmed O(1)-per-record cost; security/governance independently re-ground-truthed the live ledger (208 task-notifications, 0 fused) and judged the locked-clause item disclose-and-proceed (tracked as #360); usability deliberately regressed the filter to confirm the demo actually fails without it. Nits folded: two over-long demo lines tightened and the "fuel" coinage dropped. The remaining nits (a bounded old-multi-chunk residual and the inherent harness-string coupling) the lenses judged acceptable-as-built.
  • The change is an internal behaviour-preserving filter with no settled product description; the operator-facing evidence is the green suite and the falsifiable demo.
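
The re-detection loop the plan gate caught, and why read and detection must share one predicate, can be sketched as follows. The ledger shape, the tag value, and the set of already-consolidated session ids are assumptions for illustration; the real read_deltas and _scan_sessions in consolidate.py may be structured differently.

```python
INJECTED_TAG = "injected"  # hypothetical tag value


def is_injected_record(record):
    """Simplified tag-only check; the real predicate also has a text fallback."""
    return INJECTED_TAG in record.get("tags", ())


def read_deltas(ledger, session_id):
    """Return what the AI reads for one session; the ledger itself is not modified."""
    return [
        r for r in ledger
        if r["session"] == session_id
        and r["kind"] == "turn-delta"
        and not is_injected_record(r)
    ]


def scan_sessions(ledger, consolidated):
    """Detect pending sessions using the same skip as read_deltas."""
    pending = set()
    for r in ledger:
        if r["kind"] != "turn-delta" or is_injected_record(r):
            continue
        if r["session"] not in consolidated:
            pending.add(r["session"])
    return pending


ledger = [
    {"session": "s1", "kind": "turn-delta", "text": "<task-notification>done", "tags": [INJECTED_TAG]},
    {"session": "s2", "kind": "turn-delta", "text": "a real question", "tags": []},
    {"session": "s2", "kind": "turn-delta", "text": "This session is being continued", "tags": [INJECTED_TAG]},
]

# s1 holds only injected records, so it is never reported as pending.
print(scan_sessions(ledger, consolidated=set()))
# Nothing is deleted: all three records stay in the ledger.
print(len(read_deltas(ledger, "s2")), len(ledger))
```

If scan_sessions skipped the predicate, s1 would be flagged pending, read_deltas would hand the AI nothing to consolidate, and the session would be flagged again on the next sweep.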

Impact: this is the engine's own account of the review — the maintainer's merge is the binding gate.

Files of interest

The predicate, the capture-time tag, and the sweep skip.

  • .engine/tools/memory/records.py: is_injected_pseudo_turn_text / is_injected_record and INJECTED_TAG, the shared cycle-free vocabulary both writers import.
  • .engine/tools/memory/capture.py — message-level tagging in _capture (covers every chunk).
  • .engine/tools/memory/consolidate.py: read_deltas + _scan_sessions skip, and the read↔detection agreement.

Impact: these three carry the mechanism; the test files pin each leg.

Claude involvement

Claude (Opus 4.8) built the slice under the maintainer's direction; the maintainer chose the complete (capture-time) scope and holds the merge.

Impact: AI judgment is load-bearing on which shapes are injected vs conversation; the ground-truth tally, the tests, and the demo are the correlate.

… sweep

Claude Code injects non-conversational blocks as user-role turns — a
background-agent <task-notification>, and the /compact continuation summary.
They reach the ledger as turn-delta records and are already recall-excluded by
kind (#332), but the consolidation sweep read the raw ledger unfiltered, so the
in-context AI consolidated them as if the operator had said them.

Per #333 these are NOT dropped pre-ledger (that deletes recoverable content
against the durability law). Instead capture TAGS an injected message
(records.INJECTED_TAG) on every chunk — recognised on the whole message before
chunking, so a >4 KB continuation summary is fully tagged, not just its first
chunk — and the sweep skips a tagged/injected record as fuel. The record stays
physically resident and recoverable; recall already excludes it.

The marker set is the two distinctive, ground-truthed standalone sentinels
(<task-notification>, the continuation summary) — each is the whole injected
message and never fuses with a real prompt. <system-reminder> is deliberately
excluded: it fuses with a human prompt in the same turn, so a start-anchored
drop would lose real content (confirmed against live transcripts).

detect_unconsolidated / _scan_sessions apply the same predicate, so a session
whose turn-deltas are all injected is never flagged pending forever (read and
detection must agree, or the sweep loops).

- records.py: INJECTED_TAG + is_injected_pseudo_turn_text / is_injected_record
- capture.py: _make_record(injected=) + message-level tagging in _capture
- consolidate.py: read_deltas + _scan_sessions skip injected; demo PART 6
- tests: test_records / test_capture (incl. multi-chunk) / test_consolidate

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… "fuel" coinage

Wrap two over-long PART 6 output lines (108/114 chars) toward the demo's ~80
convention, and rename the internal `fuel` variable / operator-facing phrasing
so the demo no longer leans on the undefined "fuel" metaphor — "what the AI
reads to tidy this session" carries the meaning plainly. Behaviour and the
demo's pass/fail self-check are unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
StarshipSuperjam marked this pull request as ready for review June 30, 2026 22:30
StarshipSuperjam added the guardrail-ack label (Deliberately approves a guardrail-weakening change; clears the engine-guard block.) Jun 30, 2026
StarshipSuperjam merged commit a86d65b into main Jun 30, 2026
10 of 11 checks passed
StarshipSuperjam deleted the claude/s274-consolidation-sweep-skip branch June 30, 2026 22:36