fix(wiki-compile): adaptive truncation for clusters > model context by GuyMannDude · Pull Request #6 · GuyMannDude/mnemo-cortex

GuyMannDude · 2026-05-24T23:45:57Z

Summary

Fixes #5 (wiki-compile fails on hot-entity clusters that exceed reasoning model context; no chunking/truncation): hot-entity clusters (entities/guy, entities/igor, entities/opie, entities/rocky on artforge) silently stopped updating because the cluster size exceeded Gemini 2.5 Flash's 1M-token cap. Observed today at 4.7M / 3.9M / 4.8M tokens / cluster-error respectively during a --days 14 backfill.
Adds halve-and-retry adaptive truncation at the per-cluster level, mirroring the existing pattern in agentb/vec.py:embed_with_adaptive_truncation. Sort newest-first, try full cluster, halve on 400-context-length, retry until success or floor.
Surfaces truncation in three places on the rendered page so it's visible whether you scan the header, the front matter, or scroll to the footer — vapor-truth on the fact that the page is operating on partial data, with a pointer to mnemo_recall for the dropped entries.
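The halve-and-retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `compile_adaptive`, the `ContextOverflow` exception, the injected `call_llm` callable, and the `timestamp` key are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class ContextOverflow(Exception):
    """Hypothetical stand-in for a provider's context-length 400."""


def compile_adaptive(
    memories: Sequence[Dict],
    call_llm: Callable[[List[Dict]], str],
    min_memories: int = 1,
) -> Tuple[str, List[Dict]]:
    """Return (llm_output, memories_actually_used)."""
    # Newest first, so each halving drops the oldest entries.
    # The "timestamp" key is assumed for illustration.
    current = sorted(memories, key=lambda m: m["timestamp"], reverse=True)
    while True:
        try:
            return call_llm(current), current
        except ContextOverflow:
            if len(current) <= min_memories:
                # Floor reached: re-raise so the caller can log the topic as failed.
                raise
            current = current[: max(min_memories, len(current) // 2)]
```

Any exception other than `ContextOverflow` propagates on the first attempt, which matches the "no retry on auth or network failures" rule. The caller gets back the list it actually compiled from, so it can compare that length against the original cluster size.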

What this PR does NOT fix

The wiki compiler still reads from v2 legacy paths (~/.agentb/agents/<agent>/memory/*.json + ~/.mnemo-v2/mnemo.sqlite3) rather than the v3 server's storage. That's a separate concern (what gets harvested) and not what #5 is about (how to handle what's harvested when it's too big for one LLM call).
For genuinely-can't-fit clusters (single memory whose summary alone exceeds context), the per-cluster try/except still logs the topic as failed — no change in behavior, just a smaller surface for the failure to bite.

Why halve-and-retry (and not chunk-merge)

Issue #5 lists four fix options ranked by effort. This PR picks option 1 (adaptive truncation) because:

Smallest diff (127 lines, isolated to two functions plus a helper) — easy to revert if the user-visible truncation note turns out to be the wrong UX.
Mirrors a pattern that's already proven in this codebase (agentb/vec.py), so reviewers don't need to evaluate a new strategy.
Doubles or triples the LLM bill only in the worst case (halve, halve, succeed), and only on hot entities. Chunk-merge (option 3) would also double the bill but adds a final merge LLM call on top, ~3x cost even in the success case.
Acceptable degradation: the newest N memories are exactly what fits, so the page reflects current state at the cost of historical depth. Older entries remain queryable via mnemo_recall and the page footer says so explicitly.

Chunk-merge (option 3) is the better long-term answer for the highest-fidelity wiki pages but a much bigger surface change. This PR is the "stop the silent breakage" fix; option 3 can land as a follow-up if the truncation footer turns out to drop too much.
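As a rough back-of-envelope check on the call count, the sketch below counts attempts until a halved cluster fits. It assumes token count scales linearly with the number of memories, which real clusters will only approximate; `halving_attempts` is a hypothetical helper, not part of this PR.

```python
def halving_attempts(cluster_tokens: float, context_cap: float) -> int:
    """Count LLM calls (failed attempts plus the final one) until the cluster fits.

    Assumes each halving of the memory list also halves the token count.
    """
    attempts = 1
    while cluster_tokens > context_cap:
        cluster_tokens /= 2
        attempts += 1
    return attempts


# Sizes reported in the Summary, against a 1M-token cap.
for name, tokens in [("guy", 4.7e6), ("igor", 3.9e6), ("opie", 4.8e6)]:
    print(name, halving_attempts(tokens, 1e6))
```

Under that linear assumption, a 4.7M-token cluster would take four calls (three halvings) and a 3.9M-token cluster three calls before fitting under a 1M cap, so the real multiplier depends on how far over the cap a cluster starts.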

Behavior changes

| Cluster fits | Behavior |
| --- | --- |
| Yes (existing) | One LLM call, page renders identically to today |
| No, fits after halve(s) | N LLM calls (one per halve attempt). Page header shows `Source memories: N of M (⚠️ K older entries dropped — see footer)`. Front matter gains `cluster-truncated: true`, `cluster-total: M`, `cluster-dropped: K`. Footer explains why and points at `mnemo_recall` |
| Even at min_memories (default 1), still fails | Re-raises (existing per-cluster try/except in `main` logs as failed topic; same as today) |
| Non-context error (500, 401, network) | Re-raises immediately (no retry loop on auth or network failures) |

`render_page` gains an optional `total_memories: int | None` parameter (default `None` preserves prior behavior byte-for-byte).
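To illustrate how the three truncation markers relate to the used and total counts, here is a small sketch. `truncation_notes` is a hypothetical helper rather than the PR's `render_page`; the header and front-matter strings follow the table above, while the footer wording is invented for the example.

```python
from typing import Dict, List, Optional


def truncation_notes(used: int, total: Optional[int] = None) -> Dict[str, List[str]]:
    """Build the extra page lines for a truncated cluster.

    Returns empty lists when nothing was dropped, so an untruncated
    page gains no extra output.
    """
    if total is None or total <= used:
        return {"header": [], "front_matter": [], "footer": []}
    dropped = total - used
    return {
        "header": [
            f"Source memories: {used} of {total} "
            f"(⚠️ {dropped} older entries dropped — see footer)"
        ],
        "front_matter": [
            "cluster-truncated: true",
            f"cluster-total: {total}",
            f"cluster-dropped: {dropped}",
        ],
        # Footer wording below is an assumption, not the PR's actual text.
        "footer": [
            f"Compiled from the newest {used} of {total} memories; "
            "query the older entries with mnemo_recall."
        ],
    }
```

Calling `truncation_notes(25, 100)` yields a dropped count of 75 in all three places, while `truncation_notes(3)` returns only empty lists, mirroring the "default `None` preserves prior behavior" rule.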

Test plan

The following smoke scenarios were verified locally:

Oversized cluster (100 memories, mock LLM fails above 25): halves 100 → 50 → 25, succeeds. Rendered page surfaces "25 of 100" in header + footer + front matter cluster-truncated/total/dropped keys.
Small cluster (3 memories, mock LLM never fails): one call, rendered page byte-identical to prior format — no truncation noise.
Non-overflow 500 / 401: re-raises immediately without halving.
Persistent overflow at floor (mock LLM returns 400 on every call): halves 10 → 5 → 2 → 1, then re-raises rather than looping forever.
KeyError('choices') shape (the artforge entities/rocky failure mode where the response JSON was malformed rather than a clean 400): correctly treated as an overflow signal, halves and recovers.
python -m py_compile mnemo-wiki-compile.py clean.
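The overflow-versus-other-error split that these scenarios exercise could be classified along the following lines. This is a sketch under stated assumptions: `LLMHTTPError` is a hypothetical exception type, and the hint phrases are guesses at what a provider's 400 body might contain, not strings taken from this PR or from any provider's documentation.

```python
class LLMHTTPError(Exception):
    """Hypothetical error carrying an HTTP status code and response body."""

    def __init__(self, status_code: int, body: str = ""):
        super().__init__(f"HTTP {status_code}: {body}")
        self.status_code = status_code
        self.body = body


# Assumed phrases; extend per provider as new error shapes are observed.
_OVERFLOW_HINTS = ("context length", "context window", "token limit")


def is_context_overflow_error(err: BaseException) -> bool:
    """Return True only for errors that should trigger halve-and-retry."""
    # Malformed response JSON on a grossly oversized prompt.
    if isinstance(err, KeyError):
        return err.args == ("choices",)
    # A 400 whose body mentions the context limit.
    if isinstance(err, LLMHTTPError) and err.status_code == 400:
        body = err.body.lower()
        return any(hint in body for hint in _OVERFLOW_HINTS)
    # 401, 500, network failures, and anything else re-raise immediately.
    return False
```

Keeping the check in one function means a new provider-specific error shape needs a change in only one place.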

Production verification — needs the next nightly cron or a manual --days 14 run on a deployment with hot entities. Today's backfill on artforge had 4 such failures and would be the natural validation; happy to run it from my end once this lands and report back, or hand off.

Closes #5.

🤖 Generated with Claude Code

Hot entities accumulated across deep history produce clusters that exceed the reasoning model's context window. Surfaced today on artforge backfilling 14 days into entities/guy (4.7M tokens), entities/igor (3.9M), entities/opie (4.8M), entities/rocky; Gemini 2.5 Flash caps at 1M. The per-cluster try/except caught the 400s and kept the rest of the run going, but those four pages silently stopped updating. See issue #5 for the full repro and design tradeoffs.

Fix mirrors the existing adaptive-truncation pattern in agentb/vec.py:embed_with_adaptive_truncation: halve and retry until either the call succeeds or we hit the min_memories floor and re-raise for the per-cluster handler.

New behavior:

- `compile_topic_adaptive(section, slug, memories, existing, min_memories=1)`: sort newest-first, build prompt, call LLM. On context-overflow 400 (or the rare `KeyError('choices')` shape some providers return when the prompt is so oversized the response JSON is malformed, observed on artforge's entities/rocky cluster), halve to `len(current) // 2` and retry. Stop when the call succeeds or the cluster reaches min_memories and still fails; in the latter case re-raise so the existing per-cluster try/except logs the topic as failed (same behavior as today, just for the genuinely unfit-at-any-size case).
- Caller in `main()` switches from `call_llm(prompt)` to `compile_topic_adaptive(...)`, receives back the memories actually used, passes both used and total to `render_page`.
- `render_page(section, slug, body, memories, total_memories=None)`: when `total_memories > len(memories)`, surfaces the truncation in three places: visible header line ("Source memories: N of M (⚠️ K older dropped — see footer)"), front matter (`cluster-truncated`, `cluster-total`, `cluster-dropped` for machine readers), and an expanded footer note pointing readers at `mnemo_recall` for the dropped entries (vapor truth: page operates on partial data, say so).
- Non-overflow errors (500, 401, auth, network) re-raise immediately without halving; only context-length 400s trigger the retry loop.
- Detection helper `_is_context_overflow_error(err)` factored out so future provider-specific shapes are one place to extend.

Verified locally with these smoke scenarios:

- Oversized cluster (100 memories, fail-above-25): halves 100 → 50 → 25, succeeds; rendered page surfaces 25 of 100 with the dropped count in header + footer + front matter.
- Small cluster (3 memories, never fails): one call, no truncation noise in the output (footer matches prior format byte-for-byte).
- Non-overflow 500 / 401: re-raises immediately without retry.
- Persistent overflow at floor (every call returns 400): halves 10 → 5 → 2 → 1, then re-raises rather than looping forever.
- `KeyError('choices')` treated as overflow signal so artforge's entities/rocky failure mode (where the response JSON was malformed rather than returning a clean 400) recovers correctly.

Token cost note: a successful halve-twice run uses ~3x the LLM calls of the original single attempt, but a failed compile would have produced no page update at all. Net: pages on hot entities stay fresh at the cost of one extra retry per oversize cluster per night.

Doesn't address the legacy-paths issue separately tracked in our internal brain. That one is about WHAT the wiki harvests; this fixes WHAT IT CAN COMPILE once harvested.

Closes #5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

GuyMannDude wants to merge 1 commit into master from fix/wiki-compile-adaptive-truncation
