
fix(wiki-compile): adaptive truncation for clusters > model context #6

Open
GuyMannDude wants to merge 1 commit into master from fix/wiki-compile-adaptive-truncation

@GuyMannDude

Owner

Summary

  • Fixes #5 (wiki-compile fails on hot-entity clusters that exceed reasoning model context; no chunking/truncation): hot-entity clusters (entities/guy, entities/igor, entities/opie, entities/rocky on artforge) silently stopped updating because each cluster exceeded Gemini 2.5 Flash's 1M-token cap. Observed today during a --days 14 backfill: 4.7M, 3.9M, and 4.8M tokens for the first three, and a cluster error for entities/rocky.
  • Adds halve-and-retry adaptive truncation at the per-cluster level, mirroring the existing pattern in agentb/vec.py:embed_with_adaptive_truncation. Sort newest-first, try full cluster, halve on 400-context-length, retry until success or floor.
  • Surfaces truncation in three places on the rendered page so it's visible whether you scan the header, the front matter, or scroll to the footer — vapor-truth on the fact that the page is operating on partial data, with a pointer to mnemo_recall for the dropped entries.
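
A minimal sketch of the halve-and-retry loop described above, not the code in this PR. `ContextOverflow`, `compile_with_halving`, the `timestamp` key, and a `call_llm` that takes the memory list directly (rather than a built prompt) are all assumptions made for illustration.

```python
from typing import Callable, Sequence


class ContextOverflow(Exception):
    """Hypothetical stand-in for the provider's 400 context-length error."""


def compile_with_halving(
    memories: Sequence[dict],
    call_llm: Callable[[Sequence[dict]], str],
    min_memories: int = 1,
) -> tuple[str, list[dict]]:
    """Return (llm_output, memories_actually_used)."""
    # Newest first, so every halve drops the oldest entries.
    # The "timestamp" key is an assumed field name.
    current = sorted(memories, key=lambda m: m["timestamp"], reverse=True)
    while True:
        try:
            return call_llm(current), current
        except ContextOverflow:
            if len(current) <= min_memories:
                raise  # floor reached: the per-cluster handler logs the failure
            current = current[: max(min_memories, len(current) // 2)]


def fake_llm(batch: Sequence[dict]) -> str:
    """Mock LLM that overflows above 25 memories."""
    if len(batch) > 25:
        raise ContextOverflow("prompt too long")
    return f"compiled {len(batch)} memories"


body, used = compile_with_halving([{"timestamp": i} for i in range(100)], fake_llm)
```

Any exception other than `ContextOverflow` propagates on the first attempt, which matches the intent that auth and network failures never enter the retry loop.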

What this PR does NOT fix

  • The wiki compiler still reads from v2 legacy paths (~/.agentb/agents/<agent>/memory/*.json + ~/.mnemo-v2/mnemo.sqlite3) rather than the v3 server's storage. That's a separate concern (what gets harvested), not what #5 is about (how to handle what's harvested when it's too big for one LLM call).
  • For genuinely-can't-fit clusters (single memory whose summary alone exceeds context), the per-cluster try/except still logs the topic as failed — no change in behavior, just a smaller surface for the failure to bite.

Why halve-and-retry (and not chunk-merge)

Issue #5 lists four fix options ranked by effort. This PR picks option 1 (adaptive truncation) because:

  • Smallest diff (127 lines, isolated to two functions plus a helper) — easy to revert if the user-visible truncation note turns out to be the wrong UX.
  • Mirrors a pattern that's already proven in this codebase (agentb/vec.py), so reviewers don't need to evaluate a new strategy.
  • Extra LLM calls land only on hot entities. Each halve adds one call, so a cluster that fits after two halves (halve, halve, succeed) costs about 3x a single attempt. Chunk-merge (option 3) would pay for every chunk plus a final merge LLM call on top, roughly 3x cost even in the success case.
  • Acceptable degradation: the newest N memories are exactly what fits, so the page reflects current state at the cost of historical depth. Older entries remain queryable via mnemo_recall and the page footer says so explicitly.

Chunk-merge (option 3) is the better long-term answer for the highest-fidelity wiki pages but a much bigger surface change. This PR is the "stop the silent breakage" fix; option 3 can land as a follow-up if the truncation footer turns out to drop too much.
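
As a rough check on the cost argument, the number of calls per cluster can be estimated from the sizes quoted in the summary. This sketch assumes each halve of the memory list also halves the token count, which real clusters will only approximate, and it ignores prompt overhead. `calls_until_fit` is a hypothetical helper, not part of the PR.

```python
def calls_until_fit(cluster_tokens: int, context_cap: int) -> int:
    """Count LLM calls, assuming each halve also halves the token count."""
    calls = 1  # the first attempt on the full cluster
    tokens = cluster_tokens
    while tokens > context_cap:
        tokens //= 2
        calls += 1
    return calls


CAP = 1_000_000  # the 1M-token cap quoted in the summary

observed = {
    "entities/guy": 4_700_000,
    "entities/igor": 3_900_000,
    "entities/opie": 4_800_000,
}
estimates = {name: calls_until_fit(size, CAP) for name, size in observed.items()}
```

Under that assumption the estimate is 4 calls for entities/guy and entities/opie and 3 for entities/igor, so the 14-day backfill sizes sit at or slightly above the halve-twice case.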

Behavior changes

| Cluster fits | Behavior |
| --- | --- |
| Yes (existing) | One LLM call, page renders identically to today |
| No, fits after halve(s) | N LLM calls (one per halve attempt). Page header shows **Source memories:** N of M (⚠️ K older entries dropped — see footer). Front matter gains cluster-truncated: true, cluster-total: M, cluster-dropped: K. Footer explains why and points at mnemo_recall |
| Even at min_memories (default 1), still fails | Re-raises (existing per-cluster try/except in main logs as failed topic; same as today) |
| Non-context error (500, 401, network) | Re-raises immediately (no retry loop on auth or network failures) |

render_page gains an optional total_memories: int | None parameter (default None preserves prior behavior byte-for-byte).
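
To make the optional-parameter contract concrete, here is a small sketch of how the extra front-matter keys could be derived from the used and total counts. `truncation_front_matter` is a hypothetical helper, not the PR's `render_page`, and the real function may lay out its front matter differently.

```python
from typing import Optional


def truncation_front_matter(used: int, total: Optional[int] = None) -> list[str]:
    """Extra front-matter lines; empty when nothing was dropped."""
    # total=None (the default) or total <= used means the page is untruncated,
    # so no keys are added and the output stays identical to the prior format.
    if total is None or total <= used:
        return []
    return [
        "cluster-truncated: true",
        f"cluster-total: {total}",
        f"cluster-dropped: {total - used}",
    ]


small = truncation_front_matter(3)
hot = truncation_front_matter(25, 100)
```

Returning an empty list for the untruncated case keeps the default path a no-op, which is what lets small clusters render byte-for-byte as before.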

Test plan

Five smoke scenarios verified locally, plus a compile check:

  • Oversized cluster (100 memories, mock LLM fails above 25): halves 100 → 50 → 25, succeeds. Rendered page surfaces "25 of 100" in header + footer + front matter cluster-truncated/total/dropped keys.
  • Small cluster (3 memories, mock LLM never fails): one call, rendered page byte-identical to prior format — no truncation noise.
  • Non-overflow 500 / 401: re-raises immediately without halving.
  • Persistent overflow at floor (mock LLM returns 400 on every call): halves 10 → 5 → 2 → 1, then re-raises rather than looping forever.
  • KeyError('choices') shape (the artforge entities/rocky failure mode where the response JSON was malformed rather than a clean 400): correctly treated as an overflow signal, halves and recovers.
  • python -m py_compile mnemo-wiki-compile.py clean.
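
The error classification exercised by the 500 / 401 and KeyError('choices') scenarios could look roughly like the sketch below. `LLMHTTPError`, `looks_like_context_overflow`, and the hint substrings are assumptions for illustration; the PR's actual helper and the provider's real error wording may differ.

```python
class LLMHTTPError(Exception):
    """Hypothetical error type carrying an HTTP status and response body."""

    def __init__(self, status_code: int, body: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {body}")
        self.status_code = status_code
        self.body = body


# Assumed substrings; providers word their context-length errors differently.
_OVERFLOW_HINTS = ("context length", "context window", "too many tokens")


def looks_like_context_overflow(err: Exception) -> bool:
    """True only for errors that halving the cluster could plausibly fix."""
    # Malformed response with no "choices" key, as seen on entities/rocky.
    if isinstance(err, KeyError) and err.args == ("choices",):
        return True
    # Only a 400 whose body mentions the context limit counts as overflow.
    if isinstance(err, LLMHTTPError) and err.status_code == 400:
        body = err.body.lower()
        return any(hint in body for hint in _OVERFLOW_HINTS)
    # 500, 401, network errors and everything else re-raise immediately.
    return False
```

Keeping the check in one function means a new provider-specific error shape only needs one more branch here.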

Production verification — needs the next nightly cron or a manual --days 14 run on a deployment with hot entities. Today's backfill on artforge had 4 such failures and would be the natural validation; happy to run it from my end once this lands and report back, or hand off.

Closes #5.

🤖 Generated with Claude Code

Hot entities accumulated across deep history produce clusters that
exceed the reasoning model's context window. Surfaced today on
artforge backfilling 14 days into entities/guy (4.7M tokens),
entities/igor (3.9M), entities/opie (4.8M), entities/rocky — Gemini
2.5 Flash caps at 1M. The per-cluster try/except caught the 400s and
kept the rest of the run going, but those four pages silently stopped
updating. See issue #5 for the full repro and design tradeoffs.

Fix mirrors the existing adaptive-truncation pattern in
agentb/vec.py:embed_with_adaptive_truncation — halve and retry until
either the call succeeds or we hit the min_memories floor and re-raise
for the per-cluster handler.

New behavior:
- `compile_topic_adaptive(section, slug, memories, existing,
  min_memories=1)`: sort newest-first, build prompt, call LLM. On
  context-overflow 400 (or the rare `KeyError('choices')` shape some
  providers return when the prompt is so oversized the response JSON
  is malformed — observed on artforge's entities/rocky cluster), halve
  to `len(current) // 2` and retry. Stop when call succeeds or
  cluster reaches min_memories and still fails — in the latter case
  re-raise so the existing per-cluster try/except logs the topic as
  failed (same behavior as today, just for the genuinely
  unfit-at-any-size case).
- Caller in `main()` switches from `call_llm(prompt)` to
  `compile_topic_adaptive(...)`, receives back the memories actually
  used, passes both used and total to `render_page`.
- `render_page(section, slug, body, memories, total_memories=None)`:
  when `total_memories > len(memories)`, surfaces the truncation in
  three places — visible header line ("Source memories: N of M (⚠️ K
  older dropped — see footer)"), front matter (`cluster-truncated`,
  `cluster-total`, `cluster-dropped` for machine readers), and an
  expanded footer note pointing readers at `mnemo_recall` for the
  dropped entries (vapor truth — page operates on partial data, say
  so).
- Non-overflow errors (500, 401, auth, network) re-raise immediately
  without halving — only context-length 400s trigger the retry loop.
- Detection helper `_is_context_overflow_error(err)` factored out so
  future provider-specific shapes are one place to extend.

Verified locally with five smoke scenarios:
- Oversized cluster (100 memories, fail-above-25): halves
  100 → 50 → 25, succeeds; rendered page surfaces 25 of 100 with the
  dropped count in header + footer + front matter.
- Small cluster (3 memories, never fails): one call, no truncation
  noise in the output (footer matches prior format byte-for-byte).
- Non-overflow 500 / 401: re-raises immediately without retry.
- Persistent overflow at floor (every call returns 400): halves
  10 → 5 → 2 → 1, then re-raises rather than looping forever.
- `KeyError('choices')` treated as overflow signal so artforge's
  entities/rocky failure mode (where the response JSON was malformed
  rather than returning a clean 400) recovers correctly.

Token cost note: a successful halve-twice run uses ~3x the LLM calls
of the original single attempt, but a failed compile would have
produced no page update at all. Net: pages on hot entities stay fresh
at the cost of one extra call per halve on each oversize cluster per
night.

Doesn't address the legacy-paths issue separately tracked in our
internal brain — that's about WHAT the wiki harvests, this fixes
WHAT IT CAN COMPILE once harvested.

Closes #5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>