Add context guardrails for LLM conversation management #902
Conversation … overflow

Implement three complementary safeguards against LLM context window overflow in long-running conversations and large document analysis:

1. Token estimation - fast heuristic counter (~3.5 chars/token) that intentionally over-estimates to trigger compaction conservatively.
2. Conversation compaction - when message history exceeds 75% of the model's context window, older messages are replaced by a concise summary while preserving recent turns (4-20 messages) verbatim. Supports pluggable summary functions for LLM-based summarization.
3. Tool output truncation - string outputs from tools are automatically capped at 50K characters at the PydanticAI wrapper level, with a notice directing the LLM to use range parameters.

Includes model context window registry for 20+ models (OpenAI, Anthropic, Google), per-agent CompactionConfig on AgentConfig, and 30+ unit tests using SimpleTestCase for fast parallel execution.

Closes #898
https://claude.ai/code/session_01QuMdF6v5xWrMrrMdRHvVv2
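For orientation, a minimal sketch of the heuristic described above. The constant names are taken from the new constants module listed later in this PR; the actual implementation may differ in detail.

```python
# Illustrative sketch only; constant names match the PR's constants module, but this
# is not the PR's exact implementation.
CHARS_PER_TOKEN_ESTIMATE = 3.5       # deliberately low chars/token so counts err high
COMPACTION_THRESHOLD_RATIO = 0.75    # compact once history passes 75% of the window


def estimate_token_count(text: str) -> int:
    """Cheap over-estimate: character count divided by ~3.5 chars/token."""
    if not text:
        return 0
    return int(len(text) / CHARS_PER_TOKEN_ESTIMATE) + 1


def should_compact(history_tokens: int, context_window: int) -> bool:
    """Trigger compaction when estimated usage exceeds the threshold ratio."""
    return history_tokens > context_window * COMPACTION_THRESHOLD_RATIO
```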
PR Review: Context Guardrails for LLM Conversation Management

Overall this is a well-structured, well-documented addition. The module separation is clean, the test suite is solid, and the architectural choices (lightweight proxies for testability, deterministic fallback, per-agent config) are all sensible. A few real bugs and a handful of design notes below.

**Bugs**
1. The tool wrapper never sees the per-agent config:

```python
# pydantic_ai_tools.py — the wrapper does not receive compaction_cfg at all
result = truncate_tool_output(result)  # always uses module default
```

   Fix: thread the config through to the wrapper (e.g. via `PydanticAIDependencies`) so `CompactionConfig.max_tool_output_chars` is actually honoured.

2. The loop that decides how many recent messages to keep only tests whether each candidate window still fits under the threshold, approximating the eventual summary with a constant budget:

```python
summary_budget = COMPACTION_SUMMARY_TARGET_TOKENS  # or a safe over-estimate

for candidate in range(min_recent, min(max_recent, len(messages)) + 1):
    recent_tokens = sum(m.token_estimate for m in messages[-candidate:])
    if system_prompt_tokens + summary_budget + recent_tokens > threshold:
        break
    recent_count = candidate
```

**Design issues**

3. `_build_summary_prompt` is defined but never called anywhere; the only path to LLM-based summarization is through an externally provided summary function.

4. Missing `__all__` exports:
```python
__all__ = [
    "estimate_token_count",
    "get_context_window_for_model",
    "truncate_tool_output",
    "CompactionResult",
    "compact_message_history",
    "should_compact",
    "CompactionConfig",       # missing
    "messages_to_proxies",    # missing
]
```

5. Misleading comment: the docstring says one thing while the code does another.

**Minor issues**

6. `_MessageProxy` treats an explicit `token_estimate=0` as "not set" (a fix sketch follows below):

```python
if not self.token_estimate:
    self.token_estimate = estimate_token_count(self.content)
```

   A caller passing `token_estimate=0` will have the value silently recomputed.

7. No integration test for the agent adapter path. The test suite thoroughly covers the pure functions in `context_guardrails.py`, but not the adapter's `_get_message_history()`.
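A minimal sketch of the fix for item 6, assuming `_MessageProxy` is roughly a dataclass with an optional `token_estimate`; the field layout here is an assumption, only the guard is the point.

```python
# Sketch of the `is None` fix for item 6. The dataclass layout is assumed for
# illustration and is not copied from the PR.
from dataclasses import dataclass
from typing import Optional

from opencontractserver.llms.context_guardrails import estimate_token_count


@dataclass
class _MessageProxy:
    content: str
    token_estimate: Optional[int] = None  # None means "not provided"; 0 is a legitimate value

    def __post_init__(self) -> None:
        # `if not self.token_estimate` would also clobber an explicit 0.
        if self.token_estimate is None:
            self.token_estimate = estimate_token_count(self.content)
```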
Instead of recomputing compaction from scratch on every `_get_message_history()` call, the system now persists the result:

- Added `compaction_summary` (TextField) and `compacted_before_message_id` (BigIntegerField) to the Conversation model
- `get_conversation_messages()` filters with `id__gt` when a bookmark exists, so old messages are never loaded from the DB
- `_get_message_history()` reads the stored summary and only runs a fresh compaction pass when the remaining messages still exceed the context window threshold
- `persist_compaction()` on CoreConversationManager writes the bookmark atomically via `asave(update_fields=...)`
- Summaries merge incrementally: each compaction appends to the existing summary rather than replacing it

This turns a per-call O(n) token estimation + summary generation into a one-time write followed by cheap filtered reads.

https://claude.ai/code/session_01QuMdF6v5xWrMrrMdRHvVv2
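A rough sketch of the filtered read this describes. The two bookmark fields come from the changelog above; the `chat_messages` related-manager name is an assumption for illustration.

```python
# Sketch of the filtered read described above. `chat_messages` is an assumed
# related-manager name; the bookmark fields are the ones named in the changelog.
def get_conversation_messages(conversation):
    """Load only messages newer than the compaction bookmark."""
    qs = conversation.chat_messages.all()
    if conversation.compacted_before_message_id is not None:
        qs = qs.filter(id__gt=conversation.compacted_before_message_id)
    return qs.order_by("id")
```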
Code Review — Context Guardrails for LLM Conversation Management

Overall this is well-structured work: clear constants separation, sensible defaults, lightweight proxy pattern for testability, and the compaction algorithm is easy to follow. The issues below range from a genuine bug to a few code quality nits.
| Severity | Issue |
|---|---|
| Bug | raw_messages trimmed even when persist fails → context gap |
| Bug | CompactionConfig.max_tool_output_chars ignored in tool wrapper |
| Bug | Changelog says persist is atomic; it is not |
| Cleanup | _build_summary_prompt is dead code |
| Cleanup | CompactionConfig absent from __all__ |
| Nit | Misleading AgentConfig.compaction comment |
| Nit | O(n²) summation loop (trivial at current bounds) |
The core design is solid. Fixing the two data-correctness bugs before merge would be my main ask.
…_tool_output_chars, race condition

- Move raw_messages trimming inside try block so that when persist_compaction fails the full message list is preserved for the current LLM call (previously context was silently lost)
- Wire CompactionConfig.max_tool_output_chars through PydanticAIDependencies so per-agent tool output limits are actually honoured by the tool wrapper truncation calls
- Replace plain asave() in persist_compaction with optimistic-lock pattern (filter on old cutoff before updating) to prevent concurrent requests from silently overwriting each other's bookmarks
- Remove dead _build_summary_prompt function (never called anywhere)
- Add CompactionConfig to __all__ exports, fix misleading comment on AgentConfig.compaction field, and correct changelog atomicity claim
- Replace O(n²) recent-message token summation loop with O(n) running total (verified equivalent across 1000 randomized trials)
- Add tests for persist failure path, max_tool_output_chars override, optimistic locking, and running-total loop invariants

https://claude.ai/code/session_01QuMdF6v5xWrMrrMdRHvVv2
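A sketch of what the optimistic-lock persist described above could look like. The field names follow the earlier changelog; the manager class shape, method signature, and the omitted `Conversation` import are assumptions, not the PR's code.

```python
# Sketch only: guarded update that refuses to overwrite a bookmark another request
# has already advanced. Field names come from the changelog; the manager shape and
# the Conversation import (omitted here) are assumptions.
class CoreConversationManager:
    def __init__(self, conversation):
        self.conversation = conversation

    async def persist_compaction(self, summary_text: str, new_cutoff_id: int) -> bool:
        old_cutoff = self.conversation.compacted_before_message_id
        updated = await Conversation.objects.filter(
            id=self.conversation.id,
            compacted_before_message_id=old_cutoff,  # only if nobody moved the bookmark
        ).aupdate(
            compaction_summary=summary_text,
            compacted_before_message_id=new_cutoff_id,
        )
        return updated == 1  # False: a concurrent writer won; caller keeps its full history
```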
PR Review: Context Guardrails for LLM Conversation Management

This is a well-structured addition — the problem is real, the approach (compact-once, bookmark-persisted) is sound, and the test coverage is solid. A few bugs and design issues worth addressing before merge.

**Bugs**

1. Dead-code edge case with misleading comment: the comment above the `recent_count < 1` guard promises a "keep only the very last message" fallback that the surrounding code can never reach.
2. Silent misbehaviour when a caller passes …
3. Explicit `token_estimate=0` is clobbered:

```python
if not self.token_estimate:  # True when token_estimate=0 is passed explicitly
    self.token_estimate = estimate_token_count(self.content)
```

   Explicitly passing `0` gets overwritten with a recomputed estimate; an `is None` check avoids this.

**Code Quality**

4. …
5. Unbounded growth of the persisted summary: each compaction appends to it, so it keeps growing across cycles.
6. Deprecated `asyncio.get_event_loop()` usage in the tests.

**Minor Issues**

7. Constants-module docstring example is backwards: the comment says the opposite of what the code does.
8. Missing model entry — …
9. If a caller passes a very small …

**Positive Highlights**
The test used `MagicMock(spec=AgentConfig)` without setting `cfg.compaction` or stubbing `get_conversation_messages`. When `_get_message_history` tries arithmetic with MagicMock values (from `cfg.compaction.*`), the unpredictable behavior prevents the code from ever reaching the LLM call that the test is actually exercising. Fix: provide a real `CompactionConfig` and stub `get_conversation_messages` to return an empty list, matching the pattern from the passing `test_pydantic_ai_error_handling`. (A sketch of this fix follows below.)

https://claude.ai/code/session_01QuMdF6v5xWrMrrMdRHvVv2
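A sketch of the corrected test setup. `CompactionConfig` is the real dataclass from the PR; the test class, agent fixture helper, and exact patch target are placeholders, not the project's actual names.

```python
# Sketch of the fix: a real CompactionConfig plus a stubbed message fetch, so no
# MagicMock values leak into the threshold arithmetic. Class and fixture names
# here are placeholders.
import unittest
from unittest.mock import AsyncMock, MagicMock, patch

from opencontractserver.llms.context_guardrails import CompactionConfig


class GetMessageHistoryTests(unittest.IsolatedAsyncioTestCase):
    async def test_reaches_llm_call_with_real_compaction_config(self):
        cfg = MagicMock(spec=AgentConfig)    # AgentConfig imported as elsewhere in the suite
        cfg.compaction = CompactionConfig()  # real numbers instead of nested MagicMocks
        agent = make_test_agent(config=cfg)  # hypothetical fixture helper
        with patch.object(agent, "get_conversation_messages", new=AsyncMock(return_value=[])):
            history = await agent._get_message_history()
        self.assertEqual(history, [])
```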
PR Review: Context Guardrails for LLM Conversation Management

This is a well-structured PR that solves a real problem (context overflow) with a thoughtful design. The compact-once/read-cheaply approach is sound, and the test coverage is solid. Here are my findings, ordered by severity.

**Bugs**

1. Truncation goes negative when `max_chars` is smaller than the notice (a full sketch of the guarded version follows at the end of this review):

```python
notice = TOOL_OUTPUT_TRUNCATION_NOTICE.format(limit=max_chars)
truncated = output[: max_chars - len(notice)] + notice
```

   Fix:

```python
char_budget = max(0, max_chars - len(notice))
truncated = output[:char_budget] + notice
```

   The test …

2. The updated `__all__`:

```python
__all__ = [
"estimate_token_count",
"get_context_window_for_model",
"truncate_tool_output",
"CompactionConfig",
"CompactionResult",
"compact_message_history",
"should_compact",
]
```

   But `messages_to_proxies`, which the agent adapter imports, is still missing from the list.

**Design Concerns**

3. Double-counting the stored summary. In `_get_message_history` the stored summary is folded into the system-prompt estimate:

```python
system_prompt_tokens = estimate_token_count(
    (self.config.system_prompt or "") + stored_summary
)
```

   The compaction check then computes `total_after = system_prompt_tokens + summary_tokens + recent_tokens`, so it adds the new summary's tokens on top of a figure that already includes the stored summary; the summary budget is counted twice.

4. No ORM-level integrity check for `compacted_before_message_id`. Using a plain `BigIntegerField` means nothing guarantees the bookmark points at a message that still exists in this conversation.

5. After the second compaction, the persisted summary repeats its header prefix. The duplicate prefix is harmless but slightly noisy. Consider stripping it from subsequent merges or using a single header:

```python
merged_summary = stored_summary.rstrip() + "\n\n--- (continued) ---\n" + summary_text
```

**Code Quality**

6. Unreachable edge case:

```python
# Edge case: if even min_recent messages exceed the threshold,
# keep only the very last message to guarantee *some* context.
if recent_count < 1:
    recent_count = 1
```

7. Deprecated asyncio pattern in the tests:

```python
asyncio.get_event_loop().run_until_complete(manager.persist_compaction(...))
```

   In Python 3.10+ this emits a `DeprecationWarning`; prefer:

```python
asyncio.run(manager.persist_compaction(...))
```

   Or, better, make the test method async so no manual event-loop handling is needed.

8. Naive first-sentence extraction in the deterministic summary:

```python
sentence = m.content.split(".")[0].strip()
```

   LLM responses frequently use markdown, bullet points, or no terminating period at all, so this often returns a fragment rather than a sentence.

**Testing**

9. Truncation boundary assertions:

```python
self.assertLessEqual(len(result), 50 + len(notice) + some_slack)
self.assertTrue(result.startswith("y"))  # content from beginning, not end
```

10. No test for the losing side of the optimistic lock. The optimistic-lock test is good, but there's no test verifying the behaviour when the guarded update matches no rows.
Overall this is a solid addition and the architecture (compact-once, persist bookmark, cheap reads) is well thought out. The bugs above are all fixable with small changes; the truncation one in particular should be addressed before merge.
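For reference, a minimal version of the guarded truncation from item 1. `MAX_TOOL_OUTPUT_CHARS` and the notice template are named in the PR; the notice wording below is illustrative only.

```python
# Minimal sketch of the guarded truncation from item 1. The constant names follow
# the PR description; the notice text itself is illustrative.
MAX_TOOL_OUTPUT_CHARS = 50_000
TOOL_OUTPUT_TRUNCATION_NOTICE = (
    "\n\n[Output truncated at {limit} characters; use range parameters to page through it.]"
)


def truncate_tool_output(output: str, max_chars: int = MAX_TOOL_OUTPUT_CHARS) -> str:
    if len(output) <= max_chars:
        return output
    notice = TOOL_OUTPUT_TRUNCATION_NOTICE.format(limit=max_chars)
    # Clamp so a max_chars smaller than the notice cannot produce a negative slice.
    char_budget = max(0, max_chars - len(notice))
    return output[:char_budget] + notice
```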
Signed-off-by: JSIV <5049984+JSv4@users.noreply.github.com>
Code Review: Context Guardrails & Conversation Compaction (#902)

Overall this is a well-structured addition. The three-layer approach (token estimation → compaction → tool output truncation), the dataclass-based config, optimistic locking for concurrent writes, and the proxy pattern for pure-unit testing are all solid design choices. The test suite is thorough. A few issues worth addressing before merge:

**Bugs / Correctness**

1. Negative slice when the limit is smaller than the notice:

```python
# context_guardrails.py
truncated = output[: max_chars - len(notice)] + notice
```

   If `max_chars` is smaller than the notice, the slice index goes negative and the output is mangled; clamp the budget with `max(0, ...)`.

2. Misleading comment on the `recent_count` fallback:

```python
# compact_message_history
recent_count = min(min_recent, len(messages)) # ← starts at min_recent
...
# Edge case: if even min_recent messages exceed the threshold,
# keep only the very last message to guarantee *some* context.
if recent_count < 1:
    recent_count = 1
```

   When `min_recent >= 1` and the message list is non-empty, `recent_count` never drops below 1, so the "keep only the very last message" fallback the comment promises is unreachable.

**Design / Architecture**

3. Summary grows unboundedly across compaction cycles:

```python
merged_summary = stored_summary.rstrip() + "\n\n" + result.summary
```

   After N compaction rounds, the stored summary has accumulated every earlier summary and keeps growing.

4. Late import inside `_get_message_history()`:

```python
async def _get_message_history(self) -> ...:
    from opencontractserver.llms.context_guardrails import (
        compact_message_history, estimate_token_count, messages_to_proxies,
    )
```

   This import runs on every call. If the motivation is to break a circular import, that's fine but worth a brief comment. If there's no circular import concern, moving it to the module level avoids the repeated overhead and makes the dependency graph explicit.

5. Uncovered …

**Code Quality**

6. Naive first-sentence split in the deterministic summary (a sketch follows at the end of this list):

```python
sentence = m.content.split(".")[0].strip()
```

   Splits on the first literal period, so "Dr. Smith said…" → "Dr", "1.5 million tokens" → "1", and any URL truncates at the first dot. Even a simple sentence-boundary regex would handle most of these better.

7. Deprecated async pattern in test:

```python
asyncio.get_event_loop().run_until_complete(manager.persist_compaction(...))
```
8. No validation in `CompactionConfig`.
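A sketch of the kind of sentence-boundary split item 6 asks for; the regex is an example, not the PR's code.

```python
# Example sentence-boundary split for the deterministic summary (item 6). Requiring
# whitespace after the terminator fixes "1.5 million" and URLs; abbreviations such
# as "Dr." would still need a smarter splitter.
import re

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def first_sentence(text: str) -> str:
    """Return text up to the first ., !, or ? that is followed by whitespace."""
    return _SENTENCE_END.split(text.strip(), maxsplit=1)[0].strip()
```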
What's done well
Summary
Implements a comprehensive context management system to prevent LLM context overflow on long-running conversations. Inspired by Claude Code's approach, this adds three complementary safeguards: fast heuristic token estimation, automatic conversation compaction via summarization, and hard limits on tool output sizes.
Key Changes
New module: `opencontractserver/llms/context_guardrails.py`

- `estimate_token_count()`: Fast heuristic token counter (~3.5 chars/token) that avoids heavyweight tokenizer imports
- `get_context_window_for_model()`: Model-aware context window lookup with exact and prefix matching
- `truncate_tool_output()`: Hard ceiling on individual tool outputs before they enter conversation history
- `compact_message_history()`: Main compaction algorithm that replaces old messages with concise summaries while preserving recent turns verbatim
- `should_compact()`: Decision heuristic based on context window utilization threshold
- `_deterministic_summary()`: Fallback non-LLM summary builder for fast/synchronous paths
- `messages_to_proxies()`: Bridge between Django ORM ChatMessage objects and lightweight `_MessageProxy` for testing
- `CompactionConfig`: Per-agent configuration dataclass for tuning compaction behavior
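A rough usage sketch of this surface. The exact signatures, keyword names, the example model string, and the `CompactionResult` attributes are assumptions based on the list above, not copied from the module.

```python
# Rough usage sketch only; signatures, keyword names, the model string, and the
# result attributes are assumptions based on the function list above.
from types import SimpleNamespace

from opencontractserver.llms.context_guardrails import (
    CompactionConfig,
    compact_message_history,
    estimate_token_count,
    get_context_window_for_model,
    should_compact,
)

messages = [
    SimpleNamespace(content="Summarize the lease."),
    SimpleNamespace(content="Section 4 covers rent escalation."),
]
window = get_context_window_for_model("gpt-4o")  # example model name
history_tokens = sum(estimate_token_count(m.content) for m in messages)

if should_compact(history_tokens, window):
    result = compact_message_history(messages, config=CompactionConfig())  # deterministic fallback summary
    messages = result.messages  # recent turns kept verbatim, older turns folded into a summary
```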
New module: `opencontractserver/constants/context_guardrails.py`

- `COMPACTION_THRESHOLD_RATIO` (0.75), `MIN_RECENT_MESSAGES` (4), `MAX_RECENT_MESSAGES` (20)
- `MAX_TOOL_OUTPUT_CHARS` (50K), truncation notice template
- `COMPACTION_SUMMARY_TARGET_TOKENS` (300)
- `CHARS_PER_TOKEN_ESTIMATE` (3.5)
opencontractserver/llms/agents/pydantic_ai_agents.py)_get_message_history()now applies compaction when enabledTool output truncation (
opencontractserver/llms/tools/pydantic_ai_tools.py)truncate_tool_output()to all return values before insertion into historyComprehensive test suite (
opencontractserver/tests/test_context_guardrails.py)Updated exports (
opencontractserver/constants/__init__.py)Notable Implementation Details
_MessageProxydecouples compaction logic from Django ORM, enabling fast pure-unit testingCompactionConfigallows operators to tune thresholds without code changeshttps://claude.ai/code/session_01QuMdF6v5xWrMrrMdRHvVv2