Preserve context across retries and compressed history recovery#64
Merged
Preserve context across retries and compressed history recovery#64
Conversation
veithly
added a commit
that referenced
this pull request
Apr 22, 2026
The runtime was correctly preserving user turns and retrying overflow cases, but GPT-5.4 still needed a repo-level 400k budget and preflight compaction based on the configured context window instead of model-specific logic inside AgentLoop. This change moves the GPT-5.4 cap into config, keeps the loop generic, pre-compresses runtime history when the rebuilt request is close to the configured context window, and treats request-too-large TPM errors as overflow recovery instead of ordinary rate-limit retries. Constraint: Keep context-window policy in config rather than hardcoded per-model loop branches. Rejected: 1M GPT-5.4 mapping and provider/model-specific request budgeting inside AgentLoop. Confidence: High Scope-risk: Medium; touches context budgeting, overflow recovery, and related tests. Reversibility: High; limited to config, retry classification, and runtime compaction thresholds. Directive: Cap GPT-5.4 at 400k and proactively compact near the configured window before provider calls. Tested: python -m pytest tests/test_provider_retry.py tests/test_runtime_compaction.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking -q Not-tested: Live GPT-5.4 sandbox sessions against the current org quota. Related: PR #64
veithly
added a commit
that referenced
this pull request
Apr 22, 2026
Context: - add a namespaced context-window env var while keeping the legacy alias - route gateway startup through the shared config loader instead of ad-hoc env parsing - document the GPT-5.4 400K operational cap and proactive compaction behavior Constraint: Preserve YAML > env precedence and existing retry/compaction behavior. Rejected: Hardcoding provider-specific context logic in AgentLoop; keeping gateway-only CONTEXT_WINDOW parsing. Confidence: high Scope-risk: low Reversibility: high Directive: Keep context limits configurable without clearing persisted session history. Tested: python -m pytest tests/test_config_and_channels.py tests/test_provider_retry.py tests/test_runtime_compaction.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking -q Not-tested: Manual gateway boot with .env/config.yaml combinations. Related: PR #64
veithly
added a commit
that referenced
this pull request
Apr 22, 2026
Context: - preflight runtime compaction was triggering too early for long sessions - rebuilding persisted history plus full inline truncation delayed first chunks and tool results - tighten the soft threshold so proactive compaction runs only near the hard limit Constraint: Preserve overflow-triggered compaction/retry and non-destructive session history semantics. Rejected: Removing proactive compaction entirely; reintroducing destructive session trimming. Confidence: medium-high Scope-risk: low Reversibility: high Directive: Keep long WS sessions responsive without losing the active request. Tested: python -m pytest tests/test_runtime_compaction.py tests/test_session_persistence.py tests/test_provider_retry.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking tests/test_config_and_channels.py -q Not-tested: Manual live websocket run against a long real session. Related: PR #64
veithly
added a commit
that referenced
this pull request
Apr 22, 2026
Compaction was preserving stale assistant conclusions in the active window, which could override the latest user request after long sessions. Neutralize older assistant turns, strengthen the history-compacted marker, and lock the behavior with regressions. Constraint: preserve persisted session history and existing search_history recovery path. Rejected: clearing old context entirely; keeping stale assistant prose visible as authoritative context. Confidence: high Scope-risk: medium Reversibility: high Directive: keep the latest real user request authoritative unless the user manually clears context. Tested: python -m pytest tests/test_runtime_compaction.py::test_insert_compaction_marker_is_runtime_only tests/test_runtime_compaction.py::test_compress_runtime_context_neutralizes_old_assistant_conclusions_but_keeps_latest_user_request -q Not-tested: full spoon-bot suite Related: PR #64
954c2aa to
11d015d
Compare
Provider failures and long GPT-5.4 sessions were letting the active request fall out of runtime context, and aggressive preflight compaction could delay websocket/tool output or leave stale assistant conclusions steering the agent. Consolidate the retry, budgeting, and compaction fixes so retries keep the active user turn, proactive compaction only runs near the configured limit, GPT-5.4 uses a configurable 400k cap, and compacted runtime history keeps the latest user intent authoritative without clearing persisted transcripts. Constraint: Keep persisted session history intact unless the user clears it manually, and keep context-window policy configurable via YAML/env instead of hardcoded loop branches. Rejected: Automatic non-overflow trimming retries, a 1M GPT-5.4 cap, and destructive session/context clearing. Confidence: High Scope-risk: Medium; touches retry, context budgeting, websocket responsiveness, and runtime compaction paths. Reversibility: High; limited to agent loop/config/retry behavior and focused tests. Directive: Preserve the active request across retries and only compact older runtime history when overflow recovery is actually needed. Tested: python -m pytest tests/test_provider_retry.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking tests/test_runtime_compaction.py tests/test_runtime_message_sequence.py tests/test_config_and_channels.py -q Not-tested: Live multimodal and GPT-5.4 websocket sessions against production providers. Related: PR #64
Context compression was still letting stale assistant conclusions and repetition guards outrank the newest user instruction, especially when long multilingual requests put the final delivery contract near the end of the turn. Preserve the request head and tail, add an authoritative request-ending block copied verbatim from the latest user message, and turn anti-loop tracking into an internal repetition guard instead of a summary prompt. Constraint: Do not clear persisted history and do not rely on language-specific keyword matching. Rejected: Per-language output-contract heuristics for English/Chinese/Japanese/Spanish tails. Confidence: high Scope-risk: medium Reversibility: high Directive: Keep runtime compaction focused on preserving the latest request contract so the agent continues executing and finishes in the requested format. Tested: python -m pytest tests/test_runtime_message_sequence.py tests/test_runtime_compaction.py tests/test_streaming_thinking.py -q; real OpenAI gpt-5.4 probes with forced compression for Spanish exact-output and Japanese tool-execution flows Not-tested: Full gateway/e2e suite Related: PR #64
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Testing
python -m pytest tests/test_provider_retry.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking tests/test_runtime_compaction.py tests/test_runtime_message_sequence.py -qgpt-5.4probes with forced runtime compaction for Spanish exact-output and Japanese tool-execution flows