Preserve context across retries and compressed history recovery by veithly · Pull Request #64 · XSpoonAi/spoon-bot

veithly · 2026-04-22T07:41:51Z

Summary

persist each user turn before provider execution so manual retries keep the active request
route thinking mode through the shared provider retry path and recognize more transient SDK errors
stop background per-step compaction, and only compact older runtime history after an explicit context-overflow failure while preserving the latest real user request, including multimodal prompts
keep compacted runs anchored to the latest user instruction by preserving the request head/tail and adding an authoritative request-ending block copied verbatim from the newest user message
turn anti-loop progress tracking into a repetition guard so completed actions do not get echoed back as summary pressure after compression

Testing

python -m pytest tests/test_provider_retry.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking tests/test_runtime_compaction.py tests/test_runtime_message_sequence.py -q
real OpenAI gpt-5.4 probes with forced runtime compaction for Spanish exact-output and Japanese tool-execution flows

The runtime was correctly preserving user turns and retrying overflow cases, but GPT-5.4 still needed a repo-level 400k budget and preflight compaction based on the configured context window instead of model-specific logic inside AgentLoop. This change moves the GPT-5.4 cap into config, keeps the loop generic, pre-compresses runtime history when the rebuilt request is close to the configured context window, and treats request-too-large TPM errors as overflow recovery instead of ordinary rate-limit retries. Constraint: Keep context-window policy in config rather than hardcoded per-model loop branches. Rejected: 1M GPT-5.4 mapping and provider/model-specific request budgeting inside AgentLoop. Confidence: High Scope-risk: Medium; touches context budgeting, overflow recovery, and related tests. Reversibility: High; limited to config, retry classification, and runtime compaction thresholds. Directive: Cap GPT-5.4 at 400k and proactively compact near the configured window before provider calls. Tested: python -m pytest tests/test_provider_retry.py tests/test_runtime_compaction.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking -q Not-tested: Live GPT-5.4 sandbox sessions against the current org quota. Related: PR #64

Context: - add a namespaced context-window env var while keeping the legacy alias - route gateway startup through the shared config loader instead of ad-hoc env parsing - document the GPT-5.4 400K operational cap and proactive compaction behavior Constraint: Preserve YAML > env precedence and existing retry/compaction behavior. Rejected: Hardcoding provider-specific context logic in AgentLoop; keeping gateway-only CONTEXT_WINDOW parsing. Confidence: high Scope-risk: low Reversibility: high Directive: Keep context limits configurable without clearing persisted session history. Tested: python -m pytest tests/test_config_and_channels.py tests/test_provider_retry.py tests/test_runtime_compaction.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking -q Not-tested: Manual gateway boot with .env/config.yaml combinations. Related: PR #64

Context: - preflight runtime compaction was triggering too early for long sessions - rebuilding persisted history plus full inline truncation delayed first chunks and tool results - tighten the soft threshold so proactive compaction runs only near the hard limit Constraint: Preserve overflow-triggered compaction/retry and non-destructive session history semantics. Rejected: Removing proactive compaction entirely; reintroducing destructive session trimming. Confidence: medium-high Scope-risk: low Reversibility: high Directive: Keep long WS sessions responsive without losing the active request. Tested: python -m pytest tests/test_runtime_compaction.py tests/test_session_persistence.py tests/test_provider_retry.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking tests/test_config_and_channels.py -q Not-tested: Manual live websocket run against a long real session. Related: PR #64

Compaction was preserving stale assistant conclusions in the active window, which could override the latest user request after long sessions. Neutralize older assistant turns, strengthen the history-compacted marker, and lock the behavior with regressions. Constraint: preserve persisted session history and existing search_history recovery path. Rejected: clearing old context entirely; keeping stale assistant prose visible as authoritative context. Confidence: high Scope-risk: medium Reversibility: high Directive: keep the latest real user request authoritative unless the user manually clears context. Tested: python -m pytest tests/test_runtime_compaction.py::test_insert_compaction_marker_is_runtime_only tests/test_runtime_compaction.py::test_compress_runtime_context_neutralizes_old_assistant_conclusions_but_keeps_latest_user_request -q Not-tested: full spoon-bot suite Related: PR #64

Provider failures and long GPT-5.4 sessions were letting the active request fall out of runtime context, and aggressive preflight compaction could delay websocket/tool output or leave stale assistant conclusions steering the agent. Consolidate the retry, budgeting, and compaction fixes so retries keep the active user turn, proactive compaction only runs near the configured limit, GPT-5.4 uses a configurable 400k cap, and compacted runtime history keeps the latest user intent authoritative without clearing persisted transcripts. Constraint: Keep persisted session history intact unless the user clears it manually, and keep context-window policy configurable via YAML/env instead of hardcoded loop branches. Rejected: Automatic non-overflow trimming retries, a 1M GPT-5.4 cap, and destructive session/context clearing. Confidence: High Scope-risk: Medium; touches retry, context budgeting, websocket responsiveness, and runtime compaction paths. Reversibility: High; limited to agent loop/config/retry behavior and focused tests. Directive: Preserve the active request across retries and only compact older runtime history when overflow recovery is actually needed. Tested: python -m pytest tests/test_provider_retry.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking tests/test_runtime_compaction.py tests/test_runtime_message_sequence.py tests/test_config_and_channels.py -q Not-tested: Live multimodal and GPT-5.4 websocket sessions against production providers. Related: PR #64

Context compression was still letting stale assistant conclusions and repetition guards outrank the newest user instruction, especially when long multilingual requests put the final delivery contract near the end of the turn. Preserve the request head and tail, add an authoritative request-ending block copied verbatim from the latest user message, and turn anti-loop tracking into an internal repetition guard instead of a summary prompt. Constraint: Do not clear persisted history and do not rely on language-specific keyword matching. Rejected: Per-language output-contract heuristics for English/Chinese/Japanese/Spanish tails. Confidence: high Scope-risk: medium Reversibility: high Directive: Keep runtime compaction focused on preserving the latest request contract so the agent continues executing and finishes in the requested format. Tested: python -m pytest tests/test_runtime_message_sequence.py tests/test_runtime_compaction.py tests/test_streaming_thinking.py -q; real OpenAI gpt-5.4 probes with forced compression for Spanish exact-output and Japanese tool-execution flows Not-tested: Full gateway/e2e suite Related: PR #64

veithly force-pushed the codex/fix-retry-context-loss branch from 954c2aa to 11d015d Compare April 22, 2026 12:56

veithly added 2 commits April 22, 2026 20:56

veithly changed the title ~~Preserve context across provider retries and overflow recovery~~ Preserve context across retries and compressed history recovery Apr 22, 2026

veithly merged commit 128bbf5 into master Apr 23, 2026
1 of 2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Preserve context across retries and compressed history recovery#64

Preserve context across retries and compressed history recovery#64
veithly merged 2 commits intomasterfrom
codex/fix-retry-context-loss

veithly commented Apr 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

veithly commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

veithly commented Apr 22, 2026 •

edited

Loading