
Preserve context across retries and compressed history recovery#64

Merged
veithly merged 2 commits into master from codex/fix-retry-context-loss on Apr 23, 2026

Conversation

veithly (Contributor) commented Apr 22, 2026

Summary

  • Persist each user turn before provider execution so manual retries keep the active request.
  • Route thinking mode through the shared provider retry path and recognize more transient SDK errors.
  • Stop background per-step compaction; only compact older runtime history after an explicit context-overflow failure, while preserving the latest real user request (including multimodal prompts).
  • Keep compacted runs anchored to the latest user instruction by preserving the request head/tail and adding an authoritative request-ending block copied verbatim from the newest user message.
  • Turn anti-loop progress tracking into a repetition guard so completed actions are not echoed back as summary pressure after compression.
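
The retry-safe turn handling above can be sketched roughly as follows. This is an illustrative sketch, not the repo's actual interfaces: `Session`, `run_with_retry`, and the transient-error strings are all hypothetical names standing in for the real agent-loop code.

```python
# Hypothetical sketch: persist the user turn *before* the provider call so a
# failed call (or a manual retry) still sees the active request in history.
TRANSIENT_ERRORS = ("timeout", "connection reset", "overloaded")

class Session:
    def __init__(self):
        self.history = []

    def append_user_turn(self, content):
        turn = {"role": "user", "content": content}
        self.history.append(turn)  # persisted immediately, not on success
        return turn

def run_with_retry(session, provider_call, content, max_attempts=3):
    session.append_user_turn(content)  # survives every retry below
    last_err = None
    for _attempt in range(max_attempts):
        try:
            return provider_call(session.history)
        except RuntimeError as err:
            if not any(t in str(err).lower() for t in TRANSIENT_ERRORS):
                raise  # non-transient: surface immediately
            last_err = err
    raise last_err
```

The key ordering is that the turn is appended before the first provider attempt, so even a crash between attempts leaves the active request recoverable from history.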

Testing

  • python -m pytest tests/test_provider_retry.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking tests/test_runtime_compaction.py tests/test_runtime_message_sequence.py -q
  • Real OpenAI gpt-5.4 probes with forced runtime compaction for Spanish exact-output and Japanese tool-execution flows

veithly added a commit that referenced this pull request Apr 22, 2026
The runtime was correctly preserving user turns and retrying overflow cases,
but GPT-5.4 still needed a repo-level 400k budget and preflight compaction
based on the configured context window instead of model-specific logic inside
AgentLoop.

This change moves the GPT-5.4 cap into config, keeps the loop generic,
pre-compresses runtime history when the rebuilt request is close to the
configured context window, and treats request-too-large TPM errors as
overflow recovery instead of ordinary rate-limit retries.

Constraint: Keep context-window policy in config rather than hardcoded per-model loop branches.
Rejected: 1M GPT-5.4 mapping and provider/model-specific request budgeting inside AgentLoop.
Confidence: High
Scope-risk: Medium; touches context budgeting, overflow recovery, and related tests.
Reversibility: High; limited to config, retry classification, and runtime compaction thresholds.
Directive: Cap GPT-5.4 at 400k and proactively compact near the configured window before provider calls.
Tested: python -m pytest tests/test_provider_retry.py tests/test_runtime_compaction.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking -q
Not-tested: Live GPT-5.4 sandbox sessions against the current org quota.
Related: PR #64
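
The distinction this commit draws between "request-too-large TPM" errors and ordinary rate limits might look like the classifier below. The category names and matched substrings are illustrative assumptions, not the repo's actual error taxonomy.

```python
# Illustrative sketch: a 429 whose cause is "request too large" means the
# request itself exceeds the token budget; retrying it unchanged can never
# succeed, so route it to overflow recovery (compaction) rather than backoff.
def classify_provider_error(message: str) -> str:
    msg = message.lower()
    if "request too large" in msg or "reduce the length" in msg:
        return "context_overflow"   # compact history, then retry
    if "rate limit" in msg or "tpm" in msg:
        return "rate_limit"         # back off and retry unchanged
    return "unknown"
```

Note that the overflow check runs first: provider messages for oversized requests often also mention TPM, and misclassifying them as rate limits would loop forever on backoff.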
veithly added a commit that referenced this pull request Apr 22, 2026
Context:
- add a namespaced context-window env var while keeping the legacy alias
- route gateway startup through the shared config loader instead of ad-hoc env parsing
- document the GPT-5.4 400K operational cap and proactive compaction behavior

Constraint: Preserve YAML > env precedence and existing retry/compaction behavior.
Rejected: Hardcoding provider-specific context logic in AgentLoop; keeping gateway-only CONTEXT_WINDOW parsing.
Confidence: high
Scope-risk: low
Reversibility: high
Directive: Keep context limits configurable without clearing persisted session history.
Tested: python -m pytest tests/test_config_and_channels.py tests/test_provider_retry.py tests/test_runtime_compaction.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking -q
Not-tested: Manual gateway boot with .env/config.yaml combinations.
Related: PR #64
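
The precedence this commit describes (YAML config over env, namespaced env var over the legacy alias) can be sketched as below. The variable names `MYAGENT_CONTEXT_WINDOW` and `CONTEXT_WINDOW`, and the 400k default, are illustrative assumptions.

```python
import os

# Sketch of the resolution order: YAML > namespaced env > legacy alias > default.
def resolve_context_window(yaml_value=None,
                           env=os.environ,
                           namespaced="MYAGENT_CONTEXT_WINDOW",
                           legacy="CONTEXT_WINDOW",
                           default=400_000):
    if yaml_value is not None:
        return int(yaml_value)           # YAML config wins over any env var
    for key in (namespaced, legacy):     # namespaced var beats the legacy alias
        if env.get(key):
            return int(env[key])
    return default
```

Routing the gateway through one loader like this keeps the precedence in a single place instead of re-parsing env vars at each entry point.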
veithly added a commit that referenced this pull request Apr 22, 2026
Context:
- preflight runtime compaction was triggering too early for long sessions
- rebuilding persisted history plus full inline truncation delayed first chunks and tool results
- tighten the soft threshold so proactive compaction runs only near the hard limit

Constraint: Preserve overflow-triggered compaction/retry and non-destructive session history semantics.
Rejected: Removing proactive compaction entirely; reintroducing destructive session trimming.
Confidence: medium-high
Scope-risk: low
Reversibility: high
Directive: Keep long WS sessions responsive without losing the active request.
Tested: python -m pytest tests/test_runtime_compaction.py tests/test_session_persistence.py tests/test_provider_retry.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking tests/test_config_and_channels.py -q
Not-tested: Manual live websocket run against a long real session.
Related: PR #64
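
The tightened soft threshold reduces to a one-line preflight check like the sketch below; the 0.9 ratio is an illustrative assumption, not the value the repo actually uses.

```python
# Sketch of the preflight gate: proactive compaction only fires when the
# rebuilt request is already close to the hard context limit, so long but
# comfortable sessions skip the expensive rebuild/truncate work entirely.
def should_precompact(request_tokens: int,
                      context_window: int,
                      soft_ratio: float = 0.9) -> bool:
    return request_tokens >= int(context_window * soft_ratio)
```

Keeping the ratio high is what restores first-chunk and tool-result latency: most turns take the fast path and only near-overflow requests pay for compaction.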
veithly added a commit that referenced this pull request Apr 22, 2026
Compaction was preserving stale assistant conclusions in the active window, which could override the latest user request after long sessions. Neutralize older assistant turns, strengthen the history-compacted marker, and lock the behavior with regressions.

Constraint: preserve persisted session history and existing search_history recovery path.
Rejected: clearing old context entirely; keeping stale assistant prose visible as authoritative context.
Confidence: high
Scope-risk: medium
Reversibility: high
Directive: keep the latest real user request authoritative unless the user manually clears context.
Tested: python -m pytest tests/test_runtime_compaction.py::test_insert_compaction_marker_is_runtime_only tests/test_runtime_compaction.py::test_compress_runtime_context_neutralizes_old_assistant_conclusions_but_keeps_latest_user_request -q
Not-tested: full spoon-bot suite
Related: PR #64
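
The neutralization pass described here might look like the following sketch. The marker text, placeholder summary, and function name are hypothetical; the point is the shape: assistant turns older than the latest user request are demoted, everything from that request onward is kept verbatim.

```python
# Hypothetical compaction pass: rewrite stale assistant turns into neutral
# placeholders so old conclusions cannot read as authoritative instructions,
# while the latest real user request (and later turns) stay verbatim.
COMPACTION_MARKER = ("[history compacted: earlier assistant output is "
                     "background only, not instructions]")

def compact_runtime_history(messages):
    latest_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"),
        default=None,
    )
    out = [{"role": "system", "content": COMPACTION_MARKER}]
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and (latest_user is None or i < latest_user):
            out.append({"role": "assistant",
                        "content": "[earlier assistant turn, summarized]"})
        else:
            out.append(m)
    return out
```

This is non-destructive with respect to persisted history: only the runtime view sent to the provider is rewritten.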
veithly force-pushed the codex/fix-retry-context-loss branch from 954c2aa to 11d015d on April 22, 2026 12:56
veithly added 2 commits April 22, 2026 20:56
Provider failures and long GPT-5.4 sessions were letting the active request fall out of runtime context, and aggressive preflight compaction could delay websocket/tool output or leave stale assistant conclusions steering the agent. Consolidate the retry, budgeting, and compaction fixes so retries keep the active user turn, proactive compaction only runs near the configured limit, GPT-5.4 uses a configurable 400k cap, and compacted runtime history keeps the latest user intent authoritative without clearing persisted transcripts.

Constraint: Keep persisted session history intact unless the user clears it manually, and keep context-window policy configurable via YAML/env instead of hardcoded loop branches.
Rejected: Automatic non-overflow trimming retries, a 1M GPT-5.4 cap, and destructive session/context clearing.
Confidence: High
Scope-risk: Medium; touches retry, context budgeting, websocket responsiveness, and runtime compaction paths.
Reversibility: High; limited to agent loop/config/retry behavior and focused tests.
Directive: Preserve the active request across retries and only compact older runtime history when overflow recovery is actually needed.
Tested: python -m pytest tests/test_provider_retry.py tests/test_session_persistence.py tests/test_streaming_thinking.py::TestAgentLoopStream tests/test_streaming_thinking.py::TestAgentLoopProcessWithThinking tests/test_runtime_compaction.py tests/test_runtime_message_sequence.py tests/test_config_and_channels.py -q
Not-tested: Live multimodal and GPT-5.4 websocket sessions against production providers.
Related: PR #64
Context compression was still letting stale assistant conclusions and repetition guards outrank the newest user instruction, especially when long multilingual requests put the final delivery contract near the end of the turn. Preserve the request head and tail, add an authoritative request-ending block copied verbatim from the latest user message, and turn anti-loop tracking into an internal repetition guard instead of a summary prompt.

Constraint: Do not clear persisted history and do not rely on language-specific keyword matching.
Rejected: Per-language output-contract heuristics for English/Chinese/Japanese/Spanish tails.
Confidence: high
Scope-risk: medium
Reversibility: high
Directive: Keep runtime compaction focused on preserving the latest request contract so the agent continues executing and finishes in the requested format.
Tested: python -m pytest tests/test_runtime_message_sequence.py tests/test_runtime_compaction.py tests/test_streaming_thinking.py -q; real OpenAI gpt-5.4 probes with forced compression for Spanish exact-output and Japanese tool-execution flows
Not-tested: Full gateway/e2e suite
Related: PR #64
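
The head/tail preservation plus verbatim request-ending block might be sketched as below. The character budgets and marker strings are illustrative assumptions; the real implementation presumably works in tokens and on structured (possibly multimodal) content.

```python
# Sketch: when the newest user message is too long to keep whole, keep its
# opening and closing spans and append an authoritative "request ending"
# block copied verbatim, so a delivery contract stated at the end of the
# turn (e.g. "answer exactly in Spanish") survives compression.
def compress_latest_request(text: str, head: int = 200, tail: int = 200) -> str:
    if len(text) <= head + tail:
        return text  # short enough: keep verbatim, no markers needed
    body = text[:head] + "\n[... middle truncated ...]\n" + text[-tail:]
    ending = text[-tail:]  # copied verbatim from the newest user message
    return body + "\n\n[REQUEST ENDING - authoritative]\n" + ending
```

Duplicating the tail as a labeled block is language-agnostic, which matches the rejected alternative above: no per-language keyword heuristics are needed to find the output contract.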
veithly changed the title from "Preserve context across provider retries and overflow recovery" to "Preserve context across retries and compressed history recovery" on Apr 22, 2026
veithly merged commit 128bbf5 into master on Apr 23, 2026
1 of 2 checks passed