[v2] Add configurable compaction retention policy #2407

Open
SivanCola wants to merge 6 commits into esengine:main-v2 from SivanCola:feat/keep-policy

@SivanCola
Collaborator

Summary

Introduce configuration fields for compaction retention behavior, including keep policy, compact ratio, and recent message count.

Root Cause

Long-running sessions need more control over what survives compaction and when compaction triggers.

Technical Approach

Add KeepPolicy flags, TOML configuration fields, and rendered config hints for keep policy, compaction ratio, and recent keep count.

Focused Optimization Points

  • Preserves default behavior when config is unset.
  • Exposes advanced compaction knobs through reasonix.toml.
  • Keeps the configuration surface explicit and documented in rendered TOML.
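The first point in the list above, keeping default behavior when the config is unset, could be sketched roughly as follows. This is a minimal illustration and not the PR's code: the struct name, field types, and both default values are assumptions; only the TOML key names `keep`, `compact_ratio`, and `recent_keep` come from the PR.

```go
package main

import "fmt"

// CompactionConfig mirrors the three TOML keys named in this PR.
// The struct name and field types are assumptions for illustration.
type CompactionConfig struct {
	Keep         []string `toml:"keep"`
	CompactRatio float64  `toml:"compact_ratio"`
	RecentKeep   int      `toml:"recent_keep"`
}

// Hypothetical defaults; the real values are not stated in the PR.
const (
	defaultCompactRatio = 0.8
	defaultRecentKeep   = 4
)

// withDefaults returns a copy in which unset (zero) or out-of-range
// fields fall back to the defaults, so an empty config section
// behaves exactly like no config at all.
func withDefaults(c CompactionConfig) CompactionConfig {
	if c.CompactRatio <= 0 || c.CompactRatio > 1 {
		c.CompactRatio = defaultCompactRatio
	}
	if c.RecentKeep <= 0 {
		c.RecentKeep = defaultRecentKeep
	}
	return c
}

func main() {
	// Unset config: both numeric knobs take the defaults.
	fmt.Printf("%+v\n", withDefaults(CompactionConfig{}))
	// Explicit values are left untouched.
	fmt.Printf("%+v\n", withDefaults(CompactionConfig{CompactRatio: 0.5, RecentKeep: 10}))
}
```

Treating the zero value as "unset" keeps the struct simple, at the cost of not being able to distinguish an explicit `recent_keep = 0` from a missing key; pointer fields would be one way around that if the distinction matters.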

Verification

Not run during this publishing pass. Draft note: reviewers should confirm the keep policy is connected to the actual compaction selection logic before marking ready.

SivanCola added 2 commits May 31, 2026 00:30
Adds KeepPolicy bitmask (KeepErrors, KeepUserMarked) to control which
messages survive compaction. Configurable via keep, compact_ratio,
and recent_keep in reasonix.toml.
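A bitmask like the one this commit message describes might look roughly like the sketch below. The flag names `KeepErrors` and `KeepUserMarked` are taken from the commit message; the underlying type, the `Has` helper, the `parseKeep` function, and the string spellings `"errors"` and `"user_marked"` are assumptions made for illustration.

```go
package main

import "fmt"

// KeepPolicy is a set of flags; each flag names a class of message
// that should survive compaction. The uint8 width is an assumption.
type KeepPolicy uint8

const (
	KeepErrors     KeepPolicy = 1 << iota // 1
	KeepUserMarked                        // 2
)

// Has reports whether the given flag is set in p.
func (p KeepPolicy) Has(flag KeepPolicy) bool { return p&flag != 0 }

// parseKeep turns a list of names, as they might appear under the
// `keep` key, into a KeepPolicy. The accepted spellings here are
// hypothetical.
func parseKeep(names []string) (KeepPolicy, error) {
	var p KeepPolicy
	for _, n := range names {
		switch n {
		case "errors":
			p |= KeepErrors
		case "user_marked":
			p |= KeepUserMarked
		default:
			return 0, fmt.Errorf("unknown keep flag %q", n)
		}
	}
	return p, nil
}

func main() {
	p, err := parseKeep([]string{"errors"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("keep errors:", p.Has(KeepErrors))
	fmt.Println("keep user-marked:", p.Has(KeepUserMarked))
}
```

An empty list yields the zero policy, which is consistent with the PR's stated goal of leaving behavior unchanged when nothing is configured.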
@SivanCola SivanCola changed the title from "Add configurable compaction retention policy" to "[v2] Add configurable compaction retention policy" May 31, 2026
@SivanCola SivanCola marked this pull request as ready for review May 31, 2026 09:25

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a2cc76228

Comment thread internal/config/config.go
Comment thread internal/agent/agent.go
@SivanCola
Collaborator Author

Resolved the merge conflict with main-v2 in internal/agent/compact.go and pushed commit 2bfbd78b to the PR branch.

Root cause: both sides added adjacent compaction-related functionality. This PR introduced the configurable keep-policy helpers, while main-v2 added the summarize-from / summarize-up-to checkpoint flow.

Solution: preserved both changes. Automatic compaction still applies the keep policy before summarization, and the new manual summarization entry points from main-v2 remain available.

Verification:
- git diff --check
- go test ./internal/agent ./internal/control ./internal/cli
- go test ./...

GitHub now reports the PR as mergeable.
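To make "applies the keep policy before summarization" concrete, here is one way such a selection step could be written. It is a sketch under assumptions, not the code in internal/agent/compact.go: the `Message` type, its fields, and the `partition` function are hypothetical, and the real selection logic may differ.

```go
package main

import "fmt"

type KeepPolicy uint8

const (
	KeepErrors KeepPolicy = 1 << iota
	KeepUserMarked
)

// Message is a hypothetical stand-in for a history entry.
type Message struct {
	ID         int
	IsError    bool
	UserMarked bool
}

// partition splits history into messages to fold into a summary and
// messages to keep verbatim. The last recentKeep messages are always
// kept; older ones are kept only if the policy selects them.
func partition(history []Message, policy KeepPolicy, recentKeep int) (fold, keep []Message) {
	if recentKeep < 0 {
		recentKeep = 0
	}
	tailStart := len(history) - recentKeep
	if tailStart < 0 {
		tailStart = 0
	}
	for i, m := range history {
		switch {
		case i >= tailStart:
			keep = append(keep, m)
		case policy&KeepErrors != 0 && m.IsError:
			keep = append(keep, m)
		case policy&KeepUserMarked != 0 && m.UserMarked:
			keep = append(keep, m)
		default:
			fold = append(fold, m)
		}
	}
	return fold, keep
}

func main() {
	history := []Message{
		{ID: 1}, {ID: 2, IsError: true}, {ID: 3, UserMarked: true}, {ID: 4}, {ID: 5},
	}
	fold, keep := partition(history, KeepErrors, 1)
	fmt.Println("fold:", fold)
	fmt.Println("keep:", keep)
}
```

With `KeepErrors` set and `recentKeep` of 1, message 2 (an error) and message 5 (the tail) are kept, while messages 1, 3, and 4 go to the summarizer. Only the `fold` slice would then be handed to summarization.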

esengine pushed a commit that referenced this pull request Jun 4, 2026
The bot needs to show whether a task actually triggered compaction — that's the signal the cache/compaction PRs (#2405-#2407) are measured by. metricsSink counts CompactionStarted; e2ebench adds a Compactions total to the summary and a per-task Compact column.
esengine added a commit that referenced this pull request Jun 4, 2026
* ci(e2e-bot): drive the PR head, not main-v2

The bot built reasonix from main-v2, so /e2e on a PR measured main-v2's agent, not the PR's code — worthless for pre-merge validation. Build the agent from the PR head (falling back to main-v2 only when the head predates run --metrics), shrink the e2e context_window to 20000 so the tiny suite actually crosses the compaction trigger, and add a compaction task whose six prose chapters force multi-file reads past the threshold while hiding the graded facts in the first and last chapter.

Harness and suite still come from main-v2 so a PR can't weaken its own grader or tests.

* feat(run): count compactions in run --metrics and the e2e report

The bot needs to show whether a task actually triggered compaction — that's the signal the cache/compaction PRs (#2405-#2407) are measured by. metricsSink counts CompactionStarted; e2ebench adds a Compactions total to the summary and a per-task Compact column.

* test(e2e): make the compaction task a sequential clue-chain

A single 'read all six files' prompt let the agent batch the reads into one turn, so the whole corpus landed in the kept tail with no foldable middle and compaction never fired. Chaining each chapter to the next forces one read per turn; history accumulates and folds, and a real run now triggers 3 auto-compactions. The final chapter restates the full deliverable so the task stays solvable across a degraded summary.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
@esengine
Owner

esengine commented Jun 4, 2026

Running the suite to see how the configurable compaction retention policy affects the compaction task's cache-hit rate — this is the exact area the #3091 cache investigation pointed at (main-v2 baseline sits at 68% on that task). Watching whether the retention knob moves it.

/e2e

@github-actions

github-actions Bot commented Jun 4, 2026

🤖 Reasonix e2e benchmark

Accuracy: 4/4 (100%) · Cache hit: 76% · Tokens: 167,099 (prompt 164,874 / completion 2,225) · Compactions: 3 · Cost: ¥ 0.0463

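| Task | Result | Steps | Prompt | Completion | Cache hit | Compact | Cost |
|---|---|---|---|---|---|---|---|
| compaction | ✅ pass | 8 | 102,625 | 898 | 63% | 3 | ¥ 0.0413 |
| fix-add-bug | ✅ pass | 3 | 18,395 | 270 | 98% | 0 | ¥ 0.0012 |
| fizzbuzz | ✅ pass | 3 | 18,704 | 498 | 99% | 0 | ¥ 0.0015 |
| palindrome | ✅ pass | 4 | 25,150 | 559 | 98% | 0 | ¥ 0.0022 |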

Real provider run. Cache-hit % is cached prompt tokens / total prompt tokens.

agent: main-v2 fallback (PR head lacks run --metrics) · triggered by @esengine
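The report above defines cache-hit % as cached prompt tokens divided by total prompt tokens, so the overall figure is a token-weighted average of the per-task figures rather than a plain mean. The sketch below checks that arithmetic using the reported per-task numbers. The per-task cached-token counts are not published, so they are estimated here from the rounded percentages; the type and function names are illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

// taskRow holds the two columns needed for the check, as reported.
type taskRow struct {
	name        string
	prompt      int     // prompt tokens
	cacheHitPct float64 // rounded percentage from the report
}

// reportedRows copies the per-task figures from the benchmark comment.
var reportedRows = []taskRow{
	{"compaction", 102625, 63},
	{"fix-add-bug", 18395, 98},
	{"fizzbuzz", 18704, 99},
	{"palindrome", 25150, 98},
}

// estimatedOverallPct estimates cached tokens per task from the rounded
// percentages, then recomputes cached / total across all tasks.
func estimatedOverallPct(rows []taskRow) (totalPrompt int, pct float64) {
	var cached float64
	for _, r := range rows {
		cached += float64(r.prompt) * r.cacheHitPct / 100
		totalPrompt += r.prompt
	}
	if totalPrompt == 0 {
		return 0, 0
	}
	return totalPrompt, 100 * cached / float64(totalPrompt)
}

func main() {
	total, pct := estimatedOverallPct(reportedRows)
	fmt.Printf("prompt tokens: %d, estimated cache hit: %.0f%%\n", total, math.Round(pct))
	// prints "prompt tokens: 164874, estimated cache hit: 76%"
}
```

Both results agree with the summary line (prompt 164,874, cache hit 76%). Because the compaction task accounts for well over half of the prompt tokens, its 63% rate pulls the overall figure far below the 98-99% seen on the other three tasks.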
