Skip to content

fix(todo): refactor SetTodoList to persist state and prevent tool call storms#1742

Merged
RealKai42 merged 8 commits intomainfrom
kaiyi/montpellier
Apr 3, 2026
Merged

fix(todo): refactor SetTodoList to persist state and prevent tool call storms#1742
RealKai42 merged 8 commits intomainfrom
kaiyi/montpellier

Conversation

@RealKai42
Copy link
Copy Markdown
Collaborator

@RealKai42 RealKai42 commented Apr 3, 2026

Summary

  • Refactored SetTodoList to persist todos to SessionState (root agent) and independent state.json (sub-agents), enabling proper state management and isolation
  • Added query mode: omit todos param to read current state; pass [] to clear all todos
  • Added anti-storm guidance in tool description to prevent repeated calls without real progress
  • Protected against race conditions in Session.save_state() for the new todos field

Background

After updating to v1.29.0, users with Shell disabled experienced a "SetTodoList storm" where the model would repeatedly call SetTodoList without making real progress (up to 100 times per turn). Root causes:

  1. SetTodoList returned output="" — the model received no meaningful confirmation
  2. Without Shell, the model had no execution outlet and kept replanning

Changes

Core (src/kimi_cli/)

  • session_state.py: Added TodoItemState model and SessionState.todos field
  • tools/todo/__init__.py: Full rewrite — DI via Runtime, read/write/clear modes, root/subagent storage isolation
  • tools/todo/set_todo_list.md: Updated description with usage modes and anti-storm guidance
  • session.py: Added todos refresh in save_state() to prevent race condition overwrites

Tests

  • New tests/tools/test_todo.py with 12 test cases covering all modes, persistence, isolation, and edge cases
  • New tests/core/test_session.py::test_save_state_preserves_external_todos
  • Updated E2E snapshots and schema/description snapshots

Test plan

  • Unit tests: 2033 passed, 0 failed
  • E2E tests: all passing
  • make check (ruff + pyright): 0 errors
  • Pre-commit hooks: all passed
  • Manual test: verify SetTodoList no longer storms with Shell disabled

Closes #1710


Open with Devin

…l storms

Refactored the SetTodoList tool to address issue #1710 where the model
would repeatedly call SetTodoList in a loop ("storm") especially when
Shell was disabled.

Key changes:
- Persist todos to SessionState (root agent) and independent state.json
  (sub-agents) for proper state management
- Add query mode: omit todos param to read current state, pass [] to clear
- Add anti-storm guidance in tool description preventing repeated calls
  without progress
- Protect against race conditions in Session.save_state() for todos field
- Return non-empty output ("Todo list updated") so models get confirmation

Closes #1710
Copilot AI review requested due to automatic review settings April 3, 2026 06:58
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the SetTodoList tool to persist todo state (root session vs. subagent isolation), adds a query/clear mode to reduce repeated tool calls, and updates session persistence to avoid overwriting externally-written todos.

Changes:

  • Make SetTodoList support update/query/clear via optional todos param, with non-empty tool output and updated tool description guidance.
  • Persist root todos into SessionState (state.json) and persist subagent todos into per-subagent state.json.
  • Add/adjust unit + e2e tests and update schema/description snapshots; refresh Session.save_state() to preserve externally-updated todos.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/tools/test_tool_schemas.py Updates expected JSON schema to allow todos: null / omitted.
tests/tools/test_tool_descriptions.py Updates SetTodoList description snapshot with usage modes + anti-storm guidance.
tests/tools/test_todo.py Adds new unit tests for query/update/clear and root/subagent persistence behavior.
tests/core/test_session.py Adds regression test ensuring save_state() doesn’t overwrite externally-written todos.
tests/conftest.py Updates set_todo_list_tool fixture to inject Runtime.
tests_e2e/test_wire_prompt.py Updates e2e snapshot to expect non-empty SetTodoList output.
tests_e2e/test_wire_approvals_tools.py Updates e2e snapshots to expect non-empty SetTodoList output.
src/kimi_cli/tools/todo/set_todo_list.md Documents the new usage modes and anti-storm guidance.
src/kimi_cli/tools/todo/init.py Rewrites tool to support modes + persistence for root/subagents.
src/kimi_cli/session.py Reloads todos in save_state() to avoid race-condition overwrites.
src/kimi_cli/session_state.py Adds TodoItemState and SessionState.todos persisted field.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +111 to +113
todos = self._runtime.session.state.todos
return [Todo(title=t.title, status=t.status) for t in todos]

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In root Query mode, _load_root_todos() reads from self._runtime.session.state.todos only. This can return stale data if another component/process writes updated todos to state.json (the Session.save_state() change here explicitly acknowledges externally-mutated todos). Consider loading from disk via load_session_state(session.dir) (and optionally refreshing session.state.todos) so Query mode reflects the persisted source of truth.

Suggested change
todos = self._runtime.session.state.todos
return [Todo(title=t.title, status=t.status) for t in todos]
from kimi_cli.session_state import load_session_state
session = self._runtime.session
fresh = load_session_state(session.dir)
session.state.todos = fresh.todos
return [Todo(title=t.title, status=t.status) for t in fresh.todos]

Copilot uses AI. Check for mistakes.
Comment on lines +148 to +151
return json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
logger.warning("Corrupted subagent todo state, using defaults: {path}", path=path)
return {}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_read_subagent_state() assumes the decoded JSON is a dict and only catches JSONDecodeError/OSError. If state.json is valid JSON but not an object (e.g., []), or if it contains invalid UTF-8 (UnicodeDecodeError), subsequent .get()/indexing will raise and fail the tool. Consider catching UnicodeDecodeError and validating isinstance(data, dict) (fallback to {}) to keep reads/writes robust against corrupted-but-parseable files.

Suggested change
return json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
logger.warning("Corrupted subagent todo state, using defaults: {path}", path=path)
return {}
data = json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError, UnicodeDecodeError):
logger.warning("Corrupted subagent todo state, using defaults: {path}", path=path)
return {}
if not isinstance(data, dict):
logger.warning("Invalid subagent todo state type, using defaults: {path}", path=path)
return {}
return data

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3987feed55

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

session.state.todos = items

def _load_root_todos(self) -> list[Todo]:
todos = self._runtime.session.state.todos
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reload root todos from disk before serving query mode

Query mode is intended to return the current persisted todo list, but this path reads only runtime.session.state.todos, which can be stale if another process/component has updated state.json directly (the same cross-writer scenario this commit already accounts for in Session.save_state). In that case the model sees outdated todos and may overwrite newer items on its next update.

Useful? React with 👍 / 👎.

if not path.exists():
return {}
try:
return json.loads(path.read_text(encoding="utf-8"))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate loaded subagent state is an object

The corruption fallback only handles JSON parse/IO failures, but json.loads may successfully return a non-dict (for example []), and later callers assume mapping methods (data.get(...) / key assignment). That causes a runtime exception instead of graceful recovery, so a malformed-but-valid state.json can break both read and write modes.

Useful? React with 👍 / 👎.

- Load root todos from disk in query mode to ensure freshness
- Catch UnicodeDecodeError in subagent state file reading
- Validate that parsed JSON is a dict before using it
- Add cast for pyright type narrowing after isinstance check

Addresses PR review feedback from #1742
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2e201553e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +131 to +133
raw_todos = data.get("todos", [])
result: list[Todo] = []
for item in raw_todos:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate todos value is a list before iteration

_load_subagent_todos assumes data["todos"] is iterable, but valid JSON like {"todos": null} or {"todos": 1} will make for item in raw_todos raise TypeError and surface as a tool runtime error. This commit already treats malformed subagent state as recoverable, so this case should also fall back safely (for example to []) instead of failing query mode for subagents.

Useful? React with 👍 / 👎.

Guard against non-list values in subagent state.json (e.g. {"todos": null})
that would cause TypeError during iteration.

Addresses PR review feedback from #1742
devin-ai-integration[bot]

This comment was marked as resolved.

Replace standalone read-modify-write in _save_root_todos with
session.save_state() to reuse existing merge semantics and avoid
overwriting worker-owned fields.

Remove todos from externally-mutable field refresh in save_state()
since todos are worker-owned, not externally-mutable.

Addresses PR review feedback from #1742
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6bd46e7a31

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

class TodoItemState(BaseModel):
"""A single todo item stored in session or subagent state."""

title: str
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce non-empty titles in persisted todo state

TodoItemState allows empty title values, but root query mode later reconstructs each item as Todo(title=...) (which requires min_length=1). If state.json contains an empty title (e.g., from a manual edit or another writer), SetTodoList query mode will raise a runtime error instead of returning the list or recovering, so a malformed-but-parseable root state can break the tool. Adding the same title constraint to TodoItemState (or handling invalid entries during _load_root_todos) avoids this crash path.

Useful? React with 👍 / 👎.

Signed-off-by: Kai <me@kaiyi.cool>
Add per-item try/except in _load_root_todos to skip invalid items
(e.g. empty titles) instead of crashing query mode, consistent with
the subagent loading path.

Addresses PR review feedback from #1742
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cfe8c25f0a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

archived_at: float | None = None
auto_archive_exempt: bool = False
# Todo list state
todos: list[TodoItemState] = Field(default_factory=lambda: list[TodoItemState]())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Merge persisted todos before saving session state

Adding SessionState.todos makes todos vulnerable to cross-process clobbering because Session.save_state() still reloads only title/archive fields before writing self.state back. In a multi-writer flow (e.g., process A calls SetTodoList, process B later calls save_state() for approvals or plan mode), process B can overwrite the newer on-disk todos with its stale in-memory list. This is a data-loss regression introduced with the new todos field and should be handled the same way as other externally mutable state (or otherwise protected with field-level merge semantics).

Useful? React with 👍 / 👎.

Replace misleading lambda: list[TodoItemState]() with default_factory=list
for consistency with other list fields in SessionState.
@RealKai42
Copy link
Copy Markdown
Collaborator Author

@codex

@RealKai42
Copy link
Copy Markdown
Collaborator Author

@chatgpt-codex-connector

@chatgpt-codex-connector
Copy link
Copy Markdown

To use Codex here, create an environment for this repo.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@RealKai42 RealKai42 merged commit a8f09bc into main Apr 3, 2026
23 of 26 checks passed
@RealKai42 RealKai42 deleted the kaiyi/montpellier branch April 3, 2026 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

更新到1.29.0后,可能出现SetTodoList风暴 || After updating to 1.29.0, SetTodoList storm may occur

2 participants