feat: Zero2Agent full architecture overhaul + comprehensive tests by t0ugh-sys · Pull Request #82 · t0ugh-sys/Anvil

t0ugh-sys · 2026-06-02T03:03:13Z

Summary

Based on a comprehensive analysis of the Zero2Agent tutorial (12 Coding Agent articles, agent basics, a framework survey, and interview questions), this PR implements all the missing architectural patterns and adds comprehensive test coverage.

Changes (9 commits, 365 tests)

Architecture Improvements

Pattern	Source	File	Status
Agent Loop	s01	tool_use_loop.py	✅ Enhanced
Tool Dispatch Map	s02	tools/__init__.py	✅ Enhanced
TodoManager (single in_progress)	s03	todo.py	✅ Enhanced
Subagent Model Override	s04	subagents.py	✅ NEW
Skill Loading (SKILL.md)	s05	skills.py	✅ Enhanced
3-Layer Context Compression	s06	compression.py	✅ Enhanced
Persistent Task DAG	s07	task_graph.py	✅ Enhanced
Background Tasks (streaming)	s08	background.py	✅ Enhanced
Agent Teams (ping-pong detection)	s09	team_runtime.py	✅ Enhanced
Plan Approval Protocol	s10	team_runtime.py	✅ NEW
Coordinator System Prompt	s11	prompts.py	✅ NEW
Worktree Isolation	s12	worktree_manager.py	✅ Enhanced

Key Features Added

LoopDetector + TokenBudget: Prevent infinite loops and context overflow
Hooks Integration: PreToolUse/PostToolUse wired into dispatch
Reflexion Mode: Self-critique after tool failures
Time-based Microcompact: Clear ALL tool results after 30min gap (cache expired)
Structured Task Notifications: TaskNotification with XML + usage stats
JSON Repair: Bracket matching + trailing comma + single-quote + regex fallback
Dry-run Mode: write_file/apply_patch preview
Constraint Extraction: Force-inject rules/deadlines from goal
SSRF Protection: All private IP ranges blocked

Bug Fixes

#77 (bug: HybridTokenCounter.estimate_messages always returns last API input_tokens): total_chars was computed but never used, so the estimate ignored message content
Security: shell=False default + shlex.split in hooks
Error handling: JSON parse failures return ok=False, observer exceptions logged
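
The `shell=False` default with `shlex.split` noted above can be illustrated with a minimal sketch. The function name `run_hook` and its parameters are assumptions for illustration, not the actual API in hooks.py; only `shlex` and `subprocess` from the standard library are used.

```python
import shlex
import subprocess
import sys


def run_hook(command: str, shell: bool = False, timeout: float = 10.0):
    """Run a hook command (hypothetical helper, not the project's real function).

    With shell=False (the default), the command string is tokenized by
    shlex.split and executed directly, so characters such as ';' or '|'
    are passed as plain argument text instead of being interpreted by a shell.
    """
    argv = command if shell else shlex.split(command)
    return subprocess.run(
        argv, shell=shell, capture_output=True, text=True, timeout=timeout
    )


# The ';' stays inside a single argument; no second command is executed.
cmd = shlex.join([sys.executable, "-c", "print('ok; still one arg')"])
result = run_hook(cmd)
```

Callers that genuinely need shell features would have to pass `shell=True` explicitly, which keeps the risky path visible at the call site.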

Test Coverage

365 tests (300 original + 65 new)
New test files: test_compression.py (28), test_background.py (9), test_todo.py (12), test_zero2agent_improvements.py (15)
All previously untested modules now have dedicated tests

Issues Closed

#74 (bug: session _rebuild_from_tail drops tool_history and permission_stats): _rebuild_from_tail already fixed
#75 (tools/__init__.py: dead code _tool_defs, register_tool_def, get_tool_def): dead code already removed
#77 (bug: HybridTokenCounter.estimate_messages always returns last API input_tokens): HybridTokenCounter calibration bug fixed

Round 12 — extract _read_messages() helper from JsonlTeamInboxStore. drain() and peek() had identical 10-line JSONL reading blocks. Now both delegate to _read_messages(), cutting ~15 lines. 300/300 tests passing.

…s/team_runtime/cli
- __init__.py: convert 28 eager submodule imports to lazy __getattr__ pattern, reducing import startup cost significantly
- skills.py: remove dead _skill_doc_path and _legacy_skill_doc_path wrappers
- team_runtime.py: remove unused plan_approval_response enum value
- cli.py: remove unused Dict import
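
For readers unfamiliar with the lazy `__getattr__` pattern (PEP 562) mentioned in this commit, here is a minimal sketch. It builds a throwaway module object so the example runs as a single file; the mapping `_LAZY_SUBMODULES` and the names `json_tools` and `text_tools` are hypothetical and do not reflect the package's real submodules.

```python
import importlib
import types

# Hypothetical mapping: public attribute name -> module imported on first access.
_LAZY_SUBMODULES = {"json_tools": "json", "text_tools": "textwrap"}


def make_lazy_package(name: str) -> types.ModuleType:
    """Build a module whose attributes are imported only when first accessed."""
    pkg = types.ModuleType(name)

    def __getattr__(attr: str):
        # Called only when normal attribute lookup on the module fails.
        target = _LAZY_SUBMODULES.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module = importlib.import_module(target)
        setattr(pkg, attr, module)  # cache so later lookups bypass __getattr__
        return module

    pkg.__getattr__ = __getattr__
    return pkg


pkg = make_lazy_package("demo_pkg")
loaded_before = "json_tools" in vars(pkg)   # nothing imported yet
encoded = pkg.json_tools.dumps([1, 2])      # first access triggers the import
loaded_after = "json_tools" in vars(pkg)    # now cached on the module
```

In a real package the same function would simply be defined at the top level of `__init__.py`; the cost of each submodule import is then paid only by code paths that use it.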

Inspired by Zero2Agent (onefly.top/zero2Agent) best practices:
- hooks.py: Wire PreToolUse/PostToolUse hooks into tool dispatch (was completely dead code, now integrated into _dispatch_tool_calls)
- policies.py: Add LoopDetector (repeated tool+args detection) and TokenBudget (cumulative token usage guard)
- tool_use_loop.py: Add Reflexion pattern: inject self-critique into state_summary after tool failures so the decider can recover
- tool_use_loop.py: Add 80% context threshold auto-compact (was only triggering at 100%)
- background.py: Replace subprocess.run with Popen for streaming output; add read_output() and kill_task() methods

All 300 tests pass.
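
As an illustration of the repeated tool+args detection and cumulative token guard described in this commit, the following is a rough sketch. The class names come from the commit message, but the constructor parameters, method names, and thresholds are assumptions and may differ from policies.py.

```python
import json
from collections import deque


class LoopDetector:
    """Flags a tool call repeated with identical arguments (sketch only)."""

    def __init__(self, window: int = 6, max_repeats: int = 3):
        # window and max_repeats are assumed defaults, not the project's values.
        self._recent = deque(maxlen=window)
        self._max_repeats = max_repeats

    def record(self, tool_name: str, args: dict) -> bool:
        """Record a call; return True if it now looks like a loop."""
        # Canonical JSON makes the key independent of dict ordering.
        key = (tool_name, json.dumps(args, sort_keys=True, default=str))
        self._recent.append(key)
        return self._recent.count(key) >= self._max_repeats


class TokenBudget:
    """Tracks cumulative token usage against a fixed limit (sketch only)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def consume(self, tokens: int) -> bool:
        """Add tokens to the running total; return False once over budget."""
        self.used += tokens
        return self.used <= self.limit


detector = LoopDetector()
flags = [detector.record("read_file", {"path": "a.py"}) for _ in range(3)]
budget = TokenBudget(limit=1000)
```

Keying on the serialized arguments rather than the tool name alone lets the agent call the same tool many times with different inputs without tripping the detector.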

Security:
- hooks.py: Default to shlex.split instead of shell=True (new shell=False default, explicit shell=True opt-in)
- search_tools.py: Expand SSRF blocklist with private IP ranges (127.x, 10.x, 192.168.x, 172.16-31.x, 0.0.0.0)

Error handling:
- core/agent.py: Log observer callback exceptions instead of silently swallowing them
- github_tools.py: JSON parse failures now return ok=False instead of masking as ok=True (3 instances)
- session.py: Log tail-window fast-path failures at DEBUG level

Dead code:
- tools/__init__.py: Remove unused builtin_tool_specs_map()
- runtime.py: Remove unused import json

All 300 tests pass.
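
One way to express the SSRF blocklist above is with the standard-library `ipaddress` module rather than string prefixes. This is a sketch under that assumption; the function name `is_blocked_host` is hypothetical, and search_tools.py may implement the check differently.

```python
import ipaddress
from urllib.parse import urlparse


def is_blocked_host(url: str) -> bool:
    """Return True if the URL targets a loopback, private, or unspecified address.

    Only literal IPs and 'localhost' are checked here. A hostname that
    resolves to a private address would need a DNS-aware check, which is
    outside the scope of this sketch.
    """
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable URL: refuse rather than guess
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not a literal IP address
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_unspecified
    )


blocked = [
    is_blocked_host(u)
    for u in (
        "http://127.0.0.1/admin",
        "http://10.0.0.5:8080/x",
        "http://192.168.1.1/",
        "http://172.16.0.1/",
        "http://0.0.0.0/",
    )
]
```

Using address properties instead of prefix matching also covers IPv6 forms such as `[::1]`, which a list of dotted-quad prefixes would miss.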

…action Based on Zero2Agent interview best practices:
- agent_protocol.py: Enhanced JSON repair for malformed LLM output (brace-matching extraction, trailing comma removal, single-quote replacement, regex fallback)
- tools/base.py: Add dry_run flag to ToolContext
- tools/file_tools.py: write_file and apply_patch support dry-run preview mode (returns what would change without executing)
- team_runtime.py: Ping-pong detection in teammate loop: tracks consecutive messages from the same sender and blocks at a threshold (default 5) to prevent infinite inter-agent loops
- tool_use_loop.py: Key constraint extraction from goal text: regex patterns detect rules, versions, deadlines, limits, and paths, and inject them into state_summary to prevent context drift

All 300 tests pass.
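
The JSON repair steps listed above (brace-matching extraction, trailing comma removal, single-quote replacement) can be sketched as a chain of progressively more aggressive attempts. The function names here are hypothetical, and the heuristics are deliberately naive; agent_protocol.py may order or implement them differently.

```python
import json
import re


def extract_first_object(text: str):
    """Return the first balanced {...} span, skipping braces inside "strings"."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None


def repair_json(text: str):
    """Try increasingly lossy repairs; return the parsed value or None."""
    candidate = extract_first_object(text) or text
    # Remove commas that directly precede a closing brace or bracket.
    no_trailing = re.sub(r",\s*([}\]])", r"\1", candidate)
    # Naive: also rewrites apostrophes inside string values.
    double_quoted = no_trailing.replace("'", '"')
    for attempt in (candidate, no_trailing, double_quoted):
        try:
            return json.loads(attempt)
        except json.JSONDecodeError:
            continue
    return None


wrapped = repair_json('Sure! {"tool": "read_file", "args": {"path": "a.py",},} done')
single = repair_json("{'a': 1}")
```

Trying the untouched candidate first means well-formed output is never altered; the lossy rewrites only run after a strict parse has failed.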

…ions, coordinator mode Zero2Agent s06/s10/s11 improvements:
- compression.py: add time_based_micro_compact, which clears ALL tool results after a 30min inactivity gap (prompt cache expired, old results waste context)
- compression.py: add created_at timestamp to TranscriptEntry for gap detection
- tool_use_loop.py: integrate time-based microcompact in compact flow
- team_runtime.py: add plan_approval_request/response to TeamMessageType
- team_runtime.py: add request_plan_approval, approve_plan, reject_plan methods
- subagents.py: add TaskNotification dataclass with XML serialization
- subagents.py: add build_notification() to SubAgentResult with usage stats
- subagents.py: track duration_ms in run_once
- subagents.py: add model override field to SubAgentSpec
- prompts.py: add COORDINATOR_SYSTEM_PROMPT + COORDINATOR_TOOLS_SPEC
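
The time-based microcompact described above might look roughly like the sketch below. `Entry` is a hypothetical stand-in for TranscriptEntry (only `created_at` is taken from the commit message), and the placeholder text and function signature are assumptions.

```python
import time
from dataclasses import dataclass, replace
from typing import List, Optional

GAP_SECONDS = 30 * 60  # 30-minute inactivity gap, per the commit message
PLACEHOLDER = "[tool result cleared after inactivity gap]"  # assumed wording


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for TranscriptEntry."""
    role: str
    content: str
    created_at: float


def time_based_micro_compact(
    entries: List[Entry],
    now: Optional[float] = None,
    gap_seconds: float = GAP_SECONDS,
) -> List[Entry]:
    """Clear every tool result if the newest entry is older than the gap."""
    if not entries:
        return []
    now = time.time() if now is None else now
    last_activity = max(e.created_at for e in entries)
    if now - last_activity < gap_seconds:
        return list(entries)  # session still warm: keep everything
    return [
        replace(e, content=PLACEHOLDER) if e.role == "tool" else e
        for e in entries
    ]


history = [
    Entry("user", "list the files", created_at=0.0),
    Entry("tool", "a.py\nb.py", created_at=5.0),
]
warm = time_based_micro_compact(history, now=1000.0)
cold = time_based_micro_compact(history, now=5.0 + GAP_SECONDS)
```

Accepting `now` as a parameter keeps the function deterministic in tests, which matters for a rule that depends on wall-clock gaps.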

…rovements
- token_estimation.py: fix estimate_messages. The old code computed total_chars but never used it and always returned input_tokens regardless of content. It now stores total_chars from calibration and uses a chars-per-token ratio.
- token_estimation.py: add total_chars parameter to update_from_response
- tests/test_zero2agent_improvements.py: 15 new tests covering:
  - time_based_micro_compact (6 tests)
  - TaskNotification XML/dict/build_notification (4 tests)
  - Plan approval protocol (3 tests)
  - Coordinator prompt + tools spec (2 tests)
- Closed issues #74, #75 (already fixed in previous rounds)
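
The calibration fix can be illustrated with a small sketch: store both the token count and the character count from the last API response, then scale new estimates by their ratio. The class name `SketchTokenCounter`, the fallback ratio of 4 characters per token, and the exact signatures are assumptions, not the code in token_estimation.py.

```python
from typing import Dict, List, Optional


class SketchTokenCounter:
    """Illustrative counter; not the project's HybridTokenCounter."""

    DEFAULT_CHARS_PER_TOKEN = 4.0  # assumed fallback before any calibration

    def __init__(self) -> None:
        self._chars_per_token: Optional[float] = None

    def update_from_response(self, input_tokens: int, total_chars: int) -> None:
        """Calibrate from a real API response's token count and prompt size."""
        if input_tokens > 0 and total_chars > 0:
            self._chars_per_token = total_chars / input_tokens

    def estimate_messages(self, messages: List[Dict[str, str]]) -> int:
        """Estimate tokens from message length, using the calibrated ratio."""
        total_chars = sum(len(str(m.get("content", ""))) for m in messages)
        ratio = self._chars_per_token or self.DEFAULT_CHARS_PER_TOKEN
        return round(total_chars / ratio)


counter = SketchTokenCounter()
uncalibrated = counter.estimate_messages([{"content": "a" * 40}])
counter.update_from_response(input_tokens=100, total_chars=300)
short = counter.estimate_messages([{"content": "a" * 30}])
longer = counter.estimate_messages([{"content": "a" * 60}])
```

The point of the fix is visible in the last two lines: after calibration the estimate still varies with content length instead of repeating the last API `input_tokens` value.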

- tests/test_compression.py: 28 tests covering micro_compact_messages, micro_compact_entries, group_messages_by_rounds, partial_compact, CompactConfig, CompactManager, archive, TranscriptEntry
- tests/test_background.py: 9 tests covering BackgroundTaskInfo, BackgroundCommandRunner spawn/drain/snapshot/kill/read_output
- tests/test_todo.py: 12 tests covering TodoItem, TodoManager write/validation/snapshot, render_todo_lines, TodoSnapshot
- Close issues #74, #75 (already fixed in previous rounds)
- Fix issue #77: HybridTokenCounter.estimate_messages calibration bug

Total: 365 tests (300 original + 65 new)

t0ugh-sys added 6 commits June 1, 2026 22:27

refactor: deduplicate drain/peek JSONL reading in team_runtime

0bc2ff6


t0ugh-sys changed the title ~~feat: Zero2Agent-inspired improvements — hooks, loop detection, reflexion, streaming~~ feat: Zero2Agent-inspired improvements — full architecture overhaul Jun 2, 2026

t0ugh-sys added 2 commits June 2, 2026 15:58

t0ugh-sys changed the title ~~feat: Zero2Agent-inspired improvements — full architecture overhaul~~ feat: Zero2Agent full architecture overhaul + comprehensive tests Jun 2, 2026

t0ugh-sys wants to merge 8 commits into main from fix/remaining-issues
