
feat: Zero2Agent full architecture overhaul + comprehensive tests #82

Open

t0ugh-sys wants to merge 8 commits into main from fix/remaining-issues

@t0ugh-sys (Owner) commented Jun 2, 2026

Summary

Based on a comprehensive analysis of the Zero2Agent tutorial (12 Coding Agent articles, agent basics, a framework survey, and interview questions), this PR implements the architectural patterns the codebase was missing and adds comprehensive test coverage.

Changes (8 commits, 365 tests)

Architecture Improvements

| Pattern | Source | File | Status |
|---|---|---|---|
| Agent Loop | s01 | `tool_use_loop.py` | ✅ Enhanced |
| Tool Dispatch Map | s02 | `tools/__init__.py` | ✅ Enhanced |
| TodoManager (single in_progress) | s03 | `todo.py` | ✅ Enhanced |
| Subagent Model Override | s04 | `subagents.py` | ✅ NEW |
| Skill Loading (SKILL.md) | s05 | `skills.py` | ✅ Enhanced |
| 3-Layer Context Compression | s06 | `compression.py` | ✅ Enhanced |
| Persistent Task DAG | s07 | `task_graph.py` | ✅ Enhanced |
| Background Tasks (streaming) | s08 | `background.py` | ✅ Enhanced |
| Agent Teams (ping-pong detection) | s09 | `team_runtime.py` | ✅ Enhanced |
| Plan Approval Protocol | s10 | `team_runtime.py` | ✅ NEW |
| Coordinator System Prompt | s11 | `prompts.py` | ✅ NEW |
| Worktree Isolation | s12 | `worktree_manager.py` | ✅ Enhanced |
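The s03 row's "single in_progress" rule can be enforced as a validation step on every write. A minimal sketch: `TodoItem` and `TodoManager` are named in the commit log, but the field names, the status strings, and the `write()` signature here are assumptions, not the actual todo.py API.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical status values; the real todo.py may use different names.
VALID_STATUSES = {"pending", "in_progress", "completed"}


@dataclass
class TodoItem:
    item_id: str
    text: str
    status: str = "pending"


class TodoManager:
    """Holds the todo list and enforces at most one in_progress item."""

    def __init__(self) -> None:
        self.items: List[TodoItem] = []

    def write(self, items: List[TodoItem]) -> None:
        # Validate the whole list before replacing state, so a rejected
        # write leaves the previous list untouched.
        for item in items:
            if item.status not in VALID_STATUSES:
                raise ValueError(f"invalid status {item.status!r} for {item.item_id}")
        in_progress = [i.item_id for i in items if i.status == "in_progress"]
        if len(in_progress) > 1:
            raise ValueError(f"only one item may be in_progress, got {in_progress}")
        self.items = list(items)
```

Rejecting the write outright (rather than silently demoting extra items) forces the model to pick one focus explicitly; that is a design choice of this sketch.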

Key Features Added

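  • LoopDetector + TokenBudget: Stop repeated identical tool calls and cap cumulative token usage
  • Hooks Integration: PreToolUse/PostToolUse hooks wired into tool dispatch
  • Reflexion Mode: Self-critique injected into state_summary after tool failures
  • Time-based Microcompact: Clear all tool results after a 30-minute inactivity gap (the prompt cache has expired by then)
  • Structured Task Notifications: TaskNotification with XML serialization and usage stats
  • JSON Repair: Brace matching, trailing-comma removal, single-to-double-quote replacement, and a regex fallback for malformed LLM output
  • Dry-run Mode: Preview what write_file/apply_patch would change without writing
  • Constraint Extraction: Extract rules, versions, deadlines, limits, and paths from the goal and inject them into state_summary
  • SSRF Protection: Loopback, private, and unspecified IPv4 addresses blocked (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 0.0.0.0)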
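The JSON repair chain listed above can be sketched as a sequence of increasingly lossy fallbacks. This is a sketch under assumptions, not agent_protocol.py itself: the function name `repair_json`, the order of the steps, and the flat key/value regex fallback are illustrative choices.

```python
import json
import re
from typing import Any, Optional


def _extract_braced(text: str) -> Optional[str]:
    """Return the first balanced {...} span, ignoring braces inside strings."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_str: Optional[str] = None  # quote char of the string we are inside, if any
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_str:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == in_str:
                in_str = None
        elif ch in ("'", '"'):
            in_str = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None


def repair_json(text: str) -> Optional[Any]:
    # 1. Fast path: already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 2. Brace matching: drop prose or fences around the object.
    candidate = _extract_braced(text) or text
    # 3. Remove trailing commas before } or ] (naive: also touches string contents).
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    # 4. Try as-is, then with single quotes swapped for double quotes
    #    (naive: breaks on apostrophes inside values).
    for attempt in (candidate, candidate.replace("'", '"')):
        try:
            return json.loads(attempt)
        except json.JSONDecodeError:
            continue
    # 5. Regex fallback: salvage flat "key": "value" string pairs.
    pairs = re.findall(r'"(\w+)"\s*:\s*"([^"]*)"', candidate)
    return dict(pairs) if pairs else None
```

Returning `None` on total failure lets the caller decide whether to re-prompt the model instead of acting on a guessed structure.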

Bug Fixes

Test Coverage

  • 365 tests (300 original + 65 new)
  • New test files: test_compression.py (28), test_background.py (9), test_todo.py (12), test_zero2agent_improvements.py (15)
  • All previously untested modules now have dedicated tests

Issues Closed

t0ugh-sys added 6 commits June 1, 2026 22:27
Round 12 — extract _read_messages() helper from JsonlTeamInboxStore.

drain() and peek() had identical 10-line JSONL reading blocks.
Now both delegate to _read_messages(), cutting ~15 lines.

300/300 tests passing.
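The refactor above can be sketched as follows, assuming one JSONL file per inbox. `JsonlTeamInboxStore`, `_read_messages()`, `peek()`, and `drain()` come from the commit message; the constructor, `_path()`, `send()`, and drain's truncate-by-delete behavior are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


class JsonlTeamInboxStore:
    """Sketch: one <name>.jsonl file per teammate inbox (layout is assumed)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, name: str) -> Path:
        return self.root / f"{name}.jsonl"

    def _read_messages(self, name: str) -> List[Dict[str, Any]]:
        # Shared reader: one JSON object per non-blank line;
        # a missing file means an empty inbox.
        path = self._path(name)
        if not path.exists():
            return []
        messages: List[Dict[str, Any]] = []
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    messages.append(json.loads(line))
        return messages

    def send(self, name: str, message: Dict[str, Any]) -> None:
        self.root.mkdir(parents=True, exist_ok=True)
        with self._path(name).open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(message) + "\n")

    def peek(self, name: str) -> List[Dict[str, Any]]:
        # Read without consuming.
        return self._read_messages(name)

    def drain(self, name: str) -> List[Dict[str, Any]]:
        # Read, then delete the file so each message is delivered once.
        messages = self._read_messages(name)
        self._path(name).unlink(missing_ok=True)  # Python 3.8+
        return messages
```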
…s/team_runtime/cli

- __init__.py: convert 28 eager submodule imports to the lazy module-level
  __getattr__ pattern (PEP 562), so submodules load on first attribute
  access instead of at package import time
- skills.py: remove dead _skill_doc_path and _legacy_skill_doc_path wrappers
- team_runtime.py: remove unused plan_approval_response enum value
- cli.py: remove unused Dict import
Inspired by Zero2Agent (onefly.top/zero2Agent) best practices:

- hooks.py: Wire PreToolUse/PostToolUse hooks into tool dispatch
  (was completely dead code, now integrated into _dispatch_tool_calls)
- policies.py: Add LoopDetector (repeated tool+args detection)
  and TokenBudget (cumulative token usage guard)
- tool_use_loop.py: Add Reflexion pattern — inject self-critique
  into state_summary after tool failures so decider can recover
- tool_use_loop.py: Trigger auto-compact at 80% of the context window
  (previously it only triggered at 100%)
- background.py: Replace subprocess.run with Popen for streaming
  output; add read_output() and kill_task() methods

All 300 tests pass.
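LoopDetector and TokenBudget from policies.py might look roughly like this. The class names come from the commit message; the constructor parameters (`max_repeats`, `window`, `limit`), their defaults, and the method names are assumptions.

```python
import json
from collections import deque
from typing import Any, Deque, Dict, Tuple


class LoopDetector:
    """Flags a tool call repeated with identical args within a recent window."""

    def __init__(self, max_repeats: int = 3, window: int = 10) -> None:
        self.max_repeats = max_repeats
        self.recent: Deque[Tuple[str, str]] = deque(maxlen=window)

    def record(self, tool: str, args: Dict[str, Any]) -> bool:
        # Canonicalize args so key order does not hide a repeat.
        key = (tool, json.dumps(args, sort_keys=True))
        self.recent.append(key)
        # True means the same call has now been seen max_repeats times.
        return self.recent.count(key) >= self.max_repeats


class TokenBudget:
    """Cumulative token guard across all model calls in one run."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def add(self, input_tokens: int, output_tokens: int) -> None:
        self.used += input_tokens + output_tokens

    @property
    def exhausted(self) -> bool:
        return self.used >= self.limit
```

A bounded window keeps the detector from flagging a call that legitimately recurs far apart in a long session.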
Security:
- hooks.py: Default to shlex.split instead of shell=True
  (new shell=False default, explicit shell=True opt-in)
- search_tools.py: Expand SSRF blocklist with private IP ranges
  (127.x, 10.x, 192.168.x, 172.16-31.x, 0.0.0.0)
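A sketch of the expanded SSRF check, using the standard-library `ipaddress` module with the ranges named above. The helper names are hypothetical, and resolving the hostname before checking is an assumed design choice: matching only the literal hostname string would miss names that resolve to private addresses.

```python
import ipaddress
import socket
from typing import List
from urllib.parse import urlparse

# The IPv4 ranges listed in the commit message.
BLOCKED_NETWORKS: List[ipaddress.IPv4Network] = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),  # 172.16.x through 172.31.x
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("0.0.0.0/32"),
]


def is_blocked_ip(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    # An IPv6 address is never "in" an IPv4 network, so it passes this check.
    return any(ip in net for net in BLOCKED_NETWORKS)


def is_url_allowed(url: str) -> bool:
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    # Reject if any resolved address falls in a blocked range.
    return not any(is_blocked_ip(info[4][0]) for info in infos)
```

IPv6 loopback (`::1`) and link-local 169.254.0.0/16 are outside the listed ranges, and checking at resolve time does not stop DNS rebinding if the HTTP client resolves the name again.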

Error handling:
- core/agent.py: Log observer callback exceptions instead of
  silently swallowing them
- github_tools.py: JSON parse failures now return ok=False
  instead of masking as ok=True (3 instances)
- session.py: Log tail-window fast-path failures at DEBUG level
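The observer change follows a common pattern: keep the callback loop fault-tolerant, but make failures visible. A sketch, with a hypothetical stripped-down `Agent` class and `_notify()` method standing in for core/agent.py:

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical observer signature: (event name, payload).
Observer = Callable[[str, Dict[str, Any]], None]


class Agent:
    """Only the observer-notification part; the rest of the agent is omitted."""

    def __init__(self) -> None:
        self.observers: List[Observer] = []

    def _notify(self, event: str, payload: Dict[str, Any]) -> None:
        for observer in self.observers:
            try:
                observer(event, payload)
            except Exception:
                # A broken observer must not crash the agent or skip the
                # remaining observers; logger.exception records the traceback.
                logger.exception("observer %r failed on event %r", observer, event)
```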

Dead code:
- tools/__init__.py: Remove unused builtin_tool_specs_map()
- runtime.py: Remove unused import json

All 300 tests pass.
…action

Based on Zero2Agent interview best practices:

- agent_protocol.py: Enhanced JSON repair for malformed LLM output
  (brace-matching extraction, trailing comma removal, single-quote
  replacement, regex fallback)
- tools/base.py: Add dry_run flag to ToolContext
- tools/file_tools.py: write_file and apply_patch support dry-run
  preview mode (returns what would change without executing)
- team_runtime.py: Ping-pong detection in teammate loop — tracks
  consecutive messages from same sender, blocks at threshold (default 5)
  to prevent infinite inter-agent loops
- tool_use_loop.py: Key constraint extraction from goal text — regex
  patterns detect rules, versions, deadlines, limits, paths and inject
  them into state_summary to prevent context drift

All 300 tests pass.
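Ping-pong detection as described (consecutive messages from the same sender, blocked at a threshold of 5) reduces to a streak counter. `PingPongGuard` and `allow()` are hypothetical names; only the threshold default comes from the commit message.

```python
from typing import Optional


class PingPongGuard:
    """Counts consecutive inbound messages from one sender; blocks at a threshold."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold  # default 5, per the commit message
        self.last_sender: Optional[str] = None
        self.streak = 0

    def allow(self, sender: str) -> bool:
        if sender == self.last_sender:
            self.streak += 1
        else:
            # A different sender breaks the streak.
            self.last_sender = sender
            self.streak = 1
        # Messages 1..threshold-1 pass; the threshold-th in a row is blocked.
        return self.streak < self.threshold
```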
…ions, coordinator mode

Zero2Agent s06/s10/s11 improvements:
- compression.py: add time_based_micro_compact — clear ALL tool results after
  30min inactivity gap (prompt cache expired, old results waste context)
- compression.py: add created_at timestamp to TranscriptEntry for gap detection
- tool_use_loop.py: integrate time-based microcompact in compact flow
- team_runtime.py: add plan_approval_request/response to TeamMessageType
- team_runtime.py: add request_plan_approval, approve_plan, reject_plan methods
- subagents.py: add TaskNotification dataclass with XML serialization
- subagents.py: add build_notification() to SubAgentResult with usage stats
- subagents.py: track duration_ms in run_once
- subagents.py: add model override field to SubAgentSpec
- prompts.py: add COORDINATOR_SYSTEM_PROMPT + COORDINATOR_TOOLS_SPEC
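A sketch of time_based_micro_compact under stated assumptions: transcript entries carry `role`, `content`, and `created_at` (the timestamp this commit adds), tool results have `role == "tool"`, and cleared results are replaced by a placeholder string rather than dropped. The 30-minute gap matches the commit message; the placeholder text and signature are illustrative.

```python
import time
from dataclasses import dataclass, field, replace
from typing import List, Optional

CLEARED = "[tool result cleared: prompt cache expired]"  # hypothetical placeholder


@dataclass
class TranscriptEntry:
    role: str  # assumed values: "user", "assistant", "tool"
    content: str
    created_at: float = field(default_factory=time.time)


def time_based_micro_compact(
    entries: List[TranscriptEntry],
    now: Optional[float] = None,
    gap_seconds: float = 30 * 60,
) -> List[TranscriptEntry]:
    """If nothing happened for gap_seconds, blank every tool result."""
    if not entries:
        return entries
    now = time.time() if now is None else now
    last_activity = max(e.created_at for e in entries)
    if now - last_activity < gap_seconds:
        return entries  # cache is probably still warm; keep results as-is
    # Return new entries; the input list is left unmodified.
    return [replace(e, content=CLEARED) if e.role == "tool" else e for e in entries]
```

The reasoning in the commit message is that once the cached prefix has expired, re-sending old tool output costs full price anyway, so clearing it all at once is cheaper than the usual per-result micro-compaction.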
@t0ugh-sys t0ugh-sys changed the title feat: Zero2Agent-inspired improvements — hooks, loop detection, reflexion, streaming feat: Zero2Agent-inspired improvements — full architecture overhaul Jun 2, 2026
t0ugh-sys added 2 commits June 2, 2026 15:58
…rovements

- token_estimation.py: fix estimate_messages. The old code computed total_chars
  but never used it, so it always returned input_tokens regardless of content.
  It now stores total_chars from calibration and uses the chars-per-token ratio.
- token_estimation.py: add total_chars parameter to update_from_response
- tests/test_zero2agent_improvements.py: 15 new tests covering:
  - time_based_micro_compact (6 tests)
  - TaskNotification XML/dict/build_notification (4 tests)
  - Plan approval protocol (3 tests)
  - Coordinator prompt + tools spec (2 tests)
- Closed issues #74, #75 (already fixed in previous rounds)
- tests/test_compression.py: 28 tests covering micro_compact_messages,
  micro_compact_entries, group_messages_by_rounds, partial_compact,
  CompactConfig, CompactManager, archive, TranscriptEntry
- tests/test_background.py: 9 tests covering BackgroundTaskInfo,
  BackgroundCommandRunner spawn/drain/snapshot/kill/read_output
- tests/test_todo.py: 12 tests covering TodoItem, TodoManager write/
  validation/snapshot, render_todo_lines, TodoSnapshot
- Close issues #74, #75 (already fixed in previous rounds)
- Fix issue #77: HybridTokenCounter.estimate_messages calibration bug

Total: 365 tests (300 original + 65 new)
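The HybridTokenCounter fix for #77 amounts to estimating from the current content size with a calibrated chars-per-token ratio. A minimal sketch: the class and method names come from the commit message, but their signatures, the message shape (`{"content": str}`), and the 4.0 fallback ratio are assumptions.

```python
from typing import Dict, List, Optional

DEFAULT_CHARS_PER_TOKEN = 4.0  # assumed fallback before any calibration


class HybridTokenCounter:
    def __init__(self) -> None:
        self._calib_tokens: Optional[int] = None
        self._calib_chars: Optional[int] = None

    def update_from_response(self, input_tokens: int, total_chars: int) -> None:
        # Calibrate from a real API response: the provider-reported input_tokens
        # for a prompt whose size we measured as total_chars.
        if input_tokens > 0 and total_chars > 0:
            self._calib_tokens = input_tokens
            self._calib_chars = total_chars

    def chars_per_token(self) -> float:
        if self._calib_tokens and self._calib_chars:
            return self._calib_chars / self._calib_tokens
        return DEFAULT_CHARS_PER_TOKEN

    def estimate_messages(self, messages: List[Dict[str, str]]) -> int:
        # Scale by the size of *these* messages instead of returning the
        # last calibrated input_tokens unchanged (the bug described above).
        total_chars = sum(len(m.get("content", "")) for m in messages)
        return round(total_chars / self.chars_per_token())
```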
@t0ugh-sys t0ugh-sys changed the title feat: Zero2Agent-inspired improvements — full architecture overhaul feat: Zero2Agent full architecture overhaul + comprehensive tests Jun 2, 2026