Description
Problem
The current architecture discussions have oversimplified the Tool Node component. The tool node is not just a pass-through that executes MCP calls - it needs to handle complex verification, retry logic, and multi-modal feedback correlation between the game state and the LLM orchestrator.
Without proper design of this component, the bot will experience:
- False positives (tool reports success but game state didn't change)
- Undetected failures (popups, loading screens blocking transitions)
- Unnecessary LLM round-trips for transient failures that should be retried at the tool level
- Poor recovery decisions due to insufficient context
Desired Outcome
A well-designed Tool Node that:
- Executes + Verifies - Tool success is verified against actual game state
- Handles retries transparently - Transient failures (ADB hiccups, loading delays) are retried without LLM involvement
- Correlates multi-modal feedback - Combines tool returns, state queries, OCR, and vision
- Enriches errors - Provides structured context to LLM for recovery decisions
- Maintains belief state - Tracks where we think we are vs. reality
Detailed Architecture
1. Execute + Verify Pattern
```python
async def tool_node(state: GameState):
    tool_call = state["messages"][-1].tool_calls[0]

    # Execute
    result = await call_mcp_tool(tool_call)

    # VERIFY: the tool said success, but did the game actually change?
    if result.get("success"):
        observed = await verify_state_change(tool_call, result)
        if observed != result.get("expected_state"):
            # Tool lied (or a popup/loading screen interfered)
            result["success"] = False
            result["error"] = f"Expected {result['expected_state']}, observed {observed}"
            result["needs_vision"] = True

    return {"messages": [ToolMessage(content=format_result(result))]}
```
2. State Belief Tracking
The bot maintains belief state that may diverge from reality:
```python
class GameState(TypedDict):
    believed_page: str          # What we think we're on
    believed_task: str          # What we think we're doing
    last_action_success: bool
    screenshot_verified: bool   # Did we actually look?
```
3. Screenshot-Based Verification
Some tools need post-execution screenshot verification:
```python
async def verify_with_screenshot(expected_state: str) -> Tuple[str, bytes]:
    screenshot = await maamcp.screencap()

    # Quick OCR first (cheap)
    ocr_result = await maamcp.ocr(screenshot)

    # If OCR is ambiguous, fall back to LLM vision (expensive)
    if not confident_match(ocr_result, expected_state):
        vision_result = await gemini.analyze_screenshot(
            screenshot,
            f"Are we on {expected_state}? Check for popups, loading."
        )
        return vision_result.state, screenshot

    return expected_state, screenshot
```
4. Retry Logic (Tool-Level)
Some failures shouldn't bother the LLM:
```python
async def execute_with_verify(tool_call, max_retries=2):
    observed = None  # initialized so the enriched error below is well-defined
    for attempt in range(max_retries):
        result = await call_tool(tool_call)

        # Transient ADB errors - retry immediately
        if is_transient_adb_error(result.get("error")):
            await asyncio.sleep(1)
            continue

        # State mismatch - might be loading, wait and retry
        if is_loading_state(result):
            await asyncio.sleep(2)
            continue

        # Verify the state actually changed
        observed = await get_current_page()
        if observed == tool_call["expected_state"]:
            return {"success": True, "observed": observed}

    # All retries failed - return an enriched error to the LLM
    return {
        "success": False,
        "error": "State verification failed",
        "observed": observed,
        "expected": tool_call["expected_state"],
        "suggested_action": "use_vision" if is_unknown_state(observed) else "retry",
    }
```
5. Feedback Channels
| Channel | Example | Latency |
|---|---|---|
| Tool return | `{success: true, observed_state: "page_main"}` | Immediate |
| State query | `get_current_state()` → `"page_main"` | ~100ms |
| Screenshot OCR | OCR text on screen | ~500ms |
| Vision LLM | "I see a popup blocking the screen" | ~2s |
Complexity: the bot must correlate these channels:
```python
# Tool said success, but...
tool_result = {"success": True, "observed_state": "page_commission"}

# State query disagrees
state_query = await alas_mcp.get_current_state()  # "page_main"

# Screenshot shows a popup
screenshot = await maamcp.screencap()
ocr = await maamcp.ocr(screenshot)  # Contains "New Event!"

# Conclusion: the tool succeeded, but a popup blocked the transition
# Action: dismiss the popup, then retry
```
6. Structured Error Feedback
LLM needs structured context for recovery:
```python
# Good feedback
{
    "tool": "alas_goto",
    "target": "page_commission",
    "result": {
        "tool_success": True,  # ALAS thought it worked
        "state_verification": "FAILED",
        "expected": "page_commission",
        "observed": "page_main",
        "screenshot_analysis": {
            "blocking_element": "event_popup",
            "ocr_detected": ["New Event!", "Claim Reward"],
            "suggested_action": "dismiss_popup_then_retry",
        },
    },
}
```
Acceptance Criteria
- Tool Node implementation that verifies state changes post-execution
- Retry logic for transient failures (ADB timeouts, loading states)
- Multi-modal feedback correlation (tool returns + state queries + OCR + vision)
- Belief state tracking (expected vs observed)
- Structured error enrichment for LLM recovery decisions
- Configuration for retry policies per tool type
- Tests for popup detection and recovery scenarios
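As a sketch of the per-tool-type retry policy criterion above, the configuration could be as simple as a small dataclass plus a lookup table. All names here (`RetryPolicy`, `RETRY_POLICIES`, the tool-type keys) are illustrative assumptions, not existing code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int         # attempts before escalating to the LLM
    backoff_s: float         # delay between attempts
    verify_screenshot: bool  # screenshot-verify after the final attempt

# Hypothetical per-tool-type policies; values would be tuned empirically.
RETRY_POLICIES = {
    "navigation": RetryPolicy(max_retries=2, backoff_s=2.0, verify_screenshot=True),
    "tap":        RetryPolicy(max_retries=3, backoff_s=0.5, verify_screenshot=False),
    "query":      RetryPolicy(max_retries=1, backoff_s=0.1, verify_screenshot=False),
}

def policy_for(tool_name: str) -> RetryPolicy:
    """Map a tool name to its retry policy, defaulting to a conservative one."""
    if tool_name.startswith("alas_goto"):
        return RETRY_POLICIES["navigation"]
    return RETRY_POLICIES.get(tool_name, RetryPolicy(1, 1.0, True))
```

The unknown-tool default deliberately retries once and screenshot-verifies, so an unconfigured tool fails safe rather than fast.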
Notes
Key Insight: The tool node is a verification and enrichment layer, not just a pass-through. It has to handle the messy reality that:
- Tools can report success while the game shows popups/loading
- ADB can timeout transiently and needs retry
- State must be verified via screenshot when uncertain
- LLM should receive structured context, not raw errors
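As a concrete illustration of the popup case in the bullets above, the dismiss-and-retry path can be tested in isolation with fakes standing in for the MCP calls. `recover_from_popup` and the fake callables are hypothetical sketches, not existing code:

```python
import asyncio

async def recover_from_popup(ocr_lines, dismiss, retry):
    """If OCR shows a known blocking popup, dismiss it and retry the action.

    Hypothetical helper: `dismiss` and `retry` are async callables supplied by
    the tool node. Returns the retried action's result, or None if no popup
    was recognized.
    """
    POPUP_MARKERS = ("New Event!", "Claim Reward")
    if any(marker in line for line in ocr_lines for marker in POPUP_MARKERS):
        await dismiss()
        return await retry()
    return None

# Minimal self-contained check, with fakes recording the call order.
async def _demo():
    calls = []

    async def fake_dismiss():
        calls.append("dismiss")

    async def fake_retry():
        calls.append("retry")
        return {"success": True, "observed_state": "page_commission"}

    result = await recover_from_popup(["New Event!"], fake_dismiss, fake_retry)
    assert calls == ["dismiss", "retry"]
    assert result["success"] is True

asyncio.run(_demo())
```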
Related Docs:
- `docs/plans/llm_driven_gameplay_architecture.md` - Higher-level orchestration
- `docs/plans/recovery_agent_architecture.md` - Recovery patterns
- `agent_orchestrator/alas_mcp_server.py` - Current MCP implementation