arch: Tool Node complexity and feedback loop design #21

@Coldaine

Description

Problem

The current architecture discussions have oversimplified the Tool Node component. The tool node is not just a pass-through that executes MCP calls - it needs to handle complex verification, retry logic, and multi-modal feedback correlation between the game state and the LLM orchestrator.

Without proper design of this component, the bot will experience:

  • False positives (tool reports success but game state didn't change)
  • Undetected failures (popups, loading screens blocking transitions)
  • Unnecessary LLM round-trips for transient failures that should be retried at the tool level
  • Poor recovery decisions due to insufficient context

Desired Outcome

A well-designed Tool Node that:

  1. Executes + Verifies - Tool success is verified against actual game state
  2. Handles retries transparently - Transient failures (ADB hiccups, loading delays) are retried without LLM involvement
  3. Correlates multi-modal feedback - Combines tool returns, state queries, OCR, and vision
  4. Enriches errors - Provides structured context to LLM for recovery decisions
  5. Maintains belief state - Tracks where we think we are vs. reality

Detailed Architecture

1. Execute + Verify Pattern

async def tool_node(state: GameState):
    tool_call = state["messages"][-1].tool_calls[0]
    
    # Execute
    result = await call_mcp_tool(tool_call)
    
    # VERIFY: Tool said success, but did game actually change?
    if result.get("success"):
        observed = await verify_state_change(tool_call, result)
        if observed != result.get("expected_state"):
            # Tool lied (or popup/loading interfered)
            result["success"] = False
            result["error"] = f"Expected {result['expected_state']}, observed {observed}"
            result["needs_vision"] = True
    
    return {"messages": [ToolMessage(
        content=format_result(result),
        tool_call_id=tool_call["id"],  # LangChain requires the originating call id
    )]}
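The `verify_state_change` helper above is left undefined. A minimal sketch, assuming the current page can be polled through an injected async query; all names and the polling parameters here are illustrative, not from the repo:

```python
import asyncio
from typing import Awaitable, Callable

async def verify_state_change(
    tool_call: dict,
    result: dict,
    query_state: Callable[[], Awaitable[str]],
    polls: int = 3,
    poll_delay: float = 0.5,
) -> str:
    """Return the observed page after an action, polling a few times so a
    slow page transition isn't misreported as a mismatch."""
    expected = result.get("expected_state")
    observed = await query_state()
    for _ in range(polls):
        if observed == expected:
            break
        await asyncio.sleep(poll_delay)
        observed = await query_state()
    return observed
```

Injecting `query_state` keeps the sketch testable without a live emulator.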

2. State Belief Tracking

The bot maintains belief state that may diverge from reality:

class GameState(TypedDict):
    believed_page: str      # What we think we're on
    believed_task: str      # What we think we're doing  
    last_action_success: bool
    screenshot_verified: bool  # Did we actually look?
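A belief state is only useful with a rule for reconciling it against verified observations. One possible pure helper (the merge semantics are an assumption; `GameState` is redeclared so the sketch is self-contained):

```python
from typing import TypedDict

class GameState(TypedDict):
    believed_page: str
    believed_task: str
    last_action_success: bool
    screenshot_verified: bool

def reconcile_belief(state: GameState, observed_page: str) -> GameState:
    """Fold a verified observation into the belief state; a mismatch marks
    the last action as failed so the orchestrator knows to recover."""
    return {
        **state,
        "believed_page": observed_page,
        "last_action_success": state["believed_page"] == observed_page,
        "screenshot_verified": True,
    }
```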

3. Screenshot-Based Verification

Some tools need post-execution screenshot verification:

from typing import Tuple

async def verify_with_screenshot(expected_state: str) -> Tuple[str, bytes]:
    screenshot = await maamcp.screencap()
    
    # Quick OCR first (cheap)
    ocr_result = await maamcp.ocr(screenshot)
    
    # If OCR ambiguous, use LLM vision (expensive)
    if not confident_match(ocr_result, expected_state):
        vision_result = await gemini.analyze_screenshot(
            screenshot,
            f"Are we on {expected_state}? Check for popups, loading."
        )
        return vision_result.state, screenshot
    
    return expected_state, screenshot
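The `confident_match` check above is undefined; one cheap way to implement it is a per-page keyword table matched against OCR output. The keyword table and threshold below are purely illustrative assumptions:

```python
from typing import List

# Hypothetical keyword table: OCR strings we expect to see on each page.
PAGE_KEYWORDS = {
    "page_main": {"Commission", "Dock", "Build"},
    "page_commission": {"Commission", "Dispatch"},
}

def confident_match(ocr_texts: List[str], expected_state: str,
                    threshold: float = 0.5) -> bool:
    """Cheap check: are enough of the page's expected keywords visible?
    Unknown pages return False so the caller escalates to vision."""
    keywords = PAGE_KEYWORDS.get(expected_state)
    if not keywords:
        return False
    hits = sum(1 for kw in keywords if any(kw in text for text in ocr_texts))
    return hits / len(keywords) >= threshold
```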

4. Retry Logic (Tool-Level)

Some failures shouldn't bother the LLM:

import asyncio

async def execute_with_verify(tool_call, max_retries=2):
    observed = None  # stays None if every attempt failed before verification
    for attempt in range(max_retries):
        result = await call_tool(tool_call)
        
        # Transient ADB errors - retry immediately
        if is_transient_adb_error(result.get("error")):
            await asyncio.sleep(1)
            continue
        
        # State mismatch - might be loading, wait and retry
        if is_loading_state(result):
            await asyncio.sleep(2)
            continue
        
        # Verify state actually changed
        observed = await get_current_page()
        if observed == tool_call["expected_state"]:
            return {"success": True, "observed": observed}
    
    # All retries failed - return enriched error to LLM
    return {
        "success": False,
        "error": "State verification failed",
        "observed": observed,
        "expected": tool_call["expected_state"],
        "suggested_action": "use_vision" if is_unknown_state(observed) else "retry"
    }
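The `is_transient_adb_error` classifier can be as simple as substring matching against known failure modes. The marker strings below are assumptions about ADB output, not taken from the real client:

```python
from typing import Optional

# Substring markers for errors worth retrying locally (assumed, not verified
# against real adb output).
TRANSIENT_MARKERS = (
    "device offline",
    "connection reset",
    "timed out",
    "broken pipe",
)

def is_transient_adb_error(error: Optional[str]) -> bool:
    """True if the error text looks like a transient ADB failure that the
    tool node should retry without involving the LLM."""
    if not error:
        return False
    msg = error.lower()
    return any(marker in msg for marker in TRANSIENT_MARKERS)
```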

5. Feedback Channels

Channel          Example                                        Latency
Tool return      {success: true, observed_state: "page_main"}   Immediate
State query      get_current_state() → "page_main"              ~100ms
Screenshot OCR   OCR text on screen                             ~500ms
Vision LLM       "I see a popup blocking the screen"            ~2s

Complexity: the bot must correlate these channels:

# Tool said success, but...
tool_result = {"success": True, "observed_state": "page_commission"}

# State query disagrees
state_query = await alas_mcp.get_current_state()  # "page_main"

# Screenshot shows popup
screenshot = await maamcp.screencap()
ocr = await maamcp.ocr(screenshot)  # Contains "New Event!"

# Conclusion: Tool succeeded but popup blocked transition
# Action: Dismiss popup, retry
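The correlation sketched above can be written as a pure function over the three cheap channels, escalating to the vision LLM only when needed. The popup keywords and verdict names are illustrative assumptions:

```python
from typing import List

POPUP_HINTS = ("New Event!", "Claim Reward", "Confirm")  # illustrative

def correlate_feedback(tool_result: dict, queried_state: str,
                       ocr_texts: List[str]) -> dict:
    """Fuse tool return, state query, and OCR into one verdict before
    (optionally) paying for the expensive vision channel."""
    popup_texts = [t for t in ocr_texts if any(h in t for h in POPUP_HINTS)]
    if popup_texts:
        return {"verdict": "popup_blocking",
                "action": "dismiss_popup_then_retry",
                "popup_texts": popup_texts}
    if tool_result.get("observed_state") != queried_state:
        return {"verdict": "state_mismatch",
                "action": "use_vision",
                "claimed": tool_result.get("observed_state"),
                "observed": queried_state}
    return {"verdict": "confirmed", "action": "continue"}
```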

6. Structured Error Feedback

LLM needs structured context for recovery:

# Good feedback
{
    "tool": "alas_goto",
    "target": "page_commission",
    "result": {
        "tool_success": True,  # ALAS thought it worked
        "state_verification": "FAILED",
        "expected": "page_commission",
        "observed": "page_main",
        "screenshot_analysis": {
            "blocking_element": "event_popup",
            "ocr_detected": ["New Event!", "Claim Reward"],
            "suggested_action": "dismiss_popup_then_retry"
        }
    }
}
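One way to keep this feedback shape consistent across tools is to pin it down as TypedDicts. The field names mirror the example above; the type declarations themselves are an assumption, not existing code:

```python
from typing import List, Optional, TypedDict

class ScreenshotAnalysis(TypedDict):
    blocking_element: Optional[str]
    ocr_detected: List[str]
    suggested_action: str

class ToolFeedback(TypedDict):
    tool: str
    target: str
    tool_success: bool
    state_verification: str  # e.g. "OK" | "FAILED"
    expected: str
    observed: str
    screenshot_analysis: Optional[ScreenshotAnalysis]
```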

Acceptance Criteria

  • Tool Node implementation that verifies state changes post-execution
  • Retry logic for transient failures (ADB timeouts, loading states)
  • Multi-modal feedback correlation (tool returns + state queries + OCR + vision)
  • Belief state tracking (expected vs observed)
  • Structured error enrichment for LLM recovery decisions
  • Configuration for retry policies per tool type
  • Tests for popup detection and recovery scenarios
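The per-tool retry-policy criterion could be expressed as a small config table; the tool-type names and numbers below are placeholders that real tuning would replace:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int
    backoff_s: float
    verify_with_screenshot: bool

# Placeholder policies; actual tool types and values would come from tuning.
RETRY_POLICIES = {
    "navigation": RetryPolicy(max_retries=2, backoff_s=1.0, verify_with_screenshot=True),
    "tap":        RetryPolicy(max_retries=3, backoff_s=0.5, verify_with_screenshot=False),
    "query":      RetryPolicy(max_retries=1, backoff_s=0.0, verify_with_screenshot=False),
}
DEFAULT_POLICY = RetryPolicy(max_retries=1, backoff_s=1.0, verify_with_screenshot=True)

def policy_for(tool_type: str) -> RetryPolicy:
    """Unknown tool types fall back to a conservative default."""
    return RETRY_POLICIES.get(tool_type, DEFAULT_POLICY)
```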

Notes

Key Insight: The tool node is a verification and enrichment layer, not just a pass-through. It handles the messy reality that:

  1. Tools can report success while the game shows popups/loading
  2. ADB can timeout transiently and needs retry
  3. State must be verified via screenshot when uncertain
  4. LLM should receive structured context, not raw errors

Related Docs:

  • docs/plans/llm_driven_gameplay_architecture.md - Higher-level orchestration
  • docs/plans/recovery_agent_architecture.md - Recovery patterns
  • agent_orchestrator/alas_mcp_server.py - Current MCP implementation
