Description
Problem
The current architecture discussions have oversimplified the Tool Node component. The tool node is not just a pass-through that executes MCP calls - it needs to handle complex verification, retry logic, and multi-modal feedback correlation between the game state and the LLM orchestrator.
Without proper design of this component, the bot will experience:
- False positives (tool reports success but game state didn't change)
- Undetected failures (popups, loading screens blocking transitions)
- Unnecessary LLM round-trips for transient failures that should be retried at the tool level
- Poor recovery decisions due to insufficient context
Desired Outcome
A well-designed Tool Node that:
- Executes + Verifies - Tool success is verified against actual game state
- Handles retries transparently - Transient failures (ADB hiccups, loading delays) are retried without LLM involvement
- Correlates multi-modal feedback - Combines tool returns, state queries, OCR, and vision
- Enriches errors - Provides structured context to LLM for recovery decisions
- Maintains belief state - Tracks where we think we are vs. reality
Detailed Architecture
1. Execute + Verify Pattern
```python
async def tool_node(state: GameState):
    tool_call = state["messages"][-1].tool_calls[0]

    # Execute
    result = await call_mcp_tool(tool_call)

    # VERIFY: the tool said success, but did the game actually change?
    if result.get("success"):
        observed = await verify_state_change(tool_call, result)
        if observed != result.get("expected_state"):
            # Tool lied (or a popup/loading screen interfered)
            result["success"] = False
            result["error"] = f"Expected {result['expected_state']}, observed {observed}"
            result["needs_vision"] = True

    return {"messages": [ToolMessage(content=format_result(result))]}
```
2. State Belief Tracking
The bot maintains belief state that may diverge from reality:
```python
class GameState(TypedDict):
    believed_page: str          # What we think we're on
    believed_task: str          # What we think we're doing
    last_action_success: bool
    screenshot_verified: bool   # Did we actually look?
```
3. Screenshot-Based Verification
Some tools need post-execution screenshot verification:
```python
async def verify_with_screenshot(expected_state: str) -> Tuple[str, bytes]:
    screenshot = await maamcp.screencap()

    # Quick OCR first (cheap)
    ocr_result = await maamcp.ocr(screenshot)

    # If OCR is ambiguous, fall back to LLM vision (expensive)
    if not confident_match(ocr_result, expected_state):
        vision_result = await gemini.analyze_screenshot(
            screenshot,
            f"Are we on {expected_state}? Check for popups, loading."
        )
        return vision_result.state, screenshot

    return expected_state, screenshot
```
4. Retry Logic (Tool-Level)
Some failures shouldn't bother the LLM:
```python
async def execute_with_verify(tool_call, max_retries=2):
    observed = None  # initialized so the enriched error below is well-defined
    for attempt in range(max_retries):
        result = await call_tool(tool_call)

        # Transient ADB errors - retry immediately
        if is_transient_adb_error(result.get("error")):
            await asyncio.sleep(1)
            continue

        # State mismatch - might be loading, wait and retry
        if is_loading_state(result):
            await asyncio.sleep(2)
            continue

        # Verify the state actually changed
        observed = await get_current_page()
        if observed == tool_call["expected_state"]:
            return {"success": True, "observed": observed}

    # All retries failed - return an enriched error to the LLM
    return {
        "success": False,
        "error": "State verification failed",
        "observed": observed,
        "expected": tool_call["expected_state"],
        "suggested_action": "use_vision" if is_unknown_state(observed) else "retry",
    }
```
5. Feedback Channels
| Channel | Example | Latency |
|---|---|---|
| Tool return | `{success: true, observed_state: "page_main"}` | Immediate |
| State query | `get_current_state()` → `"page_main"` | ~100ms |
| Screenshot OCR | OCR text on screen | ~500ms |
| Vision LLM | "I see a popup blocking the screen" | ~2s |
Complexity: the bot must correlate these channels:
```python
# Tool said success, but...
tool_result = {"success": True, "observed_state": "page_commission"}

# State query disagrees
state_query = await alas_mcp.get_current_state()  # "page_main"

# Screenshot shows a popup
screenshot = await maamcp.screencap()
ocr = await maamcp.ocr(screenshot)  # Contains "New Event!"

# Conclusion: the tool succeeded, but a popup blocked the transition
# Action: dismiss the popup, then retry
```
6. Structured Error Feedback
LLM needs structured context for recovery:
```python
# Good feedback
{
    "tool": "alas_goto",
    "target": "page_commission",
    "result": {
        "tool_success": True,  # ALAS thought it worked
        "state_verification": "FAILED",
        "expected": "page_commission",
        "observed": "page_main",
        "screenshot_analysis": {
            "blocking_element": "event_popup",
            "ocr_detected": ["New Event!", "Claim Reward"],
            "suggested_action": "dismiss_popup_then_retry",
        },
    },
}
```
Acceptance Criteria
- Tool Node implementation that verifies state changes post-execution
- Retry logic for transient failures (ADB timeouts, loading states)
- Multi-modal feedback correlation (tool returns + state queries + OCR + vision)
- Belief state tracking (expected vs observed)
- Structured error enrichment for LLM recovery decisions
- Configuration for retry policies per tool type
- Tests for popup detection and recovery scenarios
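As a sketch of the per-tool-type retry policy criterion above, the configuration could be as simple as a small dataclass plus a lookup table. All names here (`RetryPolicy`, `RETRY_POLICIES`, the tool-type keys) are illustrative assumptions, not existing code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int         # attempts before escalating to the LLM
    backoff_s: float         # delay between attempts
    verify_screenshot: bool  # screenshot-verify after the final attempt

# Hypothetical per-tool-type policies; values would be tuned empirically.
RETRY_POLICIES = {
    "navigation": RetryPolicy(max_retries=2, backoff_s=2.0, verify_screenshot=True),
    "tap":        RetryPolicy(max_retries=3, backoff_s=0.5, verify_screenshot=False),
    "query":      RetryPolicy(max_retries=1, backoff_s=0.1, verify_screenshot=False),
}

def policy_for(tool_name: str) -> RetryPolicy:
    """Map a tool name to its retry policy, defaulting to a conservative one."""
    if tool_name.startswith("alas_goto"):
        return RETRY_POLICIES["navigation"]
    return RETRY_POLICIES.get(tool_name, RetryPolicy(1, 1.0, True))
```

The unknown-tool default deliberately retries once and screenshot-verifies, so an unconfigured tool fails safe rather than fast.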
Notes
Key Insight: The tool node is a verification and enrichment layer, not just a pass-through. It has to handle the messy reality that:
- Tools can report success while the game shows popups/loading
- ADB can timeout transiently and needs retry
- State must be verified via screenshot when uncertain
- LLM should receive structured context, not raw errors
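As a concrete illustration of the popup case in the bullets above, the dismiss-and-retry path can be tested in isolation with fakes standing in for the MCP calls. `recover_from_popup` and the fake callables are hypothetical sketches, not existing code:

```python
import asyncio

async def recover_from_popup(ocr_lines, dismiss, retry):
    """If OCR shows a known blocking popup, dismiss it and retry the action.

    Hypothetical helper: `dismiss` and `retry` are async callables supplied by
    the tool node. Returns the retried action's result, or None if no popup
    was recognized.
    """
    POPUP_MARKERS = ("New Event!", "Claim Reward")
    if any(marker in line for line in ocr_lines for marker in POPUP_MARKERS):
        await dismiss()
        return await retry()
    return None

# Minimal self-contained check, with fakes recording the call order.
async def _demo():
    calls = []

    async def fake_dismiss():
        calls.append("dismiss")

    async def fake_retry():
        calls.append("retry")
        return {"success": True, "observed_state": "page_commission"}

    result = await recover_from_popup(["New Event!"], fake_dismiss, fake_retry)
    assert calls == ["dismiss", "retry"]
    assert result["success"] is True

asyncio.run(_demo())
```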
Related Docs:
- `docs/plans/llm_driven_gameplay_architecture.md` - Higher-level orchestration
- `docs/plans/recovery_agent_architecture.md` - Recovery patterns
- `agent_orchestrator/alas_mcp_server.py` - Current MCP implementation