MCP server v2: non-blocking primitives, health probe, CLI mode #35

@Coldaine

Problem

The current MCP server (agent_orchestrator/alas_mcp_server.py) has four architectural flaws that block reliable LLM-driven operation:

1. Blocking tool calls lock the agent

alas_login_ensure_main calls LoginHandler.handle_app_login(), which runs a while 1: loop that can spin for up to 300 seconds before timing out (LOGIN_MAX_TOTAL_SECONDS = 300). When the tool is called via MCP over stdio, the entire JSON-RPC transport is blocked for that whole time. The agent cannot take screenshots, check health, or do anything else; it is stuck waiting on a Python function that may take minutes to return, or may not return at all if an ADB call hangs inside it.

This is the root cause of the agent getting "stuck" when calling login.

2. No ADB health probe

When ADB dies or the emulator isn't rendering, adb_screenshot either hangs or returns a black frame. There's no way to ask "is the connection alive?" before committing to a call. The agent can't distinguish between:

  • Game loading screen (wait)
  • ADB transport broken (reconnect)
  • Emulator process dead (restart)
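
The three cases above could map onto agent actions roughly as follows. This is only a sketch: it assumes a status dictionary with the fields proposed for adb_health later in this issue, and the action names ("restart", "reconnect", "wait", "proceed") and the order of the checks are hypothetical.

```python
from typing import Any, Dict


def next_action(status: Dict[str, Any]) -> str:
    """Map a health status to a coarse recovery action (hypothetical names)."""
    # Emulator process dead: nothing else can work until it is restarted.
    if not status.get("emulator_running", False):
        return "restart"
    # Emulator is up but the ADB transport is broken: reconnect.
    if not status.get("adb_connected", False):
        return "reconnect"
    # Transport is fine but the frame is unusable: assume the game is
    # still loading and wait before retrying.
    if not status.get("screenshot_ok", False):
        return "wait"
    return "proceed"
```

Missing keys are treated as failures, so a partial or malformed status never results in "proceed".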

3. No CLI mode

alas_mcp_server.py only runs as mcp.run(transport="stdio"). There's no way to run python alas_mcp_server.py screenshot from a terminal. When something goes wrong, you can't poke at it without an MCP client.

4. Tool proliferation anti-pattern

alas_login_ensure_main is a dedicated MCP tool wrapping one ALAS workflow. Following this pattern would require alas_commission_run, alas_dorm_collect, etc. — dozens of tools that are all thin wrappers.

alas_call_tool(name) already exists and can invoke any registered tool. Dedicated workflow wrappers should not exist as separate MCP tools.
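
To illustrate why a single name-based entry point scales better than one MCP tool per workflow, here is a minimal dispatch sketch. The registry, the register decorator, and run_tool are hypothetical stand-ins; this issue does not describe how alas_call_tool actually looks tools up.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: tool name -> callable.
TOOLS: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under a tool name without exposing a new MCP tool."""
    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return decorate


def run_tool(name: str, **kwargs: Any) -> Dict[str, Any]:
    """Single entry point: look the tool up by name and invoke it."""
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}", "available": sorted(TOOLS)}
    return {"ok": True, "result": TOOLS[name](**kwargs)}


@register("commission.run")
def _commission_run() -> str:
    # Stub body; a real tool would drive the ALAS workflow.
    return "commission started"
```

Adding a workflow then means adding a registry entry, and the MCP tool surface stays the same size.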

Proposed Design

Tool surface (all non-blocking or bounded)

| Tool | What it does | Max time |
| --- | --- | --- |
| adb_health | NEW - Check ADB transport, emulator process, return structured status | < 2s |
| adb_screenshot | Take screenshot (existing) | < 3s |
| adb_tap | Tap coordinate (existing) | < 1s |
| adb_swipe | Swipe (existing) | < 1s |
| alas_state | Current page name (rename from alas_get_current_state) | < 3s |
| alas_goto | Navigate with timeout (existing, add bound) | < 30s |
| alas_tools | List available tools (rename from alas_list_tools) | < 1s |
| alas_run | Run a named tool with timeout (rename from alas_call_tool) | < 60s |

Remove

  • alas_login_ensure_main — login becomes the agent calling screenshot/tap/state in a loop, not a single blocking function
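
A rough sketch of what "calling screenshot/tap/state in a loop" could look like from the caller's side. Each iteration makes only short, bounded calls, so the transport is never held for long. The helper name poll_until and its parameters are hypothetical; get_state and step stand in for calls to the state tool and to whatever tap the agent decides on.

```python
import time
from typing import Callable


def poll_until(
    get_state: Callable[[], str],
    target: str,
    step: Callable[[], None],
    max_steps: int = 20,
    delay_s: float = 0.0,
) -> bool:
    """Repeat short state checks and actions until `target` is reached.

    Returns True if the target page was reached within `max_steps` actions.
    """
    for _ in range(max_steps):
        if get_state() == target:
            return True
        step()               # e.g. one tap chosen from the latest screenshot
        time.sleep(delay_s)  # give the game time to react
    # One final check after the last action.
    return get_state() == target
```

Because the loop lives in the caller rather than inside a single tool, the agent can interleave health checks or give up early between iterations.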

Add: adb_health tool

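from typing import Any, Dict

# `mcp` is the server instance that alas_mcp_server.py already runs via mcp.run(transport="stdio").
@mcp.tool()
def adb_health() -> Dict[str, Any]:
    """Check ADB and emulator connectivity.

    Returns structured status:
    {
        "adb_connected": bool,
        "emulator_running": bool,
        "screenshot_ok": bool,
        "serial": str,
        "error": str | None
    }
    """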
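
One possible body for the probe, sketched with the standard adb commands get-state and exec-out screencap -p. The function name probe_adb, the injectable run parameter, and the per-call timeout are assumptions made for illustration. The emulator check in particular is a simplification: a real process check depends on which emulator is in use, so here a responding device is simply taken to imply a live emulator.

```python
import subprocess
from typing import Any, Callable, Dict


def probe_adb(
    serial: str,
    run: Callable[..., Any] = subprocess.run,
    timeout_s: float = 1.0,
) -> Dict[str, Any]:
    """Return the structured status described above without ever blocking long."""
    status: Dict[str, Any] = {
        "adb_connected": False,
        "emulator_running": False,
        "screenshot_ok": False,
        "serial": serial,
        "error": None,
    }
    base = ["adb", "-s", serial]
    try:
        # `adb get-state` prints "device" when the transport is usable.
        state = run(base + ["get-state"], capture_output=True, timeout=timeout_s)
        status["adb_connected"] = state.returncode == 0 and state.stdout.strip() == b"device"
        if not status["adb_connected"]:
            message = state.stderr.decode(errors="replace").strip()
            status["error"] = message or "device not in 'device' state"
            return status
        # Assumption: a responding device means the emulator process is alive.
        status["emulator_running"] = True
        shot = run(base + ["exec-out", "screencap", "-p"], capture_output=True, timeout=timeout_s)
        # Only checks for a PNG signature; black-frame detection is separate.
        status["screenshot_ok"] = shot.returncode == 0 and shot.stdout.startswith(b"\x89PNG")
    except (subprocess.TimeoutExpired, OSError) as exc:
        # Covers a hung adb call and a missing adb binary.
        status["error"] = f"{type(exc).__name__}: {exc}"
    return status
```

With two subprocess calls capped at one second each, the probe stays within the proposed 2s budget. Passing run as a parameter lets the probe be tested without a device attached.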

Add: CLI mode

# Debug from terminal without MCP client
python alas_mcp_server.py cli --health
python alas_mcp_server.py cli --screenshot out.png
python alas_mcp_server.py cli --state
python alas_mcp_server.py cli --tap 640 360
python alas_mcp_server.py cli --run commission.run
python alas_mcp_server.py cli --list-tools

# MCP mode (existing, default)
python alas_mcp_server.py         # stdio MCP server
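
The command line above could be parsed with argparse along these lines. This is a sketch of the argument handling only; build_parser is a hypothetical helper, and wiring each flag to the matching tool function is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the proposed `cli` subcommand; no subcommand means MCP mode."""
    parser = argparse.ArgumentParser(prog="alas_mcp_server.py")
    sub = parser.add_subparsers(dest="mode")

    cli = sub.add_parser("cli", help="run a single command and exit")
    # Exactly one action per invocation.
    action = cli.add_mutually_exclusive_group(required=True)
    action.add_argument("--health", action="store_true", help="print ADB health status")
    action.add_argument("--screenshot", metavar="PATH", help="save a screenshot to PATH")
    action.add_argument("--state", action="store_true", help="print the current page name")
    action.add_argument("--tap", nargs=2, type=int, metavar=("X", "Y"), help="tap a coordinate")
    action.add_argument("--run", metavar="TOOL", help="run a named tool")
    action.add_argument("--list-tools", action="store_true", help="list available tools")
    return parser
```

When no subcommand is given, args.mode is None, which the entry point can treat as the existing default of starting the stdio MCP server.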

Add: Timeout wrapper on all tools

Every MCP tool call should be wrapped with a configurable timeout so that a hung ADB call doesn't block the transport indefinitely.
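
One way such a wrapper might look, using a worker thread and a bounded wait on its result. The decorator name bounded, the shared pool, and the shape of the returned dictionary are assumptions for illustration, not the server's existing API.

```python
import concurrent.futures
import functools
import time
from typing import Any, Callable, Dict

# Shared worker pool; the size is an arbitrary choice for this sketch.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def bounded(timeout_s: float) -> Callable[[Callable[..., Any]], Callable[..., Dict[str, Any]]]:
    """Decorator: return an error result instead of blocking past `timeout_s`."""
    def decorate(fn: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
            future = _POOL.submit(fn, *args, **kwargs)
            try:
                return {"ok": True, "result": future.result(timeout=timeout_s)}
            except concurrent.futures.TimeoutError:
                # The caller is unblocked, but the worker thread keeps running.
                return {"ok": False, "error": f"{fn.__name__} timed out after {timeout_s}s"}
        return wrapper
    return decorate


@bounded(timeout_s=0.05)
def slow_tool() -> str:
    time.sleep(0.3)  # stands in for a hung ADB call
    return "done"
```

A caveat with this approach: Python cannot kill a thread, so a truly hung call still occupies a worker until it returns. Running ADB through subprocess with its own timeout avoids that, because the child process can be terminated.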

Acceptance Criteria

  • alas_login_ensure_main removed from MCP tool surface
  • adb_health tool added and returning structured connectivity status
  • All MCP tools have bounded execution time (timeout wrapper)
  • CLI mode added (python alas_mcp_server.py cli --<command>)
  • Tool names simplified (alas_get_current_state → alas_state, etc.)
  • Black screenshot detection: adb_screenshot reports if image is all-black
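
For the black screenshot criterion, a minimal check might look like the sketch below. It assumes the screenshot has already been decoded to raw 8-bit channel bytes (for example RGB), and both thresholds are hypothetical defaults that would need tuning against real frames.

```python
def is_black_frame(raw: bytes, max_level: int = 8, min_dark_ratio: float = 0.99) -> bool:
    """Return True if nearly every channel value is at or below `max_level`.

    `raw` is decoded pixel data, one byte per channel. An empty buffer is
    treated as black, since it is equally unusable to the agent.
    """
    if not raw:
        return True
    # Count dark channel bytes rather than whole pixels; for a near-black
    # frame the two measures agree closely.
    dark = sum(1 for value in raw if value <= max_level)
    return dark / len(raw) >= min_dark_ratio
```

Using a ratio instead of requiring every byte to be zero tolerates a few lit pixels, such as a cursor or compression noise.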

Context

  • Current MCP server: agent_orchestrator/alas_mcp_server.py (300 lines)
  • Blocking login handler: alas_wrapped/module/handler/login.py (300s while 1: loop)
  • Related: issue #21 (arch: Tool Node complexity and feedback loop design) describes verification patterns that depend on this foundation
  • CLAUDE.md mandates: "MCP-only control path" and "the LLM agent IS the scheduler"
