feat(mcp): add optics mcp server exposing keywords over MCP #308

Open
chinmayajha wants to merge 5 commits into mozarkai:main from chinmayajha:feat/optics-mcp-server

@chinmayajha (Collaborator)

What this adds

A new optics mcp command that runs a Model Context Protocol server. It lets an MCP client like Claude or Cursor drive a live device or browser through optics — start a session, run keywords, and look at the screen.

Usage

pip install 'optics-framework[mcp]'

optics mcp                    # stdio, for local clients
optics mcp --transport http   # networked
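
For the stdio transport, most MCP clients launch the server themselves from a JSON config entry. A minimal sketch of generating such an entry, assuming the client reads the common `mcpServers` layout (as Claude Desktop and Cursor do); the label `optics` and the helper name `build_mcp_client_config` are illustrative and not part of this PR:

```python
import json


def build_mcp_client_config(server_label: str = "optics") -> dict:
    """Return a client config entry that launches `optics mcp` over stdio.

    Assumption: the client uses the widely adopted `mcpServers` layout.
    `server_label` is an arbitrary display name chosen for illustration.
    """
    return {
        "mcpServers": {
            server_label: {
                "command": "optics",  # CLI provided by optics-framework
                "args": ["mcp"],      # no --transport flag: stdio, per the usage above
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the client's config file.
    print(json.dumps(build_mcp_client_config(), indent=2))
```

Where the config file lives depends on the client, so check its documentation before pasting the output.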

The server exposes:

  • Tools: start_session, terminate_session, screenshot, and every optics keyword (press_element, enter_text, swipe, assert_presence, …). Each keyword tool takes a session_id.
  • Resources: the keyword catalog at optics://keywords, and live device state per session (screenshot, page source, interactive elements).
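
To illustrate the "each keyword tool takes a session_id" shape, here is a rough sketch of wrapping a keyword as a session-scoped tool. It is not the PR's implementation: `SESSIONS`, `as_tool`, and the stand-in `enter_text` keyword are hypothetical, and returning error strings (rather than raising a protocol-level tool error) is a simplification for readability.

```python
from typing import Callable, Dict

# Hypothetical in-memory session store; the real server's session handling may differ.
SESSIONS: Dict[str, dict] = {}


def as_tool(keyword: Callable[..., object]) -> Callable[..., str]:
    """Wrap a keyword so it is called with a session_id plus string parameters."""

    def tool(session_id: str, **params: str) -> str:
        session = SESSIONS.get(session_id)
        if session is None:
            # Translate a missing session into a readable message for the client.
            return f"error: unknown session_id {session_id!r}"
        try:
            result = keyword(session, **params)
        except Exception as exc:  # surface keyword failures as text, not a crash
            return f"error: {type(exc).__name__}: {exc}"
        # Stringify the result so every tool returns plain text.
        return "ok" if result is None else str(result)

    tool.__name__ = keyword.__name__  # keep the keyword's name as the tool name
    return tool


def enter_text(session: dict, element: str, text: str) -> None:
    """Stand-in keyword that only records what it was asked to do."""
    session.setdefault("log", []).append(("enter_text", element, text))


SESSIONS["demo"] = {}
enter_text_tool = as_tool(enter_text)
print(enter_text_tool("demo", element="username", text="alice"))     # ok
print(enter_text_tool("missing", element="username", text="alice"))  # error: unknown session_id 'missing'
```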

Full guide: docs/usage/mcp_usage.md.

Dependency change

fastmcp needs starlette >= 1.0, which FastAPI only supports from 0.119. So the FastAPI pin moves from <0.119 to <0.138. This affects every install, not just the mcp extra — the lock now resolves FastAPI 0.137 / starlette 1.3. The existing test suite passes on the new versions.

Testing

  • Unit tests for the server: tool and resource registration, parameter schemas, dispatch, and error handling.
  • Tried end to end against a real Android device over Appium — start a session, take a screenshot, swipe, tear down.

Follow-ups

A couple of things left for later, tracked as separate issues (linked in a comment):

  • The screenshot resource comes back as application/octet-stream rather than image/png — a fastmcp quirk with templated resources. The screenshot tool returns a proper image in the meantime.
  • optics mcp and optics serve each keep their own sessions and can't share them.
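
Until the screenshot resource reports image/png, a client that reads the resource directly could sniff the payload itself. A small sketch of that workaround, assuming the bytes are an unmodified PNG; `effective_mime_type` is a hypothetical client-side helper, not something this PR adds:

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header at the start of every PNG file


def effective_mime_type(declared: str, payload: bytes) -> str:
    """Return image/png when a generic octet-stream payload is actually a PNG."""
    if declared == "application/octet-stream" and payload.startswith(PNG_SIGNATURE):
        return "image/png"
    # Otherwise trust whatever the server declared.
    return declared
```

Calling the screenshot tool instead avoids the issue, as noted above.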

Add a Model Context Protocol server (fastmcp) and an `optics mcp` command so MCP clients (Claude, Cursor) can drive a live device/browser through optics: start a session, run keywords as tools, and read device state via resources. Built as a thin wrapper over the existing keyword execution path.
Add docs/usage/mcp_usage.md (install, usage, the user journey, and troubleshooting), wire it into the docs nav, and record the MCP server in CLAUDE.md.
Tests for tool/resource registration, the string-typed parameter schemas, keyword dispatch and stringification, error translation, and the optional-dependency guard.
fastmcp needs starlette>=1.0, which FastAPI supports from 0.119, so the pin moves from <0.119 to <0.138; fastmcp is pinned >=3.4.2 (earlier 3.4.x omit the starlette pin). Lock regenerated.
@chinmayajha (Collaborator, Author)

Follow-up issues

The two follow-ups from the description, tracked separately:

Comment thread pyproject.toml
… agreement

The recent action_keyword.py refactor and MCP-server commits shifted many
file:line anchors in CLAUDE.md. Re-verify every cited anchor against source
and correct the drifted ones (cli, action_keyword, strategies, live,
nl_agent, session_manager, optics_builder, base_factory, config_handler,
error, factories, and minor nudges elsewhere).

Also fix a factual error: LLM backends do not set a NAME attribute
(GeminiLLM has none) - engines are resolved by module filename matching the
config key, with NAME only used for driver<->element-source matching via
REQUIRED_DRIVER_TYPE. Reword the driver and LLM 'where to put things'
bullets accordingly.

Add a 'Working agreement' section: no AI co-author trailers, assume Codex
reviews every PR, split large pushes into reviewable commits, and run a
post-PR gap analysis that turns approved findings into tracked GitHub
issues referenced from the PR.
