tauri-webdriver

Open-source macOS WebDriver for Tauri apps.

Enables automated end-to-end testing of Tauri desktop applications on macOS, where no native WKWebView WebDriver exists.

Disclosure: The code for this project was written in collaboration with Claude Code

The Problem

Tauri apps use WKWebView on macOS. Unlike Linux (WebKitWebDriver) and Windows (Edge WebDriver), Apple does not provide a WebDriver implementation for WKWebView. This means Tauri developers cannot run automated e2e tests on macOS using standard WebDriver tools like WebDriverIO or Selenium.

This is a blocker for any Tauri app with platform-specific code (e.g., deep links, native menus, file associations) that must be tested on every platform.

The Solution

tauri-webdriver provides two crates that together bridge the gap:

tauri-plugin-webdriver-automation -- A Tauri plugin that runs inside your app (debug builds only). It starts a local HTTP server that can interact with your app's webview: find elements, click buttons, read text, manage windows, and execute JavaScript.
tauri-webdriver-automation (CLI binary: tauri-wd) -- A standalone CLI binary that implements the W3C WebDriver protocol. It launches your Tauri app, connects to the plugin's HTTP server, and translates standard WebDriver commands into plugin API calls. WebDriverIO, Selenium, or any W3C-compatible client can connect to it.

WebDriverIO/Selenium                 tauri-wd CLI                Your Tauri App
  (test runner)        ──HTTP──>    (W3C WebDriver)   ──HTTP──>  (plugin server)
                        :4444                                     :{dynamic port}

Who Is This For?

Tauri app developers who need automated e2e tests on macOS
CI/CD pipelines that run tests and need a macOS solution
Anyone with platform-specific Tauri code that must be verified on macOS (deep links, native APIs, system integrations)

Quick Start

1. Add the plugin to your Tauri app

cd src-tauri
cargo add tauri-plugin-webdriver-automation

Register it in your app (debug builds only):

let mut builder = tauri::Builder::default();
#[cfg(debug_assertions)]
{
    builder = builder.plugin(tauri_plugin_webdriver_automation::init());
}

2. Install the CLI

cargo install tauri-webdriver-automation

3. Configure WebDriverIO

// wdio.conf.mjs
export const config = {
    port: 4444,
    capabilities: [{
        'tauri:options': {
            binary: './src-tauri/target/debug/my-app',
        }
    }],
    // ... your test config
};

4. Start your frontend dev server

If your Tauri app uses a dev server (Vite, Next.js, etc.), it must be running before tauri-wd launches your app. The debug binary loads your frontend from devUrl (e.g., http://localhost:5173), not from embedded files.

# Keep this running in a separate terminal (adjust for your setup)
npx vite --port 5173          # Vite
# npx next dev --port 3000    # Next.js
# npx webpack serve            # Webpack

To have your agent (Claude Code, Codex, etc) handle this automatically, add a note to your project's CLAUDE.md or AGENTS.md. Starting a dev server on a port that's already in use may prompt interactively and hang automated tools, so check first and only start if needed:

## WebDriver Automation
The Tauri debug binary loads the frontend from devUrl. Before launching
the app with tauri-wd, ensure these are running:
curl -s http://127.0.0.1:5173 > /dev/null 2>&1 || (cd apps/desktop && npx vite --port 5173 &)
curl -s http://127.0.0.1:4444/status > /dev/null 2>&1 || tauri-wd --port 4444 &

5. Run tests

# Terminal 1: Start the WebDriver server (supports concurrent sessions)
tauri-wd --port 4444

# Terminal 2: Run your tests
npx wdio run wdio.conf.mjs

Local Disk Cleanup

Rust build artifacts can take several GB in this repo. To clean local-only files:

bash scripts/clean.sh

For a deeper cleanup (also removes tests/wdio/node_modules and generated files in screenshots/):

bash scripts/clean.sh --deep

After cleanup, the next build is slower because Rust has to recompile from scratch.

MCP Integration

tauri-webdriver works with mcp-tauri-automation to enable AI-driven automation of Tauri apps via the Model Context Protocol. This lets AI agents (like Claude Code) launch, inspect, interact with, and screenshot your Tauri app through natural language.

Setup

1. Install the MCP server:

git clone https://github.com/danielraffel/mcp-tauri-automation.git
cd mcp-tauri-automation
npm install && npm run build

2. Add to Claude Code:

claude mcp add --transport stdio tauri-automation \
  --scope user \
  -- node /absolute/path/to/mcp-tauri-automation/dist/index.js

Optionally set a default app path so you don't have to specify it every time:

claude mcp add --transport stdio tauri-automation \
  --env TAURI_APP_PATH=/path/to/your-app/src-tauri/target/debug/your-app \
  --scope user \
  -- node /absolute/path/to/mcp-tauri-automation/dist/index.js

3. Start tauri-wd and use with Claude:

# Keep this running in a separate terminal
tauri-wd --port 4444

If your app uses a frontend dev server, make sure it's running first (see step 4 in Quick Start).

Then ask Claude: "Launch my Tauri app and take a screenshot"

Note: mcp-tauri-automation is a fork of Radek44/mcp-tauri-automation with additional tools (execute_script, get_page_title, get_page_url, multi-strategy selectors, configurable timeouts, wait_for_navigation). These additions have been submitted upstream. For cross-platform MCP support, see the original project.

Supported W3C WebDriver Operations

All operations follow the W3C WebDriver specification. See the full technical specification for detailed request/response formats and plugin API documentation.

Sessions

W3C Endpoint	Method	Description
`/status`	GET	Server readiness status
`/session`	POST	Create a new session with `tauri:options` capabilities
`/session/{id}`	DELETE	Delete session and terminate the app
`/session/{id}/timeouts`	GET	Get current timeout configuration
`/session/{id}/timeouts`	POST	Set implicit, page load, and script timeouts

Navigation

W3C Endpoint	Method	Description
`/session/{id}/url`	POST	Navigate to URL
`/session/{id}/url`	GET	Get current page URL
`/session/{id}/title`	GET	Get page title
`/session/{id}/source`	GET	Get full page HTML source
`/session/{id}/back`	POST	Navigate back in history
`/session/{id}/forward`	POST	Navigate forward in history
`/session/{id}/refresh`	POST	Refresh the current page

Windows

W3C Endpoint	Method	Description
`/session/{id}/window`	GET	Get current window handle
`/session/{id}/window`	POST	Switch to window by handle
`/session/{id}/window`	DELETE	Close current window
`/session/{id}/window/handles`	GET	Get all window handles
`/session/{id}/window/new`	POST	Create a new window
`/session/{id}/window/rect`	GET	Get window position and size
`/session/{id}/window/rect`	POST	Set window position and size
`/session/{id}/window/maximize`	POST	Maximize window
`/session/{id}/window/minimize`	POST	Minimize window
`/session/{id}/window/fullscreen`	POST	Make window fullscreen

Elements

W3C Endpoint	Method	Description
`/session/{id}/element`	POST	Find element (CSS, XPath, tag name, link text, partial link text)
`/session/{id}/elements`	POST	Find all matching elements
`/session/{id}/element/active`	GET	Get the currently focused element
`/session/{id}/element/{eid}/element`	POST	Find element scoped to a parent element
`/session/{id}/element/{eid}/elements`	POST	Find all elements scoped to a parent
`/session/{id}/element/{eid}/click`	POST	Click an element
`/session/{id}/element/{eid}/clear`	POST	Clear an input element
`/session/{id}/element/{eid}/value`	POST	Send keystrokes to an element (file paths for `<input type="file">`)
`/session/{id}/element/{eid}/text`	GET	Get element's visible text
`/session/{id}/element/{eid}/name`	GET	Get element's tag name
`/session/{id}/element/{eid}/attribute/{name}`	GET	Get an HTML attribute value
`/session/{id}/element/{eid}/property/{name}`	GET	Get a JavaScript property value
`/session/{id}/element/{eid}/css/{name}`	GET	Get a computed CSS property value
`/session/{id}/element/{eid}/rect`	GET	Get element's bounding rectangle
`/session/{id}/element/{eid}/enabled`	GET	Check if element is enabled
`/session/{id}/element/{eid}/selected`	GET	Check if element is selected
`/session/{id}/element/{eid}/displayed`	GET	Check if element is visible
`/session/{id}/element/{eid}/computedrole`	GET	Get computed ARIA role
`/session/{id}/element/{eid}/computedlabel`	GET	Get computed ARIA label

Shadow DOM

W3C Endpoint	Method	Description
`/session/{id}/element/{eid}/shadow`	GET	Get shadow root of a web component
`/session/{id}/shadow/{sid}/element`	POST	Find element inside a shadow root
`/session/{id}/shadow/{sid}/elements`	POST	Find all elements inside a shadow root

Frames

W3C Endpoint	Method	Description
`/session/{id}/frame`	POST	Switch to frame by index, element reference, or `null` for top
`/session/{id}/frame/parent`	POST	Switch to parent frame

Script Execution

W3C Endpoint	Method	Description
`/session/{id}/execute/sync`	POST	Execute synchronous JavaScript
`/session/{id}/execute/async`	POST	Execute asynchronous JavaScript with callback

Screenshots

W3C Endpoint	Method	Description
`/session/{id}/screenshot`	GET	Full page screenshot (base64 PNG)
`/session/{id}/element/{eid}/screenshot`	GET	Element screenshot (base64 PNG)

Cookies

W3C Endpoint	Method	Description
`/session/{id}/cookie`	GET	Get all cookies
`/session/{id}/cookie/{name}`	GET	Get a cookie by name
`/session/{id}/cookie`	POST	Add a cookie
`/session/{id}/cookie/{name}`	DELETE	Delete a cookie by name
`/session/{id}/cookie`	DELETE	Delete all cookies

Alerts

W3C Endpoint	Method	Description
`/session/{id}/alert/dismiss`	POST	Dismiss (cancel) the current dialog
`/session/{id}/alert/accept`	POST	Accept (OK) the current dialog
`/session/{id}/alert/text`	GET	Get the dialog message text
`/session/{id}/alert/text`	POST	Send text to a prompt dialog

Actions

W3C Endpoint	Method	Description
`/session/{id}/actions`	POST	Perform actions: key (keyDown/keyUp), pointer (move/down/up), wheel (scroll)
`/session/{id}/actions`	DELETE	Release all actions

Print

W3C Endpoint	Method	Description
`/session/{id}/print`	POST	Print page to PDF (base64-encoded)

Architecture

Two Rust crates work together in a simple 2-hop design:

WebDriverIO/Selenium ──HTTP:4444──> tauri-wd CLI ──HTTP:{dynamic}──> tauri-plugin-webdriver-automation
                                    (W3C WebDriver)                   (axum server in-app)

The plugin (tauri-plugin-webdriver-automation) runs inside your Tauri app in debug builds. On startup it binds an axum HTTP server to 127.0.0.1 on a random port and prints [webdriver] listening on port {N} to stdout. It injects a JavaScript bridge (init.js) into every webview that provides element finding, an async script callback mechanism, dialog interception, and an in-memory cookie store (needed because WKWebView doesn't support document.cookie on tauri:// URLs). All DOM interaction happens by evaluating JS in the webview and receiving results back via Tauri IPC.

The CLI (tauri-wd) is a standalone binary that implements the W3C WebDriver HTTP protocol on port 4444. When a test framework creates a session, the CLI launches your app binary, watches stdout for the port announcement, and then translates every W3C request into a plugin HTTP call. Elements are tracked as (selector, index, using) triples internally, mapped to W3C UUID strings for the session lifetime. Shadow DOM elements use a separate in-memory cache since document.querySelectorAll() can't reach into shadow roots. Frame/iframe context is managed by a stack that scopes JS evaluation to the correct contentDocument.

Session flow: Test client sends POST /session with tauri:options.binary pointing to your app. The CLI spawns the binary with TAURI_WEBVIEW_AUTOMATION=true, reads the plugin port from stdout, and returns a session ID. All subsequent W3C commands are forwarded to the plugin as JSON-over-HTTP POST requests. When the session is deleted, the app process is killed.

For the full internal API reference, see SPEC.md.

Alternatives

When we started building this, we were developing a macOS Tauri app and frustrated that there was no easy way to do automated e2e testing -- Apple doesn't provide a WebDriver for WKWebView. We ended up building this at roughly the same time others tackled the same problem. Probably not the greatest use of time in hindsight, but here we are.

tauri-plugin-webdriver -- Open-source Tauri plugin that embeds a WebDriver server directly in the plugin (single-crate architecture vs our two-crate approach). Supports macOS, Linux, and Windows -- if you need cross-platform WebDriver support, this is the more mature choice.
CrabNebula Cloud -- Commercial hosted testing service with macOS WebDriver support.

Additional info

Blog post with some additional background info.

License

MIT OR Apache-2.0 (dual-licensed, same as Tauri)

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.github/workflows		.github/workflows
crates		crates
scripts		scripts
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.md		README.md
SPEC.md		SPEC.md
TEST_PLAN.md		TEST_PLAN.md

Folders and files

Latest commit

History

Repository files navigation

tauri-webdriver

Table of Contents

The Problem

The Solution

Who Is This For?

Quick Start

1. Add the plugin to your Tauri app

2. Install the CLI

3. Configure WebDriverIO

4. Start your frontend dev server

5. Run tests

Local Disk Cleanup

MCP Integration

Setup

Supported W3C WebDriver Operations

Sessions

Navigation

Windows

Elements

Shadow DOM

Frames

Script Execution

Screenshots

Cookies

Alerts

Actions

Print

Architecture

Alternatives

Additional info

License

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages