syrin-labs/iris

Syrin Iris — eyes for your coding agent

Your AI agent writes the code. Iris tells it whether the code actually works — with evidence, not screenshots.

Iris in action — an AI agent verifying a real running app from the inside: pass/fail verdicts with evidence, the file:line to fix, and a regression caught before it shipped

Iris gives your coding agent a verdict, not just a view. The moment it finishes a change, Iris checks — from inside your real running app — that the right things actually happened: the API call returned 200, the modal opened, the route changed, the store updated, no console error slipped in. If something silently broke, it says what, why, and (on React) the exact file:line to fix.

TypeScript · Model Context Protocol · React-first · dev-only · localhost-only · no telemetry · Apache-2.0 SDK

⚡ Quickstart · ▶︎ Watch the demo · 📊 Benchmarks · 🤔 Iris vs Playwright? · 📚 Docs


⚡ Quickstart — give it to your agent

You don't set this up. Your agent does. Paste one line into Claude Code, Cursor, OpenCode, or any other MCP-capable agent:

Follow https://raw.githubusercontent.com/syrin-labs/iris/main/SKILL.md

That's the whole install. The skill detects whether Iris is already wired up: it runs the setup wizard the first time, then verifies your app on every run after that. Prefer to do it yourself? `npx @syrin/iris init` registers the MCP server for every agent you have; or see the full install options ↓.


👀 What is this, really?

Modern coding agents are "effectively programming with a blindfold on." Iris takes the blindfold off — and instead of a blurry screenshot, it hands back a verdict with evidence.

🧑‍🎨 If you "vibe code" (and don't write tests)

Your agent says "done ✅", you open the browser, and… the button does nothing. Every time, you are the QA department.

Iris lets your agent check its own work — automatically, on every edit. It catches the broken thing before you ever see it, and tells the agent how to fix it. You just keep building.

🧪 If you're a testing expert

Iris is an in-process verification + deterministic regression layer for agent-built web apps. It asserts program truth — store/React state, network cardinality, emitted signals, console — not just the rendered DOM.

Recorded flows replay without an LLM, which turns them into a CI gate that diffs the verdict exactly: 0% flake, ~175 tokens per run. It complements Playwright; it doesn't replace it.
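A minimal sketch of what "diffs the verdict exactly" could mean: the replay is a pure comparison of recorded verdicts against fresh ones, with no model involved, so the same app behavior always produces the same gate result. The `Verdict` type and the `canonical` / `diffVerdicts` helpers are hypothetical, loosely modeled on the `pass` / `failureReason` / `evidence` fields in the `iris_assert` example further down; they are not Iris's actual replay format.

```typescript
// Hypothetical verdict shape, loosely modeled on the iris_assert result in this README.
interface Verdict {
  pass: boolean;
  failureReason?: string;
  evidence?: Record<string, unknown>;
}

// Stable serialization: keys are sorted and undefined fields dropped, so key order never causes a false diff.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((item) => canonical(item)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .filter((key) => obj[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(obj[key])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// A replay passes the gate only if every step's verdict matches the recorded baseline exactly.
function diffVerdicts(baseline: Verdict[], replay: Verdict[]): string[] {
  const diffs: string[] = [];
  for (let i = 0; i < Math.max(baseline.length, replay.length); i++) {
    const expected = canonical(baseline[i] ?? null); // a missing step serializes as "null"
    const actual = canonical(replay[i] ?? null);
    if (expected !== actual) diffs.push(`step ${i}: expected ${expected}, got ${actual}`);
  }
  return diffs;
}

const baseline: Verdict[] = [{ pass: true }, { pass: true, evidence: { net: { status: 200 } } }];
const replay: Verdict[] = [{ pass: true }, { pass: false, failureReason: "POST /api/order returned 500, expected 200" }];
const gateFailures = diffVerdicts(baseline, replay); // one entry, for step 1
```

Because nothing in the comparison is sampled, a clean replay yields an empty diff every time; any non-empty diff is a real change in behavior.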


🧠 How it works

Your running app already knows everything that just happened — in code. Iris exposes that to your agent over MCP as one tight loop:

```mermaid
flowchart LR
    A["🤖 Your AI agent<br/>(Claude Code, Cursor…)"] -->|"look · act · observe · assert"| B(("👁<br/>Iris"))
    B <-->|"structured events,<br/>not pixels"| C["🖥 Your real running app<br/>DOM · network · console<br/>store · React fiber"]
    B -->|"✅ / ❌ verdict + evidence<br/>+ file:line to fix"| A
    style B fill:#8b7bff,stroke:#5b4bd0,color:#fff
    style A fill:#15131f,stroke:#3a3550,color:#fff
    style C fill:#1c2433,stroke:#2f3d57,color:#fff
```

One call checks many things at once and comes back with proof — deterministic (structured events, not a vision model), cheap (any model, no screenshot), and pointed at the code:

```ts
// The agent clicked "Pay". Did the right things actually happen? One call, ~33 tokens, no screenshot:
iris_assert({
  predicate: { allOf: [
    { kind: "net",     method: "POST", urlContains: "/api/order", status: 200 },
    { kind: "element", query: { role: "dialog", name: "Order confirmed" }, state: "visible" },
    { kind: "signal",  name: "order:saved" },          // the charge actually committed
    { kind: "console", level: "error", absent: true }  // …and nothing errored
  ]}
})
// → { pass: false,
//     evidence: { net: { status: 500, url: "/api/order" } },
//     failureReason: "POST /api/order returned 500, expected 200",
//     source: { file: "src/checkout/PayButton.tsx", line: 42 } }   ❌ caught before you ever saw it
```
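To make "one call checks many things at once" concrete, here is a minimal sketch of evaluating an `allOf` predicate against a log of structured events recorded since the action. The `IrisEvent` / `Predicate` types and the `evaluate` function are hypothetical illustrations that reuse the field names from the call above (plus the `count` field mentioned in the table below); they are not Iris's internals, and element checks are left out because they need a live DOM.

```typescript
// Hypothetical structured events, as an in-page observer might record them after an action.
type IrisEvent =
  | { kind: "net"; method: string; url: string; status: number }
  | { kind: "console"; level: "log" | "warn" | "error"; message: string }
  | { kind: "signal"; name: string };

type NetEvent = Extract<IrisEvent, { kind: "net" }>;

// Hypothetical predicate shapes reusing iris_assert's field names (element checks omitted).
type Predicate =
  | { kind: "net"; method?: string; urlContains: string; status?: number; count?: number }
  | { kind: "console"; level: "log" | "warn" | "error"; absent: true }
  | { kind: "signal"; name: string }
  | { allOf: Predicate[] };

interface Result {
  pass: boolean;
  failureReason?: string;
}

function evaluate(p: Predicate, events: IrisEvent[]): Result {
  if ("allOf" in p) {
    // Every child must pass; the first failing child supplies the reason.
    for (const child of p.allOf) {
      const r = evaluate(child, events);
      if (!r.pass) return r;
    }
    return { pass: true };
  }
  if (p.kind === "net") {
    const { method, urlContains, status, count } = p;
    const matches = events.filter(
      (e): e is NetEvent =>
        e.kind === "net" && e.url.includes(urlContains) && (method === undefined || e.method === method),
    );
    if (matches.length === 0) return { pass: false, failureReason: `no request matching ${urlContains}` };
    // Cardinality: when count is given, a double-submit fails here.
    if (count !== undefined && matches.length !== count) {
      return { pass: false, failureReason: `expected ${count} request(s) to ${urlContains}, saw ${matches.length}` };
    }
    const bad = status === undefined ? undefined : matches.find((e) => e.status !== status);
    if (bad) return { pass: false, failureReason: `${bad.method} ${bad.url} returned ${bad.status}, expected ${status}` };
    return { pass: true };
  }
  if (p.kind === "console") {
    // This sketch only supports absence checks: any matching console entry fails.
    const { level } = p;
    const emitted = events.some((e) => e.kind === "console" && e.level === level);
    return emitted ? { pass: false, failureReason: `console.${level} was emitted` } : { pass: true };
  }
  // Only the signal predicate is left.
  const { name } = p;
  const fired = events.some((e) => e.kind === "signal" && e.name === name);
  return fired ? { pass: true } : { pass: false, failureReason: `signal "${name}" never fired` };
}

// The failing "Pay" click from the example above: the order endpoint answered 500.
const payEvents: IrisEvent[] = [{ kind: "net", method: "POST", url: "/api/order", status: 500 }];
const payResult = evaluate(
  {
    allOf: [
      { kind: "net", method: "POST", urlContains: "/api/order", status: 200 },
      { kind: "signal", name: "order:saved" },
      { kind: "console", level: "error", absent: true },
    ],
  },
  payEvents,
);
// payResult.failureReason === "POST /api/order returned 500, expected 200"
```

Evaluating against recorded events rather than pixels is what keeps the check deterministic: the same event log always yields the same verdict.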

✅ What Iris catches that a screenshot (or a DOM tool) can't

A screenshot sees pixels. The DOM sees markup. Iris sees the program — so it catches the bugs that look fine on screen:

| The bug | Looks fine on screen? | Iris catches it because it reads… |
|---|---|---|
| Pay button silently returns 500 | ✅ looks fine | the network response, tied to the click |
| A console error slipped in, UI still renders | ✅ looks fine | the console stream since the action |
| The form fired the request twice (double-submit) | ✅ looks fine | request cardinality (`net { count: 1 }`) |
| The badge shows "12" but the store holds 0 (UI lies) | ✅ looks fine | the app's state, not the rendered number |
| A click corrupted unrelated data on another screen | ✅ looks fine | a state invariant (blast radius) |
| The component re-renders 60×/sec with no visible change | ✅ looks fine | the React commit stream |
| "Deploy succeeded" but the deploy actually failed | ✅ looks fine | the store's real status |

Most of these are invisible to any tool that only looks at the page from the outside: the truth never reaches the DOM.
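The blast-radius row is the least obvious one. One way to check such an invariant, sketched here under the assumption that you can take plain-object snapshots of app state before and after an action, is to diff the two snapshots and flag every changed top-level key the action was not supposed to touch. The `Snapshot` type and `blastRadius` helper are hypothetical, not an Iris API.

```typescript
// Hypothetical plain-object snapshot of app state (for example, a store's getState()).
type Snapshot = Record<string, unknown>;

// Returns the top-level keys that changed across an action but were not expected to.
function blastRadius(before: Snapshot, after: Snapshot, allowed: string[]): string[] {
  const keys = Array.from(new Set([...Object.keys(before), ...Object.keys(after)]));
  const allowedSet = new Set(allowed);
  return keys
    .filter((key) => {
      // Rough deep-equality stand-in: assumes serializable state with a stable key order.
      const changed = JSON.stringify(before[key]) !== JSON.stringify(after[key]);
      return changed && !allowedSet.has(key);
    })
    .sort();
}

// Clicking "Pay" should only touch the cart and orders slices...
const beforePay = { cart: { items: 2 }, orders: [], profile: { name: "Ada" } };
const afterPay = { cart: { items: 0 }, orders: [{ id: 1 }], profile: { name: "" } };
// ...but it also wiped the profile, so the invariant flags it.
const leaked = blastRadius(beforePay, afterPay, ["cart", "orders"]); // ["profile"]
```

Nothing on the Pay screen would look wrong here; the damage only shows up in state that belongs to another screen.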


🧰 Turn the test cases you never automated into checks the agent runs on every edit

Every team has acceptance criteria and "I just eyeball it" steps that never became tests. A test case maps almost 1:1 to an Iris check:

| Your test case (plain English) | Iris check |
|---|---|
| "Login with valid creds lands on the dashboard" | net /api/login 200 and element tab "Dashboard" visible |
| "Deleting an item removes it from the list" | element {text, scope: list} absent |
| "Submitting shows a success toast" | text "Saved" visible |
| "Paying actually charges the customer" | signal "order:saved" and net /api/charge 200 |
| "Checkout fires exactly one charge" | net /api/charge { count: 1 } |
| "No console errors on checkout" | console level:error absent |

Record a flow once and Iris replays it deterministically on every edit. Your Playwright suite in CI still gates releases; Iris is the checklist your agent runs while it codes, including the long tail nobody ever automated.
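For illustration, here are two rows of the test-case table written out as predicate objects in the same shape as the `iris_assert` call earlier, plus a tiny renderer that turns them back into the table's shorthand to show the 1:1 mapping. The `Check` type and `shorthand` function are hypothetical; the type covers only the fields that appear in this README, not Iris's full predicate schema.

```typescript
// Hypothetical, simplified typing of only the predicate fields that appear in this README.
type Check =
  | { kind: "net"; method?: string; urlContains: string; status?: number; count?: number }
  | { kind: "element"; query: { role: string; name: string }; state: "visible" }
  | { kind: "signal"; name: string }
  | { kind: "console"; level: "error"; absent: true };

// "Login with valid creds lands on the dashboard"
const loginLandsOnDashboard: Check[] = [
  { kind: "net", urlContains: "/api/login", status: 200 },
  { kind: "element", query: { role: "tab", name: "Dashboard" }, state: "visible" },
];

// "Checkout fires exactly one charge"
const checkoutChargesOnce: Check[] = [{ kind: "net", urlContains: "/api/charge", count: 1 }];

// Render a check back into the table's shorthand, to make the 1:1 mapping visible.
function shorthand(c: Check): string {
  if (c.kind === "net") {
    const parts = ["net", c.method, c.urlContains, c.status, c.count === undefined ? undefined : `{ count: ${c.count} }`];
    return parts.filter((part) => part !== undefined).join(" ");
  }
  if (c.kind === "element") return `element ${c.query.role} "${c.query.name}" ${c.state}`;
  if (c.kind === "signal") return `signal "${c.name}"`;
  return `console level:${c.level} absent`;
}

const loginRow = loginLandsOnDashboard.map(shorthand).join(" and ");
// loginRow === 'net /api/login 200 and element tab "Dashboard" visible'
```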


📊 Honest benchmarks

We'd rather you hear the caveats from us than catch us hiding them. Every number below comes from a harness committed to the repo; full details, including the cases where we lose, are in bench/SCORECARD.md.

1 · Cheap enough to run on every edit. Iris asks narrow questions instead of dumping the whole page:

| Per verify step | Tokens |
|---|---|
| Full accessibility-tree snapshot (e.g. Playwright MCP) | ~7,300 |
| Iris verify loop (query + observe + assert) | ~100 |

→ A 20-step flow costs ~2,000 tokens with Iris vs ~146,000 with full-tree snapshots, a 73× difference. (Honest version: force Iris to dump the whole tree too and the gap shrinks to ~1.8×, so the 73× comes from not needing the whole tree. Full math →)
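The 73× figure is straightforward arithmetic on the per-step numbers in the table. Because both costs grow linearly with the number of steps, the ratio is the same for a flow of any length; the sketch below just spells that out, using the table's rounded figures as constants.

```typescript
// Per-step token costs from the table above (rounded, approximate figures).
const SNAPSHOT_TOKENS_PER_STEP = 7_300; // full accessibility-tree snapshot
const IRIS_TOKENS_PER_STEP = 100; // Iris query + observe + assert

// The cost of a flow is linear in its number of steps.
function flowCost(steps: number, tokensPerStep: number): number {
  return steps * tokensPerStep;
}

const STEPS = 20;
const irisTokens = flowCost(STEPS, IRIS_TOKENS_PER_STEP); // 2,000
const snapshotTokens = flowCost(STEPS, SNAPSHOT_TOKENS_PER_STEP); // 146,000
const ratio = snapshotTokens / irisTokens; // 73, independent of STEPS
```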

2 · The real moat: re-running a regression suite. A regression test runs the same check on every commit. Iris replays a recorded flow with no model; a screenshot/DOM agent must re-drive the whole flow through the LLM on every run:

| Re-verify a known flow | Cost / run | Flake | vs Iris |
|---|---|---|---|
| Iris deterministic replay | ~175 tok | 0% | |
| Playwright/DevTools (LLM re-drive) | ~30,000 tok | sampled | 128–184× more |
| A 4-flow suite (iris_flow_verify) | ~47 tok (flat in K) | 0% | 2,574× |

3 · Caught the most bugs in a real agent loop. A live gpt-4o tool-use loop over 5 broken-app scenarios (authoritative usage tokens, Layer B):

| Tool | Bugs caught | |
|---|---|---|
| Iris | 5 / 5 | most accurate |
| Playwright MCP | 4 / 5 | |
| Chrome DevTools MCP | 3 / 5 | |

…and where Iris does not win (use the right tool)

Being inside the page costs real browser-level fidelity. These are genuine competitor strengths:

  • Pixel/paint regressions (fonts, paint order, GPU) → a screenshot is ground truth. Measured: a CSS filter that re-tinted 2.3% of pixels — a screenshot caught it; Iris's always-on read (computed style, not pixels) missed it.
  • Trusted native input, cross-browser (WebKit/Firefox), multi-tab / network mocking → Playwright.
  • A site you don't own / can't add a dependency to → Iris must embed a dev-only SDK; Playwright/DevTools test anything.
  • Visual / computed-style / theme bugs → parity: any tool with a JS evaluate reads computed style; Iris is just more ergonomic.

🤔 When to use Iris — and when to reach for Playwright / DevTools

| You are… | Reach for | Because |
|---|---|---|
| an agent building a React/Next app you own, verifying each edit | Iris | in-loop, ~100 tok/check, sees state + file:line, refuses destructive clicks |
| running a regression suite on every commit / in CI | Iris | deterministic replay: 0% flake, 128–2,574× cheaper than re-driving with an LLM |
| chasing a bug whose truth is in state, not the DOM | Iris | desync, double-submit, side effects, silent errors; no DOM tool sees these |
| testing a third-party site / many browsers / real input | Playwright | Iris can't instrument code you don't ship, or drive other engines |
| verifying true pixels (visual regression) | Playwright (or Iris driven) | a screenshot is the rendered frame; Iris's always-on read is computed-style |
| debugging protocol-level network/perf on any site | DevTools | DevTools MCP speaks raw CDP |

Rule of thumb: own the app + an agent is building it → Iris is your cheap, deterministic, state-aware inner loop. Driving someone else's site, many engines, or true pixels → Playwright/DevTools. Plenty of teams use both.


📦 Install — the full options

Easiest — paste one prompt (recommended)
Follow https://raw.githubusercontent.com/syrin-labs/iris/main/SKILL.md

Setup wizard on first run, verification on every run after. Works with any MCP-capable agent.

Persistent skill — register once, type /iris forever

Claude Code

```bash
curl --create-dirs -o .claude/skills/iris.md \
  https://raw.githubusercontent.com/syrin-labs/iris/main/SKILL.md
```

OpenCode

```bash
opencode skill add https://raw.githubusercontent.com/syrin-labs/iris/main/SKILL.md
```

Then type /iris — setup on first use, test the app on every use after.

Manual — install + wire the MCP server yourself

1. Install (one package re-exports the whole graph — SDK, React adapter, source-mapping plugins, spec runner):

```bash
npm i -D @syrin/iris        # or pnpm / yarn / bun
```

2. Register the MCP server with your agent — npx @syrin/iris is the server:

Claude Code (`.mcp.json`; JSON has no comments, so the file holds only this object):

```json
{ "mcpServers": { "iris": { "command": "npx", "args": ["@syrin/iris"] } } }
```

3. Connect the dev-only SDK from your app's entry point (the SDK is tree-shaken out of production):

```ts
// main.tsx (or your dev entry point): dev only
import { iris, install } from '@syrin/iris';

if (import.meta.env.DEV) {
  install(); // React only: enables component → file:line. Must run before connect(); remove for non-React apps.
  iris.connect({ session: 'my-app' });
}
```

4. Tell your agent to verify. Full walkthrough → Getting Started.


📚 Learn more

🔩 What's inside

A pnpm + turbo monorepo. One umbrella package (@syrin/iris) re-exports everything:

| Package | Role |
|---|---|
| @syrin/iris-protocol | the wire contract (zod schemas, constants) |
| @syrin/iris-browser | the dev-only instrumentation SDK (DOM/network/console/state observers) |
| @syrin/iris-server | the bridge + MCP server + the iris CLI |
| @syrin/iris-react | React adapter: DOM ref → component → source file:line |
| @syrin/iris-babel-plugin / @syrin/iris-next | stamp source coordinates (React 19 / Next.js) |

🔒 Status & safety

Iris is dev-only and localhost-only by design — the SDK is tree-shaken out of production builds, the bridge binds to localhost, and there is no telemetry. It observes your app on your machine; nothing leaves it.
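As an illustration of the localhost-only design, here is a minimal sketch using Node's built-in `node:http` module: listening on `127.0.0.1` rather than `0.0.0.0` keeps the port unreachable from other machines, and a per-request loopback check adds defense in depth. The `startLocalBridge` and `isLoopback` names are hypothetical; this is not Iris's bridge code.

```typescript
import { createServer, type Server } from "node:http";

// True for the loopback addresses Node can report in req.socket.remoteAddress.
function isLoopback(address: string | undefined): boolean {
  return address === "127.0.0.1" || address === "::1" || address === "::ffff:127.0.0.1";
}

// Hypothetical dev bridge: bound to the loopback interface, with a per-request check as a backstop.
function startLocalBridge(port: number): Promise<Server> {
  const server = createServer((req, res) => {
    if (!isLoopback(req.socket.remoteAddress)) {
      res.writeHead(403).end();
      return;
    }
    res.writeHead(200, { "content-type": "application/json" }).end(JSON.stringify({ ok: true }));
  });
  return new Promise((resolve, reject) => {
    server.once("error", reject);
    // "127.0.0.1" rather than "0.0.0.0": other machines cannot reach this port at all.
    server.listen(port, "127.0.0.1", () => resolve(server));
  });
}
```

Passing `0` as the port lets the OS pick a free one, which is convenient when several dev servers run side by side.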

License

Iris uses a per-package license model so it is safe to embed in your own app and fair to build a business on. Each package's LICENSE file is authoritative; see the root LICENSE for the full breakdown.

  • Embedded in your app: Apache-2.0. @syrin/iris-browser, -protocol, -react, -babel-plugin, -next, -vite-plugin, and -eslint-plugin run inside or compile into your application. Use them anywhere, including in the apps you ship to your own customers. No copyleft; explicit patent grant.
  • Server / CLI / MCP: FSL-1.1-ALv2. @syrin/iris-server, @syrin/iris-test, and the @syrin/iris umbrella package are free for any use except offering Iris itself as a competing hosted service; each release converts to Apache-2.0 after two years.
  • Enterprise features: the Iris Enterprise License. Source-available under packages/server/src/ee/; free for development and evaluation, but a subscription license key is required in production.

OEM, embedding, or commercial licensing questions: hey@syrin.ai

© 2026 Syrin Labs
