From 34a9ea4bd764cbbbc2c0aa8734011ff159e1a7cd Mon Sep 17 00:00:00 2001 From: Srinidhi G S Date: Wed, 8 Apr 2026 20:45:52 +0530 Subject: [PATCH 01/80] WIP --- docs/cloud-runs.md | 269 ++++++++++++++++++++++++++++++++ package-lock.json | 21 +++ packages/cli/bin/finalrun.ts | 33 ++++ packages/cli/package.json | 2 + packages/cli/src/cloudRunner.ts | 134 ++++++++++++++++ 5 files changed, 459 insertions(+) create mode 100644 docs/cloud-runs.md create mode 100644 packages/cli/src/cloudRunner.ts diff --git a/docs/cloud-runs.md b/docs/cloud-runs.md new file mode 100644 index 0000000..4e2e76a --- /dev/null +++ b/docs/cloud-runs.md @@ -0,0 +1,269 @@ +# FinalRun Cloud Runs + +## Overview + +FinalRun supports running tests on cloud-based devices in addition to local execution. Cloud runs follow the same CLI-first architecture — tests are authored locally, triggered via CLI, and executed remotely on cloud devices. + +Cloud runs are **immutable and reproducible**. Re-running requires executing the CLI command again. There is no way to re-trigger a run from the UI. + +--- + +## Requirements + +### Run Visibility & Tracking +- Display list of all runs with status (`queued`, `running`, `completed`) +- Show real-time updates for currently running executions +- Include metadata: start time, duration, trigger source (CLI command) + +### Progress Tracking +- Show overall progress (e.g. 
6/10 tests passed, 60%) +- Display per-test execution status (`pending`, `running`, `completed`) +- Update progress in real-time as tests execute + +### Test Case Details +- List all test cases within a run in execution order +- Show status, duration, and file path for each test +- Allow expanding a test case to view steps, logs, and artifacts + +### Step-Level Visibility +- Display all steps within a test case in execution order +- Show step-level status +- Highlight the exact step where failure occurred + +### Artifacts +- Capture and display screenshots for each step +- Record and provide video playback for each test case +- Artifacts are mapped to their steps and tests for debugging + +### Suite & Structure Representation +- Show logical structure of suites and their test cases +- Preserve execution order defined in the suite file +- Allow users to understand grouping of test cases + +--- + +## Data Model + +### `runs` + +Represents a single CLI-triggered upload and its full execution lifecycle. + +``` +runs +---- +id (PK) +org_id (FK → orgs) +fr_user_id (FK → users) +command -- e.g. "finalrun cloud tests/auth/" +zip_url -- S3 path of uploaded zip +status -- parsing | queued | running | completed +total_tests -- set after parsing +completed_tests -- incremented as tests finish +created_at +started_at +completed_at +updated_at +``` + +**Notes:** +- Each run is immutable — re-run requires triggering CLI again +- `zip_url` is the source of truth for the uploaded project snapshot +- `total_tests` and `completed_tests` are denormalized for fast progress queries + +--- + +### `run_nodes` + +Represents every suite and test case within a run. Self-referencing to support the suite → tests hierarchy. + +``` +run_nodes +--------- +id (PK) +run_id (FK → runs) +parent_id (FK → run_nodes) -- null = top-level node +type -- 'suite' | 'test' +name -- from YAML name field +file_path -- e.g. 
tests/auth/login_test.yaml +order_index -- execution order, relative to siblings +status -- queued | running | completed | skipped +error_message -- failure reason +total_tests -- suites only +completed_tests -- suites only +video_url -- tests only +content JSONB -- full parsed YAML, stored at upload time +started_at +completed_at +updated_at +``` + +**Notes:** +- Created server-side after extracting the zip and parsing YAML specs — full tree is always visible before execution starts +- `order_index` is relative to siblings under the same `parent_id` +- `path` stores the full position in the tree (e.g. `001.002`) for correct flat ordering without recursive queries +- `content JSONB` is populated server-side by parsing specs from the uploaded zip — the AI runner reads steps directly from here, no ZIP download needed at execution time +- `total_tests` and `completed_tests` are only meaningful for `type='suite'` +- `video_url` is only populated for `type='test'` + +--- + +### `run_steps` + +Represents actual AI actions taken during test execution. Not 1:1 with YAML steps — the AI decides how many actions to take per YAML instruction. + +``` +run_steps +--------- +id (PK) +run_node_id (FK → run_nodes where type='test') +step_index -- global order within the test execution +yaml_step_index -- maps back to the YAML step array position +description -- what the AI decided to do +screenshot_url +logs +status -- passed | failed +executed_at +``` + +**Notes:** +- Created dynamically as the AI executes — not pre-populated from YAML +- `yaml_step_index` maps AI actions back to the originating YAML instruction so the UI can group them +- Multiple `run_steps` rows may share the same `yaml_step_index` (one YAML instruction → many AI actions) + +--- + +## CLI Command + +### Command + +``` +finalrun cloud [selectors...] 
+``` + +### Arguments + +| Argument | Description | +|---|---| +| `selectors` | Workspace-relative YAML files, directories, or globs under `.finalrun/tests/` | + +### Options + +| Option | Description | +|---|---| +| `--env ` | Environment name | +| `--platform ` | Target platform (`android` or `ios`) | + +### Examples + +```bash +# Run a suite file +finalrun cloud tests/regression_suite.yaml + +# Run a single test +finalrun cloud tests/auth/login_test.yaml + +# Run all tests in a directory (recursive) +finalrun cloud tests/auth/ + +# Run multiple selectors +finalrun cloud tests/auth/ tests/checkout/payment_test.yaml + +# Run with glob +finalrun cloud "tests/**/*.yaml" +``` + +### Implementation + +The `cloud` command is added to `packages/cli/bin/finalrun.ts` as a top-level command, consistent with `test` and `check`. Business logic lives in a new `packages/cli/src/cloudRunner.ts`, keeping `finalrun.ts` as a thin entry point. + +--- + +## File Type Inference + +All test and suite files live under `.finalrun/tests/`. There is no dedicated suites directory — a suite file is simply a YAML file with a `tests:` key. + +The CLI determines file type by parsing the YAML: + +| Key present | Type | +|---|---| +| `steps:` | Test | +| `tests:` | Suite | + +This works regardless of where the file lives within `.finalrun/tests/`, including nested subdirectories. + +### Suite Constraints (v1) + +- Suites are flat — the `tests:` array contains only plain string test file paths +- Suites referencing other suites are not supported in v1 +- The schema is designed to support nested suites in a future version without breaking changes — a `- suite: path` entry syntax can be added to the `tests:` array alongside plain string test paths + +--- + +## Upload Flow + +When `finalrun cloud` is executed, the CLI performs the following steps: + +### Step 1 — Resolve Workspace + +Locate the `.finalrun` directory by walking up from the current working directory. 
This logic is built into the compiled CLI binary — the `finalrun-agent` source code does not need to be present. The CLI can be run from any project that has a `.finalrun/` directory, as long as the `finalrun` binary is installed globally via npm. + +### Step 2 — Collect and Parse Files + +Use the existing `selectSpecFiles()` to collect all YAML files matching the provided selectors, with full support for: +- Single file paths +- Directories (recursive) +- Globs +- Comma-separated selectors + +Parse each collected file to determine its type (`steps:` = test, `tests:` = suite). + +For each suite file, load the test files referenced in its `tests:` array. + +### Step 3 — ZIP and Upload + +Create a ZIP of the `.finalrun/` directory, **excluding env files** (`.finalrun/env/*.yaml`). Env files contain mappings to local secret environment variables and are never relevant to cloud execution. + +Upload the ZIP to S3 and store the resulting `zip_url`. + +### Step 4 — POST Payload to API + +Send `{command, zip_url}` to the FinalRun API. The server extracts the zip, parses the YAML specs using the CLI's spec parser, builds the node tree (suites and tests), validates secrets against the org's cloud configuration, and creates the `runs` row and all `run_nodes` rows in a single transaction. + +### Step 5 — Print Run URL + +Output the cloud run URL to the terminal so the user can track progress. 
+ +``` +✔ Run created: https://app.finalrun.io/runs/abc123 +``` + +--- + +## Upload Payload + +```json +{ + "command": "finalrun cloud tests/regression_suite.yaml", + "zip_url": "s3://finalrun/uploads/abc123.zip" +} +``` + +**Key points:** + +- The payload is minimal — only the CLI command and a reference to the uploaded zip +- The cloud server extracts the zip, parses the YAML specs, and builds the node tree (suites, tests, hierarchy) server-side using the same CLI spec parser +- `command` contains the selectors so the server knows which tests to run from the zip +- Secret validation happens server-side after parsing — the server checks all `${secrets.*}` references against the org's configured secrets +- Env files are excluded from the ZIP and never sent to the cloud + +--- + +## Secrets Handling + +Secrets are pre-configured in the FinalRun cloud dashboard, scoped to the org. + +- YAML files reference secrets as `${secrets.key}` placeholders +- These placeholders are preserved as-is in `content JSONB` on `run_nodes` +- At execution time, the cloud runner resolves placeholders from the org's stored secrets — the same way the local CLI resolves them from environment variables +- Env files (`.finalrun/env/*.yaml`) are never uploaded — they are only relevant locally for mapping secret keys to env var names +- If a test references a secret that has not been configured in the cloud dashboard, the upload is aborted at Step 4 with a clear error listing the missing secrets diff --git a/package-lock.json b/package-lock.json index dec72bf..a8d4241 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1561,6 +1561,16 @@ "tslib": "^2.8.0" } }, + "node_modules/@types/adm-zip": { + "version": "0.5.8", + "resolved": "https://registry.npmjs.org/@types/adm-zip/-/adm-zip-0.5.8.tgz", + "integrity": "sha512-RVVH7QvZYbN+ihqZ4kX/dMiowf6o+Jk1fNwiSdx0NahBJLU787zkULhGhJM8mf/obmLGmgdMM0bXsQTmyfbR7Q==", + "dev": true, + "license": "MIT", + "dependencies": { + "@types/node": "*" + } + 
}, "node_modules/@types/esrecurse": { "version": "4.3.1", "resolved": "https://registry.npmjs.org/@types/esrecurse/-/esrecurse-4.3.1.tgz", @@ -1883,6 +1893,15 @@ "acorn": "^6.0.0 || ^7.0.0 || ^8.0.0" } }, + "node_modules/adm-zip": { + "version": "0.5.17", + "resolved": "https://registry.npmjs.org/adm-zip/-/adm-zip-0.5.17.tgz", + "integrity": "sha512-+Ut8d9LLqwEvHHJl1+PIHqoyDxFgVN847JTVM3Izi3xHDWPE4UtzzXysMZQs64DMcrJfBeS/uoEP4AD3HQHnQQ==", + "license": "MIT", + "engines": { + "node": ">=12.0" + } + }, "node_modules/agent-base": { "version": "7.1.4", "resolved": "https://registry.npmjs.org/agent-base/-/agent-base-7.1.4.tgz", @@ -3967,6 +3986,7 @@ "@finalrun/goal-executor": "*", "@grpc/grpc-js": "^1.12.0", "@grpc/proto-loader": "^0.7.0", + "adm-zip": "^0.5.17", "ai": "^6.0.134", "chalk": "^5.4.0", "commander": "^13.1.0", @@ -3982,6 +4002,7 @@ "finalrun-agent": "dist/bin/finalrun.js" }, "devDependencies": { + "@types/adm-zip": "^0.5.8", "tsx": "^4.19.0", "typescript": "^5.7.0" }, diff --git a/packages/cli/bin/finalrun.ts b/packages/cli/bin/finalrun.ts index 99919d9..97cb4d8 100644 --- a/packages/cli/bin/finalrun.ts +++ b/packages/cli/bin/finalrun.ts @@ -17,6 +17,7 @@ import { startOrReuseWorkspaceReportServer, } from '../src/reportServerManager.js'; import { normalizeSpecSelectors, TEST_SELECTION_REQUIRED_ERROR } from '../src/testSelection.js'; +import { runCloud } from '../src/cloudRunner.js'; import { PreExecutionFailureError, runTests } from '../src/testRunner.js'; import { formatRunIndexForConsole, loadRunIndex } from '../src/runIndex.js'; import { serveReportWorkspace } from '../src/reportServer.js'; @@ -162,6 +163,31 @@ program }); }); +program + .command('cloud') + .description('Run tests on FinalRun cloud devices') + .option('--env ', 'Environment name') + .option('--platform ', 'Target platform (android or ios)') + .option('--app ', 'Optional app override (.apk or .app)') + .option('--suite ', 'Suite manifest under .finalrun/suites') + 
.argument('[selectors...]', 'Workspace-relative YAML files, directories, or globs under .finalrun/tests') + .action(async (selectors: string[] | undefined, options: CloudCommandOptions) => { + await runCommand(async () => { + Logger.init({ level: LogLevel.INFO, resetSinks: true }); + const normalizedSelectors = normalizeSpecSelectors(selectors); + if (normalizedSelectors.length === 0 && !options.suite) { + throw new Error(TEST_SELECTION_REQUIRED_ERROR); + } + await runCloud({ + selectors: normalizedSelectors, + suitePath: options.suite, + envName: options.env, + platform: options.platform, + appPath: options.app, + }); + }); + }); + program .command('start-server') .description('Start or reuse the local FinalRun report server for this workspace') @@ -235,6 +261,13 @@ interface CheckCommandOptions { suite?: string; } +interface CloudCommandOptions { + env?: string; + platform?: string; + app?: string; + suite?: string; +} + interface DoctorCommandOptions { platform?: string; } diff --git a/packages/cli/package.json b/packages/cli/package.json index 51d58e3..f2e5c3b 100644 --- a/packages/cli/package.json +++ b/packages/cli/package.json @@ -80,6 +80,7 @@ "@finalrun/goal-executor": "*", "@grpc/grpc-js": "^1.12.0", "@grpc/proto-loader": "^0.7.0", + "adm-zip": "^0.5.17", "ai": "^6.0.134", "chalk": "^5.4.0", "commander": "^13.1.0", @@ -94,6 +95,7 @@ "node": ">=20.0.0" }, "devDependencies": { + "@types/adm-zip": "^0.5.8", "tsx": "^4.19.0", "typescript": "^5.7.0" } diff --git a/packages/cli/src/cloudRunner.ts b/packages/cli/src/cloudRunner.ts new file mode 100644 index 0000000..a7f6e2f --- /dev/null +++ b/packages/cli/src/cloudRunner.ts @@ -0,0 +1,134 @@ +import * as fs from 'node:fs'; +import * as os from 'node:os'; +import * as path from 'node:path'; +import AdmZip from 'adm-zip'; +import { Logger } from '@finalrun/common'; +import { runCheck } from './checkRunner.js'; + +const FINALRUN_CLOUD_URL = process.env['FINALRUN_CLOUD_URL'] || 'https://cloud.finalrun.io'; + 
+export interface CloudRunnerOptions {
+  selectors: string[];
+  suitePath?: string;
+  envName?: string;
+  platform?: string;
+  appPath?: string;
+}
+
+export async function runCloud(options: CloudRunnerOptions): Promise<void> {
+  Logger.i('Preparing cloud run...');
+
+  // 1. Validate specs locally (fast fail before upload)
+  const checked = await runCheck({
+    selectors: options.selectors,
+    suitePath: options.suitePath,
+    envName: options.envName,
+    platform: options.platform,
+    requireSelection: true,
+  });
+
+  // 2. Collect resolved file paths
+  const filesToZip: Array<{ absolutePath: string; relativePath: string }> = [];
+
+  // Add suite file if present
+  if (checked.suite) {
+    filesToZip.push({
+      absolutePath: checked.suite.sourcePath,
+      relativePath: path.join('suites', checked.suite.relativePath),
+    });
+  }
+
+  // Add test files
+  for (const spec of checked.specs) {
+    filesToZip.push({
+      absolutePath: spec.sourcePath,
+      relativePath: path.join('tests', spec.relativePath),
+    });
+  }
+
+  // Add config.yaml if present
+  const configPath = path.join(process.cwd(), '.finalrun', 'config.yaml');
+  if (fs.existsSync(configPath)) {
+    filesToZip.push({
+      absolutePath: configPath,
+      relativePath: 'config.yaml',
+    });
+  }
+
+  // Add env files if present
+  const envDir = path.join(process.cwd(), '.finalrun', 'env');
+  if (fs.existsSync(envDir)) {
+    const envFiles = fs.readdirSync(envDir).filter((f) => f.endsWith('.yaml') || f.endsWith('.yml'));
+    for (const envFile of envFiles) {
+      filesToZip.push({
+        absolutePath: path.join(envDir, envFile),
+        relativePath: path.join('env', envFile),
+      });
+    }
+  }
+
+  // 3. Create zip with only selected files
+  Logger.i(`Zipping ${filesToZip.length} file(s)...`);
+  const zip = new AdmZip();
+  for (const file of filesToZip) {
+    const dir = path.dirname(file.relativePath);
+    zip.addLocalFile(file.absolutePath, dir);
+  }
+
+  const zipPath = path.join(os.tmpdir(), `finalrun-cloud-${Date.now()}.zip`);
+  zip.writeZip(zipPath);
+
+  try {
+    // 4.
Upload to cloud service + const command = `finalrun cloud ${options.selectors.join(' ')}`; + Logger.i(`Submitting ${checked.specs.length} test(s) to cloud...`); + + const formData = new FormData(); + const zipBuffer = fs.readFileSync(zipPath); + formData.append('file', new Blob([zipBuffer]), 'specs.zip'); + formData.append('command', command); + formData.append('selectors', JSON.stringify(options.selectors)); + if (options.suitePath) { + formData.append('suitePath', options.suitePath); + } + if (options.envName) { + formData.append('envName', options.envName); + } + if (options.platform) { + formData.append('platform', options.platform); + } + if (options.appPath) { + const appBuffer = fs.readFileSync(options.appPath); + const appFileName = path.basename(options.appPath); + formData.append('appFile', new Blob([appBuffer]), appFileName); + Logger.i(`Uploading app: ${appFileName}`); + } + + const url = `${FINALRUN_CLOUD_URL}/api/v1/execute`; + const response = await fetch(url, { + method: 'POST', + body: formData, + }); + + if (!response.ok) { + const body = await response.text(); + throw new Error(`Cloud service returned ${response.status}: ${body}`); + } + + const result = (await response.json()) as { success: boolean; results?: unknown[] }; + if (result.success) { + console.log(`\n\x1b[32m✔ Cloud run completed successfully\x1b[0m`); + } else { + console.log(`\n\x1b[31m✖ Cloud run failed\x1b[0m`); + } + + console.log(JSON.stringify(result, null, 2)); + } finally { + // Clean up temp zip + try { + fs.unlinkSync(zipPath); + } catch { + // ignore cleanup errors + } + } +} From cfc0db36968e85e40a38b6821acb4bb674a3950f Mon Sep 17 00:00:00 2001 From: Srinidhi G S Date: Sat, 11 Apr 2026 01:25:22 +0530 Subject: [PATCH 02/80] WIP --- packages/cli/bin/finalrun.ts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/packages/cli/bin/finalrun.ts b/packages/cli/bin/finalrun.ts index 0a44277..7d146e1 100644 --- a/packages/cli/bin/finalrun.ts +++ 
b/packages/cli/bin/finalrun.ts @@ -17,7 +17,7 @@ import { startOrReuseWorkspaceReportServer, stopWorkspaceReportServer, } from '../src/reportServerManager.js'; -import { normalizeSpecSelectors, TEST_SELECTION_REQUIRED_ERROR } from '../src/testSelection.js'; +import { normalizeTestSelectors, TEST_SELECTION_REQUIRED_ERROR } from '../src/testSelection.js'; import { runCloud } from '../src/cloudRunner.js'; import { PreExecutionFailureError, runTests } from '../src/testRunner.js'; import { formatRunIndexForConsole, loadRunIndex } from '../src/runIndex.js'; @@ -182,7 +182,7 @@ program .action(async (selectors: string[] | undefined, options: CloudCommandOptions) => { await runCommand(async () => { Logger.init({ level: LogLevel.INFO, resetSinks: true }); - const normalizedSelectors = normalizeSpecSelectors(selectors); + const normalizedSelectors = normalizeTestSelectors(selectors); if (normalizedSelectors.length === 0 && !options.suite) { throw new Error(TEST_SELECTION_REQUIRED_ERROR); } From a734846ab99efd812645ab129069d210f91d856d Mon Sep 17 00:00:00 2001 From: Srinidhi G S Date: Sat, 11 Apr 2026 01:26:11 +0530 Subject: [PATCH 03/80] adding cloud command and other minor changes --- packages/cli/src/cloudRunner.ts | 116 +++++++++++++++++++++++++++++--- 1 file changed, 107 insertions(+), 9 deletions(-) diff --git a/packages/cli/src/cloudRunner.ts b/packages/cli/src/cloudRunner.ts index a7f6e2f..b9a8a01 100644 --- a/packages/cli/src/cloudRunner.ts +++ b/packages/cli/src/cloudRunner.ts @@ -31,7 +31,7 @@ export async function runCloud(options: CloudRunnerOptions): Promise { const filesToZip: Array<{ absolutePath: string; relativePath: string }> = []; // Add suite file if present - if (checked.suite) { + if (checked.suite?.sourcePath && checked.suite.relativePath) { filesToZip.push({ absolutePath: checked.suite.sourcePath, relativePath: path.join('suites', checked.suite.relativePath), @@ -39,7 +39,8 @@ export async function runCloud(options: CloudRunnerOptions): Promise { } // 
Add test files
-  for (const spec of checked.specs) {
+  for (const spec of checked.tests) {
+    if (!spec.sourcePath || !spec.relativePath) continue;
     filesToZip.push({
       absolutePath: spec.sourcePath,
       relativePath: path.join('tests', spec.relativePath),
@@ -81,7 +82,7 @@ export async function runCloud(options: CloudRunnerOptions): Promise<void> {
   try {
     // 4. Upload to cloud service
     const command = `finalrun cloud ${options.selectors.join(' ')}`;
-    Logger.i(`Submitting ${checked.specs.length} test(s) to cloud...`);
+    Logger.i(`Submitting ${checked.tests.length} test(s) to cloud...`);

     const formData = new FormData();
     const zipBuffer = fs.readFileSync(zipPath);
@@ -115,14 +116,16 @@ export async function runCloud(options: CloudRunnerOptions): Promise<void> {
       throw new Error(`Cloud service returned ${response.status}: ${body}`);
     }

-    const result = (await response.json()) as { success: boolean; results?: unknown[] };
-    if (result.success) {
-      console.log(`\n\x1b[32m✔ Cloud run completed successfully\x1b[0m`);
-    } else {
-      console.log(`\n\x1b[31m✖ Cloud run failed\x1b[0m`);
+    const result = (await response.json()) as { success: boolean; runId?: string; error?: string };
+    if (!result.success || !result.runId) {
+      console.log(`\n\x1b[31m✖ Cloud submission failed\x1b[0m`);
+      console.log(JSON.stringify(result, null, 2));
+      return;
     }

-    console.log(JSON.stringify(result, null, 2));
+    Logger.i(`Run submitted: ${result.runId}`);
+    Logger.i(`Polling status...\n`);
+    await pollRunUntilFinished(result.runId);
   } finally {
     // Clean up temp zip
     try {
@@ -132,3 +135,98 @@ export async function runCloud(options: CloudRunnerOptions): Promise<void> {
     }
   }
 }
+
+interface RunDetailsResponse {
+  success: boolean;
+  run?: {
+    id: string;
+    status: string;
+    totalTests: number;
+    completedTests: number;
+  };
+  nodes?: Array<{
+    id: string;
+    type: string;
+    name: string;
+    status: string;
+    errorMessage?: string | null;
+    videoUrl?: string | null;
+  }>;
+}
+
+async function pollRunUntilFinished(runId: string): Promise<void> {
+  const url = `${FINALRUN_CLOUD_URL}/api/v1/runs/${runId}`;
+  const POLL_INTERVAL_MS = 5_000;
+  const MAX_WAIT_MS = 30 * 60 * 1000; // 30 minutes
+  const start = Date.now();
+  let lastStatus = '';
+  const seenNodeStatus = new Map();
+
+  while (Date.now() - start < MAX_WAIT_MS) {
+    let body: RunDetailsResponse;
+    try {
+      const res = await fetch(url);
+      if (!res.ok) {
+        Logger.w(`poll HTTP ${res.status}, retrying...`);
+        await sleep(POLL_INTERVAL_MS);
+        continue;
+      }
+      body = (await res.json()) as RunDetailsResponse;
+    } catch (e) {
+      Logger.w(`poll failed: ${e instanceof Error ? e.message : String(e)}`);
+      await sleep(POLL_INTERVAL_MS);
+      continue;
+    }
+
+    if (!body.success || !body.run) {
+      Logger.w(`run not found, retrying...`);
+      await sleep(POLL_INTERVAL_MS);
+      continue;
+    }
+
+    const { run, nodes = [] } = body;
+
+    if (run.status !== lastStatus) {
+      Logger.i(`run status: ${run.status} (${run.completedTests}/${run.totalTests} tests)`);
+      lastStatus = run.status;
+    }
+
+    // Print transitions for each test node
+    for (const node of nodes) {
+      if (node.type !== 'test') continue;
+      const previous = seenNodeStatus.get(node.id);
+      if (previous !== node.status) {
+        const icon = statusIcon(node.status);
+        const suffix = node.errorMessage ? ` — ${node.errorMessage}` : '';
+        Logger.i(` ${icon} ${node.name}: ${node.status}${suffix}`);
+        seenNodeStatus.set(node.id, node.status);
+      }
+    }
+
+    if (run.status === 'completed' || run.status === 'failed' || run.status === 'aborted') {
+      const passed = nodes.filter((n) => n.type === 'test' && n.status === 'completed').length;
+      const total = nodes.filter((n) => n.type === 'test').length;
+      const colour = run.status === 'completed' ? '\x1b[32m' : '\x1b[31m';
+      console.log(`\n${colour}Run ${run.status}: ${passed}/${total} passed\x1b[0m`);
+      return;
+    }
+
+    await sleep(POLL_INTERVAL_MS);
+  }
+
+  Logger.w(`run did not finish within ${MAX_WAIT_MS / 1000}s — check the cloud server`);
+}
+
+function statusIcon(status: string): string {
+  switch (status) {
+    case 'completed': return '✔';
+    case 'failed': return '✖';
+    case 'running': return '◉';
+    case 'queued': return '○';
+    default: return '·';
+  }
+}
+
+function sleep(ms: number): Promise<void> {
+  return new Promise((resolve) => setTimeout(resolve, ms));
+}

From ac4e4cefaf609037e0d924e84257e91f3038fbd0 Mon Sep 17 00:00:00 2001
From: Srinidhi G S
Date: Sat, 11 Apr 2026 23:08:27 +0530
Subject: [PATCH 04/80] WIP - cloud suite and cloud test

---
 packages/cli/bin/finalrun.ts    | 39 ++++++++++++----
 packages/cli/src/cloudRunner.ts | 81 +++++++++++++++++++++++++++------
 2 files changed, 97 insertions(+), 23 deletions(-)

diff --git a/packages/cli/bin/finalrun.ts b/packages/cli/bin/finalrun.ts
index 38566c2..3cdf28f 100644
--- a/packages/cli/bin/finalrun.ts
+++ b/packages/cli/bin/finalrun.ts
@@ -170,24 +170,44 @@ program
     });
   });

-program
+const cloud = program
   .command('cloud')
-  .description('Run tests on FinalRun cloud devices')
-  .option('--env ', 'Environment name')
+  .description('Run tests on FinalRun cloud devices');
+
+cloud
+  .command('test [selectors...]')
+  .description('Run repo-local FinalRun YAML tests from .finalrun/tests on cloud devices')
+  .option('--env ', 'Environment name (for example dev or staging)')
   .option('--platform ', 'Target platform (android or ios)')
-  .option('--app ', 'Optional app override (.apk or .app)')
-  .option('--suite ', 'Suite manifest under .finalrun/suites')
-  .argument('[selectors...]', 'Workspace-relative YAML files, directories, or globs under .finalrun/tests')
+  .requiredOption('--app ', 'Path to the .apk or .app to install on the cloud device')
   .action(async (selectors: string[] | undefined, options:
CloudCommandOptions) => { await runCommand(async () => { Logger.init({ level: LogLevel.INFO, resetSinks: true }); const normalizedSelectors = normalizeTestSelectors(selectors); - if (normalizedSelectors.length === 0 && !options.suite) { + if (normalizedSelectors.length === 0) { throw new Error(TEST_SELECTION_REQUIRED_ERROR); } await runCloud({ selectors: normalizedSelectors, - suitePath: options.suite, + envName: options.env, + platform: options.platform, + appPath: options.app, + }); + }); + }); + +cloud + .command('suite ') + .description('Run a FinalRun suite manifest from .finalrun/suites on cloud devices') + .option('--env ', 'Environment name (for example dev or staging)') + .option('--platform ', 'Target platform (android or ios)') + .requiredOption('--app ', 'Path to the .apk or .app to install on the cloud device') + .action(async (suitePath: string, options: CloudCommandOptions) => { + await runCommand(async () => { + Logger.init({ level: LogLevel.INFO, resetSinks: true }); + await runCloud({ + selectors: [], + suitePath: suitePath.trim(), envName: options.env, platform: options.platform, appPath: options.app, @@ -278,8 +298,7 @@ interface CheckCommandOptions extends CommonCommandOptions { interface CloudCommandOptions { env?: string; platform?: string; - app?: string; - suite?: string; + app: string; } interface DoctorCommandOptions { diff --git a/packages/cli/src/cloudRunner.ts b/packages/cli/src/cloudRunner.ts index b9a8a01..17f2831 100644 --- a/packages/cli/src/cloudRunner.ts +++ b/packages/cli/src/cloudRunner.ts @@ -12,7 +12,7 @@ export interface CloudRunnerOptions { suitePath?: string; envName?: string; platform?: string; - appPath?: string; + appPath: string; } export async function runCloud(options: CloudRunnerOptions): Promise { @@ -81,14 +81,43 @@ export async function runCloud(options: CloudRunnerOptions): Promise { try { // 4. 
Upload to cloud service - const command = `finalrun cloud ${options.selectors.join(' ')}`; - Logger.i(`Submitting ${checked.tests.length} test(s) to cloud...`); + // Capture the raw CLI invocation, exactly as the user typed it (minus the + // node binary path). process.argv = [node, finalrun(.ts), ...userArgs]. + const command = `finalrun ${process.argv.slice(2).join(' ')}`; + + // Display name: suite name for suite runs, test name for single-test runs, + // " + N more" for multi-test runs, null otherwise. + let runName: string | null = null; + if (options.suitePath) { + runName = checked.suite?.name ?? path.basename(options.suitePath, path.extname(options.suitePath)); + } else if (checked.tests.length === 1) { + runName = checked.tests[0]?.name ?? null; + } else if (checked.tests.length > 1) { + const first = checked.tests[0]?.name ?? path.basename(checked.tests[0]?.relativePath ?? ''); + const remaining = checked.tests.length - 1; + runName = `${first} + ${remaining} more`; + } + + // Run type classification — based on user intent (selectors), not the + // expansion result. The server falls back to its own classification if + // this field is omitted. + const runType: 'folder' | 'single_test' | 'multi_test' | 'suite' = options.suitePath + ? 'suite' + : options.selectors.length === 0 + ? 'folder' + : options.selectors.length === 1 + ? checked.tests.length === 1 ? 
'single_test' : 'folder' + : 'multi_test'; const formData = new FormData(); const zipBuffer = fs.readFileSync(zipPath); formData.append('file', new Blob([zipBuffer]), 'specs.zip'); formData.append('command', command); formData.append('selectors', JSON.stringify(options.selectors)); + formData.append('runType', runType); + if (runName) { + formData.append('name', runName); + } if (options.suitePath) { formData.append('suitePath', options.suitePath); } @@ -98,23 +127,42 @@ export async function runCloud(options: CloudRunnerOptions): Promise { if (options.platform) { formData.append('platform', options.platform); } - if (options.appPath) { - const appBuffer = fs.readFileSync(options.appPath); - const appFileName = path.basename(options.appPath); - formData.append('appFile', new Blob([appBuffer]), appFileName); - Logger.i(`Uploading app: ${appFileName}`); - } + + const appBuffer = fs.readFileSync(options.appPath); + const appFileName = path.basename(options.appPath); + const appSize = appBuffer.byteLength; + formData.append('appFile', new Blob([appBuffer]), appFileName); + formData.append('appFilename', appFileName); + + const submissionLabel = options.suitePath + ? 
`suite ${path.basename(options.suitePath)} (${checked.tests.length} test(s))` + : `${checked.tests.length} test(s)`; + const uploadStart = Date.now(); + const { default: ora } = await import('ora'); + const spinner = ora( + `Uploading ${appFileName} (${formatBytes(appSize)}) and submitting ${submissionLabel}...`, + ).start(); const url = `${FINALRUN_CLOUD_URL}/api/v1/execute`; - const response = await fetch(url, { - method: 'POST', - body: formData, - }); + let response: Response; + try { + response = await fetch(url, { + method: 'POST', + body: formData, + }); + } catch (e) { + const elapsed = ((Date.now() - uploadStart) / 1000).toFixed(1); + spinner.fail(`Upload failed after ${elapsed}s`); + throw e; + } + const elapsed = ((Date.now() - uploadStart) / 1000).toFixed(1); if (!response.ok) { + spinner.fail(`Upload failed after ${elapsed}s (HTTP ${response.status})`); const body = await response.text(); throw new Error(`Cloud service returned ${response.status}: ${body}`); } + spinner.succeed(`Uploaded ${formatBytes(appSize)} in ${elapsed}s`); const result = (await response.json()) as { success: boolean; runId?: string; error?: string }; if (!result.success || !result.runId) { @@ -230,3 +278,10 @@ function statusIcon(status: string): string { function sleep(ms: number): Promise { return new Promise((resolve) => setTimeout(resolve, ms)); } + +function formatBytes(bytes: number): string { + if (bytes < 1024) return `${bytes} B`; + if (bytes < 1024 ** 2) return `${(bytes / 1024).toFixed(1)} KB`; + if (bytes < 1024 ** 3) return `${(bytes / 1024 ** 2).toFixed(1)} MB`; + return `${(bytes / 1024 ** 3).toFixed(2)} GB`; +} From 1d11e6cf4bbd7ede18fc07631cf74e1c8039dcab Mon Sep 17 00:00:00 2001 From: srinidhi-lwt Date: Mon, 13 Apr 2026 23:31:36 +0530 Subject: [PATCH 05/80] Cloud CLI: fire-and-forget submission, runType/name/appFilename metadata MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - CLI exits immediately after submission with 
run ID + status URL - Hard-checks for HTTP 201 from server (fails on anything else) - Sends runType (single_test/multi_test/suite), name, appFilename in FormData - Captures full process.argv as the command string - Removes pollRunUntilFinished, RunDetailsResponse, statusIcon, sleep - Drops 'folder' runType — collapses to 'multi_test' Co-Authored-By: Claude Opus 4.6 (1M context) --- packages/cli/src/cloudRunner.ts | 130 +++++--------------------------- 1 file changed, 19 insertions(+), 111 deletions(-) diff --git a/packages/cli/src/cloudRunner.ts b/packages/cli/src/cloudRunner.ts index 17f2831..839192f 100644 --- a/packages/cli/src/cloudRunner.ts +++ b/packages/cli/src/cloudRunner.ts @@ -98,16 +98,13 @@ export async function runCloud(options: CloudRunnerOptions): Promise { runName = `${first} + ${remaining} more`; } - // Run type classification — based on user intent (selectors), not the - // expansion result. The server falls back to its own classification if - // this field is omitted. - const runType: 'folder' | 'single_test' | 'multi_test' | 'suite' = options.suitePath + // Run type classification. The server falls back to its own classification + // if this field is omitted. + const runType: 'single_test' | 'multi_test' | 'suite' = options.suitePath ? 'suite' - : options.selectors.length === 0 - ? 'folder' - : options.selectors.length === 1 - ? checked.tests.length === 1 ? 'single_test' : 'folder' - : 'multi_test'; + : checked.tests.length === 1 + ? 'single_test' + : 'multi_test'; const formData = new FormData(); const zipBuffer = fs.readFileSync(zipPath); @@ -157,7 +154,9 @@ export async function runCloud(options: CloudRunnerOptions): Promise { } const elapsed = ((Date.now() - uploadStart) / 1000).toFixed(1); - if (!response.ok) { + // The server returns 201 Created on successful submission. Anything else + // is an error — surface the body and exit non-zero. 
+ if (response.status !== 201) { spinner.fail(`Upload failed after ${elapsed}s (HTTP ${response.status})`); const body = await response.text(); throw new Error(`Cloud service returned ${response.status}: ${body}`); @@ -166,14 +165,18 @@ export async function runCloud(options: CloudRunnerOptions): Promise { const result = (await response.json()) as { success: boolean; runId?: string; error?: string }; if (!result.success || !result.runId) { - console.log(`\n\x1b[31m✖ Cloud submission failed\x1b[0m`); - console.log(JSON.stringify(result, null, 2)); - return; + throw new Error( + `Cloud submission failed: ${result.error ?? JSON.stringify(result)}`, + ); } - Logger.i(`Run submitted: ${result.runId}`); - Logger.i(`Polling status...\n`); - await pollRunUntilFinished(result.runId); + // Fire-and-forget: the run is now queued. Print the polling URL and exit. + // The user can curl the status URL to track progress; the CLI does not + // wait for the run to finish. + console.log(`\n\x1b[32m✓ Run submitted\x1b[0m`); + console.log(` Run ID: ${result.runId}`); + console.log(` Status URL: ${FINALRUN_CLOUD_URL}/api/v1/runs/${result.runId}`); + console.log(`\n The run is now queued. 
Use the status URL above to track progress.`); } finally { // Clean up temp zip try { @@ -184,101 +187,6 @@ export async function runCloud(options: CloudRunnerOptions): Promise { } } -interface RunDetailsResponse { - success: boolean; - run?: { - id: string; - status: string; - totalTests: number; - completedTests: number; - }; - nodes?: Array<{ - id: string; - type: string; - name: string; - status: string; - errorMessage?: string | null; - videoUrl?: string | null; - }>; -} - -async function pollRunUntilFinished(runId: string): Promise { - const url = `${FINALRUN_CLOUD_URL}/api/v1/runs/${runId}`; - const POLL_INTERVAL_MS = 5_000; - const MAX_WAIT_MS = 30 * 60 * 1000; // 30 minutes - const start = Date.now(); - let lastStatus = ''; - const seenNodeStatus = new Map(); - - while (Date.now() - start < MAX_WAIT_MS) { - let body: RunDetailsResponse; - try { - const res = await fetch(url); - if (!res.ok) { - Logger.w(`poll HTTP ${res.status}, retrying...`); - await sleep(POLL_INTERVAL_MS); - continue; - } - body = (await res.json()) as RunDetailsResponse; - } catch (e) { - Logger.w(`poll failed: ${e instanceof Error ? e.message : String(e)}`); - await sleep(POLL_INTERVAL_MS); - continue; - } - - if (!body.success || !body.run) { - Logger.w(`run not found, retrying...`); - await sleep(POLL_INTERVAL_MS); - continue; - } - - const { run, nodes = [] } = body; - - if (run.status !== lastStatus) { - Logger.i(`run status: ${run.status} (${run.completedTests}/${run.totalTests} tests)`); - lastStatus = run.status; - } - - // Print transitions for each test node - for (const node of nodes) { - if (node.type !== 'test') continue; - const previous = seenNodeStatus.get(node.id); - if (previous !== node.status) { - const icon = statusIcon(node.status); - const suffix = node.errorMessage ? 
` — ${node.errorMessage}` : ''; - Logger.i(` ${icon} ${node.name}: ${node.status}${suffix}`); - seenNodeStatus.set(node.id, node.status); - } - } - - if (run.status === 'completed' || run.status === 'failed' || run.status === 'aborted') { - const passed = nodes.filter((n) => n.type === 'test' && n.status === 'completed').length; - const total = nodes.filter((n) => n.type === 'test').length; - const colour = run.status === 'completed' ? '\x1b[32m' : '\x1b[31m'; - console.log(`\n${colour}Run ${run.status}: ${passed}/${total} passed\x1b[0m`); - return; - } - - await sleep(POLL_INTERVAL_MS); - } - - Logger.w(`run did not finish within ${MAX_WAIT_MS / 1000}s — check the cloud server`); -} - -function statusIcon(status: string): string { - switch (status) { - case 'completed': return '✔'; - case 'failed': return '✖'; - case 'running': return '◉'; - case 'queued': return '○'; - default: return '·'; - } -} - -function sleep(ms: number): Promise { - return new Promise((resolve) => setTimeout(resolve, ms)); -} - function formatBytes(bytes: number): string { if (bytes < 1024) return `${bytes} B`; if (bytes < 1024 ** 2) return `${(bytes / 1024).toFixed(1)} KB`; From 8d415d249b7785605a643bd8113f9b4dbb7f8748 Mon Sep 17 00:00:00 2001 From: srinidhi-lwt Date: Tue, 14 Apr 2026 19:02:13 +0530 Subject: [PATCH 06/80] app upload and reusing the app --- packages/cli/bin/finalrun.ts | 19 ++++- packages/cli/src/cloudRunner.ts | 140 ++++++++++++++++++++++++++++---- 2 files changed, 137 insertions(+), 22 deletions(-) diff --git a/packages/cli/bin/finalrun.ts b/packages/cli/bin/finalrun.ts index 3cdf28f..c5c26f6 100644 --- a/packages/cli/bin/finalrun.ts +++ b/packages/cli/bin/finalrun.ts @@ -18,7 +18,7 @@ import { stopWorkspaceReportServer, } from '../src/reportServerManager.js'; import { normalizeTestSelectors, TEST_SELECTION_REQUIRED_ERROR } from '../src/testSelection.js'; -import { runCloud } from '../src/cloudRunner.js'; +import { runCloud, uploadApp } from '../src/cloudRunner.js'; 
import { PreExecutionFailureError, runTests, type TestRunnerResult } from '../src/testRunner.js'; import { formatRunIndexForConsole, loadRunIndex } from '../src/runIndex.js'; import { serveReportWorkspace } from '../src/reportServer.js'; @@ -179,7 +179,7 @@ cloud .description('Run repo-local FinalRun YAML tests from .finalrun/tests on cloud devices') .option('--env ', 'Environment name (for example dev or staging)') .option('--platform ', 'Target platform (android or ios)') - .requiredOption('--app ', 'Path to the .apk or .app to install on the cloud device') + .option('--app ', 'Path to the .apk or .app to install (omit to use the latest uploaded app)') .action(async (selectors: string[] | undefined, options: CloudCommandOptions) => { await runCommand(async () => { Logger.init({ level: LogLevel.INFO, resetSinks: true }); @@ -201,7 +201,7 @@ cloud .description('Run a FinalRun suite manifest from .finalrun/suites on cloud devices') .option('--env ', 'Environment name (for example dev or staging)') .option('--platform ', 'Target platform (android or ios)') - .requiredOption('--app ', 'Path to the .apk or .app to install on the cloud device') + .option('--app ', 'Path to the .apk or .app to install (omit to use the latest uploaded app)') .action(async (suitePath: string, options: CloudCommandOptions) => { await runCommand(async () => { Logger.init({ level: LogLevel.INFO, resetSinks: true }); @@ -215,6 +215,17 @@ cloud }); }); +cloud + .command('upload') + .description('Upload an app binary to FinalRun cloud for use in subsequent test runs') + .requiredOption('--app ', 'Path to the .apk or .app to upload') + .action(async (options: { app: string }) => { + await runCommand(async () => { + Logger.init({ level: LogLevel.INFO, resetSinks: true }); + await uploadApp(options.app); + }); + }); + program .command('start-server') .description('Start or reuse the local FinalRun report server for a workspace') @@ -298,7 +309,7 @@ interface CheckCommandOptions extends 
CommonCommandOptions { interface CloudCommandOptions { env?: string; platform?: string; - app: string; + app?: string; } interface DoctorCommandOptions { diff --git a/packages/cli/src/cloudRunner.ts b/packages/cli/src/cloudRunner.ts index 839192f..a2a86bd 100644 --- a/packages/cli/src/cloudRunner.ts +++ b/packages/cli/src/cloudRunner.ts @@ -12,7 +12,25 @@ export interface CloudRunnerOptions { suitePath?: string; envName?: string; platform?: string; - appPath: string; + appPath?: string; +} + +interface AppUploadEntry { + id: string; + filename: string; + createdAt: string; +} + +async function fetchLatestAppUpload(): Promise { + const res = await fetch(`${FINALRUN_CLOUD_URL}/api/v1/app_uploads`); + if (!res.ok) { + throw new Error(`Failed to fetch app uploads (HTTP ${res.status})`); + } + const data = (await res.json()) as { success: boolean; appUploads: AppUploadEntry[] }; + if (!data.success || !data.appUploads || data.appUploads.length === 0) { + throw new Error('No app uploaded yet. Use --app to upload one.'); + } + return data.appUploads[0]!; } export async function runCloud(options: CloudRunnerOptions): Promise { @@ -27,7 +45,22 @@ export async function runCloud(options: CloudRunnerOptions): Promise { requireSelection: true, }); - // 2. Collect resolved file paths + // 2. Resolve app — either from --app flag or latest upload + let appMode: { type: 'file'; path: string } | { type: 'existing'; upload: AppUploadEntry }; + + if (options.appPath) { + if (!fs.existsSync(options.appPath)) { + throw new Error(`App file not found: ${options.appPath}`); + } + appMode = { type: 'file', path: options.appPath }; + } else { + const latest = await fetchLatestAppUpload(); + const uploadTime = new Date(latest.createdAt).toLocaleString(undefined, { timeZoneName: 'short' }); + console.log(`\n Using app: \x1b[36m${latest.filename}\x1b[0m (uploaded ${uploadTime})\n`); + appMode = { type: 'existing', upload: latest }; + } + + // 3. 
Collect resolved file paths const filesToZip: Array<{ absolutePath: string; relativePath: string }> = []; // Add suite file if present @@ -68,7 +101,7 @@ export async function runCloud(options: CloudRunnerOptions): Promise { } } - // 3. Create zip with only selected files + // 4. Create zip with only selected files Logger.i(`Zipping ${filesToZip.length} file(s)...`); const zip = new AdmZip(); for (const file of filesToZip) { @@ -80,7 +113,7 @@ export async function runCloud(options: CloudRunnerOptions): Promise { zip.writeZip(zipPath); try { - // 4. Upload to cloud service + // 5. Upload to cloud service // Capture the raw CLI invocation, exactly as the user typed it (minus the // node binary path). process.argv = [node, finalrun(.ts), ...userArgs]. const command = `finalrun ${process.argv.slice(2).join(' ')}`; @@ -125,20 +158,31 @@ export async function runCloud(options: CloudRunnerOptions): Promise { formData.append('platform', options.platform); } - const appBuffer = fs.readFileSync(options.appPath); - const appFileName = path.basename(options.appPath); - const appSize = appBuffer.byteLength; - formData.append('appFile', new Blob([appBuffer]), appFileName); - formData.append('appFilename', appFileName); + // Attach app binary or reference existing upload + let spinnerMessage: string; + if (appMode.type === 'file') { + const appBuffer = fs.readFileSync(appMode.path); + const appFileName = path.basename(appMode.path); + const appSize = appBuffer.byteLength; + formData.append('appFile', new Blob([appBuffer]), appFileName); + formData.append('appFilename', appFileName); + + const submissionLabel = options.suitePath + ? `suite ${path.basename(options.suitePath)} (${checked.tests.length} test(s))` + : `${checked.tests.length} test(s)`; + spinnerMessage = `Uploading ${appFileName} (${formatBytes(appSize)}) and submitting ${submissionLabel}...`; + } else { + formData.append('appUploadId', appMode.upload.id); + + const submissionLabel = options.suitePath + ? 
`suite ${path.basename(options.suitePath)} (${checked.tests.length} test(s))` + : `${checked.tests.length} test(s)`; + spinnerMessage = `Submitting ${submissionLabel} with ${appMode.upload.filename}...`; + } - const submissionLabel = options.suitePath - ? `suite ${path.basename(options.suitePath)} (${checked.tests.length} test(s))` - : `${checked.tests.length} test(s)`; const uploadStart = Date.now(); const { default: ora } = await import('ora'); - const spinner = ora( - `Uploading ${appFileName} (${formatBytes(appSize)}) and submitting ${submissionLabel}...`, - ).start(); + const spinner = ora(spinnerMessage).start(); const url = `${FINALRUN_CLOUD_URL}/api/v1/execute`; let response: Response; @@ -157,11 +201,17 @@ export async function runCloud(options: CloudRunnerOptions): Promise { // The server returns 201 Created on successful submission. Anything else // is an error — surface the body and exit non-zero. if (response.status !== 201) { - spinner.fail(`Upload failed after ${elapsed}s (HTTP ${response.status})`); + spinner.fail(`Submission failed after ${elapsed}s (HTTP ${response.status})`); const body = await response.text(); throw new Error(`Cloud service returned ${response.status}: ${body}`); } - spinner.succeed(`Uploaded ${formatBytes(appSize)} in ${elapsed}s`); + + if (appMode.type === 'file') { + const appSize = fs.statSync(appMode.path).size; + spinner.succeed(`Uploaded ${formatBytes(appSize)} in ${elapsed}s`); + } else { + spinner.succeed(`Submitted in ${elapsed}s`); + } const result = (await response.json()) as { success: boolean; runId?: string; error?: string }; if (!result.success || !result.runId) { @@ -175,8 +225,14 @@ export async function runCloud(options: CloudRunnerOptions): Promise { // wait for the run to finish. 
console.log(`\n\x1b[32m✓ Run submitted\x1b[0m`); console.log(` Run ID: ${result.runId}`); - console.log(` Status URL: ${FINALRUN_CLOUD_URL}/api/v1/runs/${result.runId}`); + console.log(` Status URL: ${FINALRUN_CLOUD_URL}/runs/${result.runId}`); console.log(`\n The run is now queued. Use the status URL above to track progress.`); + + if (appMode.type === 'file') { + const appFileName = path.basename(appMode.path); + console.log(`\n \x1b[33mTip:\x1b[0m You don't need to upload the app every time. Without --app,`); + console.log(` FinalRun uses your latest uploaded app (${appFileName}).`); + } } finally { // Clean up temp zip try { @@ -187,6 +243,54 @@ export async function runCloud(options: CloudRunnerOptions): Promise { } } +export async function uploadApp(appPath: string): Promise { + if (!fs.existsSync(appPath)) { + throw new Error(`App file not found: ${appPath}`); + } + + const appBuffer = fs.readFileSync(appPath); + const appFileName = path.basename(appPath); + const appSize = appBuffer.byteLength; + + const { default: ora } = await import('ora'); + const spinner = ora(`Uploading ${appFileName} (${formatBytes(appSize)})...`).start(); + const uploadStart = Date.now(); + + const formData = new FormData(); + formData.append('appFile', new Blob([appBuffer]), appFileName); + + let response: Response; + try { + response = await fetch(`${FINALRUN_CLOUD_URL}/api/v1/app_uploads`, { + method: 'POST', + body: formData, + }); + } catch (e) { + const elapsed = ((Date.now() - uploadStart) / 1000).toFixed(1); + spinner.fail(`Upload failed after ${elapsed}s`); + throw e; + } + + const elapsed = ((Date.now() - uploadStart) / 1000).toFixed(1); + if (response.status !== 201) { + spinner.fail(`Upload failed after ${elapsed}s (HTTP ${response.status})`); + const body = await response.text(); + throw new Error(`Cloud service returned ${response.status}: ${body}`); + } + + spinner.succeed(`Uploaded ${appFileName} (${formatBytes(appSize)}) in ${elapsed}s`); + + const result = (await 
response.json()) as { success: boolean; appUpload?: { id: string }; error?: string }; + if (!result.success || !result.appUpload) { + throw new Error(`Upload failed: ${result.error ?? JSON.stringify(result)}`); + } + + console.log(`\n \x1b[32m✓ App uploaded\x1b[0m`); + console.log(` App ID: ${result.appUpload.id}`); + console.log(` Filename: ${appFileName}`); + console.log(`\n This app will be used automatically when you run tests without --app.`); +} + function formatBytes(bytes: number): string { if (bytes < 1024) return `${bytes} B`; if (bytes < 1024 ** 2) return `${(bytes / 1024).toFixed(1)} KB`; From a491d32ee4daabc9736b14e01a945566da2f5317 Mon Sep 17 00:00:00 2001 From: srinidhi-lwt Date: Tue, 14 Apr 2026 19:04:42 +0530 Subject: [PATCH 07/80] removed MD file --- docs/cloud-runs.md | 269 --------------------------------------------- 1 file changed, 269 deletions(-) delete mode 100644 docs/cloud-runs.md diff --git a/docs/cloud-runs.md b/docs/cloud-runs.md deleted file mode 100644 index 4e2e76a..0000000 --- a/docs/cloud-runs.md +++ /dev/null @@ -1,269 +0,0 @@ -# FinalRun Cloud Runs - -## Overview - -FinalRun supports running tests on cloud-based devices in addition to local execution. Cloud runs follow the same CLI-first architecture — tests are authored locally, triggered via CLI, and executed remotely on cloud devices. - -Cloud runs are **immutable and reproducible**. Re-running requires executing the CLI command again. There is no way to re-trigger a run from the UI. - ---- - -## Requirements - -### Run Visibility & Tracking -- Display list of all runs with status (`queued`, `running`, `completed`) -- Show real-time updates for currently running executions -- Include metadata: start time, duration, trigger source (CLI command) - -### Progress Tracking -- Show overall progress (e.g. 
6/10 tests passed, 60%) -- Display per-test execution status (`pending`, `running`, `completed`) -- Update progress in real-time as tests execute - -### Test Case Details -- List all test cases within a run in execution order -- Show status, duration, and file path for each test -- Allow expanding a test case to view steps, logs, and artifacts - -### Step-Level Visibility -- Display all steps within a test case in execution order -- Show step-level status -- Highlight the exact step where failure occurred - -### Artifacts -- Capture and display screenshots for each step -- Record and provide video playback for each test case -- Artifacts are mapped to their steps and tests for debugging - -### Suite & Structure Representation -- Show logical structure of suites and their test cases -- Preserve execution order defined in the suite file -- Allow users to understand grouping of test cases - ---- - -## Data Model - -### `runs` - -Represents a single CLI-triggered upload and its full execution lifecycle. - -``` -runs ----- -id (PK) -org_id (FK → orgs) -fr_user_id (FK → users) -command -- e.g. "finalrun cloud tests/auth/" -zip_url -- S3 path of uploaded zip -status -- parsing | queued | running | completed -total_tests -- set after parsing -completed_tests -- incremented as tests finish -created_at -started_at -completed_at -updated_at -``` - -**Notes:** -- Each run is immutable — re-run requires triggering CLI again -- `zip_url` is the source of truth for the uploaded project snapshot -- `total_tests` and `completed_tests` are denormalized for fast progress queries - ---- - -### `run_nodes` - -Represents every suite and test case within a run. Self-referencing to support the suite → tests hierarchy. - -``` -run_nodes ---------- -id (PK) -run_id (FK → runs) -parent_id (FK → run_nodes) -- null = top-level node -type -- 'suite' | 'test' -name -- from YAML name field -file_path -- e.g. 
tests/auth/login_test.yaml -order_index -- execution order, relative to siblings -status -- queued | running | completed | skipped -error_message -- failure reason -total_tests -- suites only -completed_tests -- suites only -video_url -- tests only -content JSONB -- full parsed YAML, stored at upload time -started_at -completed_at -updated_at -``` - -**Notes:** -- Created server-side after extracting the zip and parsing YAML specs — full tree is always visible before execution starts -- `order_index` is relative to siblings under the same `parent_id` -- `path` stores the full position in the tree (e.g. `001.002`) for correct flat ordering without recursive queries -- `content JSONB` is populated server-side by parsing specs from the uploaded zip — the AI runner reads steps directly from here, no ZIP download needed at execution time -- `total_tests` and `completed_tests` are only meaningful for `type='suite'` -- `video_url` is only populated for `type='test'` - ---- - -### `run_steps` - -Represents actual AI actions taken during test execution. Not 1:1 with YAML steps — the AI decides how many actions to take per YAML instruction. - -``` -run_steps ---------- -id (PK) -run_node_id (FK → run_nodes where type='test') -step_index -- global order within the test execution -yaml_step_index -- maps back to the YAML step array position -description -- what the AI decided to do -screenshot_url -logs -status -- passed | failed -executed_at -``` - -**Notes:** -- Created dynamically as the AI executes — not pre-populated from YAML -- `yaml_step_index` maps AI actions back to the originating YAML instruction so the UI can group them -- Multiple `run_steps` rows may share the same `yaml_step_index` (one YAML instruction → many AI actions) - ---- - -## CLI Command - -### Command - -``` -finalrun cloud [selectors...] 
-``` - -### Arguments - -| Argument | Description | -|---|---| -| `selectors` | Workspace-relative YAML files, directories, or globs under `.finalrun/tests/` | - -### Options - -| Option | Description | -|---|---| -| `--env ` | Environment name | -| `--platform ` | Target platform (`android` or `ios`) | - -### Examples - -```bash -# Run a suite file -finalrun cloud tests/regression_suite.yaml - -# Run a single test -finalrun cloud tests/auth/login_test.yaml - -# Run all tests in a directory (recursive) -finalrun cloud tests/auth/ - -# Run multiple selectors -finalrun cloud tests/auth/ tests/checkout/payment_test.yaml - -# Run with glob -finalrun cloud "tests/**/*.yaml" -``` - -### Implementation - -The `cloud` command is added to `packages/cli/bin/finalrun.ts` as a top-level command, consistent with `test` and `check`. Business logic lives in a new `packages/cli/src/cloudRunner.ts`, keeping `finalrun.ts` as a thin entry point. - ---- - -## File Type Inference - -All test and suite files live under `.finalrun/tests/`. There is no dedicated suites directory — a suite file is simply a YAML file with a `tests:` key. - -The CLI determines file type by parsing the YAML: - -| Key present | Type | -|---|---| -| `steps:` | Test | -| `tests:` | Suite | - -This works regardless of where the file lives within `.finalrun/tests/`, including nested subdirectories. - -### Suite Constraints (v1) - -- Suites are flat — the `tests:` array contains only plain string test file paths -- Suites referencing other suites are not supported in v1 -- The schema is designed to support nested suites in a future version without breaking changes — a `- suite: path` entry syntax can be added to the `tests:` array alongside plain string test paths - ---- - -## Upload Flow - -When `finalrun cloud` is executed, the CLI performs the following steps: - -### Step 1 — Resolve Workspace - -Locate the `.finalrun` directory by walking up from the current working directory. 
This logic is built into the compiled CLI binary — the `finalrun-agent` source code does not need to be present. The CLI can be run from any project that has a `.finalrun/` directory, as long as the `finalrun` binary is installed globally via npm. - -### Step 2 — Collect and Parse Files - -Use the existing `selectSpecFiles()` to collect all YAML files matching the provided selectors, with full support for: -- Single file paths -- Directories (recursive) -- Globs -- Comma-separated selectors - -Parse each collected file to determine its type (`steps:` = test, `tests:` = suite). - -For each suite file, load the test files referenced in its `tests:` array. - -### Step 3 — ZIP and Upload - -Create a ZIP of the `.finalrun/` directory, **excluding env files** (`.finalrun/env/*.yaml`). Env files contain mappings to local secret environment variables and are never relevant to cloud execution. - -Upload the ZIP to S3 and store the resulting `zip_url`. - -### Step 4 — POST Payload to API - -Send `{command, zip_url}` to the FinalRun API. The server extracts the zip, parses the YAML specs using the CLI's spec parser, builds the node tree (suites and tests), validates secrets against the org's cloud configuration, and creates the `runs` row and all `run_nodes` rows in a single transaction. - -### Step 5 — Print Run URL - -Output the cloud run URL to the terminal so the user can track progress. 
- -``` -✔ Run created: https://app.finalrun.io/runs/abc123 -``` - ---- - -## Upload Payload - -```json -{ - "command": "finalrun cloud tests/regression_suite.yaml", - "zip_url": "s3://finalrun/uploads/abc123.zip" -} -``` - -**Key points:** - -- The payload is minimal — only the CLI command and a reference to the uploaded zip -- The cloud server extracts the zip, parses the YAML specs, and builds the node tree (suites, tests, hierarchy) server-side using the same CLI spec parser -- `command` contains the selectors so the server knows which tests to run from the zip -- Secret validation happens server-side after parsing — the server checks all `${secrets.*}` references against the org's configured secrets -- Env files are excluded from the ZIP and never sent to the cloud - ---- - -## Secrets Handling - -Secrets are pre-configured in the FinalRun cloud dashboard, scoped to the org. - -- YAML files reference secrets as `${secrets.key}` placeholders -- These placeholders are preserved as-is in `content JSONB` on `run_nodes` -- At execution time, the cloud runner resolves placeholders from the org's stored secrets — the same way the local CLI resolves them from environment variables -- Env files (`.finalrun/env/*.yaml`) are never uploaded — they are only relevant locally for mapping secret keys to env var names -- If a test references a secret that has not been configured in the cloud dashboard, the upload is aborted at Step 4 with a clear error listing the missing secrets From 63c6bd55d38cce0c5a41bbcfc649307fc96f7b2e Mon Sep 17 00:00:00 2001 From: srinidhi-lwt Date: Thu, 16 Apr 2026 23:23:02 +0530 Subject: [PATCH 08/80] auth KEY in headers --- packages/cli/src/cloudRunner.ts | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/packages/cli/src/cloudRunner.ts b/packages/cli/src/cloudRunner.ts index a2a86bd..b411669 100644 --- a/packages/cli/src/cloudRunner.ts +++ b/packages/cli/src/cloudRunner.ts @@ -6,6 +6,17 @@ import { Logger } from 
'@finalrun/common'; import { runCheck } from './checkRunner.js'; const FINALRUN_CLOUD_URL = process.env['FINALRUN_CLOUD_URL'] || 'https://cloud.finalrun.io'; +const FINALRUN_API_KEY = process.env['FINALRUN_API_KEY'] || ''; + +function getAuthHeaders(): Record { + if (!FINALRUN_API_KEY) { + throw new Error( + 'FINALRUN_API_KEY is not set. Get your API key from the FinalRun Cloud dashboard and set it:\n' + + ' export FINALRUN_API_KEY=fr_your_key_here', + ); + } + return { Authorization: `Bearer ${FINALRUN_API_KEY}` }; +} export interface CloudRunnerOptions { selectors: string[]; @@ -22,7 +33,9 @@ interface AppUploadEntry { } async function fetchLatestAppUpload(): Promise { - const res = await fetch(`${FINALRUN_CLOUD_URL}/api/v1/app_uploads`); + const res = await fetch(`${FINALRUN_CLOUD_URL}/api/v1/app_uploads`, { + headers: getAuthHeaders(), + }); if (!res.ok) { throw new Error(`Failed to fetch app uploads (HTTP ${res.status})`); } @@ -189,6 +202,7 @@ export async function runCloud(options: CloudRunnerOptions): Promise { try { response = await fetch(url, { method: 'POST', + headers: getAuthHeaders(), body: formData, }); } catch (e) { @@ -263,6 +277,7 @@ export async function uploadApp(appPath: string): Promise { try { response = await fetch(`${FINALRUN_CLOUD_URL}/api/v1/app_uploads`, { method: 'POST', + headers: getAuthHeaders(), body: formData, }); } catch (e) { From 56b158d723ade21b59568de482b592817c1f8a7c Mon Sep 17 00:00:00 2001 From: srinidhi-lwt Date: Fri, 17 Apr 2026 15:18:44 +0530 Subject: [PATCH 09/80] app parser and app inspector for app validation --- package-lock.json | 310 ++++++++++++++++++++ packages/cli/package.json | 2 + packages/cli/src/appInspector.ts | 96 ++++++ packages/cli/src/cloudRunner.ts | 56 ++++ packages/cli/src/types/app-info-parser.d.ts | 20 ++ 5 files changed, 484 insertions(+) create mode 100644 packages/cli/src/appInspector.ts create mode 100644 packages/cli/src/types/app-info-parser.d.ts diff --git a/package-lock.json 
b/package-lock.json index 4272b27..6dd60c1 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1860,6 +1860,15 @@ "node": ">= 20" } }, + "node_modules/@xmldom/xmldom": { + "version": "0.8.12", + "resolved": "https://registry.npmjs.org/@xmldom/xmldom/-/xmldom-0.8.12.tgz", + "integrity": "sha512-9k/gHF6n/pAi/9tqr3m3aqkuiNosYTurLLUtc7xQ9sxB/wm7WPygCv8GYa6mS0fLJEHhqMC1ATYhz++U/lRHqg==", + "license": "MIT", + "engines": { + "node": ">=10.0.0" + } + }, "node_modules/abbrev": { "version": "3.0.1", "resolved": "https://registry.npmjs.org/abbrev/-/abbrev-3.0.1.tgz", @@ -1974,6 +1983,32 @@ "url": "https://github.com/chalk/ansi-styles?sponsor=1" } }, + "node_modules/app-info-parser": { + "version": "1.1.6", + "resolved": "https://registry.npmjs.org/app-info-parser/-/app-info-parser-1.1.6.tgz", + "integrity": "sha512-ZAFCM0bN88cbpsMoRhL/JfdX3b+nb5iBEXcu30xABvbaqtw6tXfHujDnuKSpNmA3P0uwpkIxTV/Wun5HfEch8A==", + "license": "MIT", + "dependencies": { + "bplist-parser": "^0.2.0", + "bytebuffer": "^5.0.1", + "cgbi-to-png": "^1.0.7", + "commander": "^7.2.0", + "isomorphic-unzip": "^1.1.5", + "plist": "^3.0.1" + }, + "bin": { + "app-info-parser": "bin/index.js" + } + }, + "node_modules/app-info-parser/node_modules/commander": { + "version": "7.2.0", + "resolved": "https://registry.npmjs.org/commander/-/commander-7.2.0.tgz", + "integrity": "sha512-QrWXB+ZQSVPmIWIhtEO9H+gwHaMGYiF5ChvoJ+K9ZGHG/sVsa6yiesAD1GC/x46sET00Xlwo1u49RVVVzvcSkw==", + "license": "MIT", + "engines": { + "node": ">= 10" + } + }, "node_modules/balanced-match": { "version": "4.0.4", "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-4.0.4.tgz", @@ -1984,6 +2019,26 @@ "node": "18 || 20 || >=22" } }, + "node_modules/base64-js": { + "version": "1.5.1", + "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz", + "integrity": "sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==", + "funding": [ + { + "type": "github", + "url": 
"https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT" + }, "node_modules/baseline-browser-mapping": { "version": "2.10.12", "resolved": "https://registry.npmjs.org/baseline-browser-mapping/-/baseline-browser-mapping-2.10.12.tgz", @@ -1996,6 +2051,36 @@ "node": ">=6.0.0" } }, + "node_modules/big-integer": { + "version": "1.6.52", + "resolved": "https://registry.npmjs.org/big-integer/-/big-integer-1.6.52.tgz", + "integrity": "sha512-QxD8cf2eVqJOOz63z6JIN9BzvVs/dlySa5HGSBH5xtR8dPteIRQnBxxKqkNTiT6jbDTF6jAfrd4oMcND9RGbQg==", + "license": "Unlicense", + "engines": { + "node": ">=0.6" + } + }, + "node_modules/bplist-creator": { + "version": "0.1.0", + "resolved": "https://registry.npmjs.org/bplist-creator/-/bplist-creator-0.1.0.tgz", + "integrity": "sha512-sXaHZicyEEmY86WyueLTQesbeoH/mquvarJaQNbjuOQO+7gbFcDEWqKmcWA4cOTLzFlfgvkiVxolk1k5bBIpmg==", + "license": "MIT", + "dependencies": { + "stream-buffers": "2.2.x" + } + }, + "node_modules/bplist-parser": { + "version": "0.2.0", + "resolved": "https://registry.npmjs.org/bplist-parser/-/bplist-parser-0.2.0.tgz", + "integrity": "sha512-z0M+byMThzQmD9NILRniCUXYsYpjwnlO8N5uCFaCqIOpqRsJCrQL9NK3JsD67CN5a08nF5oIL2bD6loTdHOuKw==", + "license": "MIT", + "dependencies": { + "big-integer": "^1.6.44" + }, + "engines": { + "node": ">= 5.10.0" + } + }, "node_modules/brace-expansion": { "version": "5.0.5", "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-5.0.5.tgz", @@ -2009,6 +2094,68 @@ "node": "18 || 20 || >=22" } }, + "node_modules/buffer": { + "version": "5.7.1", + "resolved": "https://registry.npmjs.org/buffer/-/buffer-5.7.1.tgz", + "integrity": "sha512-EHcyIPBQ4BSGlvjB16k5KgAJ27CIsHY/2JBmCRReo48y9rQ3MaUzWX3KVlBa4U7MyX02HdVj0K7C3WaB3ju7FQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + 
"url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT", + "dependencies": { + "base64-js": "^1.3.1", + "ieee754": "^1.1.13" + } + }, + "node_modules/buffer-crc32": { + "version": "0.2.13", + "resolved": "https://registry.npmjs.org/buffer-crc32/-/buffer-crc32-0.2.13.tgz", + "integrity": "sha512-VO9Ht/+p3SN7SKWqcrgEzjGbRSJYTx+Q1pTQC0wrWqHx0vpJraQ6GtHx8tvcg1rlK1byhU5gccxgOgj7B0TDkQ==", + "license": "MIT", + "engines": { + "node": "*" + } + }, + "node_modules/bufferpack": { + "version": "0.0.6", + "resolved": "https://registry.npmjs.org/bufferpack/-/bufferpack-0.0.6.tgz", + "integrity": "sha512-MTWvLHElqczrIVhge9qHtqgNigJFyh0+tCDId5yCbFAfuekHWIG+uAgvoHVflwrDPuY/e47JE1ki5qcM7w4uLg==", + "engines": { + "node": "*" + } + }, + "node_modules/bytebuffer": { + "version": "5.0.1", + "resolved": "https://registry.npmjs.org/bytebuffer/-/bytebuffer-5.0.1.tgz", + "integrity": "sha512-IuzSdmADppkZ6DlpycMkm8l9zeEq16fWtLvunEwFiYciR/BHo4E8/xs5piFquG+Za8OWmMqHF8zuRviz2LHvRQ==", + "license": "Apache-2.0", + "dependencies": { + "long": "~3" + }, + "engines": { + "node": ">=0.8" + } + }, + "node_modules/bytebuffer/node_modules/long": { + "version": "3.2.0", + "resolved": "https://registry.npmjs.org/long/-/long-3.2.0.tgz", + "integrity": "sha512-ZYvPPOMqUwPoDsbJaR10iQJYnMuZhRTvHYl62ErLIEX7RgFlziSBUUvrt3OVfc47QlHHpzPZYP17g3Fv7oeJkg==", + "license": "Apache-2.0", + "engines": { + "node": ">=0.6" + } + }, "node_modules/caniuse-lite": { "version": "1.0.30001782", "resolved": "https://registry.npmjs.org/caniuse-lite/-/caniuse-lite-1.0.30001782.tgz", @@ -2042,6 +2189,18 @@ "url": "https://github.com/sponsors/mesqueeb" } }, + "node_modules/cgbi-to-png": { + "version": "1.0.7", + "resolved": "https://registry.npmjs.org/cgbi-to-png/-/cgbi-to-png-1.0.7.tgz", + "integrity": "sha512-YR80kxTPuq9oRpZUdQmNEQWrmTKLINk1cfLVfyrV7Rfr9KLtLJdcockPKbreIr4JYAq+DhHBR7w+WA/tF5VDaQ==", + "license": "mit", + "dependencies": { + 
"bufferpack": "0.0.6", + "crc": "^3.3.0", + "stream-to-buffer": "^0.1.0", + "streamifier": "^0.1.1" + } + }, "node_modules/chalk": { "version": "5.6.2", "resolved": "https://registry.npmjs.org/chalk/-/chalk-5.6.2.tgz", @@ -2189,6 +2348,15 @@ "node": "^14.18.0 || >=16.10.0" } }, + "node_modules/crc": { + "version": "3.8.0", + "resolved": "https://registry.npmjs.org/crc/-/crc-3.8.0.tgz", + "integrity": "sha512-iX3mfgcTMIq3ZKLIsVFAbv7+Mc10kxabAGQb8HvjA1o3T1PIYprbakQ65d3I+2HGHt6nSKkM9PYjgoJO2KcFBQ==", + "license": "MIT", + "dependencies": { + "buffer": "^5.1.0" + } + }, "node_modules/cross-spawn": { "version": "7.0.6", "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz", @@ -2549,6 +2717,15 @@ "dev": true, "license": "MIT" }, + "node_modules/fd-slicer": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/fd-slicer/-/fd-slicer-1.1.0.tgz", + "integrity": "sha512-cE1qsB/VwyQozZ+q1dGxR8LBYNZeofhEdUNGSMbQD3Gw2lAzX9Zb3uIU6Ebc/Fmyjo9AWWfnn0AUCHqtevs/8g==", + "license": "MIT", + "dependencies": { + "pend": "~1.2.0" + } + }, "node_modules/fdir": { "version": "6.5.0", "resolved": "https://registry.npmjs.org/fdir/-/fdir-6.5.0.tgz", @@ -2721,6 +2898,26 @@ "node": ">= 14" } }, + "node_modules/ieee754": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/ieee754/-/ieee754-1.2.1.tgz", + "integrity": "sha512-dcyqhDvX1C46lXZcVqCpK+FtMRQVdIMN6/Df5js2zouUsqG7I6sFxitIC+7KYK29KdXOLHdu9zL4sFnoVQnqaA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "BSD-3-Clause" + }, "node_modules/ignore": { "version": "5.3.2", "resolved": "https://registry.npmjs.org/ignore/-/ignore-5.3.2.tgz", @@ -2804,6 +3001,16 @@ "dev": true, "license": "ISC" }, + "node_modules/isomorphic-unzip": { + "version": "1.1.5", + "resolved": 
"https://registry.npmjs.org/isomorphic-unzip/-/isomorphic-unzip-1.1.5.tgz", + "integrity": "sha512-2McA51lWhmO3Kk438jxVcYeh6L+AOqVnl9XdX1yI7GlLA9RwEyTBgGem1rNuRIU2abAmOiv+IagThdUxASY4IA==", + "license": "MIT", + "dependencies": { + "buffer": "^5.0.7", + "yauzl": "^2.8.0" + } + }, "node_modules/json-buffer": { "version": "3.0.1", "resolved": "https://registry.npmjs.org/json-buffer/-/json-buffer-3.0.1.tgz", @@ -3192,6 +3399,12 @@ "node": ">=8" } }, + "node_modules/pend": { + "version": "1.2.0", + "resolved": "https://registry.npmjs.org/pend/-/pend-1.2.0.tgz", + "integrity": "sha512-F3asv42UuXchdzt+xXqfW1OGlVBe+mxa2mqI0pg5yAHZPvFmY3Y6drSf/GQ1A86WgWEN9Kzh/WrgKa6iGcHXLg==", + "license": "MIT" + }, "node_modules/picocolors": { "version": "1.1.1", "resolved": "https://registry.npmjs.org/picocolors/-/picocolors-1.1.1.tgz", @@ -3211,6 +3424,20 @@ "url": "https://github.com/sponsors/jonschlinkert" } }, + "node_modules/plist": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/plist/-/plist-3.1.0.tgz", + "integrity": "sha512-uysumyrvkUX0rX/dEVqt8gC3sTBzd4zoWfLeS29nb53imdaXVvLINYXTI2GNqzaMuvacNx4uJQ8+b3zXR0pkgQ==", + "license": "MIT", + "dependencies": { + "@xmldom/xmldom": "^0.8.8", + "base64-js": "^1.5.1", + "xmlbuilder": "^15.1.1" + }, + "engines": { + "node": ">=10.4.0" + } + }, "node_modules/postcss": { "version": "8.4.31", "resolved": "https://registry.npmjs.org/postcss/-/postcss-8.4.31.tgz", @@ -3454,6 +3681,29 @@ "url": "https://github.com/sponsors/isaacs" } }, + "node_modules/simple-plist": { + "version": "1.3.1", + "resolved": "https://registry.npmjs.org/simple-plist/-/simple-plist-1.3.1.tgz", + "integrity": "sha512-iMSw5i0XseMnrhtIzRb7XpQEXepa9xhWxGUojHBL43SIpQuDQkh3Wpy67ZbDzZVr6EKxvwVChnVpdl8hEVLDiw==", + "license": "MIT", + "dependencies": { + "bplist-creator": "0.1.0", + "bplist-parser": "0.3.1", + "plist": "^3.0.5" + } + }, + "node_modules/simple-plist/node_modules/bplist-parser": { + "version": "0.3.1", + "resolved": 
"https://registry.npmjs.org/bplist-parser/-/bplist-parser-0.3.1.tgz", + "integrity": "sha512-PyJxiNtA5T2PlLIeBot4lbp7rj4OadzjnMZD/G5zuBNt8ei/yCU7+wW0h2bag9vr8c+/WuRWmSxbqAl9hL1rBA==", + "license": "MIT", + "dependencies": { + "big-integer": "1.6.x" + }, + "engines": { + "node": ">= 5.10.0" + } + }, "node_modules/source-map-js": { "version": "1.2.1", "resolved": "https://registry.npmjs.org/source-map-js/-/source-map-js-1.2.1.tgz", @@ -3475,6 +3725,45 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/stream-buffers": { + "version": "2.2.0", + "resolved": "https://registry.npmjs.org/stream-buffers/-/stream-buffers-2.2.0.tgz", + "integrity": "sha512-uyQK/mx5QjHun80FLJTfaWE7JtwfRMKBLkMne6udYOmvH0CawotVa7TfgYHzAnpphn4+TweIx1QKMnRIbipmUg==", + "license": "Unlicense", + "engines": { + "node": ">= 0.10.0" + } + }, + "node_modules/stream-to": { + "version": "0.2.2", + "resolved": "https://registry.npmjs.org/stream-to/-/stream-to-0.2.2.tgz", + "integrity": "sha512-Kg1BSDTwgGiVMtTCJNlo7kk/xzL33ZuZveEBRt6rXw+f1WLK/8kmz2NVCT/Qnv0JkV85JOHcLhD82mnXsR3kPw==", + "license": "MIT", + "engines": { + "node": ">= 0.10.0" + } + }, + "node_modules/stream-to-buffer": { + "version": "0.1.0", + "resolved": "https://registry.npmjs.org/stream-to-buffer/-/stream-to-buffer-0.1.0.tgz", + "integrity": "sha512-Da4WoKaZyu3nf+bIdIifh7IPkFjARBnBK+pYqn0EUJqksjV9afojjaCCHUemH30Jmu7T2qcKvlZm2ykN38uzaw==", + "license": "MIT", + "dependencies": { + "stream-to": "~0.2.0" + }, + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/streamifier": { + "version": "0.1.1", + "resolved": "https://registry.npmjs.org/streamifier/-/streamifier-0.1.1.tgz", + "integrity": "sha512-zDgl+muIlWzXNsXeyUfOk9dChMjlpkq0DRsxujtYPgyJ676yQ8jEm6zzaaWHFDg5BNcLuif0eD2MTyJdZqXpdg==", + "license": "MIT", + "engines": { + "node": ">=0.10" + } + }, "node_modules/string-width": { "version": "7.2.0", "resolved": "https://registry.npmjs.org/string-width/-/string-width-7.2.0.tgz", @@ -3829,6 +4118,15 @@ "node": 
">=8" } }, + "node_modules/xmlbuilder": { + "version": "15.1.1", + "resolved": "https://registry.npmjs.org/xmlbuilder/-/xmlbuilder-15.1.1.tgz", + "integrity": "sha512-yMqGBqtXyeN1e3TGYvgNgDVZ3j84W4cwkOXQswghol6APgZWaff9lnbvN7MHYJOiXsvGPXtjTYJEiC9J2wv9Eg==", + "license": "MIT", + "engines": { + "node": ">=8.0" + } + }, "node_modules/y18n": { "version": "5.0.8", "resolved": "https://registry.npmjs.org/y18n/-/y18n-5.0.8.tgz", @@ -3931,6 +4229,16 @@ "node": ">=8" } }, + "node_modules/yauzl": { + "version": "2.10.0", + "resolved": "https://registry.npmjs.org/yauzl/-/yauzl-2.10.0.tgz", + "integrity": "sha512-p4a9I6X6nu6IhoGmBqAcbJy1mlC4j27vEPZX9F4L4/vZT3Lyq1VkFHw/V/PUcB9Buo+DG3iHkT0x3Qya58zc3g==", + "license": "MIT", + "dependencies": { + "buffer-crc32": "~0.2.3", + "fd-slicer": "~1.1.0" + } + }, "node_modules/yocto-queue": { "version": "0.1.0", "resolved": "https://registry.npmjs.org/yocto-queue/-/yocto-queue-0.1.0.tgz", @@ -3988,11 +4296,13 @@ "@grpc/proto-loader": "^0.7.0", "adm-zip": "^0.5.17", "ai": "^6.0.134", + "app-info-parser": "^1.1.6", "chalk": "^5.4.0", "commander": "^13.1.0", "dotenv": "^16.4.0", "ora": "^8.2.0", "protobufjs": "^7.5.4", + "simple-plist": "^1.3.1", "uuid": "^11.1.0", "yaml": "^2.8.2", "zod": "^4.1.8" diff --git a/packages/cli/package.json b/packages/cli/package.json index 9d6114c..5247217 100644 --- a/packages/cli/package.json +++ b/packages/cli/package.json @@ -82,11 +82,13 @@ "@grpc/proto-loader": "^0.7.0", "adm-zip": "^0.5.17", "ai": "^6.0.134", + "app-info-parser": "^1.1.6", "chalk": "^5.4.0", "commander": "^13.1.0", "dotenv": "^16.4.0", "ora": "^8.2.0", "protobufjs": "^7.5.4", + "simple-plist": "^1.3.1", "uuid": "^11.1.0", "yaml": "^2.8.2", "zod": "^4.1.8" diff --git a/packages/cli/src/appInspector.ts b/packages/cli/src/appInspector.ts new file mode 100644 index 0000000..9db4747 --- /dev/null +++ b/packages/cli/src/appInspector.ts @@ -0,0 +1,96 @@ +// Inspect an Android APK or iOS .app/.ipa/.zip bundle.
+// Extracts the minimum metadata we need: platform, package/bundle id, +// and (for iOS) whether the build is simulator-compatible. + +import * as fs from 'node:fs'; +import AdmZip from 'adm-zip'; +import plist from 'simple-plist'; + +export interface AppMetadata { + platform: 'android' | 'ios'; + packageName: string; + simulatorCompatible: boolean; // iOS only — always true for Android + fileSize: number; +} + +export async function inspectApp(filePath: string): Promise<AppMetadata> { + if (!fs.existsSync(filePath)) { + throw new Error(`App file not found: ${filePath}`); + } + + const fileSize = fs.statSync(filePath).size; + const buffer = fs.readFileSync(filePath); + + // All supported formats (APK, IPA, .app.zip) are zip files + const isZip = + buffer.length >= 4 && + buffer[0] === 0x50 && + buffer[1] === 0x4b && + buffer[2] === 0x03 && + buffer[3] === 0x04; + + if (!isZip) { + throw new Error( + 'Not a valid APK or iOS app bundle — expected a zip file (.apk, .ipa, or .zip containing a .app directory).', + ); + } + + const zip = new AdmZip(filePath); + const entries = zip.getEntries(); + + // APKs have AndroidManifest.xml at the root + if (entries.some((e) => e.entryName === 'AndroidManifest.xml')) { + const packageName = await readApkPackage(filePath); + return { platform: 'android', packageName, simulatorCompatible: true, fileSize }; + } + + // iOS bundles have some.app/Info.plist (either at root or inside Payload/) + const plistEntry = entries.find((e) => + /(?:^|\/)[^/]+\.app\/Info\.plist$/.test(e.entryName), + ); + if (plistEntry) { + const { packageName, simulatorCompatible } = readIosInfo(plistEntry.getData()); + return { platform: 'ios', packageName, simulatorCompatible, fileSize }; + } + + throw new Error( + 'Not a valid APK or iOS app bundle — no AndroidManifest.xml or .app/Info.plist found inside the zip.', + ); +} + +async function readApkPackage(filePath: string): Promise<string> { + // Use ApkParser directly — AppInfoParser's auto-detect checks file extension + // and
we don't rely on that (multer temp files etc. have no extension). + const ApkParserMod = await import('app-info-parser/src/apk'); + const ApkParser = (ApkParserMod as unknown as { default: new (p: string) => { parse(): Promise> } }).default + ?? (ApkParserMod as unknown as new (p: string) => { parse(): Promise> }); + const result = await new ApkParser(filePath).parse(); + const packageName = (result['package'] as string) || ''; + if (!packageName) { + throw new Error('APK is missing a package name in AndroidManifest.xml.'); + } + return packageName; +} + +function readIosInfo(plistBuffer: Buffer): { packageName: string; simulatorCompatible: boolean } { + // simple-plist.parse() handles both XML and binary plists transparently. + const info = plist.parse(plistBuffer) as Record; + + const packageName = info['CFBundleIdentifier'] as string | undefined; + if (!packageName) { + throw new Error('iOS bundle Info.plist has no CFBundleIdentifier.'); + } + const platformName = info['DTPlatformName'] as string | undefined; + // iphoneos = device-only; anything else (iphonesimulator, missing, etc.) + // we treat as compatible and let simctl install fail later if it isn't. + const simulatorCompatible = platformName !== 'iphoneos'; + return { packageName, simulatorCompatible }; +} + +export function formatAppInfo(metadata: AppMetadata): string { + if (metadata.platform === 'android') { + return `Detected: Android APK\n Package: ${metadata.packageName}`; + } + const simNote = metadata.simulatorCompatible ? 
'compatible \u2713' : '\u26A0 device-only'; + return `Detected: iOS app bundle\n Bundle ID: ${metadata.packageName}\n Simulator: ${simNote}`; +} diff --git a/packages/cli/src/cloudRunner.ts b/packages/cli/src/cloudRunner.ts index b411669..fc93508 100644 --- a/packages/cli/src/cloudRunner.ts +++ b/packages/cli/src/cloudRunner.ts @@ -4,6 +4,7 @@ import * as path from 'node:path'; import AdmZip from 'adm-zip'; import { Logger } from '@finalrun/common'; import { runCheck } from './checkRunner.js'; +import { inspectApp, formatAppInfo, type AppMetadata } from './appInspector.js'; const FINALRUN_CLOUD_URL = process.env['FINALRUN_CLOUD_URL'] || 'https://cloud.finalrun.io'; const FINALRUN_API_KEY = process.env['FINALRUN_API_KEY'] || ''; @@ -61,10 +62,33 @@ export async function runCloud(options: CloudRunnerOptions): Promise { // 2. Resolve app — either from --app flag or latest upload let appMode: { type: 'file'; path: string } | { type: 'existing'; upload: AppUploadEntry }; + let inlineMetadata: AppMetadata | undefined; if (options.appPath) { if (!fs.existsSync(options.appPath)) { throw new Error(`App file not found: ${options.appPath}`); } + + // Inspect inline app and print info before submitting + try { + inlineMetadata = await inspectApp(options.appPath); + } catch (e) { + const msg = e instanceof Error ? 
e.message : String(e); + throw new Error(`\n\x1b[31mx ${msg}\x1b[0m\n`); + } + + if (inlineMetadata.platform === 'ios' && inlineMetadata.simulatorCompatible === false) { + throw new Error( + `\n\x1b[31mx This iOS app is a device-only build and cannot run on simulators.\x1b[0m\n` + + ` Rebuild with the iphonesimulator SDK:\n` + + ` • Flutter: flutter build ios --simulator --debug\n` + + ` • Xcode: xcodebuild -sdk iphonesimulator ...\n`, + ); + } + + console.log(''); + console.log(formatAppInfo(inlineMetadata)); + console.log(''); + appMode = { type: 'file', path: options.appPath }; } else { const latest = await fetchLatestAppUpload(); @@ -180,6 +204,13 @@ export async function runCloud(options: CloudRunnerOptions): Promise { formData.append('appFile', new Blob([appBuffer]), appFileName); formData.append('appFilename', appFileName); + // Include inspected metadata as a hint — server re-validates authoritatively. + // Don't append `platform` here: the --platform flag is already sent above, + // and multer would combine duplicates into an array breaking downstream .trim(). + if (inlineMetadata) { + formData.append('packageName', inlineMetadata.packageName); + } + const submissionLabel = options.suitePath ? `suite ${path.basename(options.suitePath)} (${checked.tests.length} test(s))` : `${checked.tests.length} test(s)`; @@ -262,6 +293,28 @@ export async function uploadApp(appPath: string): Promise { throw new Error(`App file not found: ${appPath}`); } + // 1. Inspect and print info before uploading + let metadata: AppMetadata; + try { + metadata = await inspectApp(appPath); + } catch (e) { + const msg = e instanceof Error ? 
e.message : String(e); + throw new Error(`\n\x1b[31mx ${msg}\x1b[0m\n`); + } + + if (metadata.platform === 'ios' && metadata.simulatorCompatible === false) { + throw new Error( + `\n\x1b[31mx This iOS app is a device-only build and cannot run on simulators.\x1b[0m\n` + + ` Rebuild with the iphonesimulator SDK:\n` + + ` • Flutter: flutter build ios --simulator --debug\n` + + ` • Xcode: xcodebuild -sdk iphonesimulator ...\n`, + ); + } + + console.log(''); + console.log(formatAppInfo(metadata)); + console.log(''); + const appBuffer = fs.readFileSync(appPath); const appFileName = path.basename(appPath); const appSize = appBuffer.byteLength; @@ -270,8 +323,11 @@ export async function uploadApp(appPath: string): Promise { const spinner = ora(`Uploading ${appFileName} (${formatBytes(appSize)})...`).start(); const uploadStart = Date.now(); + // 2. Build form data with metadata hints — server will re-validate const formData = new FormData(); formData.append('appFile', new Blob([appBuffer]), appFileName); + formData.append('platform', metadata.platform); + formData.append('packageName', metadata.packageName); let response: Response; try { diff --git a/packages/cli/src/types/app-info-parser.d.ts b/packages/cli/src/types/app-info-parser.d.ts new file mode 100644 index 0000000..80add2c --- /dev/null +++ b/packages/cli/src/types/app-info-parser.d.ts @@ -0,0 +1,20 @@ +declare module 'app-info-parser' { + export default class AppInfoParser { + constructor(file: string | Buffer); + parse(): Promise>; + } +} + +declare module 'app-info-parser/src/apk' { + export default class ApkParser { + constructor(file: string | Buffer); + parse(): Promise>; + } +} + +declare module 'app-info-parser/src/ipa' { + export default class IpaParser { + constructor(file: string | Buffer); + parse(): Promise>; + } +} From 2688ec55d9152a5a3d3369ed6c52fd14af914225 Mon Sep 17 00:00:00 2001 From: srinidhi-lwt Date: Fri, 17 Apr 2026 18:38:19 +0530 Subject: [PATCH 10/80] adding traces to langfuse --- 
packages/goal-executor/src/ActionExecutor.ts | 66 ++++++++--- packages/goal-executor/src/TestExecutor.ts | 19 ++++ packages/goal-executor/src/ai/AIAgent.ts | 103 ++++++++++++++++-- .../goal-executor/src/ai/VisualGrounder.ts | 19 +++- packages/goal-executor/src/index.ts | 1 + packages/goal-executor/src/trace.ts | 38 +++++++ 6 files changed, 217 insertions(+), 29 deletions(-) diff --git a/packages/goal-executor/src/ActionExecutor.ts b/packages/goal-executor/src/ActionExecutor.ts index 5c6dc2e..20ef73f 100644 --- a/packages/goal-executor/src/ActionExecutor.ts +++ b/packages/goal-executor/src/ActionExecutor.ts @@ -58,6 +58,7 @@ import { roundDuration, startTracePhase, type LLMTrace, + type LLMCallTrace, type SpanTiming, type TimingMetadata, type TraceStatus, @@ -89,6 +90,8 @@ export interface ActionOutput { error?: string; trace?: TimingMetadata; terminalFailure?: TerminalFailureSignal; + /** Raw LLM calls made during this action (grounder + visual grounder). Forwarded to observability. */ + llmCalls?: LLMCallTrace[]; } interface GroundToPointResult { @@ -125,6 +128,9 @@ export class ActionExecutor { private _platform: string; private _appIdentifier?: string; private _runtimeBindings?: RuntimeBindings; + // Accumulates LLM call traces during a single executeAction() invocation. + // Reset at entry; drained into ActionOutput.llmCalls at exit. + private _currentLlmCalls: LLMCallTrace[] = []; constructor(params: { agent: DeviceAgent; @@ -146,49 +152,65 @@ export class ActionExecutor { * Routes to the correct handler based on action type. 
*/ async executeAction(input: ActionInput): Promise<ActionOutput> { + // Reset per-action accumulator + this._currentLlmCalls = []; + let output: ActionOutput; try { switch (input.action) { case PLANNER_ACTION_TAP: - return await this._executeTap(input); + output = await this._executeTap(input); + break; case PLANNER_ACTION_LONG_PRESS: - return await this._executeLongPress(input); + output = await this._executeLongPress(input); + break; case PLANNER_ACTION_TYPE: - return await this._executeType(input); + output = await this._executeType(input); + break; case PLANNER_ACTION_SCROLL: - return await this._executeScroll(input); + output = await this._executeScroll(input); + break; case PLANNER_ACTION_BACK: - return await this._executeSimpleAction(input, new BackAction()); + output = await this._executeSimpleAction(input, new BackAction()); + break; case PLANNER_ACTION_HOME: - return await this._executeSimpleAction(input, new HomeAction()); + output = await this._executeSimpleAction(input, new HomeAction()); + break; case PLANNER_ACTION_ROTATE: - return await this._executeSingleDevicePhase(input, new RotateAction()); + output = await this._executeSingleDevicePhase(input, new RotateAction()); + break; case PLANNER_ACTION_HIDE_KEYBOARD: - return await this._executeSimpleAction(input, new HideKeyboardAction()); + output = await this._executeSimpleAction(input, new HideKeyboardAction()); + break; case PLANNER_ACTION_PRESS_ENTER: - return await this._executePressEnter(input); + output = await this._executePressEnter(input); + break; case PLANNER_ACTION_LAUNCH_APP: - return await this._executeLaunchApp(input); + output = await this._executeLaunchApp(input); + break; case PLANNER_ACTION_SET_LOCATION: - return await this._executeSetLocation(input); + output = await this._executeSetLocation(input); + break; case PLANNER_ACTION_WAIT: - return await this._executeWait(input); + output = await this._executeWait(input); + break; case PLANNER_ACTION_DEEPLINK: - return await
this._executeDeeplink(input); + output = await this._executeDeeplink(input); + break; default: - return { success: false, error: `Unknown action: ${input.action}` }; + output = { success: false, error: `Unknown action: ${input.action}` }; } } catch (error) { const terminalFailure = terminalFailureFromError(error); @@ -197,8 +219,15 @@ export class ActionExecutor { } else { Logger.e(`Action ${input.action} failed:`, error); } - return this._failure([], error); + output = this._failure([], error); } + + // Attach accumulated LLM calls, if any + if (this._currentLlmCalls.length > 0) { + output = { ...output, llmCalls: this._currentLlmCalls }; + } + this._currentLlmCalls = []; + return output; } private async _executeTap(input: ActionInput): Promise { @@ -828,6 +857,9 @@ export class ActionExecutor { platform: this._platform, traceStep: input.traceStep, }); + if (result.llmCall) { + this._currentLlmCalls.push(result.llmCall); + } } catch (error) { const message = this._redactRuntimeString( error instanceof Error ? error.message : String(error), @@ -950,6 +982,10 @@ export class ActionExecutor { tracePhase: request.tracePhase ?? 'action.ground', }); + if (response.llmCall) { + this._currentLlmCalls.push(response.llmCall); + } + return { ...response, trace: diff --git a/packages/goal-executor/src/TestExecutor.ts b/packages/goal-executor/src/TestExecutor.ts index 4e9a4f0..9706d54 100644 --- a/packages/goal-executor/src/TestExecutor.ts +++ b/packages/goal-executor/src/TestExecutor.ts @@ -27,6 +27,7 @@ import { type SpanTiming, type StepTrace, type TimingMetadata, + type LLMCallTrace, } from './trace.js'; // ============================================================================ @@ -77,6 +78,13 @@ export interface AgentActionResult { durationMs?: number; timing?: TimingMetadata; trace?: StepTrace; + /** + * Raw LLM call traces that happened during this step (planner + any + * grounder / visual grounder calls). 
Consumers can forward these to + * observability backends (e.g., Langfuse). Empty for steps with no + * LLM activity. + */ + llmCalls?: LLMCallTrace[]; } export interface TestRecordingResult { @@ -595,6 +603,16 @@ export class TestExecutor { ); } + // Aggregate LLM calls for this step: planner call + any grounder/visual-grounder + // calls made by ActionExecutor. Order: planner first, then action calls. + const stepLLMCalls: LLMCallTrace[] = []; + if (plannerResponse.llmCall) { + stepLLMCalls.push(plannerResponse.llmCall); + } + if (actionResult.llmCalls && actionResult.llmCalls.length > 0) { + stepLLMCalls.push(...actionResult.llmCalls); + } + const stepResult: AgentActionResult = { iteration, action, @@ -610,6 +628,7 @@ export class TestExecutor { screenHeight: postActionCapture.screenHeight ?? deviceState.screenHeight, timestamp: new Date().toISOString(), timing: actionResult.trace, + ...(stepLLMCalls.length > 0 ? { llmCalls: stepLLMCalls } : {}), }; if (!actionResult.success && actionResult.error) { diff --git a/packages/goal-executor/src/ai/AIAgent.ts b/packages/goal-executor/src/ai/AIAgent.ts index 0d90b61..98e4d28 100644 --- a/packages/goal-executor/src/ai/AIAgent.ts +++ b/packages/goal-executor/src/ai/AIAgent.ts @@ -53,6 +53,7 @@ import { roundDuration, startTracePhase, type LLMTrace, + type LLMCallTrace, } from '../trace.js'; import { classifyFatalProviderError, FatalProviderError } from './providerFailure.js'; @@ -94,6 +95,8 @@ export interface PlannerResponse { act?: string; }; trace?: LLMTrace; + /** Raw LLM call trace captured during planning — forwarded to observability. */ + llmCall?: LLMCallTrace; } export interface GrounderRequest { @@ -111,6 +114,8 @@ export interface GrounderResponse { output: Record; raw: string; // Raw LLM response for debugging trace?: LLMTrace; + /** Raw LLM call trace captured during grounding — forwarded to observability. 
*/ + llmCall?: LLMCallTrace; } type JsonRecord = Record; @@ -204,6 +209,7 @@ export class AIAgent { let parsedResponse: PlannerResponse | undefined; let llmMs = 0; let parseMs = 0; + let lastLLMCall: LLMCallTrace | undefined; for (let attempt = 1; attempt <= maxAttempts; attempt++) { const llmPhase = startTracePhase( @@ -216,9 +222,10 @@ export class AIAgent { let rawOutput: unknown; let rawText: string; try { - const llmResult = await this._callLLM(systemPrompt, userParts, 'planner'); + const llmResult = await this._callLLM(systemPrompt, userParts, 'planner', FEATURE_PLANNER); rawOutput = llmResult.output; rawText = llmResult.text; + lastLLMCall = llmResult.llmCall; } catch (error) { finishTracePhase( llmPhase, @@ -298,6 +305,7 @@ export class AIAgent { llmMs, parseMs, }, + ...(lastLLMCall ? { llmCall: lastLLMCall } : {}), }; } @@ -350,6 +358,7 @@ export class AIAgent { let parsed: GrounderResponse | undefined; let llmMs = 0; let parseMs = 0; + let lastLLMCall: LLMCallTrace | undefined; for (let attempt = 1; attempt <= maxAttempts; attempt++) { const phase = startTracePhase( @@ -362,9 +371,10 @@ export class AIAgent { let rawOutput: unknown; let rawText: string; try { - const llmResult = await this._callLLM(systemPrompt, userParts, 'grounder'); + const llmResult = await this._callLLM(systemPrompt, userParts, 'grounder', request.feature); rawOutput = llmResult.output; rawText = llmResult.text; + lastLLMCall = llmResult.llmCall; } catch (error) { finishTracePhase( phase, @@ -448,6 +458,7 @@ export class AIAgent { llmMs, parseMs, }, + ...(lastLLMCall ? 
{ llmCall: lastLLMCall } : {}), }; } @@ -462,7 +473,8 @@ export class AIAgent { systemPrompt: string, userParts: Array<{ type: 'text'; text: string } | { type: 'image'; image: string }>, phase: LLMPhase, - ): Promise<{ output: unknown; text: string }> { + feature?: string, + ): Promise<{ output: unknown; text: string; llmCall: LLMCallTrace }> { const model = this._getModel(); const providerOptions = this._getProviderOptions(phase); @@ -474,16 +486,26 @@ export class AIAgent { return { type: 'text' as const, text: part.text }; }); + // Persist the exact messages we send so we can forward them verbatim to + // observability backends (Langfuse stores these for debugging). + const messages = [ + { role: 'system' as const, content: systemPrompt }, + { role: 'user' as const, content: userContent }, + ]; + + const startedAt = new Date().toISOString(); + const startPerfMs = performance.now(); + let output: unknown; - let text: string; + let text = ''; let reasoningText: string | undefined; + // eslint-disable-next-line @typescript-eslint/no-explicit-any + let usage: any; + let thrownError: unknown; try { const result = await generateText({ model, - messages: [ - { role: 'system', content: systemPrompt }, - { role: 'user', content: userContent }, - ], + messages, output: Output.json(), maxOutputTokens: phase === 'planner' ? 8192 : 4096, providerOptions, @@ -491,12 +513,35 @@ export class AIAgent { output = result.output; text = result.text; reasoningText = result.reasoningText; + usage = result.usage; } catch (error) { + thrownError = error; + } + + const completedAt = new Date().toISOString(); + const durationMs = roundDuration(performance.now() - startPerfMs); + + const llmCall: LLMCallTrace = { + provider: this._provider, + model: this._modelName, + feature: feature ?? phase, + prompt: messages, + completion: text, + usage: normalizeUsage(usage), + startedAt, + completedAt, + durationMs, + ...(thrownError + ? { statusMessage: thrownError instanceof Error ? 
thrownError.message : String(thrownError) } + : {}), + }; + + if (thrownError) { throw ( - classifyFatalProviderError(error, { + classifyFatalProviderError(thrownError, { provider: this._provider, modelName: this._modelName, - }) ?? error + }) ?? thrownError ); } @@ -509,9 +554,10 @@ export class AIAgent { Logger.d( `LLM response [${phase}] (${this._provider}/${this._modelName}):\n${text || ''}`, ); - return { output, text }; + return { output, text, llmCall }; } + /** * Create the appropriate Vercel AI SDK model instance. */ @@ -826,3 +872,38 @@ function normalizeBoolean(value: unknown): boolean | undefined { function firstNonEmpty(...values: Array): string | undefined { return values.find((value) => typeof value === 'string' && value.trim().length > 0); } + +/** + * Convert the Vercel AI SDK's `LanguageModelUsage` (inputTokens/outputTokens + * with nested *TokenDetails) into the Langfuse canonical shape + * (input/output/total, optional input_cached_tokens only if > 0). + * Fields default to 0 when the provider omits them. + */ +// eslint-disable-next-line @typescript-eslint/no-explicit-any +function normalizeUsage(usage: any): { input: number; output: number; total: number; input_cached_tokens?: number } { + const input = + typeof usage?.inputTokens === 'number' + ? usage.inputTokens + : typeof usage?.promptTokens === 'number' + ? usage.promptTokens + : 0; + const output = + typeof usage?.outputTokens === 'number' + ? usage.outputTokens + : typeof usage?.completionTokens === 'number' + ? usage.completionTokens + : 0; + const total = + typeof usage?.totalTokens === 'number' + ? usage.totalTokens + : input + output; + + const cacheRead = + typeof usage?.inputTokenDetails?.cacheReadTokens === 'number' + ? usage.inputTokenDetails.cacheReadTokens + : undefined; + + return cacheRead !== undefined && cacheRead > 0 + ? 
{ input, output, total, input_cached_tokens: cacheRead } + : { input, output, total }; +} diff --git a/packages/goal-executor/src/ai/VisualGrounder.ts b/packages/goal-executor/src/ai/VisualGrounder.ts index 59a44dc..b005a4b 100644 --- a/packages/goal-executor/src/ai/VisualGrounder.ts +++ b/packages/goal-executor/src/ai/VisualGrounder.ts @@ -5,7 +5,7 @@ import { Logger } from '@finalrun/common'; import type { AIAgent } from './AIAgent.js'; import { FEATURE_VISUAL_GROUNDER } from '@finalrun/common'; -import type { LLMTrace } from '../trace.js'; +import type { LLMTrace, LLMCallTrace } from '../trace.js'; import { FatalProviderError } from './providerFailure.js'; export interface VisualGroundingResult { @@ -14,6 +14,8 @@ export interface VisualGroundingResult { y?: number; reason?: string; trace?: LLMTrace; + /** LLM call trace from the visual grounding attempt. */ + llmCall?: LLMCallTrace; } /** @@ -66,17 +68,28 @@ export class VisualGrounder { y: output['y'] as number, reason: output['reason'] as string, trace: response.trace, + ...(response.llmCall ? { llmCall: response.llmCall } : {}), }; } // Check for error if (output['isError']) { Logger.w(`Visual grounding failed: ${output['reason']}`); - return { success: false, reason: output['reason'] as string, trace: response.trace }; + return { + success: false, + reason: output['reason'] as string, + trace: response.trace, + ...(response.llmCall ? { llmCall: response.llmCall } : {}), + }; } Logger.w('Visual grounding returned unexpected format'); - return { success: false, reason: 'Unexpected response format', trace: response.trace }; + return { + success: false, + reason: 'Unexpected response format', + trace: response.trace, + ...(response.llmCall ? 
{ llmCall: response.llmCall } : {}), + }; } catch (error) { if (FatalProviderError.isInstance(error)) { throw error; diff --git a/packages/goal-executor/src/index.ts b/packages/goal-executor/src/index.ts index 2109054..4256fd3 100644 --- a/packages/goal-executor/src/index.ts +++ b/packages/goal-executor/src/index.ts @@ -29,4 +29,5 @@ export type { SpanTiming, TimingMetadata, LLMTrace, + LLMCallTrace, } from './trace.js'; diff --git a/packages/goal-executor/src/trace.ts b/packages/goal-executor/src/trace.ts index b7a31ae..6c1a78a 100644 --- a/packages/goal-executor/src/trace.ts +++ b/packages/goal-executor/src/trace.ts @@ -22,6 +22,44 @@ export interface LLMTrace { parseMs: number; } +/** + * Per-LLM-call observability data — prompt, response, tokens, timing. + * Populated in AIAgent._callLLM() and bubbled up to TestExecutor so + * consumers (cloud-server) can forward to observability backends + * (e.g., Langfuse) without agent itself depending on any SDK. + * + * Field names mirror Langfuse's canonical ingestion schema to make + * forwarding a straight pass-through on the consumer side. + */ +export interface LLMCallTrace { + /** AI provider: 'openai' | 'google' | 'anthropic'. */ + provider: string; + /** Full model name, e.g. 'gpt-4.1-mini', 'gemini-2.0-flash'. */ + model: string; + /** Logical feature the call served: 'planner', 'grounder', 'visual_grounder', etc. */ + feature: string; + /** Full prompt as the provider saw it — array of role/content messages (includes any base64 images inline). */ + prompt: unknown; + /** Raw model response text. */ + completion: string; + /** Normalized token counts (Langfuse canonical names — input/output/total). */ + usage: { + input: number; + output: number; + total: number; + /** Only present if the provider reported cache-read input tokens > 0. */ + input_cached_tokens?: number; + }; + /** ISO-8601 timestamp when the call started. */ + startedAt: string; + /** ISO-8601 timestamp when the call returned or errored. 
*/ + completedAt: string; + /** Wall-clock duration of the LLM call in ms. */ + durationMs: number; + /** Provider error message, if the call threw. */ + statusMessage?: string; +} + export interface ActiveTracePhase { phase: string; startedAt: number; From 04a6db59a6562238392b8da9875dd346a8d46c93 Mon Sep 17 00:00:00 2001 From: ashish Date: Sat, 18 Apr 2026 20:11:31 -0700 Subject: [PATCH 11/80] feat: per-feature model and reasoning effort via workspace YAML Lift the all-or-nothing model config into a per-feature block so the planner can run on a strong reasoning model while specialized grounders use cheaper ones. Unified reasoning scale (minimal|low|medium|high) maps to each provider's native knob (Google thinkingLevel, OpenAI reasoningEffort, Anthropic effort); minimal stays OpenAI-only. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/cli-reference.md | 2 +- docs/codebase-walkthrough.md | 2 + docs/configuration.md | 27 +++ docs/environment.md | 2 + docs/troubleshooting.md | 6 + packages/cli/bin/finalrun.ts | 27 ++- packages/cli/src/apiKey.test.ts | 53 ++++- packages/cli/src/apiKey.ts | 61 ++++++ packages/cli/src/env.test.ts | 32 ++- packages/cli/src/env.ts | 22 ++ packages/cli/src/goalRunner.test.ts | 40 ++-- packages/cli/src/sessionRunner.ts | 33 ++- packages/cli/src/testRunner.test.ts | 80 +++---- packages/cli/src/testRunner.ts | 16 +- packages/cli/src/workspace.test.ts | 83 +++++++- packages/cli/src/workspace.ts | 55 ++++- packages/common/src/constants.ts | 36 ++++ packages/goal-executor/src/ai/AIAgent.test.ts | 196 ++++++++++++++---- packages/goal-executor/src/ai/AIAgent.ts | 154 +++++++++++--- 19 files changed, 752 insertions(+), 175 deletions(-) diff --git a/docs/cli-reference.md b/docs/cli-reference.md index b42355e..5669c47 100644 --- a/docs/cli-reference.md +++ b/docs/cli-reference.md @@ -26,7 +26,7 @@ Flags for `test` and `suite`: | `--model ` | AI model (e.g. `google/gemini-3-flash-preview`). Falls back to `.finalrun/config.yaml`. 
| | `--env ` | Environment name (matches `.finalrun/env/.yaml`). Falls back to config. | | `--app ` | Path to `.apk` or `.app` binary. Overrides the app identity in config. See [configuration.md](configuration.md) for details. | -| `--api-key ` | Override the provider API key. | +| `--api-key ` | Override the provider API key. Only valid when a single provider is in use across all features; use env vars when features target multiple providers. | | `--debug` | Enable debug logging. | | `--max-iterations ` | Limit AI action iterations per step. | diff --git a/docs/codebase-walkthrough.md b/docs/codebase-walkthrough.md index b849990..009772d 100644 --- a/docs/codebase-walkthrough.md +++ b/docs/codebase-walkthrough.md @@ -500,6 +500,8 @@ Standard Grounder (hierarchy-based) **Why Vercel AI SDK?** It provides a unified interface across providers, so the goal executor doesn't need provider-specific code for each LLM. +Model and reasoning effort are configurable per feature (planner, grounder, and the specialized grounders) via the `features:` block in `.finalrun/config.yaml`. See [configuration.md](configuration.md) for the YAML shape. + --- ## 8. The Device Layer (Physical Actions) diff --git a/docs/configuration.md b/docs/configuration.md index 0128f28..e136032 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -31,9 +31,14 @@ The workspace config defines defaults used by the CLI when flags are omitted. | `app.bundleId` | iOS bundle identifier (e.g. `com.example.myapp`) | | `env` | Default environment name (used when `--env` is omitted) | | `model` | Default AI model in `provider/model` format (used when `--model` is omitted) | +| `reasoning` | Default reasoning effort for all features: `minimal`, `low`, `medium`, or `high`. `minimal` is OpenAI-only. | +| `features..model` | Per-feature model override in `provider/model` format. | +| `features..reasoning` | Per-feature reasoning effort override. 
| At least one of `app.packageName` or `app.bundleId` is required. +Valid feature names: `planner`, `grounder`, `visual-grounder`, `scroll-index-grounder`, `input-focus-grounder`, `launch-app-grounder`, `set-location-grounder`. + ### Example ```yaml @@ -43,8 +48,30 @@ app: bundleId: com.example.myapp env: dev model: google/gemini-3-flash-preview +reasoning: medium + +# Optional — unlisted features inherit the default model and reasoning. +features: + planner: + model: anthropic/claude-opus-4-7 + reasoning: high + scroll-index-grounder: + reasoning: low ``` +### Per-Feature Overrides + +The `features:` block lets you tune each LLM call independently. Each feature drives a distinct prompt: + +- `planner` — decides the next user action from the current screen. +- `grounder` — picks the UI element for an action. +- `visual-grounder` — visual fallback when text grounding fails. +- `scroll-index-grounder`, `input-focus-grounder`, `launch-app-grounder`, `set-location-grounder` — specialized grounders for their respective actions. + +Both `model` and `reasoning` are optional per feature. Any unset field falls back to the workspace-level default (`model:` / `reasoning:`), and any unlisted feature inherits both defaults. + +If features target **different providers** (e.g. planner on Anthropic, grounder on Google), you must set each provider's env var (`OPENAI_API_KEY`, `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY`) — see [environment.md](environment.md). The `--api-key` CLI flag only works when a single provider is active across all features. + ## App Identity FinalRun needs to know which app to launch on the device. The app identity is resolved in this order: diff --git a/docs/environment.md b/docs/environment.md index 1d7eabb..24daf8c 100644 --- a/docs/environment.md +++ b/docs/environment.md @@ -63,6 +63,8 @@ FinalRun resolves API keys by provider prefix: Keys are read from `process.env` and from workspace-root `.env` / `.env.`. You can also pass `--api-key` to override. 
+If `.finalrun/config.yaml` uses different providers across features (via the `features:` block in [configuration.md](configuration.md)), set the env var for each provider you reference. `--api-key` is only accepted when a single provider is in play. + ## Git: Keep Secrets Out of the Repo **Do not commit** `.env` files. Add the following to your app repository's `.gitignore`: diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 9d7b311..a38f810 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -6,6 +6,12 @@ FinalRun looks for `.finalrun/` by walking up from your current directory. Make **`Error: API key not configured`** Set the matching environment variable for your model provider. For `google/...`, set `GOOGLE_API_KEY` in your `.env` or shell. See [environment.md](environment.md#ai-provider-api-keys). +**`Error: --api-key is only valid when a single provider is active`** +Your `.finalrun/config.yaml` targets multiple providers via the `features:` block (see [configuration.md](configuration.md)). Drop `--api-key` and set each provider's env var instead: `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY`. + +**`Error: Reasoning level "minimal" is only supported for OpenAI`** +The `minimal` reasoning level only exists on OpenAI. Change the workspace or feature override to `low`, `medium`, or `high` — or route that feature to an OpenAI model. + **`Error: No Android emulator running`** Start an emulator with `emulator -avd ` or launch one from Android Studio. Run `finalrun doctor --platform android` to verify. 
diff --git a/packages/cli/bin/finalrun.ts b/packages/cli/bin/finalrun.ts index 7bd55c8..5ceeedd 100644 --- a/packages/cli/bin/finalrun.ts +++ b/packages/cli/bin/finalrun.ts @@ -5,7 +5,7 @@ import { Command } from 'commander'; import { Logger, LogLevel, type TestResult } from '@finalrun/common'; import { formatResolvedAppSummary } from '../src/appConfig.js'; import { CliEnv, MODEL_FORMAT_EXAMPLE, parseModel } from '../src/env.js'; -import { resolveApiKey } from '../src/apiKey.js'; +import { resolveApiKeys } from '../src/apiKey.js'; import { runCheck, SUITE_SELECTOR_CONFLICT_ERROR } from '../src/checkRunner.js'; import { runDoctorCommand } from '../src/doctorRunner.js'; import { @@ -297,6 +297,17 @@ async function runTestCommand(params: { const workspace = await resolveWorkspace(); const workspaceConfig = await loadWorkspaceConfig(workspace.finalrunDir); const model = parseModel(params.options.model ?? workspaceConfig.model); + const features = workspaceConfig.features; + const reasoning = workspaceConfig.reasoning; + + const requiredProviders = new Set([model.provider]); + if (features) { + for (const override of Object.values(features)) { + if (override?.model) { + requiredProviders.add(parseModel(override.model).provider); + } + } + } const debug = params.options.debug === true; Logger.init({ level: debug ? 
LogLevel.DEBUG : LogLevel.INFO, resetSinks: true }); @@ -312,9 +323,9 @@ async function runTestCommand(params: { : resolvedEnvironment.envName, { cwd: workspace.rootDir }, ); - const apiKey = resolveApiKey({ + const apiKeys = resolveApiKeys({ env: runtimeEnv, - provider: model.provider, + providers: requiredProviders, providedApiKey: params.options.apiKey, }); @@ -326,9 +337,13 @@ async function runTestCommand(params: { suitePath: normalizedSuitePath, platform: params.options.platform, appPath: params.options.app, - apiKey, - provider: model.provider, - modelName: model.modelName, + apiKeys, + defaults: { + provider: model.provider, + modelName: model.modelName, + reasoning, + }, + features, maxIterations: parseInt(params.options.maxIterations, 10) || 110, debug, invokedCommand: params.invokedCommand, diff --git a/packages/cli/src/apiKey.test.ts b/packages/cli/src/apiKey.test.ts index 3bcadc6..b1cbf1c 100644 --- a/packages/cli/src/apiKey.test.ts +++ b/packages/cli/src/apiKey.test.ts @@ -5,7 +5,7 @@ import os from 'node:os'; import path from 'node:path'; import test from 'node:test'; import { CliEnv } from './env.js'; -import { resolveApiKey } from './apiKey.js'; +import { resolveApiKey, resolveApiKeys } from './apiKey.js'; function createEnv(values: Record) { return { @@ -114,3 +114,54 @@ test('resolveApiKey reports the provider-matched env var in its error message', /Provide via --api-key or GOOGLE_API_KEY/, ); }); + +test('resolveApiKeys returns a per-provider map from env vars', () => { + const env = createEnv({ + OPENAI_API_KEY: 'openai-key', + GOOGLE_API_KEY: 'google-key', + ANTHROPIC_API_KEY: 'anthropic-key', + }); + + const keys = resolveApiKeys({ + env, + providers: new Set(['openai', 'google']), + }); + + assert.deepEqual(keys, { + openai: 'openai-key', + google: 'google-key', + }); +}); + +test('resolveApiKeys routes --api-key to the single active provider', () => { + const keys = resolveApiKeys({ + env: createEnv({}), + providers: ['openai'], + 
providedApiKey: 'flag-key', + }); + + assert.deepEqual(keys, { openai: 'flag-key' }); +}); + +test('resolveApiKeys rejects --api-key when multiple providers are configured', () => { + assert.throws( + () => + resolveApiKeys({ + env: createEnv({}), + providers: ['openai', 'anthropic'], + providedApiKey: 'flag-key', + }), + /--api-key is only valid when a single provider is active/, + ); +}); + +test('resolveApiKeys aggregates missing provider errors into one message', () => { + assert.throws( + () => + resolveApiKeys({ + env: createEnv({ OPENAI_API_KEY: 'openai-key' }), + providers: ['openai', 'google', 'anthropic'], + }), + /google \(GOOGLE_API_KEY\), anthropic \(ANTHROPIC_API_KEY\)/, + ); +}); diff --git a/packages/cli/src/apiKey.ts b/packages/cli/src/apiKey.ts index 16b2749..1b1867d 100644 --- a/packages/cli/src/apiKey.ts +++ b/packages/cli/src/apiKey.ts @@ -20,6 +20,54 @@ export function resolveApiKey(params: { return apiKey; } +/** + * Resolve API keys for every provider referenced by the current run. + * + * --api-key is accepted only when a single provider is in play; mixing + * providers across features requires env vars per provider (documented in + * docs/environment.md) so we can't silently pair one key with multiple + * providers. + */ +export function resolveApiKeys(params: { + env: Pick; + providers: Iterable; + providedApiKey?: string; +}): Record { + const providers = Array.from(new Set(params.providers)); + if (providers.length === 0) { + throw new Error('At least one provider must be specified when resolving API keys.'); + } + + if (params.providedApiKey !== undefined) { + if (providers.length > 1) { + throw new Error( + `--api-key is only valid when a single provider is active. This run uses multiple providers (${providers.join(', ')}). Provide the per-provider env vars instead: ${providers + .map((p) => PROVIDER_ENV_VARS[p as keyof typeof PROVIDER_ENV_VARS] ?? 
`<${p}>`) + .join(', ')}.`, + ); + } + return { [providers[0]!]: params.providedApiKey }; + } + + const resolved: Record = {}; + const missing: Array<{ provider: string; envVar?: string }> = []; + for (const provider of providers) { + const providerEnvVar = PROVIDER_ENV_VARS[provider as keyof typeof PROVIDER_ENV_VARS]; + const apiKey = providerEnvVar ? params.env.get(providerEnvVar) : undefined; + if (!apiKey) { + missing.push({ provider, envVar: providerEnvVar }); + continue; + } + resolved[provider] = apiKey; + } + + if (missing.length > 0) { + throw new Error(buildMissingApiKeysError(missing)); + } + + return resolved; +} + function buildMissingApiKeyError( provider: string, providerEnvVar?: string, @@ -30,3 +78,16 @@ function buildMissingApiKeyError( return `API key is required for provider "${provider}". Provide via --api-key.`; } + +function buildMissingApiKeysError( + missing: Array<{ provider: string; envVar?: string }>, +): string { + if (missing.length === 1) { + const entry = missing[0]!; + return buildMissingApiKeyError(entry.provider, entry.envVar); + } + const detail = missing + .map(({ provider, envVar }) => (envVar ? `${provider} (${envVar})` : provider)) + .join(', '); + return `API keys are required for multiple providers. Set the following env vars: ${detail}.`; +} diff --git a/packages/cli/src/env.test.ts b/packages/cli/src/env.test.ts index b11c8bd..dfe21b6 100644 --- a/packages/cli/src/env.test.ts +++ b/packages/cli/src/env.test.ts @@ -1,6 +1,6 @@ import assert from 'node:assert/strict'; import test from 'node:test'; -import { parseModel } from './env.js'; +import { parseModel, parseReasoningLevel } from './env.js'; test('parseModel requires an explicit model value', () => { assert.throws( @@ -43,3 +43,33 @@ test('parseModel rejects unsupported providers', () => { /Unsupported AI provider: "bedrock"\. 
Supported providers: openai, google, anthropic\./, ); }); + +test('parseReasoningLevel returns undefined when unset', () => { + assert.equal(parseReasoningLevel(undefined, 'reasoning'), undefined); + assert.equal(parseReasoningLevel(null, 'reasoning'), undefined); + assert.equal(parseReasoningLevel('', 'reasoning'), undefined); +}); + +test('parseReasoningLevel accepts minimal, low, medium, high', () => { + for (const value of ['minimal', 'low', 'medium', 'high']) { + assert.equal(parseReasoningLevel(value, 'reasoning'), value); + } +}); + +test('parseReasoningLevel trims surrounding whitespace', () => { + assert.equal(parseReasoningLevel(' high ', 'reasoning'), 'high'); +}); + +test('parseReasoningLevel rejects non-string values with a labeled error', () => { + assert.throws( + () => parseReasoningLevel(42, 'config.yaml reasoning'), + /config\.yaml reasoning must be a string\. Allowed values: minimal, low, medium, high\./, + ); +}); + +test('parseReasoningLevel rejects unknown values with a labeled error', () => { + assert.throws( + () => parseReasoningLevel('extreme', 'config.yaml reasoning'), + /config\.yaml reasoning has invalid value "extreme"\. Allowed values: minimal, low, medium, high\./, + ); +}); diff --git a/packages/cli/src/env.ts b/packages/cli/src/env.ts index ee38ef9..a5affe7 100644 --- a/packages/cli/src/env.ts +++ b/packages/cli/src/env.ts @@ -4,6 +4,7 @@ import * as dotenv from 'dotenv'; import * as path from 'path'; import * as fs from 'fs'; +import { REASONING_LEVELS, type ReasoningLevel } from '@finalrun/common'; /** * Environment configuration for the CLI. 
@@ -128,3 +129,24 @@ export function parseModel(modelStr: string | undefined): ParsedModel { modelName, }; } + +export const REASONING_LEVELS_LABEL = REASONING_LEVELS.join(', '); + +export function parseReasoningLevel(value: unknown, label: string): ReasoningLevel | undefined { + if (value === undefined || value === null) { + return undefined; + } + if (typeof value !== 'string') { + throw new Error(`${label} must be a string. Allowed values: ${REASONING_LEVELS_LABEL}.`); + } + const trimmed = value.trim(); + if (trimmed === '') { + return undefined; + } + if (!REASONING_LEVELS.includes(trimmed as ReasoningLevel)) { + throw new Error( + `${label} has invalid value "${trimmed}". Allowed values: ${REASONING_LEVELS_LABEL}.`, + ); + } + return trimmed as ReasoningLevel; +} diff --git a/packages/cli/src/goalRunner.test.ts b/packages/cli/src/goalRunner.test.ts index 5f915ab..a2f9029 100644 --- a/packages/cli/src/goalRunner.test.ts +++ b/packages/cli/src/goalRunner.test.ts @@ -284,9 +284,8 @@ test('runGoal starts and stops Android recording when recording is configured', const result = await runGoal( { goal: 'Log in', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-4.1', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-4.1' }, platform: PLATFORM_ANDROID, recording: { runId: 'run-1', @@ -340,9 +339,8 @@ test('executeTestOnSession forwards explicit recording output paths and preserve session, { goal: 'Test 1', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-4.1', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-4.1' }, recording: { runId: 'run-1', testId: 'case-1', @@ -662,9 +660,8 @@ test('executeTestOnSession reuses one prepared session while keeping recording s session, { goal: 'Test 1', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-4.1', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-4.1' }, recording: { runId: 
'run-1', testId: 'case-1', @@ -676,9 +673,8 @@ test('executeTestOnSession reuses one prepared session while keeping recording s session, { goal: 'Test 2', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-4.1', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-4.1' }, recording: { runId: 'run-1', testId: 'case-2', @@ -744,9 +740,8 @@ test('executeTestOnSession forwards the prelaunch summary and app identifier to session, { goal: 'Test 1', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-4.1', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-4.1' }, }, dependencies, ); @@ -776,9 +771,8 @@ test('runGoal still performs isolated setup and cleanup for single-test executio const result = await runGoal( { goal: 'Log in', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-4.1', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-4.1' }, platform: PLATFORM_ANDROID, }, dependencies, @@ -808,9 +802,8 @@ test('runGoal fails before execution if required Android recording cannot start' const result = await runGoal( { goal: 'Log in', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-4.1', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-4.1' }, platform: PLATFORM_ANDROID, recording: { runId: 'run-1', @@ -850,9 +843,8 @@ test('runGoal marks the Android test as failed if recording stops without a vide const result = await runGoal( { goal: 'Log in', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-4.1', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-4.1' }, platform: PLATFORM_ANDROID, recording: { runId: 'run-1', diff --git a/packages/cli/src/sessionRunner.ts b/packages/cli/src/sessionRunner.ts index ccccb04..fcf6e2e 100644 --- a/packages/cli/src/sessionRunner.ts +++ b/packages/cli/src/sessionRunner.ts @@ -11,6 +11,8 @@ import { RecordingRequest, 
type DeviceInventoryDiagnostic, type DeviceInventoryEntry, + type FeatureOverrides, + type ModelDefaults, type RuntimeBindings, } from '@finalrun/common'; import { DeviceNode } from '@finalrun/device-node'; @@ -56,9 +58,9 @@ type GoalRunnerExecutor = Pick< export interface TestSessionConfig { goal: string; - apiKey: string; - provider: string; // 'openai' | 'google' | 'anthropic' - modelName: string; // e.g., 'gpt-5.4-mini', 'gemini-2.0-flash' + apiKeys: Record; + defaults: ModelDefaults; + features?: FeatureOverrides; maxIterations?: number; debug?: boolean; platform?: string; @@ -307,9 +309,9 @@ export async function executeTestOnSession( try { const aiAgent = dependencies.createAiAgent({ - provider: config.provider, - modelName: config.modelName, - apiKey: config.apiKey, + apiKeys: config.apiKeys, + defaults: config.defaults, + features: config.features, }); const executor = dependencies.createExecutor({ @@ -342,7 +344,7 @@ export async function executeTestOnSession( new RecordingRequest({ runId: config.recording.runId, testId: config.recording.testId, - apiKey: config.apiKey, + apiKey: config.apiKeys[config.defaults.provider] ?? '', outputFilePath: config.recording.outputFilePath, }), ); @@ -690,7 +692,22 @@ function printRunBanner(config: TestSessionConfig): void { console.log('\n\x1b[1mFinalRun CLI\x1b[0m'); console.log('─'.repeat(50)); console.log(`Goal: ${config.goal}`); - console.log(`Model: ${config.provider}/${config.modelName}`); + const defaultReasoning = config.defaults.reasoning ? 
` (${config.defaults.reasoning})` : ''; + console.log(`Model: ${config.defaults.provider}/${config.defaults.modelName}${defaultReasoning}`); + if (config.features) { + const overrides = Object.entries(config.features) + .filter(([, override]) => override && (override.model || override.reasoning)) + .map(([feature, override]) => { + const parts: string[] = []; + if (override!.model) parts.push(override!.model); + if (override!.reasoning) parts.push(override!.reasoning); + return ` ${feature}: ${parts.join(' ')}`; + }); + if (overrides.length > 0) { + console.log('Feature overrides:'); + for (const line of overrides) console.log(line); + } + } console.log('─'.repeat(50) + '\n'); } diff --git a/packages/cli/src/testRunner.test.ts b/packages/cli/src/testRunner.test.ts index 279b8de..795797d 100644 --- a/packages/cli/src/testRunner.test.ts +++ b/packages/cli/src/testRunner.test.ts @@ -631,9 +631,8 @@ test('runTests finalizes top-level artifacts when shared-session execution throw envName: 'dev', cwd: rootDir, selectors: ['login.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }); assert.equal(result.success, false); @@ -719,9 +718,8 @@ test('runTests succeeds without env config when the repo is env-free', async () const result = await runTests({ cwd: rootDir, selectors: ['smoke.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }); assert.equal(result.success, true); @@ -779,9 +777,8 @@ test('runTests records the suite subcommand in run metadata when invoked via fin const result = await runTests({ cwd: rootDir, suitePath: 'smoke.yaml', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, 
invokedCommand: 'suite', }); @@ -846,9 +843,8 @@ test('runTests prepares one shared session for multiple tests and cleans it up o envName: 'dev', cwd: rootDir, selectors: ['login.yaml', 'search.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }); assert.equal(result.success, true); @@ -908,9 +904,8 @@ test('runTests uses mov artifact recording output paths for iOS tests', async () cwd: rootDir, selectors: ['login.yaml'], platform: 'ios', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }); assert.equal(result.success, true); @@ -977,9 +972,8 @@ test('runTests stops the batch after a shared-session failure and cleans up once envName: 'dev', cwd: rootDir, selectors: ['first.yaml', 'second.yaml', 'third.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }); assert.equal(result.success, false); @@ -1060,9 +1054,8 @@ test('runTests stops remaining tests after a terminal AI provider failure', asyn envName: 'dev', cwd: rootDir, selectors: ['first.yaml', 'second.yaml', 'third.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }); assert.equal(result.success, false); @@ -1176,9 +1169,8 @@ test('runTests aborts the batch after SIGINT and marks the active run as aborted envName: 'dev', cwd: rootDir, selectors: ['first.yaml', 'second.yaml', 'third.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }); assert.equal(result.success, false); @@ -1268,9 
+1260,8 @@ test('runTests requests a forced exit after a second SIGINT', async () => { envName: 'dev', cwd: rootDir, selectors: ['first.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }); assert.equal(forcedExitCode, 130); @@ -1310,9 +1301,8 @@ test('runTests requires base app config even when the env file contains an app o envName: 'dev', cwd: rootDir, selectors: ['login.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }), (error: unknown) => { assert.ok(error instanceof PreExecutionFailureError); @@ -1350,9 +1340,8 @@ test('runTests rejects validation failures before creating run artifacts', async runTests({ envName: 'dev', cwd: rootDir, - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }), (error: unknown) => { assert.ok(error instanceof PreExecutionFailureError); @@ -1409,9 +1398,8 @@ test('runTests surfaces device setup diagnostics before execution without creati envName: 'dev', cwd: rootDir, selectors: ['login.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }), (error: unknown) => { assert.ok(error instanceof PreExecutionFailureError); @@ -1472,9 +1460,8 @@ test('runTests fails before prepareGoalSession when Android host preflight is bl cwd: rootDir, selectors: ['login.yaml'], platform: 'android', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }), (error: unknown) => { assert.ok(error instanceof PreExecutionFailureError); @@ -1535,9 
+1522,8 @@ test('runTests fails before prepareGoalSession when iOS host preflight is blocke cwd: rootDir, selectors: ['login.yaml'], platform: 'ios', - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }), (error: unknown) => { assert.ok(error instanceof PreExecutionFailureError); @@ -1609,9 +1595,8 @@ test('runTests continues when one platform is healthy and the other is blocked', envName: 'dev', cwd: rootDir, selectors: ['login.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }); assert.equal(result.success, true); @@ -1674,9 +1659,8 @@ test('runTests requires --platform when both Android and iOS apps are configured envName: 'dev', cwd: rootDir, selectors: ['login.yaml'], - apiKey: 'test-key', - provider: 'openai', - modelName: 'gpt-5.4-mini', + apiKeys: { openai: 'test-key' }, + defaults: { provider: 'openai', modelName: 'gpt-5.4-mini' }, }), (error: unknown) => { assert.ok(error instanceof PreExecutionFailureError); diff --git a/packages/cli/src/testRunner.ts b/packages/cli/src/testRunner.ts index 1e79d6d..21b5eb9 100644 --- a/packages/cli/src/testRunner.ts +++ b/packages/cli/src/testRunner.ts @@ -4,7 +4,9 @@ import { LogLevel, type DeviceInfo, type DeviceInventoryDiagnostic, + type FeatureOverrides, type LogEntry, + type ModelDefaults, type RunTarget, type RuntimeBindings, type TestResult, @@ -40,9 +42,9 @@ import { } from './workspace.js'; export interface TestRunnerOptions extends CheckRunnerOptions { - apiKey: string; - provider: string; - modelName: string; + apiKeys: Record; + defaults: ModelDefaults; + features?: FeatureOverrides; maxIterations?: number; debug?: boolean; invokedCommand?: 'test' | 'suite'; @@ -261,7 +263,7 @@ export async function runTests(options: TestRunnerOptions): Promise { try { await 
assert.rejects( () => runCheck({ cwd: rootDir }), - /config\.yaml contains unsupported key "region"\. Supported keys: env, model, app\./, + /config\.yaml contains unsupported key "region"\. Supported keys: env, model, reasoning, features, app\./, ); } finally { await fsp.rm(rootDir, { recursive: true, force: true }); @@ -633,6 +633,87 @@ test('runCheck rejects empty env values in .finalrun/config.yaml', async () => { } }); +test('runCheck rejects invalid reasoning level in .finalrun/config.yaml', async () => { + const rootDir = createTempWorkspace({ + configYaml: 'reasoning: extreme\n', + }); + + try { + await assert.rejects( + () => runCheck({ cwd: rootDir }), + /config\.yaml reasoning has invalid value "extreme"\. Allowed values: minimal, low, medium, high\./, + ); + } finally { + await fsp.rm(rootDir, { recursive: true, force: true }); + } +}); + +test('runCheck rejects unknown feature names in .finalrun/config.yaml', async () => { + const rootDir = createTempWorkspace({ + configYaml: ['features:', ' plannerX:', ' reasoning: high'].join('\n'), + }); + + try { + await assert.rejects( + () => runCheck({ cwd: rootDir }), + /features contains unsupported key "plannerX"\. Supported keys: planner, grounder, visual-grounder, scroll-index-grounder, input-focus-grounder, launch-app-grounder, set-location-grounder\./, + ); + } finally { + await fsp.rm(rootDir, { recursive: true, force: true }); + } +}); + +test('runCheck rejects unknown inner keys in a features override', async () => { + const rootDir = createTempWorkspace({ + configYaml: ['features:', ' planner:', ' temperature: 0.2'].join('\n'), + }); + + try { + await assert.rejects( + () => runCheck({ cwd: rootDir }), + /features\.planner contains unsupported key "temperature"\. 
Supported keys: model, reasoning\./, + ); + } finally { + await fsp.rm(rootDir, { recursive: true, force: true }); + } +}); + +test('runCheck rejects invalid reasoning in a features override', async () => { + const rootDir = createTempWorkspace({ + configYaml: ['features:', ' planner:', ' reasoning: extreme'].join('\n'), + }); + + try { + await assert.rejects( + () => runCheck({ cwd: rootDir }), + /features\.planner\.reasoning has invalid value "extreme"\./, + ); + } finally { + await fsp.rm(rootDir, { recursive: true, force: true }); + } +}); + +test('runCheck accepts a valid features block and preserves unset features', async () => { + const rootDir = createTempWorkspace({ + configYaml: [ + 'model: openai/gpt-5.4-mini', + 'reasoning: medium', + 'features:', + ' planner:', + ' model: anthropic/claude-opus-4-7', + ' reasoning: high', + ' scroll-index-grounder:', + ' reasoning: low', + ].join('\n'), + }); + + try { + await assert.doesNotReject(() => runCheck({ cwd: rootDir })); + } finally { + await fsp.rm(rootDir, { recursive: true, force: true }); + } +}); + test('runCheck accepts empty-string secret environment values when the variable is present', async () => { const secretEnvVar = 'FINALRUN_EMPTY_SECRET'; const previousSecret = process.env[secretEnvVar]; diff --git a/packages/cli/src/workspace.ts b/packages/cli/src/workspace.ts index 534427e..f98cf8d 100644 --- a/packages/cli/src/workspace.ts +++ b/packages/cli/src/workspace.ts @@ -3,12 +3,18 @@ import { spawnSync } from 'node:child_process'; import * as fs from 'node:fs/promises'; import * as path from 'node:path'; import { + ALL_FEATURES, PLATFORM_ANDROID, PLATFORM_IOS, type AppConfig, + type FeatureName, + type FeatureOverride, + type FeatureOverrides, + type ReasoningLevel, } from '@finalrun/common'; import YAML from 'yaml'; import { readAppConfig } from './appConfig.js'; +import { parseReasoningLevel } from './env.js'; import { resolveFinalRunRootDir } from './runtimePaths.js'; import { 
promptForWorkspaceSelection, type WorkspaceSelectionIO } from './workspacePicker.js'; @@ -30,6 +36,8 @@ export interface AppOverrideValidationResult { export interface WorkspaceConfig { env?: string; model?: string; + reasoning?: ReasoningLevel; + features?: FeatureOverrides; app?: AppConfig; } @@ -57,7 +65,15 @@ export interface RegisteredWorkspaceEntry { metadataPath: string; } -const WORKSPACE_CONFIG_TOP_LEVEL_KEYS = new Set(['env', 'model', 'app']); +const WORKSPACE_CONFIG_TOP_LEVEL_KEYS = new Set([ + 'env', + 'model', + 'reasoning', + 'features', + 'app', +]); +const FEATURE_OVERRIDE_KEYS = new Set(['model', 'reasoning']); +const ALL_FEATURES_SET = new Set(ALL_FEATURES); const WORKSPACE_HASH_LENGTH = 16; export async function resolveWorkspace( @@ -420,10 +436,47 @@ export async function loadWorkspaceConfig(finalrunDir: string): Promise 0 ? overrides : undefined; +} + export async function resolveConfiguredEnvironmentFile( workspace: FinalRunWorkspace, requestedEnvName?: string, diff --git a/packages/common/src/constants.ts b/packages/common/src/constants.ts index a5f0d9b..0c53e87 100644 --- a/packages/common/src/constants.ts +++ b/packages/common/src/constants.ts @@ -52,6 +52,42 @@ export const FEATURE_INPUT_FOCUS_GROUNDER = 'input-focus-grounder'; export const FEATURE_LAUNCH_APP_GROUNDER = 'launch-app-grounder'; export const FEATURE_SET_LOCATION_GROUNDER = 'set-location-grounder'; +export const ALL_FEATURES = [ + FEATURE_PLANNER, + FEATURE_GROUNDER, + FEATURE_VISUAL_GROUNDER, + FEATURE_SCROLL_INDEX_GROUNDER, + FEATURE_INPUT_FOCUS_GROUNDER, + FEATURE_LAUNCH_APP_GROUNDER, + FEATURE_SET_LOCATION_GROUNDER, +] as const; +export type FeatureName = (typeof ALL_FEATURES)[number]; + +// ============================================================================ +// Reasoning effort — unified level mapped per-provider inside AIAgent. +// 'minimal' is OpenAI-only; Google/Anthropic reject it at call time. 
+// ============================================================================ +export const REASONING_LEVELS = ['minimal', 'low', 'medium', 'high'] as const; +export type ReasoningLevel = (typeof REASONING_LEVELS)[number]; + +/** + * Per-feature override resolved from `features:` in .finalrun/config.yaml. + * Each field is optional; unset fields inherit workspace-level defaults. + * `model` is a "provider/modelName" string (validated via parseModel at use site). + */ +export interface FeatureOverride { + model?: string; + reasoning?: ReasoningLevel; +} + +export type FeatureOverrides = Partial<Record<FeatureName, FeatureOverride>>; + +export interface ModelDefaults { + provider: string; + modelName: string; + reasoning?: ReasoningLevel; +} + // ============================================================================ // Defaults // ============================================================================ diff --git a/packages/goal-executor/src/ai/AIAgent.test.ts b/packages/goal-executor/src/ai/AIAgent.test.ts index fcd476c..051f9c0 100644 --- a/packages/goal-executor/src/ai/AIAgent.test.ts +++ b/packages/goal-executor/src/ai/AIAgent.test.ts @@ -2,21 +2,44 @@ import assert from 'node:assert/strict'; import test from 'node:test'; import { FEATURE_GROUNDER, + FEATURE_PLANNER, + FEATURE_SCROLL_INDEX_GROUNDER, PLANNER_ACTION_ROTATE, PLANNER_ACTION_TAP, + type FeatureName, + type FeatureOverrides, + type ModelDefaults, } from '@finalrun/common'; import { AIAgent, GrounderResponse, PlannerResponse } from './AIAgent.js'; import { FatalProviderError } from './providerFailure.js'; type LLMPhase = 'planner' | 'grounder'; -function parsePlannerResponse(output: unknown, rawText = ''): PlannerResponse { - const agent = new AIAgent({ - provider: 'google', - modelName: 'gemini-test', - apiKey: 'test-key', +function makeAgent(overrides?: { + defaults?: Partial<ModelDefaults>; + features?: FeatureOverrides; + apiKeys?: Record<string, string>; +}): AIAgent { + const defaults: ModelDefaults = { + provider: overrides?.defaults?.provider ??
'google', + modelName: overrides?.defaults?.modelName ?? 'gemini-test', + ...(overrides?.defaults?.reasoning !== undefined + ? { reasoning: overrides.defaults.reasoning } + : {}), + }; + return new AIAgent({ + apiKeys: overrides?.apiKeys ?? { + google: 'test-key', + openai: 'test-key', + anthropic: 'test-key', + }, + defaults, + ...(overrides?.features !== undefined ? { features: overrides.features } : {}), }); +} +function parsePlannerResponse(output: unknown, rawText = ''): PlannerResponse { + const agent = makeAgent(); return ( agent as unknown as { _parsePlannerResponse: (output: unknown, rawText: string) => PlannerResponse; @@ -25,12 +48,7 @@ function parsePlannerResponse(output: unknown, rawText = ''): PlannerResponse { } function parseGrounderResponse(output: unknown, rawText = ''): GrounderResponse { - const agent = new AIAgent({ - provider: 'google', - modelName: 'gemini-test', - apiKey: 'test-key', - }); - + const agent = makeAgent(); return ( agent as unknown as { _parseGrounderResponse: (output: unknown, rawText: string) => GrounderResponse; @@ -41,26 +59,44 @@ function parseGrounderResponse(output: unknown, rawText = ''): GrounderResponse function getProviderOptions(params: { provider: string; modelName: string; - phase: LLMPhase; + feature: FeatureName; + defaultReasoning?: ModelDefaults['reasoning']; + features?: FeatureOverrides; }): Record | undefined { - const agent = new AIAgent({ - provider: params.provider, - modelName: params.modelName, - apiKey: 'test-key', + const agent = makeAgent({ + defaults: { + provider: params.provider, + modelName: params.modelName, + ...(params.defaultReasoning !== undefined ? { reasoning: params.defaultReasoning } : {}), + }, + ...(params.features !== undefined ? 
{ features: params.features } : {}), }); + const resolved = ( + agent as unknown as { + _resolveFeatureConfig: (feature: FeatureName) => { + provider: string; + modelName: string; + reasoning: string; + }; + } + )._resolveFeatureConfig(params.feature); + return ( agent as unknown as { - _getProviderOptions: (phase: LLMPhase) => Record | undefined; + _getProviderOptions: ( + resolved: { provider: string; modelName: string; reasoning: string }, + feature: FeatureName, + ) => Record | undefined; } - )._getProviderOptions(params.phase); + )._getProviderOptions(resolved, params.feature); } -test('AIAgent uses medium Gemini 3 reasoning defaults for planner calls', () => { +test('AIAgent uses medium Google reasoning defaults for planner feature', () => { const providerOptions = getProviderOptions({ provider: 'google', modelName: 'gemini-3.1-pro-preview', - phase: 'planner', + feature: FEATURE_PLANNER, }); assert.deepEqual(providerOptions, { @@ -73,17 +109,17 @@ test('AIAgent uses medium Gemini 3 reasoning defaults for planner calls', () => }); }); -test('AIAgent uses minimal Gemini 3 reasoning defaults for grounder calls', () => { +test('AIAgent uses low Google reasoning defaults for grounder feature', () => { const providerOptions = getProviderOptions({ provider: 'google', modelName: 'gemini-3.1-pro-preview', - phase: 'grounder', + feature: FEATURE_GROUNDER, }); assert.deepEqual(providerOptions, { google: { thinkingConfig: { - thinkingLevel: 'minimal', + thinkingLevel: 'low', includeThoughts: false, }, }, @@ -94,7 +130,7 @@ test('AIAgent applies Google reasoning defaults without model-family gating', () const providerOptions = getProviderOptions({ provider: 'google', modelName: 'gemini-2.0-flash', - phase: 'planner', + feature: FEATURE_PLANNER, }); assert.deepEqual(providerOptions, { @@ -107,11 +143,11 @@ test('AIAgent applies Google reasoning defaults without model-family gating', () }); }); -test('AIAgent uses medium GPT-5 reasoning defaults for planner calls', () => { 
+test('AIAgent uses medium OpenAI reasoning defaults for planner feature', () => { const providerOptions = getProviderOptions({ provider: 'openai', modelName: 'gpt-5', - phase: 'planner', + feature: FEATURE_PLANNER, }); assert.deepEqual(providerOptions, { @@ -121,11 +157,11 @@ test('AIAgent uses medium GPT-5 reasoning defaults for planner calls', () => { }); }); -test('AIAgent uses low GPT-5 reasoning defaults for grounder calls', () => { +test('AIAgent uses low OpenAI reasoning defaults for grounder feature', () => { const providerOptions = getProviderOptions({ provider: 'openai', modelName: 'gpt-5', - phase: 'grounder', + feature: FEATURE_GROUNDER, }); assert.deepEqual(providerOptions, { @@ -139,7 +175,7 @@ test('AIAgent applies OpenAI reasoning defaults without model-family gating', () const providerOptions = getProviderOptions({ provider: 'openai', modelName: 'gpt-5.4-mini', - phase: 'planner', + feature: FEATURE_PLANNER, }); assert.deepEqual(providerOptions, { @@ -149,11 +185,11 @@ test('AIAgent applies OpenAI reasoning defaults without model-family gating', () }); }); -test('AIAgent uses medium Anthropic effort defaults for planner calls', () => { +test('AIAgent uses medium Anthropic effort defaults for planner feature', () => { const providerOptions = getProviderOptions({ provider: 'anthropic', modelName: 'claude-sonnet-4-6', - phase: 'planner', + feature: FEATURE_PLANNER, }); assert.deepEqual(providerOptions, { @@ -163,11 +199,11 @@ test('AIAgent uses medium Anthropic effort defaults for planner calls', () => { }); }); -test('AIAgent uses low Anthropic effort defaults for grounder calls', () => { +test('AIAgent uses low Anthropic effort defaults for grounder feature', () => { const providerOptions = getProviderOptions({ provider: 'anthropic', modelName: 'claude-sonnet-4-6', - phase: 'grounder', + feature: FEATURE_GROUNDER, }); assert.deepEqual(providerOptions, { @@ -181,7 +217,7 @@ test('AIAgent applies Anthropic effort defaults without model-family 
gating', () const providerOptions = getProviderOptions({ provider: 'anthropic', modelName: 'claude-3-7-sonnet-latest', - phase: 'planner', + feature: FEATURE_PLANNER, }); assert.deepEqual(providerOptions, { @@ -191,6 +227,88 @@ test('AIAgent applies Anthropic effort defaults without model-family gating', () }); }); +test('AIAgent respects workspace-wide reasoning default across features', () => { + const providerOptions = getProviderOptions({ + provider: 'openai', + modelName: 'gpt-5.4-mini', + feature: FEATURE_GROUNDER, + defaultReasoning: 'high', + }); + + assert.deepEqual(providerOptions, { + openai: { + reasoningEffort: 'high', + }, + }); +}); + +test('AIAgent per-feature reasoning override beats workspace default', () => { + const providerOptions = getProviderOptions({ + provider: 'openai', + modelName: 'gpt-5.4-mini', + feature: FEATURE_PLANNER, + defaultReasoning: 'low', + features: { planner: { reasoning: 'high' } }, + }); + + assert.deepEqual(providerOptions, { + openai: { + reasoningEffort: 'high', + }, + }); +}); + +test('AIAgent per-feature model override re-routes to the named provider', () => { + const providerOptions = getProviderOptions({ + provider: 'openai', + modelName: 'gpt-5.4-mini', + feature: FEATURE_SCROLL_INDEX_GROUNDER, + features: { + 'scroll-index-grounder': { + model: 'google/gemini-2.0-flash', + reasoning: 'medium', + }, + }, + }); + + assert.deepEqual(providerOptions, { + google: { + thinkingConfig: { + thinkingLevel: 'medium', + includeThoughts: false, + }, + }, + }); +}); + +test('AIAgent rejects minimal reasoning on non-OpenAI provider', () => { + assert.throws( + () => + getProviderOptions({ + provider: 'google', + modelName: 'gemini-3.1-pro-preview', + feature: FEATURE_GROUNDER, + defaultReasoning: 'minimal', + }), + /Reasoning level "minimal" is only supported for OpenAI/, + ); +}); + +test('AIAgent accepts minimal reasoning on OpenAI', () => { + const providerOptions = getProviderOptions({ + provider: 'openai', + modelName: 
'gpt-5.4-mini', + feature: FEATURE_GROUNDER, + defaultReasoning: 'minimal', + }); + + assert.deepEqual(providerOptions, { + openai: { + reasoningEffort: 'minimal', + }, + }); +}); + test('AIAgent normalizes rotate planner actions', () => { const response = parsePlannerResponse({ output: { @@ -371,14 +489,6 @@ test('AIAgent rejects grounder responses that are not JSON objects', () => { type MockLLMResult = { output: unknown; text: string }; -function makeAgent(): AIAgent { - return new AIAgent({ - provider: 'google', - modelName: 'gemini-test', - apiKey: 'test-key', - }); -} - function installMockCallLLM( agent: AIAgent, results: Array, @@ -389,7 +499,7 @@ function installMockCallLLM( _callLLM: ( systemPrompt: string, userParts: unknown[], - phase: LLMPhase, + feature: FeatureName, ) => Promise; } )._callLLM = async () => { diff --git a/packages/goal-executor/src/ai/AIAgent.ts b/packages/goal-executor/src/ai/AIAgent.ts index 0d90b61..7d24eb2 100644 --- a/packages/goal-executor/src/ai/AIAgent.ts +++ b/packages/goal-executor/src/ai/AIAgent.ts @@ -43,6 +43,10 @@ import { PLANNER_ACTION_COMPLETED, PLANNER_ACTION_FAILED, PLANNER_ACTION_DEEPLINK, + type FeatureName, + type FeatureOverrides, + type ModelDefaults, + type ReasoningLevel, } from '@finalrun/common'; import { describeLLMTrace, @@ -121,6 +125,23 @@ type AIAgentProviderOptions = { anthropic?: AnthropicLanguageModelOptions; }; +interface ResolvedFeatureConfig { + provider: string; + modelName: string; + reasoning: ReasoningLevel; +} + +/** Fallback reasoning levels used when neither feature override nor workspace default is set. */ +const DEFAULT_REASONING_BY_PHASE: Record = { + planner: 'medium', + grounder: 'low', +}; + +/** Map a feature to its phase (controls token budget + default reasoning). */ +function phaseForFeature(feature: FeatureName): LLMPhase { + return feature === FEATURE_PLANNER ? 
'planner' : 'grounder'; +} + const MAX_LLM_ATTEMPTS = 2; // ============================================================================ @@ -134,17 +155,24 @@ const MAX_LLM_ATTEMPTS = 2; * Dart equivalent: FinalRunAgent in goal_executor/lib/src/FinalRunAgent.dart */ export class AIAgent { - private _provider: string; // e.g., 'openai', 'google', 'anthropic' - private _modelName: string; // e.g., 'gpt-5.4-mini', 'gemini-2.0-flash' - private _apiKey: string; + private _apiKeys: Record<string, string>; + private _defaults: ModelDefaults; + private _features: FeatureOverrides; // Cached prompt contents private _promptCache: Map = new Map(); - - constructor(params: { provider: string; modelName: string; apiKey: string }) { - this._provider = params.provider; - this._modelName = params.modelName; - this._apiKey = params.apiKey; + // Cached Vercel AI SDK clients, keyed by provider + // eslint-disable-next-line @typescript-eslint/no-explicit-any + private _clientCache: Map<string, any> = new Map(); + + constructor(params: { + apiKeys: Record<string, string>; + defaults: ModelDefaults; + features?: FeatureOverrides; + }) { + this._apiKeys = params.apiKeys; + this._defaults = params.defaults; + this._features = params.features ??
{}; } /** @@ -205,18 +233,19 @@ export class AIAgent { let llmMs = 0; let parseMs = 0; + const plannerResolved = this._resolveFeatureConfig(FEATURE_PLANNER); for (let attempt = 1; attempt <= maxAttempts; attempt++) { const llmPhase = startTracePhase( request.traceStep, 'planning.llm', - `provider=${this._provider} model=${this._modelName} attempt=${attempt}/${maxAttempts}`, + `provider=${plannerResolved.provider} model=${plannerResolved.modelName} attempt=${attempt}/${maxAttempts}`, ); const llmStartedAt = performance.now(); let rawOutput: unknown; let rawText: string; try { - const llmResult = await this._callLLM(systemPrompt, userParts, 'planner'); + const llmResult = await this._callLLM(systemPrompt, userParts, FEATURE_PLANNER); rawOutput = llmResult.output; rawText = llmResult.text; } catch (error) { @@ -362,7 +391,11 @@ export class AIAgent { let rawOutput: unknown; let rawText: string; try { - const llmResult = await this._callLLM(systemPrompt, userParts, 'grounder'); + const llmResult = await this._callLLM( + systemPrompt, + userParts, + request.feature as FeatureName, + ); rawOutput = llmResult.output; rawText = llmResult.text; } catch (error) { @@ -461,10 +494,12 @@ export class AIAgent { private async _callLLM( systemPrompt: string, userParts: Array<{ type: 'text'; text: string } | { type: 'image'; image: string }>, - phase: LLMPhase, + feature: FeatureName, ): Promise<{ output: unknown; text: string }> { - const model = this._getModel(); - const providerOptions = this._getProviderOptions(phase); + const resolved = this._resolveFeatureConfig(feature); + const model = this._getModel(resolved); + const providerOptions = this._getProviderOptions(resolved, feature); + const phase = phaseForFeature(feature); // eslint-disable-next-line @typescript-eslint/no-explicit-any const userContent: any[] = userParts.map((part) => { @@ -494,68 +529,119 @@ export class AIAgent { } catch (error) { throw ( classifyFatalProviderError(error, { - provider: this._provider, - 
modelName: this._modelName, + provider: resolved.provider, + modelName: resolved.modelName, }) ?? error ); } if (reasoningText) { Logger.d( - `LLM reasoning [${phase}] (${this._provider}/${this._modelName}):\n${reasoningText}`, + `LLM reasoning [${feature}] (${resolved.provider}/${resolved.modelName}):\n${reasoningText}`, ); } Logger.d( - `LLM response [${phase}] (${this._provider}/${this._modelName}):\n${text || ''}`, + `LLM response [${feature}] (${resolved.provider}/${resolved.modelName}):\n${text || ''}`, ); return { output, text }; } /** - * Create the appropriate Vercel AI SDK model instance. + * Resolve the effective provider / model / reasoning for a feature by + * merging the optional per-feature override on top of workspace defaults. + */ + private _resolveFeatureConfig(feature: FeatureName): ResolvedFeatureConfig { + const override = this._features[feature]; + let provider = this._defaults.provider; + let modelName = this._defaults.modelName; + if (override?.model) { + const slash = override.model.indexOf('/'); + if (slash <= 0 || slash === override.model.length - 1) { + throw new Error( + `Invalid model override for feature "${feature}": "${override.model}". Expected provider/model.`, + ); + } + provider = override.model.slice(0, slash).trim(); + modelName = override.model.slice(slash + 1).trim(); + } + const reasoning: ReasoningLevel = + override?.reasoning ?? this._defaults.reasoning ?? DEFAULT_REASONING_BY_PHASE[phaseForFeature(feature)]; + return { provider, modelName, reasoning }; + } + + /** + * Create (or reuse a cached) Vercel AI SDK model instance for the + * resolved provider/modelName. 
*/ // eslint-disable-next-line @typescript-eslint/no-explicit-any - private _getModel(): any { - switch (this._provider) { + private _getModel(resolved: ResolvedFeatureConfig): any { + const cacheKey = `${resolved.provider}/${resolved.modelName}`; + const cached = this._clientCache.get(cacheKey); + if (cached) { + return cached; + } + const apiKey = this._apiKeys[resolved.provider]; + if (!apiKey) { + throw new Error( + `Missing API key for provider "${resolved.provider}". Set the corresponding env var (e.g. OPENAI_API_KEY, GOOGLE_API_KEY, ANTHROPIC_API_KEY).`, + ); + } + let client: unknown; + switch (resolved.provider) { case 'openai': { - const openai = createOpenAI({ apiKey: this._apiKey }); - return openai(this._modelName); + const openai = createOpenAI({ apiKey }); + client = openai(resolved.modelName); + break; } case 'google': { - const google = createGoogleGenerativeAI({ apiKey: this._apiKey }); - return google(this._modelName); + const google = createGoogleGenerativeAI({ apiKey }); + client = google(resolved.modelName); + break; } case 'anthropic': { - const anthropic = createAnthropic({ apiKey: this._apiKey }); - return anthropic(this._modelName); + const anthropic = createAnthropic({ apiKey }); + client = anthropic(resolved.modelName); + break; } default: - throw new Error(`Unsupported AI provider: ${this._provider}`); + throw new Error(`Unsupported AI provider: ${resolved.provider}`); } + this._clientCache.set(cacheKey, client); + return client; } - private _getProviderOptions(phase: LLMPhase): AIAgentProviderOptions | undefined { - switch (this._provider) { - case 'google': + private _getProviderOptions( + resolved: ResolvedFeatureConfig, + feature: FeatureName, + ): AIAgentProviderOptions | undefined { + const { provider, reasoning } = resolved; + if (reasoning === 'minimal' && provider !== 'openai') { + throw new Error( + `Reasoning level "minimal" is only supported for OpenAI. 
Feature "${feature}" is configured for provider "${provider}".`, + ); + } + switch (provider) { + case 'google': { return { google: { thinkingConfig: { - thinkingLevel: phase === 'planner' ? 'medium' : 'minimal', + thinkingLevel: reasoning as 'low' | 'medium' | 'high', includeThoughts: false, }, } satisfies GoogleLanguageModelOptions, }; + } case 'openai': return { openai: { - reasoningEffort: phase === 'planner' ? 'medium' : 'low', + reasoningEffort: reasoning, } satisfies OpenAILanguageModelResponsesOptions, }; case 'anthropic': return { anthropic: { - effort: phase === 'planner' ? 'medium' : 'low', + effort: reasoning as 'low' | 'medium' | 'high', } satisfies AnthropicLanguageModelOptions, }; default: From 518a77c9e29057a90c02123e2c4478d92d4a8836 Mon Sep 17 00:00:00 2001 From: ashish Date: Sat, 18 Apr 2026 20:41:49 -0700 Subject: [PATCH 12/80] fix: resolve per-feature provider/model in new summary logs post-merge The merge with main pulled in new _summarizePlannerRequest / _summarizeGrounderRequest helpers that referenced the old _provider and _modelName fields. Route them through _resolveFeatureConfig so the log line reflects the actual provider/model used for each feature.
Co-Authored-By: Claude Opus 4.7 (1M context) --- packages/goal-executor/src/ai/AIAgent.ts | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/packages/goal-executor/src/ai/AIAgent.ts b/packages/goal-executor/src/ai/AIAgent.ts index b34d751..85183c1 100644 --- a/packages/goal-executor/src/ai/AIAgent.ts +++ b/packages/goal-executor/src/ai/AIAgent.ts @@ -762,7 +762,8 @@ export class AIAgent { private _summarizePlannerRequest(req: PlannerRequest): string { const parts: string[] = ['[AI plan]']; parts.push(this._formatLogContext(req.logContext, req.traceStep)); - parts.push(`provider=${this._provider}/${this._modelName}`); + const plannerResolved = this._resolveFeatureConfig(FEATURE_PLANNER); + parts.push(`provider=${plannerResolved.provider}/${plannerResolved.modelName}`); parts.push(this._screenshotMetric('screenshot', req.preActionScreenshot)); if (req.postActionScreenshot) { parts.push(this._screenshotMetric('postScreenshot', req.postActionScreenshot)); @@ -782,7 +783,8 @@ export class AIAgent { private _summarizeGrounderRequest(req: GrounderRequest): string { const parts: string[] = ['[AI ground]']; parts.push(this._formatLogContext(req.logContext, req.traceStep)); - parts.push(`provider=${this._provider}/${this._modelName}`); + const grounderResolved = this._resolveFeatureConfig(req.feature as FeatureName); + parts.push(`provider=${grounderResolved.provider}/${grounderResolved.modelName}`); parts.push(`feature=${req.feature}`); parts.push(this._screenshotMetric('screenshot', req.screenshot)); const hierarchyCount = req.hierarchy From 38c4aff9ccd5b8dab7e2dfdeb2ab4226f9b3e404 Mon Sep 17 00:00:00 2001 From: ashish Date: Sat, 18 Apr 2026 20:58:28 -0700 Subject: [PATCH 13/80] docs: spell out supported config shapes and per-provider reasoning levels Add three labeled recipes (single model, per-feature effort tuning, mixed providers), a provider/reasoning matrix so minimal is visible as OpenAI-only before runtime, and the planner=medium / grounder=low 
fallback when reasoning is omitted. Cross-link from cli-reference.md so users reading only the CLI flags discover the YAML surface. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/cli-reference.md | 2 ++ docs/configuration.md | 69 +++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 69 insertions(+), 2 deletions(-) diff --git a/docs/cli-reference.md b/docs/cli-reference.md index 5669c47..6041397 100644 --- a/docs/cli-reference.md +++ b/docs/cli-reference.md @@ -32,6 +32,8 @@ Flags for `test` and `suite`: CLI flags always take precedence over `.finalrun/config.yaml`. +For workspace-level `model`, `reasoning`, and per-feature `features:` overrides (including mixed-provider setups), see [configuration.md](configuration.md#supported-configurations). + ### Examples ```sh diff --git a/docs/configuration.md b/docs/configuration.md index e136032..7bfd7d3 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -59,6 +59,73 @@ features: reasoning: low ``` +### Supported Providers + +Use any of these prefixes in the `provider/model` format: + +- `openai/` (e.g. `openai/gpt-5.4-mini`) +- `google/` (e.g. `google/gemini-3-flash-preview`) +- `anthropic/` (e.g. `anthropic/claude-opus-4-7`) + +Model names are passed straight to the provider — consult the provider's docs for which models accept reasoning effort. + +### Reasoning Levels by Provider + +| Provider | Accepted `reasoning` values | +|---|---| +| `openai` | `minimal`, `low`, `medium`, `high` | +| `google` | `low`, `medium`, `high` | +| `anthropic` | `low`, `medium`, `high` | + +Setting `reasoning: minimal` on a Google- or Anthropic-routed feature fails at run time with a message naming the offending feature. 
+ +When neither workspace `reasoning:` nor a per-feature `reasoning:` is set, FinalRun applies built-in fallbacks: + +- `planner` → `medium` +- every grounder (`grounder`, `visual-grounder`, `scroll-index-grounder`, `input-focus-grounder`, `launch-app-grounder`, `set-location-grounder`) → `low` + +### Supported Configurations + +Three shapes are supported. Pick the simplest one that fits. + +**1. One model, one reasoning level (simplest).** Every feature uses the same model and effort: + +```yaml +model: openai/gpt-5.4-mini +reasoning: low +``` + +**2. Same provider, per-feature reasoning tuning.** One API key, one provider, but effort tuned per feature: + +```yaml +model: openai/gpt-5.4-mini +reasoning: low + +features: + planner: + reasoning: high # planner only — keeps the workspace model + scroll-index-grounder: + reasoning: minimal # cheap fast grounding + # unlisted features inherit model + reasoning from the top +``` + +**3. Mixed providers across features.** Different providers for different features: + +```yaml +model: google/gemini-3-flash-preview # default for anything unlisted +reasoning: medium + +features: + planner: + model: anthropic/claude-opus-4-7 + reasoning: high + grounder: + model: openai/gpt-5.4-mini + reasoning: minimal +``` + +Mixed-provider mode requires **every** referenced provider's env var to be set (`OPENAI_API_KEY`, `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY` — see [environment.md](environment.md)). The `--api-key` CLI flag is rejected in this mode. + ### Per-Feature Overrides The `features:` block lets you tune each LLM call independently. Each feature drives a distinct prompt: @@ -70,8 +137,6 @@ The `features:` block lets you tune each LLM call independently. Each feature dr Both `model` and `reasoning` are optional per feature. Any unset field falls back to the workspace-level default (`model:` / `reasoning:`), and any unlisted feature inherits both defaults. -If features target **different providers** (e.g. 
planner on Anthropic, grounder on Google), you must set each provider's env var (`OPENAI_API_KEY`, `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY`) — see [environment.md](environment.md). The `--api-key` CLI flag only works when a single provider is active across all features. - ## App Identity FinalRun needs to know which app to launch on the device. The app identity is resolved in this order: From c8ddcc7a009a595c2c93515fbd429dd6d7e8014d Mon Sep 17 00:00:00 2001 From: ashish Date: Sat, 18 Apr 2026 21:20:41 -0700 Subject: [PATCH 14/80] fix(ai): enforce structured output on Anthropic path via zod schema MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The Vercel AI SDK's @ai-sdk/anthropic adapter silently drops Output.json() when no schema is supplied — Anthropic has no schema-less JSON mode and tool-calling is the only enforced path. Claude Sonnet 4.6 then free-writes multiple candidate JSONs, breaking safeParseJSON at the planner step. Add a per-feature zod schema (new schemas.ts), select it when the resolved provider is anthropic, and pass through Output.object({ schema }). OpenAI and Google keep their working schema-less Output.json() path untouched. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- package-lock.json | 17 +- packages/goal-executor/package.json | 3 +- packages/goal-executor/src/ai/AIAgent.ts | 11 +- packages/goal-executor/src/ai/schemas.test.ts | 240 ++++++++++++++++++ packages/goal-executor/src/ai/schemas.ts | 209 +++++++++++++++ 5 files changed, 470 insertions(+), 10 deletions(-) create mode 100644 packages/goal-executor/src/ai/schemas.test.ts create mode 100644 packages/goal-executor/src/ai/schemas.ts diff --git a/package-lock.json b/package-lock.json index 748197f..c8b1c5d 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "finalrun-agent-monorepo", - "version": "0.1.5", + "version": "0.1.6", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "finalrun-agent-monorepo", - "version": "0.1.5", + "version": "0.1.6", "workspaces": [ "packages/common", "packages/device-node", @@ -3936,7 +3936,7 @@ }, "packages/cli": { "name": "@finalrun/finalrun-agent", - "version": "0.1.5", + "version": "0.1.6", "bundleDependencies": [ "@ai-sdk/anthropic", "@ai-sdk/google", @@ -4005,7 +4005,7 @@ }, "packages/common": { "name": "@finalrun/common", - "version": "0.1.5", + "version": "0.1.6", "license": "Apache-2.0", "devDependencies": { "typescript": "^5.7.0" @@ -4027,7 +4027,7 @@ }, "packages/device-node": { "name": "@finalrun/device-node", - "version": "0.1.5", + "version": "0.1.6", "license": "Apache-2.0", "dependencies": { "@finalrun/common": "*", @@ -4056,7 +4056,7 @@ }, "packages/goal-executor": { "name": "@finalrun/goal-executor", - "version": "0.1.5", + "version": "0.1.6", "license": "Apache-2.0", "dependencies": { "@ai-sdk/anthropic": "^3.0.58", @@ -4064,7 +4064,8 @@ "@ai-sdk/openai": "^3.0.47", "@finalrun/common": "*", "ai": "^6.0.134", - "uuid": "^11.1.0" + "uuid": "^11.1.0", + "zod": "^4.1.8" }, "devDependencies": { "typescript": "^5.7.0" @@ -4086,7 +4087,7 @@ }, "packages/report-web": { "name": "@finalrun/report-web", - "version": "0.1.5", + "version": 
"0.1.6", "dependencies": { "@finalrun/common": "*", "next": "^16.2.1", diff --git a/packages/goal-executor/package.json b/packages/goal-executor/package.json index 4bb75d3..1250d00 100644 --- a/packages/goal-executor/package.json +++ b/packages/goal-executor/package.json @@ -21,7 +21,8 @@ "@ai-sdk/openai": "^3.0.47", "@ai-sdk/google": "^3.0.43", "@ai-sdk/anthropic": "^3.0.58", - "uuid": "^11.1.0" + "uuid": "^11.1.0", + "zod": "^4.1.8" }, "devDependencies": { "typescript": "^5.7.0" diff --git a/packages/goal-executor/src/ai/AIAgent.ts b/packages/goal-executor/src/ai/AIAgent.ts index 85183c1..8c7b9d9 100644 --- a/packages/goal-executor/src/ai/AIAgent.ts +++ b/packages/goal-executor/src/ai/AIAgent.ts @@ -59,6 +59,7 @@ import { type LLMTrace, } from '../trace.js'; import { classifyFatalProviderError, FatalProviderError } from './providerFailure.js'; +import { schemaForFeature } from './schemas.js'; // ============================================================================ // Types @@ -536,7 +537,15 @@ export class AIAgent { { role: 'system', content: systemPrompt }, { role: 'user', content: userContent }, ], - output: Output.json(), + // Anthropic has no schema-less JSON mode — the @ai-sdk/anthropic + // adapter drops responseFormat silently without a schema, letting + // Claude free-write multiple candidate JSONs. Passing a schema routes + // the call through Anthropic's tool-use API for enforced structured + // output. OpenAI and Google keep their working schema-less paths. + output: + resolved.provider === 'anthropic' + ? Output.object({ schema: schemaForFeature(feature) }) + : Output.json(), maxOutputTokens: phase === 'planner' ? 
8192 : 4096, providerOptions, }); diff --git a/packages/goal-executor/src/ai/schemas.test.ts b/packages/goal-executor/src/ai/schemas.test.ts new file mode 100644 index 0000000..a7b877a --- /dev/null +++ b/packages/goal-executor/src/ai/schemas.test.ts @@ -0,0 +1,240 @@ +import assert from 'node:assert/strict'; +import test from 'node:test'; +import { + FEATURE_GROUNDER, + FEATURE_INPUT_FOCUS_GROUNDER, + FEATURE_LAUNCH_APP_GROUNDER, + FEATURE_PLANNER, + FEATURE_SCROLL_INDEX_GROUNDER, + FEATURE_SET_LOCATION_GROUNDER, + FEATURE_VISUAL_GROUNDER, +} from '@finalrun/common'; +import { PLANNER_SCHEMA, schemaForFeature } from './schemas.js'; + +// ---------------------------------------------------------------------------- +// Planner +// ---------------------------------------------------------------------------- + +test('planner schema accepts the canonical wait example from planner.md', () => { + const payload = { + output: { + thought: { + plan: '[→ Wait for app to load]', + think: 'App is on splash screen; need to wait.', + act: 'Wait 5 seconds for the app to load.', + }, + action: { action_type: 'wait', duration: 5 }, + remember: [], + }, + }; + assert.equal(PLANNER_SCHEMA.safeParse(payload).success, true); +}); + +test('planner schema accepts each documented action_type', () => { + const types = [ + 'tap', + 'long_press', + 'input_text', + 'swipe', + 'navigate_back', + 'navigate_home', + 'rotate', + 'hide_keyboard', + 'keyboard_enter', + 'wait', + 'deep_link', + 'set_location', + 'launch_app', + 'status', + ]; + for (const t of types) { + const payload = { + output: { action: { action_type: t } }, + }; + assert.equal( + PLANNER_SCHEMA.safeParse(payload).success, + true, + `expected ${t} to be accepted`, + ); + } +}); + +test('planner schema accepts passthrough action fields (repeat, delay_between_tap)', () => { + const payload = { + output: { + action: { + action_type: 'tap', + repeat: 3, + delay_between_tap: 1000, + }, + remember: [], + }, + }; + 
assert.equal(PLANNER_SCHEMA.safeParse(payload).success, true); +}); + +test('planner schema rejects an unknown action_type', () => { + const payload = { + output: { action: { action_type: 'click' } }, + }; + const result = PLANNER_SCHEMA.safeParse(payload); + assert.equal(result.success, false); +}); + +test('planner schema rejects when the top-level output wrapper is missing', () => { + const payload = { action: { action_type: 'tap' } }; + assert.equal(PLANNER_SCHEMA.safeParse(payload).success, false); +}); + +// ---------------------------------------------------------------------------- +// Grounder — each feature's success and error shapes +// ---------------------------------------------------------------------------- + +test('grounder schema accepts index match, needsVisualGrounding, and error variants', () => { + const schema = schemaForFeature(FEATURE_GROUNDER); + assert.equal( + schema.safeParse({ output: { index: 5, reason: 'match' } }).success, + true, + ); + assert.equal( + schema.safeParse({ + output: { needsVisualGrounding: true, reason: 'not in list' }, + }).success, + true, + ); + assert.equal( + schema.safeParse({ output: { isError: true, reason: 'not visible' } }) + .success, + true, + ); +}); + +test('input-focus grounder schema accepts index, x/y, null-index, and error', () => { + const schema = schemaForFeature(FEATURE_INPUT_FOCUS_GROUNDER); + assert.equal( + schema.safeParse({ output: { index: 42, reason: 'match' } }).success, + true, + ); + assert.equal( + schema.safeParse({ output: { index: null, reason: 'already focused' } }) + .success, + true, + ); + assert.equal( + schema.safeParse({ output: { x: 100, y: 200, reason: 'derived' } }) + .success, + true, + ); + assert.equal( + schema.safeParse({ output: { isError: true, reason: 'not found' } }) + .success, + true, + ); +}); + +test('visual grounder schema accepts coordinates and error', () => { + const schema = schemaForFeature(FEATURE_VISUAL_GROUNDER); + assert.equal( + schema.safeParse({ 
output: { x: 540, y: 1200, reason: 'center of label' } }) + .success, + true, + ); + assert.equal( + schema.safeParse({ output: { isError: true, reason: 'not visible' } }) + .success, + true, + ); +}); + +test('scroll-index grounder schema accepts swipe vector and error', () => { + const schema = schemaForFeature(FEATURE_SCROLL_INDEX_GROUNDER); + assert.equal( + schema.safeParse({ + output: { + start_x: 540, + start_y: 1800, + end_x: 540, + end_y: 400, + durationMs: 600, + reason: 'swipe up', + }, + }).success, + true, + ); + assert.equal( + schema.safeParse({ output: { isError: true, reason: 'no container' } }) + .success, + true, + ); +}); + +test('launch-app grounder schema accepts minimal and full payloads', () => { + const schema = schemaForFeature(FEATURE_LAUNCH_APP_GROUNDER); + assert.equal( + schema.safeParse({ + output: { packageName: 'com.whatsapp', reason: 'exact match' }, + }).success, + true, + ); + assert.equal( + schema.safeParse({ + output: { + packageName: 'com.example.myapp', + clearState: true, + allowAllPermissions: false, + permissions: { camera: 'allow', photos: 'allow' }, + reason: 'full config', + }, + }).success, + true, + ); + assert.equal( + schema.safeParse({ output: { isError: true, reason: 'not found' } }) + .success, + true, + ); +}); + +test('set-location grounder schema accepts string coords and error', () => { + const schema = schemaForFeature(FEATURE_SET_LOCATION_GROUNDER); + assert.equal( + schema.safeParse({ + output: { lat: '37.7749', long: '-122.4194', reason: 'SF' }, + }).success, + true, + ); + assert.equal( + schema.safeParse({ output: { isError: true, reason: 'unresolved' } }) + .success, + true, + ); +}); + +test('set-location grounder schema rejects numeric lat/long (spec requires strings)', () => { + const schema = schemaForFeature(FEATURE_SET_LOCATION_GROUNDER); + assert.equal( + schema.safeParse({ + output: { lat: 37.7749, long: -122.4194, reason: 'numeric' }, + }).success, + false, + ); +}); + +// 
---------------------------------------------------------------------------- +// Lookup +// ---------------------------------------------------------------------------- + +test('schemaForFeature returns a schema for every known feature', () => { + const features = [ + FEATURE_PLANNER, + FEATURE_GROUNDER, + FEATURE_VISUAL_GROUNDER, + FEATURE_SCROLL_INDEX_GROUNDER, + FEATURE_INPUT_FOCUS_GROUNDER, + FEATURE_LAUNCH_APP_GROUNDER, + FEATURE_SET_LOCATION_GROUNDER, + ]; + for (const feature of features) { + assert.ok(schemaForFeature(feature), `missing schema for ${feature}`); + } +}); diff --git a/packages/goal-executor/src/ai/schemas.ts b/packages/goal-executor/src/ai/schemas.ts new file mode 100644 index 0000000..1a79d9b --- /dev/null +++ b/packages/goal-executor/src/ai/schemas.ts @@ -0,0 +1,209 @@ +// Zod schemas for LLM structured output. +// +// The Vercel AI SDK's Anthropic adapter (`@ai-sdk/anthropic`) cannot enforce +// JSON output without a schema — Anthropic has no schema-less JSON mode. When +// a schema is supplied, the adapter routes through Anthropic's tool-use API +// so Claude emits exactly one well-formed JSON object. +// +// OpenAI (`response_format: json_object`) and Google +// (`response_mime_type: application/json`) work schema-less today, so this +// file is only consumed on the Anthropic call path in `AIAgent._callLLM`. +// +// Each schema mirrors the corresponding prompt in `src/prompts/*.md`. When +// a prompt changes, update the matching schema here. 
+ +import { z } from 'zod'; +import { + FEATURE_GROUNDER, + FEATURE_INPUT_FOCUS_GROUNDER, + FEATURE_LAUNCH_APP_GROUNDER, + FEATURE_PLANNER, + FEATURE_SCROLL_INDEX_GROUNDER, + FEATURE_SET_LOCATION_GROUNDER, + FEATURE_VISUAL_GROUNDER, + type FeatureName, +} from '@finalrun/common'; + +// ---------------------------------------------------------------------------- +// Planner — canonical shape from `prompts/planner.md` +// ---------------------------------------------------------------------------- + +const PLANNER_ACTION_TYPES = [ + 'tap', + 'long_press', + 'input_text', + 'swipe', + 'navigate_back', + 'navigate_home', + 'rotate', + 'hide_keyboard', + 'keyboard_enter', + 'wait', + 'deep_link', + 'set_location', + 'launch_app', + 'status', +] as const; + +const plannerActionSchema = z + .object({ + action_type: z.enum(PLANNER_ACTION_TYPES), + }) + .passthrough(); + +const plannerThoughtSchema = z + .object({ + plan: z.string().optional(), + think: z.string().optional(), + act: z.string().optional(), + }) + .passthrough(); + +export const PLANNER_SCHEMA = z.object({ + output: z.object({ + thought: plannerThoughtSchema.optional(), + action: plannerActionSchema, + remember: z.array(z.string()).optional(), + }), +}); + +// ---------------------------------------------------------------------------- +// Grounder — per-feature shapes from the grounder prompt files +// ---------------------------------------------------------------------------- + +const errorOutputSchema = z.object({ + isError: z.literal(true), + reason: z.string(), +}); + +// `FEATURE_GROUNDER` — `prompts/grounder.md` +// Three success variants: visual-fallback, index match, or error. 
+const grounderSchema = z.object({ + output: z.union([ + errorOutputSchema, + z + .object({ + needsVisualGrounding: z.literal(true), + reason: z.string(), + }) + .passthrough(), + z + .object({ + index: z.number().int(), + reason: z.string().optional(), + }) + .passthrough(), + ]), +}); + +// `FEATURE_INPUT_FOCUS_GROUNDER` — `prompts/input-focus-grounder.md` +// Variants: index match, null index (already focused), x/y coords, or error. +const inputFocusGrounderSchema = z.object({ + output: z.union([ + errorOutputSchema, + z + .object({ + index: z.number().int().nullable(), + reason: z.string().optional(), + }) + .passthrough(), + z + .object({ + x: z.number().int(), + y: z.number().int(), + reason: z.string().optional(), + }) + .passthrough(), + ]), +}); + +// `FEATURE_VISUAL_GROUNDER` — `prompts/visual-grounder.md` +const visualGrounderSchema = z.object({ + output: z.union([ + errorOutputSchema, + z + .object({ + x: z.number().int(), + y: z.number().int(), + reason: z.string().optional(), + }) + .passthrough(), + ]), +}); + +// `FEATURE_SCROLL_INDEX_GROUNDER` — `prompts/scroll-grounder.md` +const scrollIndexGrounderSchema = z.object({ + output: z.union([ + errorOutputSchema, + z + .object({ + start_x: z.number(), + start_y: z.number(), + end_x: z.number(), + end_y: z.number(), + durationMs: z.number(), + reason: z.string().optional(), + }) + .passthrough(), + ]), +}); + +// `FEATURE_LAUNCH_APP_GROUNDER` — `prompts/launch-app-grounder.md` +// Keep permissions and arguments as permissive records; the prompt documents +// free-form values. 
+const launchAppGrounderSchema = z.object({ + output: z.union([ + errorOutputSchema, + z + .object({ + packageName: z.string(), + reason: z.string().optional(), + clearState: z.boolean().optional(), + allowAllPermissions: z.boolean().optional(), + stopAppBeforeLaunch: z.boolean().optional(), + shouldUninstallBeforeLaunch: z.boolean().optional(), + permissions: z.record(z.string(), z.string()).optional(), + arguments: z.record(z.string(), z.string()).optional(), + }) + .passthrough(), + ]), +}); + +// `FEATURE_SET_LOCATION_GROUNDER` — `prompts/set-location-grounder.md` +// lat/long are strings by spec (4-6 decimal places). +const setLocationGrounderSchema = z.object({ + output: z.union([ + errorOutputSchema, + z + .object({ + lat: z.string(), + long: z.string(), + reason: z.string().optional(), + }) + .passthrough(), + ]), +}); + +// ---------------------------------------------------------------------------- +// Lookup +// ---------------------------------------------------------------------------- + +// eslint-disable-next-line @typescript-eslint/no-explicit-any +const FEATURE_SCHEMAS: Record<FeatureName, z.ZodType<any>> = { + [FEATURE_PLANNER]: PLANNER_SCHEMA, + [FEATURE_GROUNDER]: grounderSchema, + [FEATURE_INPUT_FOCUS_GROUNDER]: inputFocusGrounderSchema, + [FEATURE_VISUAL_GROUNDER]: visualGrounderSchema, + [FEATURE_SCROLL_INDEX_GROUNDER]: scrollIndexGrounderSchema, + [FEATURE_LAUNCH_APP_GROUNDER]: launchAppGrounderSchema, + [FEATURE_SET_LOCATION_GROUNDER]: setLocationGrounderSchema, +}; + +// eslint-disable-next-line @typescript-eslint/no-explicit-any +export function schemaForFeature(feature: FeatureName): z.ZodType<any> { + const schema = FEATURE_SCHEMAS[feature]; + if (!schema) { + throw new Error(`No schema registered for feature "${feature}".`); + } + return schema; +} From 8c629a7ac303d9a49bf98c77eb4f3a62f2e53eba Mon Sep 17 00:00:00 2001 From: ashish Date: Sat, 18 Apr 2026 21:33:17 -0700 Subject: [PATCH 15/80] fix(ai): drop .int() from Anthropic schemas to satisfy tool-schema validator
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Anthropic's tool-schema validator rejects `minimum`/`maximum` keywords on the `integer` type, and zod v4's .int() emits safe-integer bounds by default — producing HTTP 400: output_config.format.schema: For 'integer' type, properties maximum, minimum are not supported Switch index/x/y fields to plain z.number(). Downstream parsers already coerce where needed. Document the constraint inline so nobody reintroduces .int(). Co-Authored-By: Claude Opus 4.7 (1M context) --- packages/goal-executor/src/ai/schemas.test.ts | 21 ++++++++++--------- packages/goal-executor/src/ai/schemas.ts | 17 +++++++++------ 2 files changed, 22 insertions(+), 16 deletions(-) diff --git a/packages/goal-executor/src/ai/schemas.test.ts b/packages/goal-executor/src/ai/schemas.test.ts index a7b877a..9e88567 100644 --- a/packages/goal-executor/src/ai/schemas.test.ts +++ b/packages/goal-executor/src/ai/schemas.test.ts @@ -1,6 +1,7 @@ import assert from 'node:assert/strict'; import test from 'node:test'; import { + ALL_FEATURES, FEATURE_GROUNDER, FEATURE_INPUT_FOCUS_GROUNDER, FEATURE_LAUNCH_APP_GROUNDER, @@ -225,16 +226,16 @@ test('set-location grounder schema rejects numeric lat/long (spec requires strin // ---------------------------------------------------------------------------- test('schemaForFeature returns a schema for every known feature', () => { - const features = [ - FEATURE_PLANNER, - FEATURE_GROUNDER, - FEATURE_VISUAL_GROUNDER, - FEATURE_SCROLL_INDEX_GROUNDER, - FEATURE_INPUT_FOCUS_GROUNDER, - FEATURE_LAUNCH_APP_GROUNDER, - FEATURE_SET_LOCATION_GROUNDER, - ]; - for (const feature of features) { + for (const feature of ALL_FEATURES) { assert.ok(schemaForFeature(feature), `missing schema for ${feature}`); } + // Silence unused-import warnings (these individual constants are tested + // implicitly through ALL_FEATURES, but kept explicit for readability). 
+ void FEATURE_PLANNER; + void FEATURE_GROUNDER; + void FEATURE_VISUAL_GROUNDER; + void FEATURE_SCROLL_INDEX_GROUNDER; + void FEATURE_INPUT_FOCUS_GROUNDER; + void FEATURE_LAUNCH_APP_GROUNDER; + void FEATURE_SET_LOCATION_GROUNDER; }); diff --git a/packages/goal-executor/src/ai/schemas.ts b/packages/goal-executor/src/ai/schemas.ts index 1a79d9b..b0fbdbb 100644 --- a/packages/goal-executor/src/ai/schemas.ts +++ b/packages/goal-executor/src/ai/schemas.ts @@ -76,6 +76,11 @@ const errorOutputSchema = z.object({ reason: z.string(), }); +// Numeric fields use plain z.number() — Anthropic's tool-schema validator +// rejects `minimum`/`maximum` keywords on the `integer` type, and zod v4's +// .int() emits those bounds by default. Downstream parsers already coerce +// to integers where needed (ActionExecutor + GrounderResponseConverter). + // `FEATURE_GROUNDER` — `prompts/grounder.md` // Three success variants: visual-fallback, index match, or error. const grounderSchema = z.object({ @@ -89,7 +94,7 @@ const grounderSchema = z.object({ .passthrough(), z .object({ - index: z.number().int(), + index: z.number(), reason: z.string().optional(), }) .passthrough(), @@ -103,14 +108,14 @@ const inputFocusGrounderSchema = z.object({ errorOutputSchema, z .object({ - index: z.number().int().nullable(), + index: z.number().nullable(), reason: z.string().optional(), }) .passthrough(), z .object({ - x: z.number().int(), - y: z.number().int(), + x: z.number(), + y: z.number(), reason: z.string().optional(), }) .passthrough(), @@ -123,8 +128,8 @@ const visualGrounderSchema = z.object({ errorOutputSchema, z .object({ - x: z.number().int(), - y: z.number().int(), + x: z.number(), + y: z.number(), reason: z.string().optional(), }) .passthrough(), From 4d7721f1b1ee03cb5d2e2ebcdba70216a4b1c9f8 Mon Sep 17 00:00:00 2001 From: ashish Date: Sat, 18 Apr 2026 21:45:18 -0700 Subject: [PATCH 16/80] fix(ai): drop outer output wrapper from Anthropic schemas MIME-Version: 1.0 Content-Type: text/plain; 
charset=UTF-8 Content-Transfer-Encoding: 8bit The prompts tell the model to emit {"output": {...}} as a text-JSON convention for OpenAI/Google. Duplicating that in the zod schema meant Claude, asked to satisfy both, nested twice: {"output":{"output":{"thought":...,"action":...,"remember":[]}}} Vercel AI SDK docs confirm: a schema passed to Output.object({ schema }) describes the structured data directly — tool inputs/output_format payloads are not wrapped. Flatten every schema to its inner shape. The planner and grounder parsers already accept both wrapped (OpenAI and Google) and unwrapped (Anthropic) shapes via their existing fallback branches, so no parser changes. Add a test that rejects a payload with an outer output wrapper so the bug can't creep back. Co-Authored-By: Claude Opus 4.7 (1M context) --- packages/goal-executor/src/ai/schemas.test.ts | 130 +++++------ packages/goal-executor/src/ai/schemas.ts | 205 +++++++++--------- 2 files changed, 156 insertions(+), 179 deletions(-) diff --git a/packages/goal-executor/src/ai/schemas.test.ts b/packages/goal-executor/src/ai/schemas.test.ts index 9e88567..3d19128 100644 --- a/packages/goal-executor/src/ai/schemas.test.ts +++ b/packages/goal-executor/src/ai/schemas.test.ts @@ -12,21 +12,22 @@ import { } from '@finalrun/common'; import { PLANNER_SCHEMA, schemaForFeature } from './schemas.js'; +// Schemas describe the inner shape directly — no outer `output` wrapper. +// See schemas.ts for why. 
+ // ---------------------------------------------------------------------------- // Planner // ---------------------------------------------------------------------------- test('planner schema accepts the canonical wait example from planner.md', () => { const payload = { - output: { - thought: { - plan: '[→ Wait for app to load]', - think: 'App is on splash screen; need to wait.', - act: 'Wait 5 seconds for the app to load.', - }, - action: { action_type: 'wait', duration: 5 }, - remember: [], + thought: { + plan: '[→ Wait for app to load]', + think: 'App is on splash screen; need to wait.', + act: 'Wait 5 seconds for the app to load.', }, + action: { action_type: 'wait', duration: 5 }, + remember: [], }; assert.equal(PLANNER_SCHEMA.safeParse(payload).success, true); }); @@ -49,9 +50,7 @@ test('planner schema accepts each documented action_type', () => { 'status', ]; for (const t of types) { - const payload = { - output: { action: { action_type: t } }, - }; + const payload = { action: { action_type: t } }; assert.equal( PLANNER_SCHEMA.safeParse(payload).success, true, @@ -62,28 +61,30 @@ test('planner schema accepts each documented action_type', () => { test('planner schema accepts passthrough action fields (repeat, delay_between_tap)', () => { const payload = { - output: { - action: { - action_type: 'tap', - repeat: 3, - delay_between_tap: 1000, - }, - remember: [], + action: { + action_type: 'tap', + repeat: 3, + delay_between_tap: 1000, }, + remember: [], }; assert.equal(PLANNER_SCHEMA.safeParse(payload).success, true); }); test('planner schema rejects an unknown action_type', () => { - const payload = { - output: { action: { action_type: 'click' } }, - }; + const payload = { action: { action_type: 'click' } }; const result = PLANNER_SCHEMA.safeParse(payload); assert.equal(result.success, false); }); -test('planner schema rejects when the top-level output wrapper is missing', () => { - const payload = { action: { action_type: 'tap' } }; +test('planner schema 
rejects a payload with an outer output wrapper', () => { + // Guard against reintroducing the double-wrapping bug. + const payload = { + output: { + action: { action_type: 'tap' }, + remember: [], + }, + }; assert.equal(PLANNER_SCHEMA.safeParse(payload).success, false); }); @@ -93,19 +94,14 @@ test('planner schema rejects when the top-level output wrapper is missing', () = test('grounder schema accepts index match, needsVisualGrounding, and error variants', () => { const schema = schemaForFeature(FEATURE_GROUNDER); + assert.equal(schema.safeParse({ index: 5, reason: 'match' }).success, true); assert.equal( - schema.safeParse({ output: { index: 5, reason: 'match' } }).success, - true, - ); - assert.equal( - schema.safeParse({ - output: { needsVisualGrounding: true, reason: 'not in list' }, - }).success, + schema.safeParse({ needsVisualGrounding: true, reason: 'not in list' }) + .success, true, ); assert.equal( - schema.safeParse({ output: { isError: true, reason: 'not visible' } }) - .success, + schema.safeParse({ isError: true, reason: 'not visible' }).success, true, ); }); @@ -113,22 +109,19 @@ test('grounder schema accepts index match, needsVisualGrounding, and error varia test('input-focus grounder schema accepts index, x/y, null-index, and error', () => { const schema = schemaForFeature(FEATURE_INPUT_FOCUS_GROUNDER); assert.equal( - schema.safeParse({ output: { index: 42, reason: 'match' } }).success, + schema.safeParse({ index: 42, reason: 'match' }).success, true, ); assert.equal( - schema.safeParse({ output: { index: null, reason: 'already focused' } }) - .success, + schema.safeParse({ index: null, reason: 'already focused' }).success, true, ); assert.equal( - schema.safeParse({ output: { x: 100, y: 200, reason: 'derived' } }) - .success, + schema.safeParse({ x: 100, y: 200, reason: 'derived' }).success, true, ); assert.equal( - schema.safeParse({ output: { isError: true, reason: 'not found' } }) - .success, + schema.safeParse({ isError: true, reason: 'not 
found' }).success, true, ); }); @@ -136,13 +129,11 @@ test('input-focus grounder schema accepts index, x/y, null-index, and error', () test('visual grounder schema accepts coordinates and error', () => { const schema = schemaForFeature(FEATURE_VISUAL_GROUNDER); assert.equal( - schema.safeParse({ output: { x: 540, y: 1200, reason: 'center of label' } }) - .success, + schema.safeParse({ x: 540, y: 1200, reason: 'center of label' }).success, true, ); assert.equal( - schema.safeParse({ output: { isError: true, reason: 'not visible' } }) - .success, + schema.safeParse({ isError: true, reason: 'not visible' }).success, true, ); }); @@ -151,20 +142,17 @@ test('scroll-index grounder schema accepts swipe vector and error', () => { const schema = schemaForFeature(FEATURE_SCROLL_INDEX_GROUNDER); assert.equal( schema.safeParse({ - output: { - start_x: 540, - start_y: 1800, - end_x: 540, - end_y: 400, - durationMs: 600, - reason: 'swipe up', - }, + start_x: 540, + start_y: 1800, + end_x: 540, + end_y: 400, + durationMs: 600, + reason: 'swipe up', }).success, true, ); assert.equal( - schema.safeParse({ output: { isError: true, reason: 'no container' } }) - .success, + schema.safeParse({ isError: true, reason: 'no container' }).success, true, ); }); @@ -172,26 +160,22 @@ test('scroll-index grounder schema accepts swipe vector and error', () => { test('launch-app grounder schema accepts minimal and full payloads', () => { const schema = schemaForFeature(FEATURE_LAUNCH_APP_GROUNDER); assert.equal( - schema.safeParse({ - output: { packageName: 'com.whatsapp', reason: 'exact match' }, - }).success, + schema.safeParse({ packageName: 'com.whatsapp', reason: 'exact match' }) + .success, true, ); assert.equal( schema.safeParse({ - output: { - packageName: 'com.example.myapp', - clearState: true, - allowAllPermissions: false, - permissions: { camera: 'allow', photos: 'allow' }, - reason: 'full config', - }, + packageName: 'com.example.myapp', + clearState: true, + allowAllPermissions: 
false, + permissions: { camera: 'allow', photos: 'allow' }, + reason: 'full config', }).success, true, ); assert.equal( - schema.safeParse({ output: { isError: true, reason: 'not found' } }) - .success, + schema.safeParse({ isError: true, reason: 'not found' }).success, true, ); }); @@ -199,14 +183,12 @@ test('launch-app grounder schema accepts minimal and full payloads', () => { test('set-location grounder schema accepts string coords and error', () => { const schema = schemaForFeature(FEATURE_SET_LOCATION_GROUNDER); assert.equal( - schema.safeParse({ - output: { lat: '37.7749', long: '-122.4194', reason: 'SF' }, - }).success, + schema.safeParse({ lat: '37.7749', long: '-122.4194', reason: 'SF' }) + .success, true, ); assert.equal( - schema.safeParse({ output: { isError: true, reason: 'unresolved' } }) - .success, + schema.safeParse({ isError: true, reason: 'unresolved' }).success, true, ); }); @@ -214,9 +196,8 @@ test('set-location grounder schema accepts string coords and error', () => { test('set-location grounder schema rejects numeric lat/long (spec requires strings)', () => { const schema = schemaForFeature(FEATURE_SET_LOCATION_GROUNDER); assert.equal( - schema.safeParse({ - output: { lat: 37.7749, long: -122.4194, reason: 'numeric' }, - }).success, + schema.safeParse({ lat: 37.7749, long: -122.4194, reason: 'numeric' }) + .success, false, ); }); @@ -229,8 +210,7 @@ test('schemaForFeature returns a schema for every known feature', () => { for (const feature of ALL_FEATURES) { assert.ok(schemaForFeature(feature), `missing schema for ${feature}`); } - // Silence unused-import warnings (these individual constants are tested - // implicitly through ALL_FEATURES, but kept explicit for readability). + // Keep individual imports live so a future rename lands a compile error here. 
void FEATURE_PLANNER; void FEATURE_GROUNDER; void FEATURE_VISUAL_GROUNDER; diff --git a/packages/goal-executor/src/ai/schemas.ts b/packages/goal-executor/src/ai/schemas.ts index b0fbdbb..b0435bb 100644 --- a/packages/goal-executor/src/ai/schemas.ts +++ b/packages/goal-executor/src/ai/schemas.ts @@ -1,14 +1,25 @@ -// Zod schemas for LLM structured output. +// Zod schemas for LLM structured output on the Anthropic path. // // The Vercel AI SDK's Anthropic adapter (`@ai-sdk/anthropic`) cannot enforce // JSON output without a schema — Anthropic has no schema-less JSON mode. When -// a schema is supplied, the adapter routes through Anthropic's tool-use API -// so Claude emits exactly one well-formed JSON object. +// a schema is supplied, the adapter routes through Anthropic's structured- +// output APIs (`output_format` or a `json` tool, depending on +// `structuredOutputMode`) so Claude emits exactly one well-formed JSON object. // // OpenAI (`response_format: json_object`) and Google // (`response_mime_type: application/json`) work schema-less today, so this // file is only consumed on the Anthropic call path in `AIAgent._callLLM`. // +// IMPORTANT — no outer `output` wrapper. +// The prompts tell the model to emit `{"output": {...}}` because OpenAI and +// Google are in text-JSON mode and the parsers look for that convention. +// On the Anthropic structured-output path, the schema IS the shape of the +// tool call arguments (or the `output_format` payload) — adding an `output` +// wrapper here causes Claude to nest twice: `{"output":{"output":{...}}}`. +// Schemas below describe the inner shape directly; `_parsePlannerResponse` +// and `_parseGrounderResponse` already accept both wrapped and unwrapped +// shapes via their fallback branches. +// // Each schema mirrors the corresponding prompt in `src/prompts/*.md`. When // a prompt changes, update the matching schema here. 
@@ -60,134 +71,120 @@ const plannerThoughtSchema = z .passthrough(); export const PLANNER_SCHEMA = z.object({ - output: z.object({ - thought: plannerThoughtSchema.optional(), - action: plannerActionSchema, - remember: z.array(z.string()).optional(), - }), + thought: plannerThoughtSchema.optional(), + action: plannerActionSchema, + remember: z.array(z.string()).optional(), }); // ---------------------------------------------------------------------------- // Grounder — per-feature shapes from the grounder prompt files // ---------------------------------------------------------------------------- -const errorOutputSchema = z.object({ - isError: z.literal(true), - reason: z.string(), -}); - // Numeric fields use plain z.number() — Anthropic's tool-schema validator // rejects `minimum`/`maximum` keywords on the `integer` type, and zod v4's // .int() emits those bounds by default. Downstream parsers already coerce // to integers where needed (ActionExecutor + GrounderResponseConverter). +const errorShape = z.object({ + isError: z.literal(true), + reason: z.string(), +}); + // `FEATURE_GROUNDER` — `prompts/grounder.md` // Three success variants: visual-fallback, index match, or error. -const grounderSchema = z.object({ - output: z.union([ - errorOutputSchema, - z - .object({ - needsVisualGrounding: z.literal(true), - reason: z.string(), - }) - .passthrough(), - z - .object({ - index: z.number(), - reason: z.string().optional(), - }) - .passthrough(), - ]), -}); +const grounderSchema = z.union([ + errorShape, + z + .object({ + needsVisualGrounding: z.literal(true), + reason: z.string(), + }) + .passthrough(), + z + .object({ + index: z.number(), + reason: z.string().optional(), + }) + .passthrough(), +]); // `FEATURE_INPUT_FOCUS_GROUNDER` — `prompts/input-focus-grounder.md` // Variants: index match, null index (already focused), x/y coords, or error. 
-const inputFocusGrounderSchema = z.object({ - output: z.union([ - errorOutputSchema, - z - .object({ - index: z.number().nullable(), - reason: z.string().optional(), - }) - .passthrough(), - z - .object({ - x: z.number(), - y: z.number(), - reason: z.string().optional(), - }) - .passthrough(), - ]), -}); +const inputFocusGrounderSchema = z.union([ + errorShape, + z + .object({ + index: z.number().nullable(), + reason: z.string().optional(), + }) + .passthrough(), + z + .object({ + x: z.number(), + y: z.number(), + reason: z.string().optional(), + }) + .passthrough(), +]); // `FEATURE_VISUAL_GROUNDER` — `prompts/visual-grounder.md` -const visualGrounderSchema = z.object({ - output: z.union([ - errorOutputSchema, - z - .object({ - x: z.number(), - y: z.number(), - reason: z.string().optional(), - }) - .passthrough(), - ]), -}); +const visualGrounderSchema = z.union([ + errorShape, + z + .object({ + x: z.number(), + y: z.number(), + reason: z.string().optional(), + }) + .passthrough(), +]); // `FEATURE_SCROLL_INDEX_GROUNDER` — `prompts/scroll-grounder.md` -const scrollIndexGrounderSchema = z.object({ - output: z.union([ - errorOutputSchema, - z - .object({ - start_x: z.number(), - start_y: z.number(), - end_x: z.number(), - end_y: z.number(), - durationMs: z.number(), - reason: z.string().optional(), - }) - .passthrough(), - ]), -}); +const scrollIndexGrounderSchema = z.union([ + errorShape, + z + .object({ + start_x: z.number(), + start_y: z.number(), + end_x: z.number(), + end_y: z.number(), + durationMs: z.number(), + reason: z.string().optional(), + }) + .passthrough(), +]); // `FEATURE_LAUNCH_APP_GROUNDER` — `prompts/launch-app-grounder.md` // Keep permissions and arguments as permissive records; the prompt documents // free-form values. 
-const launchAppGrounderSchema = z.object({ - output: z.union([ - errorOutputSchema, - z - .object({ - packageName: z.string(), - reason: z.string().optional(), - clearState: z.boolean().optional(), - allowAllPermissions: z.boolean().optional(), - stopAppBeforeLaunch: z.boolean().optional(), - shouldUninstallBeforeLaunch: z.boolean().optional(), - permissions: z.record(z.string(), z.string()).optional(), - arguments: z.record(z.string(), z.string()).optional(), - }) - .passthrough(), - ]), -}); +const launchAppGrounderSchema = z.union([ + errorShape, + z + .object({ + packageName: z.string(), + reason: z.string().optional(), + clearState: z.boolean().optional(), + allowAllPermissions: z.boolean().optional(), + stopAppBeforeLaunch: z.boolean().optional(), + shouldUninstallBeforeLaunch: z.boolean().optional(), + permissions: z.record(z.string(), z.string()).optional(), + arguments: z.record(z.string(), z.string()).optional(), + }) + .passthrough(), +]); // `FEATURE_SET_LOCATION_GROUNDER` — `prompts/set-location-grounder.md` // lat/long are strings by spec (4-6 decimal places). -const setLocationGrounderSchema = z.object({ - output: z.union([ - errorOutputSchema, - z - .object({ - lat: z.string(), - long: z.string(), - reason: z.string().optional(), - }) - .passthrough(), - ]), -}); +const setLocationGrounderSchema = z.union([ + errorShape, + z + .object({ + lat: z.string(), + long: z.string(), + reason: z.string().optional(), + }) + .passthrough(), +]); // ---------------------------------------------------------------------------- // Lookup From 4b4069f1b4270a4f7923e9b5885fb5b30fb46fc7 Mon Sep 17 00:00:00 2001 From: ashish Date: Sat, 18 Apr 2026 21:59:09 -0700 Subject: [PATCH 17/80] fix(ai): pin Anthropic to outputFormat structured-output mode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The @ai-sdk/anthropic adapter's getModelCapabilities() carries a hardcoded list of which models support structured output. 
Newer models (e.g. claude-opus-4-7) miss the list and fall through to the catch-all that returns supportsStructuredOutput: false, which makes `auto` fall back to a `json` tool wrapper. The wrapper collides with the prompt's {output: {...}} convention and Claude emits {input:{output:{...}}}, failing schema validation. Force structuredOutputMode: 'outputFormat' on every Anthropic call. This bypasses the SDK's stale model list entirely — Anthropic's API itself validates whether the model supports output_config.format and returns a clean HTTP 400 if not. Forward-compatible with every Claude 4.5+ model without any model-version checks on our side. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/configuration.md | 4 ++++ packages/goal-executor/src/ai/AIAgent.test.ts | 3 +++ packages/goal-executor/src/ai/AIAgent.ts | 9 +++++++++ 3 files changed, 16 insertions(+) diff --git a/docs/configuration.md b/docs/configuration.md index 7bfd7d3..13a5a60 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -84,6 +84,10 @@ When neither workspace `reasoning:` nor a per-feature `reasoning:` is set, Final - `planner` → `medium` - every grounder (`grounder`, `visual-grounder`, `scroll-index-grounder`, `input-focus-grounder`, `launch-app-grounder`, `set-location-grounder`) → `low` +### Anthropic Model Compatibility + +`anthropic/...` models must be Claude 4.5 or later (Sonnet 4.5+, Opus 4.5+, Haiku 4.5+, including Sonnet 4.6, Opus 4.6, and Opus 4.7). FinalRun uses Anthropic's native structured-output API (`output_config.format`) for guaranteed JSON, and only Claude 4.5+ supports it. Older Anthropic models will return HTTP 400 from the API. OpenAI and Google paths have no equivalent restriction. + ### Supported Configurations Three shapes are supported. Pick the simplest one that fits. 
diff --git a/packages/goal-executor/src/ai/AIAgent.test.ts b/packages/goal-executor/src/ai/AIAgent.test.ts index 051f9c0..4a10095 100644 --- a/packages/goal-executor/src/ai/AIAgent.test.ts +++ b/packages/goal-executor/src/ai/AIAgent.test.ts @@ -195,6 +195,7 @@ test('AIAgent uses medium Anthropic effort defaults for planner feature', () => assert.deepEqual(providerOptions, { anthropic: { effort: 'medium', + structuredOutputMode: 'outputFormat', }, }); }); @@ -209,6 +210,7 @@ test('AIAgent uses low Anthropic effort defaults for grounder feature', () => { assert.deepEqual(providerOptions, { anthropic: { effort: 'low', + structuredOutputMode: 'outputFormat', }, }); }); @@ -223,6 +225,7 @@ test('AIAgent applies Anthropic effort defaults without model-family gating', () assert.deepEqual(providerOptions, { anthropic: { effort: 'medium', + structuredOutputMode: 'outputFormat', }, }); }); diff --git a/packages/goal-executor/src/ai/AIAgent.ts b/packages/goal-executor/src/ai/AIAgent.ts index 8c7b9d9..8bb4d92 100644 --- a/packages/goal-executor/src/ai/AIAgent.ts +++ b/packages/goal-executor/src/ai/AIAgent.ts @@ -668,6 +668,15 @@ export class AIAgent { return { anthropic: { effort: reasoning as 'low' | 'medium' | 'high', + // Force Anthropic's native structured-output API + // (`output_config.format`). The SDK's `auto` mode falls back to a + // `json` tool wrapper when its hardcoded model-capability table + // doesn't recognize the model — but that table lags behind new + // releases (e.g. Opus 4.7 isn't listed even though it supports + // structured output). Pinning `outputFormat` makes us forward- + // compatible with every Claude 4.5+ model without any + // model-version checks on our side. 
+ structuredOutputMode: 'outputFormat', } satisfies AnthropicLanguageModelOptions, }; default: From 659693e70055e0cdb10a03dd73c40f4fd9ecbb4f Mon Sep 17 00:00:00 2001 From: ashish Date: Sat, 18 Apr 2026 23:03:27 -0700 Subject: [PATCH 18/80] fix(ai): route OpenAI through Responses API so reasoningEffort takes effect @ai-sdk/openai v3's openai(modelId) defaults to the Chat Completions API, which silently ignores providerOptions.openai.reasoningEffort. Reasoning models like gpt-5.4-mini only honor reasoning effort via the Responses API. Use openai.responses(modelId) explicitly so the effort setting is actually applied. Co-Authored-By: Claude Opus 4.7 (1M context) --- packages/goal-executor/src/ai/AIAgent.ts | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/packages/goal-executor/src/ai/AIAgent.ts b/packages/goal-executor/src/ai/AIAgent.ts index 8bb4d92..a7b439d 100644 --- a/packages/goal-executor/src/ai/AIAgent.ts +++ b/packages/goal-executor/src/ai/AIAgent.ts @@ -617,7 +617,11 @@ export class AIAgent { switch (resolved.provider) { case 'openai': { const openai = createOpenAI({ apiKey }); - client = openai(resolved.modelName); + // Use the Responses API (not Chat Completions) so that + // `providerOptions.openai.reasoningEffort` is honored by reasoning + // models like gpt-5.4-mini. `openai(modelId)` defaults to Chat + // Completions and silently ignores reasoning effort. + client = openai.responses(resolved.modelName); break; } case 'google': { From 6967a1313a6da288df0c2c3d10429019629a29cf Mon Sep 17 00:00:00 2001 From: ashish Date: Sat, 18 Apr 2026 23:10:27 -0700 Subject: [PATCH 19/80] refactor(ai): harden per-feature model resolution and run-context capture Address several CodeRabbit comments as one cohesive tightening: - Lift parseModel (+ SUPPORTED_AI_PROVIDERS, PROVIDER_ENV_VARS, MODEL_FORMAT_EXAMPLE, ParsedModel, SupportedProvider) into @finalrun/common and add an optional label parameter for error context. 
The CLI re-exports the names so existing imports keep working. - _resolveFeatureConfig now calls parseModel with a `features..model` label instead of open-coding a looser split, so invalid per-feature overrides (empty halves, unsupported providers, whitespace quirks) fail with the same messages as the --model flag and workspace-level model. - Narrow GrounderRequest.feature from string to FeatureName and drop the two `as FeatureName` casts; propagate the narrowing through ActionExecutor._groundToPoint and _groundTraceDetail so a typo can't silently reach schemaForFeature or the wrong prompt. - Align resolveApiKeys with resolveApiKey: an empty/whitespace --api-key value falls through to env-var lookup instead of being accepted as the literal key. - Persist workspace-level reasoning and per-feature overrides in run-context.json via writeRunInputs so mixed-provider runs are reproducible from artifacts. Co-Authored-By: Claude Opus 4.7 (1M context) --- packages/cli/src/apiKey.test.ts | 10 +++ packages/cli/src/apiKey.ts | 5 +- packages/cli/src/env.test.ts | 20 ++++++ packages/cli/src/env.ts | 58 +++-------------- packages/cli/src/reportWriter.ts | 26 +++++--- packages/cli/src/testRunner.ts | 4 ++ packages/common/src/constants.ts | 68 ++++++++++++++++++++ packages/goal-executor/src/ActionExecutor.ts | 7 +- packages/goal-executor/src/ai/AIAgent.ts | 21 +++--- 9 files changed, 145 insertions(+), 74 deletions(-) diff --git a/packages/cli/src/apiKey.test.ts b/packages/cli/src/apiKey.test.ts index b1cbf1c..6d00a5d 100644 --- a/packages/cli/src/apiKey.test.ts +++ b/packages/cli/src/apiKey.test.ts @@ -143,6 +143,16 @@ test('resolveApiKeys routes --api-key to the single active provider', () => { assert.deepEqual(keys, { openai: 'flag-key' }); }); +test('resolveApiKeys treats empty --api-key as unset and falls through to env vars', () => { + const keys = resolveApiKeys({ + env: createEnv({ OPENAI_API_KEY: 'env-key' }), + providers: ['openai'], + providedApiKey: '', + }); + + 
assert.deepEqual(keys, { openai: 'env-key' }); +}); + test('resolveApiKeys rejects --api-key when multiple providers are configured', () => { assert.throws( () => diff --git a/packages/cli/src/apiKey.ts b/packages/cli/src/apiKey.ts index 1b1867d..0ea93df 100644 --- a/packages/cli/src/apiKey.ts +++ b/packages/cli/src/apiKey.ts @@ -38,7 +38,10 @@ export function resolveApiKeys(params: { throw new Error('At least one provider must be specified when resolving API keys.'); } - if (params.providedApiKey !== undefined) { + // Match `resolveApiKey` semantics: an empty/whitespace --api-key value + // falls through to env-var lookup rather than being treated as "this is + // the key." Keeps the two resolvers consistent. + if (params.providedApiKey) { if (providers.length > 1) { throw new Error( `--api-key is only valid when a single provider is active. This run uses multiple providers (${providers.join(', ')}). Provide the per-provider env vars instead: ${providers diff --git a/packages/cli/src/env.test.ts b/packages/cli/src/env.test.ts index dfe21b6..af4952b 100644 --- a/packages/cli/src/env.test.ts +++ b/packages/cli/src/env.test.ts @@ -44,6 +44,26 @@ test('parseModel rejects unsupported providers', () => { ); }); +test('parseModel prefixes errors with the provided label for context', () => { + // Trailing whitespace after the slash collapses under the outer trim, so + // the echoed value is "openai/" (empty model half) and the label prefix + // points the user at the exact config entry that tripped validation. + assert.throws( + () => parseModel('openai/ ', 'features.planner.model'), + /features\.planner\.model has invalid model format: "openai\/"\./, + ); + assert.throws( + () => parseModel('bedrock/claude', 'features.planner.model'), + /features\.planner\.model has unsupported AI provider: "bedrock"\./, + ); + // Sanity: omitting the label keeps the pre-existing CLI-style error text + // that other tests (and --model users) depend on. 
+ assert.throws( + () => parseModel(undefined), + /--model is required\./, + ); +}); + test('parseReasoningLevel returns undefined when unset', () => { assert.equal(parseReasoningLevel(undefined, 'reasoning'), undefined); assert.equal(parseReasoningLevel(null, 'reasoning'), undefined); diff --git a/packages/cli/src/env.ts b/packages/cli/src/env.ts index a5affe7..4bf6963 100644 --- a/packages/cli/src/env.ts +++ b/packages/cli/src/env.ts @@ -5,6 +5,15 @@ import * as dotenv from 'dotenv'; import * as path from 'path'; import * as fs from 'fs'; import { REASONING_LEVELS, type ReasoningLevel } from '@finalrun/common'; +export { + MODEL_FORMAT_EXAMPLE, + PROVIDER_ENV_VARS, + SUPPORTED_AI_PROVIDERS, + SUPPORTED_AI_PROVIDERS_LABEL, + parseModel, + type ParsedModel, + type SupportedProvider, +} from '@finalrun/common'; /** * Environment configuration for the CLI. @@ -81,55 +90,6 @@ export class CliEnv { } } -export interface ParsedModel { - provider: string; - modelName: string; -} - -export const SUPPORTED_AI_PROVIDERS = ['openai', 'google', 'anthropic'] as const; -export const SUPPORTED_AI_PROVIDERS_LABEL = SUPPORTED_AI_PROVIDERS.join(', '); -export const MODEL_FORMAT_EXAMPLE = 'google/gemini-3-flash-preview'; -export const PROVIDER_ENV_VARS: Record<(typeof SUPPORTED_AI_PROVIDERS)[number], string> = { - openai: 'OPENAI_API_KEY', - google: 'GOOGLE_API_KEY', - anthropic: 'ANTHROPIC_API_KEY', -}; - -export function parseModel(modelStr: string | undefined): ParsedModel { - const normalizedModel = modelStr?.trim(); - if (!normalizedModel) { - throw new Error( - `--model is required. Use provider/model, for example ${MODEL_FORMAT_EXAMPLE}. Supported providers: ${SUPPORTED_AI_PROVIDERS_LABEL}.`, - ); - } - - const segments = normalizedModel.split('/'); - if ( - segments.length !== 2 || - segments[0] === undefined || - segments[1] === undefined || - segments[0].trim() === '' || - segments[1].trim() === '' - ) { - throw new Error( - `Invalid model format: "${normalizedModel}". 
Expected provider/model with non-empty provider and model name. Supported providers: ${SUPPORTED_AI_PROVIDERS_LABEL}.`, - ); - } - - const provider = segments[0].trim(); - const modelName = segments[1].trim(); - if (!SUPPORTED_AI_PROVIDERS.includes(provider as (typeof SUPPORTED_AI_PROVIDERS)[number])) { - throw new Error( - `Unsupported AI provider: "${provider}". Supported providers: ${SUPPORTED_AI_PROVIDERS_LABEL}.`, - ); - } - - return { - provider, - modelName, - }; -} - export const REASONING_LEVELS_LABEL = REASONING_LEVELS.join(', '); export function parseReasoningLevel(value: unknown, label: string): ReasoningLevel | undefined { diff --git a/packages/cli/src/reportWriter.ts b/packages/cli/src/reportWriter.ts index e019cbb..80d386f 100644 --- a/packages/cli/src/reportWriter.ts +++ b/packages/cli/src/reportWriter.ts @@ -3,24 +3,26 @@ import * as fsp from 'node:fs/promises'; import * as path from 'node:path'; import YAML from 'yaml'; import { + type AgentAction, type BindingReference, + type EnvironmentRecord, type FailurePhase, - Logger, - type TestDefinition, - type SuiteDefinition, + type FeatureOverrides, + type FirstFailure, type LogEntry, + Logger, type LoggerSink, - type RunManifestAppRecord, - type EnvironmentRecord, - type FirstFailure, + type ReasoningLevel, type RunManifest, - type TestResult, - type AgentAction, + type RunManifestAppRecord, type RunStatus, - type TestStatus, - type RunTarget, type RunSummary, + type RunTarget, type RuntimeBindings, + type SuiteDefinition, + type TestDefinition, + type TestResult, + type TestStatus, redactResolvedValue, } from '@finalrun/common'; import type { TestExecutionResult, AgentActionResult } from '@finalrun/goal-executor'; @@ -160,6 +162,8 @@ export class ReportWriter { target: RunTarget; cli: { command: string; selectors: string[]; debug: boolean; [key: string]: unknown }; model: { provider: string; modelName: string; label: string }; + reasoning?: ReasoningLevel; + features?: FeatureOverrides; app: 
RunManifestAppRecord; }): Promise { const inputDir = path.join(this._runDir, 'input'); @@ -179,6 +183,8 @@ export class ReportWriter { { cli: params.cli, model: params.model, + ...(params.reasoning !== undefined ? { reasoning: params.reasoning } : {}), + ...(params.features !== undefined ? { features: params.features } : {}), app: params.app, target: params.target, }, diff --git a/packages/cli/src/testRunner.ts b/packages/cli/src/testRunner.ts index 21b5eb9..bec54c0 100644 --- a/packages/cli/src/testRunner.ts +++ b/packages/cli/src/testRunner.ts @@ -264,6 +264,10 @@ export async function runTests(options: TestRunnerOptions): Promise = { + openai: 'OPENAI_API_KEY', + google: 'GOOGLE_API_KEY', + anthropic: 'ANTHROPIC_API_KEY', +}; + +export interface ParsedModel { + provider: SupportedProvider; + modelName: string; +} + +/** + * Parse a `provider/model` string (e.g. `openai/gpt-5.4-mini`) into its + * provider and model name. Validates that both halves are non-empty after + * trimming and that the provider is one of `SUPPORTED_AI_PROVIDERS`. + * + * @param modelStr the raw string from YAML or the CLI `--model` flag + * @param label optional context prefix for errors (e.g. `features.planner.model`). + * When omitted, errors read as CLI-style (`--model is required...`). + */ +export function parseModel(modelStr: string | undefined, label?: string): ParsedModel { + const normalizedModel = modelStr?.trim(); + if (!normalizedModel) { + throw new Error( + label + ? `${label} is required. Use provider/model, for example ${MODEL_FORMAT_EXAMPLE}. Supported providers: ${SUPPORTED_AI_PROVIDERS_LABEL}.` + : `--model is required. Use provider/model, for example ${MODEL_FORMAT_EXAMPLE}. 
Supported providers: ${SUPPORTED_AI_PROVIDERS_LABEL}.`, + ); + } + + const segments = normalizedModel.split('/'); + if ( + segments.length !== 2 || + segments[0] === undefined || + segments[1] === undefined || + segments[0].trim() === '' || + segments[1].trim() === '' + ) { + const detail = `Expected provider/model with non-empty provider and model name. Supported providers: ${SUPPORTED_AI_PROVIDERS_LABEL}.`; + throw new Error( + label + ? `${label} has invalid model format: "${normalizedModel}". ${detail}` + : `Invalid model format: "${normalizedModel}". ${detail}`, + ); + } + + const provider = segments[0].trim(); + const modelName = segments[1].trim(); + if (!SUPPORTED_AI_PROVIDERS.includes(provider as SupportedProvider)) { + throw new Error( + label + ? `${label} has unsupported AI provider: "${provider}". Supported providers: ${SUPPORTED_AI_PROVIDERS_LABEL}.` + : `Unsupported AI provider: "${provider}". Supported providers: ${SUPPORTED_AI_PROVIDERS_LABEL}.`, + ); + } + + return { provider: provider as SupportedProvider, modelName }; +} + /** * Per-feature override resolved from `features:` in .finalrun/config.yaml. * Each field is optional; unset fields inherit workspace-level defaults. 
diff --git a/packages/goal-executor/src/ActionExecutor.ts b/packages/goal-executor/src/ActionExecutor.ts index 5328c26..6baa234 100644 --- a/packages/goal-executor/src/ActionExecutor.ts +++ b/packages/goal-executor/src/ActionExecutor.ts @@ -40,6 +40,7 @@ import { PLANNER_ACTION_SET_LOCATION, PLANNER_ACTION_WAIT, PLANNER_ACTION_DEEPLINK, + type FeatureName, type RuntimeBindings, redactResolvedValue, resolveRuntimePlaceholders, @@ -773,7 +774,7 @@ export class ActionExecutor { private async _groundToPoint( input: ActionInput, - feature: string, + feature: FeatureName, tracePhase: string, ): Promise { const grounderResponse = await this._callGrounder(input, { @@ -937,7 +938,7 @@ export class ActionExecutor { private async _callGrounder( input: ActionInput, request: { - feature: string; + feature: FeatureName; act: string; hierarchy?: Hierarchy; screenshot?: string; @@ -1094,7 +1095,7 @@ export class ActionExecutor { private _groundTraceDetail( trace: LLMTrace | undefined, - feature: string, + feature: FeatureName, reason?: string, ): string { const detail = `feature=${feature}${reason ? 
` reason=${reason}` : ''}`; diff --git a/packages/goal-executor/src/ai/AIAgent.ts b/packages/goal-executor/src/ai/AIAgent.ts index a7b439d..7339c52 100644 --- a/packages/goal-executor/src/ai/AIAgent.ts +++ b/packages/goal-executor/src/ai/AIAgent.ts @@ -43,6 +43,7 @@ import { PLANNER_ACTION_COMPLETED, PLANNER_ACTION_FAILED, PLANNER_ACTION_DEEPLINK, + parseModel, type FeatureName, type FeatureOverrides, type ModelDefaults, @@ -107,7 +108,7 @@ export interface PlannerResponse { } export interface GrounderRequest { - feature: string; + feature: FeatureName; act: string; hierarchy?: Hierarchy; screenshot?: string; // base64 @@ -412,7 +413,7 @@ export class AIAgent { const llmResult = await this._callLLM( systemPrompt, userParts, - request.feature as FeatureName, + request.feature, ); rawOutput = llmResult.output; rawText = llmResult.text; @@ -582,14 +583,12 @@ export class AIAgent { let provider = this._defaults.provider; let modelName = this._defaults.modelName; if (override?.model) { - const slash = override.model.indexOf('/'); - if (slash <= 0 || slash === override.model.length - 1) { - throw new Error( - `Invalid model override for feature "${feature}": "${override.model}". Expected provider/model.`, - ); - } - provider = override.model.slice(0, slash).trim(); - modelName = override.model.slice(slash + 1).trim(); + // Reuse the shared parser so per-feature overrides fail with the same + // validation errors (empty provider/model, unsupported provider) as + // workspace-level `model:` and the `--model` CLI flag. + const parsed = parseModel(override.model, `features.${feature}.model`); + provider = parsed.provider; + modelName = parsed.modelName; } const reasoning: ReasoningLevel = override?.reasoning ?? this._defaults.reasoning ?? 
DEFAULT_REASONING_BY_PHASE[phaseForFeature(feature)]; @@ -805,7 +804,7 @@ export class AIAgent { private _summarizeGrounderRequest(req: GrounderRequest): string { const parts: string[] = ['[AI ground]']; parts.push(this._formatLogContext(req.logContext, req.traceStep)); - const grounderResolved = this._resolveFeatureConfig(req.feature as FeatureName); + const grounderResolved = this._resolveFeatureConfig(req.feature); parts.push(`provider=${grounderResolved.provider}/${grounderResolved.modelName}`); parts.push(`feature=${req.feature}`); parts.push(this._screenshotMetric('screenshot', req.screenshot)); From b31afb44f2b144a7c2205fd97ff184b047931128 Mon Sep 17 00:00:00 2001 From: ashish Date: Sun, 19 Apr 2026 14:37:22 -0700 Subject: [PATCH 20/80] chore: add mintlify-docs folder placeholder Co-Authored-By: Claude Opus 4.7 --- mintlify-docs/.keep | 0 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 mintlify-docs/.keep diff --git a/mintlify-docs/.keep b/mintlify-docs/.keep new file mode 100644 index 0000000..e69de29 From 32161a6c31ba17d980da27774fcad0dd6495639f Mon Sep 17 00:00:00 2001 From: ashish Date: Sun, 19 Apr 2026 14:38:05 -0700 Subject: [PATCH 21/80] docs: add mintlify community docs content Co-Authored-By: Claude Opus 4.7 --- mintlify-docs/.atlas-analysis.json | 84 +++++++++ mintlify-docs/.keep | 0 mintlify-docs/.mintignore | 7 + mintlify-docs/AGENTS.md | 33 ++++ mintlify-docs/CONTRIBUTING.md | 34 ++++ mintlify-docs/LICENSE | 21 +++ mintlify-docs/README.md | 55 ++++++ mintlify-docs/configuration/ai-providers.mdx | 88 ++++++++++ mintlify-docs/configuration/environments.mdx | 101 +++++++++++ mintlify-docs/configuration/workspace.mdx | 112 ++++++++++++ mintlify-docs/docs.json | 69 ++++++++ mintlify-docs/faq.mdx | 103 +++++++++++ mintlify-docs/index.mdx | 67 ++++++++ mintlify-docs/installation.mdx | 137 +++++++++++++++ mintlify-docs/introduction.mdx | 83 +++++++++ mintlify-docs/quickstart.mdx | 163 ++++++++++++++++++ 
mintlify-docs/running/ai-agent-skills.mdx | 120 +++++++++++++ mintlify-docs/running/cli-reference.mdx | 150 ++++++++++++++++ mintlify-docs/running/reports.mdx | 101 +++++++++++ mintlify-docs/tests/placeholders.mdx | 106 ++++++++++++ mintlify-docs/tests/suites.mdx | 63 +++++++ mintlify-docs/tests/yaml-format.mdx | 171 +++++++++++++++++++ mintlify-docs/troubleshooting.mdx | 160 +++++++++++++++++ 23 files changed, 2028 insertions(+) create mode 100644 mintlify-docs/.atlas-analysis.json delete mode 100644 mintlify-docs/.keep create mode 100644 mintlify-docs/.mintignore create mode 100644 mintlify-docs/AGENTS.md create mode 100644 mintlify-docs/CONTRIBUTING.md create mode 100644 mintlify-docs/LICENSE create mode 100644 mintlify-docs/README.md create mode 100644 mintlify-docs/configuration/ai-providers.mdx create mode 100644 mintlify-docs/configuration/environments.mdx create mode 100644 mintlify-docs/configuration/workspace.mdx create mode 100644 mintlify-docs/docs.json create mode 100644 mintlify-docs/faq.mdx create mode 100644 mintlify-docs/index.mdx create mode 100644 mintlify-docs/installation.mdx create mode 100644 mintlify-docs/introduction.mdx create mode 100644 mintlify-docs/quickstart.mdx create mode 100644 mintlify-docs/running/ai-agent-skills.mdx create mode 100644 mintlify-docs/running/cli-reference.mdx create mode 100644 mintlify-docs/running/reports.mdx create mode 100644 mintlify-docs/tests/placeholders.mdx create mode 100644 mintlify-docs/tests/suites.mdx create mode 100644 mintlify-docs/tests/yaml-format.mdx create mode 100644 mintlify-docs/troubleshooting.mdx diff --git a/mintlify-docs/.atlas-analysis.json b/mintlify-docs/.atlas-analysis.json new file mode 100644 index 0000000..f050d12 --- /dev/null +++ b/mintlify-docs/.atlas-analysis.json @@ -0,0 +1,84 @@ +{ + "projectType": "cli-tool", + "projectName": "FinalRun", + "projectDescription": "An AI-driven CLI that tests Android and iOS apps using natural language YAML specs, executing steps on real 
devices or emulators with Gemini, GPT, or Claude.", + "theme": "luma", + "primaryColor": "#3f4fe8", + "lightColor": "#d0a7f7", + "darkColor": "#6c4cfc", + "navigation": { + "tabs": [ + { + "tab": "Docs", + "groups": [ + { + "group": "Get Started", + "pages": [ + "introduction", + "quickstart", + "installation" + ] + }, + { + "group": "Writing Tests", + "pages": [ + "tests/yaml-format", + "tests/suites", + "tests/placeholders" + ] + }, + { + "group": "Configuration", + "pages": [ + "configuration/workspace", + "configuration/environments", + "configuration/ai-providers" + ] + }, + { + "group": "Running Tests", + "pages": [ + "running/cli-reference", + "running/ai-agent-skills", + "running/reports" + ] + }, + { + "group": "Help", + "pages": [ + "troubleshooting", + "faq" + ] + } + ] + } + ] + }, + "keyFeatures": [ + "Natural language YAML test specs for Android and iOS", + "AI-powered test execution using Gemini, GPT, or Claude", + "Three-phase test model: setup, steps, expected_state", + "AI agent skills for generating, running, and fixing tests", + "Local report viewer with video, screenshots, and device logs", + "BYOK (Bring Your Own Key) — use your own AI provider API key", + "Multi-environment support with secrets and variable bindings", + "One-command install and host readiness check via finalrun doctor" + ], + "publicApiSurface": [ + "finalrun test ", + "finalrun suite ", + "finalrun check [selectors...]", + "finalrun doctor", + "finalrun runs", + "finalrun start-server", + "finalrun stop-server", + "finalrun server-status", + "/finalrun-generate-test skill", + "/finalrun-use-cli skill", + "/finalrun-test-and-fix skill", + ".finalrun/config.yaml", + ".finalrun/tests//.yaml", + ".finalrun/suites/.yaml", + ".finalrun/env/.yaml" + ] +} diff --git a/mintlify-docs/.keep b/mintlify-docs/.keep deleted file mode 100644 index e69de29..0000000 diff --git a/mintlify-docs/.mintignore b/mintlify-docs/.mintignore new file mode 100644 index 0000000..9922f06 --- /dev/null +++ 
b/mintlify-docs/.mintignore @@ -0,0 +1,7 @@ +# Mintlify automatically ignores these files and directories: +# .git, .github, .claude, .agents, .idea, node_modules, +# README.md, LICENSE.md, CHANGELOG.md, CONTRIBUTING.md + +# Draft content +drafts/ +*.draft.mdx diff --git a/mintlify-docs/AGENTS.md b/mintlify-docs/AGENTS.md new file mode 100644 index 0000000..cebd973 --- /dev/null +++ b/mintlify-docs/AGENTS.md @@ -0,0 +1,33 @@ +> **First-time setup**: Customize this file for your project. Prompt the user to customize this file for their project. +> For Mintlify product knowledge (components, configuration, writing standards), +> install the Mintlify skill: `npx skills add https://mintlify.com/docs` + +# Documentation project instructions + +## About this project + +- This is a documentation site built on [Mintlify](https://mintlify.com) +- Pages are MDX files with YAML frontmatter +- Configuration lives in `docs.json` +- Run `mint dev` to preview locally +- Run `mint broken-links` to check links + +## Terminology + +{/* Add product-specific terms and preferred usage */} +{/* Example: Use "workspace" not "project", "member" not "user" */} + +## Style preferences + +{/* Add any project-specific style rules below */} + +- Use active voice and second person ("you") +- Keep sentences concise — one idea per sentence +- Use sentence case for headings +- Bold for UI elements: Click **Settings** +- Code formatting for file names, commands, paths, and code references + +## Content boundaries + +{/* Define what should and shouldn't be documented */} +{/* Example: Don't document internal admin features */} diff --git a/mintlify-docs/CONTRIBUTING.md b/mintlify-docs/CONTRIBUTING.md new file mode 100644 index 0000000..8863ee4 --- /dev/null +++ b/mintlify-docs/CONTRIBUTING.md @@ -0,0 +1,34 @@ +> **Customize this file**: Tailor this template to your project by noting specific contribution types you're looking for, adding a Code of Conduct, or adjusting the writing guidelines to match 
your style. + +# Contribute to the documentation + +Thank you for your interest in contributing to our documentation! This guide will help you get started. + +## How to contribute + +### Option 1: Edit directly on GitHub + +1. Navigate to the page you want to edit +2. Click the "Edit this file" button (the pencil icon) +3. Make your changes and submit a pull request + +### Option 2: Local development + +1. Fork and clone this repository +2. Install the Mintlify CLI: `npm i -g mint` +3. Create a branch for your changes +4. Make changes +5. Navigate to the docs directory and run `mint dev` +6. Preview your changes at `http://localhost:3000` +7. Commit your changes and submit a pull request + +For more details on local development, see our [development guide](development.mdx). + +## Writing guidelines + +- **Use active voice**: "Run the command" not "The command should be run" +- **Address the reader directly**: Use "you" instead of "the user" +- **Keep sentences concise**: Aim for one idea per sentence +- **Lead with the goal**: Start instructions with what the user wants to accomplish +- **Use consistent terminology**: Don't alternate between synonyms for the same concept +- **Include examples**: Show, don't just tell diff --git a/mintlify-docs/LICENSE b/mintlify-docs/LICENSE new file mode 100644 index 0000000..5411374 --- /dev/null +++ b/mintlify-docs/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2023 Mintlify + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions 
of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. \ No newline at end of file diff --git a/mintlify-docs/README.md b/mintlify-docs/README.md new file mode 100644 index 0000000..4552fbc --- /dev/null +++ b/mintlify-docs/README.md @@ -0,0 +1,55 @@ +# Mintlify Starter Kit + +Use the starter kit to get your docs deployed and ready to customize. + +Click the green **Use this template** button at the top of this repo to copy the Mintlify starter kit. The starter kit contains examples with + +- Guide pages +- Navigation +- Customizations +- API reference pages +- Use of popular components + +**[Follow the full quickstart guide](https://starter.mintlify.com/quickstart)** + +## AI-assisted writing + +Set up your AI coding tool to work with Mintlify: + +```bash +npx skills add https://mintlify.com/docs +``` + +This command installs Mintlify's documentation skill for your configured AI tools like Claude Code, Cursor, Windsurf, and others. The skill includes component reference, writing standards, and workflow guidance. + +See the [AI tools guides](/ai-tools) for tool-specific setup. + +## Development + +Install the [Mintlify CLI](https://www.npmjs.com/package/mint) to preview your documentation changes locally. To install, use the following command: + +``` +npm i -g mint +``` + +Run the following command at the root of your documentation, where your `docs.json` is located: + +``` +mint dev +``` + +View your local preview at `http://localhost:3000`. 
+ +## Publishing changes + +Install our GitHub app from your [dashboard](https://dashboard.mintlify.com/settings/organization/github-app) to propagate changes from your repo to your deployment. Changes are deployed to production automatically after pushing to the default branch. + +## Need help? + +### Troubleshooting + +- If your dev environment isn't running: Run `mint update` to ensure you have the most recent version of the CLI. +- If a page loads as a 404: Make sure you are running in a folder with a valid `docs.json`. + +### Resources +- [Mintlify documentation](https://mintlify.com/docs) diff --git a/mintlify-docs/configuration/ai-providers.mdx b/mintlify-docs/configuration/ai-providers.mdx new file mode 100644 index 0000000..546e6ae --- /dev/null +++ b/mintlify-docs/configuration/ai-providers.mdx @@ -0,0 +1,88 @@ +--- +title: "Connect FinalRun to Google, OpenAI, or Anthropic AI" +sidebarTitle: "AI Providers" +description: "Bring your own API key to FinalRun and configure Google Gemini, OpenAI GPT, or Anthropic Claude as the AI model that drives your Android and iOS test runs." +--- + +FinalRun uses a bring-your-own-key (BYOK) model — it does not proxy AI requests through its own infrastructure. When you run a test, the CLI calls your chosen AI provider directly using the API key you supply. This gives you full visibility into token usage and billing, and lets you use whichever model tier your team has access to. + +## Supported providers + +| Provider prefix | Environment variable | Recommended model family | +|---|---|---| +| `google/...` | `GOOGLE_API_KEY` | Gemini 3 family and above | +| `openai/...` | `OPENAI_API_KEY` | GPT-5 family and above | +| `anthropic/...` | `ANTHROPIC_API_KEY` | Claude Sonnet 4 / Opus 4 and above | + +The provider is inferred from the prefix of the `--model` value or the `model` field in `.finalrun/config.yaml`. 
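
The prefix rule above can be sketched roughly as follows. This is a minimal illustration, not FinalRun's actual implementation: `resolveModel`, `envVarForProvider`, and `ResolvedModel` are hypothetical names, and only the prefix-to-variable mapping comes from the table.

```typescript
// Hypothetical sketch: not FinalRun's real code. Only the mapping below
// (provider prefix -> environment variable) is taken from the table above.
interface ResolvedModel {
  provider: string;
  modelName: string;
  envVar: string;
}

// Returns the API key variable for a provider prefix, or undefined if unknown.
function envVarForProvider(provider: string): string | undefined {
  switch (provider) {
    case "google":
      return "GOOGLE_API_KEY";
    case "openai":
      return "OPENAI_API_KEY";
    case "anthropic":
      return "ANTHROPIC_API_KEY";
    default:
      return undefined;
  }
}

// Splits a "provider/model-name" value at the first slash and looks up
// which environment variable should hold the API key.
function resolveModel(model: string): ResolvedModel {
  const slash = model.indexOf("/");
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error('Expected "provider/model-name", got "' + model + '"');
  }
  const provider = model.slice(0, slash);
  const modelName = model.slice(slash + 1);
  const envVar = envVarForProvider(provider);
  if (envVar === undefined) {
    throw new Error('Unsupported provider prefix "' + provider + '"');
  }
  return { provider: provider, modelName: modelName, envVar: envVar };
}
```

Under these assumptions, `resolveModel("anthropic/claude-sonnet-4-6")` would point at `ANTHROPIC_API_KEY`, while a value without a prefix, such as `gpt-5`, is rejected instead of guessed.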
+ +## Setting your API key + +You can supply an API key in three ways: + + + + Add the key to a `.env` file at your workspace root. This is the recommended approach for local development. + + ```bash + echo "GOOGLE_API_KEY=your-key-here" > .env + ``` + + The file is read automatically on every run. See [Managing environments and secrets](/configuration/environments) for dotenv load order details. + + + Export the variable in your shell session or CI environment: + + ```bash + export GOOGLE_API_KEY=your-key-here + ``` + + Shell environment variables take the highest priority in FinalRun's load order. + + + Pass the key directly as a CLI flag for a one-off run without modifying any file: + + ```bash + finalrun test smoke.yaml --api-key your-key-here --model google/gemini-3-flash-preview + ``` + + + +## Setting a default model + +Add a `model` field to `.finalrun/config.yaml` so you don't need to pass `--model` on every command: + +```yaml .finalrun/config.yaml +model: google/gemini-3-flash-preview +``` + +The model value must use `provider/model-name` format. Examples: `google/gemini-3-flash-preview`, `anthropic/claude-sonnet-4-6`, `openai/gpt-5`. + + + Set `model` in `.finalrun/config.yaml` once during workspace setup. After that, commands like `finalrun test smoke.yaml --platform android` work without an explicit `--model` flag. + + +## Provider setup examples + + + +```bash Google Gemini +echo "GOOGLE_API_KEY=your-key-here" > .env +finalrun test smoke.yaml --platform android --model google/gemini-3-flash-preview +``` + +```bash OpenAI GPT +echo "OPENAI_API_KEY=your-key-here" > .env +finalrun test smoke.yaml --platform android --model openai/gpt-5 +``` + +```bash Anthropic Claude +echo "ANTHROPIC_API_KEY=your-key-here" > .env +finalrun test smoke.yaml --platform ios --model anthropic/claude-sonnet-4-6 +``` + + + + + Test runs consume AI provider tokens. 
Standard API billing from your provider applies — FinalRun does not add any markup or usage fees on top of provider costs. + diff --git a/mintlify-docs/configuration/environments.mdx b/mintlify-docs/configuration/environments.mdx new file mode 100644 index 0000000..19caa99 --- /dev/null +++ b/mintlify-docs/configuration/environments.mdx @@ -0,0 +1,101 @@ +--- +title: "FinalRun environments: secrets, variables, and overrides" +sidebarTitle: "Environments" +description: "Use named environment files and dotenv files to manage variable bindings, secret placeholders, and per-environment app identity overrides in FinalRun." +--- + +Environments are named profiles that group variable bindings, secret placeholders, and optional app identity overrides. You define an environment by creating a YAML file at `.finalrun/env/.yaml`, then activate it with `--env ` or by setting `env: ` in `.finalrun/config.yaml`. Separating environments lets you run the same test specs against development, staging, and production configurations without touching the tests themselves. + +## Environment file structure + +Each environment file lives at `.finalrun/env/.yaml` and can contain three top-level blocks: + + + Per-environment app identity override. Values here take priority over the workspace default in `config.yaml`. Useful when your staging or debug build uses a different package name or bundle ID. + + + + Placeholder bindings for sensitive values. Each value uses `${SHELL_ENV_VAR}` syntax. The CLI resolves each placeholder from the shell environment or from workspace-root dotenv files at runtime. Do not put real secrets in this file. + + + + Plain, non-sensitive values such as locale strings, feature flags, or base URLs. Safe to commit. 
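
To make the `secrets` placeholder behaviour more concrete, here is a small sketch of how `${SHELL_ENV_VAR}` bindings might be resolved. It is an illustration under stated assumptions, not the CLI's real resolver: `resolveSecrets` is a hypothetical name, and `source` stands in for the already merged shell and dotenv values.

```typescript
// Hypothetical sketch: not FinalRun's real resolver.
// `bindings` mirrors a `secrets` block from an environment file.
// `source` stands in for the merged shell environment and dotenv values.
function resolveSecrets(
  bindings: { [name: string]: string },
  source: { [key: string]: string | undefined }
): { [name: string]: string } {
  // A binding must be exactly one placeholder, for example ${TEST_USER_EMAIL}.
  const placeholder = /^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$/;
  const resolved: { [name: string]: string } = {};
  for (const name of Object.keys(bindings)) {
    const match = placeholder.exec(String(bindings[name]));
    if (match === null) {
      // Literal values are rejected so real secrets never live in the YAML file.
      throw new Error("secrets." + name + " must use ${ENV_VAR} syntax");
    }
    const envName = String(match[1]);
    const value = source[envName];
    if (value === undefined) {
      throw new Error("secrets." + name + ": " + envName + " is not set");
    }
    resolved[name] = value;
  }
  return resolved;
}
```

Rejecting literal values in this sketch reflects the guidance not to put real secrets in the binding file. Whether the actual CLI enforces that, or only recommends it, is not stated here.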
+ + +### Full example + +```yaml .finalrun/env/dev.yaml +app: + packageName: com.example.myapp.debug + bundleId: com.example.myapp.debug + +secrets: + email: ${TEST_USER_EMAIL} + password: ${TEST_USER_PASSWORD} + +variables: + locale: en-US +``` + +In your test specs, you can reference these values as `${secrets.email}` and `${variables.locale}`. + +## Dotenv files for secret values + +Real secret values — API keys, passwords, tokens — belong in dotenv files at the workspace root, not in the YAML binding file. + +| File | Purpose | +|---|---| +| `.env` | Shared defaults loaded for all runs | +| `.env.` | Environment-specific values loaded when `--env ` is active (e.g. `.env.dev` for `--env dev`) | + +FinalRun finds the workspace root by walking up from your shell's current directory, so dotenv paths are always anchored to the folder that contains `.finalrun/`, regardless of where you run the CLI from. + +### Load order + +For an active environment named `N`, the CLI loads values in this order: + + + + Environment-specific dotenv file is read first. Keys set here take precedence over the shared file. + + + Shared dotenv file fills in any keys not already set by `.env.N`. + + + Shell environment variables win if the same key exists in both a dotenv file and the current shell session. + + + +This same load order applies to both `secrets` placeholder resolution and AI provider API key resolution. + +## Using environments with the CLI + +Pass `--env` to activate a named environment for any command: + +```bash +# Run a test against the dev environment on Android +finalrun test smoke.yaml --env dev --platform android + +# Validate workspace configuration for staging +finalrun check --env staging +``` + +When you set `env: dev` in `.finalrun/config.yaml`, `--env` becomes optional and the CLI uses `dev` by default. + +## Keeping secrets out of version control + + + Never commit `.env` files. 
Add the following to your repository's `.gitignore`: + + ```text .gitignore + .env + .env.* + !.env.example + ``` + + This ignores `.env`, `.env.dev`, `.env.staging`, and any similar files while keeping `.env.example` tracked. + + + + Create a `.env.example` file that lists every required variable with placeholder values. Commit it so that team members know exactly which variables to set in their local `.env` file. + diff --git a/mintlify-docs/configuration/workspace.mdx b/mintlify-docs/configuration/workspace.mdx new file mode 100644 index 0000000..b012304 --- /dev/null +++ b/mintlify-docs/configuration/workspace.mdx @@ -0,0 +1,112 @@ +--- +title: "FinalRun workspace config: app identity and defaults" +sidebarTitle: "Workspace" +description: "Set up .finalrun/config.yaml with app identity, default environment, and AI model fields to configure your FinalRun workspace for Android and iOS testing." +--- + +Every FinalRun project is anchored to a workspace root — the directory that contains the `.finalrun/` folder. The workspace holds your configuration file, test specs, optional suite manifests, and per-environment binding files. Understanding the layout and the fields in `config.yaml` lets the CLI resolve the right app, environment, and AI model without you needing to pass flags on every run. + +## Workspace layout + +```text +my-app/ # workspace root + .env # optional + .env.dev # optional + .finalrun/ + config.yaml # workspace configuration + tests/ # YAML test specs (required) + smoke.yaml + auth/ + login.yaml + suites/ # suite manifests (optional) + auth_smoke.yaml + env/ # environment bindings (optional) + dev.yaml +``` + +The `tests/` directory is the only required subdirectory. `suites/` and `env/` are optional and only needed when you run suites or use named environments. + +## `.finalrun/config.yaml` fields + +The workspace config defines defaults that the CLI uses when flags are omitted. Place this file at `.finalrun/config.yaml` in your workspace root. 
+ + + Human-readable name for the app. Optional — used only for display purposes. + + + + Android package identifier (e.g. `com.example.myapp`). Required if you run Android tests and do not pass `--app`. + + + + iOS bundle identifier (e.g. `com.example.myapp`). Required if you run iOS tests and do not pass `--app`. + + + + Default environment name. Used when you omit the `--env` flag. Must match a file under `.finalrun/env/.yaml` if one exists. + + + + Default AI model in `provider/model` format (e.g. `google/gemini-3-flash-preview`). Used when you omit `--model`. + + + + At least one of `app.packageName` or `app.bundleId` is required unless you always pass `--app` on the command line. + + +### Example config + +```yaml .finalrun/config.yaml +app: + name: MyApp + packageName: com.example.myapp + bundleId: com.example.myapp +env: dev +model: google/gemini-3-flash-preview +``` + +## App identity resolution + +FinalRun resolves which app to launch on the device using the following priority order: + + + + When you pass `--app `, FinalRun uses that binary directly. It extracts the package name (Android) or bundle ID (iOS) from the binary and ignores any `app` block in config files. + + + If an active environment file at `.finalrun/env/.yaml` contains an `app` block, those values override the workspace defaults. + + + If neither of the above applies, FinalRun falls back to the `app` block in `.finalrun/config.yaml`. 
+ + + +### Using the `--app` flag + +Pass a local binary to run a specific build without changing any config file: + +```bash +finalrun test smoke.yaml --platform android --app path/to/your.apk +finalrun test smoke.yaml --platform ios --app path/to/YourApp.app +``` + +The CLI: +- Extracts the package name (Android) or bundle ID (iOS) from the binary +- Infers the platform from the file extension (`.apk` → Android, `.app` → iOS) +- Validates that the binary matches the `--platform` flag if both are provided + + + CLI flags always override values in `config.yaml`. You can use flags for one-off runs without modifying your workspace config. + + +### Per-environment app overrides + +If your app uses different identifiers per environment — for example, a `.staging` suffix — set the override in the corresponding env file instead of changing `config.yaml`: + +```yaml .finalrun/env/staging.yaml +app: + packageName: com.example.myapp.staging + bundleId: com.example.myapp.staging +``` + +Any environment that does not define its own `app` block falls back to the workspace default in `.finalrun/config.yaml`. 
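
The priority order described on this page (`--app` binary, then the active environment's `app` block, then the workspace default) can be sketched as below. This is only an illustration: `resolveAppIdentity` and the input shape are hypothetical, and the sketch assumes each source is used as a whole block rather than merged field by field, which the page does not specify.

```typescript
// Hypothetical sketch: not FinalRun's real code.
interface AppIdentity {
  packageName?: string; // Android package identifier
  bundleId?: string;    // iOS bundle identifier
}

interface ResolutionInputs {
  binaryIdentity?: AppIdentity; // identity extracted from the --app binary
  envApp?: AppIdentity;         // `app` block of the active environment file
  workspaceApp?: AppIdentity;   // `app` block of .finalrun/config.yaml
}

interface ResolvedApp {
  source: "flag" | "env" | "workspace";
  app: AppIdentity;
}

// Picks the first available source in priority order.
// Assumption: a source is taken as a whole, with no per-field merging.
function resolveAppIdentity(inputs: ResolutionInputs): ResolvedApp {
  if (inputs.binaryIdentity !== undefined) {
    return { source: "flag", app: inputs.binaryIdentity };
  }
  if (inputs.envApp !== undefined) {
    return { source: "env", app: inputs.envApp };
  }
  if (inputs.workspaceApp !== undefined) {
    return { source: "workspace", app: inputs.workspaceApp };
  }
  throw new Error("No app identity: pass --app or set app.packageName / app.bundleId");
}
```

With a staging environment file that defines its own `app` block, the sketch reports `source: "env"`. An environment without one falls through to the workspace default.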
diff --git a/mintlify-docs/docs.json b/mintlify-docs/docs.json new file mode 100644 index 0000000..06cc9ac --- /dev/null +++ b/mintlify-docs/docs.json @@ -0,0 +1,69 @@ +{ + "$schema": "https://mintlify.com/docs.json", + "name": "FinalRun", + "theme": "luma", + "colors": { + "primary": "#3f4fe8", + "light": "#d0a7f7", + "dark": "#6c4cfc" + }, + "logo": { + "light": "https://media.brand.dev/3ece83f5-5543-48fb-bd61-213417b8ba98.png", + "dark": "https://media.brand.dev/3ece83f5-5543-48fb-bd61-213417b8ba98.png" + }, + "favicon": "https://media.brand.dev/31164f48-397c-4036-929b-8153e11d15c1.jpg", + "navbar": { + "primary": { + "type": "github", + "href": "https://github.com/final-run/finalrun-agent" + } + }, + "navigation": { + "tabs": [ + { + "tab": "Docs", + "groups": [ + { + "group": "Get Started", + "pages": [ + "introduction", + "quickstart", + "installation" + ] + }, + { + "group": "Writing Tests", + "pages": [ + "tests/yaml-format", + "tests/suites", + "tests/placeholders" + ] + }, + { + "group": "Configuration", + "pages": [ + "configuration/workspace", + "configuration/environments", + "configuration/ai-providers" + ] + }, + { + "group": "Running Tests", + "pages": [ + "running/cli-reference", + "running/ai-agent-skills", + "running/reports" + ] + }, + { + "group": "Help", + "pages": [ + "troubleshooting", + "faq" + ] + } + ] + } + ] + } +} diff --git a/mintlify-docs/faq.mdx b/mintlify-docs/faq.mdx new file mode 100644 index 0000000..fefcb01 --- /dev/null +++ b/mintlify-docs/faq.mdx @@ -0,0 +1,103 @@ +--- +title: "FinalRun FAQ: pricing, AI providers, and CI/CD usage" +sidebarTitle: "FAQ" +description: "Answers to common FinalRun questions: pricing, supported AI providers, platform compatibility, CI/CD usage, three-phase test model, and artifact storage." +--- + +FinalRun is an open-source, CLI-based tool — no account required, no platform lock-in. 
The questions below cover the most common things people ask about how FinalRun works, what it costs, and how to get the most out of it. + + + + No. FinalRun is open-source and runs entirely from the command line. Install the CLI, set your AI provider API key, and start running tests. There is no account, signup, or FinalRun subscription required. + + + + FinalRun supports three AI providers. You bring your own key (BYOK) — costs are billed directly by your provider at standard API rates. + + | Provider | Supported models | Environment variable | + |---|---|---| + | Google | Gemini 3+ | `GOOGLE_API_KEY` | + | OpenAI | GPT-5+ | `OPENAI_API_KEY` | + | Anthropic | Claude Sonnet 4 / Opus 4+ | `ANTHROPIC_API_KEY` | + + Set the key in your shell or in a `.env` file at your workspace root. You can also override it for a single run with the `--api-key` flag. + + + + Currently FinalRun targets Android emulators (AVDs) and iOS simulators for local runs. Support for cloud devices and physical hardware is on the roadmap. + + If you want early access to cloud device support, [join the waitlist](https://docs.google.com/forms/d/e/1FAIpQLScOTaNWjvxIG8Ywn6THHYJuqBM-b86Y-Fx39YVoBVhHuBDZ2w/viewform?usp=publish-editor). + + + + - **Android** — any macOS, Linux, or Windows machine with Android SDK tools (`adb`, `emulator`, `scrcpy`) installed and a running Android Virtual Device. + - **iOS** — macOS only. Requires Xcode command line tools with `xcrun simctl`. + + Run `finalrun doctor` to check that all required dependencies are present on your machine before running tests. + + + + FinalRun itself is free and open-source. You pay your AI provider — Google, OpenAI, or Anthropic — for the tokens consumed during test execution. The cost depends on the model you choose and how long the test takes to complete. There are no additional charges from FinalRun. + + + + Yes. 
Install the CLI in your CI environment and set the required environment variables — your AI provider API key, `ANDROID_HOME` (for Android), and any secrets your test specs reference. Then call `finalrun test` or `finalrun suite` as a step in your pipeline. + + ```sh + finalrun test smoke.yaml --platform android --model google/gemini-3-flash-preview + ``` + + Run `finalrun check` before your test step to catch workspace configuration errors early, before you consume any API tokens. + + + + Every FinalRun test has three phases: + + - **`setup`** — optional actions that prepare a clean state before the test starts (for example, clearing app data). + - **`steps`** — the ordered, plain-English instructions the AI executes on your device screen. + - **`expected_state`** — the UI conditions the AI verifies once all steps have completed. + + A test passes only when all three phases succeed. If any phase fails, FinalRun stops the run and records the failure with the current screenshot, video, and device log. + + + + `finalrun check` validates your entire workspace before you run any tests. It checks: + + - Selector definitions + - Suite manifests + - Environment bindings (secrets and variables) + - App overrides + + Running `finalrun check` before a test run catches configuration errors early, so you are not spending API tokens on a run that will fail at startup. + + ```sh + finalrun check --env dev --platform android + ``` + + + + FinalRun ships a set of agent skills that let you generate tests, run them, and fix failures — all from your AI coding agent chat. 
Install the skills with: + + ```sh + npx skills add final-run/finalrun-agent + ``` + + Once installed, three slash commands are available in your AI coding agent: + + | Command | What it does | + |---|---| + | `/finalrun-generate-test` | Reads your source code, infers app identity, and generates complete YAML test specs | + | `/finalrun-use-cli` | Validates and runs your tests using the CLI | + | `/finalrun-test-and-fix` | Runs the full generate → run → diagnose → fix loop until the test is green | + + + + Artifacts for each run — including video, screenshots, and device logs — are stored at: + + ``` + ~/.finalrun/workspaces//artifacts/ + ``` + + Use `finalrun runs` to list all recorded runs for your current workspace, and `finalrun start-server` to open the visual report UI where you can browse results interactively. + + diff --git a/mintlify-docs/index.mdx b/mintlify-docs/index.mdx new file mode 100644 index 0000000..110b2eb --- /dev/null +++ b/mintlify-docs/index.mdx @@ -0,0 +1,67 @@ +--- +title: "FinalRun: AI-powered Android and iOS test automation" +description: "FinalRun lets you write plain-English test specs in YAML and run them on Android and iOS using AI models like Gemini, GPT, or Claude." +--- + +FinalRun is an AI-driven testing CLI that runs your Android and iOS apps through real user scenarios — written in plain English, executed by AI on a real device or emulator. You describe what to test in a YAML file; FinalRun taps, swipes, types, and verifies on your behalf, then produces a pass/fail report with video, screenshots, and device logs. + + + + Install FinalRun and run your first test in minutes. + + + Learn the YAML test format for natural-language mobile testing. + + + Every command, flag, and option available in the finalrun CLI. + + + Use AI coding agents to generate, run, and fix tests automatically. + + + +## How FinalRun works + +FinalRun connects an AI model to a real device or emulator. 
You write a test spec describing what a user would do; the AI reads the screen, performs each action, and checks the result. + + + + Run the one-line install script to get FinalRun, Node.js, and all platform tools set up. + ```bash + curl -fsSL https://raw.githubusercontent.com/final-run/finalrun-agent/main/scripts/install.sh | bash + ``` + + + FinalRun uses your own AI provider key (Google, OpenAI, or Anthropic) — no account signup needed. + ```bash + echo "GOOGLE_API_KEY=your-key-here" > .env + ``` + + + Create a YAML file in `.finalrun/tests/` describing steps in plain English. + ```yaml + name: login_smoke + steps: + - Launch the app. + - Enter ${secrets.email} on the login screen. + - Tap the Login button. + expected_state: + - The home screen is visible. + ``` + + + Execute the test on Android or iOS with a single command. + ```bash + finalrun test login_smoke.yaml --platform android --model google/gemini-3-flash-preview + ``` + + + + + + Set up your workspace config, app identity, and environments. + + + Fix common errors and verify host readiness with finalrun doctor. + + diff --git a/mintlify-docs/installation.mdx b/mintlify-docs/installation.mdx new file mode 100644 index 0000000..274b770 --- /dev/null +++ b/mintlify-docs/installation.mdx @@ -0,0 +1,137 @@ +--- +title: "Install FinalRun: CLI setup and platform prerequisites" +sidebarTitle: "Installation" +description: "Install the FinalRun CLI on macOS or Linux, set up Android or iOS platform prerequisites, and verify your host is ready to run tests." +--- + +FinalRun runs as a Node.js CLI. The install script handles Node.js, the CLI itself, and the platform driver assets needed for Android or iOS test execution. This page covers system requirements, install methods, platform-specific prerequisites, and how to verify your setup. + +## System requirements + +- **Node.js** 20.0.0 or later + +The one-line install script will set up Node.js for you if it isn't already present. 
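
If you want to check the Node.js requirement from your own tooling, a comparison along these lines is one option. This is a minimal sketch: `meetsMinimumNode` is a hypothetical helper, not part of the FinalRun CLI. Because the minimum is 20.0.0, comparing the major version is enough.

```typescript
// Hypothetical helper: not part of the FinalRun CLI.
// Accepts version strings such as "20.11.1" or "v20.11.1".
function meetsMinimumNode(version: string, minMajor: number = 20): boolean {
  const firstPart = version.replace(/^v/, "").split(".")[0];
  const major = parseInt(String(firstPart), 10);
  // An unparsable version counts as not meeting the requirement.
  return !isNaN(major) && major >= minMajor;
}
```

In a Node.js script you could pass `process.versions.node` as the argument.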
+ +## Install FinalRun + +**One-line install (recommended):** + +```bash +curl -fsSL https://raw.githubusercontent.com/final-run/finalrun-agent/main/scripts/install.sh | bash +``` + +This script installs Node.js (if needed), the `finalrun` CLI globally, and all bundled platform driver assets for Android and iOS. + +**npm (if you already have Node.js 20+):** + +```bash +npm install -g @finalrun/finalrun-agent +``` + +## Verify the installation + +After installing, confirm the CLI is on your PATH: + +```bash +finalrun --help +``` + +Then check host readiness for your target platform: + +```bash +finalrun doctor +``` + +`finalrun doctor` reports which required tools are present, missing, or misconfigured. You can also check a single platform: + +```bash +finalrun doctor --platform android +finalrun doctor --platform ios +``` + +Fix any issues reported before running tests. + +## Platform prerequisites + + + + Android tests run on emulators via `adb`. You need the Android SDK and a few additional tools on your PATH. + + **Required:** + + | Tool | How to provide | + |---|---| + | `adb` | Available through `ANDROID_HOME`, `ANDROID_SDK_ROOT`, or `PATH` | + | `emulator` | Must be on `PATH`; used to discover and boot Android Virtual Devices | + | `scrcpy` | Must be on `PATH`; used for screen recording during local runs | + | FinalRun Android driver assets | Installed automatically by the CLI during installation | + + **Install the Android SDK** via [Android Studio](https://developer.android.com/studio) or the standalone command-line tools. Once installed, make sure `platform-tools` and `emulator` directories are on your `PATH`, or set `ANDROID_HOME` to your SDK root. + + **Install scrcpy** using your system package manager: + + + ```bash macOS + brew install scrcpy + ``` + + ```bash Ubuntu / Debian + sudo apt install scrcpy + ``` + + + After setting up these tools, verify readiness: + + ```bash + finalrun doctor --platform android + ``` + + + + iOS testing requires macOS. 
It is not supported on Linux or Windows. + + + iOS tests run on simulators via Xcode's `xcrun simctl`. You need Xcode command line tools and a few standard utilities. + + **Required:** + + | Tool | Notes | + |---|---| + | macOS | iOS simulator support is macOS-only | + | Xcode command line tools | Provides `xcrun` | + | `xcrun simctl` | Used to manage and boot iOS simulators | + | `unzip` | Standard macOS utility | + | `/bin/bash` | Standard macOS shell | + | `plutil` | Standard macOS utility | + | FinalRun iOS driver archives | Installed automatically by the CLI during installation | + + Install Xcode command line tools if you haven't already: + + ```bash + xcode-select --install + ``` + + **Optional tools:** + + | Tool | Purpose | + |---|---| + | `ffmpeg` | Compresses iOS recordings after capture | + | `applesimutils` | Enables simulator permission helpers | + + Install optional tools via Homebrew: + + ```bash + brew install ffmpeg applesimutils + ``` + + After setting up these tools, verify readiness: + + ```bash + finalrun doctor --platform ios + ``` + + + +## Next steps + +Once `finalrun doctor` reports your host as ready, follow the [Quick Start](/quickstart) guide to write and run your first test. diff --git a/mintlify-docs/introduction.mdx b/mintlify-docs/introduction.mdx new file mode 100644 index 0000000..06d3d24 --- /dev/null +++ b/mintlify-docs/introduction.mdx @@ -0,0 +1,83 @@ +--- +title: "What is FinalRun? Natural-language mobile app testing" +description: "FinalRun is an AI-driven CLI that runs mobile app tests written in plain-English YAML. The AI sees your device screen and performs actions on your behalf." +--- + +FinalRun is an AI-driven CLI that tests your Android and iOS apps using natural language. 
You write test scenarios in YAML — describing actions the way a person would — and FinalRun launches your app on a real device or emulator, uses an AI model to see the screen, and performs each action: tapping, swiping, typing, and verifying the result. When the run finishes, you get a pass/fail report with video, screenshots, and device logs. + +## How it works + +FinalRun connects an AI model to a running device. The AI reads the live screen, interprets your plain-English instructions, decides what action to take, and repeats until your test is complete or a failure is detected. + +Each test file has three phases: + +- **`setup`** — optional actions that prepare clean state before the test starts (e.g., clear app data) +- **`steps`** — the ordered natural-language instructions the AI executes +- **`expected_state`** — the UI state the AI verifies after all steps complete + +```yaml +name: login_smoke +description: Verify that a user can log in and reach the home screen. + +setup: + - Clear app data. + +steps: + - Launch the app. + - Enter ${secrets.email} on the login screen. + - Enter ${secrets.password} on the password screen. + - Tap the login button. + +expected_state: + - The home screen is visible. + - The user's name appears in the header. +``` + +## Workspace structure + +FinalRun looks for a `.finalrun/` directory at the root of your project. 
Everything lives there: + +| Path | Purpose | +|---|---| +| `.finalrun/config.yaml` | Workspace defaults: app identity, AI model, environment | +| `.finalrun/tests/` | YAML test specs — one file per scenario | +| `.finalrun/suites/` | Suite manifests that group tests into logical collections | +| `.finalrun/env/` | Per-environment bindings for secrets and variables | + +## Supported platforms + +FinalRun runs tests on: + +- **Android** — emulators managed via `adb` and `emulator` +- **iOS** — simulators on macOS via `xcrun simctl` + +## Supported AI models + +FinalRun uses your own AI provider API key (BYOK — bring your own key). No FinalRun account is required to run tests. The model is selected with the `--model` flag or set as the default in `.finalrun/config.yaml`. + +| Provider | Model prefix | Environment variable | +|---|---|---| +| Google Gemini | `google/...` | `GOOGLE_API_KEY` | +| OpenAI GPT | `openai/...` | `OPENAI_API_KEY` | +| Anthropic Claude | `anthropic/...` | `ANTHROPIC_API_KEY` | + + +Test runs consume tokens from your configured AI provider. Standard API billing from your provider applies. + + +## Where to go next + + + + Install FinalRun and run your first test in five minutes. + + + Learn the full test spec format: fields, placeholders, and suites. + + + Every command, flag, and option available in the `finalrun` CLI. + + + Set up your app identity, default model, and environments. + + diff --git a/mintlify-docs/quickstart.mdx b/mintlify-docs/quickstart.mdx new file mode 100644 index 0000000..d0b72ab --- /dev/null +++ b/mintlify-docs/quickstart.mdx @@ -0,0 +1,163 @@ +--- +title: "Quick start: install FinalRun and run your first test" +sidebarTitle: "Quick Start" +description: "Install the FinalRun CLI, configure your AI provider key, write a YAML test spec, and execute it on an Android emulator or iOS simulator." 
+--- + +This guide walks you through installing FinalRun, creating a minimal workspace, writing a test, and executing it on a connected Android emulator or iOS simulator. By the end you'll have a working test run and know where to look for results. + + + + Run the one-line install script. It sets up Node.js (if needed), installs the CLI globally, and installs the platform driver assets for Android and iOS. + + ```bash + curl -fsSL https://raw.githubusercontent.com/final-run/finalrun-agent/main/scripts/install.sh | bash + ``` + + Alternatively, if you already have Node.js 20 or later, you can install via npm: + + ```bash + npm install -g @finalrun/finalrun-agent + ``` + + + + Check that the CLI is available and that your host has the required platform tools: + + ```bash + finalrun --help + finalrun doctor + ``` + + `finalrun doctor` checks for `adb`, `emulator`, `scrcpy` (Android), and Xcode command line tools (iOS). Fix any issues it reports before continuing. + + + Run `finalrun check` from inside your project directory before executing tests. It validates your workspace config, environment bindings, and suite manifests without launching a device. + + + + + FinalRun uses your own AI provider API key. Create a `.env` file at the root of your project with the key for your chosen provider: + + + ```bash Google Gemini + echo "GOOGLE_API_KEY=your-key-here" > .env + ``` + + ```bash OpenAI + echo "OPENAI_API_KEY=your-key-here" > .env + ``` + + ```bash Anthropic Claude + echo "ANTHROPIC_API_KEY=your-key-here" > .env + ``` + + + Add `.env` to your `.gitignore` so secrets are never committed. + + + Each test run makes real AI API calls that consume tokens. Standard billing from your AI provider applies. 
+ + + + + Create the `.finalrun/` directory and a folder for your test specs: + + ```bash + mkdir -p .finalrun/tests + ``` + + Then create `.finalrun/config.yaml` with your app's identity and default settings: + + ```yaml + app: + name: MyApp + packageName: com.example.myapp + bundleId: com.example.myapp + env: dev + model: google/gemini-3-flash-preview + ``` + + | Field | Description | + |---|---| + | `app.packageName` | Android package identifier | + | `app.bundleId` | iOS bundle identifier | + | `env` | Default environment name (can be omitted if not using env files) | + | `model` | Default AI model in `provider/model` format | + + At least one of `app.packageName` or `app.bundleId` is required. + + + + Create `.finalrun/tests/login_smoke.yaml` with a basic login scenario: + + ```yaml + name: login_smoke + description: Verify that a user can log in and reach the home screen. + + setup: + - Clear app data. + + steps: + - Launch the app. + - Enter ${secrets.email} on the login screen. + - Enter ${secrets.password} on the password screen. + - Tap the login button. + + expected_state: + - The home screen is visible. + - The user's name appears in the header. + ``` + + The `${secrets.email}` and `${secrets.password}` placeholders are resolved from your `.env` file or environment variables at runtime. To use them, add the values to your `.env`: + + ```bash + TEST_USER_EMAIL=test@example.com + TEST_USER_PASSWORD=password123 + ``` + + Then declare the bindings in `.finalrun/env/dev.yaml`: + + ```yaml + secrets: + email: ${TEST_USER_EMAIL} + password: ${TEST_USER_PASSWORD} + ``` + + + + Run your test on Android or iOS. The `--model` flag can be omitted if you set a default in `config.yaml`. 
+ + + ```bash Android + finalrun test login_smoke.yaml --platform android --model google/gemini-3-flash-preview + ``` + + ```bash iOS + finalrun test login_smoke.yaml --platform ios --model google/gemini-3-flash-preview + ``` + + + FinalRun will boot the emulator or simulator, install the app, and begin executing the test steps. You'll see live output as the AI works through each action. + + + + List your recent test runs: + + ```bash + finalrun runs + ``` + + Open the local report UI in your browser to see screenshots, video, and device logs: + + ```bash + finalrun start-server + ``` + + + +## What's next + +- [YAML test format](/tests/yaml-format) — learn all test spec fields, suite manifests, and placeholder syntax +- [CLI reference](/running/cli-reference) — full list of commands and flags +- [Workspace configuration](/configuration/workspace) — app identity, per-environment overrides, and `--app` flag diff --git a/mintlify-docs/running/ai-agent-skills.mdx b/mintlify-docs/running/ai-agent-skills.mdx new file mode 100644 index 0000000..274e890 --- /dev/null +++ b/mintlify-docs/running/ai-agent-skills.mdx @@ -0,0 +1,120 @@ +--- +title: "FinalRun AI agent skills: generate, run, and fix tests" +sidebarTitle: "AI Agent Skills" +description: "Install and use FinalRun's three AI agent skills — generate-test, use-cli, and test-and-fix — to automate mobile test creation, execution, and debugging." +--- + +FinalRun ships three AI coding agent skills that let your AI coding agent generate tests, run them, and fix failures — all from chat. The skills work with any agent that supports the skills protocol, including Claude Code, Cursor, Windsurf, and similar tools. + +## Install the skills + +Run the following command once in your repository to install all three skills into your coding agent: + +```sh +npx skills add final-run/finalrun-agent +``` + + + These skills work with any AI coding agent that supports the skills protocol. 
You do not need to configure them separately for each tool. + + +## Available skills + + + + Reads your app's source code, infers the app identity, and generates complete YAML test specs with setup, steps, and expected state — organized by feature folder. + + + Validates and runs tests once they are generated. Handles flag selection, env binding, and post-run artifact inspection. + + + Orchestrates the full generate → run → diagnose → fix loop. Keeps iterating until the run is green or a genuine blocker is hit. + + + +--- + +## /finalrun-generate-test + +The `/finalrun-generate-test` skill reads your app's source code, infers the app identity (package name or bundle ID), and generates complete test specs organized by feature folder under `.finalrun/tests/`. + +**Example usage** + +```sh +/finalrun-generate-test Generate tests for the authentication feature — cover login with valid credentials, login with wrong password, and logout +``` + +When you invoke this skill, the agent works through the following sequence: + + + + Reads relevant application source code to understand the UI, user flows, and infer the app's package name or bundle ID. + + + Creates or updates `.finalrun/config.yaml` with the app identity and scaffolds environment bindings under `.finalrun/env/`. + + + Presents the proposed test files, feature folder structure, and environment binding strategy for your approval before writing any files. + + + After you approve, writes YAML specs under `.finalrun/tests//` and a matching suite manifest under `.finalrun/suites/`. + + + Runs `finalrun check` to validate the workspace, bindings, and generated specs. Fixes any issues until the check passes. + + + + + The agent will not write test files until you explicitly approve the proposed plan in step 3. Review the proposed paths and binding strategy before confirming. + + +--- + +## /finalrun-use-cli + +The `/finalrun-use-cli` skill validates and runs your tests once they are generated. 
It handles CLI flag selection, env binding, and post-run artifact inspection — including reading `result.json`, screenshots, and device logs on failure. + +**Example usage** + +```sh +/finalrun-use-cli Run the auth tests on Android +``` + +The agent validates the workspace with `finalrun check` before executing, explains the exact command it will run, and summarizes the outcome with artifact links when done. + +--- + +## /finalrun-test-and-fix + +The `/finalrun-test-and-fix` skill orchestrates the full **generate → run → diagnose → fix** loop. It calls `/finalrun-generate-test` to author tests, runs them via `/finalrun-use-cli`, reads the CLI artifacts on failure, classifies whether the bug is in the app code or the test spec, applies the narrowest possible fix, and re-runs until the run is green. + +**Example usage** + +```sh +/finalrun-test-and-fix Verify and fix the checkout feature end-to-end on Android +``` + +The agent stops the loop only when the run is green, or when execution is genuinely blocked — for example, no emulator is available, a required secret is missing, or you explicitly opt out. In a blocked state it prints the exact command for you to run locally. + + + Use `/finalrun-test-and-fix` as your default entry point when you finish a UI feature. It covers the full cycle without needing to invoke the other two skills manually. + + +--- + +## Auto-triggering FinalRun after feature work + +You can configure your AI agent to automatically generate and run FinalRun tests whenever it finishes a UI feature — without you having to ask explicitly. Add the autotrigger content to your `AGENTS.md` file: + + + + ```sh + npx skills add final-run/finalrun-agent + ``` + + + Add a definition-of-done rule to your `AGENTS.md` that instructs the agent to run the `/finalrun-test-and-fix` skill automatically after every UI change. The rule tells the agent not to mark a task as done until FinalRun coverage is updated and the run is green. 
+ + + +With the autotrigger in place, your agent treats a passing FinalRun run as a hard requirement for every UI task — no separate prompt needed. diff --git a/mintlify-docs/running/cli-reference.mdx b/mintlify-docs/running/cli-reference.mdx new file mode 100644 index 0000000..d5b600f --- /dev/null +++ b/mintlify-docs/running/cli-reference.mdx @@ -0,0 +1,150 @@ +--- +title: "FinalRun CLI reference: all commands, flags, and options" +sidebarTitle: "CLI Reference" +description: "Complete reference for all FinalRun CLI commands — test, suite, check, doctor, runs, and server commands — with flags and copy-paste usage examples." +--- + +The `finalrun` CLI is the main interface for validating workspaces, running tests, and inspecting reports. CLI flags always take precedence over settings in `.finalrun/config.yaml`, so you can override any default at the command line without editing your config file. + +## Getting started + +Use these commands to verify your environment before running tests. + + + + Validates your `.finalrun` workspace, environment bindings, selectors, and suite manifests. Falls back to the `env` set in `.finalrun/config.yaml` when `--env` is omitted. + + + Checks host readiness for local Android and iOS runs. Use `--platform` to check a single platform. + + + +```sh +# Check the full workspace +finalrun check + +# Validate a specific environment and platform +finalrun check --env dev --platform android +``` + +## Running tests + + + + Executes one or more YAML specs from `.finalrun/tests/`. + + ```sh + finalrun test [flags] + ``` + + **Examples** + + ```sh + # Run a single test + finalrun test smoke.yaml --platform android --model google/gemini-3-flash-preview + + # Run with a specific app binary + finalrun test smoke.yaml --platform android --app path/to/your.apk + + # Run two specs at once + finalrun test auth/login.yaml auth/logout.yaml --platform ios --env staging + ``` + + + Executes a suite manifest from `.finalrun/suites/`. 
+ + ```sh + finalrun suite [flags] + ``` + + **Example** + + ```sh + finalrun suite auth_smoke.yaml --platform ios --model anthropic/claude-sonnet-4-6 + ``` + + + +### Common flags + +The following flags apply to both `finalrun test` and `finalrun suite`. + + + Target platform for the test run. Required when the platform cannot be inferred from the `--app` extension (`.apk` implies Android, `.app` implies iOS). + + + + AI model to use, in `provider/model` format. For example: `google/gemini-3-flash-preview`, `anthropic/claude-sonnet-4-6`, `openai/gpt-5`. Falls back to the `model` key in `.finalrun/config.yaml`. + + + + Environment name. Selects the matching file at `.finalrun/env/.yaml`. Falls back to the `env` key in `.finalrun/config.yaml`. + + + + Path to the `.apk` or `.app` binary to install and test. Overrides the app identity defined in `.finalrun/config.yaml`. + + + + Override the provider API key for this run. Use `--api-key` for one-off runs; for persistent configuration set the provider environment variable (`GOOGLE_API_KEY`, `OPENAI_API_KEY`, or `ANTHROPIC_API_KEY`). + + + + Enable debug logging for the run. + + + + Cap the number of AI action iterations per step. The run aborts when the limit is reached. + + + + CLI flags always take precedence over `.finalrun/config.yaml`. You can set defaults in config and override them selectively at the command line. + + +## Report commands + +After a test run, FinalRun saves artifacts locally. These commands let you list, browse, and manage those reports. + +| Command | Description | +|---|---| +| `finalrun runs` | Lists local reports from `~/.finalrun/workspaces//artifacts`. | +| `finalrun start-server` | Starts or reuses the local report UI for the current workspace. | +| `finalrun server-status` | Shows the current local report server status. | +| `finalrun stop-server` | Stops the local report server. | + +All report commands accept `--workspace ` to target a workspace other than the current directory. 
+ + + +```sh List runs +finalrun runs + +# Machine-readable output +finalrun runs --json + +# Target a different workspace +finalrun runs --workspace /path/to/other/app +``` + +```sh Report server +# Start the report UI +finalrun start-server + +# Check server status +finalrun server-status + +# Stop the server +finalrun stop-server + +# Target a different workspace +finalrun start-server --workspace /path/to/other/app +``` + + + +## Getting help + +```sh +finalrun --help +finalrun --help +``` diff --git a/mintlify-docs/running/reports.mdx b/mintlify-docs/running/reports.mdx new file mode 100644 index 0000000..f813290 --- /dev/null +++ b/mintlify-docs/running/reports.mdx @@ -0,0 +1,101 @@ +--- +title: "FinalRun test reports: browse artifacts, video, and logs" +sidebarTitle: "Reports" +description: "Explore FinalRun run artifacts, open the local report UI, read result.json, and inspect failures with video playback and device logs." +--- + +After every test run, FinalRun saves a full set of artifacts to disk so you can diagnose failures, replay executions, and track results over time. You can browse these artifacts directly on the filesystem or through the local report UI, which adds video playback, log search, and step-by-step screenshot browsing. + +## Artifact layout + +Each run writes its artifacts to: + +``` +~/.finalrun/workspaces//artifacts/ +``` + +The `` is derived from your project directory, so each repository gets its own isolated artifacts folder. + +| File or folder | Contents | +|---|---| +| `result.json` | Test outcome, failure message, and step-level results. | +| `actions/` | Individual agent action files (JSON) showing the agent's reasoning, the action taken, and whether it succeeded at each step. | +| `screenshots/` | Per-step screenshots showing the device screen at each action. | +| `recording.mp4` / `recording.mov` | Screen recording of the full test run. 
| +| `device.log` | Device-level logs — logcat on Android, `log stream` on iOS — captured during the run. | +| `runner.log` | The CLI's own log for the entire run, written at the run directory root. | + +## Listing recent runs + +Use `finalrun runs` to list local reports for the current workspace: + + + +```sh Standard output +finalrun runs +``` + +```sh Machine-readable JSON +finalrun runs --json +``` + +```sh Different workspace +finalrun runs --workspace /path/to/other/app +``` + + + +## Opening the report UI + +The local report UI lets you browse run artifacts with video playback, device log search, and step-by-step screenshots. + +```sh +finalrun start-server +``` + +FinalRun starts a local web server (or reuses an existing one) and opens the report UI in your browser. The UI shows all runs for the current workspace. + + + Run `finalrun start-server` immediately after a test failure to replay the recording, step through screenshots, and search device logs — all in sync. This is the fastest way to understand what the AI agent saw on screen at each step. + + +### Report UI features + + + + Watch the full screen recording of any run. Scrub to any moment to see exactly what was on the device. + + + Browse per-action screenshots alongside the agent's action details for each step. + + + Search and filter device logs (logcat or log stream) captured during the run. Filter by log level to focus on errors and warnings. + + + +## Managing the server + +```sh +# Check whether the server is running +finalrun server-status + +# Stop the server +finalrun stop-server +``` + +## Targeting a different workspace + +All report commands accept `--workspace ` when you want to inspect runs from a project other than the current directory: + +```sh +finalrun start-server --workspace /path/to/other/app +finalrun runs --workspace /path/to/other/app +``` + +## Reading result.json + +`result.json` is the canonical record of a run. 
It contains the overall test outcome (`pass` or `fail`), a human-readable failure message, and step-level results that map directly to the entries in `actions/` and `screenshots/`. When your AI coding agent reads artifacts after a failure, `result.json` is the first file it reads to identify which step failed and why. + + + When the `/finalrun-use-cli` or `/finalrun-test-and-fix` skill diagnoses a failure, it reads `result.json`, the matching `actions/` entry, and the screenshot at the failed step before suggesting any fix. You can do the same inspection manually using `finalrun start-server`. + diff --git a/mintlify-docs/tests/placeholders.mdx b/mintlify-docs/tests/placeholders.mdx new file mode 100644 index 0000000..5905582 --- /dev/null +++ b/mintlify-docs/tests/placeholders.mdx @@ -0,0 +1,106 @@ +--- +title: "Inject secrets and variables into FinalRun test specs" +sidebarTitle: "Placeholders" +description: "Learn how to use ${secrets.*} and ${variables.*} placeholders in FinalRun test specs, how they resolve at run time, and how to configure env files safely." +--- + +FinalRun test specs support two types of placeholders that let you inject dynamic values without hardcoding them in YAML files. You reference a placeholder in a step; FinalRun resolves its value at run time from environment variables or a binding file. This keeps credentials and configuration out of your test source. + +## Placeholder types + +**`${secrets.*}`** — for sensitive values such as credentials and API keys. The logical key (e.g. `secrets.email`) maps to a shell environment variable declared in your binding file. The actual value is never stored in YAML. + +**`${variables.*}`** — for non-sensitive values such as locale codes, search terms, and feature flags. Values are declared directly in the binding file as plain strings. + +Both types must be declared in `.finalrun/env/.yaml` before you use them in a test. 
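To make the distinction between the two placeholder types concrete, here is a minimal TypeScript sketch of how they could be resolved. It is an illustration under stated assumptions, not FinalRun's actual resolver: `Bindings` and `resolveStep` are hypothetical names, the binding file is assumed to be already parsed into an object, and throwing an error for undeclared keys is an assumed behavior.

```typescript
// Hypothetical sketch only: FinalRun's real resolver may differ.
// Shape of a parsed binding file such as .finalrun/env/dev.yaml (assumed).
type Bindings = {
  secrets: Record<string, string>;   // logical key -> "${SHELL_ENV_VAR}" reference
  variables: Record<string, string>; // logical key -> plain value
};

// Replace every ${secrets.*} and ${variables.*} placeholder in one step.
function resolveStep(
  step: string,
  bindings: Bindings,
  env: Record<string, string | undefined>,
): string {
  const placeholder = /\$\{(secrets|variables)\.([A-Za-z0-9_]+)\}/g;
  return step.replace(placeholder, (_match, kind: string, key: string) => {
    if (kind === "variables") {
      // Variables are stored as plain strings in the binding file.
      const value = bindings.variables[key];
      if (value === undefined) throw new Error(`Undeclared variable: ${key}`);
      return value;
    }
    // Secrets are stored as a reference to a shell environment variable.
    const ref = bindings.secrets[key];
    if (ref === undefined) throw new Error(`Undeclared secret: ${key}`);
    const envName = /^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$/.exec(ref)?.[1];
    if (envName === undefined) {
      throw new Error(`Secret ${key} must be bound as \${ENV_VAR}`);
    }
    const value = env[envName];
    if (value === undefined) {
      throw new Error(`Environment variable ${envName} is not set`);
    }
    return value;
  });
}
```

The point of the sketch is the extra level of indirection for secrets: the binding file only names the environment variable, so the real value never has to appear in YAML, while variables are substituted directly.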
+ +## Env binding file + +Create a file under `.finalrun/env/` for each environment you use (e.g. `dev.yaml`, `staging.yaml`). Declare secrets as `${SHELL_ENV_VAR}` placeholders and variables as plain values: + +```yaml +secrets: + email: ${TEST_USER_EMAIL} + password: ${TEST_USER_PASSWORD} + +variables: + locale: en-US + search_term: coffee +``` + +The `secrets` entries are placeholders only. The CLI resolves them from shell environment variables and `.env` files at run time. **Do not put real credentials in this YAML file.** + +## Using placeholders in tests + +Reference declared placeholders with `${secrets.}` or `${variables.}` syntax anywhere in `setup`, `steps`, or `expected_state`: + +```yaml +steps: + - Enter ${secrets.email} on the login screen. + - Enter ${secrets.password} on the password screen. + - Type ${variables.search_term} in the search field. +``` + +## Load order + +When you run FinalRun with an env named `N` (e.g. `--env dev`), the CLI resolves values in this order: + + + + The environment-specific dotenv file (e.g. `.env.dev`) is loaded first. + + + Fills in any keys not already set by the environment-specific file. + + + Shell environment variables win if the same key appears in both a file and the current shell environment. + + + + + When no env profile is configured, the CLI uses `process.env` and `.env` directly. You do not need a `.env.` file for a simple single-environment workspace. + + +## Selecting an env profile + +Pass `--env ` to the CLI to activate a specific env profile. The name must match a file under `.finalrun/env/`: + +```bash +finalrun test auth/login.yaml --platform android --model google/gemini-3-flash-preview --env staging +``` + +FinalRun will load `.env.staging`, then `.env`, then `process.env`, and resolve all `${secrets.*}` bindings declared in `.finalrun/env/staging.yaml`. + +## Never commit secrets + + + Do not commit `.env` files to your repository. 
Add the following lines to your `.gitignore` to exclude all dotenv files while keeping `.env.example` tracked as a template: + + +```gitignore +.env +.env.* +!.env.example +``` + +Use `.env.example` to document which shell variables team members need to export, without including real values. + +## Never hardcode secrets + +Always use placeholder syntax for sensitive values. Hardcoding credentials in a test spec exposes them in version control and in run reports. + + + +```yaml Good — placeholder syntax +steps: + - Enter ${secrets.email} on the login screen. + - Enter ${secrets.password} on the password screen. +``` + +```yaml Bad — hardcoded credentials +steps: + - Enter user@example.com on the login screen. + - Enter hunter2 on the password screen. +``` + + diff --git a/mintlify-docs/tests/suites.mdx b/mintlify-docs/tests/suites.mdx new file mode 100644 index 0000000..1b54521 --- /dev/null +++ b/mintlify-docs/tests/suites.mdx @@ -0,0 +1,63 @@ +--- +title: "FinalRun suite manifests: run groups of tests together" +sidebarTitle: "Suites" +description: "Learn how to create suite manifests that group related test files, organize them by feature, and run an entire suite with a single CLI command." +--- + +A suite is a YAML manifest that groups individual test files into a logical collection you can run together. Rather than running tests one at a time, you define a suite for each feature and run every scenario in it with a single command. Suite manifests live under `.finalrun/suites/`. + +## Suite fields + + + A stable identifier for the suite. Use `snake_case`. This name is what you pass to `finalrun suite` when you want to run it. + + + + A short, human-readable summary of what the suite covers. One or two sentences is enough. + + + + An ordered list of test file paths. Each path is relative to `.finalrun/tests/`. The agent runs the tests in the order listed. + + +## Example suite + +```yaml +name: auth_smoke +description: Covers the authentication smoke scenarios. 
+tests: + - auth/login.yaml + - auth/logout.yaml +``` + +## Organizing suites by feature + +The recommended convention is one suite per feature folder, mirroring the structure of `.finalrun/tests//`. A suite named `auth_smoke` covering tests in `.finalrun/tests/auth/` is a clear, predictable mapping that makes it easy to find which suite runs a given test. + + + Name your suite files after their feature folder. If your tests live in `.finalrun/tests/checkout/`, name the suite file `.finalrun/suites/checkout.yaml` and give it the `name: checkout` identifier. This one-to-one convention keeps suites discoverable as the test library grows. + + +## Running a suite + +Pass the suite manifest path to `finalrun suite`. Specify a platform and AI model: + +```bash +finalrun suite auth_smoke.yaml --platform android --model google/gemini-3-flash-preview +``` + +You can also target iOS: + +```bash +finalrun suite auth_smoke.yaml --platform ios --model google/gemini-3-flash-preview +``` + +## Validating a suite before running + +Run `finalrun check` with the `--suite` flag to validate the suite manifest and all referenced test files — confirming that paths resolve, placeholders are declared, and the workspace is configured correctly — before spending time on a full test run: + +```bash +finalrun check --suite auth_smoke.yaml +``` + +Fix any errors reported by `finalrun check` before proceeding. The command output is the source of truth for binding correctness and path resolution. diff --git a/mintlify-docs/tests/yaml-format.mdx b/mintlify-docs/tests/yaml-format.mdx new file mode 100644 index 0000000..a208232 --- /dev/null +++ b/mintlify-docs/tests/yaml-format.mdx @@ -0,0 +1,171 @@ +--- +title: "FinalRun YAML test format: fields, phases, examples" +sidebarTitle: "YAML Format" +description: "Learn the FinalRun YAML test format: required fields, the three-phase execution model, allowed actions, and how to write reliable natural-language steps." 
+--- + +FinalRun test specs are plain YAML files stored under `.finalrun/tests/`. Each file defines a single test scenario using natural-language steps that the AI agent executes on a real device or emulator. You describe what a user would do; FinalRun taps, swipes, types, and verifies on your behalf. + +## Test fields + +Every test file follows a fixed schema. The `name` and `steps` fields are required; all others are optional. + + + A stable, unique identifier for the test scenario. Use `snake_case`. This value appears in run reports and suite manifests, so keep it descriptive and consistent across renames. + + + + A short, human-readable summary of what the test validates. One or two sentences is enough. + + + + Actions the agent runs before the main steps to prepare a clean starting state. Every setup block must be idempotent — see [Setup and idempotent cleanup](#setup-and-idempotent-cleanup) below. + + + + An ordered list of natural-language steps the agent executes. Each step must use an action from the [allowed action vocabulary](#allowed-action-vocabulary). + + + + The expected UI state after all steps are complete. These are boolean conditions the agent checks against the final screen — not actions to perform. If every condition is met, the test passes; if any fail, the test fails. + + +## Three-phase execution model + +At runtime, the agent executes every test in three sequential phases: + + + + The agent runs any `setup` steps to guarantee a clean starting state, regardless of what a previous run may have left behind. + + + The agent performs each `steps` entry in order — tapping, typing, swiping, and verifying as instructed. + + + The agent checks each `expected_state` condition against the final screen. The test succeeds only when all conditions pass. + + + +## Example: login smoke test + +```yaml +name: login_smoke +description: Verify that a user can log in and reach the home screen. + +setup: + - Clear app data. + +steps: + - Launch the app. 
+ - Enter ${secrets.email} on the login screen. + - Enter ${secrets.password} on the password screen. + - Tap the login button. + +expected_state: + - The home screen is visible. + - The user's name appears in the header. +``` + + + The `${secrets.email}` and `${secrets.password}` placeholders are resolved at run time from environment variables or `.env` files. See [Placeholders](/tests/placeholders) for details. + + +## Allowed action vocabulary + +Every step in `setup` or `steps` must use one of the following verbs. Do not write steps that require actions outside this list. + +| Verb to use in steps | What the agent does | Needs a UI target? | +|---|---|---| +| **Tap** / Click | Taps the specified element | Yes | +| **Long press** | Long-presses the specified element | Yes | +| **Type** / Enter text | Inputs text into the specified field | Yes | +| **Swipe** / Scroll | Swipes in a direction over the specified area | Yes | +| **Navigate back** | Presses the device back button | No | +| **Go to home screen** | Returns to the device home screen | No | +| **Rotate device** | Rotates the device orientation | No | +| **Hide keyboard** | Dismisses the on-screen keyboard | No | +| **Open URL / deeplink** | Opens a URL or deeplink | No | +| **Set location** | Sets the device GPS location | Yes (coordinates) | +| **Wait** | Pauses execution | No | +| **Verify** / Check | Visually inspects the screen for a condition | Yes (what to verify) | + + + **Verify** is the one step type that is not a device action. Use it in `setup` to confirm cleanup succeeded, and in `steps` to confirm intermediate states before critical actions. + + +## Writing good steps + +Good steps are specific and reference actual UI labels — the text or label visible on screen, not internal component names. + +- Reference the exact label: `Tap the Login button`, not `Tap the button`. +- Name the screen when it matters: `Enter the password on the Password screen`. 
+- Add inline `Verify` steps before critical actions so failures are caught with a clear message rather than a confusing grounding error: + +```yaml +steps: + - Verify the hamburger menu icon is visible in the top-left corner of the toolbar. + - Tap the hamburger menu icon in the top-left corner of the toolbar. +``` + +- Use `Verify` steps in `steps` to confirm intermediate states during multi-step flows. +- Reserve `expected_state` for the final screen only. Do not put navigation or interaction instructions there. + +### Avoid verifying ephemeral UI + +Do not assert on toasts, snackbars, or transient banners in `steps` or `expected_state`. These short-lived messages disappear on their own timer and can race against the agent's verification step. Verify the persistent consequence instead — the updated list, the changed badge count, the screen that appeared. + +```yaml +# Good — verifies a persistent outcome +expected_state: + - The item appears in the shopping cart. + +# Bad — toast may have already dismissed +expected_state: + - The "Added to cart" toast is visible. +``` + +## Positional strictness + +When a step specifies the position of a UI element — `top-left corner`, `in the header`, `first item` — the agent treats that position as a strict assertion. If the element is not found at the described location, the test fails; the agent will not search elsewhere. + +Use positional context when the element's location is part of what you are testing. Omit it when you only need to confirm the element exists, so the agent can scroll to find it. + +```yaml +# Position matters — include it +expected_state: + - The navigation drawer is open and visible on the left side of the screen. + - The profile avatar is visible at the top of the drawer. + +# Position doesn't matter — keep it generic +expected_state: + - The navigation drawer is open. + - The profile avatar is visible. 
+``` + +The first `expected_state` block above is spatially precise and passes only if the layout matches exactly. The second block checks only that the elements exist, so it can pass even when they appear somewhere unexpected; use it only when position is not part of what you are testing. + +## Setup and idempotent cleanup + +Every test must be idempotent: assume it has already run and failed. If a previous run added data, enabled a toggle, or navigated to a new screen, your `setup` must reverse that state before the test begins. + +| If the test validates... | Setup must... | +|---|---| +| **Adding** an item | Check if the item exists and delete it first. | +| **Deleting** an item | Check if the item exists and add it first if missing. | +| **Enabling** a toggle | Disable the toggle first if it is already on. | +| **Moving or reordering** | Reset the list to a known default order first. | + +Always add a `Verify` step after each cleanup action to confirm the app is in the expected starting state. If cleanup fails, the test will fail early in setup rather than produce a misleading failure in the main steps. + +```yaml +setup: + - Navigate to the Shopping List screen. + - If the item 'Milk' is visible, swipe left on it and tap Delete. + - Verify that 'Milk' is no longer visible on the Shopping List screen. +``` + +## File organization + + + Group tests by feature under `.finalrun/tests//`. For example, authentication tests belong in `.finalrun/tests/auth/`, and onboarding tests in `.finalrun/tests/onboarding/`. This mirrors the suite structure and makes it easy to run all tests for a given feature at once.
+ diff --git a/mintlify-docs/troubleshooting.mdx b/mintlify-docs/troubleshooting.mdx new file mode 100644 index 0000000..bd5ddd0 --- /dev/null +++ b/mintlify-docs/troubleshooting.mdx @@ -0,0 +1,160 @@ +--- +title: "Troubleshoot FinalRun: common errors and device setup" +sidebarTitle: "Troubleshooting" +description: "Diagnose and fix common FinalRun errors — missing workspace, unconfigured API keys, missing device tools — and verify host readiness with finalrun doctor." +--- + +When something goes wrong during setup or a test run, the error message usually tells you exactly what FinalRun expected and where to look. The sections below cover the most common errors, their causes, and the steps to resolve them. After fixing a configuration issue, run `finalrun doctor` to confirm your environment is ready before re-running your tests. + +## Common errors + + + + FinalRun finds your workspace by walking up from your current directory until it finds a folder that contains `.finalrun/`. If you run `finalrun` from outside your app repository — or before creating the workspace — it cannot locate the directory. + + **Fix:** Make sure your shell is inside the app repository where `.finalrun/tests/` exists. You can confirm the structure is in place with: + + ```sh + ls .finalrun/tests/ + ``` + + If the directory is missing, follow the workspace setup guide to initialize your `.finalrun/` folder before running any commands. + + + + FinalRun reads your AI provider API key from the environment. The key it looks for depends on the provider prefix in your `--model` value or your `.finalrun/config.yaml` default. + + | Model prefix | Required environment variable | + |---|---| + | `openai/...` | `OPENAI_API_KEY` | + | `google/...` | `GOOGLE_API_KEY` | + | `anthropic/...` | `ANTHROPIC_API_KEY` | + + **Fix:** Set the correct variable in your shell or in a `.env` file at your workspace root. 
For example, if you are using a Google model: + + ```sh + echo "GOOGLE_API_KEY=your-key-here" >> .env + ``` + + FinalRun loads `.env` from the workspace root (the folder containing `.finalrun/`). You can also pass the key directly with the `--api-key` flag to override the environment for a single run. + + + Do not commit `.env` to version control. Add `.env` and `.env.*` to your `.gitignore`, keeping `.env.example` tracked as a template. + + + + + FinalRun requires a running Android Virtual Device (AVD) before it can connect and start a test. If no emulator is active when you run a test, FinalRun cannot proceed. + + **Fix:** Start an emulator before running your test. You can do this from the command line: + + ```sh + emulator -avd + ``` + + Or launch one from the **Device Manager** in Android Studio. Once the emulator has fully booted, verify that FinalRun can detect it: + + ```sh + finalrun doctor --platform android + ``` + + + + FinalRun depends on `scrcpy` for Android screen recording and `adb` (Android Debug Bridge) for device communication. If either tool is missing from your `PATH`, Android tests cannot run. + + **Fix:** Install both tools with Homebrew on macOS: + + ```sh + brew install scrcpy android-platform-tools + ``` + + After installation, verify that FinalRun can find all required Android tools: + + ```sh + finalrun doctor + ``` + + + `android-platform-tools` provides `adb`. Make sure `ANDROID_HOME` or `ANDROID_SDK_ROOT` is set in your shell so FinalRun can locate the full Android SDK. + + + + + When a test spec references a value like `${secrets.email}`, FinalRun looks it up in your environment binding file (`.finalrun/env/.yaml`) and then resolves the underlying variable from your shell environment or `.env` file. If either piece is missing, the placeholder cannot be resolved. + + **Fix:** Check two things: + + 1. 
The binding is declared in `.finalrun/env/.yaml` using the `${ENV_VAR}` placeholder syntax: + + ```yaml + secrets: + email: ${TEST_USER_EMAIL} + password: ${TEST_USER_PASSWORD} + ``` + + 2. The actual value is present in your shell environment or in the workspace-root `.env` (or `.env.`) file: + + ```sh + echo "TEST_USER_EMAIL=user@example.com" >> .env + ``` + + The environment name `` must match the `--env` flag you pass (or the `env` value in `.finalrun/config.yaml`). + + + + The `--app` flag expects a path to an existing binary that matches the target platform: an `.apk` file for Android or a `.app` directory for iOS. If the path does not exist or the file type does not match the `--platform` value, FinalRun rejects it. + + **Fix:** Verify the path exists and the file matches the platform you are targeting: + + ```sh + # Android + finalrun test smoke.yaml --platform android --app path/to/your.apk + + # iOS + finalrun test smoke.yaml --platform ios --app path/to/YourApp.app + ``` + + If you omit `--app`, FinalRun uses the app identity defined in `.finalrun/config.yaml`. + + + + In earlier versions of FinalRun, running the CLI from within AI coding agent terminals (such as Claude Code or Cursor) could cause TTY-related errors because those environments do not always provide a standard terminal interface. + + This issue has been resolved. Upgrade to the latest version of FinalRun to pick up the fix: + + ```sh + curl -fsSL https://raw.githubusercontent.com/final-run/finalrun-agent/main/scripts/install.sh | bash + ``` + + + +## Verify host readiness with finalrun doctor + +Before running tests, use `finalrun doctor` to check that all required tools and platform dependencies are installed and reachable. The command prints a tick/cross summary for each dependency. 
+ +```sh +# Check both Android and iOS +finalrun doctor + +# Check Android only +finalrun doctor --platform android + +# Check iOS only +finalrun doctor --platform ios +``` + +Fix any items marked with a cross before running tests. For Android, the required tools are `adb`, `emulator`, and `scrcpy`. For iOS (macOS only), FinalRun requires Xcode command line tools with `xcrun simctl`. + +## Getting more information + +If an error is not covered above or you need more detail during a failing run, use the `--debug` flag to enable verbose logging: + +```sh +finalrun test smoke.yaml --platform android --debug +``` + +After a run completes, use `finalrun start-server` to open the visual report UI and inspect screenshots, video, and device logs for the failed run. + +If you are still stuck, join the FinalRun community on Slack — the team and other users are active there: + +[Join the FinalRun Slack community](https://join.slack.com/t/finalrun-community/shared_invite/zt-38qg6q9fq-9L87nNF8aX4HZ8_pn9KBgw) From 440dd4a70d9816debb8253b5973e579a9c79e9a6 Mon Sep 17 00:00:00 2001 From: ashish Date: Sun, 19 Apr 2026 17:50:41 -0700 Subject: [PATCH 22/80] docs(mintlify): rework landing page with hero video and local logos - Merge introduction.mdx content into index.mdx as the landing page: opener, YouTube demo embed, three-phase model with YAML example, and consolidated nav cards - Drop the duplicated 4-step Quickstart preview from the landing page - Replace hosted logo URLs with local light/dark PNGs under mintlify-docs/logo/ - Remove introduction.mdx from Get Started; platforms/providers/workspace content already lives in their dedicated pages Co-Authored-By: Claude Opus 4.7 --- mintlify-docs/docs.json | 5 +- mintlify-docs/index.mdx | 84 +++++++++--------- mintlify-docs/introduction.mdx | 83 ----------------- .../logo/finalrun-logo-dark-theme.png | Bin 0 -> 12090 bytes mintlify-docs/logo/finalrun-logo.png | Bin 0 -> 4925 bytes 5 files changed, 44 insertions(+), 128 deletions(-) 
delete mode 100644 mintlify-docs/introduction.mdx create mode 100644 mintlify-docs/logo/finalrun-logo-dark-theme.png create mode 100644 mintlify-docs/logo/finalrun-logo.png diff --git a/mintlify-docs/docs.json b/mintlify-docs/docs.json index 06cc9ac..be13eda 100644 --- a/mintlify-docs/docs.json +++ b/mintlify-docs/docs.json @@ -8,8 +8,8 @@ "dark": "#6c4cfc" }, "logo": { - "light": "https://media.brand.dev/3ece83f5-5543-48fb-bd61-213417b8ba98.png", - "dark": "https://media.brand.dev/3ece83f5-5543-48fb-bd61-213417b8ba98.png" + "light": "/logo/finalrun-logo.png", + "dark": "/logo/finalrun-logo-dark-theme.png" }, "favicon": "https://media.brand.dev/31164f48-397c-4036-929b-8153e11d15c1.jpg", "navbar": { @@ -26,7 +26,6 @@ { "group": "Get Started", "pages": [ - "introduction", "quickstart", "installation" ] diff --git a/mintlify-docs/index.mdx b/mintlify-docs/index.mdx index 110b2eb..1df27ee 100644 --- a/mintlify-docs/index.mdx +++ b/mintlify-docs/index.mdx @@ -3,60 +3,60 @@ title: "FinalRun: AI-powered Android and iOS test automation" description: "FinalRun lets you write plain-English test specs in YAML and run them on Android and iOS using AI models like Gemini, GPT, or Claude." --- -FinalRun is an AI-driven testing CLI that runs your Android and iOS apps through real user scenarios — written in plain English, executed by AI on a real device or emulator. You describe what to test in a YAML file; FinalRun taps, swipes, types, and verifies on your behalf, then produces a pass/fail report with video, screenshots, and device logs. +FinalRun is an AI-driven CLI that tests your Android and iOS apps using natural language. You write scenarios in YAML — describing actions the way a person would — and FinalRun launches your app on a real device or emulator, uses an AI model to see the screen, and performs each action: tapping, swiping, typing, and verifying the result. When the run finishes, you get a pass/fail report with video, screenshots, and device logs. + + + +## 1. 
Sign up + +Create an account at [https://cloud.finalrun.app](https://cloud.finalrun.app). + +## 2. Generate an API key + +In the dashboard, open the **API keys** section and create a new key. Copy the value — it is shown only once. + +## 3. Set `FINALRUN_API_KEY` + +The CLI reads `FINALRUN_API_KEY` from the same sources as provider keys. + + + + Add the key to a `.env` file at your workspace root: + + ```bash + echo "FINALRUN_API_KEY=your-key-here" > .env + ``` + + See [Managing environments and secrets](/configuration/environments) for dotenv load order details. + + + Export the variable in your shell session or CI environment: + + ```bash + export FINALRUN_API_KEY=your-key-here + ``` + + Shell variables take the highest priority in FinalRun's load order. + + + Pass the key directly as a CLI flag: + + ```bash + finalrun cloud test smoke.yaml --api-key your-key-here + ``` + + + +## Run a cloud test + +```bash +finalrun cloud test smoke.yaml --platform android +``` + + + Prefer to use your own AI provider account instead? See [AI Providers](/configuration/ai-providers) for the bring-your-own-key setup. 
+ diff --git a/mintlify-docs/docs.json b/mintlify-docs/docs.json index 5bd3083..de91abc 100644 --- a/mintlify-docs/docs.json +++ b/mintlify-docs/docs.json @@ -44,7 +44,8 @@ "pages": [ "configuration/workspace", "configuration/environments", - "configuration/ai-providers" + "configuration/ai-providers", + "configuration/cloud-api-key" ] }, { diff --git a/scripts/install.ps1 b/scripts/install.ps1 index 1100185..fcb1dc5 100644 --- a/scripts/install.ps1 +++ b/scripts/install.ps1 @@ -335,6 +335,41 @@ function Read-SkillsPrompt { } } +function Test-ApiKeys { + Write-Heading "" + Write-Heading "── AI Provider Key ──" + Write-Heading "" + + $detected = @() + foreach ($var in @('FINALRUN_API_KEY', 'ANTHROPIC_API_KEY', 'OPENAI_API_KEY', 'GOOGLE_API_KEY')) { + if ([Environment]::GetEnvironmentVariable($var)) { + $detected += $var + } + } + + if ($detected.Count -gt 0) { + foreach ($v in $detected) { + Write-Success "$v detected" + } + return + } + + Write-Notice "No API key detected." + Write-Heading "" + Write-Heading ' Fastest way to get started — FinalRun Cloud (free $5 credits):' + Write-Heading "" + Write-Heading " Sign up at https://cloud.finalrun.app" + Write-Heading "" + Write-Heading " Prefer your own AI provider account? Bring your own key:" + Write-Heading "" + Write-Heading " ANTHROPIC_API_KEY → anthropic/claude-* models" + Write-Heading " OPENAI_API_KEY → openai/gpt-* models" + Write-Heading " GOOGLE_API_KEY → google/gemini-* models" + Write-Heading "" + Write-Heading " Set via .env (workspace root), shell export, or --api-key." 
+ Write-Heading " Docs: https://docs.finalrun.app/configuration/ai-providers" +} + function Show-CISummary { param([string]$FinalRunDir) $binDir = Join-Path $FinalRunDir 'bin' @@ -443,6 +478,7 @@ function Invoke-Main { } Read-SkillsPrompt + Test-ApiKeys Show-Summary -BinPath $binPath -RuntimeDir $runtimeDir -AndroidOk $androidOk -FinalRunDir $finalRunDir } diff --git a/scripts/install.sh b/scripts/install.sh index ae3e9a0..060eff1 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -140,6 +140,7 @@ main() { setup_host_tools run_doctor prompt_skills + check_api_keys print_summary exit 0 @@ -430,6 +431,42 @@ prompt_skills() { fi } +check_api_keys() { + echo "" + info "── AI Provider Key ──" + echo "" + + local detected=() + local var + for var in FINALRUN_API_KEY ANTHROPIC_API_KEY OPENAI_API_KEY GOOGLE_API_KEY; do + if [ -n "${!var:-}" ]; then + detected+=("$var") + fi + done + + if [ ${#detected[@]} -gt 0 ]; then + for var in "${detected[@]}"; do + ok "$var detected" + done + return + fi + + warn "No API key detected." + echo "" + echo " Fastest way to get started — FinalRun Cloud (free \$5 credits):" + echo "" + echo " Sign up at https://cloud.finalrun.app" + echo "" + echo " Prefer your own AI provider account? Bring your own key:" + echo "" + echo " ANTHROPIC_API_KEY → anthropic/claude-* models" + echo " OPENAI_API_KEY → openai/gpt-* models" + echo " GOOGLE_API_KEY → google/gemini-* models" + echo "" + echo " Set via .env (workspace root), shell export, or --api-key." 
+ echo " Docs: https://docs.finalrun.app/configuration/ai-providers" +} + print_summary() { echo "" info "── Summary ──" From fec8fc310cd654e70cf0482ce32c085c8dc50ec8 Mon Sep 17 00:00:00 2001 From: ashish Date: Mon, 27 Apr 2026 15:22:05 -0700 Subject: [PATCH 70/80] installer: clickable URLs in the no-key block Wrap the cloud signup URL and both docs URLs in OSC 8 hyperlink escapes so they're single-click in modern terminals (iTerm2, Terminal.app, Windows Terminal, kitty, alacritty, WezTerm, gnome-terminal). Older terminals strip the escapes and show the bare URL, so the change degrades cleanly. Also adds a Docs: link under the FinalRun Cloud option (mirrors the existing one for BYOK), and aligns "Sign up at" -> "Sign up:" so the two labels line up. Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/install.ps1 | 14 ++++++++++++-- scripts/install.sh | 10 ++++++++-- 2 files changed, 20 insertions(+), 4 deletions(-) diff --git a/scripts/install.ps1 b/scripts/install.ps1 index fcb1dc5..722f0bd 100644 --- a/scripts/install.ps1 +++ b/scripts/install.ps1 @@ -64,6 +64,15 @@ function Write-Success { param([string]$Msg) Write-Host " ✓ $Msg" -Foreground function Write-Notice { param([string]$Msg) Write-Host " ⚠ $Msg" -ForegroundColor Yellow } function Write-Failure { param([string]$Msg) Write-Host " ✗ $Msg" -ForegroundColor Red } +# OSC 8 hyperlink — single-click in modern terminals (Windows Terminal, +# iTerm2, kitty, etc.). Older terminals strip the escapes and show the +# bare URL, so it degrades cleanly. 
+function Format-Link { + param([string]$Url) + $esc = [char]27 + "$esc]8;;$Url$esc\$Url$esc]8;;$esc\" +} + # --------------------------------------------------------------------------- # Step helpers # --------------------------------------------------------------------------- @@ -358,7 +367,8 @@ function Test-ApiKeys { Write-Heading "" Write-Heading ' Fastest way to get started — FinalRun Cloud (free $5 credits):' Write-Heading "" - Write-Heading " Sign up at https://cloud.finalrun.app" + Write-Heading " Sign up: $(Format-Link 'https://cloud.finalrun.app')" + Write-Heading " Docs: $(Format-Link 'https://docs.finalrun.app/configuration/cloud-api-key')" Write-Heading "" Write-Heading " Prefer your own AI provider account? Bring your own key:" Write-Heading "" @@ -367,7 +377,7 @@ function Test-ApiKeys { Write-Heading " GOOGLE_API_KEY → google/gemini-* models" Write-Heading "" Write-Heading " Set via .env (workspace root), shell export, or --api-key." - Write-Heading " Docs: https://docs.finalrun.app/configuration/ai-providers" + Write-Heading " Docs: $(Format-Link 'https://docs.finalrun.app/configuration/ai-providers')" } function Show-CISummary { diff --git a/scripts/install.sh b/scripts/install.sh index 060eff1..c5a5589 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -33,6 +33,11 @@ ok() { printf "${GREEN} ✓ %s${RESET}\n" "$*"; } warn() { printf "${YELLOW} ⚠ %s${RESET}\n" "$*"; } fail() { printf "${RED} ✗ %s${RESET}\n" "$*"; } +# OSC 8 hyperlink — single-click in modern terminals (iTerm2, Terminal.app, +# Windows Terminal, kitty, alacritty, WezTerm, gnome-terminal). Older +# terminals strip the escapes and show the bare URL, so it degrades cleanly. 
+link() { printf '\033]8;;%s\033\\%s\033]8;;\033\\' "$1" "$1"; } + GITHUB_REPO="final-run/finalrun-agent" # --------------------------------------------------------------------------- @@ -455,7 +460,8 @@ check_api_keys() { echo "" echo " Fastest way to get started — FinalRun Cloud (free \$5 credits):" echo "" - echo " Sign up at https://cloud.finalrun.app" + echo " Sign up: $(link 'https://cloud.finalrun.app')" + echo " Docs: $(link 'https://docs.finalrun.app/configuration/cloud-api-key')" echo "" echo " Prefer your own AI provider account? Bring your own key:" echo "" @@ -464,7 +470,7 @@ check_api_keys() { echo " GOOGLE_API_KEY → google/gemini-* models" echo "" echo " Set via .env (workspace root), shell export, or --api-key." - echo " Docs: https://docs.finalrun.app/configuration/ai-providers" + echo " Docs: $(link 'https://docs.finalrun.app/configuration/ai-providers')" } print_summary() { From 4d429dc8b8fe60c9fd1eaddf1f4654afefbb0348 Mon Sep 17 00:00:00 2001 From: ashish Date: Mon, 27 Apr 2026 15:26:28 -0700 Subject: [PATCH 71/80] installer: revert OSC 8 hyperlinks; rely on terminal URL detection MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit OSC 8 hyperlinks aren't supported by macOS Terminal.app, which is the most common terminal for the target audience. Plain URL text is auto-detected as clickable (cmd-click on macOS, ctrl-click on Windows/Linux) by every modern terminal — Terminal.app, iTerm2, VS Code, Windows Terminal, gnome-terminal, kitty, alacritty, WezTerm — so dropping the OSC 8 wrapping is universally compatible at the cost of a minor styling difference in iTerm2/etc. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/install.ps1 | 15 +++------------ scripts/install.sh | 11 +++-------- 2 files changed, 6 insertions(+), 20 deletions(-) diff --git a/scripts/install.ps1 b/scripts/install.ps1 index 722f0bd..c420308 100644 --- a/scripts/install.ps1 +++ b/scripts/install.ps1 @@ -64,15 +64,6 @@ function Write-Success { param([string]$Msg) Write-Host " ✓ $Msg" -Foreground function Write-Notice { param([string]$Msg) Write-Host " ⚠ $Msg" -ForegroundColor Yellow } function Write-Failure { param([string]$Msg) Write-Host " ✗ $Msg" -ForegroundColor Red } -# OSC 8 hyperlink — single-click in modern terminals (Windows Terminal, -# iTerm2, kitty, etc.). Older terminals strip the escapes and show the -# bare URL, so it degrades cleanly. -function Format-Link { - param([string]$Url) - $esc = [char]27 - "$esc]8;;$Url$esc\$Url$esc]8;;$esc\" -} - # --------------------------------------------------------------------------- # Step helpers # --------------------------------------------------------------------------- @@ -367,8 +358,8 @@ function Test-ApiKeys { Write-Heading "" Write-Heading ' Fastest way to get started — FinalRun Cloud (free $5 credits):' Write-Heading "" - Write-Heading " Sign up: $(Format-Link 'https://cloud.finalrun.app')" - Write-Heading " Docs: $(Format-Link 'https://docs.finalrun.app/configuration/cloud-api-key')" + Write-Heading " Sign up: https://cloud.finalrun.app" + Write-Heading " Docs: https://docs.finalrun.app/configuration/cloud-api-key" Write-Heading "" Write-Heading " Prefer your own AI provider account? Bring your own key:" Write-Heading "" @@ -377,7 +368,7 @@ function Test-ApiKeys { Write-Heading " GOOGLE_API_KEY → google/gemini-* models" Write-Heading "" Write-Heading " Set via .env (workspace root), shell export, or --api-key." 
- Write-Heading " Docs: $(Format-Link 'https://docs.finalrun.app/configuration/ai-providers')" + Write-Heading " Docs: https://docs.finalrun.app/configuration/ai-providers" } function Show-CISummary { diff --git a/scripts/install.sh b/scripts/install.sh index c5a5589..1380bf6 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -33,11 +33,6 @@ ok() { printf "${GREEN} ✓ %s${RESET}\n" "$*"; } warn() { printf "${YELLOW} ⚠ %s${RESET}\n" "$*"; } fail() { printf "${RED} ✗ %s${RESET}\n" "$*"; } -# OSC 8 hyperlink — single-click in modern terminals (iTerm2, Terminal.app, -# Windows Terminal, kitty, alacritty, WezTerm, gnome-terminal). Older -# terminals strip the escapes and show the bare URL, so it degrades cleanly. -link() { printf '\033]8;;%s\033\\%s\033]8;;\033\\' "$1" "$1"; } - GITHUB_REPO="final-run/finalrun-agent" # --------------------------------------------------------------------------- @@ -460,8 +455,8 @@ check_api_keys() { echo "" echo " Fastest way to get started — FinalRun Cloud (free \$5 credits):" echo "" - echo " Sign up: $(link 'https://cloud.finalrun.app')" - echo " Docs: $(link 'https://docs.finalrun.app/configuration/cloud-api-key')" + echo " Sign up: https://cloud.finalrun.app" + echo " Docs: https://docs.finalrun.app/configuration/cloud-api-key" echo "" echo " Prefer your own AI provider account? Bring your own key:" echo "" @@ -470,7 +465,7 @@ check_api_keys() { echo " GOOGLE_API_KEY → google/gemini-* models" echo "" echo " Set via .env (workspace root), shell export, or --api-key." 
- echo " Docs: $(link 'https://docs.finalrun.app/configuration/ai-providers')" + echo " Docs: https://docs.finalrun.app/configuration/ai-providers" } print_summary() { From 339c4f5198d9a700b1d281db7f4b27e2366c0b46 Mon Sep 17 00:00:00 2001 From: ashish Date: Mon, 27 Apr 2026 15:29:20 -0700 Subject: [PATCH 72/80] installer: underline URLs in the no-key block MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Wrap each URL in ANSI underline (ESC[4m / ESC[24m). Universally supported by Terminal.app, iTerm2, VS Code, Windows Terminal, and conhost — much wider than OSC 8. Click activation still works via each terminal's native URL detection (cmd-click on macOS, ctrl-click on Win/Linux). Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/install.ps1 | 12 +++++++++--- scripts/install.sh | 7 ++++--- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/scripts/install.ps1 b/scripts/install.ps1 index c420308..8a04016 100644 --- a/scripts/install.ps1 +++ b/scripts/install.ps1 @@ -64,6 +64,12 @@ function Write-Success { param([string]$Msg) Write-Host " ✓ $Msg" -Foreground function Write-Notice { param([string]$Msg) Write-Host " ⚠ $Msg" -ForegroundColor Yellow } function Write-Failure { param([string]$Msg) Write-Host " ✗ $Msg" -ForegroundColor Red } +function Format-Underline { + param([string]$Text) + $esc = [char]27 + "$esc[4m$Text$esc[24m" +} + # --------------------------------------------------------------------------- # Step helpers # --------------------------------------------------------------------------- @@ -358,8 +364,8 @@ function Test-ApiKeys { Write-Heading "" Write-Heading ' Fastest way to get started — FinalRun Cloud (free $5 credits):' Write-Heading "" - Write-Heading " Sign up: https://cloud.finalrun.app" - Write-Heading " Docs: https://docs.finalrun.app/configuration/cloud-api-key" + Write-Heading " Sign up: $(Format-Underline 'https://cloud.finalrun.app')" + Write-Heading " Docs: $(Format-Underline 
'https://docs.finalrun.app/configuration/cloud-api-key')" Write-Heading "" Write-Heading " Prefer your own AI provider account? Bring your own key:" Write-Heading "" @@ -368,7 +374,7 @@ function Test-ApiKeys { Write-Heading " GOOGLE_API_KEY → google/gemini-* models" Write-Heading "" Write-Heading " Set via .env (workspace root), shell export, or --api-key." - Write-Heading " Docs: https://docs.finalrun.app/configuration/ai-providers" + Write-Heading " Docs: $(Format-Underline 'https://docs.finalrun.app/configuration/ai-providers')" } function Show-CISummary { diff --git a/scripts/install.sh b/scripts/install.sh index 1380bf6..334ac0d 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -32,6 +32,7 @@ info() { printf "${BOLD}%s${RESET}\n" "$*"; } ok() { printf "${GREEN} ✓ %s${RESET}\n" "$*"; } warn() { printf "${YELLOW} ⚠ %s${RESET}\n" "$*"; } fail() { printf "${RED} ✗ %s${RESET}\n" "$*"; } +underline() { printf '\033[4m%s\033[24m' "$1"; } GITHUB_REPO="final-run/finalrun-agent" @@ -455,8 +456,8 @@ check_api_keys() { echo "" echo " Fastest way to get started — FinalRun Cloud (free \$5 credits):" echo "" - echo " Sign up: https://cloud.finalrun.app" - echo " Docs: https://docs.finalrun.app/configuration/cloud-api-key" + echo " Sign up: $(underline 'https://cloud.finalrun.app')" + echo " Docs: $(underline 'https://docs.finalrun.app/configuration/cloud-api-key')" echo "" echo " Prefer your own AI provider account? Bring your own key:" echo "" @@ -465,7 +466,7 @@ check_api_keys() { echo " GOOGLE_API_KEY → google/gemini-* models" echo "" echo " Set via .env (workspace root), shell export, or --api-key." 
- echo " Docs: https://docs.finalrun.app/configuration/ai-providers" + echo " Docs: $(underline 'https://docs.finalrun.app/configuration/ai-providers')" } print_summary() { From 7e1892036dc7df97e988b85b665ed28aa1e98e9c Mon Sep 17 00:00:00 2001 From: ashish Date: Mon, 27 Apr 2026 16:01:07 -0700 Subject: [PATCH 73/80] installer: skip platform prompt when tools already installed; fix Xcode false positive MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Skip the "Which platform(s) would you like to set up host tools for?" prompt when every relevant platform is already detected as ready. Detection matches the existing setup_* checks: - Android: SDK path resolves AND scrcpy on PATH - iOS (mac only): xcrun -f simctl succeeds AND applesimutils on PATH When all relevant platforms are ready the installer prints one "✓ tools detected — skipping setup." line per platform, sets PLATFORM_CHOICE so run_doctor still verifies, and proceeds. Partial states (e.g. Android ready, iOS not) keep the existing prompt — partial cases are uncommon and tracking them adds surface area. Also fixes a false-positive in setup_ios: the previous check `xcode-select -p` succeeded for both full Xcode AND Command Line Tools-only installs, so users with CLT (often installed for git) saw a misleading "✓ Xcode detected" before the install failed later. `xcrun -f simctl` is the canonical Xcode-vs-CLT signal — simctl ships with Xcode and is absent from CLT-only installs, and `xcrun -f` is a path lookup so it doesn't trigger a license check. The "install Command Line Tools" branch in setup_ios is removed — CLT alone never satisfies iOS testing, so suggesting it as a fix was wrong. Error messages now distinguish the CLT-only case (suggest `sudo xcode-select -s` after installing Xcode) from the nothing-installed case (install Xcode from the App Store). Same skip behavior in install.ps1 (Android-only on Windows). 
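For reviewers, the detection rules listed above can be read as roughly the following. This is a minimal sketch under stated assumptions, not the installer's code: `tool_on_path`, `android_ready_sketch`, and `ios_ready_sketch` are illustrative names, and the Android check here only looks at `ANDROID_HOME` / `ANDROID_SDK_ROOT`, whereas the real helper may resolve the SDK path differently.

```sh
# Sketch only: mirrors the checks named in this commit message.
# All function names below are illustrative, not taken from install.sh.
tool_on_path() { command -v "$1" >/dev/null 2>&1; }

# Android counts as ready when an SDK directory exists and scrcpy is on PATH.
# Assumption: the SDK is located through ANDROID_HOME or ANDROID_SDK_ROOT only.
android_ready_sketch() {
  sdk_dir="${ANDROID_HOME:-${ANDROID_SDK_ROOT:-}}"
  [ -n "$sdk_dir" ] && [ -d "$sdk_dir" ] && tool_on_path scrcpy
}

# iOS counts as ready only on macOS, with full Xcode (simctl resolves)
# and applesimutils on PATH. A CLT-only install fails the simctl lookup.
ios_ready_sketch() {
  [ "$(uname -s)" = "Darwin" ] || return 1
  xcrun -f simctl >/dev/null 2>&1 && tool_on_path applesimutils
}
```

Both helpers return a plain exit status, so a caller can combine them to decide whether the platform prompt is needed at all.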
Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/install.ps1 | 21 ++++++++++- scripts/install.sh | 92 ++++++++++++++++++++++++++++++++++++++------- 2 files changed, 98 insertions(+), 15 deletions(-) diff --git a/scripts/install.ps1 b/scripts/install.ps1 index 8a04016..9c61299 100644 --- a/scripts/install.ps1 +++ b/scripts/install.ps1 @@ -240,6 +240,18 @@ function Install-Runtime { return $runtimeDir } +function Test-AndroidReady { + $androidHome = $env:ANDROID_HOME + if (-not $androidHome) { $androidHome = $env:ANDROID_SDK_ROOT } + $defaultSdk = Join-Path $env:LOCALAPPDATA 'Android\Sdk' + + $sdkPresent = $false + if ($androidHome -and (Test-Path $androidHome)) { $sdkPresent = $true } + if (Test-Path $defaultSdk) { $sdkPresent = $true } + + return ($sdkPresent -and [bool](Get-Command scrcpy -ErrorAction SilentlyContinue)) +} + function Read-AndroidPrompt { Write-Heading "" Write-Heading "── Platform Setup ──" @@ -477,7 +489,14 @@ function Invoke-Main { $runtimeDir = Install-Runtime -Version $version -Platform $platform -FinalRunDir $finalRunDir $androidOk = $false - if (Read-AndroidPrompt) { + if (Test-AndroidReady) { + Write-Heading "" + Write-Heading "── Platform Setup ──" + Write-Heading "" + Write-Success "Android tools detected — skipping setup." + $androidOk = $true + Invoke-Doctor -BinPath $binPath + } elseif (Read-AndroidPrompt) { $androidOk = Install-Android if ($androidOk) { Invoke-Doctor -BinPath $binPath diff --git a/scripts/install.sh b/scripts/install.sh index 334ac0d..477ed3c 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -137,8 +137,10 @@ main() { exec /dev/null 2>&1 +} + +# iOS readiness requires the full Xcode app (not just Command Line Tools). +# `xcrun -f simctl` is the canonical signal: simctl ships with Xcode and is +# absent from CLT-only installs. `xcrun -f` is a path lookup, so it does not +# trigger a license check. 
+ios_ready() { + [ "$OS" = "darwin" ] || return 1 + xcrun -f simctl >/dev/null 2>&1 && command -v applesimutils >/dev/null 2>&1 +} + +# If every relevant platform is already set up, skip the prompt entirely: +# print confirmations, set PLATFORM_CHOICE so run_doctor still verifies, and +# return 0 so main() bypasses prompt_platform + setup_host_tools. +maybe_skip_platform_setup() { + local android=false ios=false + android_ready && android=true + ios_ready && ios=true + + if [ "$OS" = "darwin" ]; then + if [ "$android" = true ] && [ "$ios" = true ]; then + echo "" + info "── Platform Setup ──" + echo "" + ok "Android tools detected — skipping setup." + ok "iOS tools detected — skipping setup." + PLATFORM_CHOICE=both + ANDROID_OK=true + IOS_OK=true + return 0 + fi + else + # Linux: only Android applies — iOS isn't reachable on this host. + if [ "$android" = true ]; then + echo "" + info "── Platform Setup ──" + echo "" + ok "Android tools detected — skipping setup." + PLATFORM_CHOICE=android + ANDROID_OK=true + return 0 + fi + fi + return 1 +} + prompt_platform() { echo "" info "── Platform Setup ──" @@ -343,22 +401,28 @@ setup_ios() { return 1 fi - if ! xcode-select -p >/dev/null 2>&1; then - fail "Xcode not found." - info " Install Xcode from the App Store, then re-run the installer." + # `xcode-select -p` succeeds for both full Xcode AND Command Line Tools, + # which is why the previous check produced false-positive "Xcode detected" + # messages on CLT-only machines. `xcrun -f simctl` only succeeds when the + # full Xcode app is the active developer dir — simctl ships with Xcode, + # not CLT. + if ! xcrun -f simctl >/dev/null 2>&1; then + local devdir + devdir=$(xcode-select -p 2>/dev/null || true) + if [ -z "$devdir" ]; then + fail "Xcode not found." + info " Install Xcode from the App Store, launch it once to accept the" + info " license, then re-run the installer." + else + fail "Xcode app not active (xcode-select -p points to: $devdir)." 
+ info " iOS simulators need the full Xcode app, not just Command Line Tools." + info " Install Xcode from the App Store, then run:" + info " sudo xcode-select -s /Applications/Xcode.app/Contents/Developer" + fi return 1 fi ok "Xcode detected." - if xcrun --version >/dev/null 2>&1; then - ok "Xcode Command Line Tools already installed." - else - info " Installing Xcode Command Line Tools..." - info " A system dialog may appear — please accept it." - xcode-select --install 2>/dev/null || true - ok "Xcode Command Line Tools installation initiated (re-run after it finishes)." - fi - if command -v applesimutils >/dev/null 2>&1; then ok "applesimutils already installed." elif command -v brew >/dev/null 2>&1; then From 874a9499c5f734b571d1ca226ea566c0c1769519 Mon Sep 17 00:00:00 2001 From: ashish Date: Mon, 27 Apr 2026 16:15:00 -0700 Subject: [PATCH 74/80] installer: auto-sync FinalRun skills (no prompt, skip when up to date) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drops the [Y/n] prompt for skills install and replaces it with an auto-trigger that's smart about already-installed skills. Behavior: - npx missing -> warn and skip (unchanged). - No finalrun-* skills installed globally -> 'skills add final-run/finalrun-agent -y -g' (full install). - Some finalrun-* skills installed -> 'skills update -g -y ' for just those. The skills CLI's update is internally diff-aware: it compares against the source repo and only downloads what's stale. Output line then branches on whether the run reported "up to date" -> "already up to date." vs "updated.". Detection uses 'skills ls -g --json' so it works regardless of how the user got their skills installed. Update is scoped to installed finalrun-* names so we don't inadvertently touch other global skills (find-skills, etc.) the user maintains. 
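To make the scoping concrete, here is the same name filter applied to made-up input. The JSON shape of `skills ls -g --json` is not shown in this patch, so the sample below is purely hypothetical, as are the skill names `finalrun-alpha` and `finalrun-beta` and the helper name `extract_finalrun_skills`.

```sh
# Hypothetical sample; the real output shape of the skills CLI may differ.
sample='[{"name":"finalrun-beta"},{"name":"other-skill"},{"name":"finalrun-alpha"},{"name":"finalrun-beta"}]'

# Keep every quoted finalrun-* token, strip the quotes, dedupe,
# and join the names into one space-separated list.
extract_finalrun_skills() {
  grep -oE '"finalrun-[a-z0-9-]+"' | tr -d '"' | sort -u | tr '\n' ' '
}

installed=$(printf '%s' "$sample" | extract_finalrun_skills)
printf '%s\n' "$installed"
```

One consequence of filtering on text rather than parsing JSON: any quoted `finalrun-*` string in the output is picked up, whichever field it appears in. That keeps the installer free of a JSON parser dependency, at the cost of trusting the CLI not to emit such strings elsewhere.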
User removals are respected: if someone removed one of the FinalRun skills on purpose, we don't reinstall it on re-run — we only update what's there. Same logic in install.ps1 with ConvertFrom-Json on the skills list. Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/install.ps1 | 49 +++++++++++++++++++++++++++++++------ scripts/install.sh | 59 +++++++++++++++++++++++++++++++-------------- 2 files changed, 82 insertions(+), 26 deletions(-) diff --git a/scripts/install.ps1 b/scripts/install.ps1 index 9c61299..fa97655 100644 --- a/scripts/install.ps1 +++ b/scripts/install.ps1 @@ -332,24 +332,57 @@ function Invoke-Doctor { # might be missing a piece they'll fix later. Bash version does the same. } -function Read-SkillsPrompt { +function Sync-Skills { Write-Heading "" Write-Heading "── FinalRun AI Agent Skills ──" Write-Heading "" - $reply = Read-Host "Install AI agent skills (used by Claude Code/Cursor for /finalrun-* commands)? [Y/n]" - if ($reply -match '^[Nn]') { return } - if (-not (Get-Command npx -ErrorAction SilentlyContinue)) { Write-Notice "npx not found — skills require Node + npm. Install Node 20+ from" Write-Notice "https://nodejs.org and re-run the installer if you want them." return } - Write-Heading "Installing FinalRun skills..." - & npx skills add final-run/finalrun-agent + # Detect already-installed finalrun-* skills via the skills CLI's JSON + # output. `skills update` is internally diff-aware — it only downloads + # what's stale, so an up-to-date system pays only a network round-trip. + $installed = @() + try { + $listJson = & npx --yes skills ls -g --json 2>$null + if ($LASTEXITCODE -eq 0 -and $listJson) { + $entries = $listJson | ConvertFrom-Json + $installed = @($entries | + Where-Object { $_.name -like 'finalrun-*' } | + ForEach-Object { $_.name }) + } + } catch { + # Fall through — treat as not-installed. + } + + if ($installed.Count -eq 0) { + Write-Heading "Installing FinalRun skills..." 
+ & npx --yes skills add final-run/finalrun-agent -y -g + if ($LASTEXITCODE -eq 0) { + Write-Success "FinalRun skills installed." + } else { + Write-Notice "FinalRun skills install failed — see output above. Re-run 'npx skills add final-run/finalrun-agent -g' to retry." + } + return + } + + Write-Heading "Checking FinalRun skills for updates..." + # `skills update` prints "All global skills are up to date" when nothing is + # stale, and "Updated N skill(s)" otherwise. + $out = & npx --yes skills update -g -y @installed 2>&1 | Out-String + Write-Host $out if ($LASTEXITCODE -eq 0) { - Write-Success "FinalRun skills installed." + if ($out -match '(?i)up to date') { + Write-Success "FinalRun skills already up to date." + } else { + Write-Success "FinalRun skills updated." + } + } else { + Write-Notice "FinalRun skills update failed — see output above." } } @@ -503,7 +536,7 @@ function Invoke-Main { } } - Read-SkillsPrompt + Sync-Skills Test-ApiKeys Show-Summary -BinPath $binPath -RuntimeDir $runtimeDir -AndroidOk $androidOk -FinalRunDir $finalRunDir } diff --git a/scripts/install.sh b/scripts/install.sh index 477ed3c..80a1a53 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -142,7 +142,7 @@ main() { setup_host_tools fi run_doctor - prompt_skills + sync_skills check_api_keys print_summary @@ -469,30 +469,53 @@ run_doctor() { "$BIN_DEST" doctor --platform "$doctor_platform" || true } -prompt_skills() { +sync_skills() { echo "" info "── FinalRun AI Agent Skills ──" echo "" - printf "Install AI agent skills (used by Claude Code/Cursor for /finalrun-* commands)? [Y/n] [30s timeout]: " - local reply choice - if read -r -t 30 reply; then - case "$reply" in - n|N|no|NO) choice=skip ;; - *) choice=install ;; - esac - else - echo "" - warn "No response in 30s — skipping skills install." - choice=skip + + if ! command -v npx >/dev/null 2>&1; then + warn "npx not found — skills require Node + npm. Install Node 20+ and re-run the installer if you want them." 
+ return fi - if [ "$choice" = "install" ]; then - if ! command -v npx >/dev/null 2>&1; then - warn "npx not found — skills require Node + npm. Install Node 20+ and re-run the installer if you want them." + # Detect already-installed finalrun-* skills via the skills CLI's JSON + # output. `skills update` is internally diff-aware (it compares against the + # source repo and only downloads stale skills), so running it on an + # up-to-date system is essentially a no-op + a network round-trip. + local installed + installed=$(npx --yes skills ls -g --json 2>/dev/null \ + | grep -oE '"finalrun-[a-z0-9-]+"' \ + | tr -d '"' \ + | sort -u \ + | tr '\n' ' ') + + if [ -z "$installed" ]; then + info "Installing FinalRun skills..." + if npx --yes skills add final-run/finalrun-agent -y -g; then + ok "FinalRun skills installed." else - info "Installing FinalRun skills..." - npx skills add final-run/finalrun-agent && ok "FinalRun skills installed." + warn "FinalRun skills install failed — see output above. Re-run 'npx skills add final-run/finalrun-agent -g' to retry." fi + return + fi + + info "Checking FinalRun skills for updates..." + # Capture so we can branch the success line on whether anything changed. + # `skills update` prints "All global skills are up to date" when nothing is + # stale, and "Updated N skill(s)" / "Found N update(s)" when work happened. + local out + # shellcheck disable=SC2086 — $installed is a deliberately split list. + if out=$(npx --yes skills update -g -y $installed 2>&1); then + printf '%s\n' "$out" + if printf '%s' "$out" | grep -qi 'up to date'; then + ok "FinalRun skills already up to date." + else + ok "FinalRun skills updated." + fi + else + printf '%s\n' "$out" + warn "FinalRun skills update failed — see output above." 
fi } From 7035c80b687947b8dd7236420c4dd9324fe12070 Mon Sep 17 00:00:00 2001 From: ashish Date: Mon, 27 Apr 2026 16:56:17 -0700 Subject: [PATCH 75/80] installer: symlink into ~/.local/bin and verify PATH instead of telling users to export it MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two related UX fixes so the post-install "open a new terminal or run export PATH=..." instruction shows up only when it's actually needed. 1. Symlink the binary into ~/.local/bin (Unix) The actual binary still lives at ~/.finalrun/bin/finalrun (versioned data isolated). install_binary now also creates ~/.local/bin/finalrun pointing at it. This is the convention used by claude, uv, pipx, pixi, mise — and ~/.local/bin is already on $PATH for the majority of Linux distros via /etc/profile or systemd. So Linux first-install users get instant gratification. setup_path now writes 'export PATH="$HOME/.local/bin:$PATH"' to .bashrc / .bash_profile / .profile / .zshrc / .zprofile and the fish equivalent (fish_add_path) — but only when ~/.local/bin isn't already on the current $PATH. Idempotent via grep on the literal marker. 2. Verify and only complain when broken New verify_path helper runs `command -v finalrun` and prints either "✓ finalrun is on your PATH." or a one-line "open a new terminal" hint. Replaces the unconditional 'open a new terminal, or run: export PATH=...' footer in both print_summary and print_ci_summary. Windows (install.ps1) gets the analogous treatment: - Update-UserPath now also assigns $env:Path in the running PowerShell session, so finalrun works immediately in the same window (irm | iex runs inside the user's process). New windows still pick up PATH via the registry write. - New Test-FinalrunOnPath uses Get-Command finalrun to branch the message; replaces the static '$env:Path = ...' instruction in both Show-Summary and Show-CISummary. 
Net result: - Linux first install: silent ✓ (PATH already had ~/.local/bin) - macOS first install: rc files updated, single line nudge to open a new terminal (instead of the long export hint) - Windows first install: silent ✓, current shell works - Re-runs on a working shell: silent ✓ (verification passes) Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/install.ps1 | 30 +++++++++++++------- scripts/install.sh | 67 +++++++++++++++++++++++++++++++++++++++------ 2 files changed, 78 insertions(+), 19 deletions(-) diff --git a/scripts/install.ps1 b/scripts/install.ps1 index fa97655..d3b1bab 100644 --- a/scripts/install.ps1 +++ b/scripts/install.ps1 @@ -189,8 +189,14 @@ function Update-UserPath { $trimmed = $current.TrimEnd(';') $newPath = if ($trimmed) { "$trimmed;$BinDir" } else { $BinDir } [Environment]::SetEnvironmentVariable('Path', $newPath, 'User') - # Note: this does NOT update $env:Path for the current shell. The summary - # message instructs the user to open a new terminal. + + # Also update $env:Path for the *current* PowerShell session so finalrun + # works immediately in the same window. The script is run via `irm | iex`, + # so this assignment lives in the user's shell process — no restart needed + # for the running window. New windows pick up PATH from the registry. + if (-not (($env:Path -split ';') -icontains $BinDir)) { + $env:Path = "$env:Path;$BinDir" + } } function Install-Runtime { @@ -422,9 +428,18 @@ function Test-ApiKeys { Write-Heading " Docs: $(Format-Underline 'https://docs.finalrun.app/configuration/ai-providers')" } +function Test-FinalrunOnPath { + Write-Heading "" + if (Get-Command finalrun -ErrorAction SilentlyContinue) { + Write-Success "finalrun is on your PATH." + return + } + Write-Notice "finalrun isn't on PATH for this shell yet." + Write-Heading " Open a new PowerShell window — your User PATH was updated." 
+} + function Show-CISummary { param([string]$FinalRunDir) - $binDir = Join-Path $FinalRunDir 'bin' Write-Heading "" Write-Success "finalrun installed." Write-Heading "" @@ -435,9 +450,7 @@ function Show-CISummary { Write-Heading "For local Android execution on this machine, re-run without -CI:" Write-Heading "" Write-Heading " irm https://raw.githubusercontent.com/$script:GitHubRepo/main/scripts/install.ps1 | iex" - Write-Heading "" - Write-Heading "Open a new PowerShell window or run:" - Write-Heading " `$env:Path = `"$binDir;`$env:Path`"" + Test-FinalrunOnPath } function Show-Summary { @@ -447,7 +460,6 @@ function Show-Summary { [bool]$AndroidOk, [string]$FinalRunDir ) - $binDir = Join-Path $FinalRunDir 'bin' Write-Heading "" Write-Heading "── Summary ──" Write-Heading "" @@ -461,9 +473,7 @@ function Show-Summary { Write-Notice "Android: setup incomplete — run 'finalrun doctor --platform android' for details." } - Write-Heading "" - Write-Heading "Open a new PowerShell window, or run:" - Write-Heading " `$env:Path = `"$binDir;`$env:Path`"" + Test-FinalrunOnPath Write-Heading "" Write-Heading "Try it: finalrun --help" Write-Heading "" diff --git a/scripts/install.sh b/scripts/install.sh index 80a1a53..034ffcf 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -197,17 +197,56 @@ install_binary() { fi ok "Installed $BIN_DEST" + + # Symlink into ~/.local/bin so users get a "just works" experience: this + # path is already on $PATH for most Linux distros, and it matches the + # convention used by claude, uv, pipx, pixi, mise. macOS users still need + # to start a new shell on first install (rc files written by setup_path), + # but that's the same constraint every curl|bash installer lives with. 
+ LOCAL_BIN="$HOME/.local/bin" + LOCAL_BIN_LINK="$LOCAL_BIN/finalrun" + if mkdir -p "$LOCAL_BIN" 2>/dev/null && ln -sf "$BIN_DEST" "$LOCAL_BIN_LINK" 2>/dev/null; then + ok "Linked $LOCAL_BIN_LINK -> $BIN_DEST" + else + LOCAL_BIN_LINK="" + warn "Could not write to $LOCAL_BIN — binary is at $BIN_DEST but you'll need to add it to PATH manually." + fi } setup_path() { - local path_line="export PATH=\$PATH:$FINALRUN_DIR/bin" + # Already on PATH via $LOCAL_BIN? Skip rc modification entirely. Common on + # Linux distros that put ~/.local/bin in PATH via /etc/profile or systemd. + case ":${PATH}:" in + *":$LOCAL_BIN:"*) return 0 ;; + esac + + local sh_line="export PATH=\"\$HOME/.local/bin:\$PATH\"" + local fish_line="fish_add_path -p \"\$HOME/.local/bin\"" local rc - for rc in "$HOME/.bashrc" "$HOME/.bash_profile" "${ZDOTDIR:-$HOME}/.zshrc"; do + + # POSIX-shell rc files. Idempotent via the literal $HOME/.local/bin marker. + for rc in \ + "$HOME/.bashrc" \ + "$HOME/.bash_profile" \ + "$HOME/.profile" \ + "${ZDOTDIR:-$HOME}/.zshrc" \ + "${ZDOTDIR:-$HOME}/.zprofile" + do touch "$rc" 2>/dev/null || true - if [ -f "$rc" ] && ! grep -qF "$FINALRUN_DIR/bin" "$rc" 2>/dev/null; then - echo "$path_line" >> "$rc" + if [ -f "$rc" ] && ! grep -qF '$HOME/.local/bin' "$rc" 2>/dev/null; then + printf '\n# finalrun\n%s\n' "$sh_line" >> "$rc" fi done + + # Fish has different syntax and a dedicated config file. + local fish_rc="$HOME/.config/fish/config.fish" + if [ -d "$HOME/.config/fish" ] || command -v fish >/dev/null 2>&1; then + mkdir -p "$HOME/.config/fish" 2>/dev/null || true + touch "$fish_rc" 2>/dev/null || true + if [ -f "$fish_rc" ] && ! 
grep -qF '.local/bin' "$fish_rc" 2>/dev/null; then + printf '\n# finalrun\n%s\n' "$fish_line" >> "$fish_rc" + fi + fi } print_ci_summary() { @@ -221,8 +260,21 @@ print_ci_summary() { info "For local test execution on this machine, re-run without --ci:" echo "" echo " curl -fsSL https://raw.githubusercontent.com/${GITHUB_REPO}/main/scripts/install.sh | bash" + verify_path +} + +# Tells the user whether finalrun is reachable in their *current* shell. If +# yes (Linux usually, re-runs always), stay quiet. If no (typical macOS +# first install), point them at the cheapest fix. +verify_path() { echo "" - echo "Open a new terminal or run: export PATH=\"\$PATH:$FINALRUN_DIR/bin\"" + if command -v finalrun >/dev/null 2>&1; then + ok "finalrun is on your PATH." + return 0 + fi + warn "finalrun isn't on PATH for this shell yet." + echo " Open a new terminal — your shell rc files were updated." + echo " If it still doesn't resolve, ensure \$HOME/.local/bin is in your PATH." } download_runtime() { @@ -571,10 +623,7 @@ print_summary() { ;; esac - echo "" - info "Open a new terminal, or run:" - echo "" - echo " export PATH=\"\$PATH:$FINALRUN_DIR/bin\"" + verify_path echo "" info "Try it: finalrun --help" echo "" From 5983e450fa91ff8638925ee80e5e2bcf7e152da9 Mon Sep 17 00:00:00 2001 From: ashish Date: Mon, 27 Apr 2026 17:03:51 -0700 Subject: [PATCH 76/80] installer: escalate the no-API-key message from yellow warn to red fail MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Today the line is a yellow ⚠ that's easy to skim past, even though finalrun literally cannot run tests without a key. Switch to the red ✗ (the existing fail / Write-Failure helper, which prints without exiting) and add a single explanatory clause: ✗ No API key detected — finalrun cannot run tests without one. Also tightens the BYOK section header from "Prefer your own AI provider account? 
Bring your own key:" to "Or bring your own provider key:" — parallel structure with the Cloud option above it, matches the new "this is not optional" tone. Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/install.ps1 | 4 ++-- scripts/install.sh | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/scripts/install.ps1 b/scripts/install.ps1 index d3b1bab..5d949b5 100644 --- a/scripts/install.ps1 +++ b/scripts/install.ps1 @@ -411,14 +411,14 @@ function Test-ApiKeys { return } - Write-Notice "No API key detected." + Write-Failure "No API key detected — finalrun cannot run tests without one." Write-Heading "" Write-Heading ' Fastest way to get started — FinalRun Cloud (free $5 credits):' Write-Heading "" Write-Heading " Sign up: $(Format-Underline 'https://cloud.finalrun.app')" Write-Heading " Docs: $(Format-Underline 'https://docs.finalrun.app/configuration/cloud-api-key')" Write-Heading "" - Write-Heading " Prefer your own AI provider account? Bring your own key:" + Write-Heading " Or bring your own provider key:" Write-Heading "" Write-Heading " ANTHROPIC_API_KEY → anthropic/claude-* models" Write-Heading " OPENAI_API_KEY → openai/gpt-* models" diff --git a/scripts/install.sh b/scripts/install.sh index 034ffcf..0dbc515 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -591,14 +591,14 @@ check_api_keys() { return fi - warn "No API key detected." + fail "No API key detected — finalrun cannot run tests without one." echo "" echo " Fastest way to get started — FinalRun Cloud (free \$5 credits):" echo "" echo " Sign up: $(underline 'https://cloud.finalrun.app')" echo " Docs: $(underline 'https://docs.finalrun.app/configuration/cloud-api-key')" echo "" - echo " Prefer your own AI provider account? 
Bring your own key:" + echo " Or bring your own provider key:" echo "" echo " ANTHROPIC_API_KEY → anthropic/claude-* models" echo " OPENAI_API_KEY → openai/gpt-* models" From 05c2e615862f8d67ede905f3999ce506215f4045 Mon Sep 17 00:00:00 2001 From: ashish Date: Mon, 27 Apr 2026 17:08:29 -0700 Subject: [PATCH 77/80] installer: address CodeRabbit findings on PR #126 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 1. install.ps1: split Update-UserPath into independent registry + session updates. The previous early-return at the registry check meant a current PowerShell session that lacked $BinDir wouldn't get patched if the registry was already correct (e.g. PS process started before a previous installer run, or the session PATH was manually pruned). Now the two checks run independently — registry update for new windows, $env:Path update for the current window. 2. install.sh: track a single PATH_TARGET_DIR and reuse it across install_binary, setup_path, and verify_path. Previously, when `ln -sf` to ~/.local/bin failed (read-only, exotic FS), setup_path still wrote ~/.local/bin to shell rcs and verify_path still nudged the user toward ~/.local/bin — but no finalrun lived there. Now PATH_TARGET_DIR falls back to $FINALRUN_DIR/bin on symlink failure and the rest of the flow follows that single source of truth. Smoke-tested all four branches: ~/.local/bin success, fallback to ~/.finalrun/bin, skip-when-in-PATH, idempotent re-run. 
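
Illustration only, not part of this patch: a minimal sketch of the two checks the smoke test above exercises, namely the exact-segment PATH membership test and the marker-based idempotent rc append. The helper names `path_contains` and `append_once` and the `/opt/demo/bin` directory are hypothetical and do not appear in install.sh; only the `case ":$PATH:"` and `grep -qF` idioms are taken from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; these names are not from install.sh.

# Succeeds when $1 is an exact segment of the colon-separated list $2.
# Wrapping both sides in ':' avoids matching /a against /a/bin.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Appends line $2 to file $1 unless the file already contains marker $3.
append_once() {
  if ! grep -qF -- "$3" "$1" 2>/dev/null; then
    printf '\n# finalrun\n%s\n' "$2" >> "$1"
  fi
}

rc=$(mktemp)                       # stand-in for a shell rc file
target="/opt/demo/bin"             # assumed PATH target directory
line="export PATH=\"$target:\$PATH\""

append_once "$rc" "$line" "$target"
append_once "$rc" "$line" "$target"   # second call is a no-op

grep -cF -- "$target" "$rc"           # prints 1: the line was written once
```

Under these assumptions, re-running the append leaves a single export line in the file, and the membership check treats a directory as present only when it matches a whole PATH segment.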
Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/install.ps1 | 39 +++++++++++++++++++++------------------ scripts/install.sh | 26 ++++++++++++++++---------- 2 files changed, 37 insertions(+), 28 deletions(-) diff --git a/scripts/install.ps1 b/scripts/install.ps1 index 5d949b5..ca81217 100644 --- a/scripts/install.ps1 +++ b/scripts/install.ps1 @@ -173,29 +173,32 @@ function Install-Binary { function Update-UserPath { param([string]$BinDir) + # Two independent updates: User PATH in the registry (for new windows), + # and $env:Path in the current PS session (for the running window). The + # registry might already be correct while $env:Path lags behind — e.g. + # the PS process started before a previous installer run, or somebody + # manually pruned the session. Don't short-circuit one because the other + # is fine. + + # Registry: idempotent via exact-segment, case-insensitive membership + # check (Windows paths are case-insensitive; substring matching would + # falsely match C:\foo\bin against C:\foo\bin\subdir). $current = [Environment]::GetEnvironmentVariable('Path', 'User') if ($null -eq $current) { $current = '' } - - # Idempotent: split and check exact membership, not substring (which - # would falsely match C:\foo\bin against C:\foo\bin\subdir). - # `-icontains` for case-insensitive comparison — Windows paths are - # case-insensitive, so a pre-existing entry with different casing - # (e.g. C:\Users\foo\.finalrun\Bin) shouldn't get a duplicate appended. 
$segments = ($current -split ';') | Where-Object { $_ } - if ($segments -icontains $BinDir) { - return + if (-not ($segments -icontains $BinDir)) { + $trimmed = $current.TrimEnd(';') + $newPath = if ($trimmed) { "$trimmed;$BinDir" } else { $BinDir } + [Environment]::SetEnvironmentVariable('Path', $newPath, 'User') } - $trimmed = $current.TrimEnd(';') - $newPath = if ($trimmed) { "$trimmed;$BinDir" } else { $BinDir } - [Environment]::SetEnvironmentVariable('Path', $newPath, 'User') - - # Also update $env:Path for the *current* PowerShell session so finalrun - # works immediately in the same window. The script is run via `irm | iex`, - # so this assignment lives in the user's shell process — no restart needed - # for the running window. New windows pick up PATH from the registry. - if (-not (($env:Path -split ';') -icontains $BinDir)) { - $env:Path = "$env:Path;$BinDir" + # Current PowerShell session: same idempotent check, separate state. + # The script is run via `irm | iex`, so this assignment lives in the + # user's shell process — no restart needed for the running window. + $sessionSegments = (($env:Path -split ';') | Where-Object { $_ }) + if (-not ($sessionSegments -icontains $BinDir)) { + $sessionTrimmed = ($env:Path).TrimEnd(';') + $env:Path = if ($sessionTrimmed) { "$sessionTrimmed;$BinDir" } else { $BinDir } } } diff --git a/scripts/install.sh b/scripts/install.sh index 0dbc515..89bd613 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -203,28 +203,34 @@ install_binary() { # convention used by claude, uv, pipx, pixi, mise. macOS users still need # to start a new shell on first install (rc files written by setup_path), # but that's the same constraint every curl|bash installer lives with. + # + # If the symlink fails (read-only ~/.local/bin, exotic filesystem), + # PATH_TARGET_DIR falls back to the binary's actual home so setup_path + # and verify_path don't end up pointing at a directory with no finalrun. 
LOCAL_BIN="$HOME/.local/bin" LOCAL_BIN_LINK="$LOCAL_BIN/finalrun" if mkdir -p "$LOCAL_BIN" 2>/dev/null && ln -sf "$BIN_DEST" "$LOCAL_BIN_LINK" 2>/dev/null; then ok "Linked $LOCAL_BIN_LINK -> $BIN_DEST" + PATH_TARGET_DIR="$LOCAL_BIN" else LOCAL_BIN_LINK="" - warn "Could not write to $LOCAL_BIN — binary is at $BIN_DEST but you'll need to add it to PATH manually." + PATH_TARGET_DIR="$FINALRUN_DIR/bin" + warn "Could not write to $LOCAL_BIN — falling back to $PATH_TARGET_DIR for PATH setup." fi } setup_path() { - # Already on PATH via $LOCAL_BIN? Skip rc modification entirely. Common on - # Linux distros that put ~/.local/bin in PATH via /etc/profile or systemd. + # Already on PATH? Skip rc modification entirely. Common on Linux distros + # that put ~/.local/bin in PATH via /etc/profile or systemd. case ":${PATH}:" in - *":$LOCAL_BIN:"*) return 0 ;; + *":$PATH_TARGET_DIR:"*) return 0 ;; esac - local sh_line="export PATH=\"\$HOME/.local/bin:\$PATH\"" - local fish_line="fish_add_path -p \"\$HOME/.local/bin\"" + local sh_line="export PATH=\"$PATH_TARGET_DIR:\$PATH\"" + local fish_line="fish_add_path -p \"$PATH_TARGET_DIR\"" local rc - # POSIX-shell rc files. Idempotent via the literal $HOME/.local/bin marker. + # POSIX-shell rc files. Idempotent via the literal target marker. for rc in \ "$HOME/.bashrc" \ "$HOME/.bash_profile" \ @@ -233,7 +239,7 @@ setup_path() { "${ZDOTDIR:-$HOME}/.zprofile" do touch "$rc" 2>/dev/null || true - if [ -f "$rc" ] && ! grep -qF '$HOME/.local/bin' "$rc" 2>/dev/null; then + if [ -f "$rc" ] && ! grep -qF "$PATH_TARGET_DIR" "$rc" 2>/dev/null; then printf '\n# finalrun\n%s\n' "$sh_line" >> "$rc" fi done @@ -243,7 +249,7 @@ setup_path() { if [ -d "$HOME/.config/fish" ] || command -v fish >/dev/null 2>&1; then mkdir -p "$HOME/.config/fish" 2>/dev/null || true touch "$fish_rc" 2>/dev/null || true - if [ -f "$fish_rc" ] && ! grep -qF '.local/bin' "$fish_rc" 2>/dev/null; then + if [ -f "$fish_rc" ] && ! 
grep -qF "$PATH_TARGET_DIR" "$fish_rc" 2>/dev/null; then printf '\n# finalrun\n%s\n' "$fish_line" >> "$fish_rc" fi fi @@ -274,7 +280,7 @@ verify_path() { fi warn "finalrun isn't on PATH for this shell yet." echo " Open a new terminal — your shell rc files were updated." - echo " If it still doesn't resolve, ensure \$HOME/.local/bin is in your PATH." + echo " If it still doesn't resolve, ensure $PATH_TARGET_DIR is in your PATH." } download_runtime() { From d3ec2be3cad87634af49748dfb3381ec97637016 Mon Sep 17 00:00:00 2001 From: Arnold Laishram Date: Mon, 27 Apr 2026 17:24:31 -0700 Subject: [PATCH 78/80] Unify TypeScript 6 across workspaces and fix TS 6 typings defaults. Pin typescript to ^6.0.3 in every workspace package so npm hoists one major version for builds and tooling. Extend tsconfig.base.json with typeRoots targeting the repo @types folder and types: [`*`] so TypeScript 6 still loads ambient typings (Node globals, tests) after its types default stopped auto-including packages under node_modules/@types. 
Made-with: Cursor
---
 packages/cli/package.json           | 2 +-
 packages/cloud-core/package.json    | 2 +-
 packages/common/package.json        | 2 +-
 packages/device-node/package.json   | 2 +-
 packages/goal-executor/package.json | 2 +-
 packages/report-web/package.json    | 2 +-
 tsconfig.base.json                  | 2 ++
 7 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/packages/cli/package.json b/packages/cli/package.json
index 1cecc73..30f7471 100644
--- a/packages/cli/package.json
+++ b/packages/cli/package.json
@@ -61,6 +61,6 @@
   },
   "devDependencies": {
     "tsx": "^4.19.0",
-    "typescript": "^5.7.0"
+    "typescript": "^6.0.3"
   }
 }
diff --git a/packages/cloud-core/package.json b/packages/cloud-core/package.json
index aaa427a..4bee0a0 100644
--- a/packages/cloud-core/package.json
+++ b/packages/cloud-core/package.json
@@ -17,6 +17,6 @@
   },
   "devDependencies": {
     "@types/adm-zip": "^0.5.8",
-    "typescript": "^5.7.0"
+    "typescript": "^6.0.3"
   }
 }
diff --git a/packages/common/package.json b/packages/common/package.json
index 4827eae..c4016a4 100644
--- a/packages/common/package.json
+++ b/packages/common/package.json
@@ -17,6 +17,6 @@
     "test": "node --test \"dist/**/*.test.js\""
   },
   "devDependencies": {
-    "typescript": "^5.7.0"
+    "typescript": "^6.0.3"
   }
 }
diff --git a/packages/device-node/package.json b/packages/device-node/package.json
index 3689297..094a672 100644
--- a/packages/device-node/package.json
+++ b/packages/device-node/package.json
@@ -22,7 +22,7 @@
   },
   "devDependencies": {
     "ts-proto": "^2.6.0",
-    "typescript": "^5.7.0",
+    "typescript": "^6.0.3",
     "grpc-tools": "^1.12.4"
   }
 }
diff --git a/packages/goal-executor/package.json b/packages/goal-executor/package.json
index 165d73f..df3ab29 100644
--- a/packages/goal-executor/package.json
+++ b/packages/goal-executor/package.json
@@ -25,6 +25,6 @@
     "zod": "^4.1.8"
   },
   "devDependencies": {
-    "typescript": "^5.7.0"
+    "typescript": "^6.0.3"
   }
 }
diff --git a/packages/report-web/package.json b/packages/report-web/package.json
index 4071406..a9ee7ab 100644
--- a/packages/report-web/package.json
+++ b/packages/report-web/package.json
@@ -48,7 +48,7 @@
     "@vitejs/plugin-react": "^6.0.1",
     "tsup": "^8.5.1",
     "tsx": "^4.19.0",
-    "typescript": "^5.9.0",
+    "typescript": "^6.0.3",
     "vite": "^8.0.10"
   }
 }
diff --git a/tsconfig.base.json b/tsconfig.base.json
index 44d5ebd..4b92177 100644
--- a/tsconfig.base.json
+++ b/tsconfig.base.json
@@ -4,6 +4,8 @@
     "module": "Node16",
     "moduleResolution": "Node16",
     "lib": ["ES2022"],
+    "typeRoots": ["./node_modules/@types"],
+    "types": ["*"],
     "declaration": true,
     "declarationMap": true,
     "sourceMap": true,
From 4db3f1ddc5901defb19b75dca2b68f91f9c9afba Mon Sep 17 00:00:00 2001
From: ashish
Date: Mon, 27 Apr 2026 17:50:11 -0700
Subject: [PATCH 79/80] Fix clean-install build under TypeScript 6 and Vite 8.

Make `npm i && npm run build` work from a fully clean state. Three issues
surfaced after the TS 6 unification commit and Vite 8 bump:

- tsconfig.base.json typeRoots was workspace-relative only, so only
  workspaces that nested their own @types/ (report-web) found Node types;
  the rest hit TS2591 on `node:*` imports. Add the repo-root path so
  hoisted @types are visible from every workspace.
- Add @types/node at the root so a modern version hoists, instead of
  relying on an ancient transitive @types/node@12 that predates the
  `node:` module-name protocol.
- Vite 8 lists rolldown and lightningcss as regular deps, but npm
  workspace hoisting drops them. Declare both explicitly in report-web
  (rolldown was already there; lightningcss was missing).

Also replace the brittle hoisted-package check in
scripts/ensure-dev-install.mjs with a simple root node_modules check so
it stops false-firing on packages npm chose to nest.
--- package.json | 1 + packages/report-web/package.json | 4 +++- scripts/ensure-dev-install.mjs | 14 +++----------- tsconfig.base.json | 2 +- 4 files changed, 8 insertions(+), 13 deletions(-) diff --git a/package.json b/package.json index 019a4bb..321bc93 100644 --- a/package.json +++ b/package.json @@ -45,6 +45,7 @@ "devDependencies": { "@changesets/cli": "^2.27.0", "@eslint/js": "^10.0.1", + "@types/node": "^24.12.2", "eslint": "^10.1.0", "eslint-config-prettier": "^10.1.8", "globals": "^17.4.0", diff --git a/packages/report-web/package.json b/packages/report-web/package.json index a9ee7ab..2f2a9c5 100644 --- a/packages/report-web/package.json +++ b/packages/report-web/package.json @@ -38,9 +38,11 @@ }, "dependencies": { "@finalrun/common": "^0.1.7", + "lightningcss": "^1.32.0", "react": "^19.2.0", "react-dom": "^19.2.0", - "react-router-dom": "^7.14.2" + "react-router-dom": "^7.14.2", + "rolldown": "^1.0.0-rc.17" }, "devDependencies": { "@types/react": "^19.2.14", diff --git a/scripts/ensure-dev-install.mjs b/scripts/ensure-dev-install.mjs index 8607660..4fc7f6c 100644 --- a/scripts/ensure-dev-install.mjs +++ b/scripts/ensure-dev-install.mjs @@ -5,27 +5,19 @@ import { fileURLToPath } from 'node:url'; const scriptDir = path.dirname(fileURLToPath(import.meta.url)); const repoRoot = path.resolve(scriptDir, '..'); -const requiredPackages = ['typescript', 'tsx', 'vite']; -const missingPackages = requiredPackages.filter((packageName) => - !fs.existsSync(path.join(repoRoot, 'node_modules', packageName, 'package.json')), -); - -if (missingPackages.length === 0) { +if (fs.existsSync(path.join(repoRoot, 'node_modules'))) { process.exit(0); } -const packageList = missingPackages.join(', '); console.error( [ - `Missing local workspace dependencies in ${repoRoot}.`, - `Missing packages: ${packageList}`, + `No node_modules in ${repoRoot}.`, '', 'Run this once from the repo root:', ` cd ${repoRoot}`, ' npm ci', '', - 'If this is a fresh git worktree, each worktree needs its 
own node_modules', - 'or a symlink to a shared install before build/dev/test commands will work.', + 'Each git worktree needs its own node_modules (or a symlink to a shared install).', ].join('\n'), ); process.exit(1); diff --git a/tsconfig.base.json b/tsconfig.base.json index 4b92177..4912ee3 100644 --- a/tsconfig.base.json +++ b/tsconfig.base.json @@ -4,7 +4,7 @@ "module": "Node16", "moduleResolution": "Node16", "lib": ["ES2022"], - "typeRoots": ["./node_modules/@types"], + "typeRoots": ["./node_modules/@types", "../../node_modules/@types"], "types": ["*"], "declaration": true, "declarationMap": true, From 3b764c8eef524a770480db7856d4805f1214a9b5 Mon Sep 17 00:00:00 2001 From: "mintlify[bot]" <109931778+mintlify[bot]@users.noreply.github.com> Date: Tue, 28 Apr 2026 01:58:59 +0000 Subject: [PATCH 80/80] Document standalone binary install path on installation page Generated-By: mintlify-agent --- mintlify-docs/installation.mdx | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/mintlify-docs/installation.mdx b/mintlify-docs/installation.mdx index 1df71ce..3180709 100644 --- a/mintlify-docs/installation.mdx +++ b/mintlify-docs/installation.mdx @@ -1,21 +1,15 @@ --- title: "Install FinalRun: prerequisites, CLI setup, and host verification" sidebarTitle: "Installation" -description: "Set up prerequisites, install the FinalRun CLI on macOS or Linux, and verify that your host is ready to run Android or iOS tests." +description: "Set up prerequisites, install the FinalRun CLI on macOS, Linux, or Windows, and verify that your host is ready to run Android or iOS tests." --- -FinalRun runs as a Node.js CLI. This page walks you through the prerequisites, the install itself, and how to verify your machine is ready. Once `finalrun doctor` reports a clean bill of health, continue to the [Quickstart](/quickstart) to write and run your first test. +FinalRun is distributed as a standalone `finalrun` binary — no Node.js or npm required. 
This page walks you through the prerequisites, the install itself, and how to verify your machine is ready. Once `finalrun doctor` reports a clean bill of health, continue to the [Quickstart](/quickstart) to write and run your first test. ## Prerequisites Before installing FinalRun, make sure your machine has the following in place. -### Node.js - -- **Node.js** 20.0.0 or later - -The one-line install script sets up Node.js for you if it isn't already present. - ### AI provider API key FinalRun is BYOK (bring your own key). Before your first test run, obtain an API key from one of the supported providers: @@ -99,18 +93,22 @@ You'll set the key during the Quickstart. See [AI Providers](/configuration/ai-p ## Install FinalRun -**One-line install (recommended):** +The installer downloads the standalone `finalrun` binary for your platform, the bundled platform driver assets for Android and iOS, and the FinalRun AI agent skills for your coding agent. -```bash + +```bash macOS / Linux curl -fsSL https://raw.githubusercontent.com/final-run/finalrun-agent/main/scripts/install.sh | bash ``` -The script installs Node.js (if needed), the `finalrun` CLI globally, the bundled platform driver assets for Android and iOS, and the FinalRun AI agent skills for your coding agent. +```powershell Windows +irm https://raw.githubusercontent.com/final-run/finalrun-agent/main/scripts/install.ps1 | iex +``` + -**npm (if you already have Node.js 20+):** +To keep the CLI up to date later, run: ```bash -npm install -g @finalrun/finalrun-agent +finalrun upgrade ``` ## Verify the installation