diff --git a/packages/skills/skills/argent-ios-profiler/SKILL.md b/packages/skills/skills/argent-ios-profiler/SKILL.md index 0678ab9..1c1fb04 100644 --- a/packages/skills/skills/argent-ios-profiler/SKILL.md +++ b/packages/skills/skills/argent-ios-profiler/SKILL.md @@ -37,14 +37,7 @@ After presenting findings, ask the user whether to investigate further, implemen ### Step 0: Ensure the target app is running -The `ios-profiler-start` tool **auto-detects** the running app on the simulator. -You do not need to derive `app_process` manually — just make sure the app is launched. - -1. If the app is already running on the simulator, skip to Step 1 (do not pass `app_process`). -2. If the app is not running, use `launch-app` with the correct bundle ID first. -3. Only pass `app_process` explicitly if the tool reports multiple running user apps and you need to disambiguate. - -> **Note**: If multiple build flavors are installed (dev, staging, prod), the tool will detect whichever one is currently running. If both are running, it will ask you to specify. +Launch the app on the simulator if it is not already foregrounded (`launch-app` as needed). **`ios-profiler-start` auto-detects** the running app; omit `app_process` unless the tool fails with multiple running user apps and you must pick a `CFBundleExecutable`. See the **`ios-profiler-start` tool description** and `app_process` field for behavior when several flavors or apps are running. ### Step 1: Start recording diff --git a/packages/skills/skills/argent-simulator-interact/SKILL.md b/packages/skills/skills/argent-simulator-interact/SKILL.md index c11a0bf..3242f38 100644 --- a/packages/skills/skills/argent-simulator-interact/SKILL.md +++ b/packages/skills/skills/argent-simulator-interact/SKILL.md @@ -1,6 +1,6 @@ --- name: argent-simulator-interact -description: Interact with an iOS simulator using argent MCP tools. 
Use when tapping UI elements, perfroming gestures, scrolling, typing text, pressing hardware buttons, launching apps, opening URLs, taking screenshots. +description: Interact with an iOS simulator using argent MCP tools. Use when tapping UI elements, performing gestures, scrolling, typing text, pressing hardware buttons, launching apps, opening URLs, taking screenshots. --- ## 1. Before You Start @@ -55,9 +55,9 @@ Common schemes: `messages://`, `settings://`, `maps://?q=`, `tel://", "x": 0.5, "y": 0.5 } -``` - -Coordinates: `0.0` = left/top, `1.0` = right/bottom. +### gesture-tap Before tapping near the bottom of the screen in React Native apps, check that "Open Debugger to View Warnings" banners are not visible — tapping them breaks the debugger connection. Close them with the X icon if present. -### gesture-swipe — Straight-line gesture - -```json -{ "udid": "", "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 } -``` - -Swipe **up** (`fromY > toY`) = scroll content **down**. Default duration: 300ms. Optional: `"durationMs": 500` for slower swipe. - -### gesture-pinch — Two-finger pinch - -```json -{ "udid": "", "centerX": 0.5, "centerY": 0.5, "startDistance": 0.2, "endDistance": 0.6 } -``` - -All values are normalized 0.0–1.0 (fractions of screen, not pixels) — same as all other gesture tools. `startDistance: 0.2` means fingers start 20% of the screen apart; `endDistance: 0.6` means they end 60% apart. `startDistance < endDistance` = pinch out (zoom in). `startDistance > endDistance` = pinch in (zoom out). Defaults: `angle: 0` (horizontal), `durationMs: 300`. Optional: `"angle": 90` for vertical axis, `"durationMs": 500` for slower pinch. - -### gesture-rotate — Two-finger rotation - -```json -{ - "udid": "", - "centerX": 0.5, - "centerY": 0.5, - "radius": 0.15, - "startAngle": 0, - "endAngle": 90 -} -``` - -All positions and radius are normalized 0.0–1.0 (fractions of screen, not pixels). `radius: 0.15` means each finger is 15% of the screen away from center. 
`endAngle > startAngle` = clockwise. Default duration: 300ms. Optional: `"durationMs": 500` for slower rotation. - -### gesture-custom — Custom touch sequence +### gesture-custom For long-press, drag-and-drop, and other complex sequences, see `references/gesture-examples.md`. Set `"interpolate": 10` to auto-generate smooth intermediate Move events between keyframes. -### button — Hardware button press - -```json -{ "udid": "", "button": "home" } -``` - -Values: `home`, `back`, `power`, `volumeUp`, `volumeDown`, `appSwitch`, `actionButton` - -### paste — Type text into focused field +### paste and keyboard -```json -{ "udid": "", "text": "Hello, world!" } -``` - -Tap the field first, then paste. Fall back to `keyboard` if it doesn't work. - -### keyboard — Type text or press special keys - -```json -{ "udid": "", "text": "search query", "key": "enter" } -``` - -Special keys: `enter`, `escape`, `backspace`, `tab`, `space`, `arrow-up`, `arrow-down`, `arrow-left`, `arrow-right`, `f1`–`f12`. Optional: `"delayMs": 100` between keystrokes (default 50ms). - -### rotate — Change orientation - -```json -{ "udid": "", "orientation": "LandscapeLeft" } -``` - -Values: `Portrait`, `LandscapeLeft`, `LandscapeRight`, `PortraitUpsideDown` +Tap the target field first, then use `paste`. Fall back to `keyboard` when paste is unreliable; allowed named keys and timing are in the `keyboard` tool schema. --- @@ -205,68 +142,8 @@ Screenshots are downscaled by default (30% of original resolution) to reduce con ## 8. Action Sequencing with `run-sequence` -Use `run-sequence` to batch multiple interaction steps into **a single tool call**. Only one screenshot is returned — after all steps complete. Use cases: -scrolling multiple times, typing and submitting automatically, known sequence of multiple taps, rotating device back and forth. 
- -Do **not** use `run-sequence` when any step depends on observing the result of a previous step - -### Use cases - -Use the sequencing when: - -- Knowing that some action needs multiple steps without necessarily immediate insight of screenshot -- "scroll to bottom", "scroll to top", "scroll to do X" -> sequence scroll 3-5 times -- form interactions, "clear and retype field" -> you may use triple-tap to select all, type new value -- "submit form" → fill all fields in sequence, tap submit -- "go back to X" → defined tap sequence for the navigation - -### Allowed tools inside `run-sequence` - -`gesture-tap`, `gesture-swipe`, `gesture-custom`, `gesture-pinch`, `gesture-rotate`, `button`, `keyboard`, `rotate` +Use `run-sequence` to batch multiple interaction steps into **a single tool call**. You do not get intermediate screenshots — only outcomes after the full sequence (call `screenshot` separately if needed afterward). Good fits: several swipes in a row, type then submit, known sequence of multiple taps, filling in forms in one call, rotating back and forth. -The `udid` is shared — do **not** include it in each step's `args`. Optional `delayMs` per step (default 100ms). 
- -### Examples - -Scroll down three times: - -```json -{ - "udid": "", - "steps": [ - { "tool": "gesture-swipe", "args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 } }, - { "tool": "gesture-swipe", "args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 } }, - { "tool": "gesture-swipe", "args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 } } - ] -} -``` - -Type into a focused field and submit: - -```json -{ - "udid": "", - "steps": [ - { "tool": "keyboard", "args": { "text": "hello world" } }, - { "tool": "keyboard", "args": { "key": "enter" } } - ] -} -``` - -Tap a known button, then scroll down: - -```json -{ - "udid": "", - "steps": [ - { "tool": "gesture-tap", "args": { "x": 0.5, "y": 0.15 } }, - { - "tool": "gesture-swipe", - "args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 }, - "delayMs": 300 - } - ] -} -``` +**Examples:** multi-step scroll (“scroll to bottom”), form fill + submit, a known navigation tap sequence. The **`run-sequence` tool description** lists allowed nested tools, per-step argument shapes (with `udid` omitted from each step’s `args`), default step delay, copy-paste JSON examples, and partial-result-on-error behavior — use that as the source of truth. -Stops on the first error and returns partial results. +Do **not** use `run-sequence` when any step depends on **observing** the UI after a prior step (e.g. a control that only appears after a tap). Use individual tool calls and discovery between steps instead. diff --git a/packages/tool-server/src/tools/interactions/run-sequence.ts b/packages/tool-server/src/tools/interactions/run-sequence.ts index 952f260..aa1fdb3 100644 --- a/packages/tool-server/src/tools/interactions/run-sequence.ts +++ b/packages/tool-server/src/tools/interactions/run-sequence.ts @@ -56,6 +56,8 @@ Use when you need sequential actions and do NOT need to observe the screen betwe Returns { completed, total, steps } with per-step results. 
Fails if an unrecognised tool name is used in a step (error returned at that step, execution stops). No screenshot is captured automatically — call screenshot separately after the sequence if needed. +For when to batch vs use separate calls with discovery between steps, see the argent-simulator-interact skill (Action Sequencing / run-sequence). + ONLY use this when every step is known in advance. If any step depends on the result of a previous one (e.g. tapping a menu item that only appears after a prior tap), use individual tool calls instead. diff --git a/scripts/extract-tools.mjs b/scripts/extract-tools.mjs index a001ea5..4652cf3 100644 --- a/scripts/extract-tools.mjs +++ b/scripts/extract-tools.mjs @@ -43,8 +43,9 @@ function extractFromFile(filePath) { const id = idMatch[1]; const afterId = src.slice(idMatch.index); - // Look for description within the next 2000 chars (handles multi-line) - const descWindow = afterId.slice(0, 2000); + // Look for description within the next N chars (handles multi-line; some tools + // have long descriptions before the closing backtick, e.g. run-sequence) + const descWindow = afterId.slice(0, 3000); // Template literal description let descMatch = descWindow.match(/\bdescription:\s*`([\s\S]*?)`/);
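
Review note on the `descWindow` change in `scripts/extract-tools.mjs`: the sketch below illustrates why a fixed-size slice can miss a long template-literal description, and why widening it from 2000 to 3000 characters helps. Only the regex is taken from the diff; `extractDescription`, the sample source string, and the 2500-character body are hypothetical stand-ins, and the real script may structure this differently.

```javascript
// Hypothetical helper for illustration; only the regex comes from the diff above.
function extractDescription(src, windowSize) {
  // Search only the first `windowSize` characters, mirroring the slice in the script.
  const descWindow = src.slice(0, windowSize);
  const match = descWindow.match(/\bdescription:\s*`([\s\S]*?)`/);
  return match ? match[1] : null;
}

// Made-up tool source whose description body is 2500 characters long.
const longBody = "x".repeat(2500);
const sampleSrc = "id: \"demo-tool\",\n  description: `" + longBody + "`,\n";

// The closing backtick lies beyond character 2000, so the regex finds no terminator.
const with2000 = extractDescription(sampleSrc, 2000);
// A 3000-character window contains the closing backtick, so the whole body is captured.
const with3000 = extractDescription(sampleSrc, 3000);

console.log(with2000 === null, with3000 !== null && with3000.length);
```

Under these assumptions the wider window only moves the cutoff: a description longer than the new limit would still be missed, so an unbounded search or a warning when no match is found may be worth considering.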