Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 1 addition & 8 deletions packages/skills/skills/argent-ios-profiler/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,14 +37,7 @@ After presenting findings, ask the user whether to investigate further, implemen

### Step 0: Ensure the target app is running

The `ios-profiler-start` tool **auto-detects** the running app on the simulator.
You do not need to derive `app_process` manually — just make sure the app is launched.

1. If the app is already running on the simulator, skip to Step 1 (do not pass `app_process`).
2. If the app is not running, use `launch-app` with the correct bundle ID first.
3. Only pass `app_process` explicitly if the tool reports multiple running user apps and you need to disambiguate.

> **Note**: If multiple build flavors are installed (dev, staging, prod), the tool will detect whichever one is currently running. If both are running, it will ask you to specify.
Launch the app on the simulator if it is not already foregrounded (`launch-app` as needed). **`ios-profiler-start` auto-detects** the running app; omit `app_process` unless the tool fails with multiple running user apps and you must pick a `CFBundleExecutable`. See the **`ios-profiler-start` tool description** and `app_process` field for behavior when several flavors or apps are running.

### Step 1: Start recording

Expand Down
145 changes: 11 additions & 134 deletions packages/skills/skills/argent-simulator-interact/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
name: argent-simulator-interact
description: Interact with an iOS simulator using argent MCP tools. Use when tapping UI elements, perfroming gestures, scrolling, typing text, pressing hardware buttons, launching apps, opening URLs, taking screenshots.
description: Interact with an iOS simulator using argent MCP tools. Use when tapping UI elements, performing gestures, scrolling, typing text, pressing hardware buttons, launching apps, opening URLs, taking screenshots.
---

## 1. Before You Start
Expand Down Expand Up @@ -55,9 +55,9 @@ Common schemes: `messages://`, `settings://`, `maps://?q=<query>`, `tel://<numbe
| Pinch/zoom | `gesture-pinch` | Two-finger pinch with auto-interpolation |
| Rotation | `gesture-rotate` | Two-finger rotation with auto-interpolation |
| Custom gesture | `gesture-custom` | Arbitrary touch sequences, optional interpolation |
| Hardware key | `button` | Home, back, power, volume, appSwitch, actionButton |
| Hardware key | `button` | Allowed values in tool schema |
| Type text (fast) | `paste` | Form fields — uses clipboard |
| Type text | `keyboard` | Fallback when paste fails; supports Enter, Escape, arrows |
| Type text | `keyboard` | Fallback when paste fails; see tool schema for keys |
| Rotate device | `rotate` | Orientation changes |

## 5. Finding Tap Targets
Expand Down Expand Up @@ -94,82 +94,19 @@ Read the exact error and choose the action that matches it:

## 6. Tool Usage

### gesture-tap — Single tap at a point
Parameter shapes, normalized coordinates, swipe direction, pinch/rotate semantics, and optional fields are defined in each tool’s **description** and **input schema** (`gesture-tap`, `gesture-swipe`, `gesture-pinch`, `gesture-rotate`, `gesture-custom`, `button`, `paste`, `keyboard`, `rotate`). Load those definitions before calling a tool (see §1).

```json
{ "udid": "<UDID>", "x": 0.5, "y": 0.5 }
```

Coordinates: `0.0` = left/top, `1.0` = right/bottom.
### gesture-tap

Before tapping near the bottom of the screen in React Native apps, check that "Open Debugger to View Warnings" banners are not visible — tapping them breaks the debugger connection. Close them with the X icon if present.

### gesture-swipe — Straight-line gesture

```json
{ "udid": "<UDID>", "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 }
```

Swipe **up** (`fromY > toY`) = scroll content **down**. Default duration: 300ms. Optional: `"durationMs": 500` for slower swipe.

### gesture-pinch — Two-finger pinch

```json
{ "udid": "<UDID>", "centerX": 0.5, "centerY": 0.5, "startDistance": 0.2, "endDistance": 0.6 }
```

All values are normalized 0.0–1.0 (fractions of screen, not pixels) — same as all other gesture tools. `startDistance: 0.2` means fingers start 20% of the screen apart; `endDistance: 0.6` means they end 60% apart. `startDistance < endDistance` = pinch out (zoom in). `startDistance > endDistance` = pinch in (zoom out). Defaults: `angle: 0` (horizontal), `durationMs: 300`. Optional: `"angle": 90` for vertical axis, `"durationMs": 500` for slower pinch.

### gesture-rotate — Two-finger rotation

```json
{
"udid": "<UDID>",
"centerX": 0.5,
"centerY": 0.5,
"radius": 0.15,
"startAngle": 0,
"endAngle": 90
}
```

All positions and radius are normalized 0.0–1.0 (fractions of screen, not pixels). `radius: 0.15` means each finger is 15% of the screen away from center. `endAngle > startAngle` = clockwise. Default duration: 300ms. Optional: `"durationMs": 500` for slower rotation.

### gesture-custom — Custom touch sequence
### gesture-custom

For long-press, drag-and-drop, and other complex sequences, see `references/gesture-examples.md`. Set `"interpolate": 10` to auto-generate smooth intermediate Move events between keyframes.

### button — Hardware button press

```json
{ "udid": "<UDID>", "button": "home" }
```

Values: `home`, `back`, `power`, `volumeUp`, `volumeDown`, `appSwitch`, `actionButton`

### paste — Type text into focused field
### paste and keyboard

```json
{ "udid": "<UDID>", "text": "Hello, world!" }
```

Tap the field first, then paste. Fall back to `keyboard` if it doesn't work.

### keyboard — Type text or press special keys

```json
{ "udid": "<UDID>", "text": "search query", "key": "enter" }
```

Special keys: `enter`, `escape`, `backspace`, `tab`, `space`, `arrow-up`, `arrow-down`, `arrow-left`, `arrow-right`, `f1`–`f12`. Optional: `"delayMs": 100` between keystrokes (default 50ms).

### rotate — Change orientation

```json
{ "udid": "<UDID>", "orientation": "LandscapeLeft" }
```

Values: `Portrait`, `LandscapeLeft`, `LandscapeRight`, `PortraitUpsideDown`
Tap the target field first, then use `paste`. Fall back to `keyboard` when paste is unreliable; allowed named keys and timing are in the `keyboard` tool schema.

---

Expand Down Expand Up @@ -205,68 +142,8 @@ Screenshots are downscaled by default (30% of original resolution) to reduce con

## 8. Action Sequencing with `run-sequence`

Use `run-sequence` to batch multiple interaction steps into **a single tool call**. Only one screenshot is returned — after all steps complete. Use cases:
scrolling multiple times, typing and submitting automatically, known sequence of multiple taps, rotating device back and forth.

Do **not** use `run-sequence` when any step depends on observing the result of a previous step

### Use cases

Use the sequencing when:

- Knowing that some action needs multiple steps without necessarily immediate insight of screenshot
- "scroll to bottom", "scroll to top", "scroll to do X" -> sequence scroll 3-5 times
- form interactions, "clear and retype field" -> you may use triple-tap to select all, type new value
- "submit form" → fill all fields in sequence, tap submit
- "go back to X" → defined tap sequence for the navigation

### Allowed tools inside `run-sequence`

`gesture-tap`, `gesture-swipe`, `gesture-custom`, `gesture-pinch`, `gesture-rotate`, `button`, `keyboard`, `rotate`
Use `run-sequence` to batch multiple interaction steps into **a single tool call**. You do not get intermediate screenshots — only outcomes after the full sequence (call `screenshot` separately if needed afterward). Good fits: several swipes in a row, type then submit, known sequence of multiple taps, filling in forms in one call, rotating back and forth.

The `udid` is shared — do **not** include it in each step's `args`. Optional `delayMs` per step (default 100ms).

### Examples

Scroll down three times:

```json
{
"udid": "<UDID>",
"steps": [
{ "tool": "gesture-swipe", "args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 } },
{ "tool": "gesture-swipe", "args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 } },
{ "tool": "gesture-swipe", "args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 } }
]
}
```

Type into a focused field and submit:

```json
{
"udid": "<UDID>",
"steps": [
{ "tool": "keyboard", "args": { "text": "hello world" } },
{ "tool": "keyboard", "args": { "key": "enter" } }
]
}
```

Tap a known button, then scroll down:

```json
{
"udid": "<UDID>",
"steps": [
{ "tool": "gesture-tap", "args": { "x": 0.5, "y": 0.15 } },
{
"tool": "gesture-swipe",
"args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 },
"delayMs": 300
}
]
}
```
**Examples:** multi-step scroll (“scroll to bottom”), form fill + submit, a known navigation tap sequence. The **`run-sequence` tool description** lists allowed nested tools, per-step argument shapes (with `udid` omitted from each step’s `args`), default step delay, copy-paste JSON examples, and partial-result-on-error behavior — use that as the source of truth.

Stops on the first error and returns partial results.
Do **not** use `run-sequence` when any step depends on **observing** the UI after a prior step (e.g. a control that only appears after a tap). Use individual tool calls and discovery between steps instead.
2 changes: 2 additions & 0 deletions packages/tool-server/src/tools/interactions/run-sequence.ts
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,8 @@ Use when you need sequential actions and do NOT need to observe the screen betwe
Returns { completed, total, steps } with per-step results. Fails if an unrecognised tool name is used in a step (error returned at that step, execution stops).
No screenshot is captured automatically — call screenshot separately after the sequence if needed.

For when to batch vs use separate calls with discovery between steps, see the argent-simulator-interact skill (Action Sequencing / run-sequence).

ONLY use this when every step is known in advance. If any step depends on the
result of a previous one (e.g. tapping a menu item that only appears after
a prior tap), use individual tool calls instead.
Expand Down
5 changes: 3 additions & 2 deletions scripts/extract-tools.mjs
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,9 @@ function extractFromFile(filePath) {
const id = idMatch[1];
const afterId = src.slice(idMatch.index);

// Look for description within the next 2000 chars (handles multi-line)
const descWindow = afterId.slice(0, 2000);
// Look for description within the next N chars (handles multi-line; some tools
// have long descriptions before the closing backtick, e.g. run-sequence)
const descWindow = afterId.slice(0, 3000);

// Template literal description
let descMatch = descWindow.match(/\bdescription:\s*`([\s\S]*?)`/);
Expand Down
Loading