1 change: 1 addition & 0 deletions .gitignore
@@ -165,4 +165,5 @@ reports/
.DS_Store
/chat
/askui_chat.db
.cache/

262 changes: 262 additions & 0 deletions docs/caching.md
@@ -0,0 +1,262 @@
# Caching

The caching mechanism allows you to record and replay agent action sequences (trajectories) for faster and more robust test execution. This feature is particularly useful for regression testing, where you want to replay known-good interaction sequences to verify that your application still behaves correctly.

## Overview

The caching system works by recording all tool use actions (mouse movements, clicks, typing, etc.) performed by the agent during an `act()` execution. These recorded sequences can then be replayed in subsequent executions, allowing the agent to skip the decision-making process and execute the actions directly.

## Caching Strategies

The caching mechanism supports four strategies, configured via the `caching_settings` parameter in the `act()` method:

- **`"no"`** (default): No caching is used. The agent executes normally without recording or replaying actions.
- **`"write"`**: Records all agent actions to a cache file for future replay.
- **`"read"`**: Gives the agent tools to list and execute previously cached trajectories.
- **`"both"`**: Combines read and write modes: the agent can use existing cached trajectories and also records new ones.

## Configuration

Caching is configured using the `CachingSettings` class:

```python
from askui.models.shared.settings import CachingSettings, CachedExecutionToolSettings

caching_settings = CachingSettings(
    strategy="write",  # One of: "read", "write", "both", "no"
    cache_dir=".cache",  # Directory to store cache files
    filename="my_test.json",  # Filename for the cache file (optional for write mode)
    execute_cached_trajectory_tool_settings=CachedExecutionToolSettings(
        delay_time_between_action=0.5  # Delay in seconds between each cached action
    ),
)
```

### Parameters

- **`strategy`**: The caching strategy to use (`"read"`, `"write"`, `"both"`, or `"no"`).
- **`cache_dir`**: Directory where cache files are stored. Defaults to `".cache"`.
- **`filename`**: Name of the cache file to write to or read from. If not specified in write mode, a timestamped filename will be generated automatically (format: `cached_trajectory_YYYYMMDDHHMMSSffffff.json`).
- **`execute_cached_trajectory_tool_settings`**: Configuration for the trajectory execution tool (optional). See [Execution Settings](#execution-settings) below.
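
The timestamped default filename can be reproduced with the standard library, for example to predict or match generated names in a test harness. This is a minimal sketch: the helper name `default_cache_filename` is hypothetical, and it assumes the timestamp is the local time formatted as `%Y%m%d%H%M%S%f`. The library's own generator may differ in detail.

```python
from datetime import datetime
from typing import Optional


def default_cache_filename(now: Optional[datetime] = None) -> str:
    """Build a name of the form cached_trajectory_YYYYMMDDHHMMSSffffff.json."""
    # Hypothetical helper, not part of askui: it only mirrors the documented format.
    timestamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S%f")
    return f"cached_trajectory_{timestamp}.json"


print(default_cache_filename(datetime(2025, 1, 31, 9, 5, 7, 123456)))
# cached_trajectory_20250131090507123456.json
```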

### Execution Settings

The `CachedExecutionToolSettings` class allows you to configure how cached trajectories are executed:

```python
from askui.models.shared.settings import CachedExecutionToolSettings

execution_settings = CachedExecutionToolSettings(
    delay_time_between_action=0.5  # Delay in seconds between each action (default: 0.5)
)
```

#### Parameters

- **`delay_time_between_action`**: The time to wait (in seconds) between executing consecutive cached actions. This delay helps ensure UI elements have time to respond before the next action is executed. Defaults to `0.5` seconds.

You can adjust this value based on your application's responsiveness:
- For faster applications or quick interactions, you might use a smaller delay (e.g., `0.1` or `0.2` seconds)
- For slower applications or complex UI updates, you might need a longer delay (e.g., `1.0` or `2.0` seconds)

## Usage Examples

### Writing a Cache (Recording)

Record agent actions to a cache file for later replay:

```python
from askui import VisionAgent
from askui.models.shared.settings import CachingSettings

with VisionAgent() as agent:
    agent.act(
        goal="Fill out the login form with username 'admin' and password 'secret123'",
        caching_settings=CachingSettings(
            strategy="write",
            cache_dir=".cache",
            filename="login_test.json",
        ),
    )
```

After execution, a cache file will be created at `.cache/login_test.json` containing all the tool use actions performed by the agent.

### Reading from Cache (Replaying)

Provide the agent with access to previously recorded trajectories:

```python
from askui import VisionAgent
from askui.models.shared.settings import CachingSettings

with VisionAgent() as agent:
    agent.act(
        goal="Fill out the login form",
        caching_settings=CachingSettings(
            strategy="read",
            cache_dir=".cache",
        ),
    )
```

When using `strategy="read"`, the agent receives two additional tools:

1. **`retrieve_available_trajectories_tool`**: Lists all available cache files in the cache directory
2. **`execute_cached_executions_tool`**: Executes a specific cached trajectory

The agent will automatically check if a relevant cached trajectory exists and use it if appropriate. After executing a cached trajectory, the agent will verify the results and make corrections if needed.
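
To check which trajectories the agent could be offered, you can list the cache directory yourself before a run. The sketch below assumes cache files are `.json` files stored directly in `cache_dir`. `list_cached_trajectories` is a hypothetical helper, not the implementation of `retrieve_available_trajectories_tool`.

```python
import tempfile
from pathlib import Path


def list_cached_trajectories(cache_dir: str = ".cache") -> list[str]:
    """Return the names of the JSON cache files in cache_dir, sorted."""
    directory = Path(cache_dir)
    if not directory.is_dir():
        return []  # Nothing has been recorded yet
    return sorted(path.name for path in directory.glob("*.json"))


# Demo against a throwaway directory instead of a real cache
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "login_test.json").write_text("[]")
    (Path(tmp) / "checkout_test.json").write_text("[]")
    (Path(tmp) / "notes.txt").write_text("not a trajectory")
    print(list_cached_trajectories(tmp))
    # ['checkout_test.json', 'login_test.json']
```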

### Using Custom Execution Settings

You can customize the delay between cached actions to match your application's responsiveness:

```python
from askui import VisionAgent
from askui.models.shared.settings import CachingSettings, CachedExecutionToolSettings

with VisionAgent() as agent:
    agent.act(
        goal="Fill out the login form",
        caching_settings=CachingSettings(
            strategy="read",
            cache_dir=".cache",
            execute_cached_trajectory_tool_settings=CachedExecutionToolSettings(
                delay_time_between_action=1.0  # Wait 1 second between each action
            ),
        ),
    )
```

This is particularly useful when:
- Your application has animations or transitions that need time to complete
- UI elements take time to become interactive after appearing
- You're testing on slower hardware or environments

### Using Both Strategies

Enable both reading and writing simultaneously:

```python
from askui import VisionAgent
from askui.models.shared.settings import CachingSettings

with VisionAgent() as agent:
    agent.act(
        goal="Complete the checkout process",
        caching_settings=CachingSettings(
            strategy="both",
            cache_dir=".cache",
            filename="checkout_test.json",
        ),
    )
```

In this mode:
- The agent can replay existing cached trajectories to speed up execution.
- If the agent does not replay a cached trajectory, its actions are recorded to the specified cache file.
- If the agent replays a cached trajectory, no cache file is written, which avoids recording duplicates.

## Cache File Format

Cache files are JSON files containing an array of tool use blocks. Each block represents a single tool invocation with the following structure:

```json
[
  {
    "type": "tool_use",
    "id": "toolu_01AbCdEfGhIjKlMnOpQrStUv",
    "name": "computer",
    "input": {
      "action": "mouse_move",
      "coordinate": [150, 200]
    }
  },
  {
    "type": "tool_use",
    "id": "toolu_02AbCdEfGhIjKlMnOpQrStUv",
    "name": "computer",
    "input": {
      "action": "left_click"
    }
  },
  {
    "type": "tool_use",
    "id": "toolu_03AbCdEfGhIjKlMnOpQrStUv",
    "name": "computer",
    "input": {
      "action": "type",
      "text": "admin"
    }
  }
]
```

Note: Screenshot actions are excluded from cached trajectories as they don't modify the UI state.
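
Because a cache file is plain JSON, it can be inspected with the standard library before it is replayed. The sketch below parses data in the format shown above and lists the recorded actions in order. `summarize_trajectory` and the shortened `id` values are illustrative assumptions, not part of askui.

```python
import json

# Same structure as the cache file format above, with shortened placeholder ids
SAMPLE_CACHE = """
[
  {"type": "tool_use", "id": "toolu_01", "name": "computer",
   "input": {"action": "mouse_move", "coordinate": [150, 200]}},
  {"type": "tool_use", "id": "toolu_02", "name": "computer",
   "input": {"action": "left_click"}},
  {"type": "tool_use", "id": "toolu_03", "name": "computer",
   "input": {"action": "type", "text": "admin"}}
]
"""


def summarize_trajectory(raw_json: str) -> list[tuple[str, str]]:
    """Return (tool name, action) pairs in execution order."""
    return [
        (block["name"], block.get("input", {}).get("action", "<none>"))
        for block in json.loads(raw_json)
        if block.get("type") == "tool_use"
    ]


for tool, action in summarize_trajectory(SAMPLE_CACHE):
    print(f"{tool}: {action}")
# computer: mouse_move
# computer: left_click
# computer: type
```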

## How It Works

### Write Mode

In write mode, the `CacheWriter` class:

1. Intercepts all assistant messages via a callback function
2. Extracts tool use blocks from the messages
3. Stores them in memory during execution
4. Writes them to a JSON file when the agent finishes (on `stop_reason="end_turn"`)
5. Automatically skips writing if a cached execution was used (to avoid recording replays)
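
The steps above can be sketched without askui. `TrajectoryRecorder` is a hypothetical stand-in for `CacheWriter`, not its actual implementation. It assumes each message is a dict with a `content` list and a `stop_reason`, and that screenshot actions are filtered out at write time.

```python
import json
import tempfile
from pathlib import Path


class TrajectoryRecorder:
    """Hypothetical stand-in for CacheWriter (assumed message shape)."""

    def __init__(self, cache_dir: str, filename: str) -> None:
        self.path = Path(cache_dir) / filename
        self.blocks: list[dict] = []
        self.used_cached_execution = False

    def on_message(self, message: dict) -> None:
        # Steps 1-3: collect tool use blocks in memory
        for block in message.get("content", []):
            if block.get("type") != "tool_use":
                continue
            if block.get("name") == "execute_cached_executions_tool":
                self.used_cached_execution = True  # Step 5: this run is a replay
                continue
            if block.get("input", {}).get("action") == "screenshot":
                continue  # Assumption: screenshots are dropped when recording
            self.blocks.append(block)
        # Step 4: write once the agent finishes, unless a replay happened
        if message.get("stop_reason") == "end_turn" and not self.used_cached_execution:
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.path.write_text(json.dumps(self.blocks, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    recorder = TrajectoryRecorder(tmp, "login_test.json")
    recorder.on_message({
        "stop_reason": "tool_use",
        "content": [
            {"type": "text", "text": "Clicking the login button."},
            {"type": "tool_use", "id": "toolu_01", "name": "computer",
             "input": {"action": "left_click"}},
        ],
    })
    recorder.on_message({"stop_reason": "end_turn", "content": []})
    print(len(json.loads(recorder.path.read_text())))  # 1
```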

### Read Mode

In read mode:

1. Two caching tools are added to the agent's toolbox
2. A special system prompt (`CACHE_USE_PROMPT`) is appended to instruct the agent on how to use trajectories
3. The agent can call `retrieve_available_trajectories_tool` to see available cache files
4. The agent can call `execute_cached_executions_tool` with a trajectory file path to replay it
5. During replay, each tool use block is executed sequentially with a configurable delay between actions (default: 0.5 seconds)
6. Screenshot actions and calls to the trajectory retrieval tool are skipped during replay
7. The agent is instructed to verify results after replay and make corrections if needed

The delay between actions can be customized using `CachedExecutionToolSettings` to accommodate different application response times.
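
The replay loop described in steps 5 and 6 can be pictured as follows. This is a simplified sketch under stated assumptions: `replay_trajectory` and the `execute` callback are hypothetical, and the real tool dispatches each block to the agent's toolbox rather than to a plain function.

```python
import time
from typing import Callable

# Tools that are not replayed (name taken from the read mode steps above)
SKIPPED_TOOLS = {"retrieve_available_trajectories_tool"}


def replay_trajectory(
    blocks: list[dict],
    execute: Callable[[dict], None],
    delay_time_between_action: float = 0.5,
) -> int:
    """Run each replayable block in order and return how many were executed."""
    executed = 0
    for block in blocks:
        is_screenshot = block.get("input", {}).get("action") == "screenshot"
        if is_screenshot or block.get("name") in SKIPPED_TOOLS:
            continue
        execute(block)
        executed += 1
        time.sleep(delay_time_between_action)  # Give the UI time to respond
    return executed


performed = []
count = replay_trajectory(
    [
        {"name": "computer", "input": {"action": "screenshot"}},
        {"name": "computer", "input": {"action": "left_click"}},
        {"name": "computer", "input": {"action": "type", "text": "admin"}},
    ],
    execute=lambda block: performed.append(block["input"]["action"]),
    delay_time_between_action=0.0,  # No waiting in this demo
)
print(count, performed)  # 2 ['left_click', 'type']
```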

## Limitations

- **UI State Sensitivity**: Cached trajectories assume the UI is in the same state as when they were recorded. If the UI has changed, the replay may fail or produce incorrect results.
- **No `on_message` Callback**: When using `strategy="write"` or `strategy="both"`, you cannot provide a custom `on_message` callback, as the caching system uses this callback to record actions.
- **Verification Required**: After executing a cached trajectory, the agent should verify that the results are correct, as UI changes may cause partial failures.

## Example: Complete Test Workflow

Here's a complete example showing how to record and replay a test:

```python
from askui import VisionAgent
from askui.models.shared.settings import CachingSettings, CachedExecutionToolSettings

# Step 1: Record a successful login flow
print("Recording login flow...")
with VisionAgent() as agent:
    agent.act(
        goal="Navigate to the login page and log in with username 'testuser' and password 'testpass123'",
        caching_settings=CachingSettings(
            strategy="write",
            cache_dir="test_cache",
            filename="user_login.json",
        ),
    )

# Step 2: Later, replay the login flow for regression testing
print("\nReplaying login flow for regression test...")
with VisionAgent() as agent:
    agent.act(
        goal="Log in to the application",
        caching_settings=CachingSettings(
            strategy="read",
            cache_dir="test_cache",  # Must match the directory used when recording
            execute_cached_trajectory_tool_settings=CachedExecutionToolSettings(
                delay_time_between_action=1.0
            ),
        ),
    )
```