179 changes: 179 additions & 0 deletions crates/agentic-core/tests/cassettes/README.md
@@ -0,0 +1,179 @@
# Cassette Recorder

`record_cassette.py` runs an embedded proxy between the script and an upstream API (OpenAI or vLLM). Every request and response is captured into a YAML cassette for use in replay tests.

## How it works

```
[record_cassette.py] -> [proxy :7070] -> [OpenAI | vLLM]
                         (cassette written here)
```

The proxy intercepts each turn, records the request body and response, then appends a `t<N>` entry to the output YAML.

The recorder is interactive: for each turn it prompts for the input message and sends the request once you press Enter. You can type the prompts by hand in a terminal, or pipe them in with `printf` or `echo` to feed all turns non-interactively:

```bash
# interactive -- type each prompt when asked
python tests/cassettes/record_cassette.py --mode responses --turns 2 --no-stream --vllm http://localhost:5050 --model Qwen/Qwen3-30B-A3B-FP8 --output out.yaml

# non-interactive -- pipe prompts in (one line per turn)
printf 'First prompt\nSecond prompt\n' | python tests/cassettes/record_cassette.py --mode responses --turns 2 --no-stream --vllm http://localhost:5050 --model Qwen/Qwen3-30B-A3B-FP8 --output out.yaml
```

The recorder scripts (`record_reasoning_cassettes.sh`, `record_tool_call_cassettes.sh`, etc.) use `printf` to feed fixed prompts per test so no manual input is needed.
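
The same pattern works from Python: the recorder reads one newline-terminated line per turn on stdin. The following is a minimal sketch, not part of the recorder; `build_stdin` is a hypothetical helper, and the commented-out `subprocess.run` call only reuses the flags from the examples above.

```python
def build_stdin(prompts: list[str]) -> str:
    """Join prompts into a stdin payload with one line per turn."""
    for prompt in prompts:
        if "\n" in prompt:
            # A prompt containing a newline would be read as two turns.
            raise ValueError("each prompt must fit on a single line")
    return "".join(prompt + "\n" for prompt in prompts)


payload = build_stdin(["First prompt", "Second prompt"])

# Illustrative invocation (not executed here); flags copied from the examples above:
# import subprocess
# subprocess.run(
#     ["python", "tests/cassettes/record_cassette.py", "--mode", "responses",
#      "--turns", "2", "--no-stream", "--vllm", "http://localhost:5050",
#      "--model", "Qwen/Qwen3-30B-A3B-FP8", "--output", "out.yaml"],
#     input=payload, text=True, check=True,
# )
```

The number of prompts should match `--turns`, since each turn consumes one line.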

## Modes

| Mode | Description |
|------|-------------|
| `responses` | Chains turns via `previous_response_id`. Only mode supported with `--vllm`. |
| `conv` | Creates a conversation object, passes `conversation` id each turn. |
| `isolation` | Two independent conversations (A and B) recorded into one cassette. |
| `mixed` | Turn 1 uses `conversation` id, turns 2+ switch to `previous_response_id`. |
| `store_true_then_store_false` | Turn 1: `store=true` with conversation id. Remaining turns: `store=false`, still pass conversation id. |

## CLI options

```
--turns N               Number of turns
--output PATH           Output YAML path
--mode MODE             responses | conv | isolation | mixed | store_true_then_store_false (default: conv)
--stream / --no-stream  Streaming or non-streaming (default: streaming)
--model NAME            Model name sent in requests
--no-store              Set store=false
--vllm URL              vLLM upstream, e.g. http://localhost:8000 (responses mode only)
--openai URL            OpenAI upstream (default: https://api.openai.com)
--tools FILE            JSON file containing a tools array (responses mode only)
--tool-choice VALUE     "auto", "none", "required", or JSON, e.g. '{"type":"function","name":"foo"}' (responses mode only)
--proxy-port PORT       Local proxy port (default: 7070)
--branch-from TURN      Branch from this turn's response id (repeatable)
--branch-turn-number N  First turn number for the corresponding branch (repeatable)
```
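
The `--tool-choice` value is either a bare keyword or a JSON object. The sketch below shows one way such a value could be interpreted, assuming that anything starting with `{` or `[` is decoded as JSON and everything else is passed through as a string. `parse_tool_choice` is a hypothetical name used here for illustration.

```python
import json
from typing import Any, Optional


def parse_tool_choice(raw: Optional[str]) -> Any:
    """Turn a --tool-choice string into the value placed in the request body."""
    if not raw:
        return None  # option omitted: leave tool_choice out of the request
    stripped = raw.strip()
    if stripped.startswith(("{", "[")):
        # JSON form, e.g. '{"type":"function","name":"foo"}'
        return json.loads(stripped)
    return stripped  # keyword form: "auto", "none", "required"


keyword = parse_tool_choice("auto")
forced = parse_tool_choice('{"type":"function","name":"foo"}')
```

Malformed JSON raises `json.JSONDecodeError` in this sketch; a CLI would likely want to turn that into a usage error.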

## Cassette YAML structure

Each cassette has a `turns` list. One entry is appended per request.

**Single turn (`--turns 1`, non-streaming):**

```yaml
turns:
  - filename: t1
    request:
      method: POST
      path: /v1/responses
      body:
        model: Qwen/Qwen3-30B-A3B-FP8
        input: "Reply with exactly one word: HELLO"
        stream: false
        store: true
      headers:
        content-type: application/json
      query_params: {}
    response:
      status_code: 200
      headers:
        content-type: application/json
      body:
        id: resp_abc123
        output: [...]
        usage: {...}
```

**Two turns (`--turns 2`, non-streaming) -- `t2` adds `previous_response_id`:**

```yaml
turns:
  - filename: t1
    request:
      body:
        input: "Remember the word APPLE. Just say: OK"
        store: true
    response:
      body:
        id: resp_abc123

  - filename: t2
    request:
      body:
        input: What word did I ask you to remember?
        previous_response_id: resp_abc123
    response:
      body:
        id: resp_def456
```
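
A replay test may want to confirm that the chain is intact, that is, that each turn's `previous_response_id` equals the id returned by the turn before it. The sketch below assumes a linear `responses`-mode cassette (no branches, `store: true`) that has already been loaded into plain dicts; `find_chain_breaks` is a hypothetical helper, not part of the recorder.

```python
def find_chain_breaks(turns: list[dict]) -> list[str]:
    """Describe every turn whose previous_response_id does not match the prior response id."""
    problems = []
    prev_id = None
    for turn in turns:
        claimed = turn["request"]["body"].get("previous_response_id")
        if prev_id is not None and claimed != prev_id:
            problems.append(f"{turn['filename']}: expected {prev_id!r}, got {claimed!r}")
        prev_id = turn["response"]["body"].get("id")
    return problems


# Same shape as the two-turn example above, written as Python dicts.
cassette = {
    "turns": [
        {
            "filename": "t1",
            "request": {"body": {"input": "Remember the word APPLE. Just say: OK", "store": True}},
            "response": {"body": {"id": "resp_abc123"}},
        },
        {
            "filename": "t2",
            "request": {
                "body": {
                    "input": "What word did I ask you to remember?",
                    "previous_response_id": "resp_abc123",
                }
            },
            "response": {"body": {"id": "resp_def456"}},
        },
    ]
}

breaks = find_chain_breaks(cassette["turns"])
```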

**Tool call turn -- `tool_choice` and `tools` appear in the request body:**

```yaml
turns:
  - filename: t1
    request:
      body:
        input: What is the NVIDIA stock price?
        tool_choice: auto
        tools:
          - type: function
            name: get_stock_price
            description: ...
            parameters: {...}
    response:
      body:
        output:
          - type: function_call
            name: get_stock_price
            arguments: '{"ticker": "NVDA"}'
```
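
Note that `arguments` is recorded as a JSON string, not a nested mapping, so a test has to decode it before comparing values. The following is a minimal sketch that assumes the response body has been loaded into a dict; `extract_tool_calls` is a hypothetical helper.

```python
import json


def extract_tool_calls(response_body: dict) -> list[tuple[str, dict]]:
    """Return (name, decoded arguments) for each function_call item in the output."""
    calls = []
    for item in response_body.get("output", []):
        if item.get("type") == "function_call":
            calls.append((item["name"], json.loads(item["arguments"])))
    return calls


# Response body shaped like the tool-call example above.
body = {
    "output": [
        {
            "type": "function_call",
            "name": "get_stock_price",
            "arguments": '{"ticker": "NVDA"}',
        }
    ]
}

calls = extract_tool_calls(body)
```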

**Streaming turn -- `response.body` is replaced by `response.sse`, a list of raw SSE lines:**

```yaml
turns:
  - filename: t1
    request:
      body:
        stream: true
    response:
      status_code: 200
      headers:
        content-type: text/event-stream; charset=utf-8
      sse:
        - "event: response.created\n"
        - "data: {...}\n"
        - "event: response.output_text.delta\n"
        - "data: {...}\n"
        - "event: response.completed\n"
        - "data: {...}\n"
```
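
Because `sse` holds raw lines, a consumer has to pair each `event:` line with the `data:` line that follows it. The sketch below assumes exactly one `data:` line per event, as in the example above (the SSE format itself allows several). `pair_sse_events` is a hypothetical helper, and the JSON payloads are placeholders standing in for the elided `{...}` bodies.

```python
def pair_sse_events(sse_lines: list[str]) -> list[tuple]:
    """Pair each 'event:' line with the 'data:' line that follows it."""
    events = []
    current_event = None
    for raw in sse_lines:
        line = raw.rstrip("\n")
        if line.startswith("event:"):
            current_event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            events.append((current_event, line[len("data:"):].strip()))
            current_event = None  # reset so a stray data line is not mislabeled
    return events


# Placeholder payloads; real cassettes contain the full upstream JSON.
sse = [
    "event: response.created\n",
    'data: {"type": "response.created"}\n',
    "event: response.output_text.delta\n",
    'data: {"type": "response.output_text.delta", "delta": "HELLO"}\n',
    "event: response.completed\n",
    'data: {"type": "response.completed"}\n',
]

events = pair_sse_events(sse)
event_names = [name for name, _ in events]
```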

## Recorder scripts

| Script | Cassettes | Backend |
|--------|-----------|---------|
| `record_text_only_cassettes.sh` | 10 text-only cassettes (responses + conv modes, streaming + non-streaming) | OpenAI (`OPENAI_API_KEY`) |
| `record_reasoning_cassettes.sh` | 2 reasoning cassettes (single turn, streaming + non-streaming) | vLLM |
| `record_tool_call_cassettes.sh` | 8 tool-call cassettes (4 tool_choice modes x streaming + non-streaming) | vLLM |

### Text-only (OpenAI)

```bash
OPENAI_API_KEY=sk-... bash tests/cassettes/record_text_only_cassettes.sh
MODEL=gpt-4o-mini OPENAI_API_KEY=sk-... bash tests/cassettes/record_text_only_cassettes.sh
```

### Reasoning (vLLM)

```bash
# start the server in the background; wait until it is ready before recording
vllm serve Qwen/Qwen3-30B-A3B-FP8 --reasoning-parser deepseek_r1 --port 5050 > server.log 2>&1 &

VLLM_URL=http://0.0.0.0:5050 MODEL=Qwen/Qwen3-30B-A3B-FP8 bash tests/cassettes/record_reasoning_cassettes.sh
```

### Tool calls (vLLM)

```bash
# start the server in the background; wait until it is ready before recording
vllm serve Qwen/Qwen3-30B-A3B-FP8 --tool-call-parser hermes --enable-auto-tool-choice --port 5050 > server.log 2>&1 &

VLLM_URL=http://0.0.0.0:5050 MODEL=Qwen/Qwen3-30B-A3B-FP8 bash tests/cassettes/record_tool_call_cassettes.sh
```
45 changes: 44 additions & 1 deletion crates/agentic-core/tests/cassettes/record_cassette.py
@@ -325,6 +325,13 @@ def _prompt(label: str) -> str:
sys.exit(0)


def _inject_tools(body: dict, tools: list | None, tool_choice: Any) -> None:
    if tools is not None:
        body["tools"] = tools
    if tool_choice is not None:
        body["tool_choice"] = tool_choice


def run_conv(
    client: httpx.Client,
    turns: int,
@@ -470,6 +477,8 @@ def run_responses(
    store: bool,
    branches: list[tuple[int, int | None]],
    proxy_url: str,
    tools: list | None = None,
    tool_choice: Any = None,
) -> None:
    response_ids: dict[int, str] = {}
    branch_map: dict[int, int] = {}
@@ -497,6 +506,7 @@ def run_responses(
        body: dict = {"model": model, "input": prompt, "stream": stream, "store": store}
        if previous_response_id and store:
            body["previous_response_id"] = previous_response_id
        _inject_tools(body, tools, tool_choice)
        response_id = _send(client, body, stream, proxy_url)
        previous_response_id = response_id if store else None
        if response_id:
@@ -522,6 +532,7 @@ def run_responses(
            "store": store,
            "previous_response_id": branch_resp_id,
        }
        _inject_tools(body, tools, tool_choice)
        _send(client, body, stream, proxy_url)


@@ -593,6 +604,21 @@ def run_responses(
    default=None,
    help="vLLM upstream URL, e.g. http://localhost:8000 (responses mode only, no auth).",
)
@click.option(
    "--tools",
    "tools_file",
    metavar="FILE",
    default=None,
    type=click.Path(exists=True),
    help="Path to a JSON file containing a tools array to inject into every request.",
)
@click.option(
    "--tool-choice",
    "tool_choice_raw",
    metavar="VALUE",
    default=None,
    help='tool_choice value: "auto", "none", "required", or JSON e.g. \'{"type":"function","name":"foo"}\'.',
)
def main(
    turns: int,
    output: str,
@@ -605,6 +631,8 @@ def main(
    proxy_port: int,
    openai_url: str | None,
    vllm_url: str | None,
    tools_file: str | None,
    tool_choice_raw: str | None,
) -> None:
    """Interactive multi-turn cassette recorder (proxy embedded)."""
    if branch_turn_number and not branch_from:
@@ -625,6 +653,21 @@ def main(
            f"--vllm is only supported with --mode responses (got --mode {mode})."
        )

    tools: list | None = None
    if tools_file:
        with open(tools_file, encoding="utf-8") as f:
            tools = json.load(f)
        if not isinstance(tools, list):
            raise click.UsageError("--tools file must contain a JSON array.")

    tool_choice: Any = None
    if tool_choice_raw:
        stripped = tool_choice_raw.strip()
        if stripped.startswith("{") or stripped.startswith("["):
            tool_choice = json.loads(stripped)
        else:
            tool_choice = stripped

    if vllm_url:
        target = vllm_url.rstrip("/")
        headers: dict = {}
@@ -660,7 +703,7 @@ def main(
        elif mode == "mixed":
            run_mixed(client, turns, model, stream, store, proxy_url)
        elif mode == "responses":
-           run_responses(client, turns, model, stream, store, branches, proxy_url)
+           run_responses(client, turns, model, stream, store, branches, proxy_url, tools, tool_choice)
        elif mode == "store_true_then_store_false":
            run_store_true_then_store_false(client, turns, model, stream, proxy_url)
    finally: