CLI reference

Invocation:

python -m agent_cap.agents [flags]

Discovery flags

Flag	Effect
`--list-strategies`	Print built-in + plugin strategies, exit.
`--list-protocols`	Print registered LLM protocols, exit.

Strategy

Flag	Default	Notes
`--strategy NAME`	`plan-execute`	One of `--list-strategies`.
`--max-turns N`	`8`	Cap on tool-use turns per agent.
`--sequence r1,r2,...`	(config order)	Only for `--strategy sequential`.

Agents and roles

Two YAML layouts are supported. See yaml.md for details. From the command line:

Flag	Repeatable	Notes
`--agent ROLE=k=v,k=v`	yes	Inline. ROLE is the agent pool key AND the role name unless you also use `--role`.
`--agents-file PATH`	yes	YAML file with only `agents:` (+ optional `defaults:`/`include:`). Later files override earlier.
`--agents-glob PATTERN`	yes	Shell glob, expanded and sorted.
`--role ROLE=AGENT`	yes	Bind strategy role to a pool key. Lets two roles share one endpoint.

Single-agent shortcuts

For the common case of "one agent named agent", these flags build the role directly without --agent ROLE=k=v,k=v syntax. They are ignored if an agent role already exists (from --config, --agent, --agents-file).

Flag	Endpoint field
`--model NAME`	`name`
`--base-url URL`	`base_url`
`--api-key KEY`	`api_key`
`--max-tokens N`	`max_tokens`
`--temperature T`	`temperature`
`--top-p P`	`top_p`
`--seed N`	`seed`
`--engine NAME`	`engine` (e.g. vllm, sglang)
`--protocol NAME`	`protocol` (openai, harmony, mock)
`--system-prompt TEXT`	`system_prompt`
`--use-streaming`	`use_streaming=true`

Example without any YAML:

python -m agent_cap.agents --strategy single \
  --model gpt-oss-120b --base-url http://localhost:30000 --api-key EMPTY \
  --engine sglang --max-tokens 131072 --top-p 1.0 --seed 42 \
  --dataset imo-answerbench --num-tasks 5 --max-turns 128 \
  --tool-backend math-python --evaluator imo \
  --judge "name=openrouter/elephant-alpha,base_url=https://openrouter.ai/api/v1,api_key=$OPENROUTER_API_KEY" \
  --output-dir results/imo_gptoss_sglang --resume

Supported keys inside --agent and --role values:

name           model id / served-model-name
base_url       OpenAI-compatible base URL (must include /v1 for hosted)
api_key        bearer; any non-empty string is fine for self-hosted
max_tokens     int, default 16384
temperature    float, default 0.0
top_p          float, default 1.0 (harmony protocol only; openai uses provider default)
seed           int, optional (harmony protocol only; openai client does not forward seed)
use_streaming  bool, default false
protocol       explicit override: openai | harmony | mock | ...
engine         serving-engine variant for the protocol (e.g. vllm | sglang for harmony)
system_prompt  optional system message

Tasks

Flag	Notes
`--task "..."`	Single prompt.
`--task-file PATH.jsonl`	Each line: `{"task_id": ..., "user_prompt": ..., ...}`.
`--dataset NAME`	Uses `agent_cap.runner.unified_runner._load_dataset_tasks`. Known names: `imo-answerbench`, `mcp-atlas`, `swe-bench-pro`, `medagentbench`, `financebench`.
`--num-tasks N`	Cap dataset to N tasks (0 = all).

Tools

Flag	Notes
`--demo-tools`	Built-in `calc` + `echo` for smoke tests.
`--tool-backend NAME`	Real backends: `mcp`, `math-python`, or `demo`.
`--mcp-server-url URL`	Required when `--tool-backend mcp`.
`--no-tools`	Force-disable tools even if config provides them.

Evaluation

Flag	Notes
`--evaluator NAME`	One of `--list-strategies`-equivalent for evaluators: `exact`, `gtfa`, `imo`, `imo-answerbench`, `llm-judge` / `judge`, `none`.
`--judge "k=v,k=v"`	Config for `llm-judge` (or as fallback for other evaluators). Keys: `name`, `base_url`, `api_key`, `temperature`, `max_tokens`, `timeout_s`, `system_prompt`, `user_template`, `decision_field`, `score_field`, `extract`.

Output and resume

Flag	Notes
`--output PATH.json`	Single-file dump when not using `--output-dir`.
`--output-dir DIR`	Writes `results.jsonl` (one row per task) and `output-data.jsonl`.
`--resume`	Skip `task_id`s already present in `<output-dir>/results.jsonl`.
`-v / -vv`	-v prints per-task summary; -vv adds per-turn trace.

Runtime

Flag	Notes
`--mock`	Deterministic offline LLM. No API key, no network.
`--load-module DOTTED.PATH`	Import a module so its `@register_*` decorators fire. Repeatable.
`--config PATH.yaml`	Main YAML config. CLI flags override fields in it.

Precedence

When a setting can be supplied in multiple places:

CLI flag  >  --config YAML  >  --agents-file / --agents-glob  >  YAML `include:`

For per-agent fields specifically:

CLI --agent (inline)  >  CLI --role (mapping)  >  --config agents:  >  --agents-file agents:  >  defaults:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLI reference

Discovery flags

Strategy

Agents and roles

Single-agent shortcuts

Tasks

Tools

Evaluation

Output and resume

Runtime

Precedence

FilesExpand file tree

cli.md

Latest commit

History

cli.md

File metadata and controls

CLI reference

Discovery flags

Strategy

Agents and roles

Single-agent shortcuts

Tasks

Tools

Evaluation

Output and resume

Runtime

Precedence