FeatureBench Infer CLI Arguments

This document describes all CLI arguments supported by fb infer.

1 Basic Usage

fb infer \
  --agent gemini_cli \
  --model gemini-3-pro-preview

2 Resume Mode

fb infer \
  --resume runs/2025-12-02__16-06-04

In resume mode, most arguments are loaded from run_metadata.json. Only a few flags can override metadata (see the argument list below).

3 Argument Reference

Core

--config-path
Path to config.toml.
If not provided, uses default discovery (searching upward from featurebench/infer).
--agent, -a
Agent to use: claude_code, gemini_cli, openhands, codex, mini_swe_agent.
Required unless --resume is used.
--model, -m
Model name (e.g., claude-sonnet-4-20250514, gemini-3-pro-preview).
For openhands or mini_swe_agent, use provider/model format.
Required unless --resume is used.
--api-key
Override agent API key (takes precedence over config).
Saved into run_metadata.json; resume uses metadata unless overridden again.
--base-url
Override agent base URL (takes precedence over config).
Saved into run_metadata.json; resume uses metadata unless overridden again.
--version
Override agent version (takes precedence over config).
Saved into run_metadata.json; resume uses metadata unless overridden again.
--dataset
HuggingFace dataset repo name (e.g., LiberCoders/FeatureBench).
Default: LiberCoders/FeatureBench in non-resume mode.
--split
Dataset split name (e.g., lite, full).
Default: full in non-resume mode. In resume mode, uses metadata.
--level
Filter tasks by level (1 or 2).
Default: all levels.
--task-id, -t
Only process specified task IDs (space-separated).
Default: all tasks.
--n-attempts
Number of attempts per task.
Default: 1.
--n-concurrent
Number of concurrent tasks.
Default: 1 in non-resume mode.
Resume mode: can override metadata if explicitly provided.
--output-dir, -o
Output directory root.
Default: runs.
Resume mode: ignored (uses the resume directory).
--timeout
Timeout per task (seconds).
Default: 3600.
Resume mode: can override metadata if explicitly provided.
--resume
Resume from a previous run directory (e.g., runs/2025-12-02__16-06-04).
Most arguments are loaded from run_metadata.json.
--force-rerun
Force rerun specific task IDs even if they were completed.
Accepts space-separated task IDs or a .txt file path (one task_id per line).

Networking / Runtime

--proxy-port
Proxy port for container network (host gateway) (e.g., --proxy-port 7890).
Default: None.
Resume mode: can override metadata if explicitly provided.
--runtime-proxy
Enable or disable HTTP_PROXY/HTTPS_PROXY at agent runtime.
Choices: on, off.
Default: on when --proxy-port is provided, otherwise off.
Resume mode: can override metadata if explicitly provided.
--gpu-ids
Comma-separated GPU IDs (e.g., 0,1,2,3).
Default: all available.
Resume mode: can override metadata if explicitly provided.
--force-timeout
If a task run times out (infer.log contains [TIMEOUT after ... seconds]), treat that attempt as successful instead of failed.
Default: disabled.
Resume mode: can override metadata if explicitly provided.

Prompt Control (For ablation experiment)

--without
Remove the ## Interface Descriptions section from the prompt.
Resume mode: ignored (uses metadata).
--white
Enable white-box mode (expose FAIL_TO_PASS test file path in prompt).
Resume mode: ignored (uses metadata).

OpenHands Only

--native-tool-calling
Force native tool calling (LLM_NATIVE_TOOL_CALLING=true).
Resume mode: ignored (uses metadata).
--max-iters
Maximum iterations for OpenHands (OPENHANDS_MAX_ITERATIONS).
Default: no override (OpenHands default applies).
Resume mode: ignored (uses metadata).

Section 4: Output Directory Structure

runs/{timestamp}/
├── output.jsonl                  # Inference results (one JSON per line)
├── run_metadata.json             # Run configuration and metadata
├── run_summary_{timestamp}.json  # Run summary of success and failure
└── run_outputs/ 
    └── {task_id}/
        └── attempt-{n}/
            ├── infer.log         # Agent execution log
            ├── run.log           # Runtime log
            └── patch.diff        # Generated patch

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FeatureBench Infer CLI Arguments

1 Basic Usage

2 Resume Mode

3 Argument Reference

Core

Networking / Runtime

Prompt Control (For ablation experiment)

OpenHands Only

Section 4: Output Directory Structure

FilesExpand file tree

infer_cli_arg.md

Latest commit

History

infer_cli_arg.md

File metadata and controls

FeatureBench Infer CLI Arguments

1 Basic Usage

2 Resume Mode

3 Argument Reference

Core

Networking / Runtime

Prompt Control (For ablation experiment)

OpenHands Only

Section 4: Output Directory Structure