This document describes all CLI arguments supported by fb infer.

fb infer \
  --agent gemini_cli \
  --model gemini-3-pro-preview

fb infer \
  --resume runs/2025-12-02__16-06-04

In resume mode, most arguments are loaded from run_metadata.json. Only a few
flags can override metadata (see the argument list below).
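
The override rule can be pictured as a small merge step. The sketch below is an illustration only, not the tool's actual implementation: the `merge_resume_args` helper and the metadata key names are hypothetical, and the set of overridable flags is simply copied from the argument list in this document.

```python
# Hypothetical illustration of resume-mode precedence. Key names are assumed
# and do not reflect the real run_metadata.json schema.
OVERRIDABLE_ON_RESUME = {
    "api_key", "base_url", "version", "n_concurrent", "timeout",
    "proxy_port", "runtime_proxy", "gpu_ids", "force_timeout",
}


def merge_resume_args(metadata: dict, cli_args: dict) -> dict:
    """Start from saved metadata; apply only explicit, overridable CLI values."""
    merged = dict(metadata)
    for key, value in cli_args.items():
        # None stands for "flag not given on the command line".
        if value is not None and key in OVERRIDABLE_ON_RESUME:
            merged[key] = value
    return merged


saved = {"agent": "gemini_cli", "timeout": 3600, "n_concurrent": 1, "white": False}
given = {"timeout": 7200, "white": True, "n_concurrent": None}
print(merge_resume_args(saved, given))
```

In this example the timeout comes from the command line, while the white-box setting is ignored and keeps its saved value, matching the "Resume mode" notes on the individual flags.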

- --config-path
  Path to config.toml.
  If not provided, uses default discovery (searching upward from featurebench/infer).
- --agent, -a
  Agent to use: claude_code, gemini_cli, openhands, codex, mini_swe_agent.
  Required unless --resume is used.
- --model, -m
  Model name (e.g., claude-sonnet-4-20250514, gemini-3-pro-preview).
  For openhands or mini_swe_agent, use provider/model format.
  Required unless --resume is used.
- --api-key
  Override agent API key (takes precedence over config).
  Saved into run_metadata.json; resume uses metadata unless overridden again.
- --base-url
  Override agent base URL (takes precedence over config).
  Saved into run_metadata.json; resume uses metadata unless overridden again.
- --version
  Override agent version (takes precedence over config).
  Saved into run_metadata.json; resume uses metadata unless overridden again.
- --dataset
  HuggingFace dataset repo name (e.g., LiberCoders/FeatureBench).
  Default: LiberCoders/FeatureBench in non-resume mode.
- --split
  Dataset split name (e.g., lite, full).
  Default: full in non-resume mode. In resume mode, uses metadata.
- --level
  Filter tasks by level (1 or 2).
  Default: all levels.
- --task-id, -t
  Only process specified task IDs (space-separated).
  Default: all tasks.
- --n-attempts
  Number of attempts per task.
  Default: 1.
- --n-concurrent
  Number of concurrent tasks.
  Default: 1 in non-resume mode.
  Resume mode: can override metadata if explicitly provided.
- --output-dir, -o
  Output directory root.
  Default: runs.
  Resume mode: ignored (uses the resume directory).
- --timeout
  Timeout per task (seconds).
  Default: 3600.
  Resume mode: can override metadata if explicitly provided.
- --resume
  Resume from a previous run directory (e.g., runs/2025-12-02__16-06-04).
  Most arguments are loaded from run_metadata.json.
- --force-rerun
  Force rerun of specific task IDs even if they were completed.
  Accepts space-separated task IDs or a .txt file path (one task_id per line).
- --proxy-port
  Proxy port for container network (host gateway) (e.g., --proxy-port 7890).
  Default: None.
  Resume mode: can override metadata if explicitly provided.
- --runtime-proxy
  Enable or disable HTTP_PROXY/HTTPS_PROXY at agent runtime.
  Choices: on, off.
  Default: on when --proxy-port is provided, otherwise off.
  Resume mode: can override metadata if explicitly provided.
- --gpu-ids
  Comma-separated GPU IDs (e.g., 0,1,2,3).
  Default: all available.
  Resume mode: can override metadata if explicitly provided.
- --force-timeout
  If a task run times out (infer.log contains [TIMEOUT after ... seconds]), treat that attempt as successful instead of failed.
  Default: disabled.
  Resume mode: can override metadata if explicitly provided.
- --without
  Remove the ## Interface Descriptions section from the prompt.
  Resume mode: ignored (uses metadata).
- --white
  Enable white-box mode (expose the FAIL_TO_PASS test file path in the prompt).
  Resume mode: ignored (uses metadata).
- --native-tool-calling
  Force native tool calling (LLM_NATIVE_TOOL_CALLING=true).
  Resume mode: ignored (uses metadata).
- --max-iters
  Maximum iterations for OpenHands (OPENHANDS_MAX_ITERATIONS).
  Default: no override (OpenHands default applies).
  Resume mode: ignored (uses metadata).
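
When launching runs from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch using only flags documented above; the `build_infer_argv` helper is hypothetical, and the task IDs are placeholders rather than real task names.

```python
import shlex


def build_infer_argv(agent, model, task_ids=(), gpu_ids=(), split=None, timeout=None):
    """Assemble an `fb infer` argument list from flags in the list above."""
    argv = ["fb", "infer", "--agent", agent, "--model", model]
    if split is not None:
        argv += ["--split", split]
    if task_ids:
        # --task-id takes space-separated IDs, i.e. one argv entry per ID.
        argv += ["--task-id", *task_ids]
    if gpu_ids:
        # --gpu-ids takes a single comma-separated value.
        argv += ["--gpu-ids", ",".join(str(g) for g in gpu_ids)]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    return argv


argv = build_infer_argv(
    "gemini_cli", "gemini-3-pro-preview",
    task_ids=["task-a", "task-b"],  # placeholder IDs, not real task names
    gpu_ids=[0, 1], split="lite", timeout=1800,
)
print(shlex.join(argv))
```

The resulting list could be handed to `subprocess.run(argv, check=True)` to start the run. Keeping the arguments as a list avoids shell-quoting problems with model names or paths.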

Output directory layout:

runs/{timestamp}/
├── output.jsonl                  # Inference results (one JSON per line)
├── run_metadata.json             # Run configuration and metadata
├── run_summary_{timestamp}.json  # Run summary of success and failure
└── run_outputs/
    └── {task_id}/
        └── attempt-{n}/
            ├── infer.log         # Agent execution log
            ├── run.log           # Runtime log
            └── patch.diff        # Generated patch
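
This layout makes it straightforward to post-process a run. As a rough sketch, the helpers below scan each attempt's infer.log for the timeout marker described under --force-timeout and write the affected task IDs to a text file, one per line. The function names and the regular expression are assumptions; the exact wording of the marker between "after" and "seconds" is not specified here, so the pattern matches it loosely.

```python
import re
from pathlib import Path

# Loose match for the marker described under --force-timeout (assumed format).
TIMEOUT_MARKER = re.compile(r"\[TIMEOUT after .*? seconds\]")


def timed_out_task_ids(run_dir):
    """Return sorted task IDs with at least one attempt whose infer.log shows a timeout."""
    task_ids = set()
    for log in Path(run_dir).glob("run_outputs/*/attempt-*/infer.log"):
        if TIMEOUT_MARKER.search(log.read_text(errors="replace")):
            # Layout: run_outputs/{task_id}/attempt-{n}/infer.log
            task_ids.add(log.parent.parent.name)
    return sorted(task_ids)


def write_rerun_file(run_dir, dest):
    """Write one task ID per line, the .txt format accepted by --force-rerun."""
    ids = timed_out_task_ids(run_dir)
    Path(dest).write_text("".join(f"{task_id}\n" for task_id in ids))
    return ids
```

A file produced this way could then be passed to --force-rerun together with --resume to retry only the tasks that timed out. The output file name is arbitrary, as long as it ends in .txt.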