A/B tests split live traffic through a gateway between a control variant and a treatment variant, then use
online evaluation configs to measure which performs better. When you have a winner, promote applies it to your project
config.
A/B tests are fire-and-forget jobs (like run recommendation and run batch-evaluation): you start one with
run ab-test, then manage its lifecycle with view / pause / resume / stop / promote / archive. They are
not declared in agentcore.json and are not created by deploy — the gateway, its targets, and any config bundles
must already be deployed first.

| Mode | Compares | Variant inputs |
|---|---|---|
| `config-bundle` (default) | Two versions of the same config bundle | `--control-bundle`/`--control-version`, `--treatment-bundle`/`--treatment-version`, shared `--online-eval` |
| `target-based` | Two gateway targets (runtime endpoints) | `--control-target`/`--treatment-target`, `--control-online-eval`/`--treatment-online-eval` |

Each A/B test needs its own gateway, and only one test can be RUNNING per gateway at a time.
# Config-bundle mode: compare two versions of one bundle (50/50 split)
agentcore run ab-test \
-n PromptTest \
-g MyGateway \
--mode config-bundle \
-r MyAgent \
--control-bundle MyBundle --control-version <v1> \
--treatment-bundle MyBundle --treatment-version <v2> \
--online-eval MyEvalConfig
# Target-based mode: compare two gateway targets
agentcore run ab-test \
-n TargetTest \
-g MyGateway \
--mode target-based \
-r MyAgent \
--control-target prodTarget \
--treatment-target stagingTarget \
--control-online-eval ctrlEval \
--treatment-online-eval treatEval

A test is enabled (RUNNING) on create by default. Pass --disable-on-create to create it without starting it.
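
The examples above rely on the default 50/50 split. As a minimal sketch of a more cautious rollout, the hypothetical helper below (`run_weighted_ab_test` is not part of the CLI) passes an uneven split through `--control-weight` and `--treatment-weight`. The check that the two weights add up to 100 is an assumption; the flag reference only states that each weight ranges from 0 to 100. `WeightedTest`, `MyGateway`, `MyAgent`, `MyBundle`, and `MyEvalConfig` are placeholder names.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the agentcore CLI): start a config-bundle
# A/B test with an uneven traffic split.
# Usage: run_weighted_ab_test <control-weight> <treatment-weight> \
#                             <control-version> <treatment-version>
run_weighted_ab_test() {
  local control_weight="$1" treatment_weight="$2"
  local control_version="$3" treatment_version="$4"
  local w

  # The flag reference gives each weight a range of 0-100.
  for w in "$control_weight" "$treatment_weight"; do
    case "$w" in
      0|[1-9]|[1-9][0-9]|100) ;;
      *) echo "weight must be an integer from 0 to 100: $w" >&2; return 1 ;;
    esac
  done

  # ASSUMPTION: the two weights are expected to add up to 100.
  if [ "$((control_weight + treatment_weight))" -ne 100 ]; then
    echo "weights must add up to 100" >&2
    return 1
  fi

  # Placeholder names: WeightedTest, MyGateway, MyAgent, MyBundle, MyEvalConfig.
  agentcore run ab-test \
    -n WeightedTest \
    -g MyGateway \
    --mode config-bundle \
    -r MyAgent \
    --control-bundle MyBundle --control-version "$control_version" \
    --treatment-bundle MyBundle --treatment-version "$treatment_version" \
    --online-eval MyEvalConfig \
    --control-weight "$control_weight" \
    --treatment-weight "$treatment_weight"
}
```

Calling `run_weighted_ab_test 90 10 <v1> <v2>` would send roughly 10% of traffic to the treatment. Rejecting bad weights locally only saves a round trip; the service remains the authority on what it accepts.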

| Flag | Description |
|---|---|
| `-n, --name <name>` | Name for the A/B test |
| `-g, --gateway <name>` | Gateway name (must already be deployed) |
| `-m, --mode <mode>` | `config-bundle` (default) or `target-based` |
| `-r, --runtime <name>` | Runtime name (recorded as the agent) |
| `--control-weight <n>` | Control traffic weight 0–100 (default 50) |
| `--treatment-weight <n>` | Treatment traffic weight 0–100 (default 50) |
| `--max-duration-days <days>` | Auto-stop the test after this many days |
| `--role-arn <arn>` | Execution role ARN (auto-created if omitted) |
| `--disable-on-create` | Create the test without starting it (default: enabled) |
| `--gateway-filter <path>` | Restrict the test to a single gateway target path (e.g. `/orders/*`); applies to both modes |
| `--region <region>` | AWS region (auto-detected if omitted) |
| `--wait` | Block until the test reaches a terminal state |
| `--json` | JSON output |
| config-bundle mode | |
| `--control-bundle <name>` | Control bundle name or ARN |
| `--control-version <version>` | Control bundle version (or LATEST) |
| `--treatment-bundle <name>` | Treatment bundle name or ARN |
| `--treatment-version <version>` | Treatment bundle version (or LATEST) |
| `--online-eval <name>` | Shared online eval config name or ARN |
| `--traffic-header <name>` | Route traffic on this header instead of by weight |
| target-based mode | |
| `--control-target <name>` | Control gateway-target name |
| `--treatment-target <name>` | Treatment gateway-target name |
| `--control-online-eval <name>` | Online eval for the control endpoint (required) |
| `--treatment-online-eval <name>` | Online eval for the treatment endpoint (required) |

Names must start with a letter and contain only letters, digits, underscores, and hyphens (max 48 characters).
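
The naming rule above can be checked locally before a test is created. The following is a minimal sketch; `is_valid_ab_test_name` is a hypothetical helper, not a command provided by the CLI, and it treats "letters" as ASCII letters, which is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the agentcore CLI): returns 0 when the
# name follows the documented rule, 1 otherwise.
is_valid_ab_test_name() {
  local name="$1"
  local LC_ALL=C   # ASSUMPTION: "letters" means ASCII letters only

  # At most 48 characters in total.
  [ "${#name}" -le 48 ] || return 1

  # Must start with a letter (this also rejects the empty string).
  case "$name" in
    [A-Za-z]*) ;;
    *) return 1 ;;
  esac

  # Only letters, digits, underscores, and hyphens anywhere in the name.
  case "$name" in
    *[!A-Za-z0-9_-]*) return 1 ;;
  esac

  return 0
}
```

For example, `is_valid_ab_test_name PromptTest` succeeds, while `is_valid_ab_test_name 1Test` and `is_valid_ab_test_name "bad name"` fail.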
All lifecycle commands take the test's job ID via -i, --id (get it from run ab-test --json or view ab-test):
# List all A/B test jobs, or view one in detail
agentcore view ab-test
agentcore view ab-test <id> --json
# Pause / resume traffic splitting
agentcore pause ab-test -i <id>
agentcore resume ab-test -i <id>
# Stop the test (terminal)
agentcore stop ab-test -i <id>
# Apply the winning variant to agentcore.json, then deploy to roll it out
agentcore promote ab-test -i <id>
agentcore deploy
# Remove the job from local history (and the test from the service)
agentcore archive ab-test -i <id>

promote writes the winning (treatment) variant into agentcore.json:
- config-bundle mode — control and treatment must be two versions of the same bundle; promote adopts the treatment version's components into that bundle. Promoting across two different bundles is rejected.
- target-based mode — control adopts the treatment endpoint: when both are named endpoints of the same runtime, control's endpoint version is bumped to the treatment's; otherwise the control target is repointed to the treatment's runtime/endpoint.
Promote does not deploy — review the change and run agentcore deploy to roll it out.
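
Because promote only edits the project config, it can help to look at the resulting change before deploying. The sketch below assumes the project is tracked in git, which this document does not state; `promote_then_review` is a hypothetical helper, not part of the CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the agentcore CLI): promote a test, then
# show what changed so it can be reviewed before deploying.
promote_then_review() {
  local job_id="$1"

  # Write the treatment variant into the project config.
  agentcore promote ab-test -i "$job_id" || return 1

  # ASSUMPTION: the project is a git working tree, so the edit made by
  # promote shows up as an uncommitted diff.
  git --no-pager diff

  echo "Review the diff above, then run: agentcore deploy"
}
```

The helper stops short of running `agentcore deploy` on purpose, so the rollout stays a separate, deliberate step.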
view ab-test <id> shows an Invocation URL derived from the test's gateway. Send traffic there and the gateway
splits it between the variants per the configured weights:
https://<gatewayId>.gateway.bedrock-agentcore.<region>.amazonaws.com/<target-or-agent>/invocations
(target-based uses the control target's path; config-bundle uses the agent name.)
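
To illustrate the URL pattern above, the following sketch assembles an invocation URL from its three parts. `invocation_url` is a hypothetical helper and the argument values in the usage note are placeholders; in practice the URL shown by view ab-test is the one to use.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the agentcore CLI): build the invocation
# URL from a gateway ID, a region, and the target or agent path segment.
invocation_url() {
  local gateway_id="$1" region="$2" path="$3"
  printf 'https://%s.gateway.bedrock-agentcore.%s.amazonaws.com/%s/invocations\n' \
    "$gateway_id" "$region" "$path"
}
```

With placeholder values, `invocation_url gw-abc123 us-west-2 MyAgent` prints `https://gw-abc123.gateway.bedrock-agentcore.us-west-2.amazonaws.com/MyAgent/invocations`. Authentication and request payload format are not covered here and depend on how the gateway is configured.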
Once the online evals have scored enough traffic, view ab-test <id> shows per-evaluator metrics: the control mean,
each treatment's mean with percent change, and a significance marker. --json includes the same metrics under
results.evaluatorMetrics, plus status, executionStatus, variants, and invocationUrl.
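
For scripting, the JSON output can be condensed with `jq`. This is a rough sketch under two assumptions: `jq` is installed, and status, executionStatus, and invocationUrl are top-level keys of the JSON document (the exact nesting is not specified here). `summarize_ab_test` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the agentcore CLI): read the JSON from
# `agentcore view ab-test <id> --json` on stdin and print a short summary.
# ASSUMPTION: the three keys below sit at the top level of the document.
# A key that is absent is printed as "null".
summarize_ab_test() {
  jq -r '"status: \(.status)",
         "execution: \(.executionStatus)",
         "url: \(.invocationUrl)"'
}
```

A typical call would be `agentcore view ab-test <id> --json | summarize_ab_test`. The per-evaluator figures under results.evaluatorMetrics are left out because their exact shape is not described in this document.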
Job records are saved under .cli/jobs/ab-tests/. Browse them in the TUI:
agentcore
# Navigate to: Run → A/B Tests (or View → A/B Tests)

Run agentcore → Run → A/B Test for a guided flow:
- Select mode (config-bundle or target-based)
- Select the gateway
- Pick control + treatment variants (bundle versions, or gateway targets)
- Select online eval config(s)
- Optionally set a gateway filter
- Name the test and confirm
Selecting a test from the A/B Tests list shows its detail (status, variants, invocation URL, results) with keybindings to pause/resume/stop/promote/debug.