Auditing Pipelines with `ado-aw audit`

Overview

ado-aw audit audits one Azure DevOps build at a time. It downloads the selected build artifacts, runs the built-in analyzers (firewall, MCP gateway, OTel, safe outputs, detection verdict, build timeline, and missing-tool / missing-data / noop extraction), and renders a structured console report or the raw AuditData JSON. The MVP is single-run only; diff mode and cross-run trend reporting are follow-ups.

Usage

ado-aw audit <build-id-or-url> [options]

Accepted input formats

Input	Example
Numeric build ID	`12345`
dev.azure.com URL	`https://dev.azure.com/my-org/My%20Project/_build/results?buildId=12345`
dev.azure.com URL with job/step anchors	`...?buildId=12345&j=<guid>&t=<guid>` (accepted; the MVP audits the parent build)
Legacy visualstudio.com URL	`https://my-org.visualstudio.com/proj/_build/results?buildId=12345`
On-prem Azure DevOps Server URL	`https://onprem.example.com/DefaultCollection/MyProject/_build/results?buildId=12345`

URL-encoded project segments are decoded before the ADO context is resolved. t= and s= are both accepted as step anchors.

Flags

Flag	Default	Behavior
`-o, --output <dir>`	`./logs`	Directory under which `<dir>/build-<id>/` is written.
`--json`	off	Emit the full `AuditData` as JSON to stdout (suppresses the trailing `Audit complete` stderr line).
`--org <url>`	auto	Azure DevOps organization override for bare build IDs. Full build URLs provide the host / org directly.
`--project <name>`	auto	Azure DevOps project override for bare build IDs. Full build URLs provide the project directly.
`--pat <token>`	env	Personal Access Token. Also reads `AZURE_DEVOPS_EXT_PAT`. Falls back to the existing Azure CLI auth chain when omitted.
`--artifacts <set,...>`	all	Restrict download + analysis to a subset of artifact sets. Valid values: `agent`, `detection`, `safe-outputs` (`safe_outputs` alias also accepted).
`--no-cache`	off	Force re-processing even if `<dir>/build-<id>/run-summary.json` already exists.

Behavior

The command resolves <build-id-or-url> first. Bare IDs use --org / --project or git-remote auto-detection; full build URLs contribute host, org, and project, and those URL-derived values win.
Only the three audit artifact families are in scope: agent_outputs*, analyzed_outputs*, and safe_outputs*. Other published build artifacts are ignored.
Artifact refresh is cache-preserving. If a matching local artifact directory already exists, it is renamed aside before re-download and restored if the download fails.
Analyzer failures are soft. The command records a warning, keeps any successfully-derived sections, and still renders the report.
When multiple local directories share one recognized prefix, the lexicographically last match is used.

Output layout

<output>/build-<id>/
├── run-summary.json                  # Cached AuditData, CLI-version-keyed
├── agent_outputs[_<BuildId>]/        # Downloaded artifact (Agent stage)
│   ├── staging/
│   │   ├── safe_outputs.ndjson       # Agent's safe-output proposals
│   │   ├── aw_info.json              # Runtime engine / agent / source metadata
│   │   └── otel.jsonl                # Copilot OTel (when emitted)
│   └── logs/
│       ├── firewall/                 # AWF Squid proxy logs
│       ├── mcpg/                     # MCP Gateway logs
│       ├── safeoutputs.log           # SafeOutputs HTTP server log
│       └── agent-output.txt          # Filtered agent stdout
├── analyzed_outputs[_<BuildId>]/     # Downloaded artifact (Detection stage)
│   ├── threat-analysis.json          # Aggregate verdict + reasons
│   └── threat-analysis-output.txt
└── safe_outputs[_<BuildId>]/         # Downloaded artifact (SafeOutputs stage)
    └── safe-outputs-executed.ndjson  # Per-item execution log

aw_info.json, otel.jsonl, and safe_outputs.ndjson are searched in staging/ first and then at the artifact top level so older layouts still audit cleanly.

Report shape (`AuditData`)

Current top-level keys include the following. Optional sections are omitted from --json when empty.

Key	Source
`overview`	ADO build metadata + `aw_info.json` (engine, model, agent name, source, target).
`task_domain`	Audit heuristics over the run's prompts and outputs.
`behavior_fingerprint`	Higher-level audit heuristics over the run's behavior.
`agentic_assessments`	Higher-level audit assessments emitted by the analyzers.
`metrics`	OTel JSONL (`otel.jsonl`) plus audit-time warning / error counts.
`key_findings`	Heuristic rules + analyzer-emitted findings (for example aggregate-gate rejection).
`recommendations`	Follow-up actions derived from findings.
`performance_metrics`	Derived from `metrics`, runtime duration, tool usage, and firewall counts.
`engine_config`	Runtime engine configuration derived from `aw_info.json`.
`safe_output_summary`	Counts of proposed / executed / rejected / not processed items.
`safe_output_execution`	Per-item trace joining proposal + detection + execution.
`rejected_safe_outputs`	Rollup of rejections by reason / threat flag.
`detection_analysis`	`threat-analysis.json`.
`mcp_server_health`	MCPG logs aggregated per server.
`mcp_tool_usage`	MCPG logs aggregated per `(server, tool)`.
`mcp_failures`	MCPG `tool_error` / `server_error` events.
`jobs`	ADO `/timeline` records filtered to `type: Job`.
`firewall_analysis`	AWF Squid proxy logs aggregated by domain.
`policy_analysis`	AWF policy artifacts aggregated into allow / deny summaries.
`missing_tools` / `missing_data` / `noops`	NDJSON entries from the corresponding SafeOutputs MCP tools.
`downloaded_files`	One entry per file under `<output>/build-<id>/`.
`errors` / `warnings`	Run-level error / warning aggregates.
`tool_usage`	High-level runtime tool-usage rollups derived from telemetry.
`created_items`	Successful `executed` items with extracted id / url / title.

Rejected safe-output trace

When threat-analysis.json reports any threat flag, the audit treats the SafeOutputs batch as rejected by the aggregate gate and records each proposal with:

status: not_processed_due_to_aggregate_gate
applies_to_whole_batch: true
rejection_reason: the aggregate reasons[] from threat-analysis.json, joined with ;

Additionally, exactly one severity-high finding is emitted summarizing the gate decision: which threat flags fired, how many proposals were dropped, and the full aggregate reasons.

Per-item detection verdicts are not currently available. threat-analysis.md emits an aggregate verdict only; per-item verdicts are a follow-up that should stay aligned with gh-aw.

Cache behavior

<output>/build-<id>/run-summary.json is written after a successful run. On subsequent invocations against the same build:

If the cached ado_aw_version matches the current CLI version, the report is rendered from cache and download / analysis is skipped. The cache-hit info line is printed only in console mode.
If the cached file is missing, cannot be parsed, or was written by a different ado-aw version, it is ignored and the build is processed again.
--no-cache always re-processes.

Permission failures

The initial build-metadata fetch is live ADO-only. A 401 / 403 at that step is fatal.
If artifact listing or artifact download returns 401 / 403 and the run directory already contains at least one recognized artifact family, the audit continues from local cache and records a warning.
If artifact listing or download returns 401 / 403 and no local artifact cache exists, the command emits a structured error pointing at az pipelines runs artifact download --run-id <id> --path <dir> as the manual escape hatch.

Out-of-scope (planned follow-ups)

Diff mode (ado-aw audit <a> <b>) — domain / MCP / metrics diffs.
Cross-run trends (ado-aw audit --last N) — trend report over recent builds.
--parse — Rust-native log.md / firewall.md renderers.
Job / step audit — pin to a specific timeline record.
MCP-exposed audit — agentic-pipelines MCP tool for in-pipeline self-audit.
Per-item detection verdict — coordinated upstream with gh-aw.
Additional pipeline inventory artifacts — graceful-degradation gaps such as richer AWF policy / firewall inventories.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Auditing Pipelines with `ado-aw audit`

Overview

Usage

Accepted input formats

Flags

Behavior

Output layout

Report shape (`AuditData`)

Rejected safe-output trace

Cache behavior

Permission failures

Out-of-scope (planned follow-ups)

Related Documentation

FilesExpand file tree

audit.md

Latest commit

History

audit.md

File metadata and controls

Auditing Pipelines with ado-aw audit

Overview

Usage

Accepted input formats

Flags

Behavior

Output layout

Report shape (AuditData)

Rejected safe-output trace

Cache behavior

Permission failures

Out-of-scope (planned follow-ups)

Related Documentation

Auditing Pipelines with `ado-aw audit`

Report shape (`AuditData`)