Skip to content

run: agent mode for unattended/containerized deploys#159

Merged
dangtony98 merged 10 commits intomainfrom
run-agent-mode
May 9, 2026
Merged

run: agent mode for unattended/containerized deploys#159
dangtony98 merged 10 commits intomainfrom
run-agent-mode

Conversation

@dangtony98
Copy link
Copy Markdown
Contributor

Summary

  • agent-vault run now accepts a pre-supplied token via env (AGENT_VAULT_TOKEN + AGENT_VAULT_ADDR + AGENT_VAULT_VAULT) and skips the admin-session/mint flow — the path needed for unattended container deploys (k8s, Fly, ECS) where there's no TTY and no on-disk session. The token is validated against /discover once at startup so bad/expired tokens fail fast with a clear error rather than producing 401s on every proxied call. --ttl is rejected in this mode (token lifetime is fixed at mint time).
  • Renames AGENT_VAULT_SESSION_TOKENAGENT_VAULT_TOKEN to match industry convention (GITHUB_TOKEN, STRIPE_API_KEY). The old name remains a backward-compat alias with a one-time stderr deprecation warning; vault run sets both names on the child env during the deprecation window so SDK consumers keep working until they update.
  • Both surfaces of the agent-facing contract updated as a pair (cmd/skill_cli.md, cmd/skill_http.md); env-var docs updated in all three required places (.env.example, docs/self-hosting/environment-variables.mdx, docs/reference/cli.mdx); TypeScript SDK adopts the same fallback-with-deprecation-warning pattern; new docs/guides/deploy-agent-container.mdx walks through the Dockerfile + 3-env-var deploy recipe.

Test plan

  • go test ./...cmd/ green, SDK 94/94 green. (internal/isolation/TestParseAndValidateMount_RejectDockerSocket fails the same way on main — pre-existing, unrelated.)
  • go vet ./... clean.
  • go build ./... clean.
  • New unit coverage in cmd/cmd_test.go and cmd/run_test.go: agent-mode env-var resolution (new + legacy + precedence + missing-addr error), validateEnvToken happy path + 401, resolveVaultForAgentMode (flag/env/error), --ttl rejection in agent mode.
  • Local end-to-end smoke: admin mode (auth loginagent-vault run -- env | grep AGENT_VAULT shows both env-var names).
  • Local end-to-end smoke: agent mode with no on-disk session — AGENT_VAULT_TOKEN=… AGENT_VAULT_ADDR=… AGENT_VAULT_VAULT=… ./agent-vault run -- curl https://api.github.com/zen validates, sets up MITM, exec's curl.
  • Local end-to-end smoke: deliberately bad token → friendly startup error before exec; --ttl 3600 with token pre-set → hard error.
  • Container smoke test: build the Dockerfile from the new guide; docker run with the three env vars against a local broker.

Notes

  • AGENT_VAULT_SESSION_TOKEN removal is deferred to a future major version. Worth tracking in the changelog so it's not forgotten.
  • Out of scope (deferred per the design discussion): --ca-path for distroless images, AGENT_VAULT_URL connection-string env, multi-vault default policy when AGENT_VAULT_VAULT is unset, published ghcr.io/infisical/agent-vault binary distribution.

`agent-vault run` previously required either an on-disk admin session
(from `auth login`) or an interactive TTY to mint a vault-scoped token —
neither fits a container whose ENTRYPOINT is `agent-vault run`.

This adds an "agent mode": when AGENT_VAULT_TOKEN, AGENT_VAULT_ADDR, and
AGENT_VAULT_VAULT are pre-set on the environment, `vault run` skips the
mint step and uses the env-supplied token (vault-scoped session OR
long-lived agent token) as the credential. The token is validated
against /discover once at startup so bad/expired tokens fail fast
instead of producing 401s on every proxied call. --ttl is rejected in
this mode since the token's lifetime is fixed at mint time.

Also renames AGENT_VAULT_SESSION_TOKEN → AGENT_VAULT_TOKEN to match
industry convention (GITHUB_TOKEN, STRIPE_API_KEY) — the new name fits
both token types without the "session" baggage. The old name remains as
a backward-compat alias with a one-time stderr deprecation warning, and
`vault run` sets both names on the child env during the deprecation
window so SDK consumers keep working until they update.

Updates: cmd skill files (cli + http) as a pair, env-var docs (.env.example,
environment-variables.mdx, cli.mdx), agent protocol docs, web UI
inline help, the TypeScript SDK (with the same fallback + warning
pattern), and adds a new docs/guides/deploy-agent-container.mdx
walkthrough.
@infisical-review-police
Copy link
Copy Markdown

💬 Discussion in Slack: #pr-review-agent-vault-159-run-agent-mode-for-unattended-containerized-deploys

Posted by Review Police — reviews, comments, new commits, and CI failures will stream into this channel.

@mintlify
Copy link
Copy Markdown

mintlify Bot commented May 9, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
agent-vault 🟢 Ready View Preview May 9, 2026, 1:46 AM

💡 Tip: Enable Workflows to automatically generate PRs for you.

Comment thread cmd/discover.go
Comment thread cmd/run.go Outdated
Comment thread docs/guides/deploy-agent-container.mdx Outdated
- Detect vault mismatch in validateEnvToken: parse /discover's response
  and reject when AGENT_VAULT_VAULT names a different vault than the
  supplied scoped-session token's baked-in one. The broker silently uses
  the session's vault and ignores X-Vault for scoped sessions, which
  would otherwise produce a child env where AGENT_VAULT_VAULT lies about
  the actual vault in use.
- Cross-link the new "Deploy your agent in a container" guide from the
  two adjacent guides: a Next-steps card on connect-custom-agent.mdx and
  a disambiguation note on container-isolation.mdx (which is about
  sandboxing dev-machine agents, a fundamentally different concern from
  shipping the agent as a container image).
- Add tests for vault-mismatch (rejected) and matching-vaults (passes)
  in TestValidateEnvToken.
@dangtony98
Copy link
Copy Markdown
Contributor Author

Addressing review feedback in 13dcff1:

1. Vault mismatch detection (cmd/helpers.go) — validateEnvToken now parses /discover's response and rejects when AGENT_VAULT_VAULT names a different vault than the supplied scoped-session token's baked-in vault. The broker silently uses the session's vault and ignores X-Vault for scoped sessions, so without this check the child would see a misleading AGENT_VAULT_VAULT in its env. Two new sub-tests in TestValidateEnvToken cover the mismatch (rejected) and matching (passes) cases.

2. Cross-links — added the two pointers the plan called for:

  • A "Deploy your agent in a container" card in the Next-steps section of connect-custom-agent.mdx.
  • A disambiguation <Note> at the top of container-isolation.mdx clarifying that it's about sandboxing the dev-machine agent in a container — fundamentally different from shipping the agent as a container image.

3. CHANGELOG — skipped: this repo doesn't have a CHANGELOG file; the deprecation notice lives in the commit message + skill-file copy + env-var docs, which is the project convention.

Test status unchanged: go test ./cmd/ green (incl. 4 sub-tests in TestValidateEnvToken now), go vet clean, go build clean.

Three small cleanups on the vault-mismatch-detection commit:

- Hoist the 1 MiB JSON read limit to a named const (discoverRespMaxBytes)
  so the rationale lives in one place; mirrors maxRequestBodySize on the
  server side.
- Restore the io.Discard drain after json.Decode. Decode stops at the
  first JSON value's closing brace, leaving trailing bytes that prevent
  net/http from pooling the connection — the original drain was lost
  when the success path was rewritten.
- Tighten the mismatch error: drop the "(it has no effect on a
  vault-scoped session token)" parenthetical since the doc comment
  already covers the WHY. Fold the redundant "matching vaults pass"
  test into the existing happy-path subtest by returning a vault name
  in the response body.
@dangtony98
Copy link
Copy Markdown
Contributor Author

@claude review

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vault-mismatch fix in 13dcff1 looks good, but the three critical findings from my prior review remain unaddressed in this revision (X-Vault not threaded through vault discover and vault proposal create, child env in agent mode not deduped against parent AGENT_VAULT_* keys, and the deploy guide still references nonexistent agent create / agent token subcommands) — those inline threads were marked resolved but the code paths weren't changed.

Extended reasoning...

Overview

This PR adds an agent mode to agent-vault run for unattended container deploys: a pre-supplied AGENT_VAULT_TOKEN + AGENT_VAULT_ADDR + AGENT_VAULT_VAULT skips the admin-session/mint flow, and the token is validated against /discover once at startup. It also renames AGENT_VAULT_SESSION_TOKENAGENT_VAULT_TOKEN (with a one-time deprecation alias), updates the TS SDK, both skill files, all three env-var doc surfaces, and ships a new docs/guides/deploy-agent-container.mdx. 28 files, mixing Go CLI logic, MDX docs, embedded skill files, and SDK changes.

Security risks

Direct: the new agent-mode path widens what AGENT_VAULT_TOKEN accepts (vault-scoped session or long-lived agent token), and the env-supplied token is now what vault run execs the child with. The validateEnvToken startup probe and the new vault-mismatch check are sensible guardrails. The unaddressed env-de-dup bug (#2 above) is the highest-impact issue — POSIX getenv first-match means a stale parent AGENT_VAULT_VAULT silently overrides --vault, routing prod traffic at the wrong vault with no diagnostic. Not an exfil path, but a quiet wrong-credential-routing footgun.

Level of scrutiny

High. Auth-token handling, exec wrapper, security-sensitive defaults, and the canonical operator-facing recipe for the headline feature. The same broken Dockerfile (unpublished GHCR image, wrong source path) is duplicated into the two skill files that load into Claude Code / Cursor skill registries, propagating downstream.

Other factors

The author explicitly requested re-review with @claude review after 13dcff1, so a clear status update on what's still outstanding is warranted. The fact that the prior inline threads show resolved=true while the code paths are unchanged suggests the author may believe those were fixed — flagging that mismatch is the new, useful information a human review needs to act on. Test coverage for the new agent-mode helpers and validateEnvToken is solid; the gaps are in the call-site integrations and the doc/skill recipes.

Comment thread docs/guides/deploy-agent-container.mdx
Comment thread cmd/helpers.go
dangtony98 added 2 commits May 8, 2026 22:53
- Send X-Vault on `vault discover` and `vault proposal create` so
  instance-level agent tokens (which carry no baked-in vault) work end
  to end — without it, the very first commands the skill files instruct
  return HTTP 400 from the broker.
- Strip pre-existing AGENT_VAULT_* keys from the inherited env before
  appending the freshly resolved values. POSIX getenv returns the first
  match, so a stale parent AGENT_VAULT_VAULT would otherwise silently
  override --vault. Mirrors the existing mitmInjectedKeys fix.
- Reword the vault-mismatch error so it's accurate when the requested
  vault came from --vault (not just AGENT_VAULT_VAULT) and the
  remediation leads to a working invocation.
- Fix the deploy-container Dockerfile in the guide and skill files: the
  source was the unpublished `ghcr.io/infisical/agent-vault` image at
  the wrong path; switch to the actually-published Docker Hub image
  (`infisical/agent-vault:latest`) at `/usr/local/bin/agent-vault`.
- Drop fabricated `agent create` / `agent token <id>` references from
  the deploy guide and skills — those subcommands don't exist. The
  deploy guide now treats the long-lived agent token as a prerequisite
  and links to docs/agents/overview for how operators mint it.
"claude-code laptop" was distracting; "claude-code" is the more typical
example agents will identify themselves with.
Comment thread cmd/helpers.go
Comment thread docs/guides/deploy-agent-container.mdx Outdated
Comment thread docs/guides/deploy-agent-container.mdx Outdated
Comment thread cmd/skill_cli.md Outdated
Fixes from the four unresolved PR threads:
- cmd/helpers.go: missing-addr error names the actual env var the user set
  (AGENT_VAULT_TOKEN vs the legacy AGENT_VAULT_SESSION_TOKEN alias).
- cmd/run.go, docs/guides/deploy-agent-container.mdx, .env.example: drop
  the misleading "for agent tokens" qualifier on AGENT_VAULT_VAULT;
  resolveVaultForAgentMode requires it for both token types in agent mode.
- docs/guides/deploy-agent-container.mdx: point token revocation at the
  Agents tab + agent-vault agent rotate/delete <name>; the per-vault
  Tokens tab filters out agent-owned sessions.
- cmd/skill_cli.md, cmd/skill_http.md: docs.infisical.com -> docs.agent-vault.dev.

Additional accuracy fixes from a full audit of the PR's docs vs code:
- AGENTS.md: X-Vault is control-plane only (/discover, /v1/proposals);
  proxy traffic communicates vault via Proxy-Authorization userinfo.
- docs/agents/protocol.mdx: same — drop "credentials" from the X-Vault
  list; credential handlers read ?vault= query / body, not X-Vault.
- cmd/skill_http.md: server only reads X-Vault, never AGENT_VAULT_VAULT;
  rewrite the note to make the client's responsibility explicit.
- cmd/skill_http.md: DELETE /v1/sessions/{public_id} -> {id} to match
  the route template and the response JSON field name.
- docs/guides/deploy-agent-container.mdx: rotation invalidates old
  tokens at redeem time, not at "agent rotate" time.
Comment thread docs/agents/overview.mdx
Comment thread cmd/run.go Outdated
- docs/agents/overview.mdx: trim X-Vault scope to /discover and /v1/proposals
  (matches the wording in protocol.mdx and AGENTS.md after the previous
  commit; /v1/credentials reads vault from ?vault= query / JSON body, not
  X-Vault, so the broker has never enforced X-Vault on that endpoint).

- cmd/{run.go,helpers.go}: thread tokenSource through resolveSession into
  the --ttl-rejection and broker-rejection errors so they name the env var
  the user actually set (AGENT_VAULT_TOKEN or the deprecated alias) — same
  symmetry fix as the missing-addr error reworded earlier. Replaces the
  fromEnv bool return on resolveSession with the tokenSource string ("" in
  human mode); call sites branch on `tokenSource != ""`.
Comment thread cmd/discover.go Outdated
The X-Vault thread-through landed in 13dcff1, but the vault these two
commands send to the broker comes from the human-mode resolveVault, which
falls through to store.DefaultVault when nothing is configured. An
operator following the canonical container recipe with AGENT_VAULT_TOKEN
+ AGENT_VAULT_ADDR set but no AGENT_VAULT_VAULT silently sends
"X-Vault: default" — routing at the wrong vault if one named "default"
exists, or producing a confusing 404 naming a vault the user never set
otherwise. `vault run` already errors fast in this case via
resolveVaultForAgentMode; extend the same contract to its peer commands.

- cmd/run.go: add resolveVaultForCommand, an agent-mode-aware wrapper
  that picks resolveVaultForAgentMode when tokenSource != "" and falls
  back to resolveVault otherwise.
- cmd/discover.go, cmd/proposal_create.go: capture the tokenSource that
  was already returned by resolveSession (and previously discarded), and
  resolve the vault through the new helper.
- cmd/discover.go Long help + docs/reference/cli.mdx: name the agent-mode
  AGENT_VAULT_VAULT requirement, parallel to cmd/run.go:42-52.
Comment thread docs/guides/connect-custom-agent.mdx
Comment thread cmd/discover.go
Comment thread cmd/skill_cli.md
Comment thread cmd/proposal_create.go
- docs/guides/connect-custom-agent.mdx + docs/agents/protocol.mdx: narrow
  the X-Vault scope wording from "every vault-scoped request" to
  "/discover and /v1/proposals", matching the wording the previous round
  applied to overview.mdx, AGENTS.md, and protocol.mdx:37. Only those two
  endpoints' handlers route through resolveVaultForSession; other
  vault-scoped endpoints (e.g. /v1/credentials) take the vault from a
  ?vault= query param or JSON body.

- cmd/discover.go Long help + docs/reference/cli.mdx (discover accordion):
  broaden "vault-scoped session" to "vault-scoped session token or
  long-lived agent token" to match the wider contract every other surface
  in this PR already documents (cmd/run.go:47, AGENTS.md:17, etc.) — the
  X-Vault thread-through landed on /discover so agent tokens work here too.

- cmd/credential.go (list/get/set/delete): extend to agent mode in lockstep
  with discover and proposal_create. Previously these called ensureSession
  unconditionally, so an operator following the new container-deploy
  recipe and running e.g. `agent-vault vault credential get DB_PASSWORD`
  hit "not logged in, run 'agent-vault auth login' first" despite a valid
  AGENT_VAULT_TOKEN sitting in env. The HTTP layer already accepts agent
  tokens with member+ role on credential reads/writes; the CLI was the
  only gate. Fix is symmetric: resolveSession + resolveVaultForCommand.

- cmd/proposal_create.go Long help + docs/reference/cli.mdx (proposal-
  create accordion): mirror the agent-mode requirement note discover
  already carries, so `--help` and the reference docs surface the
  AGENT_VAULT_VAULT requirement upfront instead of only at runtime.
Comment thread cmd/skill_cli.md
Comment thread cmd/credential.go
Comment thread cmd/skill_cli.md
- Document agent-mode vault requirement on the four credential subcommands
  (list/get/set/delete) — both Long help in cmd/credential.go and the matching
  cli.mdx accordions, mirroring the wording already on discover and
  proposal create.
- Broaden the AGENT_VAULT_VAULT row in skill_cli.md and skill_http.md to the
  same both-modes phrasing the AGENT_VAULT_TOKEN row already uses, so the
  table no longer contradicts the Containerized agent deployment block below.
- Fix the placeholder token shape in the deploy block of both skill files
  (av_agent_token_xxx → av_agt_xxx) to match newAgentToken()'s actual prefix.
Comment thread cmd/skill_http.md
Comment on lines 182 to 184
POST {AGENT_VAULT_ADDR}/v1/proposals
Authorization: Bearer {AGENT_VAULT_SESSION_TOKEN}
Authorization: Bearer {AGENT_VAULT_TOKEN}
Content-Type: application/json
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 POST /v1/proposals examples in cmd/skill_http.md (lines 182, 210, 263), AGENTS.md (line 64), and docs/agents/protocol.mdx (line 103) only show Authorization + Content-Type headers, but the prose immediately above each example (skill_http.md:98, AGENTS.md:24, protocol.mdx:30/37) — narrowed by this PR — explicitly states that instance-level agent tokens must include X-Vault on /v1/proposals. The /discover examples in the same files DO include X-Vault, establishing a convention the proposal blocks violate. Fix: add X-Vault: {vault_name} to each proposal example header block, mirroring the discover examples above them.

Extended reasoning...

What is wrong

This PR explicitly narrows the X-Vault scope wording in three closely-related agent-facing docs to specifically name /v1/proposals as a control-plane endpoint that requires the header for instance-level agent tokens:

  • cmd/skill_http.md:98 — "Instance-level agent tokens (persistent agents) must include the X-Vault: {vault_name} header on every control-plane call (/discover, /v1/proposals)"
  • AGENTS.md:24 — "you must include X-Vault: {vault_name} on all control-plane requests (/discover, /v1/proposals)"
  • docs/agents/protocol.mdx:30/37 — "must include an X-Vault header on /discover and /v1/proposals requests"

But every POST /v1/proposals example block in these same three files only shows Authorization + Content-Type headers — X-Vault is missing:

  • cmd/skill_http.md:182-184, 210-212, 263-265 (three proposal examples)
  • AGENTS.md:64-66
  • docs/agents/protocol.mdx:103-106

By contrast, the /discover examples in the same files DO include X-Vault (skill_http.md:92-95, AGENTS.md:30-33, protocol.mdx:53-56), establishing a convention the proposal blocks violate.

Why this matters

The PR is precisely what introduces the asymmetry. The diff hits the Authorization line in every one of these proposal examples (the AGENT_VAULT_SESSION_TOKENAGENT_VAULT_TOKEN rename), and the prose narrowing was applied at the same time — so the natural lockstep edit was already in flight, and adding X-Vault to each example would have been a one-line per-example change.

Functionally, an instance-level agent token without X-Vault on POST /v1/proposals is rejected by the broker. internal/server/vault_resolve.go:31-35:

vaultName := r.Header.Get("X-Vault")
if vaultName == "" {
    jsonError(w, http.StatusBadRequest, "Agent tokens require X-Vault header to specify which vault to use")
    return nil, "", fmt.Errorf("missing X-Vault header")
}

POST /v1/proposals routes through resolveVaultForSession (per the resolved bug already addressed for cmd/proposal_create.go), so the broker enforcement is unambiguous. Vault-scoped session tokens still work because the resolver short-circuits on sess.VaultID != "", but the agent-token path — newly first-class for these docs after this PR — breaks.

Step-by-step proof

Following the canonical container-deploy recipe this PR documents:

export AGENT_VAULT_TOKEN=av_agt_xxx          # instance-level agent token
export AGENT_VAULT_VAULT=production
  1. Agent reads cmd/skill_http.md (or AGENTS.md, or protocol.mdx) and copies the POST /v1/proposals example verbatim — only Authorization and Content-Type headers on the wire.
  2. Broker's resolveVaultForSession (vault_resolve.go:31-35) inspects the request: sess.AgentID != "", sess.VaultID == "", X-Vault == "".
  3. Broker returns HTTP 400: Agent tokens require X-Vault header to specify which vault to use.
  4. The agent has AGENT_VAULT_VAULT=production sitting in its env, but the example never told it to copy that value into X-Vault.

The skill_http.md note added by this PR (line 88) does say the agent is responsible for copying AGENT_VAULT_VAULT into X-Vault on each request — but the example two paragraphs below demonstrates the opposite shape.

Impact

Severity: nit. Doc-only, prose elsewhere in the same files is correct, attentive agents using SDKs (which set X-Vault programmatically) avoid the trap, and the broker's 400 is self-explanatory. But:

  • These three files are the agent-facing protocol references — skill_http.md and AGENTS.md ship as auto-installed Claude Code / Cursor skill registry entries via maybeInstallSkills (cmd/run.go), so the inconsistency propagates onto every developer machine.
  • A developer writing a custom agent who copies the literal example as a starting point will hit the rejection.
  • This PR is what introduces the asymmetry — pre-PR, the broader "every vault-scoped request" wording was at least defensible against the example; the narrowed wording explicitly contradicts the example.

Suggested fix

One-line addition to each proposal example header block, mirroring the discover examples right above them:

  POST {AGENT_VAULT_ADDR}/v1/proposals
  Authorization: Bearer {AGENT_VAULT_TOKEN}
+ X-Vault: {vault_name}
  Content-Type: application/json

Apply at:

  • cmd/skill_http.md:182-184, 210-212, 263-265
  • AGENTS.md:64-66
  • docs/agents/protocol.mdx:103-106

Mechanical edit, lockstep with the X-Vault prose narrowing already in this PR.

Comment thread cmd/proposal_create.go
Comment on lines +86 to +90
// Pass the resolved vault as X-Vault so instance-level agent tokens
// (which carry no baked-in vault) can create proposals here too — the
// broker rejects agent-token POST /v1/proposals calls without it.
url := fmt.Sprintf("%s/v1/proposals", sess.Address)
respBody, err := doAdminRequestWithBody("POST", url, sess.Token, reqBody)
respBody, err := doVaultScopedRequestWithBody("POST", url, sess.Token, vault, reqBody)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 When a vault-scoped session token is used with a non-matching AGENT_VAULT_VAULT (or --vault), vault discover and vault proposal create silently route at the session's baked-in vault — the broker ignores X-Vault for vault-scoped sessions (internal/server/vault_resolve.go:17-23), and these CLI peers don't check resp.vault against the requested vault. vault run already catches the same mismatch via validateEnvToken (cmd/helpers.go:593-599); discover and proposal create (extended in commit 609613d) skipped the symmetric check. Mirror the response-vault comparison here with the same wording — narrow scenario, but the asymmetry with vault run is what makes the gap visible.

Extended reasoning...

What is wrong

This PR (commit 609613d) threads the X-Vault header through to vault discover and vault proposal create via doVaultScopedRequestWithBody (cmd/discover.go:44, cmd/proposal_create.go:87), but neither caller verifies that the broker actually honored the X-Vault value. The broker's resolveVaultForSession (internal/server/vault_resolve.go:17-23) short-circuits to sess.VaultID for vault-scoped session tokens — X-Vault is only consulted for instance-level agent tokens. So a vault-scoped session token paired with a non-matching AGENT_VAULT_VAULT (typo, copy-paste from another env, etc.) silently lands the request at the session's baked-in vault.

By contrast, validateEnvToken (cmd/helpers.go:583-599) does perform exactly this mismatch check on the agent-mode startup probe and is what makes the vault run codepath safe:

if vault != "" && dr.Vault != "" && dr.Vault != vault {
    return fmt.Errorf("vault mismatch: requested vault %q does not match the supplied token's vault %q — set AGENT_VAULT_VAULT (or --vault) to %q, or use a token for %q",
        vault, dr.Vault, dr.Vault, vault)
}

The peer commands extended in this PR don't have an equivalent guard — they just trust the response and print it.

Step-by-step proof (proposal create — the higher-impact case)

# Mint a vault-scoped session token for vault-A
agent-vault vault token --vault vault-A   # -> av_sess_xxx

export AGENT_VAULT_TOKEN=av_sess_xxx
export AGENT_VAULT_ADDR=https://broker.example
export AGENT_VAULT_VAULT=vault-B          # typo / copy-paste mistake

agent-vault vault proposal create -m "need access"
  1. cmd/proposal_create.go:47 — resolveSession() returns the env-supplied vault-A session token, tokenSource="AGENT_VAULT_TOKEN".
  2. cmd/proposal_create.go:53 — resolveVaultForCommand(cmd, tokenSource) returns "vault-B" (from AGENT_VAULT_VAULT, agent-mode branch).
  3. cmd/proposal_create.go:87 — POST /v1/proposals with X-Vault: vault-B.
  4. Broker side: resolveVaultForSession sees sess.VaultID = "vault-A" (non-empty) → returns the vault-A row, ignoring the X-Vault: vault-B header.
  5. Proposal is created in vault-A. Response carries "vault": "vault-A".
  6. CLI prints Proposal #N created in vault "vault-A" (cmd/proposal_create.go:111).

The user believed they raised a proposal for vault-B; it was actually raised in vault-A. No error, no warning. vault discover has the same shape — the JSON output echoes vault-A while the user expected vault-B.

Why this is worth flagging despite being narrow

The mitigating factors are real: (a) the success message and --json output both echo resp.vault, so an attentive operator can spot the discrepancy; (b) for instance-level agent tokens the broker honors X-Vault and rejects mismatches naturally — only vault-scoped session tokens hit this path; (c) it requires a misconfigured env. So the practical blast radius is narrow.

But the asymmetry with vault run is what makes this visible: this PR added the response-vault comparison to the agent-mode startup probe specifically to close this footgun, and the same commit (609613d) that extended discover and proposal create to vault-scoped requests didn't extend the symmetric check. If the check is the right shape on vault run, it's the right shape on its peer commands.

How to fix

After parsing the response in both callers, mirror validateEnvToken's mismatch check. Concretely in cmd/proposal_create.go (and the symmetric spot in cmd/discover.go):

respBody, err := doVaultScopedRequestWithBody("POST", url, sess.Token, vault, reqBody)
if err != nil {
    return err
}

var resp struct {
    ID    int    `json:"id"`
    Vault string `json:"vault"`
    // ... other fields
}
if err := json.Unmarshal(respBody, &resp); err != nil {
    return fmt.Errorf("parsing response: %w", err)
}
if resp.Vault != "" && resp.Vault != vault {
    return fmt.Errorf("vault mismatch: requested vault %q does not match the supplied token's vault %q — set AGENT_VAULT_VAULT (or --vault) to %q, or use a token for %q",
        vault, resp.Vault, resp.Vault, vault)
}

Same wording validateEnvToken uses, applied to the existing parsed response. Roughly four lines per callsite.

Severity is nit matching the submitter's self-classification and all four verifier confirmations — narrow misconfig scenario, response carries the actual vault so an attentive user can spot it, but the asymmetry with vault run's response-vault check is exactly the kind of inconsistency this PR is otherwise polishing.

Comment thread cmd/run.go
Comment on lines +213 to +216
// gosec G702: this is an exec wrapper — the whole purpose is to run a
// user-specified binary with user-specified args, identical to the
// rationale for the G204 exclusion in .golangci.yml.
return syscall.Exec(binary, args, env) //nolint:gosec
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Pre-existing comment-only nit: the two new //nolint:gosec annotations cite gosec rule numbers that don't exist. cmd/run.go:213 says gosec G702 (the body correctly cites G204, the actual rule for syscall.Exec) and cmd/run.go:533 says gosec G703 (the body correctly cites G304, the rule for taint-based file paths). gosec has no G7xx range beyond G701/G704, so a future maintainer can't look these up. //nolint:gosec is a catch-all so the suppression still works — easy fix is to align the headers with the bodies (G204 and G304).

Extended reasoning...

What the bug is

The two new //nolint:gosec annotations introduced in this PR cite gosec rule numbers that do not exist in gosec's published rule catalog:

  • cmd/run.go:213 — header reads gosec G702: this is an exec wrapper (around syscall.Exec(binary, args, env)). The actual gosec rule that flags subprocess execution with variable arguments is G204 — and the body of that very comment correctly references G204 ("identical to the rationale for the G204 exclusion in .golangci.yml").
  • cmd/run.go:533 — header reads gosec G703: caPath is derived from... (around os.WriteFile(caPath, pem, 0o600)). The actual gosec rule for taint-based file paths is G304 — and the body of that comment correctly references G304 ("same rationale as the existing G304 exclusion in .golangci.yml").

Why the existing code does not catch this

gosec's rule namespace is structured as G1xx (general), G2xx (SQL), G3xx (file), G4xx (crypto), G5xx (blocklists), G6xx (slice). The G7xx range is sparse — only G701 (excessive memory allocation) and the deprecated G704 (used in this repo's .golangci.yml at line 28 for SSRF). G702 and G703 are not gosec rules, never have been. The //nolint:gosec directive itself is a bare catch-all that suppresses every gosec rule, so the annotation still functions correctly — golangci-lint does not validate the human-readable comment text against the gosec rule registry.

A repo-wide grep confirms G702 and G703 appear only in these two new comment headers; everywhere else the codebase uses real rule IDs (G204, G304, G306, G704). The two new bodies themselves cite the correct rule, so the headers are inconsistent with their own bodies.

Impact

Zero functional impact — //nolint:gosec (catch-all form) suppresses gosec regardless of the human-readable comment text. The damage is purely to maintainer comprehension: a future reviewer who tries to look up G702/G703 in gosec's rule catalog (or in the existing .golangci.yml G204 # subprocess with variable / G304 # file inclusion via variable exclusion list) will not find them, and may be confused about what was actually being suppressed.

Step-by-step proof

  1. Open cmd/run.go:213-216. The comment reads gosec G702: this is an exec wrapper. The line below is return syscall.Exec(binary, args, env) //nolint:gosec.
  2. Look up G702 in gosec's rule catalog — no such rule exists. The rule that fires on syscall.Exec(binary, args, env) is G204 ("Audit use of command execution").
  3. Open .golangci.yml lines 26-27: G204 # subprocess with variable — intentional exec wrapper / self-daemonize — confirming G204 is the real rule, and the body of the cmd/run.go:213 comment correctly names it.
  4. Open cmd/run.go:533-536. The comment reads gosec G703: caPath is derived from --mitm-ca flag or $HOME.... The line below is os.WriteFile(caPath, pem, 0o600) //nolint:gosec.
  5. Look up G703 in gosec — no such rule. The rule for taint-based file paths is G304 ("File path provided as taint input").
  6. Open .golangci.yml line 27: G304 # file inclusion via variable — intentional pidfile / session reading — confirming G304 is the real rule, and the body of the cmd/run.go:533 comment correctly names it.

Fix

One-line edit at each header. Replace G702 with G204 at cmd/run.go:213, and replace G703 with G304 at cmd/run.go:533. The bodies already name the correct rules, so the fix is purely aligning the header to the body.

@dangtony98 dangtony98 merged commit b4a832a into main May 9, 2026
10 checks passed
@dangtony98 dangtony98 deleted the run-agent-mode branch May 9, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants