Security audit follow-ups: 10 deferred findings (sec-03, sec-09, sec-10, sec-12, sec-13, sec-14, sec-16, sec-17, sec-18)

## Summary

Follow-up to security audit issue #269 / PR #270. The audit produced 20 findings; 9 verified bugs were fixed in #270. The remaining 10 are tracked here because each requires either a breaking change with a deprecation cycle, a design decision about the threat model, or is low-value hardening that may not justify the maintenance cost.

This issue is the canonical place to discuss what to do with each. It does not block any release.

---

## Group A — breaking change with deprecation cycle

### sec-03: Storage token in `--token` argv visible to `ps`

- **Today:** `kbagent project add --project prod --url URL --token 901-xxx-Secret` and `kbagent project edit --token NEW` accept the token as a CLI argument. Tokens passed this way appear in `ps aux`, `/proc/PID/cmdline`, and CI logs running `set -x`.
- **Pinch point:** `--token` is the easiest interface for CI/CD scripts. Removing it cold breaks every onboarding/rotation script in the wild.
- **Options:**
  1. **Deprecation cycle.** Emit a stderr warning when `--token` is used, document `KBC_TOKEN` env var and interactive prompt as alternatives. Remove the flag in 0.32.0.
  2. **Soft mitigation.** Read the token, then zero it out from `sys.argv` (the git/gpg pattern). Visible in `ps` only for the brief window before the zero-out — practical attackers are still blocked.
  3. **Wontfix.** Document the limitation in the security model section of the docs.
- **Recommendation:** Option 1, with option 2 as a backstop in the same release. Both are non-breaking by themselves; the breaking removal is one minor version away.
- **Affected files:** `src/keboola_agent_cli/commands/project.py:152-154`

### sec-14: Project alias not validated for env-var-safe characters

- **Today:** `kbagent project add --project "test=alias"` registers a project. Master-token resolution at `services/sharing_service.py:51` then constructs `KBC_MASTER_TOKEN_TEST=ALIAS`, which is not a valid env var name, so `os.environ.get()` silently returns `None` and falls back to the global `KBC_MASTER_TOKEN`. The user expects per-project token isolation but doesn't get it.
- **Pinch point:** Existing users may have aliases like `proj.1`, `team/x`, or `acme-test=v2`. Strict validation rejects them; their workspaces would suddenly need migration.
- **Options:**
  1. **Validate-only-new.** Require `^[a-zA-Z0-9_-]+$` for new aliases at `project add` time. Existing aliases load with a deprecation warning at startup.
  2. **Validate everywhere.** Reject load of any config.json with an invalid alias. Easier code but breaks workflows.
  3. **Document only.** Add a "Naming conventions" doc section, no code change.
- **Recommendation:** Option 1.
- **Affected files:** `src/keboola_agent_cli/commands/project.py:add_project()`, `src/keboola_agent_cli/services/sharing_service.py:51`

---

## Group B — design discussion (threat model)

### sec-09: `permissions reset` / `permissions set` PTY bypass

- **Today:** `_require_interactive_confirmation()` at `commands/permissions.py:39-40` checks `sys.stdin.isatty()` to prevent non-interactive bypass. The intent (per the inline comment) is "prevents AI agents from removing the policy programmatically." But an AI agent can wrap kbagent in a PTY (`pty.openpty()`, `script`, `expect`, `ptyprocess`) — `isatty()` then returns True, the agent reads the random code from stderr, writes it to stdin, and the guard is bypassed.
- **Threat model question:** Is this guard supposed to be (a) a friction-only deterrent, or (b) a hard lockout? The current code reads as (b) but only delivers (a).
- **Options:**
  1. **Document as best-effort.** Update the doc comment to say "deters non-interactive bypass; not a hard lockout against an attacker with shell access in the same user context." Add note to `permissions set` output: "Note: an attacker who can run kbagent in a PTY can bypass this confirmation."
  2. **Add second factor.** Require sudo password (Linux/macOS), Keychain auth (macOS), or a hardware token. All complex, OS-specific, and can be wrapped too.
  3. **Detect parent process.** Walk up `/proc/PID/comm` and accept only known interactive shells (bash, zsh, fish, tcsh). Adds platform code; pty/expect can fake parent process.
  4. **Wontfix.** Accept the limitation; in practice an AI agent with shell access can also `rm -rf` the user's home dir, so the threat model is incoherent.
- **Recommendation:** Option 1 plus option 4's reasoning in the doc. The whole `permissions` system is friction designed to prevent **AI agent typo / oversight**, not a malicious attacker — that's already explicit in the design but not in the bypass guard's wording.
- **Affected files:** `src/keboola_agent_cli/commands/permissions.py:39-67`

### sec-13: `--input @file` reads arbitrary path

- **Today:** `kbagent encrypt values --input @/home/user/.ssh/id_rsa` reads the SSH private key, sends it to Keboola Encryption API, stores the ciphertext in the project. Same pattern in `kbagent tool call --input @PATH`. An AI agent with shell-execution permission can use this to exfiltrate any file the user can read.
- **Threat model question:** What's the threat? An AI agent that already has shell access can `cat /home/user/.ssh/id_rsa` directly — the kbagent path doesn't make exfiltration possible, just routes it through Keboola.
- **Options:**
  1. **Restrict by default.** `@file` accepts only paths under CWD. Override with `--allow-absolute-paths` for the legitimate cases (CI/CD with absolute config paths). Breaking for some scripts.
  2. **Permission policy entry.** Add `cli:read-files-outside-cwd` operation. Read-only mode (`init --read-only`) denies it by default. Default-allow for non-locked installations.
  3. **Document only.** Add to the AI agent attack surface doc; let the operator restrict via `init --read-only` `.claude/settings.json`.
- **Recommendation:** Option 2. It puts the decision in the `permissions set` flow which already exists, and `init --read-only` automatically protects locked workspaces.
- **Affected files:** `src/keboola_agent_cli/commands/encrypt.py:133-137`, `src/keboola_agent_cli/commands/tool.py:46-54`

---

## Group C — low-value hardening (probably wontfix)

### sec-10: Silent branch-name collisions in `sanitize_name`

- **Today:** A git branch named `../etc` sanitizes to `etc`. If a Keboola branch is also named `etc`, both map to the same `etc/` directory in the sync workspace. The collision-suffix logic at `services/sync_service.py:2604` adds `-{branch_id}` so the manifest is consistent, but the user never sees the warning.
- **Real-world impact:** Low. Sanitization changes are visible in `branch-status` output and the directory naming.
- **Options:**
  1. Log a stderr warning when `sanitize_name(input) != input`. Easy.
  2. Wontfix.
- **Recommendation:** Option 1. Trivial to implement, helpful for debugging unexpected directory layout.

### sec-12: `version_cache.json` non-atomic write

- **Today:** `_write_cache()` at `auto_update.py:113-125` uses `cache_path.write_text()` with no locking and no atomic rename. Two parallel `kbagent` invocations both fetch PyPI, both write the cache (last-write-wins), and may run two simultaneous `uv tool upgrade` processes for keboola-mcp-server.
- **Real-world impact:** Low. Concurrent `kbagent` starts are uncommon. uv handles its own locking, so the double-upgrade is benign.
- **Options:**
  1. Atomic write (`.tmp` + `os.replace()`) and `fcntl` lock. Easy. Pattern already used by `config.json`.
  2. Wontfix.
- **Recommendation:** Option 1 because the pattern is already there and copy-paste cheap.

### sec-16: Lineage HTML loads Mermaid from CDN without SRI

- **Today:** `services/deep_lineage_service.py:1276` loads Mermaid.js from `https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js` without an `integrity=` attribute. A CDN compromise (or DNS hijack of jsdelivr) delivers attacker JS that runs in the user's browser when they open the lineage HTML.
- **Real-world impact:** Very low. jsdelivr has a strong security track record.
- **Options:**
  1. **Pin version + SRI hash.** Choose a Mermaid version (e.g. 11.x), compute SHA-384, embed `integrity="..." crossorigin="anonymous"`. Re-pin manually on Mermaid upgrades.
  2. **`--self-contained` flag.** Embed Mermaid.js inline in the generated HTML (~500 KB per file). Slower file generation, larger artifacts.
  3. **Wontfix.** CDN compromise is industry-wide problem; user-controlled HTML files are not a high-value attack target.
- **Recommendation:** Option 1 if anyone asks; otherwise wontfix.

### sec-17: `KBAGENT_AUTO_UPDATE` toggle staleness

- **Today:** User sets `KBAGENT_AUTO_UPDATE=false`, kbagent skips both upgrade stages. Cache TTL still ticks (cache is not invalidated). User later re-enables auto-update — cache shows recent timestamp from when kbagent last *thought* it had checked, so a fresh check is delayed up to one TTL window.
- **Real-world impact:** Negligible. User who disables auto-update typically knows what they're doing.
- **Options:**
  1. Detect `false → unset` transition and force a fresh check next startup. Adds state.
  2. Document the behavior.
  3. Wontfix.
- **Recommendation:** Option 2 in the changelog or auto-update doc.

### sec-18: `doctor --fix` resolves `uv` via PATH

- **Today:** `services/mcp_service.py:165-187` and `services/doctor_service.py` use `shutil.which("uv")`. On a compromised PATH, an attacker's `uv` binary is invoked.
- **Real-world impact:** If PATH is compromised, the attacker can also poison `kbagent` itself, or any other command the user runs. The threat is wider than this single call site.
- **Options:**
  1. **Pin standard install paths** (`~/.cargo/bin/uv`, `/usr/local/bin/uv`, `~/.local/bin/uv`) before falling back to PATH.
  2. **Verify uv binary checksum** against a pinned hash. Burdensome.
  3. **Wontfix.** Documentation in the security model.
- **Recommendation:** Option 1 for friendlier behavior on misconfigured PATH; not for security. Or wontfix.

---

## Recommended priority order

| Order | What | Effort | Justification |
|---|---|---|---|
| 1 | sec-10, sec-12 | 1 hour each | Cheap hardening, no design debate |
| 2 | sec-03 deprecation warning + argv-zero | half day | Step 1 of two-step token-flag deprecation |
| 3 | sec-09 doc update | 30 min | Set correct expectation about the guard |
| 4 | sec-13 + permission policy entry | 1 day | Put decision in `permissions set` flow |
| 5 | sec-14 validate-only-new | 2 hours | Defensive UX |
| 6 | sec-16 SRI / sec-17 doc / sec-18 path-prefer | 2 hours each | Belt-and-suspenders |

If we ship them, version bumps land naturally: 1+2 in 0.30.6, 3+4 in 0.31.0 (deprecation removal of `--token` and policy entry are both API changes), 5+6 as we have time.

Or a single "security audit follow-ups" PR that does 1, 3, 5, 6 in one go and leaves 2+4 as separate design issues.

## Audit artifacts

Original findings doc: `/tmp/kbagent-audit/issues.md` (kept locally; stale finding numbers preserved here).

State machine context: `/tmp/kbagent-audit/state-machine.md` documents `find_sync_workspace`, `cleanup_branch_id_from_mapping`, `_resolve_branch_id`, `apply_firewall_flags` — see if implementing any of the above touches those.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security audit follow-ups: 10 deferred findings (sec-03, sec-09, sec-10, sec-12, sec-13, sec-14, sec-16, sec-17, sec-18) #271

Summary

Group A — breaking change with deprecation cycle

sec-03: Storage token in `--token` argv visible to `ps`

sec-14: Project alias not validated for env-var-safe characters

Group B — design discussion (threat model)

sec-09: `permissions reset` / `permissions set` PTY bypass

sec-13: `--input @file` reads arbitrary path

Group C — low-value hardening (probably wontfix)

sec-10: Silent branch-name collisions in `sanitize_name`

sec-12: `version_cache.json` non-atomic write

sec-16: Lineage HTML loads Mermaid from CDN without SRI

sec-17: `KBAGENT_AUTO_UPDATE` toggle staleness

sec-18: `doctor --fix` resolves `uv` via PATH

Recommended priority order

Audit artifacts

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Order	What	Effort	Justification
1	sec-10, sec-12	1 hour each	Cheap hardening, no design debate
2	sec-03 deprecation warning + argv-zero	half day	Step 1 of two-step token-flag deprecation
3	sec-09 doc update	30 min	Set correct expectation about the guard
4	sec-13 + permission policy entry	1 day	Put decision in `permissions set` flow
5	sec-14 validate-only-new	2 hours	Defensive UX
6	sec-16 SRI / sec-17 doc / sec-18 path-prefer	2 hours each	Belt-and-suspenders

Security audit follow-ups: 10 deferred findings (sec-03, sec-09, sec-10, sec-12, sec-13, sec-14, sec-16, sec-17, sec-18) #271

Description

Summary

Group A — breaking change with deprecation cycle

sec-03: Storage token in --token argv visible to ps

sec-14: Project alias not validated for env-var-safe characters

Group B — design discussion (threat model)

sec-09: permissions reset / permissions set PTY bypass

sec-13: --input @file reads arbitrary path

Group C — low-value hardening (probably wontfix)

sec-10: Silent branch-name collisions in sanitize_name

sec-12: version_cache.json non-atomic write

sec-16: Lineage HTML loads Mermaid from CDN without SRI

sec-17: KBAGENT_AUTO_UPDATE toggle staleness

sec-18: doctor --fix resolves uv via PATH

Recommended priority order

Audit artifacts

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

sec-03: Storage token in `--token` argv visible to `ps`

sec-09: `permissions reset` / `permissions set` PTY bypass

sec-13: `--input @file` reads arbitrary path

sec-10: Silent branch-name collisions in `sanitize_name`

sec-12: `version_cache.json` non-atomic write

sec-17: `KBAGENT_AUTO_UPDATE` toggle staleness

sec-18: `doctor --fix` resolves `uv` via PATH