Skip to content

Security audit follow-ups: 10 deferred findings (sec-03, sec-09, sec-10, sec-12, sec-13, sec-14, sec-16, sec-17, sec-18) #271

@padak

Description

@padak

Summary

Follow-up to security audit issue #269 / PR #270. The audit produced 20 findings; 9 verified bugs were fixed in #270. The remaining 10 are tracked here because each requires either a breaking change with a deprecation cycle, a design decision about the threat model, or is low-value hardening that may not justify the maintenance cost.

This issue is the canonical place to discuss what to do with each. It does not block any release.


Group A — breaking change with deprecation cycle

sec-03: Storage token in --token argv visible to ps

  • Today: kbagent project add --project prod --url URL --token 901-xxx-Secret and kbagent project edit --token NEW accept the token as a CLI argument. Tokens passed this way appear in ps aux, /proc/PID/cmdline, and CI logs running set -x.
  • Pinch point: --token is the easiest interface for CI/CD scripts. Removing it cold breaks every onboarding/rotation script in the wild.
  • Options:
    1. Deprecation cycle. Emit a stderr warning when --token is used, document KBC_TOKEN env var and interactive prompt as alternatives. Remove the flag in 0.32.0.
    2. Soft mitigation. Read the token, then zero it out from sys.argv (the git/gpg pattern). Visible in ps only for the brief window before the zero-out — practical attackers are still blocked.
    3. Wontfix. Document the limitation in the security model section of the docs.
  • Recommendation: Option 1, with option 2 as a backstop in the same release. Both are non-breaking by themselves; the breaking removal is one minor version away.
  • Affected files: src/keboola_agent_cli/commands/project.py:152-154

sec-14: Project alias not validated for env-var-safe characters

  • Today: kbagent project add --project "test=alias" registers a project. Master-token resolution at services/sharing_service.py:51 then constructs KBC_MASTER_TOKEN_TEST=ALIAS, which is not a valid env var name, so os.environ.get() silently returns None and falls back to the global KBC_MASTER_TOKEN. The user expects per-project token isolation but doesn't get it.
  • Pinch point: Existing users may have aliases like proj.1, team/x, or acme-test=v2. Strict validation rejects them; their workspaces would suddenly need migration.
  • Options:
    1. Validate-only-new. Require ^[a-zA-Z0-9_-]+$ for new aliases at project add time. Existing aliases load with a deprecation warning at startup.
    2. Validate everywhere. Reject load of any config.json with an invalid alias. Easier code but breaks workflows.
    3. Document only. Add a "Naming conventions" doc section, no code change.
  • Recommendation: Option 1.
  • Affected files: src/keboola_agent_cli/commands/project.py:add_project(), src/keboola_agent_cli/services/sharing_service.py:51

Group B — design discussion (threat model)

sec-09: permissions reset / permissions set PTY bypass

  • Today: _require_interactive_confirmation() at commands/permissions.py:39-40 checks sys.stdin.isatty() to prevent non-interactive bypass. The intent (per the inline comment) is "prevents AI agents from removing the policy programmatically." But an AI agent can wrap kbagent in a PTY (pty.openpty(), script, expect, ptyprocess) — isatty() then returns True, the agent reads the random code from stderr, writes it to stdin, and the guard is bypassed.
  • Threat model question: Is this guard supposed to be (a) a friction-only deterrent, or (b) a hard lockout? The current code reads as (b) but only delivers (a).
  • Options:
    1. Document as best-effort. Update the doc comment to say "deters non-interactive bypass; not a hard lockout against an attacker with shell access in the same user context." Add note to permissions set output: "Note: an attacker who can run kbagent in a PTY can bypass this confirmation."
    2. Add second factor. Require sudo password (Linux/macOS), Keychain auth (macOS), or a hardware token. All complex, OS-specific, and can be wrapped too.
    3. Detect parent process. Walk up /proc/PID/comm and accept only known interactive shells (bash, zsh, fish, tcsh). Adds platform code; pty/expect can fake parent process.
    4. Wontfix. Accept the limitation; in practice an AI agent with shell access can also rm -rf the user's home dir, so the threat model is incoherent.
  • Recommendation: Option 1 plus option 4's reasoning in the doc. The whole permissions system is friction designed to prevent AI agent typo / oversight, not a malicious attacker — that's already explicit in the design but not in the bypass guard's wording.
  • Affected files: src/keboola_agent_cli/commands/permissions.py:39-67

sec-13: --input @file reads arbitrary path

  • Today: kbagent encrypt values --input @/home/user/.ssh/id_rsa reads the SSH private key, sends it to Keboola Encryption API, stores the ciphertext in the project. Same pattern in kbagent tool call --input @PATH. An AI agent with shell-execution permission can use this to exfiltrate any file the user can read.
  • Threat model question: What's the threat? An AI agent that already has shell access can cat /home/user/.ssh/id_rsa directly — the kbagent path doesn't make exfiltration possible, just routes it through Keboola.
  • Options:
    1. Restrict by default. @file accepts only paths under CWD. Override with --allow-absolute-paths for the legitimate cases (CI/CD with absolute config paths). Breaking for some scripts.
    2. Permission policy entry. Add cli:read-files-outside-cwd operation. Read-only mode (init --read-only) denies it by default. Default-allow for non-locked installations.
    3. Document only. Add to the AI agent attack surface doc; let the operator restrict via init --read-only .claude/settings.json.
  • Recommendation: Option 2. It puts the decision in the permissions set flow which already exists, and init --read-only automatically protects locked workspaces.
  • Affected files: src/keboola_agent_cli/commands/encrypt.py:133-137, src/keboola_agent_cli/commands/tool.py:46-54

Group C — low-value hardening (probably wontfix)

sec-10: Silent branch-name collisions in sanitize_name

  • Today: A git branch named ../etc sanitizes to etc. If a Keboola branch is also named etc, both map to the same etc/ directory in the sync workspace. The collision-suffix logic at services/sync_service.py:2604 adds -{branch_id} so the manifest is consistent, but the user never sees the warning.
  • Real-world impact: Low. Sanitization changes are visible in branch-status output and the directory naming.
  • Options:
    1. Log a stderr warning when sanitize_name(input) != input. Easy.
    2. Wontfix.
  • Recommendation: Option 1. Trivial to implement, helpful for debugging unexpected directory layout.

sec-12: version_cache.json non-atomic write

  • Today: _write_cache() at auto_update.py:113-125 uses cache_path.write_text() with no locking and no atomic rename. Two parallel kbagent invocations both fetch PyPI, both write the cache (last-write-wins), and may run two simultaneous uv tool upgrade processes for keboola-mcp-server.
  • Real-world impact: Low. Concurrent kbagent starts are uncommon. uv handles its own locking, so the double-upgrade is benign.
  • Options:
    1. Atomic write (.tmp + os.replace()) and fcntl lock. Easy. Pattern already used by config.json.
    2. Wontfix.
  • Recommendation: Option 1 because the pattern is already there and copy-paste cheap.

sec-16: Lineage HTML loads Mermaid from CDN without SRI

  • Today: services/deep_lineage_service.py:1276 loads Mermaid.js from https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js without an integrity= attribute. A CDN compromise (or DNS hijack of jsdelivr) delivers attacker JS that runs in the user's browser when they open the lineage HTML.
  • Real-world impact: Very low. jsdelivr has a strong security track record.
  • Options:
    1. Pin version + SRI hash. Choose a Mermaid version (e.g. 11.x), compute SHA-384, embed integrity="..." crossorigin="anonymous". Re-pin manually on Mermaid upgrades.
    2. --self-contained flag. Embed Mermaid.js inline in the generated HTML (~500 KB per file). Slower file generation, larger artifacts.
    3. Wontfix. CDN compromise is industry-wide problem; user-controlled HTML files are not a high-value attack target.
  • Recommendation: Option 1 if anyone asks; otherwise wontfix.

sec-17: KBAGENT_AUTO_UPDATE toggle staleness

  • Today: User sets KBAGENT_AUTO_UPDATE=false, kbagent skips both upgrade stages. Cache TTL still ticks (cache is not invalidated). User later re-enables auto-update — cache shows recent timestamp from when kbagent last thought it had checked, so a fresh check is delayed up to one TTL window.
  • Real-world impact: Negligible. User who disables auto-update typically knows what they're doing.
  • Options:
    1. Detect false → unset transition and force a fresh check next startup. Adds state.
    2. Document the behavior.
    3. Wontfix.
  • Recommendation: Option 2 in the changelog or auto-update doc.

sec-18: doctor --fix resolves uv via PATH

  • Today: services/mcp_service.py:165-187 and services/doctor_service.py use shutil.which("uv"). On a compromised PATH, an attacker's uv binary is invoked.
  • Real-world impact: If PATH is compromised, the attacker can also poison kbagent itself, or any other command the user runs. The threat is wider than this single call site.
  • Options:
    1. Pin standard install paths (~/.cargo/bin/uv, /usr/local/bin/uv, ~/.local/bin/uv) before falling back to PATH.
    2. Verify uv binary checksum against a pinned hash. Burdensome.
    3. Wontfix. Documentation in the security model.
  • Recommendation: Option 1 for friendlier behavior on misconfigured PATH; not for security. Or wontfix.

Recommended priority order

Order What Effort Justification
1 sec-10, sec-12 1 hour each Cheap hardening, no design debate
2 sec-03 deprecation warning + argv-zero half day Step 1 of two-step token-flag deprecation
3 sec-09 doc update 30 min Set correct expectation about the guard
4 sec-13 + permission policy entry 1 day Put decision in permissions set flow
5 sec-14 validate-only-new 2 hours Defensive UX
6 sec-16 SRI / sec-17 doc / sec-18 path-prefer 2 hours each Belt-and-suspenders

If we ship them, version bumps land naturally: 1+2 in 0.30.6, 3+4 in 0.31.0 (deprecation removal of --token and policy entry are both API changes), 5+6 as we have time.

Or a single "security audit follow-ups" PR that does 1, 3, 5, 6 in one go and leaves 2+4 as separate design issues.

Audit artifacts

Original findings doc: /tmp/kbagent-audit/issues.md (kept locally; stale finding numbers preserved here).

State machine context: /tmp/kbagent-audit/state-machine.md documents find_sync_workspace, cleanup_branch_id_from_mapping, _resolve_branch_id, apply_firewall_flags — see if implementing any of the above touches those.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions