Summary
Follow-up to security audit issue #269 / PR #270. The audit produced 20 findings; 9 verified bugs were fixed in #270. The remaining 10 are tracked here because each requires either a breaking change with a deprecation cycle, a design decision about the threat model, or is low-value hardening that may not justify the maintenance cost.
This issue is the canonical place to discuss what to do with each. It does not block any release.
Group A — breaking change with deprecation cycle
sec-03: Storage token in --token argv visible to ps
- Today:
kbagent project add --project prod --url URL --token 901-xxx-Secret and kbagent project edit --token NEW accept the token as a CLI argument. Tokens passed this way appear in ps aux, /proc/PID/cmdline, and CI logs running set -x.
- Pinch point:
--token is the easiest interface for CI/CD scripts. Removing it cold breaks every onboarding/rotation script in the wild.
- Options:
- Deprecation cycle. Emit a stderr warning when
--token is used, document KBC_TOKEN env var and interactive prompt as alternatives. Remove the flag in 0.32.0.
- Soft mitigation. Read the token, then zero it out from
sys.argv (the git/gpg pattern). Visible in ps only for the brief window before the zero-out — practical attackers are still blocked.
- Wontfix. Document the limitation in the security model section of the docs.
- Recommendation: Option 1, with option 2 as a backstop in the same release. Both are non-breaking by themselves; the breaking removal is one minor version away.
- Affected files:
src/keboola_agent_cli/commands/project.py:152-154
sec-14: Project alias not validated for env-var-safe characters
- Today:
kbagent project add --project "test=alias" registers a project. Master-token resolution at services/sharing_service.py:51 then constructs KBC_MASTER_TOKEN_TEST=ALIAS, which is not a valid env var name, so os.environ.get() silently returns None and falls back to the global KBC_MASTER_TOKEN. The user expects per-project token isolation but doesn't get it.
- Pinch point: Existing users may have aliases like
proj.1, team/x, or acme-test=v2. Strict validation rejects them; their workspaces would suddenly need migration.
- Options:
- Validate-only-new. Require
^[a-zA-Z0-9_-]+$ for new aliases at project add time. Existing aliases load with a deprecation warning at startup.
- Validate everywhere. Reject load of any config.json with an invalid alias. Easier code but breaks workflows.
- Document only. Add a "Naming conventions" doc section, no code change.
- Recommendation: Option 1.
- Affected files:
src/keboola_agent_cli/commands/project.py:add_project(), src/keboola_agent_cli/services/sharing_service.py:51
Group B — design discussion (threat model)
sec-09: permissions reset / permissions set PTY bypass
- Today:
_require_interactive_confirmation() at commands/permissions.py:39-40 checks sys.stdin.isatty() to prevent non-interactive bypass. The intent (per the inline comment) is "prevents AI agents from removing the policy programmatically." But an AI agent can wrap kbagent in a PTY (pty.openpty(), script, expect, ptyprocess) — isatty() then returns True, the agent reads the random code from stderr, writes it to stdin, and the guard is bypassed.
- Threat model question: Is this guard supposed to be (a) a friction-only deterrent, or (b) a hard lockout? The current code reads as (b) but only delivers (a).
- Options:
- Document as best-effort. Update the doc comment to say "deters non-interactive bypass; not a hard lockout against an attacker with shell access in the same user context." Add note to
permissions set output: "Note: an attacker who can run kbagent in a PTY can bypass this confirmation."
- Add second factor. Require sudo password (Linux/macOS), Keychain auth (macOS), or a hardware token. All complex, OS-specific, and can be wrapped too.
- Detect parent process. Walk up
/proc/PID/comm and accept only known interactive shells (bash, zsh, fish, tcsh). Adds platform code; pty/expect can fake parent process.
- Wontfix. Accept the limitation; in practice an AI agent with shell access can also
rm -rf the user's home dir, so the threat model is incoherent.
- Recommendation: Option 1 plus option 4's reasoning in the doc. The whole
permissions system is friction designed to prevent AI agent typo / oversight, not a malicious attacker — that's already explicit in the design but not in the bypass guard's wording.
- Affected files:
src/keboola_agent_cli/commands/permissions.py:39-67
sec-13: --input @file reads arbitrary path
- Today:
kbagent encrypt values --input @/home/user/.ssh/id_rsa reads the SSH private key, sends it to Keboola Encryption API, stores the ciphertext in the project. Same pattern in kbagent tool call --input @PATH. An AI agent with shell-execution permission can use this to exfiltrate any file the user can read.
- Threat model question: What's the threat? An AI agent that already has shell access can
cat /home/user/.ssh/id_rsa directly — the kbagent path doesn't make exfiltration possible, just routes it through Keboola.
- Options:
- Restrict by default.
@file accepts only paths under CWD. Override with --allow-absolute-paths for the legitimate cases (CI/CD with absolute config paths). Breaking for some scripts.
- Permission policy entry. Add
cli:read-files-outside-cwd operation. Read-only mode (init --read-only) denies it by default. Default-allow for non-locked installations.
- Document only. Add to the AI agent attack surface doc; let the operator restrict via
init --read-only .claude/settings.json.
- Recommendation: Option 2. It puts the decision in the
permissions set flow which already exists, and init --read-only automatically protects locked workspaces.
- Affected files:
src/keboola_agent_cli/commands/encrypt.py:133-137, src/keboola_agent_cli/commands/tool.py:46-54
Group C — low-value hardening (probably wontfix)
sec-10: Silent branch-name collisions in sanitize_name
- Today: A git branch named
../etc sanitizes to etc. If a Keboola branch is also named etc, both map to the same etc/ directory in the sync workspace. The collision-suffix logic at services/sync_service.py:2604 adds -{branch_id} so the manifest is consistent, but the user never sees the warning.
- Real-world impact: Low. Sanitization changes are visible in
branch-status output and the directory naming.
- Options:
- Log a stderr warning when
sanitize_name(input) != input. Easy.
- Wontfix.
- Recommendation: Option 1. Trivial to implement, helpful for debugging unexpected directory layout.
sec-12: version_cache.json non-atomic write
- Today:
_write_cache() at auto_update.py:113-125 uses cache_path.write_text() with no locking and no atomic rename. Two parallel kbagent invocations both fetch PyPI, both write the cache (last-write-wins), and may run two simultaneous uv tool upgrade processes for keboola-mcp-server.
- Real-world impact: Low. Concurrent
kbagent starts are uncommon. uv handles its own locking, so the double-upgrade is benign.
- Options:
- Atomic write (
.tmp + os.replace()) and fcntl lock. Easy. Pattern already used by config.json.
- Wontfix.
- Recommendation: Option 1 because the pattern is already there and copy-paste cheap.
sec-16: Lineage HTML loads Mermaid from CDN without SRI
- Today:
services/deep_lineage_service.py:1276 loads Mermaid.js from https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js without an integrity= attribute. A CDN compromise (or DNS hijack of jsdelivr) delivers attacker JS that runs in the user's browser when they open the lineage HTML.
- Real-world impact: Very low. jsdelivr has a strong security track record.
- Options:
- Pin version + SRI hash. Choose a Mermaid version (e.g. 11.x), compute SHA-384, embed
integrity="..." crossorigin="anonymous". Re-pin manually on Mermaid upgrades.
--self-contained flag. Embed Mermaid.js inline in the generated HTML (~500 KB per file). Slower file generation, larger artifacts.
- Wontfix. CDN compromise is industry-wide problem; user-controlled HTML files are not a high-value attack target.
- Recommendation: Option 1 if anyone asks; otherwise wontfix.
sec-17: KBAGENT_AUTO_UPDATE toggle staleness
- Today: User sets
KBAGENT_AUTO_UPDATE=false, kbagent skips both upgrade stages. Cache TTL still ticks (cache is not invalidated). User later re-enables auto-update — cache shows recent timestamp from when kbagent last thought it had checked, so a fresh check is delayed up to one TTL window.
- Real-world impact: Negligible. User who disables auto-update typically knows what they're doing.
- Options:
- Detect
false → unset transition and force a fresh check next startup. Adds state.
- Document the behavior.
- Wontfix.
- Recommendation: Option 2 in the changelog or auto-update doc.
sec-18: doctor --fix resolves uv via PATH
- Today:
services/mcp_service.py:165-187 and services/doctor_service.py use shutil.which("uv"). On a compromised PATH, an attacker's uv binary is invoked.
- Real-world impact: If PATH is compromised, the attacker can also poison
kbagent itself, or any other command the user runs. The threat is wider than this single call site.
- Options:
- Pin standard install paths (
~/.cargo/bin/uv, /usr/local/bin/uv, ~/.local/bin/uv) before falling back to PATH.
- Verify uv binary checksum against a pinned hash. Burdensome.
- Wontfix. Documentation in the security model.
- Recommendation: Option 1 for friendlier behavior on misconfigured PATH; not for security. Or wontfix.
Recommended priority order
| Order |
What |
Effort |
Justification |
| 1 |
sec-10, sec-12 |
1 hour each |
Cheap hardening, no design debate |
| 2 |
sec-03 deprecation warning + argv-zero |
half day |
Step 1 of two-step token-flag deprecation |
| 3 |
sec-09 doc update |
30 min |
Set correct expectation about the guard |
| 4 |
sec-13 + permission policy entry |
1 day |
Put decision in permissions set flow |
| 5 |
sec-14 validate-only-new |
2 hours |
Defensive UX |
| 6 |
sec-16 SRI / sec-17 doc / sec-18 path-prefer |
2 hours each |
Belt-and-suspenders |
If we ship them, version bumps land naturally: 1+2 in 0.30.6, 3+4 in 0.31.0 (deprecation removal of --token and policy entry are both API changes), 5+6 as we have time.
Or a single "security audit follow-ups" PR that does 1, 3, 5, 6 in one go and leaves 2+4 as separate design issues.
Audit artifacts
Original findings doc: /tmp/kbagent-audit/issues.md (kept locally; stale finding numbers preserved here).
State machine context: /tmp/kbagent-audit/state-machine.md documents find_sync_workspace, cleanup_branch_id_from_mapping, _resolve_branch_id, apply_firewall_flags — see if implementing any of the above touches those.
Summary
Follow-up to security audit issue #269 / PR #270. The audit produced 20 findings; 9 verified bugs were fixed in #270. The remaining 10 are tracked here because each requires either a breaking change with a deprecation cycle, a design decision about the threat model, or is low-value hardening that may not justify the maintenance cost.
This issue is the canonical place to discuss what to do with each. It does not block any release.
Group A — breaking change with deprecation cycle
sec-03: Storage token in
--tokenargv visible topskbagent project add --project prod --url URL --token 901-xxx-Secretandkbagent project edit --token NEWaccept the token as a CLI argument. Tokens passed this way appear inps aux,/proc/PID/cmdline, and CI logs runningset -x.--tokenis the easiest interface for CI/CD scripts. Removing it cold breaks every onboarding/rotation script in the wild.--tokenis used, documentKBC_TOKENenv var and interactive prompt as alternatives. Remove the flag in 0.32.0.sys.argv(the git/gpg pattern). Visible inpsonly for the brief window before the zero-out — practical attackers are still blocked.src/keboola_agent_cli/commands/project.py:152-154sec-14: Project alias not validated for env-var-safe characters
kbagent project add --project "test=alias"registers a project. Master-token resolution atservices/sharing_service.py:51then constructsKBC_MASTER_TOKEN_TEST=ALIAS, which is not a valid env var name, soos.environ.get()silently returnsNoneand falls back to the globalKBC_MASTER_TOKEN. The user expects per-project token isolation but doesn't get it.proj.1,team/x, oracme-test=v2. Strict validation rejects them; their workspaces would suddenly need migration.^[a-zA-Z0-9_-]+$for new aliases atproject addtime. Existing aliases load with a deprecation warning at startup.src/keboola_agent_cli/commands/project.py:add_project(),src/keboola_agent_cli/services/sharing_service.py:51Group B — design discussion (threat model)
sec-09:
permissions reset/permissions setPTY bypass_require_interactive_confirmation()atcommands/permissions.py:39-40checkssys.stdin.isatty()to prevent non-interactive bypass. The intent (per the inline comment) is "prevents AI agents from removing the policy programmatically." But an AI agent can wrap kbagent in a PTY (pty.openpty(),script,expect,ptyprocess) —isatty()then returns True, the agent reads the random code from stderr, writes it to stdin, and the guard is bypassed.permissions setoutput: "Note: an attacker who can run kbagent in a PTY can bypass this confirmation."/proc/PID/command accept only known interactive shells (bash, zsh, fish, tcsh). Adds platform code; pty/expect can fake parent process.rm -rfthe user's home dir, so the threat model is incoherent.permissionssystem is friction designed to prevent AI agent typo / oversight, not a malicious attacker — that's already explicit in the design but not in the bypass guard's wording.src/keboola_agent_cli/commands/permissions.py:39-67sec-13:
--input @filereads arbitrary pathkbagent encrypt values --input @/home/user/.ssh/id_rsareads the SSH private key, sends it to Keboola Encryption API, stores the ciphertext in the project. Same pattern inkbagent tool call --input @PATH. An AI agent with shell-execution permission can use this to exfiltrate any file the user can read.cat /home/user/.ssh/id_rsadirectly — the kbagent path doesn't make exfiltration possible, just routes it through Keboola.@fileaccepts only paths under CWD. Override with--allow-absolute-pathsfor the legitimate cases (CI/CD with absolute config paths). Breaking for some scripts.cli:read-files-outside-cwdoperation. Read-only mode (init --read-only) denies it by default. Default-allow for non-locked installations.init --read-only.claude/settings.json.permissions setflow which already exists, andinit --read-onlyautomatically protects locked workspaces.src/keboola_agent_cli/commands/encrypt.py:133-137,src/keboola_agent_cli/commands/tool.py:46-54Group C — low-value hardening (probably wontfix)
sec-10: Silent branch-name collisions in
sanitize_name../etcsanitizes toetc. If a Keboola branch is also namedetc, both map to the sameetc/directory in the sync workspace. The collision-suffix logic atservices/sync_service.py:2604adds-{branch_id}so the manifest is consistent, but the user never sees the warning.branch-statusoutput and the directory naming.sanitize_name(input) != input. Easy.sec-12:
version_cache.jsonnon-atomic write_write_cache()atauto_update.py:113-125usescache_path.write_text()with no locking and no atomic rename. Two parallelkbagentinvocations both fetch PyPI, both write the cache (last-write-wins), and may run two simultaneousuv tool upgradeprocesses for keboola-mcp-server.kbagentstarts are uncommon. uv handles its own locking, so the double-upgrade is benign..tmp+os.replace()) andfcntllock. Easy. Pattern already used byconfig.json.sec-16: Lineage HTML loads Mermaid from CDN without SRI
services/deep_lineage_service.py:1276loads Mermaid.js fromhttps://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.jswithout anintegrity=attribute. A CDN compromise (or DNS hijack of jsdelivr) delivers attacker JS that runs in the user's browser when they open the lineage HTML.integrity="..." crossorigin="anonymous". Re-pin manually on Mermaid upgrades.--self-containedflag. Embed Mermaid.js inline in the generated HTML (~500 KB per file). Slower file generation, larger artifacts.sec-17:
KBAGENT_AUTO_UPDATEtoggle stalenessKBAGENT_AUTO_UPDATE=false, kbagent skips both upgrade stages. Cache TTL still ticks (cache is not invalidated). User later re-enables auto-update — cache shows recent timestamp from when kbagent last thought it had checked, so a fresh check is delayed up to one TTL window.false → unsettransition and force a fresh check next startup. Adds state.sec-18:
doctor --fixresolvesuvvia PATHservices/mcp_service.py:165-187andservices/doctor_service.pyuseshutil.which("uv"). On a compromised PATH, an attacker'suvbinary is invoked.kbagentitself, or any other command the user runs. The threat is wider than this single call site.~/.cargo/bin/uv,/usr/local/bin/uv,~/.local/bin/uv) before falling back to PATH.Recommended priority order
permissions setflowIf we ship them, version bumps land naturally: 1+2 in 0.30.6, 3+4 in 0.31.0 (deprecation removal of
--tokenand policy entry are both API changes), 5+6 as we have time.Or a single "security audit follow-ups" PR that does 1, 3, 5, 6 in one go and leaves 2+4 as separate design issues.
Audit artifacts
Original findings doc:
/tmp/kbagent-audit/issues.md(kept locally; stale finding numbers preserved here).State machine context:
/tmp/kbagent-audit/state-machine.mddocumentsfind_sync_workspace,cleanup_branch_id_from_mapping,_resolve_branch_id,apply_firewall_flags— see if implementing any of the above touches those.