Add anonymous usage telemetry collection#66
Conversation
Model after the Supabase CLI: ship a write-only Datadog client token
(pub…, embeddable by design — deliberately committed and allowlisted in
gitleaks; never an API key) and send one allow-listed event per command
run to the Datadog logs intake.
- aai_cli/telemetry.py: consent (AAI_TELEMETRY_DISABLED / DO_NOT_TRACK /
persisted choice), event build (command path, outcome class, exit code,
duration, version/OS/CI — never args, paths, or account data), and a
detached flusher subprocess so commands never wait on telemetry; all
send-side failures are swallowed.
- context.run_command wraps each command body in telemetry.track(), so
CLIError outcomes keep their machine-readable error_type.
- config.py: random anonymous device_id + telemetry_enabled persisted in
config.toml.
- aai telemetry status/enable/disable: the user-facing consent surface,
documented in README ("Telemetry") with the full opt-out story.
- Test suite blanks the shipped token via an autouse fixture so no test
spawns a real flusher; telemetry paths are covered by dedicated unit,
command, and run_command-integration tests.
https://claude.ai/code/session_017kPHAMnWAUs6XHVGk4FyWE
| # `phc_` keys ship in open-source CLIs. An API key (account secret) must never | ||
| # appear here. Rotate in Datadog (Organization Settings → Client Tokens) if abused; | ||
| # AAI_TELEMETRY_CLIENT_TOKEN overrides without a release. | ||
| SHIPPED_CLIENT_TOKEN = "pub0d633113b9f7d22faff215fefaf30b43" |
There was a problem hiding this comment.
Exposed secret in aai_cli/telemetry.py - high severity
Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More Info
There was a problem hiding this comment.
@AikidoSec ignore: This is a Datadog client token (pub…) — the write-only credential class Datadog designs to be embedded in shipped client apps. It can submit telemetry events but cannot read any data, so it is not an account secret. It is committed deliberately (the same model as PostHog phc_ keys in open-source CLIs, e.g. the Supabase CLI) and is allowlisted by exact value in .gitleaks.toml; any other real-looking token still fails the secret-scan gate.
Generated by Claude Code
There was a problem hiding this comment.
✅ Based on your feedback, we ignored this issue because of the following reason:
This is a Datadog client token (
pub…) — the write-only credential class Datadog designs to be embedded in shipped client apps. It can submit telemetry events but cannot read any data, so it is not an account secret. It is committed deliberately (the same model as PostHogphc_keys in open-source CLIs, e.g. the Supabase CLI) and is allowlisted by exact value in.gitleaks.toml; any other real-looking token still fails the secret-scan gate.
Generated by Claude Code
| # The committed credential must be a Datadog *client* token (pub…, write-only, | ||
| # embeddable by design) — never an API key. The autouse fixture blanks it for | ||
| # the suite and hands back the real value for exactly this assertion. | ||
| assert neutralize_shipped_token == "pub0d633113b9f7d22faff215fefaf30b43" |
There was a problem hiding this comment.
Exposed secret in tests/test_telemetry.py - low severity
Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More Info
There was a problem hiding this comment.
@AikidoSec ignore: Same value as the finding in aai_cli/telemetry.py — a write-only Datadog client token (pub…), committed deliberately and allowlisted by exact value in .gitleaks.toml. The test asserts the shipped constant is exactly this public client token (and never an API-key shape).
Generated by Claude Code
There was a problem hiding this comment.
✅ Based on your feedback, we ignored this issue because of the following reason:
Same value as the finding in
aai_cli/telemetry.py— a write-only Datadog client token (pub…), committed deliberately and allowlisted by exact value in.gitleaks.toml. The test asserts the shipped constant is exactly this public client token (and never an API-key shape).
Generated by Claude Code
| # network. The payload travels via argv (the Vercel CLI hands its flusher the | ||
| # serialized request the same way); it carries only the event + the write-only | ||
| # public token, so argv visibility is acceptable. | ||
| _FLUSH_SNIPPET = "from aai_cli import telemetry; telemetry.flush_argv()" |
There was a problem hiding this comment.
Spawning a detached Python subprocess via a constructed -c snippet and discarding stdio hides runtime behavior; consider avoiding dynamic -c execution or make the subprocess invocation explicit and visible to reviewers.
Details
✨ AI Reasoning
The change introduces a detached flusher subprocess that runs Python code via the command-line -c snippet and discards stdio. The payload is passed as an argv string and then parsed in the child. Using subprocess.Popen with start_new_session=True and redirecting stdout/stderr to DEVNULL hides the child process's activity from the user and logs. Executing a dynamically-constructed snippet with -c increases the difficulty of static review and can conceal side effects. These patterns (dynamic -c execution, payload via argv, suppressed stdio, detached session) were not present before and were added by this diff. This raises a transparency/obfuscation concern under the rule: intent of that runtime behavior is hidden from observers and reviewers.
🔧 How do I fix it?
Ensure code is transparent and not intentionally obfuscated. Avoid hiding functionality from code review. Focus on intent and deception, not specific patterns.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
| def dispatch(event: Mapping[str, object]) -> None: | ||
| """Hand one event to a detached flusher process and return immediately.""" | ||
| payload = json.dumps({"url": intake_url(), "token": client_token(), "event": event}) | ||
| subprocess.Popen( |
There was a problem hiding this comment.
Dispatch uses subprocess.Popen to run a hidden flusher (-c snippet) with the payload passed via argv and stdio discarded; this conceals execution and data flow from reviewers.
Details
✨ AI Reasoning
dispatch() now serializes event+token into JSON and passes it to a detached child via process argv (subprocess.Popen([... , '-c', _FLUSH_SNIPPET, payload], ...)). Passing sensitive or operational data via dynamically passed argv to an -c snippet and running it with stdio suppressed obscures what is being sent and executed. While the payload here is intended to be public telemetry, this pattern was added by the PR and introduces an obfuscation-like execution path that makes review and auditing harder.
🔧 How do I fix it?
Ensure code is transparent and not intentionally obfuscated. Avoid hiding functionality from code review. Focus on intent and deception, not specific patterns.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
| ) | ||
|
|
||
|
|
||
| def flush_argv() -> None: |
There was a problem hiding this comment.
flush_argv() reads and executes serialized payloads from sys.argv[1], invoked by a dynamically-executed -c snippet; this runtime indirection conceals the request flow and complicates auditability.
Details
✨ AI Reasoning
flush_argv() reads sys.argv[1] (the payload) and then loads and posts it. This function is intended to be called by the dynamically-executed -c snippet in the detached child. That dynamic invocation pattern (assembling a snippet referencing module functions and passing a serialized payload through argv) is a deliberate runtime indirection added by this diff, which obscures the actual network request flow from casual code review.
🔧 How do I fix it?
Ensure code is transparent and not intentionally obfuscated. Avoid hiding functionality from code review. Focus on intent and deception, not specific patterns.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
_fake_ytdlp replaces sys.modules["yt_dlp"] with a non-package namespace, so the --download-sections path's `from yt_dlp.utils import …` only resolved when an earlier test had already imported the real module — passing or failing with pytest-randomly's seed. Cache the real yt_dlp.utils first so the import is deterministic. https://claude.ai/code/session_017kPHAMnWAUs6XHVGk4FyWE
Address the Aikido review findings on the dispatch path: spawning a dynamically-constructed `python -c` snippet is needlessly opaque. The detached flusher is now the CLI's own hidden `telemetry flush` command (the same shape the Vercel CLI uses) — an explicit, reviewable entry point. The child env sets AAI_TELEMETRY_DISABLED=1 so a flush can never spawn another flusher, and a test pins that `flush` stays hidden from help. https://claude.ai/code/session_017kPHAMnWAUs6XHVGk4FyWE
Resolves the tests/test_youtube.py conflict by taking main's version of the flake fix (both sides fixed the same order-dependent yt_dlp.utils import identically, differing only in the comment), keeps the telemetry bullet in AGENTS.md, and follows the CLI rename through the telemetry surface: command examples, the status hint, README disclosure, and the command-path assertions (events now record "assembly telemetry status"). Internals are unchanged — the package is still aai_cli, so the detached flusher's `python -m aai_cli telemetry flush` spawn and the AAI_* telemetry env vars keep working. https://claude.ai/code/session_017kPHAMnWAUs6XHVGk4FyWE
Adds opt-out anonymous usage telemetry to track command invocations, outcomes, and performance metrics. Telemetry is collected via a detached background process using a Datadog write-only client token, ensuring it never blocks or breaks commands.
Key Changes
Core telemetry module (
aai_cli/telemetry.py): Implements event collection, consent management, and background flushingAAI_TELEMETRY_DISABLED=1,DO_NOT_TRACK=1, oraai telemetry disablepub…) designed for embeddingTelemetry command (
aai_cli/commands/telemetry.py): User-facing consent surfaceaai telemetry status: Show whether telemetry is active and whyaai telemetry enable/disable: Persistently opt in or outConfig persistence (
aai_cli/config.py):device_id: Random UUID minted locally on first use, persisted in config.tomltelemetry_enabled: Persisted opt-out choice (None = never chosen, treated as enabled)Command integration (
aai_cli/context.py):track()context manager wraps command execution to capture outcome and durationrun_command()to track all CLI invocationsComprehensive test coverage (
tests/test_telemetry.py,tests/test_telemetry_command.py):aai telemetrycommandsDocumentation: Updated README with telemetry overview and opt-out instructions
Implementation Details
neutralize_shipped_tokenfixture that blanks the shipped token suite-wide, allowing tests to opt in explicitlysubprocess.Popenwithstart_new_session=Trueand stdio discarded to ensure the flusher never blocks the commandconsent_granted()returns True unless env kill-switches or persisted False choice existis_enabled()requires both a token and consent to be truehttps://claude.ai/code/session_017kPHAMnWAUs6XHVGk4FyWE