Add anonymous usage telemetry collection by alexkroman · Pull Request #66 · AssemblyAI/cli

alexkroman · 2026-06-11T17:31:09Z

Adds opt-out anonymous usage telemetry to track command invocations, outcomes, and performance metrics. Telemetry is collected via a detached background process using a Datadog write-only client token, ensuring it never blocks or breaks commands.

Key Changes

Core telemetry module (aai_cli/telemetry.py): Implements event collection, consent management, and background flushing
- Collects command name, outcome, exit code, duration, CLI version, OS, Python version, CI detection, and a random device ID
- Never collects arguments, file paths, transcripts, API keys, or account data
- Delivery runs in a detached subprocess with stdio discarded, so failures are swallowed and never impact the command
- Opt-out via AAI_TELEMETRY_DISABLED=1, DO_NOT_TRACK=1, or aai telemetry disable
- Uses Datadog's browser-intake API with a public client token (pub…) designed for embedding
Telemetry command (aai_cli/commands/telemetry.py): User-facing consent surface
- aai telemetry status: Show whether telemetry is active and why
- aai telemetry enable/disable: Persistently opt in or out
- Environment kill-switches always win over persisted choice
Config persistence (aai_cli/config.py):
- device_id: Random UUID minted locally on first use, persisted in config.toml
- telemetry_enabled: Persisted opt-out choice (None = never chosen, treated as enabled)
Command integration (aai_cli/context.py):
- track() context manager wraps command execution to capture outcome and duration
- Maps CLIError types to outcome strings, handles typer.Exit codes, catches unexpected exceptions as internal_error
- Integrates with run_command() to track all CLI invocations
Comprehensive test coverage (tests/test_telemetry.py, tests/test_telemetry_command.py):
- Token/URL resolution, consent logic, event shape validation
- Detached flusher subprocess spawning and payload posting
- Command tracking with error classification and duration measurement
- Integration tests for aai telemetry commands
Documentation: Updated README with telemetry overview and opt-out instructions
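The track() behaviour described under "Command integration" can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: CLIError and AuthError are stand-in classes, the error_type and exit_code attributes, the record callback, and the "exit" outcome string are hypothetical, and typer.Exit is approximated by SystemExit so the sketch needs no third-party imports.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class CLIError(Exception):
    """Stand-in for the CLI's error base class; attribute names are assumed."""

    error_type = "cli_error"
    exit_code = 1


class AuthError(CLIError):
    error_type = "auth_error"


@contextmanager
def track(command: str, record: Callable[[Dict[str, object]], None]) -> Iterator[None]:
    """Record one event per command run without changing how the command exits."""
    started = time.monotonic()
    result: Dict[str, object] = {"outcome": "success", "exit_code": 0}
    try:
        yield
    except CLIError as exc:
        # Known CLI errors keep their machine-readable type.
        result = {"outcome": exc.error_type, "exit_code": exc.exit_code}
        raise
    except SystemExit as exc:  # stand-in for typer.Exit
        code = exc.code if isinstance(exc.code, int) else (0 if exc.code is None else 1)
        result = {"outcome": "success" if code == 0 else "exit", "exit_code": code}
        raise
    except Exception:
        # Anything unexpected is classified, then re-raised untouched.
        result = {"outcome": "internal_error", "exit_code": 1}
        raise
    finally:
        duration_ms = int((time.monotonic() - started) * 1000)
        try:
            record({"command": command, "duration_ms": duration_ms, **result})
        except Exception:
            pass  # telemetry failures must never affect the command
```

The original exception is always re-raised, so wrapping a command body in track() only observes the outcome and never alters it.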

Implementation Details

Telemetry is disabled by default in tests: the neutralize_shipped_token fixture blanks the shipped token suite-wide, and individual tests opt in explicitly
Event dispatch uses subprocess.Popen with start_new_session=True and stdio discarded, so the flusher never blocks the command
The payload travels via argv as JSON (acceptable because it contains only the public client token and event data)
The consent model is opt-out: consent_granted() returns True unless an env kill-switch is set or a persisted False choice exists
is_enabled() returns True only when a token is present and consent is granted
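The consent rules above can be sketched as two small functions. The signatures are assumptions (the PR's versions presumably read the environment and config.toml themselves), and treating only the value "1" as "set" is also an assumption; the description only shows the =1 form.

```python
from typing import Mapping, Optional

# Environment kill-switches named in the PR description.
KILL_SWITCHES = ("AAI_TELEMETRY_DISABLED", "DO_NOT_TRACK")


def consent_granted(env: Mapping[str, str], persisted: Optional[bool]) -> bool:
    """Opt-out model: enabled unless a kill-switch or a persisted False says no."""
    # Assumption: a switch counts as set only when its value is exactly "1".
    if any(env.get(name) == "1" for name in KILL_SWITCHES):
        return False  # environment always wins over the persisted choice
    # None means the user never chose, which is treated as enabled.
    return persisted is not False


def is_enabled(token: str, env: Mapping[str, str], persisted: Optional[bool]) -> bool:
    """Telemetry runs only with both a client token and consent."""
    return bool(token) and consent_granted(env, persisted)
```

Passing the environment and persisted choice in as arguments keeps the sketch easy to test; it also shows why blanking the token in the test suite is enough to switch telemetry off regardless of consent.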

https://claude.ai/code/session_017kPHAMnWAUs6XHVGk4FyWE

Model after the Supabase CLI: ship a write-only Datadog client token (pub…, embeddable by design; deliberately committed and allowlisted in gitleaks; never an API key) and send one allow-listed event per command run to the Datadog logs intake.
- aai_cli/telemetry.py: consent (AAI_TELEMETRY_DISABLED / DO_NOT_TRACK / persisted choice), event build (command path, outcome class, exit code, duration, version/OS/CI; never args, paths, or account data), and a detached flusher subprocess so commands never wait on telemetry; all send-side failures are swallowed.
- context.run_command wraps each command body in telemetry.track(), so CLIError outcomes keep their machine-readable error_type.
- config.py: random anonymous device_id + telemetry_enabled persisted in config.toml.
- aai telemetry status/enable/disable: the user-facing consent surface, documented in README ("Telemetry") with the full opt-out story.
- Test suite blanks the shipped token via an autouse fixture so no test spawns a real flusher; telemetry paths are covered by dedicated unit, command, and run_command-integration tests.

https://claude.ai/code/session_017kPHAMnWAUs6XHVGk4FyWE
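One way to read "allow-listed event" is sketched below. The field names, the build_event signature, and CI detection via a CI environment variable are all assumptions for illustration; only the categories of data come from the commit message.

```python
import os
import platform
from typing import Dict, Mapping

# Hypothetical field names; the categories mirror the commit message above.
ALLOWED_FIELDS = frozenset({
    "command", "outcome", "exit_code", "duration_ms",
    "cli_version", "os", "python_version", "ci", "device_id",
})


def build_event(raw: Mapping[str, object],
                env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Fill in environment facts, then drop every key that is not allow-listed."""
    event = dict(raw)
    event.setdefault("os", platform.system())
    event.setdefault("python_version", platform.python_version())
    # Assumption: a non-empty CI variable signals a CI environment.
    event.setdefault("ci", bool(env.get("CI")))
    # Last line of defense: arguments, paths, and so on never leave the process.
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}
```

Filtering at the end means a caller that accidentally passes extra keys (for example an args list) cannot widen what is sent.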

aikido-pr-checks · 2026-06-11T17:31:39Z

+# `phc_` keys ship in open-source CLIs. An API key (account secret) must never
+# appear here. Rotate in Datadog (Organization Settings → Client Tokens) if abused;
+# AAI_TELEMETRY_CLIENT_TOKEN overrides without a release.
+SHIPPED_CLIENT_TOKEN = "pub0d633113b9f7d22faff215fefaf30b43"


Exposed secret in aai_cli/telemetry.py - high severity
Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

Reply @AikidoSec ignore: [REASON] to ignore this issue.

@AikidoSec ignore: This is a Datadog client token (pub…) — the write-only credential class Datadog designs to be embedded in shipped client apps. It can submit telemetry events but cannot read any data, so it is not an account secret. It is committed deliberately (the same model as PostHog phc_ keys in open-source CLIs, e.g. the Supabase CLI) and is allowlisted by exact value in .gitleaks.toml; any other real-looking token still fails the secret-scan gate.

Generated by Claude Code

✅ Based on your feedback, we ignored this issue for the reason given above.

aikido-pr-checks · 2026-06-11T17:31:40Z

+    # The committed credential must be a Datadog *client* token (pub…, write-only,
+    # embeddable by design) — never an API key. The autouse fixture blanks it for
+    # the suite and hands back the real value for exactly this assertion.
+    assert neutralize_shipped_token == "pub0d633113b9f7d22faff215fefaf30b43"


Exposed secret in tests/test_telemetry.py - low severity
Detected a Generic API Key, potentially exposing access to various services and sensitive operations.


@AikidoSec ignore: Same value as the finding in aai_cli/telemetry.py — a write-only Datadog client token (pub…), committed deliberately and allowlisted by exact value in .gitleaks.toml. The test asserts the shipped constant is exactly this public client token (and never an API-key shape).

Generated by Claude Code

✅ Based on your feedback, we ignored this issue for the reason given above.

aikido-pr-checks · 2026-06-11T17:31:41Z

+# network. The payload travels via argv (the Vercel CLI hands its flusher the
+# serialized request the same way); it carries only the event + the write-only
+# public token, so argv visibility is acceptable.
+_FLUSH_SNIPPET = "from aai_cli import telemetry; telemetry.flush_argv()"


Spawning a detached Python subprocess via a constructed -c snippet and discarding stdio hides runtime behavior; consider avoiding dynamic -c execution or make the subprocess invocation explicit and visible to reviewers.

Details

✨ AI Reasoning
The change introduces a detached flusher subprocess that runs Python code via the command-line -c snippet and discards stdio. The payload is passed as an argv string and then parsed in the child. Using subprocess.Popen with start_new_session=True and redirecting stdout/stderr to DEVNULL hides the child process's activity from the user and logs. Executing a dynamically-constructed snippet with -c increases the difficulty of static review and can conceal side effects. These patterns (dynamic -c execution, payload via argv, suppressed stdio, detached session) were not present before and were added by this diff. This raises a transparency/obfuscation concern under the rule: intent of that runtime behavior is hidden from observers and reviewers.

🔧 How do I fix it?
Ensure code is transparent and not intentionally obfuscated. Avoid hiding functionality from code review. Focus on intent and deception, not specific patterns.

Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.

aikido-pr-checks · 2026-06-11T17:31:42Z

+def dispatch(event: Mapping[str, object]) -> None:
+    """Hand one event to a detached flusher process and return immediately."""
+    payload = json.dumps({"url": intake_url(), "token": client_token(), "event": event})
+    subprocess.Popen(


Dispatch uses subprocess.Popen to run a hidden flusher (-c snippet) with the payload passed via argv and stdio discarded; this conceals execution and data flow from reviewers.

Details

✨ AI Reasoning
dispatch() now serializes event+token into JSON and passes it to a detached child via process argv (subprocess.Popen([... , '-c', _FLUSH_SNIPPET, payload], ...)). Passing sensitive or operational data via dynamically passed argv to an -c snippet and running it with stdio suppressed obscures what is being sent and executed. While the payload here is intended to be public telemetry, this pattern was added by the PR and introduces an obfuscation-like execution path that makes review and auditing harder.

🔧 How do I fix it?
Ensure code is transparent and not intentionally obfuscated. Avoid hiding functionality from code review. Focus on intent and deception, not specific patterns.


aikido-pr-checks · 2026-06-11T17:31:43Z

+    )
+
+
+def flush_argv() -> None:


flush_argv() reads and executes serialized payloads from sys.argv[1], invoked by a dynamically-executed -c snippet; this runtime indirection conceals the request flow and complicates auditability.

Details

✨ AI Reasoning
flush_argv() reads sys.argv[1] (the payload) and then loads and posts it. This function is intended to be called by the dynamically-executed -c snippet in the detached child. That dynamic invocation pattern (assembling a snippet referencing module functions and passing a serialized payload through argv) is a deliberate runtime indirection added by this diff, which obscures the actual network request flow from casual code review.

🔧 How do I fix it?
Ensure code is transparent and not intentionally obfuscated. Avoid hiding functionality from code review. Focus on intent and deception, not specific patterns.


_fake_ytdlp replaces sys.modules["yt_dlp"] with a non-package namespace, so the --download-sections path's `from yt_dlp.utils import …` only resolved when an earlier test had already imported the real module — passing or failing with pytest-randomly's seed. Cache the real yt_dlp.utils first so the import is deterministic. https://claude.ai/code/session_017kPHAMnWAUs6XHVGk4FyWE
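The order dependence described above can be reproduced with a made-up package name instead of yt_dlp. In this sketch demo_pkg, its utils submodule, and helper are all hypothetical; only the sys.modules mechanics are the point.

```python
import importlib
import sys
import types


def import_helper():
    # Mirrors `from yt_dlp.utils import ...`, using a made-up package name.
    return importlib.import_module("demo_pkg.utils").helper


real_utils = types.ModuleType("demo_pkg.utils")
real_utils.helper = lambda: "real helper"

# A fake parent with no __path__, as a test double might install.
sys.modules["demo_pkg"] = types.SimpleNamespace()

try:
    import_helper()
    failed_without_cache = False
except ModuleNotFoundError:
    # The parent is "not a package" and the submodule is not cached yet.
    failed_without_cache = True

# Caching the real submodule first makes the lookup independent of test order.
sys.modules["demo_pkg.utils"] = real_utils
resolved = import_helper()

# Clean up so the demo leaves no trace in the interpreter.
del sys.modules["demo_pkg"], sys.modules["demo_pkg.utils"]
```

The import system checks sys.modules for the full dotted name before consulting the parent's __path__, which is why a cached submodule resolves even under a non-package parent.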

Address the Aikido review findings on the dispatch path: spawning a dynamically-constructed `python -c` snippet is needlessly opaque. The detached flusher is now the CLI's own hidden `telemetry flush` command (the same shape the Vercel CLI uses) — an explicit, reviewable entry point. The child env sets AAI_TELEMETRY_DISABLED=1 so a flush can never spawn another flusher, and a test pins that `flush` stays hidden from help. https://claude.ai/code/session_017kPHAMnWAUs6XHVGk4FyWE
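The reworked dispatch path might look roughly like the sketch below. The helper names and the dispatch signature are hypothetical, and passing the JSON payload as a trailing argument to the hidden flush command is an assumption carried over from the earlier argv design; the `python -m aai_cli telemetry flush` spawn and the AAI_TELEMETRY_DISABLED=1 child environment come from the commit messages.

```python
import json
import os
import subprocess
import sys
from typing import Dict, List, Mapping


def flush_command(payload: str) -> List[str]:
    """Explicit, reviewable entry point: the CLI's own hidden flush command."""
    # Assumption: the payload is handed over as the final argv element.
    return [sys.executable, "-m", "aai_cli", "telemetry", "flush", payload]


def child_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Disable telemetry in the child so a flush can never spawn another flusher."""
    return {**env, "AAI_TELEMETRY_DISABLED": "1"}


def dispatch(event: Mapping[str, object], url: str, token: str) -> None:
    """Hand one event to a detached flusher process and return immediately."""
    payload = json.dumps({"url": url, "token": token, "event": dict(event)})
    try:
        subprocess.Popen(
            flush_command(payload),
            env=child_env(os.environ),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            start_new_session=True,  # detach from the parent's session (POSIX)
        )
    except OSError:
        pass  # send-side failures are swallowed; the command must not break
```

Because the child is never waited on, the parent command's exit time does not depend on the network, and the kill-switch in the child's environment breaks any possible spawn loop.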

Resolves the tests/test_youtube.py conflict by taking main's version of the flake fix (both sides fixed the same order-dependent yt_dlp.utils import identically, differing only in the comment), keeps the telemetry bullet in AGENTS.md, and follows the CLI rename through the telemetry surface: command examples, the status hint, README disclosure, and the command-path assertions (events now record "assembly telemetry status"). Internals are unchanged — the package is still aai_cli, so the detached flusher's `python -m aai_cli telemetry flush` spawn and the AAI_* telemetry env vars keep working. https://claude.ai/code/session_017kPHAMnWAUs6XHVGk4FyWE

aikido-pr-checks Bot reviewed Jun 11, 2026


alexkroman enabled auto-merge (squash) June 11, 2026 17:31

aikido-pr-checks Bot reviewed Jun 11, 2026


claude added 3 commits June 11, 2026 17:41

alexkroman merged commit 461862f into main Jun 11, 2026
11 checks passed

alexkroman deleted the claude/sleepy-allen-yuemuz branch June 11, 2026 17:54

alexkroman mentioned this pull request Jun 11, 2026

Add Datadog CLI-usage dashboard + JSON validity gate #67

Merged

alexkroman merged 4 commits into main from claude/sleepy-allen-yuemuz
