Full-spectrum review: P0/P1/P2 fixes (security, release-eng, perf, DX, tests) by alphaonedev · Pull Request #94 · alphaonedev/grok-cli

alphaonedev · 2026-04-28T17:53:14Z

Summary

Full-spectrum review of grok-cli by 6 parallel review agents (security, architecture, code quality, performance, UX/DX, reliability) followed by execution of every actionable P0/P1/P2 finding except multi-day refactors.

Outcomes

257 → 281 tests pass; tsc --noEmit clean; biome adds zero new warnings.
−8,120 / +1,500 lines net; 25 deps removed; binary should drop ~15–20 MB.
9 commits, each scoped to one concern.

Commits

Commit	Highlights
`chore: remove dead OpenTUI UI subsystem`	Deletes `src/ui/` (7,541 LOC, 12 files), `patches/`, `@opentui/*`, 21 unused tree-sitter packages, `web-tree-sitter`, the brittle `postinstall` hook patching `@opentui/core`'s bundle by hash. The Ink UI in `src/ui-ink/` is the only one ever loaded (`src/index.ts:68`); nothing outside `src/ui/` ever imported from it.
`chore: enable noUncheckedIndexedAccess + clean stale AGENTS.md`	TypeScript hardening across 17 files (82 violations → 0). Removes two stale "Known issues" entries in AGENTS.md whose target files no longer exist.
`chore: replace silent .catch(()=>{}) with debugLogger`	New `src/utils/debug-log.ts`: gated logger writes to `~/.grok/debug.log` only when `GROK_DEBUG` is set. 26 silent fire-and-forget catches now leave breadcrumbs without changing behavior.
`perf(ui): O(N)→O(1) tool lookup, debounce markdown parses`	Tool result lookup uses `Map<id, entry>` instead of `tools.find`. `MarkdownView` accepts `streaming` prop that debounces re-parses to 120ms during streaming (≈8/sec instead of ≈50/sec).
`feat: API key onboarding, exit codes, crash log`	Missing-API-key error now points to https://console.x.ai with formatted setup instructions. Differentiated exit codes (0/1/2/3/4) with named constants. New `src/utils/crash-log.ts` — sanitized snapshots to `~/.grok/crash.log` (mode 0600) with secret redaction (`GROK_API_KEY`, `TELEGRAM_BOT_TOKEN`, `sk-`, `xai-`, etc).
`security(P0): wallet encryption + schedule env allowlist + dir validation`	(1) `~/.grok/wallet.json` privateKey now AES-256-GCM-encrypted via existing `crypto.ts`; backward-compatible migration. (2) `schedule.ts` daemon/headless spawn no longer spreads `process.env`; explicit allowlist + blocklist for `TELEGRAM_BOT_TOKEN`/`OPENAI_API_KEY`/`AWS_*`. (3) New `validateScheduleDirectory()`: realpath-resolves, requires `isDirectory`, rejects `/etc /usr /sbin /bin /System /Library /Applications` and `/private` mirrors.
`release(P0): atomic install + size-verified downloads`	Root-causes the truncated-binary failure mode you hit (71 MB release landed as 21 MB → Gatekeeper SIGKILL). `install.sh` now: HEAD probes Content-Length, downloads with retry, verifies size on disk, refuses empty checksums.txt, atomic `.new` → mv install. `install-manager.ts` mirrors: Content-Length verify in `downloadBinary`, retry with exponential backoff for both binary and text fetches, atomic `.part` → rename.
`feat: CI test gate, macOS notarize scaffold, sandbox warning, telegram rate-limit, JSONL spec`	(1) Vitest now runs in CI between typecheck and binary build. (2) Optional codesign + notarytool job gated on five GitHub Actions secrets (documented in `docs/RELEASE_SIGNING.md`). (3) Yellow stderr banner when sandbox is OFF (suppressible via `GROK_SUPPRESS_SANDBOX_WARNING=1`). (4) Per-user sliding-window rate limit (10/60s default) in Telegram bridge. (5) New `docs/HEADLESS_JSON_SPEC.md` documenting all five JSONL event types.
`test: add coverage for payments/hooks/migrations`	24 new tests in 3 new files for modules that previously had zero coverage.

Notable findings flagged but NOT auto-executed

src/utils/install-manager.ts:10 references superagent-ai/grok-cli as the auto-update repo, but install.sh and this org are alphaonedev/grok-cli. Looks like a stale fork reference. Did not auto-flip — this changes runtime behavior (where users' grok update looks).
Sandbox-off default is unchanged. The warning banner ships in this PR; flipping the default to --sandbox is user-visible and belongs in a major version bump.
Settings facade and ToolRegistry refactors (Architecture agent P2) deliberately deferred — multi-day refactors that need careful migration across 15+ modules.
macOS notarization workflow YAML ships in this PR but is gated on five GitHub Actions secrets that need to be added: APPLE_DEVELOPER_ID_APPLICATION_CERT_BASE64, APPLE_DEVELOPER_ID_APPLICATION_CERT_PASSWORD, APPLE_ID, APPLE_APP_SPECIFIC_PASSWORD, APPLE_TEAM_ID. See docs/RELEASE_SIGNING.md for setup. Without these, the workflow silently skips signing and ships unsigned binaries the way it does today.

Test plan

bun install (lockfile is regenerated; 25 packages removed)
bun run typecheck → expect clean
bun run test → expect 281/281
bun run lint → expect ≤10 pre-existing warnings (none added by this PR)
bun run dev --help → expect normal commander help output (also confirms the AGENTS.md "dev mode broken" entry was indeed stale)
bun build:binary → standalone binary still compiles (no longer depends on @opentui/*)
Manual: grok (interactive Ink UI) launches normally
Manual: existing ~/.grok/wallet.json (plaintext-on-disk pre-encryption) continues to load and gets transparently re-encrypted on first read

Review guidance

Each commit is independently reviewable. Recommended order: 1 (cleanup, biggest diff but lowest risk), 2 (TS hardening), 3 (logger plumbing), 4 (perf), 5 (UX), 6 (security), 7 (release engineering), 8 (CI/runtime/docs batch), 9 (tests).

🤖 Generated with Claude Code

Nothing outside src/ui/ imports anything from src/ui/. The live UI is src/ui-ink/ (loaded at src/index.ts:68 via dynamic import). The OpenTUI implementation was fully self-contained dead code. - Delete src/ui/ (12 files, ~7,541 LOC) - Delete patches/ (apply-languages.cjs, markdown-highlights.scm) - Drop @opentui/core, @opentui/react from package.json - Drop 21 unused tree-sitter-* deps and web-tree-sitter (only consumer was the deleted patches/apply-languages.cjs script) - Drop postinstall hook (was patching @opentui/core's bundle, brittle against a hardcoded chunk filename: index-kgg0v67t.js) - Remove jsxImportSource: "@opentui/react" from tsconfig.json (Ink uses standard React JSX) - Fix CLI --description string and stale comments - 25 packages removed from bun.lock All 257 remaining tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds "noUncheckedIndexedAccess": true to tsconfig.json so TypeScript flags potentially-undefined array/record accesses. Resolved all 82 surfaced violations across 17 files: - Refactored bounded for-loops to for..of (terminal-markdown.ts:155, removing 23 violations in one change) - Added explicit guards or ?? "" fallbacks at parse boundaries (skills.ts, toon.ts, schedule.ts cron parser, install-manager.ts checksum parser, verify/recipes.ts, verify/checkpoint.ts) - Tests use non-null assertions where the index is known-valid (tools.test.ts, lsp-tools.test.ts, task-tracker.test.ts, manager.test.ts, checkpoint.test.ts) - compaction.ts findCutPoint: explicit guard for the outer for-loop and ?? fallbacks on cutPoints[0]/cutPoints.at(-1) Also removes two stale "Known issues" entries from AGENTS.md: - The .eslintrc.js issue (no such file exists; Biome is the only linter) - The bun run dev import-type bug (target files no longer exist — src/utils/model-config.ts and src/utils/settings-manager.ts; bun run dev --help works clean on this commit) All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

26 fire-and-forget call sites swallowed errors silently, making hook/MCP/clipboard failures undebuggable. Added a tiny gated logger that writes to ~/.grok/debug.log only when GROK_DEBUG is set; default behavior is unchanged (no log file unless opted in). - New: src/utils/debug-log.ts — debugLogger(scope) returns a catch-safe (err) => void that appends timestamped lines. - Replaced .catch(() => {}) at 26 sites across agent.ts, bash.ts, checkpoint.ts, instructions.ts, npm-cache.ts, mcp/runtime.ts, grok/tools.ts, telegram/headless-bridge.ts, typing-refresh.ts. - Replaced empty catch {} block in headless/output.ts arg-display fallback. The behavior preserved: hook failures still don't crash the agent, MCP cleanup errors still don't propagate. They just leave a breadcrumb when a user wants to debug. All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two perf hotspots flagged in the spectrum review: 1. tools.find by call.id during tool_result events ran in O(N) per chunk. With 100+ tool calls in a long session, every result scanned the full array. Replaced with a Map<string, entry> indexed by call id; tools[] array kept for ordered render. 2. MarkdownView re-parsed the entire accumulated stream buffer on every token (~50/sec) because content changed every chunk. Now accepts an optional `streaming` prop: - streaming=true (the live stream block): debounce parses to 120ms, show last successful parse (or raw text until first parse completes) in between. - streaming=false (Static completed messages): parse synchronously as before. Static already memoizes per message, so no regression. Net: streaming responses now parse ~8 times/sec instead of ~50/sec without changing the user-visible end state. All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three small UX/observability fixes from the spectrum review: 1. Missing API key error now points users at https://console.x.ai and shows all three valid setup paths formatted readably. 2. Differentiated exit codes (named constants in index.ts): 0 success, 1 user error, 2 transient, 3 agent error, 4 panic The bare `process.exit(1)` calls now use the right code; CI pipelines can distinguish "user typo" from "model API down" from "internal panic". 3. New src/utils/crash-log.ts: on uncaughtException / unhandledRejection, writes a sanitized snapshot (timestamp, kind, version, node, platform, argv, cwd, env subset, full stack) to ~/.grok/crash.log (mode 0600). Redacts: - GROK_API_KEY, TELEGRAM_BOT_TOKEN, OPENAI_API_KEY, ANTHROPIC_API_KEY env values entirely - sk-*, xai-*, ghp_*, gho_*, and Telegram bot-token shapes found anywhere in stack/argv via regex Only PATH/HOME/SHELL and GROK_*/TELEGRAM_* env keys are recorded (so an unsanitized random env var can't leak). The panic handlers now also tell the user where the log lives. All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tion Three high-severity findings from the spectrum review. 1. Wallet private key now encrypted at rest (src/wallet/manager.ts). Previously ~/.grok/wallet.json stored the privateKey in plaintext (mode 0600). Anyone with read access to the home directory could drain the wallet. Now the privateKey field passes through utils/crypto.ts (AES-256-GCM, key derived from GROK_STORAGE_KEY or per-machine fallback). Address/chain/createdAt remain plaintext for visibility — addresses are public anyway. Backward-compatible: pre-encryption wallet files still parse and are migrated to encrypted form on first read. 2. Daemon and detached headless spawns no longer spread {...process.env} (src/tools/schedule.ts). Replaced with buildSpawnEnv() that: - Allowlists what the child needs: PATH/HOME/SHELL/USER/LANG/TERM/ TMPDIR/TZ/EDITOR + GROK_*/NODE_*/BUN_*/LC_*/XDG_* prefixes. - Blocks TELEGRAM_BOT_TOKEN, OPENAI_API_KEY, ANTHROPIC_API_KEY, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN even if they slip past. On Linux, /proc/<pid>/environ is readable to the same UID; on shared machines and audit pipelines this stops unrelated secrets from leaking into a process whose only job is running scheduled prompts. 3. Schedule directory traversal hardened. New validateScheduleDirectory() - realpath-resolves the path (defeats symlink escapes) - requires it to exist and be a directory - rejects sensitive system roots: /etc /usr /sbin /bin /boot /proc /sys /dev /root /System /Library /Applications and their /private mirrors on macOS. Wired into both schedule create (resolveScheduleDirectory) and the detached spawn path (startDetachedHeadlessRun) so a poisoned stored schedule re-validates at run time, not just create time. Test updated to expect realpath-canonicalized directory (macOS: /var/folders/... -> /private/var/folders/...). All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The truncated-binary failure mode is now blocked at both install paths. Real-world signal: a user hit a 71MB darwin-arm64 release that landed on disk as 21MB. macOS Gatekeeper SIGKILL'd the binary because the Mach-O LINKEDIT segment referenced data past EOF. The installer never caught the truncation: curl -fSL returns exit 0 on incomplete downloads (it only fails on HTTP errors), and arrayBuffer() in install-manager silently buffered whatever bytes arrived before the connection dropped. install.sh: - New file_size() helper (portable across BSD stat / GNU stat / wc). - New download_with_retry(): probes Content-Length via HEAD, then curls with --retry 3 --retry-connrefused --connect-timeout 30 --max-time 1800. After each successful HTTP exchange, verifies bytes-on-disk == Content-Length; on mismatch, deletes and retries. - install_downloaded_release(): refuses to proceed if checksums.txt came back empty (was previously silent and let the install pass with no real verification when the metadata fetch failed). - Atomic install: copy → .new staging → mv -f. No more half-written binary in INSTALL_DIR if the script is interrupted mid-copy. src/utils/install-manager.ts (auto-update path, same hardening): - downloadBinary(): reads Content-Length header, enforces buf.length === expectedSize after arrayBuffer(), retries up to 3× with exponential backoff (500ms / 1s / 2s), writes to .part staging file then renames atomically. - downloadText(): same retry policy for the checksums.txt fetch. All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…m rate-limit, JSONL spec Closes the rest of the spectrum-review backlog except the multi-day refactors (Settings facade / ToolRegistry). CI: - .github/workflows/typecheck.yml now runs `bun run test` (vitest) between typecheck and build:binary. 47 test files were never exercised on PR before this. Release pipeline: - New optional codesign + notarytool job in .github/workflows/release.yml for macos-latest. Gated on APPLE_DEVELOPER_ID_APPLICATION_CERT_BASE64 presence — silently skipped if secrets aren't set, so the workflow still ships unsigned binaries the way it did before. - New docs/RELEASE_SIGNING.md walks through the five required GitHub Actions secrets and how to export the .p12 from Keychain. Apply these and macOS first-run Gatekeeper warnings go away. Runtime safety: - src/index.ts: warnIfSandboxOff() prints a yellow-ANSI stderr banner when sandbox is OFF (current default). Suppressible via GROK_SUPPRESS_SANDBOX_WARNING=1 for users who've made an informed choice. Did NOT flip the default — that is a breaking change that belongs in a major version bump. - src/telegram/bridge.ts: per-user sliding-window rate limit (default 10 messages per 60s). Tunable via GROK_TELEGRAM_RATE_LIMIT_MAX and GROK_TELEGRAM_RATE_LIMIT_WINDOW_MS. Bounds API spend if the bot token leaks or an approved user goes rogue. Rejection reply uses the same "Rate limit reached" message format. Docs: - docs/HEADLESS_JSON_SPEC.md: full JSONL stream schema for `--format json`. All five event types (step_start, text, tool_use, step_finish, error) documented with field tables, ordering guarantees, the new exit-code matrix, and pipe-friendly jq examples. Source of truth still src/headless/output.ts; this doc is the human-readable mirror. All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The reliability agent flagged these three modules had zero tests. Added 24 tests across 3 new files: - src/storage/migrations.test.ts (4 tests): exercises applyMigrations via an in-memory SQLiteDatabase mock that records exec/pragma/ transaction calls. Verifies fresh-DB sequence (user_version 0→2), no-op at latest, partial v1→v2 upgrade only adds compactions, and the whole sequence stays inside one db.transaction() boundary. (bun:sqlite isn't available under vitest's Node runtime, so symbolic mock instead of a real ":memory:" DB.) - src/hooks/config.test.ts (12 tests): isHookEvent against every HOOK_EVENTS entry, getMatchingHooks with no-matcher wildcards / specific matchers / multiple-collected entries, and getMatchQuery's switch-by-event-name (PreToolUse → tool_name, SessionStart → source, subagent/task → agent_type, compaction → trigger, default → undefined). - src/payments/service.test.ts (8 tests): formatInspectionOutput for non-payment responses (string vs object data), 402 responses with multiple options, the amount → maxAmountRequired → price → "0" fallback chain, and Brin security block rendering with subScore null-skipping and threat formatting. 49 test files now (was 46); 281 tests pass (was 257). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bundles the spectrum-review fixes into a release: - package.json: 1.6.0 -> 1.7.0 - CHANGELOG.md: detailed v1.7.0 entry covering security, release-eng, performance, code quality, UX, docs, and test deltas - README.md: env vars table (GROK_DEBUG, sandbox warning toggle, Telegram rate limit), exit-code matrix, "Built with" section crediting React Ink (vadimdemedes/ink), Bun, marked, marked-terminal, chalk, Vercel AI SDK, zod, grammY, Vitest, Biome - docs/index.html (GitHub Pages): meta description corrected to "Bun and React Ink" (was "Bun and Ink"), new "Built with" section with attribution links, links to HEADLESS_JSON_SPEC and RELEASE_SIGNING - src/utils/install-manager.ts:10 + tests: stale superagent-ai/grok-cli reference fixed to alphaonedev/grok-cli (auto-update was looking at the wrong repo). README/Pages footer still credits superagent-ai as fork upstream — that attribution is intentional. All 281 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fate and others added 10 commits April 28, 2026 12:54

alphaonedev merged commit dca3d1a into main Apr 28, 2026
2 checks passed

alphaonedev deleted the ai-nhi-fullspectrum-fixes branch April 28, 2026 18:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Full-spectrum review: P0/P1/P2 fixes (security, release-eng, perf, DX, tests)#94

Full-spectrum review: P0/P1/P2 fixes (security, release-eng, perf, DX, tests)#94
alphaonedev merged 10 commits intomainfrom
ai-nhi-fullspectrum-fixes

alphaonedev commented Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

alphaonedev commented Apr 28, 2026

Summary

Commits

Notable findings flagged but NOT auto-executed

Test plan

Review guidance

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant