Full-spectrum review: P0/P1/P2 fixes (security, release-eng, perf, DX, tests)#94
Merged
alphaonedev merged 10 commits intomainfrom Apr 28, 2026
Merged
Full-spectrum review: P0/P1/P2 fixes (security, release-eng, perf, DX, tests)#94alphaonedev merged 10 commits intomainfrom
alphaonedev merged 10 commits intomainfrom
Conversation
Nothing outside src/ui/ imports anything from src/ui/. The live UI is src/ui-ink/ (loaded at src/index.ts:68 via dynamic import). The OpenTUI implementation was fully self-contained dead code. - Delete src/ui/ (12 files, ~7,541 LOC) - Delete patches/ (apply-languages.cjs, markdown-highlights.scm) - Drop @opentui/core, @opentui/react from package.json - Drop 21 unused tree-sitter-* deps and web-tree-sitter (only consumer was the deleted patches/apply-languages.cjs script) - Drop postinstall hook (was patching @opentui/core's bundle, brittle against a hardcoded chunk filename: index-kgg0v67t.js) - Remove jsxImportSource: "@opentui/react" from tsconfig.json (Ink uses standard React JSX) - Fix CLI --description string and stale comments - 25 packages removed from bun.lock All 257 remaining tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds "noUncheckedIndexedAccess": true to tsconfig.json so TypeScript flags potentially-undefined array/record accesses. Resolved all 82 surfaced violations across 17 files: - Refactored bounded for-loops to for..of (terminal-markdown.ts:155, removing 23 violations in one change) - Added explicit guards or ?? "" fallbacks at parse boundaries (skills.ts, toon.ts, schedule.ts cron parser, install-manager.ts checksum parser, verify/recipes.ts, verify/checkpoint.ts) - Tests use non-null assertions where the index is known-valid (tools.test.ts, lsp-tools.test.ts, task-tracker.test.ts, manager.test.ts, checkpoint.test.ts) - compaction.ts findCutPoint: explicit guard for the outer for-loop and ?? fallbacks on cutPoints[0]/cutPoints.at(-1) Also removes two stale "Known issues" entries from AGENTS.md: - The .eslintrc.js issue (no such file exists; Biome is the only linter) - The bun run dev import-type bug (target files no longer exist — src/utils/model-config.ts and src/utils/settings-manager.ts; bun run dev --help works clean on this commit) All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
26 fire-and-forget call sites swallowed errors silently, making
hook/MCP/clipboard failures undebuggable. Added a tiny gated logger
that writes to ~/.grok/debug.log only when GROK_DEBUG is set; default
behavior is unchanged (no log file unless opted in).
- New: src/utils/debug-log.ts — debugLogger(scope) returns a
catch-safe (err) => void that appends timestamped lines.
- Replaced .catch(() => {}) at 26 sites across agent.ts, bash.ts,
checkpoint.ts, instructions.ts, npm-cache.ts, mcp/runtime.ts,
grok/tools.ts, telegram/headless-bridge.ts, typing-refresh.ts.
- Replaced empty catch {} block in headless/output.ts arg-display
fallback.
The behavior preserved: hook failures still don't crash the agent,
MCP cleanup errors still don't propagate. They just leave a
breadcrumb when a user wants to debug.
All 257 tests pass; tsc --noEmit passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two perf hotspots flagged in the spectrum review:
1. tools.find by call.id during tool_result events ran in O(N) per
chunk. With 100+ tool calls in a long session, every result scanned
the full array. Replaced with a Map<string, entry> indexed by call
id; tools[] array kept for ordered render.
2. MarkdownView re-parsed the entire accumulated stream buffer on
every token (~50/sec) because content changed every chunk. Now
accepts an optional `streaming` prop:
- streaming=true (the live stream block): debounce parses to 120ms,
show last successful parse (or raw text until first parse
completes) in between.
- streaming=false (Static completed messages): parse synchronously
as before. Static already memoizes per message, so no regression.
Net: streaming responses now parse ~8 times/sec instead of ~50/sec
without changing the user-visible end state.
All 257 tests pass; tsc --noEmit passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small UX/observability fixes from the spectrum review: 1. Missing API key error now points users at https://console.x.ai and shows all three valid setup paths formatted readably. 2. Differentiated exit codes (named constants in index.ts): 0 success, 1 user error, 2 transient, 3 agent error, 4 panic The bare `process.exit(1)` calls now use the right code; CI pipelines can distinguish "user typo" from "model API down" from "internal panic". 3. New src/utils/crash-log.ts: on uncaughtException / unhandledRejection, writes a sanitized snapshot (timestamp, kind, version, node, platform, argv, cwd, env subset, full stack) to ~/.grok/crash.log (mode 0600). Redacts: - GROK_API_KEY, TELEGRAM_BOT_TOKEN, OPENAI_API_KEY, ANTHROPIC_API_KEY env values entirely - sk-*, xai-*, ghp_*, gho_*, and Telegram bot-token shapes found anywhere in stack/argv via regex Only PATH/HOME/SHELL and GROK_*/TELEGRAM_* env keys are recorded (so an unsanitized random env var can't leak). The panic handlers now also tell the user where the log lives. All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion
Three high-severity findings from the spectrum review.
1. Wallet private key now encrypted at rest (src/wallet/manager.ts).
Previously ~/.grok/wallet.json stored the privateKey in plaintext
(mode 0600). Anyone with read access to the home directory could
drain the wallet. Now the privateKey field passes through
utils/crypto.ts (AES-256-GCM, key derived from GROK_STORAGE_KEY or
per-machine fallback). Address/chain/createdAt remain plaintext for
visibility — addresses are public anyway.
Backward-compatible: pre-encryption wallet files still parse and
are migrated to encrypted form on first read.
2. Daemon and detached headless spawns no longer spread {...process.env}
(src/tools/schedule.ts). Replaced with buildSpawnEnv() that:
- Allowlists what the child needs: PATH/HOME/SHELL/USER/LANG/TERM/
TMPDIR/TZ/EDITOR + GROK_*/NODE_*/BUN_*/LC_*/XDG_* prefixes.
- Blocks TELEGRAM_BOT_TOKEN, OPENAI_API_KEY, ANTHROPIC_API_KEY,
AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN even if they slip past.
On Linux, /proc/<pid>/environ is readable to the same UID; on shared
machines and audit pipelines this stops unrelated secrets from
leaking into a process whose only job is running scheduled prompts.
3. Schedule directory traversal hardened. New validateScheduleDirectory()
- realpath-resolves the path (defeats symlink escapes)
- requires it to exist and be a directory
- rejects sensitive system roots: /etc /usr /sbin /bin /boot /proc
/sys /dev /root /System /Library /Applications and their /private
mirrors on macOS.
Wired into both schedule create (resolveScheduleDirectory) and the
detached spawn path (startDetachedHeadlessRun) so a poisoned stored
schedule re-validates at run time, not just create time.
Test updated to expect realpath-canonicalized directory (macOS:
/var/folders/... -> /private/var/folders/...).
All 257 tests pass; tsc --noEmit passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The truncated-binary failure mode is now blocked at both install paths. Real-world signal: a user hit a 71MB darwin-arm64 release that landed on disk as 21MB. macOS Gatekeeper SIGKILL'd the binary because the Mach-O LINKEDIT segment referenced data past EOF. The installer never caught the truncation: curl -fSL returns exit 0 on incomplete downloads (it only fails on HTTP errors), and arrayBuffer() in install-manager silently buffered whatever bytes arrived before the connection dropped. install.sh: - New file_size() helper (portable across BSD stat / GNU stat / wc). - New download_with_retry(): probes Content-Length via HEAD, then curls with --retry 3 --retry-connrefused --connect-timeout 30 --max-time 1800. After each successful HTTP exchange, verifies bytes-on-disk == Content-Length; on mismatch, deletes and retries. - install_downloaded_release(): refuses to proceed if checksums.txt came back empty (was previously silent and let the install pass with no real verification when the metadata fetch failed). - Atomic install: copy → .new staging → mv -f. No more half-written binary in INSTALL_DIR if the script is interrupted mid-copy. src/utils/install-manager.ts (auto-update path, same hardening): - downloadBinary(): reads Content-Length header, enforces buf.length === expectedSize after arrayBuffer(), retries up to 3× with exponential backoff (500ms / 1s / 2s), writes to .part staging file then renames atomically. - downloadText(): same retry policy for the checksums.txt fetch. All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…m rate-limit, JSONL spec Closes the rest of the spectrum-review backlog except the multi-day refactors (Settings facade / ToolRegistry). CI: - .github/workflows/typecheck.yml now runs `bun run test` (vitest) between typecheck and build:binary. 47 test files were never exercised on PR before this. Release pipeline: - New optional codesign + notarytool job in .github/workflows/release.yml for macos-latest. Gated on APPLE_DEVELOPER_ID_APPLICATION_CERT_BASE64 presence — silently skipped if secrets aren't set, so the workflow still ships unsigned binaries the way it did before. - New docs/RELEASE_SIGNING.md walks through the five required GitHub Actions secrets and how to export the .p12 from Keychain. Apply these and macOS first-run Gatekeeper warnings go away. Runtime safety: - src/index.ts: warnIfSandboxOff() prints a yellow-ANSI stderr banner when sandbox is OFF (current default). Suppressible via GROK_SUPPRESS_SANDBOX_WARNING=1 for users who've made an informed choice. Did NOT flip the default — that is a breaking change that belongs in a major version bump. - src/telegram/bridge.ts: per-user sliding-window rate limit (default 10 messages per 60s). Tunable via GROK_TELEGRAM_RATE_LIMIT_MAX and GROK_TELEGRAM_RATE_LIMIT_WINDOW_MS. Bounds API spend if the bot token leaks or an approved user goes rogue. Rejection reply uses the same "Rate limit reached" message format. Docs: - docs/HEADLESS_JSON_SPEC.md: full JSONL stream schema for `--format json`. All five event types (step_start, text, tool_use, step_finish, error) documented with field tables, ordering guarantees, the new exit-code matrix, and pipe-friendly jq examples. Source of truth still src/headless/output.ts; this doc is the human-readable mirror. All 257 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The reliability agent flagged these three modules had zero tests. Added 24 tests across 3 new files: - src/storage/migrations.test.ts (4 tests): exercises applyMigrations via an in-memory SQLiteDatabase mock that records exec/pragma/ transaction calls. Verifies fresh-DB sequence (user_version 0→2), no-op at latest, partial v1→v2 upgrade only adds compactions, and the whole sequence stays inside one db.transaction() boundary. (bun:sqlite isn't available under vitest's Node runtime, so symbolic mock instead of a real ":memory:" DB.) - src/hooks/config.test.ts (12 tests): isHookEvent against every HOOK_EVENTS entry, getMatchingHooks with no-matcher wildcards / specific matchers / multiple-collected entries, and getMatchQuery's switch-by-event-name (PreToolUse → tool_name, SessionStart → source, subagent/task → agent_type, compaction → trigger, default → undefined). - src/payments/service.test.ts (8 tests): formatInspectionOutput for non-payment responses (string vs object data), 402 responses with multiple options, the amount → maxAmountRequired → price → "0" fallback chain, and Brin security block rendering with subScore null-skipping and threat formatting. 49 test files now (was 46); 281 tests pass (was 257). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundles the spectrum-review fixes into a release: - package.json: 1.6.0 -> 1.7.0 - CHANGELOG.md: detailed v1.7.0 entry covering security, release-eng, performance, code quality, UX, docs, and test deltas - README.md: env vars table (GROK_DEBUG, sandbox warning toggle, Telegram rate limit), exit-code matrix, "Built with" section crediting React Ink (vadimdemedes/ink), Bun, marked, marked-terminal, chalk, Vercel AI SDK, zod, grammY, Vitest, Biome - docs/index.html (GitHub Pages): meta description corrected to "Bun and React Ink" (was "Bun and Ink"), new "Built with" section with attribution links, links to HEADLESS_JSON_SPEC and RELEASE_SIGNING - src/utils/install-manager.ts:10 + tests: stale superagent-ai/grok-cli reference fixed to alphaonedev/grok-cli (auto-update was looking at the wrong repo). README/Pages footer still credits superagent-ai as fork upstream — that attribution is intentional. All 281 tests pass; tsc --noEmit passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Full-spectrum review of grok-cli by 6 parallel review agents (security, architecture, code quality, performance, UX/DX, reliability) followed by execution of every actionable P0/P1/P2 finding except multi-day refactors.
Outcomes
Commits
chore: remove dead OpenTUI UI subsystemsrc/ui/(7,541 LOC, 12 files),patches/,@opentui/*, 21 unused tree-sitter packages,web-tree-sitter, the brittlepostinstallhook patching@opentui/core's bundle by hash. The Ink UI insrc/ui-ink/is the only one ever loaded (src/index.ts:68); nothing outsidesrc/ui/ever imported from it.chore: enable noUncheckedIndexedAccess + clean stale AGENTS.mdchore: replace silent .catch(()=>{}) with debugLoggersrc/utils/debug-log.ts: gated logger writes to~/.grok/debug.logonly whenGROK_DEBUGis set. 26 silent fire-and-forget catches now leave breadcrumbs without changing behavior.perf(ui): O(N)→O(1) tool lookup, debounce markdown parsesMap<id, entry>instead oftools.find.MarkdownViewacceptsstreamingprop that debounces re-parses to 120ms during streaming (≈8/sec instead of ≈50/sec).feat: API key onboarding, exit codes, crash logsrc/utils/crash-log.ts— sanitized snapshots to~/.grok/crash.log(mode 0600) with secret redaction (GROK_API_KEY,TELEGRAM_BOT_TOKEN,sk-*,xai-*, etc).security(P0): wallet encryption + schedule env allowlist + dir validation~/.grok/wallet.jsonprivateKey now AES-256-GCM-encrypted via existingcrypto.ts; backward-compatible migration. (2)schedule.tsdaemon/headless spawn no longer spreadsprocess.env; explicit allowlist + blocklist forTELEGRAM_BOT_TOKEN/OPENAI_API_KEY/AWS_*. (3) NewvalidateScheduleDirectory(): realpath-resolves, requiresisDirectory, rejects/etc /usr /sbin /bin /System /Library /Applicationsand/privatemirrors.release(P0): atomic install + size-verified downloadsinstall.shnow: HEAD probes Content-Length, downloads with retry, verifies size on disk, refuses empty checksums.txt, atomic.new→ mv install.install-manager.tsmirrors: Content-Length verify indownloadBinary, retry with exponential backoff for both binary and text fetches, atomic.part→ rename.feat: CI test gate, macOS notarize scaffold, sandbox warning, telegram rate-limit, JSONL specdocs/RELEASE_SIGNING.md). (3) Yellow stderr banner when sandbox is OFF (suppressible viaGROK_SUPPRESS_SANDBOX_WARNING=1). (4) Per-user sliding-window rate limit (10/60s default) in Telegram bridge. (5) Newdocs/HEADLESS_JSON_SPEC.mddocumenting all five JSONL event types.test: add coverage for payments/hooks/migrationsNotable findings flagged but NOT auto-executed
src/utils/install-manager.ts:10referencessuperagent-ai/grok-clias the auto-update repo, but install.sh and this org arealphaonedev/grok-cli. Looks like a stale fork reference. Did not auto-flip — this changes runtime behavior (where users'grok updatelooks).--sandboxis user-visible and belongs in a major version bump.APPLE_DEVELOPER_ID_APPLICATION_CERT_BASE64,APPLE_DEVELOPER_ID_APPLICATION_CERT_PASSWORD,APPLE_ID,APPLE_APP_SPECIFIC_PASSWORD,APPLE_TEAM_ID. Seedocs/RELEASE_SIGNING.mdfor setup. Without these, the workflow silently skips signing and ships unsigned binaries the way it does today.Test plan
bun install(lockfile is regenerated; 25 packages removed)bun run typecheck→ expect cleanbun run test→ expect 281/281bun run lint→ expect ≤10 pre-existing warnings (none added by this PR)bun run dev --help→ expect normal commander help output (also confirms the AGENTS.md "dev mode broken" entry was indeed stale)bun build:binary→ standalone binary still compiles (no longer depends on @opentui/*)grok(interactive Ink UI) launches normally~/.grok/wallet.json(plaintext-on-disk pre-encryption) continues to load and gets transparently re-encrypted on first readReview guidance
Each commit is independently reviewable. Recommended order: 1 (cleanup, biggest diff but lowest risk), 2 (TS hardening), 3 (logger plumbing), 4 (perf), 5 (UX), 6 (security), 7 (release engineering), 8 (CI/runtime/docs batch), 9 (tests).
🤖 Generated with Claude Code