Skip to content

Full-spectrum review: P0/P1/P2 fixes (security, release-eng, perf, DX, tests)#94

Merged
alphaonedev merged 10 commits intomainfrom
ai-nhi-fullspectrum-fixes
Apr 28, 2026
Merged

Full-spectrum review: P0/P1/P2 fixes (security, release-eng, perf, DX, tests)#94
alphaonedev merged 10 commits intomainfrom
ai-nhi-fullspectrum-fixes

Conversation

@alphaonedev
Copy link
Copy Markdown
Owner

Summary

Full-spectrum review of grok-cli by 6 parallel review agents (security, architecture, code quality, performance, UX/DX, reliability) followed by execution of every actionable P0/P1/P2 finding except multi-day refactors.

Outcomes

  • 257 → 281 tests pass; tsc --noEmit clean; biome adds zero new warnings.
  • −8,120 / +1,500 lines net; 25 deps removed; binary should drop ~15–20 MB.
  • 9 commits, each scoped to one concern.

Commits

Commit Highlights
chore: remove dead OpenTUI UI subsystem Deletes src/ui/ (7,541 LOC, 12 files), patches/, @opentui/*, 21 unused tree-sitter packages, web-tree-sitter, the brittle postinstall hook patching @opentui/core's bundle by hash. The Ink UI in src/ui-ink/ is the only one ever loaded (src/index.ts:68); nothing outside src/ui/ ever imported from it.
chore: enable noUncheckedIndexedAccess + clean stale AGENTS.md TypeScript hardening across 17 files (82 violations → 0). Removes two stale "Known issues" entries in AGENTS.md whose target files no longer exist.
chore: replace silent .catch(()=>{}) with debugLogger New src/utils/debug-log.ts: gated logger writes to ~/.grok/debug.log only when GROK_DEBUG is set. 26 silent fire-and-forget catches now leave breadcrumbs without changing behavior.
perf(ui): O(N)→O(1) tool lookup, debounce markdown parses Tool result lookup uses Map<id, entry> instead of tools.find. MarkdownView accepts streaming prop that debounces re-parses to 120ms during streaming (≈8/sec instead of ≈50/sec).
feat: API key onboarding, exit codes, crash log Missing-API-key error now points to https://console.x.ai with formatted setup instructions. Differentiated exit codes (0/1/2/3/4) with named constants. New src/utils/crash-log.ts — sanitized snapshots to ~/.grok/crash.log (mode 0600) with secret redaction (GROK_API_KEY, TELEGRAM_BOT_TOKEN, sk-*, xai-*, etc).
security(P0): wallet encryption + schedule env allowlist + dir validation (1) ~/.grok/wallet.json privateKey now AES-256-GCM-encrypted via existing crypto.ts; backward-compatible migration. (2) schedule.ts daemon/headless spawn no longer spreads process.env; explicit allowlist + blocklist for TELEGRAM_BOT_TOKEN/OPENAI_API_KEY/AWS_*. (3) New validateScheduleDirectory(): realpath-resolves, requires isDirectory, rejects /etc /usr /sbin /bin /System /Library /Applications and /private mirrors.
release(P0): atomic install + size-verified downloads Root-causes the truncated-binary failure mode you hit (71 MB release landed as 21 MB → Gatekeeper SIGKILL). install.sh now: HEAD probes Content-Length, downloads with retry, verifies size on disk, refuses empty checksums.txt, atomic .new → mv install. install-manager.ts mirrors: Content-Length verify in downloadBinary, retry with exponential backoff for both binary and text fetches, atomic .part → rename.
feat: CI test gate, macOS notarize scaffold, sandbox warning, telegram rate-limit, JSONL spec (1) Vitest now runs in CI between typecheck and binary build. (2) Optional codesign + notarytool job gated on five GitHub Actions secrets (documented in docs/RELEASE_SIGNING.md). (3) Yellow stderr banner when sandbox is OFF (suppressible via GROK_SUPPRESS_SANDBOX_WARNING=1). (4) Per-user sliding-window rate limit (10/60s default) in Telegram bridge. (5) New docs/HEADLESS_JSON_SPEC.md documenting all five JSONL event types.
test: add coverage for payments/hooks/migrations 24 new tests in 3 new files for modules that previously had zero coverage.

Notable findings flagged but NOT auto-executed

  • src/utils/install-manager.ts:10 references superagent-ai/grok-cli as the auto-update repo, but install.sh and this org are alphaonedev/grok-cli. Looks like a stale fork reference. Did not auto-flip — this changes runtime behavior (where users' grok update looks).
  • Sandbox-off default is unchanged. The warning banner ships in this PR; flipping the default to --sandbox is user-visible and belongs in a major version bump.
  • Settings facade and ToolRegistry refactors (Architecture agent P2) deliberately deferred — multi-day refactors that need careful migration across 15+ modules.
  • macOS notarization workflow YAML ships in this PR but is gated on five GitHub Actions secrets that need to be added: APPLE_DEVELOPER_ID_APPLICATION_CERT_BASE64, APPLE_DEVELOPER_ID_APPLICATION_CERT_PASSWORD, APPLE_ID, APPLE_APP_SPECIFIC_PASSWORD, APPLE_TEAM_ID. See docs/RELEASE_SIGNING.md for setup. Without these, the workflow silently skips signing and ships unsigned binaries the way it does today.

Test plan

  • bun install (lockfile is regenerated; 25 packages removed)
  • bun run typecheck → expect clean
  • bun run test → expect 281/281
  • bun run lint → expect ≤10 pre-existing warnings (none added by this PR)
  • bun run dev --help → expect normal commander help output (also confirms the AGENTS.md "dev mode broken" entry was indeed stale)
  • bun build:binary → standalone binary still compiles (no longer depends on @opentui/*)
  • Manual: grok (interactive Ink UI) launches normally
  • Manual: existing ~/.grok/wallet.json (plaintext-on-disk pre-encryption) continues to load and gets transparently re-encrypted on first read

Review guidance

Each commit is independently reviewable. Recommended order: 1 (cleanup, biggest diff but lowest risk), 2 (TS hardening), 3 (logger plumbing), 4 (perf), 5 (UX), 6 (security), 7 (release engineering), 8 (CI/runtime/docs batch), 9 (tests).

🤖 Generated with Claude Code

fate and others added 10 commits April 28, 2026 12:54
Nothing outside src/ui/ imports anything from src/ui/. The live UI is
src/ui-ink/ (loaded at src/index.ts:68 via dynamic import). The OpenTUI
implementation was fully self-contained dead code.

- Delete src/ui/ (12 files, ~7,541 LOC)
- Delete patches/ (apply-languages.cjs, markdown-highlights.scm)
- Drop @opentui/core, @opentui/react from package.json
- Drop 21 unused tree-sitter-* deps and web-tree-sitter (only consumer
  was the deleted patches/apply-languages.cjs script)
- Drop postinstall hook (was patching @opentui/core's bundle, brittle
  against a hardcoded chunk filename: index-kgg0v67t.js)
- Remove jsxImportSource: "@opentui/react" from tsconfig.json (Ink uses
  standard React JSX)
- Fix CLI --description string and stale comments
- 25 packages removed from bun.lock

All 257 remaining tests pass; tsc --noEmit passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds "noUncheckedIndexedAccess": true to tsconfig.json so TypeScript
flags potentially-undefined array/record accesses. Resolved all 82
surfaced violations across 17 files:

- Refactored bounded for-loops to for..of (terminal-markdown.ts:155,
  removing 23 violations in one change)
- Added explicit guards or ?? "" fallbacks at parse boundaries
  (skills.ts, toon.ts, schedule.ts cron parser, install-manager.ts
  checksum parser, verify/recipes.ts, verify/checkpoint.ts)
- Tests use non-null assertions where the index is known-valid
  (tools.test.ts, lsp-tools.test.ts, task-tracker.test.ts,
  manager.test.ts, checkpoint.test.ts)
- compaction.ts findCutPoint: explicit guard for the outer for-loop
  and ?? fallbacks on cutPoints[0]/cutPoints.at(-1)

Also removes two stale "Known issues" entries from AGENTS.md:
- The .eslintrc.js issue (no such file exists; Biome is the only linter)
- The bun run dev import-type bug (target files no longer exist —
  src/utils/model-config.ts and src/utils/settings-manager.ts; bun run
  dev --help works clean on this commit)

All 257 tests pass; tsc --noEmit passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
26 fire-and-forget call sites swallowed errors silently, making
hook/MCP/clipboard failures undebuggable. Added a tiny gated logger
that writes to ~/.grok/debug.log only when GROK_DEBUG is set; default
behavior is unchanged (no log file unless opted in).

- New: src/utils/debug-log.ts — debugLogger(scope) returns a
  catch-safe (err) => void that appends timestamped lines.
- Replaced .catch(() => {}) at 26 sites across agent.ts, bash.ts,
  checkpoint.ts, instructions.ts, npm-cache.ts, mcp/runtime.ts,
  grok/tools.ts, telegram/headless-bridge.ts, typing-refresh.ts.
- Replaced empty catch {} block in headless/output.ts arg-display
  fallback.

The behavior preserved: hook failures still don't crash the agent,
MCP cleanup errors still don't propagate. They just leave a
breadcrumb when a user wants to debug.

All 257 tests pass; tsc --noEmit passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two perf hotspots flagged in the spectrum review:

1. tools.find by call.id during tool_result events ran in O(N) per
   chunk. With 100+ tool calls in a long session, every result scanned
   the full array. Replaced with a Map<string, entry> indexed by call
   id; tools[] array kept for ordered render.

2. MarkdownView re-parsed the entire accumulated stream buffer on
   every token (~50/sec) because content changed every chunk. Now
   accepts an optional `streaming` prop:
   - streaming=true (the live stream block): debounce parses to 120ms,
     show last successful parse (or raw text until first parse
     completes) in between.
   - streaming=false (Static completed messages): parse synchronously
     as before. Static already memoizes per message, so no regression.

Net: streaming responses now parse ~8 times/sec instead of ~50/sec
without changing the user-visible end state.

All 257 tests pass; tsc --noEmit passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small UX/observability fixes from the spectrum review:

1. Missing API key error now points users at https://console.x.ai
   and shows all three valid setup paths formatted readably.

2. Differentiated exit codes (named constants in index.ts):
     0 success, 1 user error, 2 transient, 3 agent error, 4 panic
   The bare `process.exit(1)` calls now use the right code; CI
   pipelines can distinguish "user typo" from "model API down" from
   "internal panic".

3. New src/utils/crash-log.ts: on uncaughtException /
   unhandledRejection, writes a sanitized snapshot (timestamp, kind,
   version, node, platform, argv, cwd, env subset, full stack) to
   ~/.grok/crash.log (mode 0600). Redacts:
     - GROK_API_KEY, TELEGRAM_BOT_TOKEN, OPENAI_API_KEY,
       ANTHROPIC_API_KEY env values entirely
     - sk-*, xai-*, ghp_*, gho_*, and Telegram bot-token shapes
       found anywhere in stack/argv via regex
   Only PATH/HOME/SHELL and GROK_*/TELEGRAM_* env keys are recorded
   (so an unsanitized random env var can't leak).
   The panic handlers now also tell the user where the log lives.

All 257 tests pass; tsc --noEmit passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

Three high-severity findings from the spectrum review.

1. Wallet private key now encrypted at rest (src/wallet/manager.ts).
   Previously ~/.grok/wallet.json stored the privateKey in plaintext
   (mode 0600). Anyone with read access to the home directory could
   drain the wallet. Now the privateKey field passes through
   utils/crypto.ts (AES-256-GCM, key derived from GROK_STORAGE_KEY or
   per-machine fallback). Address/chain/createdAt remain plaintext for
   visibility — addresses are public anyway.
   Backward-compatible: pre-encryption wallet files still parse and
   are migrated to encrypted form on first read.

2. Daemon and detached headless spawns no longer spread {...process.env}
   (src/tools/schedule.ts). Replaced with buildSpawnEnv() that:
     - Allowlists what the child needs: PATH/HOME/SHELL/USER/LANG/TERM/
       TMPDIR/TZ/EDITOR + GROK_*/NODE_*/BUN_*/LC_*/XDG_* prefixes.
     - Blocks TELEGRAM_BOT_TOKEN, OPENAI_API_KEY, ANTHROPIC_API_KEY,
       AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN even if they slip past.
   On Linux, /proc/<pid>/environ is readable to the same UID; on shared
   machines and audit pipelines this stops unrelated secrets from
   leaking into a process whose only job is running scheduled prompts.

3. Schedule directory traversal hardened. New validateScheduleDirectory()
   - realpath-resolves the path (defeats symlink escapes)
   - requires it to exist and be a directory
   - rejects sensitive system roots: /etc /usr /sbin /bin /boot /proc
     /sys /dev /root /System /Library /Applications and their /private
     mirrors on macOS.
   Wired into both schedule create (resolveScheduleDirectory) and the
   detached spawn path (startDetachedHeadlessRun) so a poisoned stored
   schedule re-validates at run time, not just create time.
   Test updated to expect realpath-canonicalized directory (macOS:
   /var/folders/... -> /private/var/folders/...).

All 257 tests pass; tsc --noEmit passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The truncated-binary failure mode is now blocked at both install paths.

Real-world signal: a user hit a 71MB darwin-arm64 release that landed
on disk as 21MB. macOS Gatekeeper SIGKILL'd the binary because the
Mach-O LINKEDIT segment referenced data past EOF. The installer never
caught the truncation: curl -fSL returns exit 0 on incomplete downloads
(it only fails on HTTP errors), and arrayBuffer() in install-manager
silently buffered whatever bytes arrived before the connection dropped.

install.sh:
- New file_size() helper (portable across BSD stat / GNU stat / wc).
- New download_with_retry(): probes Content-Length via HEAD, then
  curls with --retry 3 --retry-connrefused --connect-timeout 30
  --max-time 1800. After each successful HTTP exchange, verifies
  bytes-on-disk == Content-Length; on mismatch, deletes and retries.
- install_downloaded_release(): refuses to proceed if checksums.txt
  came back empty (was previously silent and let the install pass
  with no real verification when the metadata fetch failed).
- Atomic install: copy → .new staging → mv -f. No more half-written
  binary in INSTALL_DIR if the script is interrupted mid-copy.

src/utils/install-manager.ts (auto-update path, same hardening):
- downloadBinary(): reads Content-Length header, enforces
  buf.length === expectedSize after arrayBuffer(), retries up to 3×
  with exponential backoff (500ms / 1s / 2s), writes to .part
  staging file then renames atomically.
- downloadText(): same retry policy for the checksums.txt fetch.

All 257 tests pass; tsc --noEmit passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…m rate-limit, JSONL spec

Closes the rest of the spectrum-review backlog except the multi-day
refactors (Settings facade / ToolRegistry).

CI:
- .github/workflows/typecheck.yml now runs `bun run test` (vitest)
  between typecheck and build:binary. 47 test files were never
  exercised on PR before this.

Release pipeline:
- New optional codesign + notarytool job in .github/workflows/release.yml
  for macos-latest. Gated on APPLE_DEVELOPER_ID_APPLICATION_CERT_BASE64
  presence — silently skipped if secrets aren't set, so the workflow
  still ships unsigned binaries the way it did before.
- New docs/RELEASE_SIGNING.md walks through the five required GitHub
  Actions secrets and how to export the .p12 from Keychain. Apply
  these and macOS first-run Gatekeeper warnings go away.

Runtime safety:
- src/index.ts: warnIfSandboxOff() prints a yellow-ANSI stderr banner
  when sandbox is OFF (current default). Suppressible via
  GROK_SUPPRESS_SANDBOX_WARNING=1 for users who've made an informed
  choice. Did NOT flip the default — that is a breaking change that
  belongs in a major version bump.
- src/telegram/bridge.ts: per-user sliding-window rate limit (default
  10 messages per 60s). Tunable via GROK_TELEGRAM_RATE_LIMIT_MAX and
  GROK_TELEGRAM_RATE_LIMIT_WINDOW_MS. Bounds API spend if the bot
  token leaks or an approved user goes rogue. Rejection reply uses
  the same "Rate limit reached" message format.

Docs:
- docs/HEADLESS_JSON_SPEC.md: full JSONL stream schema for
  `--format json`. All five event types (step_start, text, tool_use,
  step_finish, error) documented with field tables, ordering
  guarantees, the new exit-code matrix, and pipe-friendly jq examples.
  Source of truth still src/headless/output.ts; this doc is the
  human-readable mirror.

All 257 tests pass; tsc --noEmit passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The reliability agent flagged these three modules had zero tests.
Added 24 tests across 3 new files:

- src/storage/migrations.test.ts (4 tests): exercises applyMigrations
  via an in-memory SQLiteDatabase mock that records exec/pragma/
  transaction calls. Verifies fresh-DB sequence (user_version 0→2),
  no-op at latest, partial v1→v2 upgrade only adds compactions, and
  the whole sequence stays inside one db.transaction() boundary.
  (bun:sqlite isn't available under vitest's Node runtime, so
  symbolic mock instead of a real ":memory:" DB.)

- src/hooks/config.test.ts (12 tests): isHookEvent against every
  HOOK_EVENTS entry, getMatchingHooks with no-matcher wildcards /
  specific matchers / multiple-collected entries, and getMatchQuery's
  switch-by-event-name (PreToolUse → tool_name, SessionStart → source,
  subagent/task → agent_type, compaction → trigger, default → undefined).

- src/payments/service.test.ts (8 tests): formatInspectionOutput
  for non-payment responses (string vs object data), 402 responses
  with multiple options, the amount → maxAmountRequired → price → "0"
  fallback chain, and Brin security block rendering with subScore
  null-skipping and threat formatting.

49 test files now (was 46); 281 tests pass (was 257).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundles the spectrum-review fixes into a release:

- package.json: 1.6.0 -> 1.7.0
- CHANGELOG.md: detailed v1.7.0 entry covering security, release-eng,
  performance, code quality, UX, docs, and test deltas
- README.md: env vars table (GROK_DEBUG, sandbox warning toggle,
  Telegram rate limit), exit-code matrix, "Built with" section
  crediting React Ink (vadimdemedes/ink), Bun, marked, marked-terminal,
  chalk, Vercel AI SDK, zod, grammY, Vitest, Biome
- docs/index.html (GitHub Pages): meta description corrected to "Bun
  and React Ink" (was "Bun and Ink"), new "Built with" section with
  attribution links, links to HEADLESS_JSON_SPEC and RELEASE_SIGNING
- src/utils/install-manager.ts:10 + tests: stale superagent-ai/grok-cli
  reference fixed to alphaonedev/grok-cli (auto-update was looking at
  the wrong repo). README/Pages footer still credits superagent-ai as
  fork upstream — that attribution is intentional.

All 281 tests pass; tsc --noEmit passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alphaonedev alphaonedev merged commit dca3d1a into main Apr 28, 2026
2 checks passed
@alphaonedev alphaonedev deleted the ai-nhi-fullspectrum-fixes branch April 28, 2026 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant