feat(cli): chfx - decode / query / capture ClickHouse wire formats to JSON #43
Merged
A publish-ready npm bin (`chfx`) that decodes ClickHouse wire-format dumps to structured JSON for humans and agents. Reuses the src/core decoders (DOM-free) and bundles to a single ESM file via esbuild.

- `chfx decode [file]`: decode a .chproto capture, raw Native body, or raw RowBinaryWithNamesAndTypes body. Autodetects .chproto by magic and raw bodies by trial decode; `--format` forces it. Reads stdin when no path (or `-`) is given. `--protocol-version` sets the Native client version.
- Output: the web ParsedData/AstNode tree as JSON, a top-level `bytesHex` (whole decoded buffer once), and per-node inline `bytes` by default so a consumer can read a value's bytes without slicing by range (`--no-node-bytes` to omit). bigints → decimal strings, byte blobs → hex.
- Agent-friendly: deterministic JSON on stdout, JSON error envelope on stderr, exit codes (0 ok / 2 usage / 1 io|decode), `--help`/`--version`, non-interactive.

Packaging: `bin`/`files`/`prepublishOnly` wired; `npm run cli` (tsx, dev) and `npm run cli:build` (esbuild → dist/cli/index.js). Adds esbuild, tsx, @types/node devDeps.

Tests: src/cli/cli.test.ts: arg parsing, decode of every protocol fixture (+ bigint-safe serialization), hand-built Native/RowBinary bodies, autodetect + override, per-node bytes match their range, and tsx end-to-end (stdin, exit codes).

README + AGENTS.md + docs/cli-spec.md updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
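The value conversions listed above (bigints → decimal strings, byte blobs → hex) fit naturally into a `JSON.stringify` replacer. A minimal sketch, assuming decoded values arrive as `bigint` and `Uint8Array`; `toJsonSafe` and `bytesToHex` are hypothetical names, not the actual `src/cli/output.ts` exports:

```typescript
// Hypothetical sketch: toJsonSafe/bytesToHex are illustrative names, not the
// actual src/cli/output.ts exports.

/** Lowercase hex, two digits per byte (e.g. [0, 255] -> "00ff"). */
function bytesToHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

/**
 * JSON.stringify replacer. The replacer runs before JSON.stringify's own
 * BigInt check, so converting here avoids the TypeError it throws for BigInt.
 */
function jsonSafeReplacer(_key: string, value: unknown): unknown {
  if (typeof value === "bigint") return value.toString(); // decimal, no precision loss
  // Plain Uint8Array only: Node Buffers define toJSON, which runs before the replacer.
  if (value instanceof Uint8Array) return bytesToHex(value);
  return value;
}

/** Serialize a decoded tree to pretty-printed, JSON-safe text. */
function toJsonSafe(value: unknown): string {
  return JSON.stringify(value, jsonSafeReplacer, 2);
}
```

Emitting bigints as strings rather than numbers keeps UInt64/Int128 values exact for consumers whose JSON parsers use IEEE doubles.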
Collapse the capture→file→decode dance into single commands under chfx,
and make the dump file optional.
- `chfx query --query "<sql>"`: run AND decode in one step (no temp file).
  - `--protocol tcp` (default): drive clickhouse-client through the capturing
    proxy and decode the native packet stream; `--save <f>` keeps the .chproto.
  - `--protocol http`: POST to ClickHouse HTTP requesting `--format`
    (native | RowBinaryWithNamesAndTypes, default native) and decode the body;
    `--protocol-version` sets the Native client version. Port defaults to 8123;
    auth via X-ClickHouse-User/-Key headers.
- `chfx capture --query "<sql>"`: capture to a .chproto dump only; `--out <f>`
writes a file (+ JSON summary), otherwise streams raw bytes to stdout so
`chfx capture … | chfx decode` works. `npm run capture` is now an alias to it;
the standalone scripts/capture-native.mjs is folded in and removed.
- Shared connection flags with env fallbacks (CH_NATIVE_HOST, CH_NATIVE_PORT /
CH_HTTP_PORT, CH_USER, CH_PASSWORD, CH_DATABASE, CLICKHOUSE_CLIENT) and
experimental type settings on by default (--no-experimental-settings,
repeatable --setting k=v).
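The connection rules above (flags win over env vars, env vars win over defaults; experimental settings on unless disabled, each `--setting k=v` layered on top) can be sketched as two small resolvers. This is a hypothetical illustration: the env var names come from the commit, but the function names, the `127.0.0.1` host default, and the `9000` native-port default are assumptions, and `experimentalEnabled = false` stands in for `--no-experimental-settings`.

```typescript
// Hypothetical sketch of "flag > env > default" resolution. The env var names come
// from the PR; function names and the 127.0.0.1 / 9000 defaults are assumptions.

type Env = Record<string, string | undefined>;

interface NativeFlags {
  host?: string;
  port?: number;
  user?: string;
  password?: string;
  database?: string;
}

interface NativeConnection extends NativeFlags {
  host: string;
  port: number;
}

function resolveNativeConnection(flags: NativeFlags, env: Env): NativeConnection {
  const rawPort = env["CH_NATIVE_PORT"];
  const envPort = rawPort !== undefined ? Number(rawPort) : undefined;
  return {
    host: flags.host ?? env["CH_NATIVE_HOST"] ?? "127.0.0.1",
    port: flags.port ?? envPort ?? 9000, // 9000 is ClickHouse's usual native-protocol port
    user: flags.user ?? env["CH_USER"],
    password: flags.password ?? env["CH_PASSWORD"],
    database: flags.database ?? env["CH_DATABASE"],
  };
}

/** Start from the experimental defaults (unless disabled), then apply each --setting k=v in order. */
function resolveSettings(
  experimentalDefaults: Record<string, string>,
  experimentalEnabled: boolean,
  settingFlags: readonly string[],
): Record<string, string> {
  const settings: Record<string, string> = experimentalEnabled ? { ...experimentalDefaults } : {};
  for (const kv of settingFlags) {
    const eq = kv.indexOf("=");
    if (eq <= 0) throw new Error(`--setting expects k=v, got: ${kv}`);
    settings[kv.slice(0, eq)] = kv.slice(eq + 1); // later flags override earlier values and defaults
  }
  return settings;
}
```

Passing `env` explicitly (rather than reading `process.env` inside) is what makes the env-fallback tests easy to write without mutating global state.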
Refactor: extract decodeCaptureStreams + buildDecodeEnvelope (shared by decode
and query); commands return a JSON|raw CommandOutput union that the entry point
renders. Arg parser gains repeatable multiFlags (--setting). The TS CLI imports
the JS proxy via a new scripts/native-proxy.d.mts declaration; query/capture
reuse the same captureQuery the web/Electron paths use.
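The JSON|raw `CommandOutput` split described in the refactor can be pictured as a discriminated union that a single renderer switches on. A sketch under assumptions: the `kind`/`value`/`bytes` field names and the `renderOutput` helper are hypothetical, not the PR's actual types.

```typescript
// Hypothetical shapes: the PR says commands return a JSON|raw CommandOutput
// union that the entry point renders; field and function names are assumptions.

type CommandOutput =
  | { kind: "json"; value: unknown } // assumed to be JSON-safe already (no bigint)
  | { kind: "raw"; bytes: Uint8Array };

type Writer = (chunk: string | Uint8Array) => void;

function renderOutput(out: CommandOutput, write: Writer): void {
  switch (out.kind) {
    case "json":
      // Pretty-printed JSON plus a trailing newline for line-oriented consumers.
      write(JSON.stringify(out.value, null, 2) + "\n");
      break;
    case "raw":
      // Untouched .chproto bytes, so `chfx capture … | chfx decode` works.
      write(out.bytes);
      break;
  }
}
```

Injecting `write` instead of calling `process.stdout.write` directly keeps the renderer testable; the real entry point would presumably pass a stdout writer.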
Docs: README quick start now leads with `chfx query` and `npm link`; full
transport/connection option tables. AGENTS.md + docs/cli-spec.md updated.
Tests: query (tcp via injected capture; http via injected fetch for Native +
RowBinary + error + flag-validation), capture (raw stdout + file summary),
repeatable-flag parsing, and connection/env resolution. 44 tests pass; verified
end-to-end against a live server on both transports.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hema help)

- Blocker: a 0-column RowBinary header made decodeRows loop forever (offset never advances while remaining > 0), exhausting memory. This was trivially reachable because `decode` autodetect trials RowBinary first: `printf '\x00\x02' | chfx decode -` OOM'd the process. Guard the row loop to break on a non-advancing iteration (rowbinary-decoder.ts); it now terminates with a clean usage error.
- High: piping decode output into a consumer that closes early (`… | head`) threw an unhandled EPIPE and dumped a Node stack trace to stderr, violating the clean-exit contract. Handle EPIPE on stdout/stderr and exit 0.
- Stale `schema` references: general --help advertised a `chfx schema` command that was dropped; removed it and the registry comment.
- Classify --save / --out write failures as io errors (not decode); wrap both writeFile calls.
- README: build before `npm link` (link points at the not-yet-built binary).
- Tidy an orphaned doc comment in connection.ts.

Tests: regression for degenerate/tiny inputs terminating (no OOM). 45 CLI tests + 86 core unit tests pass; lint + tsc clean; both blockers verified fixed end-to-end (2-byte input → exit 2 usage error; `| head` → no stack trace).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
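The row-loop guard from the Blocker fix above can be sketched as follows. This is a minimal sketch: `decodeRow`, `DecodeRow`, and the `{ row, next }` return shape are hypothetical stand-ins for the real rowbinary-decoder.ts internals, which are not shown here. The only point illustrated is that an iteration which consumes zero bytes must end the loop.

```typescript
// Hypothetical sketch; decodeRow and the Row type stand in for the real
// rowbinary-decoder.ts internals, which this PR does not show.

type Row = unknown[];
type DecodeRow = (buf: Uint8Array, offset: number) => { row: Row; next: number };

function decodeRows(buf: Uint8Array, start: number, decodeRow: DecodeRow): Row[] {
  const rows: Row[] = [];
  let offset = start;
  while (offset < buf.length) {
    const { row, next } = decodeRow(buf, offset);
    // With a 0-column header every row decodes from zero bytes, so next === offset
    // forever; without this check the loop would push empty rows until OOM.
    if (next <= offset) break;
    rows.push(row);
    offset = next;
  }
  return rows;
}
```

Breaking rather than throwing matches the commit's description; the clean usage error presumably surfaces in the caller, for example when autodetect finds that no format decoded the whole input.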
…quest shape

Fill the gaps from the coverage review (45 → 67 tests):

- decode edge/error: empty input, missing file, invalid --protocol-version, ambiguous raw-body autodetect.
- query http request construction (spied fetch): default_format, --setting and --database query params, X-ClickHouse-User/-Key headers, body, port 8123, and --protocol-version → client_protocol_version; RowBinaryWithNamesAndTypes format.
- query/capture failure modes: tcp capture throw → io, --save write failure → io, empty http body → decode, http transport throw → io, unknown --protocol, unknown http --format, capture-command throw → io.
- capture: -o alias, `--out -` → raw stdout.
- connection: CH_NATIVE_HOST/PORT + CH_HTTP_PORT env fallbacks and flag precedence, resolveHttpConnection defaults, --setting overriding an experimental default (added a withEnv save/restore helper and a shared fakeCaptureOf).
- e2e via tsx: --version, unknown-command exit 2, and a clean-exit-on-EPIPE check (decode | head closes early → no stack trace).

eslint + tsc clean; 67 CLI tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
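The HTTP request pieces these tests pin down (format and database as query params, settings as extra params, `--protocol-version` as `client_protocol_version`, credentials in the `X-ClickHouse-User`/`X-ClickHouse-Key` headers, SQL in the POST body, port 8123) can be assembled roughly like this. A sketch only: `buildHttpRequest`, `HttpQueryOptions`, and `HttpRequest` are hypothetical names, and the real `query.ts` may order or encode parameters differently.

```typescript
// Hypothetical sketch: buildHttpRequest and its option names are illustrative.
// The parameter and header names are the ones the PR's tests check.

interface HttpQueryOptions {
  host: string;
  port?: number; // the PR's default HTTP port is 8123
  query: string;
  format: "Native" | "RowBinaryWithNamesAndTypes";
  database?: string;
  user?: string;
  password?: string;
  protocolVersion?: number; // forwarded as client_protocol_version
  settings?: Record<string, string>;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildHttpRequest(o: HttpQueryOptions): HttpRequest {
  const params: Array<[string, string]> = [["default_format", o.format]];
  if (o.database !== undefined) params.push(["database", o.database]);
  if (o.protocolVersion !== undefined) {
    params.push(["client_protocol_version", String(o.protocolVersion)]);
  }
  for (const [k, v] of Object.entries(o.settings ?? {})) params.push([k, v]);

  const qs = params.map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`).join("&");

  // Credentials travel in headers, not in the URL.
  const headers: Record<string, string> = {};
  if (o.user !== undefined) headers["X-ClickHouse-User"] = o.user;
  if (o.password !== undefined) headers["X-ClickHouse-Key"] = o.password;

  // The SQL itself is the POST body.
  return { url: `http://${o.host}:${o.port ?? 8123}/?${qs}`, method: "POST", headers, body: o.query };
}
```

Sending it would then be `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, with `await res.arrayBuffer()` yielding the binary Native or RowBinary payload to decode.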
Fuzz testing found that a TCP `chfx query`/`capture` hangs forever on any flag
the client rejects at startup (e.g. --setting totally_fake_setting_xyz=1 or a
bad setting value): clickhouse-client exits before opening the proxied TCP
connection, so the proxy's `done` promise (which only resolves once both ends
of a connection close) never settles and `await done` blocks indefinitely.
Fix in scripts/native-proxy.mjs (shared by the web/Electron/CLI capture paths):
- Settle `done` exactly once via finishOk/finishErr, and make the proxy's
close() resolve `done` with whatever was captured so far.
- In captureQuery, on a non-zero client exit, race `done` against a 100ms grace
window then force-close — so a pre-connect failure yields a clean io error
("clickhouse-client exited 40: …") instead of a hang. The happy path and the
connected-but-failed path (server Exception captured) are unchanged.
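The two ideas in this fix, settling a promise exactly once and bounding the wait after a failed client exit with a grace window, can be sketched as below. This is an illustration under assumptions: `settleOnce` and `waitForCapture` are hypothetical names, not the internals of scripts/native-proxy.mjs, and turning a force-closed, empty capture into the "clickhouse-client exited 40: …" io error is left to the caller.

```typescript
// Hypothetical sketch of the fix's two ideas; names are illustrative, not
// scripts/native-proxy.mjs internals.

/** A promise plus finishOk/finishErr that only take effect on the first call. */
function settleOnce<T>() {
  let settled = false;
  let resolveFn!: (v: T) => void;
  let rejectFn!: (e: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolveFn = res;
    rejectFn = rej;
  });
  return {
    promise,
    finishOk(v: T): void {
      if (!settled) { settled = true; resolveFn(v); }
    },
    finishErr(e: unknown): void {
      if (!settled) { settled = true; rejectFn(e); }
    },
  };
}

/** After the client exits, wait for `done`; on a non-zero exit, force-close after graceMs. */
async function waitForCapture<T>(
  done: Promise<T>,
  clientExitCode: number,
  forceClose: () => void, // assumed to resolve `done` with whatever was captured so far
  graceMs = 100,
): Promise<T> {
  if (clientExitCode === 0) return done; // happy path unchanged
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((r) => {
    timer = setTimeout(() => r("timeout"), graceMs);
  });
  try {
    const winner = await Promise.race([done.then(() => "done" as const), timeout]);
    // The client never connected, so nothing will ever close the proxied connection.
    if (winner === "timeout") forceClose();
  } finally {
    clearTimeout(timer);
  }
  return done;
}
```

If the client did connect and the server's Exception was captured, `done` settles inside the grace window and `forceClose` is never called, which mirrors the "connected-but-failed path is unchanged" claim above.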
Regression test uses `false` as the client (exits before connecting) so it
needs no server. 68 CLI tests pass; lint + tsc clean; verified against the live
server that the original repros now return a clean error and normal queries
still work.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
This PR adds a new publishable CLI, chfx, that can decode ClickHouse wire-format captures/bodies into the same structured JSON AST the web UI renders, and can also query (TCP capture via proxy or HTTP body) and capture .chproto dumps. It also hardens core decoding and the native capture proxy to avoid hangs / OOM on degenerate inputs.
Changes:
- Added `src/cli/*` implementing `chfx decode|query|capture`, deterministic JSON output + JSON error envelopes, and shared connection/env flag handling.
- Added extensive Vitest coverage for CLI parsing, decoding, query/capture flows, and e2e execution via `tsx`.
- Improved robustness in core RowBinary decoding (non-advancing loop guard) and in `scripts/native-proxy.mjs` (ensure `done` always settles; avoid a hang when the client exits pre-connect), plus npm packaging/bundling for publishing.
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| todo.md | Updates roadmap status to reflect implemented CLI/query/capture work. |
| src/core/decoder/rowbinary-decoder.ts | Adds a guard to prevent non-advancing decode loops (OOM prevention). |
| src/cli/version.ts | Build-injected CLI version + CLI schema version constant. |
| src/cli/registry.ts | Command/option registry used as single source of truth for --help. |
| src/cli/output.ts | CLI error type + JSON-safe serialization (bigint/bytes/Map/Set) + error envelope emission. |
| src/cli/index.ts | CLI entrypoint: dispatch, help/version, JSON vs raw stdout, EPIPE handling. |
| src/cli/connection.ts | Shared connection/query option resolution with env fallbacks and experimental settings defaults. |
| src/cli/commands/decode.ts | Implements decode + shared decode envelope + autodetect logic + stdin/file input. |
| src/cli/commands/query.ts | Implements query over TCP capture or HTTP body and wraps into decode envelope. |
| src/cli/commands/capture.ts | Implements capture to file or raw stdout bytes. |
| src/cli/args.ts | Minimal argument parser + helpers. |
| src/cli/cli.test.ts | Unit/integration/e2e tests for CLI behavior and failure modes. |
| scripts/native-proxy.mjs | Hardens proxy lifecycle to avoid hangs when client exits early; ensures done settles. |
| scripts/native-proxy.d.mts | Adds TS declarations for the JS proxy so the TS CLI can import it safely. |
| scripts/build-cli.mjs | Adds esbuild bundling script for a single-file publishable CLI binary. |
| scripts/capture-native.mjs | Removes legacy standalone capture script (folded into chfx capture). |
| README.md | Documents chfx usage, commands, options, and output envelope shape. |
| docs/cli-spec.md | Updates CLI spec to reflect the implemented command set and output decisions. |
| AGENTS.md | Adds contributor-facing notes for running/building the CLI and repo layout updates. |
| package.json | Adds bin, publish files, CLI scripts, prepublishOnly, and dev deps (esbuild/tsx/@types/node). |
| package-lock.json | Lockfile updates reflecting new dependencies and updated esbuild. |
Address Copilot review on PR #43: the permissive parser meant commands silently ignored unknown flags (a typo like `--protcol http` ran the default tcp path) and extra positionals. Add `rejectUnknownArgs(allowed, maxPositionals)` and call it in decode/query/capture so unrecognized options or surplus arguments fail fast as `usage` errors (exit 2), matching the documented contract. Also correct the `stringOption` doc comment (it errors on a repeated *multi* flag, not any repeat).

Tests: unknown-flag + extra-positional rejection for decode/query/capture. 73 CLI tests pass; lint + tsc clean; verified `--protcol` now errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
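A strict-args check of the kind described above might look like this. It is a sketch under assumptions: the commit names `rejectUnknownArgs(allowed, maxPositionals)` and `CliError('usage', …)` but shows neither definition, so this version takes the parsed args as an explicit first parameter, and `ParsedArgs` plus the `CliError` shape are hypothetical.

```typescript
// Hypothetical sketch: ParsedArgs and this CliError shape are assumptions; the PR
// names rejectUnknownArgs(allowed, maxPositionals) and CliError('usage', …) but
// does not show their definitions.

type ErrorKind = "usage" | "io" | "decode";

class CliError extends Error {
  readonly kind: ErrorKind;
  constructor(kind: ErrorKind, message: string) {
    super(message);
    this.kind = kind;
  }
}

interface ParsedArgs {
  options: Record<string, string | boolean | string[]>; // flag name without leading "--"
  positionals: string[];
}

function rejectUnknownArgs(args: ParsedArgs, allowed: readonly string[], maxPositionals: number): void {
  const known = new Set(allowed);
  for (const name of Object.keys(args.options)) {
    // A typo such as --protcol lands here instead of silently falling back to defaults.
    if (!known.has(name)) throw new CliError("usage", `unknown option: --${name}`);
  }
  if (args.positionals.length > maxPositionals) {
    throw new CliError("usage", `unexpected argument: ${args.positionals[maxPositionals]}`);
  }
}
```

Presumably `decode` would allow one positional (its optional file path) while `query` and `capture` allow none.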
Comment on lines +90 to +92

```typescript
const save = stringOption(args, 'save');
const captureOpts = resolveCaptureOptions(args);
```
Comment on lines +68 to +75

```typescript
function parsePort(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const port = Number(raw);
  if (!Number.isInteger(port) || port <= 0) {
    throw new CliError('usage', `--port must be a positive integer, got: ${raw}`);
  }
  return port;
}
```
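The `CliError('usage', …)` thrown in the reviewed `parsePort` carries a category that, per the PR's contract, maps to an exit code (0 ok / 2 usage / 1 io|decode) and a JSON error envelope on stderr. A hypothetical sketch of that mapping: the `{ error: { kind, message } }` envelope shape, the `CliError` class body, and the `"internal"` kind for non-CliError exceptions are assumptions, not the PR's actual output.ts.

```typescript
// Hypothetical sketch: the envelope shape, the CliError body, and the "internal"
// kind for unexpected exceptions are assumptions; the PR specifies only the exit
// codes (0 ok / 2 usage / 1 io|decode) and that the envelope is JSON on stderr.

type ErrorKind = "usage" | "io" | "decode";

class CliError extends Error {
  readonly kind: ErrorKind;
  constructor(kind: ErrorKind, message: string) {
    super(message);
    this.kind = kind;
  }
}

function exitCodeFor(kind: ErrorKind): number {
  return kind === "usage" ? 2 : 1; // io and decode both exit 1
}

function toErrorEnvelope(err: unknown): { stderrJson: string; exitCode: number } {
  if (err instanceof CliError) {
    return {
      stderrJson: JSON.stringify({ error: { kind: err.kind, message: err.message } }),
      exitCode: exitCodeFor(err.kind),
    };
  }
  // Unexpected exception: still emit JSON so agents never have to parse a stack trace.
  const message = err instanceof Error ? err.message : String(err);
  return { stderrJson: JSON.stringify({ error: { kind: "internal", message } }), exitCode: 1 };
}
```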
Address the second Copilot review on PR #43:

- `query --save '-'` previously wrote a file literally named "-"; since stdout carries the decoded JSON, reject "-" as a usage error with a clear message.
- `--port` accepted values > 65535 (only > 0 was checked); require 1..65535 so invalid ports fail fast with a clear message instead of at connect time.

Tests added for both. 75 CLI tests pass; lint + tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
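Both review fixes above are small validations. A sketch of what they could look like, starting from the reviewed `parsePort`: the 1..65535 bound comes from the commit, while the error message wording, the `validateSavePath` name, and the `CliError` class body are assumptions.

```typescript
// Hypothetical sketch: the 1..65535 bound and the "-" rejection come from the commit;
// message wording, validateSavePath, and this CliError body are assumptions.

type ErrorKind = "usage" | "io" | "decode";

class CliError extends Error {
  readonly kind: ErrorKind;
  constructor(kind: ErrorKind, message: string) {
    super(message);
    this.kind = kind;
  }
}

function parsePort(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const port = Number(raw);
  // TCP ports are 16-bit, and port 0 is not connectable.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new CliError("usage", `--port must be an integer in 1..65535, got: ${raw}`);
  }
  return port;
}

function validateSavePath(save: string | undefined): string | undefined {
  // stdout already carries the decoded JSON, so "-" cannot also mean "dump to stdout".
  if (save === "-") {
    throw new CliError("usage", "--save - is not supported (stdout carries the decoded JSON); pass a file path");
  }
  return save;
}
```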
What
Adds `chfx`, an agent-friendly CLI that runs or decodes ClickHouse wire-format data and prints the same AST the web UI renders as structured JSON (delivers roadmap items #1 and #4, plus the query/capture UX).

Commands
- `chfx decode [file]`: decode a `.chproto` capture, raw Native body, or raw RowBinaryWithNamesAndTypes body. Autodetects `.chproto` by magic and raw bodies by trial decode (`--format` forces it); reads stdin when no path (or `-`) is given.
- `chfx query --query "<sql>"`: run and decode in one step, no intermediate file.
  - `--protocol tcp` (default): drive `clickhouse-client` through the capturing proxy and decode the native packet stream; `--save <f>` also keeps the `.chproto`.
  - `--protocol http`: POST to ClickHouse HTTP requesting `--format native|RowBinaryWithNamesAndTypes` and decode the body; `--protocol-version` sets the Native client version.
- `chfx capture --query "<sql>"`: capture to a `.chproto` dump only (`--out <f>`, or raw bytes to stdout so `chfx capture … | chfx decode` works). `npm run capture` is now an alias (the old `scripts/capture-native.mjs` was folded in).

Output
The web `ParsedData`/`AstNode` JSON, wrapped with tool/format/protocol metadata, a top-level `bytesHex` (whole buffer once), and, by default, each node's own raw `bytes` inline so a consumer doesn't have to slice by range (`--no-node-bytes` to drop them). Values are JSON-safe (bigint → decimal string, byte blobs → hex).

Agent-friendliness
Deterministic JSON on stdout, diagnostics + a JSON error envelope on stderr, exit codes (`0` ok / `2` usage / `1` io|decode), non-interactive, `--help`/`--version`. Shared connection flags with env fallbacks (`CH_NATIVE_HOST`, `CH_NATIVE_PORT`/`CH_HTTP_PORT`, `CH_USER`, `CH_PASSWORD`, `CH_DATABASE`, `CLICKHOUSE_CLIENT`); experimental type settings on by default (`--no-experimental-settings`, repeatable `--setting k=v`).

Packaging
Publish-ready npm `bin` bundled to a single ESM file via esbuild (`npm run cli:build` → `dist/cli/index.js`); `npm link` for a PATH `chfx`, or `npm run cli -- …` (tsx) for no-build dev. Reuses the `src/core` decoders (DOM-free) and the existing `scripts/native-proxy.mjs` capture.

Notable decisions
- `schema` command dropped: `--help` + the self-describing `decode` output suffice for now (revisit when there are more commands).
- … `clickhouse-client` dependency and revive configurable protocol version) was considered and shelved.
- `docs/cli-spec.md` records the full requirements session; `todo.md` tracks remaining items (e.g. 100% AST coverage for RowBinary #5, `chfx proxy`).

Testing
- … (`src/cli/cli.test.ts`): arg parsing (incl. repeatable flags), decode of every protocol fixture (+ bigint-safe serialization), forced/auto format detection, per-node bytes matching their range, query tcp (injected capture) + http (spied fetch: request params, auth headers, formats), failure modes (capture/fetch/write throws, empty input/body, invalid protocol/format/protocol-version), `capture` raw-stdout vs file, env fallbacks, and tsx e2e (stdin, exit codes, `--version`, unknown command, clean-exit-on-EPIPE).
- `eslint .` + `tsc -b` clean. Verified end-to-end against a live server on both transports.

Review + hardening already applied on this branch
- … `schema` help text; `--save`/`--out` failures reclassified as `io`.
- … `clickhouse-client` rejects a flag pre-connect (now a clean `io` error), a fix that also hardens the web/Electron capture paths.

🤖 Generated with Claude Code