Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 19 additions & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ A tool for visualizing ClickHouse RowBinary and Native wire format data. Feature

The `NativeProtocol` format decodes a whole connection's packet stream rather than a single HTTP format body. A small TCP proxy (`scripts/native-proxy.mjs`, mirrored for the app in `electron/native-capture.ts`) sits between `clickhouse-client` and the server, forwarding bytes and teeing both directions into a capture. On localhost clickhouse-client disables compression, so the capture is plaintext, uncompressed packets (TLS/compression are out of scope).

- Capture a dump for tests/inspection: `npm run capture -- --query "SELECT 1" --out cap.chproto` (see `scripts/capture-native.mjs`).
- Capture a dump for tests/inspection: `npm run capture -- --query "SELECT 1" --out cap.chproto` (alias for `chfx capture`, in `src/cli/commands/capture.ts`). Or `chfx query --query "SELECT 1"` to capture **and** decode in one step.
- `.chproto` dump format and parsing: `scripts/native-proxy.mjs` (writer) and `src/core/decoder/protocol-dump.ts` (reader).
- Decoder: `src/core/decoder/protocol-decoder.ts` (`ProtocolDecoder`) reuses `NativeDecoder.decodeProtocolBlock()` for the Block inside Data-family packets. It derives the negotiated version from `min(client, server)` Hello and gates every field per `docs/full_native_protocol_spec.md`.
- Capture from the **web UI**: the browser POSTs SQL to a `/capture` endpoint that runs the proxy server-side and returns the `.chproto` dump (the browser can't open raw TCP itself). Served by: `npm run dev` / `vite preview` (Vite plugin in `vite.config.ts` → `scripts/capture-middleware.mjs`), and the **Docker** image (standalone `scripts/capture-server.mjs` under supervisord, proxied by nginx). Native-connection defaults come from env: `CH_NATIVE_HOST`/`CH_NATIVE_PORT`/`CH_USER`/`CH_PASSWORD`/`CLICKHOUSE_CLIENT`; `CAPTURE_EXPERIMENTAL_SETTINGS=0` stops sending experimental type settings per-query (for read-only users that reject them — rely on the profile instead).
Expand All @@ -36,6 +36,12 @@ npm run test # Run integration tests (uses testcontainers)
npm run lint # ESLint check
npm run test:e2e # Build Electron + run Playwright e2e tests

# CLI (chfx) — run/decode wire-format data to structured JSON
npm run cli -- query --query "SELECT 1" # capture over native protocol + decode (one step)
npm run cli -- decode capture.chproto # decode an existing dump (run from source via tsx)
npm run capture -- --query "SELECT 1" -o cap.chproto # capture only (alias for `chfx capture`)
npm run cli:build # bundle the publishable binary → dist/cli/index.js

# Electron desktop app
npm run electron:dev # Dev mode with hot reload
npm run electron:build # Package desktop installer for current platform
Expand Down Expand Up @@ -78,6 +84,18 @@ src/
│ └── request-params.ts # Shared request parameter builder
├── store/
│ └── store.ts # Zustand store (query, parsed data, UI state)
├── cli/ # chfx CLI (Node, bundled via esbuild; reuses src/core decoders)
│ ├── index.ts # Entry: command dispatch, --help/--version, JSON|raw stdout, error envelope
│ ├── commands/decode.ts# `decode` — decodeBuffer/decodeCaptureStreams + shared buildDecodeEnvelope
│ ├── commands/query.ts # `query` — capture (proxy) + decode in one step; --save keeps the dump
│ ├── commands/capture.ts# `capture` — capture to .chproto (file or raw stdout); `npm run capture` alias
│ ├── connection.ts # Shared --host/port/user/... resolution + env fallbacks + experimental settings
│ ├── args.ts # Dependency-free arg parser (value + repeatable flags)
│ ├── output.ts # CliError, JSON-safe serializer (bigint→string, bytes→hex), CommandOutput
│ ├── registry.ts # Command metadata for --help
│ ├── version.ts # Build-injected version (esbuild define)
│ └── cli.test.ts # Vitest unit + tsx e2e tests (uses fixtures/protocol/*.chproto)
│ # query/capture reuse scripts/native-proxy.mjs (+ native-proxy.d.mts for types)
└── styles/ # CSS files
electron/
├── main.ts # Electron main process (window, IPC handlers)
Expand Down
95 changes: 95 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ A tool for visualizing ClickHouse RowBinary and Native format data. Features an
- **Interactive Highlighting**: Selecting a node in the tree highlights corresponding bytes in the hex view (and vice versa)
- **Full Type Support**: All ClickHouse types including Variant, Dynamic, JSON, Geo types, Nested, etc.
- **Desktop App**: Electron app that connects to your existing ClickHouse server (no bundled DB)
- **CLI (`chfx`)**: Decode `.chproto` / Native / RowBinary dumps to structured JSON from the terminal — agent-friendly

## Quick Start (Docker)

Expand Down Expand Up @@ -41,6 +42,100 @@ CH_VERSION=24.3 docker compose build

The version is baked into the image — rebuild to change it.

## CLI (`chfx`)

A command-line tool that runs or decodes ClickHouse wire-format data and prints
structured JSON — the same AST the web UI renders, plus the raw bytes — so it
can be scripted or driven by an agent.

### Quick start

```bash
npm install
npm run cli:build # build the binary → dist/cli/index.js
npm link # makes `chfx` available on your PATH (points at the built binary)

# Run a query and see it decoded — one step, no intermediate file:
chfx query --query "SELECT number AS n, [number] AS arr FROM numbers(3)"

# Decode a dump you already have (or pipe one in):
chfx decode capture.chproto
clickhouse-client -q "SELECT 1 FORMAT Native" | chfx decode -f native -
```

> Prefer not to `npm link`? Use `npm run cli -- <args>` (runs from source via
> tsx, no build) or `node dist/cli/index.js <args>` after `cli:build`.

Output is a single JSON document on **stdout**; diagnostics and a JSON error
envelope go to **stderr**. Exit codes: `0` success, `2` usage error, `1` I/O or
decode error.

### Commands

| Command | Description |
|---------|-------------|
| `chfx query --query "<sql>"` | Run a query **and decode it** in one step (no file). `--protocol tcp` (default) captures the native packet stream via `clickhouse-client`; `--protocol http` POSTs to ClickHouse HTTP and decodes the `--format` body. `--save <f>` keeps the `.chproto` dump (tcp). |
| `chfx capture --query "<sql>"` | Capture a query to a `.chproto` dump only (native protocol). Writes `--out <f>`, or streams raw bytes to stdout (so `chfx capture … \| chfx decode` works). `npm run capture` is an alias. |
| `chfx decode [file]` | Decode a `.chproto`, Native, or RowBinary dump to JSON. Reads stdin when no file (or `-`) is given. |
| `chfx --help` / `chfx <cmd> --help` | Human-readable help. |
| `chfx --version` | Print the version. |

### `query` transport options

| Option | Description |
|--------|-------------|
| `--protocol tcp\|http` | Transport. `tcp` (default) = native capture via `clickhouse-client`. `http` = HTTP request. |
| `--format native\|RowBinaryWithNamesAndTypes` | **http only** — the body format to request and decode (default `native`). |
| `--protocol-version <N>` | `client_protocol_version` for an http Native query (default `0`). |
| `--save <file>` | **tcp only** — also write the raw `.chproto` capture. |

### Connection options (`query` / `capture`)

| Option | Description |
|--------|-------------|
| `--query <sql>` | SQL to run (required). |
| `--host` / `--port` | Server host / port. Env: `CH_NATIVE_HOST`, `CH_NATIVE_PORT` (tcp) / `CH_HTTP_PORT` (http). Default `127.0.0.1`, port `9000` (tcp) / `8123` (http). |
| `--user` / `--password` | Credentials. Env: `CH_USER` / `CH_PASSWORD`. |
| `--database <db>` | Default database. Env: `CH_DATABASE`. |
| `--setting k=v` | Per-query setting; repeatable. |
| `--no-experimental-settings` | Don't send the Variant/Dynamic/JSON/QBit enabling settings (sent by default). |
| `--client <path>` | Path to `clickhouse-client` (tcp only). Env: `CLICKHOUSE_CLIENT`. |
| `--out <file>` (`capture`) | Where to write the `.chproto` dump. |

### `decode` options

| Option | Description |
|--------|-------------|
| `--format`, `-f` `<chproto\|native\|rowbinary>` | Force the decoder. Omitted → autodetect: `.chproto` by magic header, raw bodies by trial decode (ambiguous input errors and asks for `--format`). |
| `--protocol-version <N>` | Native `client_protocol_version` used to interpret a raw Native body (default `0`). |
| `--no-node-bytes` | Omit each node's inline raw bytes (consumers slice `bytesHex` by range instead). Smaller output. |
| `--compact` | Emit single-line JSON instead of pretty-printed. |

### Output shape

```jsonc
{
"chfx": { "tool": "chfx", "version": "...", "schemaVersion": 1, "command": "decode" }, // or "query"
"source": { "kind": "file", "path": "...", "byteLength": 2417 }, // kind "stdin" | "query" too
"format": "NativeProtocol", // | Native | RowBinaryWithNamesAndTypes
"formatDetected": true, // false when forced via --format
"protocolVersion": 54482, // negotiated (chproto) / requested (native) / null (rowbinary)
"nodeBytes": true, // false when --no-node-bytes was passed
"protocol": { "negotiatedVersion": 54482, "c2sLength": 191, "dumpMeta": { ... } },
"bytesHex": "0011436c...", // the whole decoded buffer, encoded once
"data": { /* ParsedData: header, rows|blocks, trailingNodes, metadata */ }
}
```

Every node has a `byteRange` of `{start, end}` byte offsets into `bytesHex` (two
hex chars per byte; `start` inclusive, `end` exclusive). By default each node
**also carries its own raw bytes inline** as a `bytes` hex string, so a consumer
can read the bytes behind any value without slicing `bytesHex` itself — pass
`--no-node-bytes` to drop them for smaller output.

> Decoded values are JSON-safe: 64-bit and larger integers become decimal
> strings, and raw byte blobs become hex.

## Desktop App

For developers who already run ClickHouse locally. Download the latest release for your platform from the [Releases](../../releases) page:
Expand Down
60 changes: 41 additions & 19 deletions docs/cli-spec.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,18 +38,31 @@ Import a binary dump from a file **or stdin** and emit structured JSON.
- Accepts binary on **stdin** (e.g. piped from clickhouse-client) as well as a
file path argument.

#### `chfx query`
Run SQL against a server and decode the result in one step.
- Transport: **both, `--transport http|native`.**
- `http`: POST to ClickHouse HTTP, request the chosen format, decode the body.
- `native`: drive clickhouse-client through the capture proxy and decode the
full `.chproto` packet stream.
- Default `--format native` (richest). **Experimental type settings**
(Variant/Dynamic/JSON enablement) are **sent by default**, with
`--no-experimental-settings` to disable for read-only/strict servers that
reject them.
- Remote connection flags: `--host/--port/--user/--password`, env-var fallbacks,
and HTTPS/TLS where applicable.
#### `chfx query` (implemented)
Run a query **and decode it in one step** — no intermediate file — over either
transport, emitting the same envelope as `decode`:
- **`--protocol tcp`** (default): drives `clickhouse-client` through the
capturing proxy (`scripts/native-proxy.mjs`) and decodes the native packet
stream. `--save <file>` also writes the raw `.chproto` dump.
- **`--protocol http`**: POSTs to ClickHouse HTTP requesting `--format`
(`native` | `RowBinaryWithNamesAndTypes`, default native) and decodes the
body. `--protocol-version <N>` sets the Native `client_protocol_version`.
Port defaults to 8123 (env `CH_HTTP_PORT`); user/password go via
`X-ClickHouse-User`/`-Key` headers.
- SQL via the **`--query` flag**. **Experimental type settings** sent by default
(`--no-experimental-settings` to disable); `--setting k=v` repeatable.
- Connection flags `--host/--port/--user/--password/--database/--client` with
env fallbacks (`CH_NATIVE_HOST`, `CH_NATIVE_PORT`/`CH_HTTP_PORT`, `CH_USER`,
`CH_PASSWORD`, `CH_DATABASE`, `CLICKHOUSE_CLIENT`).
- **Deferred:** TLS. (The shelved own-TCP-client would remove the
`clickhouse-client` dependency for tcp and could revive item 3.)

#### `chfx capture` (implemented)
Capture a query to a `.chproto` dump **without decoding**. `--out <file>` (`-o`)
writes the dump; omitted, it streams the raw dump bytes to stdout so
`chfx capture … | chfx decode` works. Shares all `query` connection flags.
**`npm run capture` is a thin alias** to `chfx capture` (the standalone
`scripts/capture-native.mjs` was folded in and removed).

#### `chfx proxy` (item 5 — standalone capture proxy)
A listener that forwards to a target server and captures the native TCP stream.
Expand All @@ -66,17 +79,26 @@ through it — the proxy does not spawn the client itself.
- **Plaintext/uncompressed only** (same constraint as today). TLS/compressed
streams are unsupported — error clearly and document it.

#### `chfx schema` / `--help`
Machine-readable description of commands, flags, and the output JSON shape for
agent discovery.
#### `--help`
Human-readable help (`chfx --help`, `chfx <command> --help`). A standalone
machine-readable `schema` command was considered but **dropped** while the CLI
has a single real command: `--help` covers human discovery and the `decode`
output is already self-describing (carries `schemaVersion` and the byteRange/bytes
conventions inline). Revisit a structured-discovery surface (a `schema` command
or `--help --json`) once `query`/`proxy` add a multi-command contract.

### Output (item 4)
- **Reuse the web `ParsedData`/`AstNode` shape verbatim**, serialized to JSON,
wrapped with top-level metadata: tool/schema version, format, negotiated
protocol version (for captures), and the raw bytes.
- **Raw bytes inline, once**, as a top-level **hex** string. Agents read a node's
`byteRange {start, end}` (exclusive end) and slice the hex to inspect bytes —
no second command or sidecar file.
protocol version (for captures), `nodeBytes`, and the raw bytes.
- **Top-level `bytesHex`**: the whole decoded buffer encoded once as hex
(for NativeProtocol this is the combined c2s+s2c stream the ranges index into).
- **Per-node inline bytes (default on)**: every node with a `byteRange {start,
end}` (exclusive end) also carries its own raw bytes as a `bytes` hex string,
so a consumer reads a value's bytes directly without slicing `bytesHex`. This
trades output size (parents duplicate children's bytes) for convenience;
`--no-node-bytes` omits them and falls back to range lookups against `bytesHex`.
- JSON-safe values: bigints → decimal strings, byte blobs → hex.

### Tests & docs (cross-cutting)
- **Thorough CLI tests/fixtures**: decode against the existing
Expand Down
Loading
Loading