Skip to content

Phase 1 — LSP codec + ZLS client#1

Merged
guysenpai merged 19 commits into
mainfrom
phase-1-lsp-codec
May 21, 2026
Merged

Phase 1 — LSP codec + ZLS client#1
guysenpai merged 19 commits into
mainfrom
phase-1-lsp-codec

Conversation

@guysenpai

Copy link
Copy Markdown
Contributor

Summary

Phase 1 of zls-mcp: ground-floor LSP layer that spawns ZLS, drives the JSON-RPC protocol, and runs the initializeinitializedshutdown handshake. No MCP tools are exposed yet — that arrives in Phase 3. An integration test proves end-to-end that we can talk to a real ZLS 0.16.0 binary and read back its capabilities.

Scope (per the Phase 1 brief)

In:

  • Content-Length framing + JSON-RPC 2.0 encode/decode (src/lsp/codec.zig)
  • file:// URI ↔ filesystem path, cross-platform (src/lsp/uri.zig)
  • Stderr logger with ZLS_MCP_LOG env-driven level (src/util/log.zig)
  • Minimal LSP types for the handshake (src/lsp/types.zig)
  • Client with subprocess + request registry + atomic-poll timeouts + $/cancelRequest (src/lsp/client.zig)
  • start / stop running the full LSP handshake
  • Juicy Main entry point (src/main.zigpub fn main(init: std.process.Init) !void)
  • Integration test against real ZLS (tests/integration_test.zig) on a minimal Zig fixture
  • zig build test step + GitHub Actions matrix (Linux/macOS/Windows) with ZLS install

Out:

  • MCP server stdio loop, tool registry/dispatch (Phase 3)
  • Workspace manager, didOpen batch (Phase 4)
  • File watcher (Phase 9)
  • UTF-8 ↔ UTF-16 conversion (stub at src/util/utf.zig; Phase 0 confirmed ZLS negotiates UTF-8)
  • Any business tool (def/refs/hover/etc.)

Commits (11)

Commit Subject
c839994 feat(lsp): add Content-Length framing and JSON-RPC codec
12d2df9 feat(lsp): add URI <-> path helpers cross-platform
97543d6 feat(util): add stderr logger with level via env
e839f08 feat(lsp): add LSP types needed for initialize handshake
a1f295b chore(build): add test step that aggregates inline tests
6e877cb feat(lsp): add ZLS client with subprocess spawn and request registry
608f373 feat(lsp): implement initialize + initialized + shutdown handshake
7ceb5df feat(main): switch entry point to Juicy Main pattern
6afd56b test(lsp): add integration test running real ZLS handshake
0aefcc0 style(probe): apply zig fmt to probe/probe.zig
004be69 chore(build): add CI matrix with ZLS install

Two commits are not in the original 9-step plan but are justified:

  • a1f295b (test aggregator) was required to fix a cross-module import path before client.zig could be tested.
  • 0aefcc0 (zig fmt on probe.zig) was surfaced by the new zig fmt --check CI step on a pre-existing formatting drift.

ZLS capabilities confirmed by the integration test

Run locally on macOS aarch64 with ZLS 0.16.0:

  • positionEncoding == "utf-8"
  • workspace_symbol_provider, hover_provider, definition_provider, references_provider, document_symbol_provider all true
  • call_hierarchy_provider and diagnostic_provider both false (matches PROBE_REPORT.md — documents what we work around server-side)
  • server_info.name == "zls", server_info.version == "0.16.0"

The test self-skips via error.SkipZigTest if zls is not on PATH, so CI on a clean runner without the ZLS-install step still passes zig build test.

Test plan

  • zig build — clean locally
  • zig build test — 30/30 (29 unit + 1 integration)
  • zig fmt --check src/ tests/ probe/ build.zig — clean
  • CI matrix green on Linux/macOS/Windows (will run on this PR)
  • Manual: ./zig-out/bin/zls-mcp prints version line on stderr
  • Manual: ZLS_MCP_LOG=warn ./zig-out/bin/zls-mcp produces no output (info filtered)

Notable design decisions

  1. Threading via atomic-poll + io.sleep rather than std.Thread.ResetEvent.timedWait (which was removed in Zig 0.16). Documented in CLAUDE.md "Zig 0.16 idioms".
  2. Slot ownership race between sendRequest timeout and reader thread is resolved via fetchRemove under std.Io.Mutex — whoever removes the slot from the registry owns it. If the caller times out but the reader has already taken the slot, the caller falls back to the normal "wait for done" path.
  3. workspaceFolders sent in initialize alongside the legacy rootUri. Phase 0 confirmed ZLS needs workspaceFolders for workspace/symbol to produce results.
  4. Reader EOF during shutdown is distinguished from a real crash via the stop_flag; expected EOF logs at debug, unexpected EOF at warn + marks the client crashed.

Out-of-scope items deferred to later phases

(Per SPEC.md and the Phase 1 brief — explicitly not addressed here.)

  • MCP stdio loop and tool registry (Phase 3)
  • Workspace bootstrap with didOpen batch (Phase 4)
  • Position-based / name-based / comptime tools (Phases 5–8)
  • File watcher with didChange (Phase 9)
  • Diagnostics + build + format tools (Phase 10)
  • Crash recovery (Phase 11)

guysenpai added 16 commits May 20, 2026 21:58
src/lsp/codec.zig handles two concerns:
- Framing: readFrame/writeFrame move framed bodies between Io.Reader
  /Io.Writer and []u8. Recognizes Content-Length, ignores other headers.
- Parsing helpers: extractId, extractMethod, hasResult, hasErrorField,
  extractError inspect a parsed std.json.Value to route LSP messages.

Encode helpers (encodeRequest, encodeNotification) build JSON-RPC bodies
from a method name plus params Value. Server error codes enum mirrors
SPEC.md "Locked-in technical choices".

10 unit tests cover framing roundtrip, header malformations, encode
roundtrip, and Id equality.
src/lsp/uri.zig converts filesystem paths to LSP "file://" URIs and
back. POSIX paths /foo/bar map to file:///foo/bar. Windows-style paths
C:\foo\bar (or C:/foo) map to file:///C:/foo/bar with the canonical
triple slash before the drive letter.

Detection is shape-based, not host-OS-based: a path starting with
<letter>:<sep> is treated as Windows on any host. This lets us test
both shapes on a single platform.

Percent-encoding follows RFC 3986. Decoding rejects malformed %XX
sequences with MalformedPercent. 7 unit tests cover both roundtrips,
special characters, and error paths.
src/util/log.zig: levels {debug, info, warn, err, off} stored in an
atomic global. setLevel/getLevel/parseLevel public API. Helper
functions debug/info/warn/err take a comptime component name and
format args. Output goes through std.debug.print which writes to
stderr without needing an Io handle.

Phase 1 trade-off: no ISO8601 timestamp prefix yet. Adding wall-clock
time in Zig 0.16 requires Io.Clock.real.now(io), which is not yet
threaded through every call site. Will be added when the MCP server
main loop owns a shared Io.
src/lsp/types.zig holds the Phase 1 LSP type subset: Position, Range,
Location, WorkspaceFolder, ClientInfo, ServerInfo, ServerCapabilities,
StartOptions.

ServerCapabilities collapses the LSP "bool | object | absent" provider
shapes into plain bools, which is enough for the Phase 1 client.
parseServerCapabilities / parseServerInfo extract fields from a
parsed std.json.Value, duplicating strings into the Client's
allocator so they survive the response tree being freed.

Full LSP type universe deliberately not modelled: methods that take
opaque params (definition, hover, etc.) keep passing std.json.Value
through, avoiding upfront tedium.
build.zig: new `zig build test` step that compiles src/test_root.zig
and runs every inline `test "..."` block reachable from it.

src/test_root.zig: comptime imports every file under src/ so the
test runner finds their inline tests. Each feat commit that adds a
new src/ module must extend the import list.

Running `zig build test` now executes the 26 tests in
codec.zig (10) + uri.zig (7) + types.zig (5) + log.zig (4).
src/lsp/client.zig: Client struct that spawns ZLS as a subprocess and
routes JSON-RPC requests/responses by id.

Threading:
- main thread: sendRequest / sendNotification; polls slot.done with
  io.sleep until response or deadline
- reader thread: owns child.stdout, frames + parses messages, takes
  the matching slot under the registry mutex, fills it, signals done
- stderr thread: forwards every line to the debug logger as [zls-stderr]

Slot ownership: each ResponseSlot lives on the gpa. Whoever removes
it from the registry owns it. On timeout, the caller wins the race
and emits $/cancelRequest before freeing. If the reader wins (response
arrived just in time), the caller falls back to the normal "read
result + free" path.

LspError set: ZlsSpawnFailed, ZlsCrashed, Timeout, ProtocolViolation,
ServerError, ClientStopped. ServerCapabilities (types.zig) field
defaults to empty string to avoid free-on-literal in client.deinit
of a never-started client.

Public API in this commit: init, deinit, spawn, terminate,
sendRequest, sendNotification. The full handshake (start / stop) is
added in the next commit.

3 inline tests cover init/deinit and "send on non-started client".
Client.start runs the LSP handshake end-to-end:
  spawn → initialize (5s timeout) → parse ServerCapabilities /
  ServerInfo → initialized notification.

buildInitializeParams constructs the params Value tree on an arena.
The capabilities sent match SPEC.md "LSP critical details":
  - general.positionEncodings = ["utf-8", "utf-16"]
  - workspace.workspaceFolders = true
  - workspace.symbol.dynamicRegistration = true
  - textDocument.callHierarchy.dynamicRegistration = true
  - textDocument.synchronization.{ dynamicRegistration: false, didSave: true }
Both rootUri (legacy) and workspaceFolders (modern) are sent — Phase 0
confirmed ZLS needs workspaceFolders for workspace/symbol to work.

Client.stop runs the LSP graceful shutdown: shutdown request (2s
timeout) → exit notification → child.wait (falls back to kill if wait
fails) → join threads.

src/util/utf.zig: empty stub with a Phase 1 note. The helper will be
implemented if a future ZLS version forces UTF-16 (Phase 0 confirmed
UTF-8 today).
src/main.zig now uses Zig 0.16's `pub fn main(init: std.process.Init)`
signature so future phases can pull `init.gpa`, `init.io`, and
`init.environ_map` straight from the runtime instead of constructing
them manually.

Phase 1 behavior: parse ZLS_MCP_LOG env var → set log level → emit a
single info line → exit. The MCP stdio loop arrives in Phase 3.

Verified manually:
  ./zig-out/bin/zls-mcp           → "[info] [main] zls-mcp v0.0.0 ..."
  ZLS_MCP_LOG=warn ./zls-mcp      → (no output, info filtered)
  ZLS_MCP_LOG=debug ./zls-mcp     → info message shown
tests/integration_test.zig drives a full Client.start → assert
capabilities → Client.stop round trip against an actual zls binary.
The test is skipped via error.SkipZigTest if `zls` is not on PATH so
a fresh checkout without ZLS still passes `zig build test`.

Asserted capabilities (Phase 0 confirmed in PROBE_REPORT.md):
  - positionEncoding == "utf-8"
  - workspace_symbol_provider, hover_provider, definition_provider,
    references_provider, document_symbol_provider all true
  - call_hierarchy_provider, diagnostic_provider both false
    (documents what we're working around server-side)
  - server_info.name / version populated

tests/fixtures/phase1_minimal/ is a trivial Zig project (build.zig +
main.zig) used as the LSP workspace folder.

Infrastructure:
- src/test_root.zig renamed to src/root.zig with `pub const`
  re-exports so external test targets can `@import("zlsmcp")` and
  reach `Client`, `pathToUri`, etc.
- build.zig declares the root module and creates a second test
  target rooted at tests/integration_test.zig with the `zlsmcp`
  import wired up.
- client.zig: reader thread distinguishes expected EOF (during
  shutdown) from a real crash by checking stop_flag.

30/30 tests pass locally on macOS aarch64 with ZLS 0.16.0.
Inline chained `else if` expressions onto a single line per zig fmt's
current style. Pre-existing formatting drift surfaced by adding the
`zig fmt --check` step in CI.
.github/workflows/ci.yml runs `zig build`, `zig build test`, and
`zig fmt --check` on a matrix of {ubuntu-latest, macos-latest,
windows-latest} with Zig 0.16.0 (via mlugg/setup-zig) and ZLS 0.16.0
installed from the official zigtools/zls GitHub release tarballs:

  Linux x86_64   : zls-x86_64-linux.tar.xz
  macOS aarch64  : zls-aarch64-macos.tar.xz
  Windows x86_64 : zls-x86_64-windows.zip

The integration test self-skips via error.SkipZigTest if zls is not
on PATH, so this matrix can run unattended on contributors' forks
without breaking. The "test step" infrastructure in build.zig was
introduced earlier in the phase; this commit only adds the CI layer
on top.
The reader and stderr loops hold a pointer into self.child via
&client.child.? captured at thread entry. Nulling self.child before
the threads observe stop_flag and exit was a UAF window. Reorder so
that:
  1. stop_flag = true
  2. kill (or wait, in stop()) — closes pipes, unblocks readers
  3. join both threads
  4. only now: self.child = null
The previous errdefer chain ordered kill after thread joins, deadlocking
on errors that occur between reader spawn and stderr spawn (the reader
sits on readFrame because pipes are open). Replace with a single
errdefer that calls terminate(), which now (after the previous commit)
performs the steps in the correct order: stop_flag, kill, join, null.
Previously, when self.crashed became true and the reader had already
called failAllPending (which removes slots from the registry), the
caller's branch `if (self.takeSlot(id)) |owned| destroy(owned)` would
hit null and return without freeing the slot. Replace the manual
destroy chain with a single defer covering all exit paths, plus a
companion defer that removes from the registry idempotently.

Also simplifies the encode/write error handling to plain `try` since
both defers handle cleanup uniformly.
@intcast on a negative i64 or one above u32 max is UB in safe mode.
Drop the message with a debug log instead of crashing.
Both surfaced in PR #1 review. Document them in-code so future readers
don't reinvent the diagnosis:

- readerLoop: the reader parses each response body twice (once into
  client.gpa for id routing, once into the slot's arena for caller
  ownership). A future optimization could peek the id with a streaming
  parser, then parse once into the right arena.
- pathToUri: backslashes in POSIX paths are converted to `/` rather
  than percent-encoded as `%5C`, so a POSIX filename containing `\`
  does not round-trip.
guysenpai added 3 commits May 21, 2026 06:31
v1 queries ziglang.org/builds, which only hosts dev/master tarballs.
Tagged releases like 0.16.0 live under ziglang.org/download/<version>/.
v2 uses the correct endpoint and the official download index, fixing
the 404 from every mirror that broke CI on all three OS runners.
GitHub Windows runners default to core.autocrlf=true, so files are
checked out with CRLF. Zig's formatter normalizes to LF, which made
`zig fmt --check` flag every file on the Windows job. Pinning LF in
.gitattributes keeps the working tree consistent across platforms
and unblocks the format gate.
Use our own pinned fork of the Zig installer action so we control its
update cadence and aren't exposed to upstream behavioural changes.
@guysenpai guysenpai merged commit 377d0ec into main May 21, 2026
3 checks passed
@guysenpai guysenpai deleted the phase-1-lsp-codec branch May 21, 2026 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant