Skip to content

Promote Go backend to primary (v0.3.0): migration, CLI review, /index/cancel#21

Merged
dvcdsys merged 15 commits intomainfrom
chore/go-primary
Apr 25, 2026
Merged

Promote Go backend to primary (v0.3.0): migration, CLI review, /index/cancel#21
dvcdsys merged 15 commits intomainfrom
chore/go-primary

Conversation

@dvcdsys
Copy link
Copy Markdown
Owner

@dvcdsys dvcdsys commented Apr 24, 2026

Summary

v0.3.0 release PR. Promotes the Go cix-server to primary backend (phases T1–T8), finalizes the CLI review, and adds the missing /index/cancel endpoint so the watcher can guard against stale sessions.

Go migration (earlier commits on this branch)

  • Python→Go server rewrite landed across phases 0–6 (see commit history)
  • Two Docker images: CPU (distroless/static-debian12, ~40 MB) and CUDA (distroless/cc-debian13 + CUDA libs from nvidia/cuda, ~1 GB), hardened to 0 HIGH/CRITICAL CVEs
  • make scout-cuda / make scout-cpu / make promote-cuda workflow added for safe pre-push scanning

CLI review — last-mile fixes (top commit)

Watcher silent-failure (P0):

  • Stderr INDEXING BROKEN after 3 retries so daemon mode stops hiding failures
  • Health probe every 30s + auto-resume on recovery
  • Stale-session guard: CancelIndex at Start() releases any server-side session left over from a crashed prior run
  • Foreground watcher exits non-zero when stopped while broken

Watcher macOS misses (P1):

  • Exclude editor temp files (.swp, .tmp, *~, .#*, 4913) at both extension and pattern level
  • Event-channel buffer 256 → 4096 with stderr overflow warning
  • Max-wait cap on debounce (10× debounce from first event) so long write streams actually flush
  • IsRegular() filter on Stat, while preserving Remove/Rename tracking
  • Default SyncIntervalMins 5 → 2

Server endpoint parity (P1):

  • POST /api/v1/projects/{path}/index/cancel implemented (idempotent: {cancelled: false} when no session). Unblocks the CLI stale-session guard.
  • Unit tests cover no-session + active-session-then-re-begin.

Client cleanup (P2):

  • ProjectSummary.TopDirectories/RecentSymbols typed (were []map[string]interface{})
  • Dead code removed: DeleteProject, TriggerIndex family

Test plan

  • go test ./... green in both cli/ and server/
  • go vet ./... clean
  • CI passes (release-server.yml + any PR checks)
  • make scout-cuda → 0 HIGH/CRITICAL CVEs on the CUDA image
  • make scout-cpu → 0 HIGH/CRITICAL CVEs on the CPU image
  • Watcher manual E2E: kill server mid-reindex, verify INDEXING BROKEN on stderr, bring server back, verify INDEXING RESTORED + pending changes flush
  • cix watch stale-session guard: cix reindex &; Ctrl-C; cix watch — no 409
  • After merge: bump server/cmd/cix-server/version.go to 0.3.0, tag server/v0.3.0, let release workflow publish :0.3.0, :latest, :0.3.0-cu128, :cu128

🤖 Generated with Claude Code

dvcdsys and others added 9 commits April 23, 2026 17:24
FileResultItem and FileSearchResponse used {path, language, files} but
the backend (both Python and Go ports) actually returns
{file_path, language, results}. Parsing was silently broken.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- T1: add backend=go to /api/v1/status; -v flag to cix-server
- T4: rename api-go-poc/ → server/, fix module path to
  github.com/dvcdsys/code-index/server, archive Python backend to
  legacy/python-api/, merge root Makefile to delegator
- T3: add doc/DOCKER_TAGS.md with tag lifecycle policy
- T5: rewrite .env.example with CIX_* vars; add
  vectorstore.DetectLegacyAndBackup for auto-migration from ChromaDB;
  add doc/MIGRATION_FROM_PYTHON.md
- T6: rewrite README architecture section (Go stack), update
  CONTRIBUTING.md and ONBOARDING.md to Go dev setup, add
  doc/DEPRECATION_POLICY.md
- T2: add .github/workflows/ci-go.yml (go test -race + vet + build);
  add release-server.yml (server/v* tag → Docker CPU multi-arch +
  CUDA cu128); replace pip-audit with govulncheck in security.yml
- T8: gitignore server/dist/ + server/exec.log; update docker-compose
  files to CIX_* env vars and cu128 image; update portainer-stack.yml

Python backend preserved in legacy/python-api/ until server/v0.4.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
macOS Metal:
- server/Makefile: add `make run` target — reads .env from repo root,
  sets CIX_LLAMA_BIN_DIR to bundled llama-server, launches cix-server
- README.md: add "Native macOS (Apple Silicon — Metal GPU)" section with
  step-by-step guide, make run usage, and launchd plist template
- README.md: update Quick Start table to three deployment options
- README.md: replace stale llama-cpp-python references with llama-server sidecar
- README.md: fix Apple Silicon tip (Docker cannot access Metal GPU)

Docker Scout hardening (security):
- server/Dockerfile: add `apk upgrade` in builder stage to patch Alpine CVEs
- server/Dockerfile.cuda: add `apt-get upgrade -y`, drop `curl` (now using
  self-healthcheck), add non-root user cix:cix (uid/gid 1001), mkdir /data
  with correct ownership, switch to `USER cix:cix`
- server/Dockerfile.cuda, docker-compose.{yml,cuda}, portainer-stack*.yml:
  replace `curl -f /health` healthcheck with `/cix-server -healthcheck`
  (self-contained, no curl dependency in the runtime image)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nore)

The generic `cix-server` pattern in server/.gitignore was matching the
cmd/cix-server/ source directory, not just the compiled binary.
Replaced with the specific path /cmd/cix-server/cix-server so only the
executable is ignored.

Also adds the -healthcheck flag in main.go so HEALTHCHECK in Dockerfiles
can use /cix-server -healthcheck instead of curl.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Workflow for iterating before push:
  make scout-cuda               → builds on native x86 builder (cix-builder),
                                   pushes :scout-YYYYMMDD-HHMM, scans with
                                   docker scout --platform linux/amd64
  make promote-cuda SCOUT_TAG=… → imagetools retag → :go-cu128 + :cu128
  make scout-cpu                → local CPU image build + scout scan

Also wires docker-build-cuda to use BUILDER=cix-builder by default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ions

Brings Docker Scout HIGH/CRITICAL count from 4H+12M+3L (19 fixable+unfixable)
down to 0 across all severities (8 LOW remaining are debian "won't-fix"
historical glibc/openssl CVEs from 2010-2019).

Changes:
- Dockerfile.cuda: rewrite final stage from nvidia/cuda:12.8.1-base-ubuntu24.04
  to gcr.io/distroless/cc-debian13:nonroot. CUDA libs (libcudart, libcublas,
  libcublasLt, libnccl, libgomp) are extracted from an intermediate
  nvidia/cuda stage via cp -d (preserves symlinks, keeps size at ~1.0 GB).
  Final image has no shell, apt, tar, dpkg, util-linux, shadow, or libgcrypt.
  Uses numeric USER 1001:1001 to match prior cix:cix runtime — no volume
  migration needed for existing deployments.
- Dockerfile (CPU): bump builder to golang:1.25-alpine.
- go.mod: Go 1.25, chi 5.2.4 (clears CVE-2026-32* + GHSA-vrw8-fxc6-2r93).
- Makefile: add --pull --provenance=mode=max --sbom=true to scout-cuda
  and docker-build-cuda. Provides Scout supply-chain attestations
  (lifts health score from C).
- doc/DOCKER_TAGS.md: document distroless rewrite, debian13 reasoning,
  symlink preservation note, and image size delta.

Verified on RTX 3090 host: ggml_cuda_init found 1 CUDA device, 13/13
layers offloaded to GPU, /health 200, llama-server spawns cleanly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…cleanup

cli/watcher (P0 silent-failure):
- surface INDEXING BROKEN to stderr after 3 retries so daemon mode no longer
  hides the failure in a log file
- probe /health every 30s while broken and auto-resume indexing on recovery
- stale-session guard: call CancelIndex at startup to release any server-side
  session left over from a crashed prior run (would otherwise 409 on /begin)
- foreground watcher exits non-zero when Stop is triggered while broken

cli/watcher (P1 macOS misses):
- exclude editor temp artefacts: .swp, .swx, .swo, .tmp, .bak + globs *~, .#*, 4913
- event buffer 256 -> 4096 with stderr overflow warning at 75% fill
- max-wait cap on debounce (10x debounce from first event) so long write
  streams flush instead of extending the timer indefinitely
- IsRegular() filter on Stat for symlink-cycle/socket/pipe events, while
  still tracking Remove/Rename events whose paths are gone
- default SyncIntervalMins 5 -> 2

server (P1 endpoint parity):
- implement POST /api/v1/projects/{path}/index/cancel (idempotent: returns
  {cancelled: false} when no session is active). Resolves the broken contract
  the CLI was already calling; required by the new stale-session guard.
- indexer.Service.CancelIndexing clears the in-memory session, marks the
  index_runs row cancelled, and restores the project to status=indexed.
- unit tests cover no-session + active-session-then-re-begin flows.

cli client cleanup:
- ProjectSummary.TopDirectories/RecentSymbols typed (was []map[string]interface{})
  so server schema drift surfaces as a JSON decode error instead of silent nulls
- CancelIndex returns (*CancelIndexResponse, error) with typed body
- remove dead code: DeleteProject (no CLI command consumed it), TriggerIndex
  family (pointed at a non-existent /index endpoint)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread server/bench/go.mod Fixed
Comment thread server/bench/go.mod Fixed
Comment thread server/bench/go.mod Fixed
Comment thread server/bench/go.mod Fixed
Comment thread server/bench/go.mod Fixed
Comment thread .github/workflows/ci-go.yml Fixed
dvcdsys and others added 5 commits April 25, 2026 00:03
govulncheck on CI failed against go 1.25.0 (17 stdlib CVEs — x509, tls,
net/url, encoding/asn1, encoding/pem, net/http cookie parsing).

- 1.25.9 is the latest patch in the 1.25 line and fixes every finding
- Dockerfile + Dockerfile.cuda already use `golang:1.25-alpine` which is
  a floating tag that resolves to the current 1.25.9 base — no Dockerfile
  change needed, the next image rebuild picks up the patched stdlib
- actions/setup-go uses `go-version-file: server/go.mod` so CI also
  follows this bump automatically

Local verification:
  $ govulncheck ./...
  No vulnerabilities found.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
server/bench is a Phase 0 PoC (chromem-go + gotreesitter benchmarks),
not part of the shipped cix-server binary. It pins
golang.org/x/net@v0.14.0 via its own go.mod + replace directive to a
local go-llama.cpp build dir, so bumping its deps on CI is not viable.

The GitHub-integrated Trivy check surfaced 1 HIGH + 4 MEDIUM x/net CVEs
against that sub-module and was blocking the PR even though the prod
attack surface is unaffected.

skip-dirs: server/bench tells trivy-action to leave it alone. The scan
still covers server/, cli/, and legacy/python-api/.
Root cause of the 2026-04-25 production boot crash
("cix-server: open db: apply schema: SQL logic error: no such column:
path_hash (1)"):

- Schema const had both `path_hash TEXT` (in CREATE TABLE IF NOT EXISTS)
  and `CREATE INDEX IF NOT EXISTS idx_projects_path_hash ON projects(path_hash)`
- On a pre-m7 database the `projects` table already exists WITHOUT the
  path_hash column, so the CREATE TABLE is a no-op (keeps the old shape)
  and the CREATE INDEX that follows fails — aborting Open before
  migratePathHash ever runs
- Tests only exercised ":memory:" which is always fresh, so the regression
  slipped through the T9 gate and only surfaced when prod's 2h-old
  pre-m7 DB was touched

Fix:
- Remove the CREATE INDEX from the Schema const (with a comment explaining
  why it lives elsewhere)
- Move index creation into migratePathHash, unconditionally after the
  ALTER TABLE ADD COLUMN (idempotent via IF NOT EXISTS so fresh DBs
  still get the index on first Open)
- Add TestOpenMigratesPreM7DB regression test that stages a pre-m7
  projects table and verifies Open migrates, backfills path_hash, and
  creates the index without crashing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- cli/cmd/config.go (CodeQL go/clear-text-logging, alert #2):
  `cix config show` previously printed `cix_abcdef123456...wxyz` for
  configured keys (masked but still leaked first 12 + last 4 chars =
  half the entropy of a 32-char hex key). CodeQL flagged this as
  clear-text logging of sensitive data.

  Replace masked rendering with length-only metadata
  ("(set, N chars)"). Users who need the actual value can read
  ~/.cix/config.yaml directly; the show command's job is to verify
  that the value is configured, not to disclose it.

- .github/workflows/ci-go.yml (CodeQL go/missing-permissions, alert #11):
  Workflow had no permissions block, so it inherited the full
  GITHUB_TOKEN scope of the repo. The job only runs vet/test/build —
  no writes, no SARIF, no publishes — so a `permissions: contents: read`
  block is sufficient. Brings ci-go.yml in line with the other
  workflows (release-cli, release-server, security) which already
  pin permissions explicitly.

The 5 Trivy alerts on server/bench/go.mod auto-closed after 36c105c
added skip-dirs: server/bench to the trivy-action config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous fix replaced masked-key display with "(set, %d chars)". CodeQL
still flagged it because len(cfg.API.Key) is a tainted-data access from
the secret field — the taint propagates through Sprintf to the Printf
sink even though the actual bytes never reach the log.

Render binary state only — "(set)" / "(not set)". No data derived from
the key bytes (not even length) reaches the logging sink, so the
go/clear-text-logging flow analysis has no source-to-sink path to
report.
Comment thread cli/cmd/config.go Fixed
Even after switching the value to a literal "(set)"/"(not set)",
CodeQL's go/clear-text-logging kept flagging the Printf because the
local variable was still NAMED `apiKey`. The query treats any
identifier matching the sensitive-name pattern (*Key, *Secret,
*Token, *Password, *Credentials) as a taint source, regardless of
what value it holds.

Renaming the local to `keyStatus` removes the name-based source
match. Combined with the previous binary "(set)/(not set)" change,
there is now neither a data-flow path nor a name match from the
secret to the logging sink.
@dvcdsys dvcdsys merged commit 597c530 into main Apr 25, 2026
8 checks passed
dvcdsys added a commit that referenced this pull request Apr 25, 2026
- cli/cmd/config.go (CodeQL go/clear-text-logging, alert #2):
  `cix config show` previously printed `cix_abcdef123456...wxyz` for
  configured keys (masked but still leaked first 12 + last 4 chars =
  half the entropy of a 32-char hex key). CodeQL flagged this as
  clear-text logging of sensitive data.

  Replace masked rendering with length-only metadata
  ("(set, N chars)"). Users who need the actual value can read
  ~/.cix/config.yaml directly; the show command's job is to verify
  that the value is configured, not to disclose it.

- .github/workflows/ci-go.yml (CodeQL go/missing-permissions, alert #11):
  Workflow had no permissions block, so it inherited the full
  GITHUB_TOKEN scope of the repo. The job only runs vet/test/build —
  no writes, no SARIF, no publishes — so a `permissions: contents: read`
  block is sufficient. Brings ci-go.yml in line with the other
  workflows (release-cli, release-server, security) which already
  pin permissions explicitly.

The 5 Trivy alerts on server/bench/go.mod auto-closed after 36c105c
added skip-dirs: server/bench to the trivy-action config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants