```
 ██████╗██╗██╗  ██╗
██╔════╝██║╚██╗██╔╝
██║     ██║ ╚███╔╝
██║     ██║ ██╔██╗
╚██████╗██║██╔╝ ██╗
 ╚═════╝╚═╝╚═╝  ╚═╝   Code IndeX
```
Search your codebase by meaning, not just text. Self-hosted, embeddings-based, works with any agent or terminal.
```
cix search "authentication middleware"
cix search "database retry logic" --in ./api --lang go
cix symbols "UserService" --kind class
```

Grep and fuzzy file search work fine for small projects. At scale they break down:
- You have to know what a thing is called to find it
- Results flood with noise from unrelated files
- Agents waste tokens scanning files that aren't relevant
cix indexes your code into a vector store using CodeRankEmbed — a model purpose-built for code retrieval. Search queries return ranked snippets with file paths and line numbers, not raw file lists.
```
cix CLI (Go)
├── init     → register project + index + start file watcher
├── search   → semantic search (embeddings)
├── symbols  → symbol lookup by name (SQLite)
├── files    → file path search
├── summary  → project overview
├── reindex  → manual reindex trigger
└── watch    → fsnotify daemon → auto reindex on changes

cix-server (Go) — server/
├── llama-server (llama.cpp sidecar)  → embeddings (CodeRankEmbed Q8_0 GGUF, 768d)
├── chromem-go                        → vector store (cosine similarity)
├── gotreesitter                      → AST chunking (200+ languages)
└── modernc.org/sqlite                → project metadata, symbols, file hashes
```
The server is a pure-Go static binary. The CLI is a thin Go binary that talks to it over HTTP.
The llama-server sidecar (from upstream llama.cpp) handles embeddings — the Go process starts it as a child process and communicates via Unix socket.
Three deployment options:
| Mode | Best for | GPU acceleration | Prerequisites |
|---|---|---|---|
| Docker (CPU) | any OS, development | none | Docker |
| Docker (CUDA) | NVIDIA GPU servers | CUDA | Docker, NVIDIA Container Toolkit |
| Native (macOS) | Apple Silicon — full Metal GPU | Metal | Go 1.24+, Xcode CLT |
```
git clone https://github.com/dvcdsys/code-index && cd code-index
cp .env.example .env
# Edit .env — set CIX_API_KEY to a random string
docker compose up -d
```

```
curl http://localhost:21847/health   # → {"status": "ok"}
```

See the GPU Acceleration (CUDA) section below.

```
docker compose -f docker-compose.cuda.yml up -d
```

Why not Docker? Docker Desktop on macOS runs containers inside a Linux VM — the Metal GPU is not accessible from within a container. For full Apple Silicon GPU acceleration you must run the server natively.
Prerequisites: Go 1.24+, Xcode Command Line Tools
```
xcode-select --install   # if not already installed
```

Step 1 — Build binary + download Metal-enabled llama-server (once)

```
cd server
make bundle
# Outputs:
#   dist/cix-darwin-arm64/cix-server
#   dist/cix-darwin-arm64/llama/llama-server (includes libggml-metal.dylib)
```

Step 2 — Configure

```
cp .env.example .env
# Edit .env — set at minimum:
#   CIX_API_KEY=cix_<your-random-key>
#   CIX_N_GPU_LAYERS=99   ← offload all layers to Metal
```

Step 3 — Run

```
cd server && make run
# Reads .env from repo root, sets CIX_LLAMA_BIN_DIR automatically.
```

```
curl http://localhost:21847/health   # → {"status": "ok"}
```

| Variable | Recommended | Notes |
|---|---|---|
| `CIX_N_GPU_LAYERS` | `99` | Offload all layers to Metal; `0` = CPU only |
| `CIX_LLAMA_BIN_DIR` | set by `make run` | Path to the llama-server binary dir |
| `CIX_EMBEDDINGS_ENABLED` | `true` | Enable GPU embeddings (default) |
> **Tip**
> `make run` always runs `make bundle` first (a no-op if already built), so it's safe to use after any `git pull`.
Auto-start with launchd (optional — run the server in the background on login):

```
cat > ~/Library/LaunchAgents/com.cix.server.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"><dict>
  <key>Label</key><string>com.cix.server</string>
  <key>ProgramArguments</key>
  <array><string>/ABSOLUTE/PATH/TO/server/dist/cix-darwin-arm64/cix-server</string></array>
  <key>EnvironmentVariables</key>
  <dict>
    <key>CIX_API_KEY</key><string>YOUR_KEY</string>
    <key>CIX_LLAMA_BIN_DIR</key><string>/ABSOLUTE/PATH/TO/server/dist/cix-darwin-arm64/llama</string>
    <key>CIX_N_GPU_LAYERS</key><string>99</string>
    <key>CIX_PORT</key><string>21847</string>
    <key>CIX_SQLITE_PATH</key><string>/Users/YOUR_USER/.cix/data/sqlite/projects.db</string>
    <key>CIX_CHROMA_PERSIST_DIR</key><string>/Users/YOUR_USER/.cix/data/chroma</string>
    <key>CIX_GGUF_CACHE_DIR</key><string>/Users/YOUR_USER/.cix/data/models</string>
  </dict>
  <key>RunAtLoad</key><true/>
  <key>KeepAlive</key><true/>
  <key>StandardOutPath</key><string>/tmp/cix-server.log</string>
  <key>StandardErrorPath</key><string>/tmp/cix-server.err</string>
</dict></plist>
EOF
# Replace /ABSOLUTE/PATH/TO and YOUR_USER/YOUR_KEY with real values, then:
launchctl load ~/Library/LaunchAgents/com.cix.server.plist
launchctl start com.cix.server
```

Option A: one-line installer (macOS / Linux)
```
curl -fsSL https://raw.githubusercontent.com/dvcdsys/code-index/main/install.sh | bash
```

Option B: from source
```
cd cli
make build && make install   # → /usr/local/bin/cix
```

Or without Make:

```
cd cli && go build -o cix . && sudo mv cix /usr/local/bin/
```

```
# Point cix at your server (API key is in .env)
cix config set api.url http://localhost:21847
cix config set api.key $(grep CIX_API_KEY .env | cut -d= -f2)
```

```
cd /path/to/your/project
cix init     # registers, indexes, starts file watcher daemon
cix status   # wait until: Status: ✓ Indexed
```

```
cix search "authentication middleware"
cix search "error handling" --in ./api
cix symbols "handleRequest" --kind function
cix files "config"
cix summary
```

| Command | Description |
|---|---|
| `cix init [path]` | Register + index + start file watcher |
| `cix status` | Show indexing status and progress |
| `cix list` | List all indexed projects |
| `cix reindex [--full]` | Trigger manual reindex |
| `cix summary` | Project overview: languages, directories, symbols |
```
# Semantic search — natural language, finds by meaning
cix search <query> [flags]
  --in <path>          restrict to file or directory (repeatable)
  --lang <language>    filter by language (repeatable)
  --limit, -l <n>      max results (default: 10)
  --min-score <0-1>    minimum relevance score (default: 0.1)
  -p <path>            project path (default: cwd)

# Symbol search — fast lookup by name
cix symbols <name> [flags]
  --kind <type>        function | class | method | type (repeatable)
  --limit, -l <n>      max results (default: 20)

# File search
cix files <pattern> [--limit <n>]
```

```
cix watch [path]         # start background daemon
cix watch --foreground   # run in terminal (Ctrl+C to stop)
cix watch stop           # stop daemon
cix watch status         # check if running
```

The watcher monitors the project with fsnotify, debounces events (5s), and triggers incremental reindexing automatically. Logs: `~/.cix/logs/watcher.log`.
```
cix config show              # print current config
cix config set <key> <val>   # set a value
cix config path              # show config file location
```

Config file: `~/.cix/config.yaml`

| Key | Default | Description |
|---|---|---|
| `api.url` | `http://localhost:21847` | API server URL |
| `api.key` | — | Bearer token for API auth (required) |
| `watcher.debounce_ms` | `5000` | Delay in ms before reindex is triggered after a file change |
| `indexing.batch_size` | `20` | Number of files sent to the server per indexing batch |
cix is designed to be called by AI agents (Claude, GPT, Cursor, custom agents) as a shell tool. Agents run cix search instead of Grep/Glob — getting ranked, relevant snippets rather than raw file dumps.
Install the bundled skill so Claude knows to use cix automatically:
```
cp -r skills/cix ~/.claude/skills/cix
```

Then in any Claude Code session:
```
/cix
```
This loads search guidance into context. Claude will use cix search instead of Grep.
To activate in every session without typing /cix, add to ~/.claude/CLAUDE.md:
```
## Code search
Use `cix` for all code search instead of Grep/Glob:
- `cix search "query"` — semantic search by meaning
- `cix symbols "Name" --kind function` — find symbol definitions
- `cix files "pattern"` — find files by path
- `cix summary` — project overview
Run `cix init` on first use in a project.
```

Same pattern — give the agent access to shell execution and describe the commands:
```
Tool: shell
Usage: cix search "what you're looking for" [--in ./subdir] [--lang python]
Returns: ranked code snippets with file paths and line numbers
```
```
# First time in a project
cix init /path/to/project

# Explore
cix summary
cix search "main entry point"

# Find specific code
cix search "JWT token validation"
cix symbols "ValidateToken" --kind function

# Navigate
cix search "who calls ValidateToken"
cix search "error handling in auth flow" --in ./api
```

Chunking — tree-sitter parses code into semantic chunks (functions, classes, methods). Unsupported languages fall back to a sliding window (2000 chars, 256 char overlap).
Supported languages: Python, TypeScript, JavaScript, Go, Rust, Java (+ 40+ others via fallback).
Embeddings — each chunk is encoded with a GGUF build of CodeRankEmbed (default: awhiteside/CodeRankEmbed-Q8_0-GGUF; 768d, 8192 token context, ~145MB on disk) via the llama-server sidecar (llama.cpp). Queries get a "Represent this query for searching relevant code: " prefix for asymmetric retrieval.
Incremental reindex — uses SHA256 file hashes. Only new or changed files are re-embedded. Deleted files are removed from the index.
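The hash bookkeeping looks roughly like this — a minimal sketch (the real index also tracks chunk IDs per file):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashContent returns the hex SHA256 digest of a file's bytes.
func hashContent(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// diffIndex compares stored hashes against a fresh scan: new or
// changed paths need re-embedding; paths gone from the scan are
// removed from the index.
func diffIndex(stored, current map[string]string) (changed, deleted []string) {
	for path, h := range current {
		if stored[path] != h {
			changed = append(changed, path)
		}
	}
	for path := range stored {
		if _, ok := current[path]; !ok {
			deleted = append(deleted, path)
		}
	}
	return changed, deleted
}

func main() {
	stored := map[string]string{
		"main.go": hashContent([]byte("package main")),
		"old.go":  hashContent([]byte("removed file")),
	}
	current := map[string]string{
		"main.go": hashContent([]byte("package main // edited")),
		"new.go":  hashContent([]byte("package new")),
	}
	changed, deleted := diffIndex(stored, current)
	fmt.Println(len(changed), len(deleted)) // 2 1
}
```

Only the two changed/new files are re-embedded; untouched files never hit the embedding model again.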
Filtering — respects .gitignore and .cixignore, skips common dirs (node_modules, .git, .venv, etc.), skips files >512KB and empty files. Per-project configuration via .cixconfig.yaml (see below).
Works exactly like .gitignore (same syntax, same nesting rules). Place it in the project root or any subdirectory. Patterns from .cixignore are merged with .gitignore — you don't need to duplicate rules.
Use .cixignore when you want to exclude files from the index that are not excluded by .gitignore (e.g., vendored code, generated files, large test fixtures).
```
# .cixignore
api/smart-contracts/
generated/
*.pb.go
testdata/fixtures/
```

Nested .cixignore files work like nested .gitignore — they apply to their directory and below, without affecting sibling directories.
The file watcher automatically triggers a full reindex when .cixignore is created, modified, or deleted.
Place this file in the project root. It currently supports automatic git submodule exclusion.
```
# .cixconfig.yaml
ignore:
  submodules: true   # automatically exclude all git submodule paths
```

When `ignore.submodules` is true, cix reads .gitmodules and excludes all submodule paths from indexing. No git binary is required — the file is parsed directly.
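Because .gitmodules is a simple INI-style file, the path extraction can be sketched in a few lines (illustrative; cix's real parser may handle more syntax edge cases, and the submodule names below are made up):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// submodulePaths pulls every `path = ...` value out of .gitmodules
// content — no git binary involved.
func submodulePaths(content string) []string {
	var paths []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "path") {
			if _, value, ok := strings.Cut(line, "="); ok {
				paths = append(paths, strings.TrimSpace(value))
			}
		}
	}
	return paths
}

func main() {
	gitmodules := `[submodule "forge-std"]
	path = lib/forge-std
	url = https://github.com/foundry-rs/forge-std
[submodule "solmate"]
	path = lib/solmate
	url = https://github.com/transmissions11/solmate
`
	fmt.Println(submodulePaths(gitmodules)) // [lib/forge-std lib/solmate]
}
```

Each returned path is then treated as an excluded directory, exactly as if it were listed in .cixignore.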
This is useful for projects with Foundry/Forge dependencies, vendored submodules, or any repo where submodules contain thousands of files you don't want indexed.
Example: a project with 228 own files and 3,400+ files in nested submodules — after adding ignore.submodules: true, only the 228 project files are indexed.
The file watcher triggers a full reindex when .cixconfig.yaml changes.
See .env.example for a complete template.
| Variable | Default | Description |
|---|---|---|
| `CIX_API_KEY` | — | Bearer token for API auth |
| `CIX_PORT` | `21847` | API server port |
| `CIX_EMBEDDING_MODEL` | `awhiteside/CodeRankEmbed-Q8_0-GGUF` | HuggingFace GGUF repo |
| `CIX_MAX_FILE_SIZE` | `524288` | Skip files larger than this (bytes) |
| `CIX_EXCLUDED_DIRS` | `node_modules,.git,.venv,...` | Comma-separated dirs to skip |
| `CIX_N_GPU_LAYERS` | auto | `99` offloads all layers to GPU; `0` forces CPU |
| `CIX_GGUF_CACHE_DIR` | `/data/models` | Where the GGUF file is cached |
| `CIX_LLAMA_BIN_DIR` | `/app` | Directory containing llama-server binary |
| `CIX_LLAMA_STARTUP_TIMEOUT` | `60` | Seconds to wait for llama-server ready |
| `CIX_EMBEDDINGS_ENABLED` | `true` | Set to `false` to skip embeddings (CPU-only mode) |
| `CIX_CHROMA_PERSIST_DIR` | `/data/chroma` | Vector store path |
| `CIX_SQLITE_PATH` | `/data/sqlite/projects.db` | SQLite database path |
Data is stored in /data inside the container — mount a volume to persist it.
| | Local (native) | Docker (CPU) | CUDA |
|---|---|---|---|
| Memory (idle) | ~1GB | ~1GB | ~1GB |
| Memory (indexing) | up to 2GB | up to 2GB | up to 2GB |
| CPU | no limit | `CPUS` env var (default: 2) | unlimited |
| GPU | Metal (Apple Silicon) | none | NVIDIA CUDA |
| Disk | `~/.cix/data/` (~50-200MB/project) | same | same |
| Auto-restart | no (use launchd/systemd) | yes | yes |
The server ships with awhiteside/CodeRankEmbed-Q8_0-GGUF — a Q8-quantized build of CodeRankEmbed (137M params, 768 dims, ~145MB on disk, ~650MB idle VRAM/RAM). Inference runs via the llama-server sidecar (llama.cpp), so only GGUF repositories are supported. Plain PyTorch/sentence-transformers repos will not work.
To switch models:

- Stop the server (`make server-local-stop` or `make server-docker-stop`).
- Set `EMBEDDING_MODEL` in `.env` to a Hugging Face repo that contains a `.gguf` file, for example:

  ```
  # code-specialised (default)
  EMBEDDING_MODEL=awhiteside/CodeRankEmbed-Q8_0-GGUF
  # smaller general-purpose alternative
  EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v1.5-GGUF
  ```

- (Optional) Pre-cache the new model into the Docker image: `docker compose build --build-arg EMBEDDING_MODEL=<repo>`.
- Start the server and re-index your projects.
> **Note**
> ChromaDB and SQLite paths are suffixed with a sanitised form of the model name (e.g. `projects.db_awhiteside_coderankembed_q8_0_gguf`). This isolates vector spaces per model, so switching back and forth keeps old indices intact and avoids dimension-mismatch errors.
> **Tip**
> Apple Silicon: Docker cannot access the Metal GPU — run natively with `cd server && make run` (see Native macOS (Apple Silicon — Metal GPU) above). The bundled llama-server includes libggml-metal.dylib; set `CIX_N_GPU_LAYERS=99` for full Metal offload.
> Linux NVIDIA: use the CUDA image (docker-compose.cuda.yml). Force CPU with `CIX_N_GPU_LAYERS=0`.
```
docker compose up -d                              # start (CPU)
docker compose -f docker-compose.cuda.yml up -d   # start (CUDA)
docker compose logs -f                            # tail logs
docker compose down                               # stop
```

Developer builds (from source):

```
cd server && make build       # build cix-server binary
cd server && make bundle      # build + fetch llama-server
cd server && make test-gate   # parity gate (requires GGUF)
make docker-build-cuda        # build + push CUDA image
```

```
docker login
make docker-build-cuda   # builds + pushes server/Dockerfile.cuda → dvcdsys/code-index:go-cu128
```

Pre-built images on Docker Hub:
| Tag | Architecture | Use case |
|---|---|---|
| `dvcdsys/code-index:latest` | linux/amd64 + linux/arm64 | CPU, `CIX_EMBEDDINGS_ENABLED=false` |
| `dvcdsys/code-index:cu128` | linux/amd64 | NVIDIA GPU (CUDA 12.8), full embeddings |
| `dvcdsys/code-index:0.2-python-legacy` | linux/amd64 | Frozen Python build, rollback only |
See doc/DOCKER_TAGS.md for the full tag lifecycle policy.
All endpoints except /health require Authorization: Bearer <api_key>.
```
GET    /health                                  # liveness check
GET    /api/v1/status                           # service status
POST   /api/v1/projects                         # create project
GET    /api/v1/projects                         # list projects
GET    /api/v1/projects/{id}                    # project details
DELETE /api/v1/projects/{id}                    # delete project + index
POST   /api/v1/projects/{id}/index              # trigger indexing
GET    /api/v1/projects/{id}/index/status       # indexing progress
POST   /api/v1/projects/{id}/index/cancel       # cancel indexing
POST   /api/v1/projects/{id}/search             # semantic search
POST   /api/v1/projects/{id}/search/symbols     # symbol search
POST   /api/v1/projects/{id}/search/files       # file path search
GET    /api/v1/projects/{id}/summary            # project overview
```

**API key not set**
```
cix config set api.key $(grep CIX_API_KEY /path/to/code-index/.env | cut -d= -f2)
```

**connection refused**
```
curl http://localhost:21847/health                # check if server is up
docker compose up -d                              # start (CPU)
docker compose -f docker-compose.cuda.yml up -d   # start (CUDA)
```

**project not found**
```
cix init /path/to/project
```

**Watcher not triggering reindex**
```
cix watch status
cat ~/.cix/logs/watcher.log
cix watch stop && cix watch /path/to/project
```

**Search returns no results**
- Check the project is indexed: `cix status`
- Lower the threshold: `cix search "query" --min-score 0.05`
- Docker mode: run `cix list` to verify the project is registered
Cross-platform binaries are built with:
```
cd cli
make release VERSION=v0.1.0
```

This produces archives for macOS and Linux (amd64 + arm64) in `cli/dist/`, plus a `checksums.txt`. Upload them to a GitHub Release and the install.sh installer will pick up the latest version automatically.
Supported targets: darwin-arm64, darwin-amd64, linux-arm64, linux-amd64.
A CUDA-enabled image is available for servers with NVIDIA GPUs. Inference runs on GPU automatically — no configuration needed.
With the GGUF backend the footprint is near-constant: weights (~200-250 MB) plus
the pre-allocated context (n_ctx=8192, ~200-400 MB) give a ~0.5-0.7 GB
idle draw. Embedding calls do not spike VRAM the way fp16 PyTorch attention
used to — sequence length and batch size only change latency, not peak memory.
MAX_CHUNK_TOKENS still caps the length of each code chunk (1 token ≈ 4 chars)
and must stay ≤ n_ctx (8192). MAX_EMBEDDING_CONCURRENCY should stay at 1
for single-GPU setups — llama.cpp serialises through one context.
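As a worked example of the arithmetic (values below are illustrative, not shipped defaults — check .env.example for the authoritative names and values): at ~4 chars per token, the full `n_ctx` of 8192 tokens corresponds to roughly 32 KB of source text per chunk.

```
# illustrative .env fragment
MAX_CHUNK_TOKENS=8192          # ≈ 32 KB of code per chunk; must stay ≤ n_ctx (8192)
MAX_EMBEDDING_CONCURRENCY=1    # single GPU: llama.cpp serialises through one context
```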
See doc/vram-profiling.md for methodology and numbers.
Docker Hub: dvcdsys/code-index:cu128
Tags: cu128 (stable) and v<version>-cu128 (pinned). Image size: ~1.66 GB
(3-stage build: nvidia/cuda:12.8.1-base + libcublas + llama-server binaries + Go binary).
See doc/DOCKER_TAGS.md for the full tag lifecycle.
Host requirements:
- NVIDIA GPU with driver >= 520 (CUDA 12.x compatible)
- NVIDIA Container Toolkit installed on the host
Docker Compose:
```
docker compose -f docker-compose.cuda.yml up -d
```

Portainer: use portainer-stack-cuda.yml — deploy as a new stack with the API_KEY env variable set.
MIT