vecna — Vectors na Vectors

vecna is an OpenAI- and Google-compatible HTTP proxy that sits between your application and a local or remote embedding model. It forwards text to the backing model, receives the raw embedding vector, and re-shapes it to the dimension your vector database or application expects — without any changes to your client code.

New to embeddings? → docs/what_is_embeddings.md

Every time I install a tool that needs vector embeddings, it assumes OpenAI's dimensions. Very few local models match those exact dimensions, most tools give you no option to change them, and some vector databases are hardcoded to a fixed dimension entirely. Many open source models do share a few common dimensions with each other, which helps, but I run embedding models across several machines and it has always been a pain to use them effectively. This is where vecna helps me.

Why

Most vector databases are initialized with a fixed dimension (e.g. a pgvector column declared as vector(1536) for OpenAI embeddings, or an HNSW index built for 768-dim vectors). If you switch embedding models, for example to a smaller local model that produces 768-dim vectors, the new vectors no longer fit your existing index.

vecna solves this by translating dimensions at the proxy layer:

Downscale: 3072 → 1536, 768 → 256 (truncation or random projection)
Upscale: 384 → 1536, 768 → 1536 (zero-padding)
Same-dim pass-through: no transformation, just proxy and auth

All output vectors are L2-normalized so cosine similarity remains valid.

Install

go install github.com/Warky-Devs/vecna.git/cmd/vecna@latest

Or build from source:

make build        # outputs ./bin/vecna

Quick start

# Interactive setup: discovers local servers, configures adapter, writes config
vecna onboard

# Start the proxy
vecna serve

# Test a request (OpenAI-compatible)
curl -s http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "hello world", "model": "nomic-embed-text"}'

Commands

Command	Description
`vecna onboard`	Interactive wizard: discover servers → detect dims → configure → test → write config
`vecna serve`	Start the proxy server
`vecna query <text>`	Embed text and print the resulting vector as JSON
`vecna convert`	Convert vectors from file/stdin using the configured adapter
`vecna search`	Scan LAN for embedding servers and add one to config
`vecna models`	List models available on each configured forwarder
`vecna test`	Test each configured endpoint; `--remove-broken` prunes failing ones
`vecna editconfig`	Print config path and open it in `$EDITOR`

Query

Send text directly to a forwarding target and print the adapted vector as JSON.

# uses forward.default target
vecna query "hello world"

# specific target
vecna query --target ollama "hello world"

# skip the adapter — raw model output
vecna query --raw "hello world"

# compact single-line output (pipe-friendly)
vecna query --compact "hello world"

# read text from stdin
echo "hello world" | vecna query -

# inspect a single dimension
vecna query --compact "hello world" | jq '.[0]'

# save to file
vecna query --compact "hello world" > vector.json

Status info (target, model, dims, tokens) is written to stderr; stdout is clean JSON.

Configuration

Default config path: ~/vecna.json (created by onboard or editconfig).

Override with --config path/to/file.yaml or env vars prefixed VECNA_.

{
  "server": {
    "port": 8080,
    "host": "0.0.0.0",
    "api_keys": ["sk-vecna-abc123"]
  },
  "forward": {
    "default": "ollama",
    "targets": {
      "ollama": {
        "api_type": "openai",
        "model": "nomic-embed-text",
        "api_key": "",
        "timeout_secs": 30,
        "cooldown_secs": 60,
        "priority_decay": 2,
        "priority_recovery": 5,
        "endpoints": [
          {"url": "http://localhost:11434", "priority": 10}
        ]
      }
    }
  },
  "adapter": {
    "type": "truncate",
    "source_dim": 768,
    "target_dim": 1536,
    "truncate_mode": "from_end",
    "pad_mode": "at_end"
  },
  "extra_maps": {
    "512":  { "target_dim": 512 },
    "256":  { "target_dim": 256, "type": "random", "seed": 42 },
    "fast": { "target_dim": 768, "forward_target": "small-model" }
  },
  "metrics": {
    "enabled": true,
    "path": "/metrics",
    "api_key": ""
  }
}

Important: quality and consistency warnings

Upscaling does not improve quality: use the highest-dimension model you can

Upscaling (e.g. 768 → 1536) does not add information. The extra dimensions are zeros (truncate adapter) or linear combinations of existing values (random/projection). The resulting vectors occupy a 1536-dim space but carry no more semantic content than the original 768-dim ones.

The model's native output dimension is the ceiling of quality. If your vector database requires 1536 dims, use a model that natively produces 1536 dims. Use vecna's upscaling only as a compatibility shim when you cannot change the index schema — not as a way to improve retrieval quality.

Downscale (higher → lower): small, controlled quality loss. Acceptable for MRL models.
Upscale (lower → higher): no quality gain, only compatibility. Replace the model when possible.

Changing any adapter setting requires regenerating all stored embeddings

vecna's adapter is applied at query time and at indexing time. If you change any of the following, every vector already stored in your database is now in a different space and comparisons against new queries will silently return wrong results:

type (truncate / random / projection)
source_dim or target_dim
truncate_mode (from_end / from_start)
pad_mode (at_end / at_start)
seed (random adapter)
the backing model itself

When you change adapter settings: stop ingestion, re-embed your entire corpus through vecna with the new settings, repopulate the index, then resume.

There is no partial migration path — a mixed index produces degraded or incorrect search results.
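One way to catch an accidental mismatch is to store a fingerprint of the settings next to the index and compare it before querying. This is not a vecna feature; it is a hypothetical client-side guard, sketched in Go with made-up type and function names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// adapterSettings lists the settings named above that define the vector
// space. Hypothetical type for this sketch.
type adapterSettings struct {
	Model        string `json:"model"`
	Type         string `json:"type"`
	SourceDim    int    `json:"source_dim"`
	TargetDim    int    `json:"target_dim"`
	TruncateMode string `json:"truncate_mode"`
	PadMode      string `json:"pad_mode"`
	Seed         int64  `json:"seed"`
}

// fingerprint hashes the settings. Struct fields marshal in declaration
// order, so equal settings always produce the same digest.
func fingerprint(s adapterSettings) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	indexed := adapterSettings{
		Model: "nomic-embed-text", Type: "truncate",
		SourceDim: 768, TargetDim: 1536,
		TruncateMode: "from_end", PadMode: "at_end",
	}
	current := indexed
	current.PadMode = "at_start" // a single changed setting

	a, _ := fingerprint(indexed)
	b, _ := fingerprint(current)
	if a != b {
		fmt.Println("adapter settings changed: re-embed the corpus before querying")
	}
}
```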

Adapter types

Type	Description
`truncate`	Slice or zero-pad the vector. Fast, deterministic. Best for MRL-trained models.
`random`	Seeded Gaussian projection matrix. Approximately preserves pairwise distances (Johnson-Lindenstrauss lemma).
`projection`	Learned linear matrix from a JSON file. Highest quality, requires pre-training.
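For intuition about the random adapter, here is a minimal Go sketch of a seeded Gaussian projection. It is a generic textbook construction, not vecna's actual implementation; the 1/sqrt(target) scaling and the function names are assumptions of this example. The fixed seed is what makes the matrix reproducible between indexing and querying.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// randomMatrix builds a target x source matrix of Gaussian entries from a
// fixed seed, scaled by 1/sqrt(target) so squared lengths are preserved in
// expectation.
func randomMatrix(source, target int, seed int64) [][]float64 {
	rng := rand.New(rand.NewSource(seed))
	scale := 1 / math.Sqrt(float64(target))
	m := make([][]float64, target)
	for i := range m {
		m[i] = make([]float64, source)
		for j := range m[i] {
			m[i][j] = rng.NormFloat64() * scale
		}
	}
	return m
}

// project multiplies the matrix by v, producing len(m) output dimensions.
func project(m [][]float64, v []float64) []float64 {
	out := make([]float64, len(m))
	for i, row := range m {
		for j, w := range row {
			out[i] += w * v[j]
		}
	}
	return out
}

func main() {
	m := randomMatrix(8, 4, 42)
	v := []float64{1, 0, 0, 0, 0, 0, 0, 0}
	fmt.Println(len(project(m, v))) // 4
}
```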

Extra maps

extra_maps lets you expose multiple adapter configurations on a single vecna instance. Each entry is a named AdapterConfig whose unset fields fall back to the global adapter values.

"adapter": { "type": "truncate", "source_dim": 1024, "target_dim": 1536 },
"extra_maps": {
  "512":        { "target_dim": 512 },
  "256":        { "target_dim": 256, "type": "random", "seed": 42 },
  "openai-alt": { "target_dim": 1536, "forward_target": "openai" }
}

Route	Forwarder	Adapter
`POST /v1/embeddings`	global default	global `adapter`
`POST /map/512/v1/embeddings`	global default	`extra_maps["512"]` — target 512, rest from global
`POST /map/256/v1/embeddings`	global default	`extra_maps["256"]` — random projection to 256
`POST /map/openai-alt/v1/embeddings`	`openai` target	`extra_maps["openai-alt"]` adapter

All fields are overridable per map entry:

Field	Description
`forward_target`	Named target from `forward.targets`; empty = global default
`type`	`truncate` / `random` / `projection`
`source_dim`	Source dimension; falls back to global `adapter.source_dim`
`target_dim`	Target dimension
`truncate_mode`	`from_end` / `from_start`
`pad_mode`	`at_end` / `at_start`
`seed`	Seed for random projection
`matrix_file`	Path to projection matrix JSON

The same re-embedding warning applies per map — changing any setting for an extra_maps entry requires re-embedding all vectors indexed through that endpoint.
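The fallback rule for extra_maps entries can be pictured with a short Go sketch. The type and function are hypothetical and cover only a subset of the fields; zero values stand for "unset", and an empty forward_target is left empty to mean the global default forwarder.

```go
package main

import "fmt"

// adapterConfig holds a subset of the per-map fields listed above.
// Zero values mean "unset". Hypothetical type for this sketch.
type adapterConfig struct {
	ForwardTarget string
	Type          string
	SourceDim     int
	TargetDim     int
}

// resolve fills the unset adapter fields of entry from the global adapter.
func resolve(global, entry adapterConfig) adapterConfig {
	out := entry
	if out.Type == "" {
		out.Type = global.Type
	}
	if out.SourceDim == 0 {
		out.SourceDim = global.SourceDim
	}
	if out.TargetDim == 0 {
		out.TargetDim = global.TargetDim
	}
	return out
}

func main() {
	global := adapterConfig{Type: "truncate", SourceDim: 1024, TargetDim: 1536}
	entry := adapterConfig{Type: "random", TargetDim: 256}
	fmt.Printf("%+v\n", resolve(global, entry))
}
```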

Truncation and padding modes

`truncate_mode` — which part of the vector is kept when downscaling

Value	Keeps
`from_end` (default)	first N dimensions
`from_start`	last N dimensions

from_end: use for Matryoshka Representation Learning (MRL) models, which pack the most important information into the first dimensions. Models: nomic-embed-text, mxbai-embed-large, text-embedding-3-small, text-embedding-3-large, snowflake-arctic-embed, e5-mistral-7b-instruct. MRL support depends on the model version, so confirm it on the model card before relying on truncation.

from_start — use when task-specific information is at the end of the vector. Try this if from_end gives poor retrieval on a non-MRL model. Models: some fine-tuned BERT variants, domain-specific models with task heads appended after base dimensions.

`pad_mode` — where zeros are inserted when upscaling

Value	Zeros go
`at_end` (default)	after the real values
`at_start`	before the real values

at_end — almost always correct. Keeps the original vector in the first N positions.

at_start — use if your index expects meaningful content at the end of the vector.

Common combinations

Scenario	`truncate_mode`	`pad_mode`
MRL model downscale	`from_end`	`at_end`
MRL model upscale (e.g. 768→1536)	`from_end`	`at_end`
Non-MRL BERT fine-tune	`from_start`	`at_end`
Custom index with leading-zeros convention	`from_end`	`at_start`

When unsure, run vecna test before and after changing a mode and compare the reported L2 norm.

API endpoints

OpenAI-compatible

POST /v1/embeddings
Authorization: Bearer <api_key>
Content-Type: application/json

{"input": "text or array of texts", "model": "nomic-embed-text"}

Google Gemini-compatible

POST /v1/models/{model}:embedContent
POST /v1/models/{model}:batchEmbedContents

Extra-map routes

Serve the same backing model with a different adapter per endpoint. The {mapping} segment matches a key in extra_maps.

POST /map/{mapping}/v1/embeddings
POST /map/{mapping}/v1/models/{model}:embedContent
POST /map/{mapping}/v1/models/{model}:batchEmbedContents

All extra-map routes require the same authentication as the standard API routes.

OpenAPI spec and docs

GET /openapi.yaml
GET /docs

Response tracing headers

Header	Value
`X-Vecna-Forward-Ms`	Time waiting on the backing model
`X-Vecna-Translate-Ms`	Time in the adapter
`X-Vecna-Total-Ms`	Total request wall time

Prometheus metrics

Enable in config: metrics.enabled: true. Scrape at GET /metrics. Human-readable dashboard at GET /dashboard.

Metric	Type	Description
`vecna_requests_total`	counter	Requests served, by endpoint and status
`vecna_request_duration_seconds`	histogram	Total request wall time
`vecna_forward_duration_seconds`	histogram	Time waiting on the backing model
`vecna_translate_duration_seconds`	histogram	Time in the adapter
`vecna_endpoint_priority`	gauge	Current dynamic routing priority per endpoint
`vecna_endpoint_inflight`	gauge	Active in-flight requests per endpoint
`vecna_endpoint_errors_total`	counter	Forwarding failures by error type
`vecna_tokens_total`	counter	Tokens consumed, by target, model, and type (`prompt`/`total`)

Dashboard

GET /dashboard renders a live HTML view of all metrics. Counters show request counts with status-code badges, histograms show p50/p95/p99 latencies, gauges show current endpoint priority and inflight counts.

Auth: if metrics.api_key is set, both /metrics and /dashboard require that key (Bearer token) and ignore server-level api_keys. If metrics.api_key is blank, both routes are fully public — no auth headers are checked.
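The auth rule for these two routes reduces to a small decision function. The Go sketch below restates the rule as described; it is an illustration with a made-up function name, not vecna's handler code.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// metricsAllowed applies the rule described above: a blank metrics key makes
// the routes public; otherwise the request must carry exactly that key as a
// Bearer token. Server-level api_keys are deliberately not consulted.
func metricsAllowed(metricsKey, authHeader string) bool {
	if metricsKey == "" {
		return true
	}
	const prefix = "Bearer "
	if !strings.HasPrefix(authHeader, prefix) {
		return false
	}
	token := strings.TrimPrefix(authHeader, prefix)
	// Constant-time comparison avoids leaking the key through timing.
	return subtle.ConstantTimeCompare([]byte(token), []byte(metricsKey)) == 1
}

func main() {
	fmt.Println(metricsAllowed("", ""))                    // true
	fmt.Println(metricsAllowed("secret", "Bearer secret")) // true
	fmt.Println(metricsAllowed("secret", "Bearer other"))  // false
}
```

Leaving metrics.api_key blank on a proxy bound to 0.0.0.0 exposes both routes to anyone who can reach the port.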

Development

make build           # compile
make test            # unit tests
make lint            # golangci-lint
make fmt             # goimports + gofmt

# Integration tests against a live server
make test-integration TEST_URL=http://localhost:11434 TEST_MODEL=nomic-embed-text

# Tag and push a release
make release-version BUMP=patch   # patch | minor | major

Docker

Build

docker build -t vecna .

First-time setup with docker compose

cp docker-compose.example.yml docker-compose.yml
docker compose up -d

Starts vecna and an Ollama instance. The vecna_config named volume persists the config across container rebuilds.

Onboard (interactive setup)

docker compose run --rm -it vecna onboard

Ollama is reachable by hostname on the Docker network — the scanner will find it automatically. After onboarding, restart the proxy:

docker compose restart vecna

Query

docker compose run --rm vecna query --compact "hello world"

Test endpoints

# report latency and dims
docker compose run --rm vecna test

# test and remove failing endpoints
docker compose run --rm vecna test --remove-broken

Edit config manually

docker compose run --rm -it vecna sh -c "vi /config/vecna.json"

With Prometheus

docker compose --profile metrics up -d

Scrape config is in prometheus.example.yml. Set bearer_token if metrics.api_key is configured.
