Local-first text similarity search.
texvec is an open-source CLI for text similarity search. It summarizes documents locally with ONNX models, embeds both summaries and overlapping document chunks, stores the vectors in a local libsql database, and ranks matches with cosine distance.
Built by Arcnem AI, texvec reflects how we like to ship applied AI tools: local-first, inspectable, and useful without a cloud control plane.
- Local summaries, local embeddings, local storage. Your documents stay on your machine.
- Simple CLI workflow. init, summarize, embed, search, and list are enough to get useful results quickly.
- No external vector database. Similarity search runs from a local libsql database.
- Practical text indexing. texvec stores both a document summary embedding and chunk embeddings so searches can match the gist or a specific section.
Download a release asset from GitHub Releases, or install with Go:
go install github.com/arcnem-ai/texvec@latest

To build from source:
git clone https://github.com/arcnem-ai/texvec.git
cd texvec
go build -o texvec

Release archives are expected to follow the same primary targets as picvec: macOS (arm64) and Linux (amd64).
texvec init
texvec summarize test_texts/galaxies.md
texvec embed test_texts/galaxies.md
texvec search --text "dark matter in spiral galaxies"
texvec list

texvec init downloads ONNX Runtime, creates ~/.texvec/, initializes the local database, and fetches the default summary and embedding models.
| Command | What it does |
|---|---|
| init | Download ONNX Runtime and the default models |
| summarize [document] | Generate and print a summary without writing to the database |
| embed [document] | Summarize, chunk, embed, and store a document |
| search [document] | Find similar indexed documents |
| search --text "..." | Search from raw text |
| list | List indexed documents |
| set-embedding-model [name] | Set the default embedding model |
| set-summary-model [name] | Set the default summary model |
| config | Show current configuration |
| clean | Remove all texvec data |
Global flag:
-v, --verbose enables extra runtime output
Preview a summary:
texvec summarize notes.md
texvec summarize notes.md --summary-model flan-t5-small

texvec summarize is preview-only. It does not write to the database.
Index a document:
texvec embed notes.md
texvec embed notes.md -m bge-small-en-v1.5
texvec embed notes.md --summary-model flan-t5-small

If the document content hash is unchanged, texvec reuses existing summary and chunk data where possible.
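The hash check described above can be illustrated with a small Go sketch. This is not texvec's actual code: the README does not say which hash function is used, so SHA-256 is an assumption here, and `contentHash` and `needsReindex` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// contentHash returns a hex-encoded SHA-256 digest of the document bytes.
// Assumption: SHA-256 is used only for illustration.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// needsReindex reports whether a document should be summarized and embedded
// again. An empty stored hash means the document has not been indexed yet.
func needsReindex(storedHash string, data []byte) bool {
	return storedHash == "" || storedHash != contentHash(data)
}
```

With a check like this, re-running an embed on an unchanged file can skip the expensive summarization and embedding steps.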
Search for similar documents:
texvec search notes.md
texvec search --text "barred spiral galaxy dark matter"
texvec search --text "barred spiral galaxy dark matter" -k 10
texvec search notes.md -m bge-small-en-v1.5

Results are sorted by cosine distance. When searching with an already indexed document path, texvec excludes that same path from the results.
| Flag | Description | Default |
|---|---|---|
| -k, --limit | Number of results | 5 |
| -m, --model | Embedding model to use | Config default |
| --summary-model | Summary model to use for long-query reduction | Config default |
List indexed documents:
texvec list
texvec list -m all-minilm-l6-v2
texvec list -k 20

| Flag | Description | Default |
|---|---|---|
| -k, --limit | Max documents to show | All |
| -m, --model | Filter by embedding model | All |
Change defaults:
texvec set-embedding-model bge-small-en-v1.5
texvec set-summary-model flan-t5-small

| Name | Embedding Dim | Notes |
|---|---|---|
| all-minilm-l6-v2 | 384 | Default. Fast and good for general-purpose retrieval. |
| bge-small-en-v1.5 | 384 | Retrieval-focused model with a query prefix for search. |
| Name | Notes |
|---|---|
| flan-t5-small | Default summary model for 1.0.0. Small, local, and easy to ship in a plain Go CLI. |
Models are downloaded from Hugging Face on first use and stored locally under ~/.texvec/models/.
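The "query prefix" noted for bge-small-en-v1.5 refers to an instruction string that is prepended to search queries but not to indexed text. A rough Go sketch of that asymmetry follows. The prefix string is the one published on the BGE model card; how texvec applies it internally is an assumption, and `queryPrefixes`, `prepareQuery`, and `prepareDocument` are hypothetical names.

```go
package main

// queryPrefixes maps an embedding model name to the instruction that is
// prepended to search queries. Models without an entry use no prefix.
var queryPrefixes = map[string]string{
	"bge-small-en-v1.5": "Represent this sentence for searching relevant passages: ",
}

// prepareQuery returns the text that would be embedded for a search query.
func prepareQuery(model, text string) string {
	return queryPrefixes[model] + text
}

// prepareDocument returns the text that would be embedded at index time.
// Documents are embedded as-is; only queries receive the prefix.
func prepareDocument(model, text string) string {
	return text
}
```

Because the prefix changes the query embedding, searches should use the same embedding model that was used to index the documents.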
- A supported text document is loaded from .txt, .md, or .markdown.
- texvec computes a content hash to determine whether indexing work needs to be refreshed.
- The selected summary model generates a document summary.
- The selected embedding model embeds both the summary and overlapping chunks from the original document.
- Search compares the query embedding against stored summary embeddings and chunk embeddings.
- texvec merges those scores and returns document-level results ordered by cosine distance.
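Two of the steps above, overlapping chunks and score merging, can be sketched in Go. This is an illustration under stated assumptions, not texvec's implementation: the README does not specify chunk size, overlap, the unit of chunking, or the merge rule. The sketch chunks by whitespace-separated words and merges by taking the smallest distance; `chunkWords` and `mergeDistance` are hypothetical names.

```go
package main

import "strings"

// chunkWords splits text into chunks of at most size words, where each chunk
// repeats the last overlap words of the previous one. It returns nil for
// empty text or invalid parameters (size <= 0, overlap < 0, overlap >= size).
func chunkWords(text string, size, overlap int) []string {
	words := strings.Fields(text)
	if size <= 0 || overlap < 0 || overlap >= size || len(words) == 0 {
		return nil
	}
	step := size - overlap
	var chunks []string
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // the final chunk reached the end of the document
		}
	}
	return chunks
}

// mergeDistance collapses one summary distance and any number of chunk
// distances into a single document-level distance.
// Assumption: the best (smallest) distance wins, so a document can rank
// well on either its gist or one specific section.
func mergeDistance(summaryDist float64, chunkDists []float64) float64 {
	best := summaryDist
	for _, d := range chunkDists {
		if d < best {
			best = d
		}
	}
	return best
}
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighboring chunks, at the cost of storing a few more vectors per document.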
All runtime data lives in ~/.texvec/:
~/.texvec/
config.json # Configuration such as the default models
texvec.db # libsql database
models/ # Downloaded ONNX model files and tokenizer assets
lib/ # ONNX Runtime shared library
Use texvec clean to remove everything.
- cmd/: Cobra commands and user-facing output
- core/: Model registry, runtime setup, downloads, text chunking, summarization, and embedding pipeline
- store/: Schema migration, inserts, listing, and vector search queries
- config/: ~/.texvec path helpers and config bootstrapping
- test_texts/: Sample documents for manual testing
| OS | Published Release | Hardware Acceleration |
|---|---|---|
| macOS | arm64 | CPU |
| Linux | amd64 | CPU |
texvec currently defaults to CPU execution for predictable CLI behavior and cleaner output across machines.
ONNX Runtime 1.24.3 is downloaded automatically on first run.
go test ./...
go build -o texvec

See CONTRIBUTING.md for contribution workflow and AGENTS.md for repo-specific agent instructions.
Built by Arcnem AI.