Arcnem AI

texvec

Local-first text similarity search.

日本語 · Install · Quick Start · Models · Development


texvec is an open-source CLI for text similarity search. It summarizes documents locally with ONNX models, embeds both summaries and overlapping document chunks, stores the vectors in a local libsql database, and ranks matches with cosine distance.

Built by Arcnem AI, texvec reflects how we like to ship applied AI tools: local-first, inspectable, and useful without a cloud control plane.

Why texvec

  • Local summaries, local embeddings, local storage. Your documents stay on your machine.
  • Simple CLI workflow. init, summarize, embed, search, and list are enough to get useful results quickly.
  • No external vector database. Similarity search runs from a local libsql database.
  • Practical text indexing. texvec stores both a document summary embedding and chunk embeddings so searches can match the gist or a specific section.

Install

Download a release asset from GitHub Releases, or install with Go:

go install github.com/arcnem-ai/texvec@latest

To build from source:

git clone https://github.com/arcnem-ai/texvec.git
cd texvec
go build -o texvec

Release archives are expected to follow the same primary targets as picvec: macOS (arm64) and Linux (amd64).

Quick Start

texvec init
texvec summarize test_texts/galaxies.md
texvec embed test_texts/galaxies.md
texvec search --text "dark matter in spiral galaxies"
texvec list

texvec init downloads ONNX Runtime, creates ~/.texvec/, initializes the local database, and fetches the default summary and embedding models.

Commands

Command                      What it does
init                         Download ONNX Runtime and the default models
summarize [document]         Generate and print a summary without writing to the database
embed [document]             Summarize, chunk, embed, and store a document
search [document]            Find similar indexed documents
search --text "..."          Search from raw text
list                         List indexed documents
set-embedding-model [name]   Set the default embedding model
set-summary-model [name]     Set the default summary model
config                       Show current configuration
clean                        Remove all texvec data

Global flag:

  • -v, --verbose enables extra runtime output

Common Examples

Preview a summary:

texvec summarize notes.md
texvec summarize notes.md --summary-model flan-t5-small

texvec summarize is preview-only. It does not write to the database.

Index a document:

texvec embed notes.md
texvec embed notes.md -m bge-small-en-v1.5
texvec embed notes.md --summary-model flan-t5-small

If the document content hash is unchanged, texvec reuses existing summary and chunk data where possible.
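
A minimal sketch of the unchanged-content check, assuming SHA-256 over the raw file bytes. The hash algorithm texvec actually uses is not stated here, and `contentHash` and `needsReindex` are hypothetical names used only for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns a hex-encoded SHA-256 digest of the document bytes.
// Assumption: SHA-256 is chosen for illustration only.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// needsReindex reports whether the current content differs from the
// content that produced storedHash.
func needsReindex(storedHash string, data []byte) bool {
	return storedHash != contentHash(data)
}

func main() {
	original := []byte("Spiral galaxies rotate faster than visible matter predicts.")
	edited := []byte("Spiral galaxies rotate faster than visible matter predicts. Edited.")
	stored := contentHash(original)

	fmt.Println(needsReindex(stored, original)) // same bytes, same hash
	fmt.Println(needsReindex(stored, edited))   // different bytes, different hash
}
```

Hashing the content rather than comparing paths or modification times means a renamed or re-saved file with identical bytes would still be recognized as unchanged under this scheme.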

Search for similar documents:

texvec search notes.md
texvec search --text "barred spiral galaxy dark matter"
texvec search --text "barred spiral galaxy dark matter" -k 10
texvec search notes.md -m bge-small-en-v1.5

Results are sorted by cosine distance, closest match first (a smaller distance means a closer match). When searching with an already indexed document path, texvec excludes that same path from the results.

Flag              Description                                     Default
-k, --limit       Number of results                               5
-m, --model       Embedding model to use                          Config default
--summary-model   Summary model to use for long-query reduction   Config default
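
The ranking behind search can be sketched in Go. This is an illustration under stated assumptions, not texvec's actual query code: `cosineDistance`, `match`, and `rank` are hypothetical names, vectors are assumed to have equal length, and a document's score is assumed to be its best (smallest) distance across its summary and chunk vectors. How texvec actually merges those scores is not specified here.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosineDistance returns 1 minus the cosine similarity of a and b.
// 0 means the vectors point the same way; larger values mean less similar.
// Assumption: a and b have the same length.
func cosineDistance(a, b []float32) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	if normA == 0 || normB == 0 {
		return 1 // treat a zero vector as unrelated
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}

// match is a hypothetical result row.
type match struct {
	Path     string
	Distance float64
}

// rank scores each document by its smallest distance to the query across
// all of its vectors, skips queryPath, sorts ascending, and keeps at most k.
func rank(query []float32, queryPath string, docs map[string][][]float32, k int) []match {
	var out []match
	for path, vecs := range docs {
		if path == queryPath || len(vecs) == 0 {
			continue
		}
		best := math.Inf(1)
		for _, v := range vecs {
			if d := cosineDistance(query, v); d < best {
				best = d
			}
		}
		out = append(out, match{Path: path, Distance: best})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Distance != out[j].Distance {
			return out[i].Distance < out[j].Distance
		}
		return out[i].Path < out[j].Path // stable order for ties
	})
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	docs := map[string][][]float32{
		"galaxies.md": {{0, 1}, {0.6, 0.8}},
		"notes.md":    {{0, 1}},
	}
	for _, m := range rank([]float32{1, 0}, "", docs, 5) {
		fmt.Printf("%s %.3f\n", m.Path, m.Distance)
	}
}
```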

List indexed documents:

texvec list
texvec list -m all-minilm-l6-v2
texvec list -k 20

Flag          Description                 Default
-k, --limit   Max documents to show       All
-m, --model   Filter by embedding model   All

Change defaults:

texvec set-embedding-model bge-small-en-v1.5
texvec set-summary-model flan-t5-small

Models

Embedding Models

Name                Embedding Dim   Notes
all-minilm-l6-v2    384             Default. Fast and good for general-purpose retrieval.
bge-small-en-v1.5   384             Retrieval-focused model with a query prefix for search.
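
A query prefix means the search text is prepended with an instruction string before embedding, while indexed documents are embedded as-is. The sketch below is an assumption-laden illustration: `queryPrefixes` and `prepareQuery` are hypothetical names, and the prefix shown is the query instruction documented for the BGE English models, which may not match the exact string texvec stores.

```go
package main

import "fmt"

// queryPrefixes maps a model name to the text prepended to search queries.
// Assumption: the BGE string is the instruction published for the BGE
// English models; models without an entry get no prefix.
var queryPrefixes = map[string]string{
	"bge-small-en-v1.5": "Represent this sentence for searching relevant passages: ",
}

// prepareQuery returns the text that would be embedded for a search query.
// Documents being indexed are not passed through this function.
func prepareQuery(model, query string) string {
	return queryPrefixes[model] + query
}

func main() {
	q := "dark matter in spiral galaxies"
	fmt.Println(prepareQuery("all-minilm-l6-v2", q))
	fmt.Println(prepareQuery("bge-small-en-v1.5", q))
}
```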

Summary Models

Name            Notes
flan-t5-small   Default summary model for 1.0.0. Small, local, and easy to ship in a plain Go CLI.

Models are downloaded from Hugging Face on first use and stored locally under ~/.texvec/models/.

How It Works

  1. texvec loads a supported text document: a .txt, .md, or .markdown file.
  2. texvec computes a content hash to determine whether indexing work needs to be refreshed.
  3. The selected summary model generates a document summary.
  4. The selected embedding model embeds both the summary and overlapping chunks from the original document.
  5. Search compares the query embedding against stored summary embeddings and chunk embeddings.
  6. texvec merges those scores and returns document-level results ordered by cosine distance.
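
The overlapping chunks in step 4 can be sketched as a sliding window. This is a minimal illustration, assuming word-based windows. `chunkWords` is a hypothetical name, and the window size and overlap texvec actually uses (and whether it counts words, characters, or tokens) are not stated here.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into windows of at most size words. Each window
// starts (size - overlap) words after the previous one, so neighbouring
// chunks share overlap words. It returns nil for invalid parameters.
func chunkWords(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	words := strings.Fields(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // the last window already reaches the end of the text
		}
	}
	return chunks
}

func main() {
	text := "spiral galaxies rotate faster than their visible matter alone predicts"
	for i, c := range chunkWords(text, 4, 2) {
		fmt.Printf("%d: %s\n", i, c)
	}
}
```

Overlap keeps a sentence that straddles a chunk boundary fully present in at least one chunk, at the cost of storing some text twice.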

Data Storage

All runtime data lives in ~/.texvec/:

~/.texvec/
  config.json       # Configuration such as the default models
  texvec.db         # libsql database
  models/           # Downloaded ONNX model files and tokenizer assets
  lib/              # ONNX Runtime shared library

Use texvec clean to remove everything.
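
The layout above can be resolved from the user's home directory as sketched below. `paths` and `resolvePaths` are hypothetical names for illustration and are not taken from texvec's config/ package. Only the file and directory names come from the layout itself.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// paths holds the locations listed in the data layout.
type paths struct {
	Root   string // ~/.texvec
	Config string // config.json
	DB     string // texvec.db
	Models string // models/
	Lib    string // lib/
}

// resolvePaths builds the layout under the given home directory.
func resolvePaths(home string) paths {
	root := filepath.Join(home, ".texvec")
	return paths{
		Root:   root,
		Config: filepath.Join(root, "config.json"),
		DB:     filepath.Join(root, "texvec.db"),
		Models: filepath.Join(root, "models"),
		Lib:    filepath.Join(root, "lib"),
	}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	p := resolvePaths(home)
	fmt.Println(p.Config)
	fmt.Println(p.DB)
	fmt.Println(p.Models)
	fmt.Println(p.Lib)
}
```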

Repository Layout

  • cmd/ Cobra commands and user-facing output
  • core/ Model registry, runtime setup, downloads, text chunking, summarization, and embedding pipeline
  • store/ Schema migration, inserts, listing, and vector search queries
  • config/ ~/.texvec path helpers and config bootstrapping
  • test_texts/ Sample documents for manual testing

Platforms

OS      Published Release   Hardware Acceleration
macOS   arm64               CPU
Linux   amd64               CPU

texvec currently defaults to CPU execution for predictable CLI behavior and cleaner output across machines.

ONNX Runtime 1.24.3 is downloaded automatically on first run.

Development

go test ./...
go build -o texvec

See CONTRIBUTING.md for contribution workflow and AGENTS.md for repo-specific agent instructions.


Built by Arcnem AI.
