texvec

Local-first text similarity search.

texvec is an open-source CLI for text similarity search. It summarizes documents locally with ONNX models, embeds both summaries and overlapping document chunks, stores the vectors in a local libsql database, and ranks matches with cosine distance.

Built by Arcnem AI, texvec reflects how we like to ship applied AI tools: local-first, inspectable, and useful without a cloud control plane.

Why texvec

Local summaries, local embeddings, local storage. Your documents stay on your machine.
Simple CLI workflow. The init, summarize, embed, search, and list commands are enough to get useful results quickly.
No external vector database. Similarity search runs from a local libsql database.
Practical text indexing. texvec stores both a document summary embedding and chunk embeddings so searches can match the gist or a specific section.

Install

Download a release asset from GitHub Releases, or install with Go:

go install github.com/arcnem-ai/texvec@latest

To build from source:

git clone https://github.com/arcnem-ai/texvec.git
cd texvec
go build -o texvec

Release archives are expected to follow the same primary targets as picvec: macOS (arm64) and Linux (amd64).

Quick Start

texvec init
texvec summarize test_texts/galaxies.md
texvec embed test_texts/galaxies.md
texvec search --text "dark matter in spiral galaxies"
texvec list

texvec init downloads ONNX Runtime, creates ~/.texvec/, initializes the local database, and fetches the default summary and embedding models.

Commands

Command	What it does
`init`	Download ONNX Runtime and the default models
`summarize [document]`	Generate and print a summary without writing to the database
`embed [document]`	Summarize, chunk, embed, and store a document
`search [document]`	Find similar indexed documents
`search --text "..."`	Search from raw text
`list`	List indexed documents
`set-embedding-model [name]`	Set the default embedding model
`set-summary-model [name]`	Set the default summary model
`config`	Show current configuration
`clean`	Remove all `texvec` data

Global flag:

-v, --verbose enables extra runtime output

Common Examples

Preview a summary:

texvec summarize notes.md
texvec summarize notes.md --summary-model flan-t5-small

texvec summarize is preview-only. It does not write to the database.

Index a document:

texvec embed notes.md
texvec embed notes.md -m bge-small-en-v1.5
texvec embed notes.md --summary-model flan-t5-small

If the document content hash is unchanged, texvec reuses existing summary and chunk data where possible.
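The reuse check can be pictured as a hash comparison. The following is a minimal sketch, not texvec's actual code: SHA-256 is an assumption (the hash algorithm is not named here), and `contentHash` and `needsReindex` are hypothetical helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns a hex-encoded SHA-256 digest of the document bytes.
// SHA-256 is an assumption; the hash texvec actually uses is not documented here.
func contentHash(doc []byte) string {
	sum := sha256.Sum256(doc)
	return hex.EncodeToString(sum[:])
}

// needsReindex reports whether the current content differs from the content
// that produced storedHash.
func needsReindex(storedHash string, doc []byte) bool {
	return storedHash != contentHash(doc)
}

func main() {
	original := []byte("Spiral galaxies rotate faster than visible mass predicts.")
	stored := contentHash(original)

	edited := []byte(string(original) + " Edited.")

	fmt.Println(needsReindex(stored, original)) // false: content is unchanged
	fmt.Println(needsReindex(stored, edited))   // true: content was edited
}
```

Hashing the content rather than comparing timestamps means a file that is touched or copied without being edited would not trigger new summary or embedding work.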

Search for similar documents:

texvec search notes.md
texvec search --text "barred spiral galaxy dark matter"
texvec search --text "barred spiral galaxy dark matter" -k 10
texvec search notes.md -m bge-small-en-v1.5

Results are sorted by cosine distance. When searching with an already indexed document path, texvec excludes that same path from the results.
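To make the ordering concrete, here is a small in-memory sketch of ranking by cosine distance while skipping the query document's own path. It is illustrative only: texvec runs its search from the libsql database, and the `match` type, `cosineDistance`, and `rankByDistance` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// match is a hypothetical result row: a document path and its distance to the query.
type match struct {
	Path     string
	Distance float64
}

// cosineDistance returns 1 - cosine similarity. It returns 1 (treated here as
// "unrelated") when the lengths differ or either vector has zero magnitude.
func cosineDistance(a, b []float64) float64 {
	if len(a) != len(b) {
		return 1
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}

// rankByDistance scores every indexed vector against the query, skips the
// query's own path, sorts ascending by distance, and keeps at most k results.
func rankByDistance(query []float64, queryPath string, index map[string][]float64, k int) []match {
	matches := make([]match, 0, len(index))
	for path, vec := range index {
		if path == queryPath {
			continue
		}
		matches = append(matches, match{Path: path, Distance: cosineDistance(query, vec)})
	}
	sort.Slice(matches, func(i, j int) bool {
		if matches[i].Distance != matches[j].Distance {
			return matches[i].Distance < matches[j].Distance
		}
		return matches[i].Path < matches[j].Path // stable order for ties
	})
	if k >= 0 && len(matches) > k {
		matches = matches[:k]
	}
	return matches
}

func main() {
	index := map[string][]float64{
		"notes.md":    {1, 0},
		"galaxies.md": {1, 1},
		"recipes.md":  {0, 1},
	}
	for _, m := range rankByDistance([]float64{1, 0}, "notes.md", index, 5) {
		fmt.Printf("%.3f  %s\n", m.Distance, m.Path)
	}
}
```

A smaller distance means a closer match, so the first row is the most similar document.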

Flag	Description	Default
`-k, --limit`	Number of results	5
`-m, --model`	Embedding model to use	Config default
`--summary-model`	Summary model to use for long-query reduction	Config default

List indexed documents:

texvec list
texvec list -m all-minilm-l6-v2
texvec list -k 20

Flag	Description	Default
`-k, --limit`	Max documents to show	All
`-m, --model`	Filter by embedding model	All

Change defaults:

texvec set-embedding-model bge-small-en-v1.5
texvec set-summary-model flan-t5-small

Models

Embedding Models

Name	Embedding Dim	Notes
`all-minilm-l6-v2`	384	Default. Fast and good for general-purpose retrieval.
`bge-small-en-v1.5`	384	Retrieval-focused model with a query prefix for search.
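
The query prefix mentioned for `bge-small-en-v1.5` can be sketched as follows. This is an assumption-laden illustration: the prefix string is the query instruction BAAI documents for its English v1.5 models, but whether texvec uses exactly this string is not confirmed here, and `queryPrefixes` and `prepareInput` are hypothetical names.

```go
package main

import "fmt"

// queryPrefixes maps a model name to the instruction prepended to search queries.
// The bge string follows BAAI's published guidance for English v1.5 models;
// its use by texvec is an assumption.
var queryPrefixes = map[string]string{
	"bge-small-en-v1.5": "Represent this sentence for searching relevant passages: ",
}

// prepareInput returns the text to embed. Only search queries receive a prefix;
// document text is passed through unchanged. Models without an entry get no prefix.
func prepareInput(model, text string, isQuery bool) string {
	if !isQuery {
		return text
	}
	return queryPrefixes[model] + text
}

func main() {
	query := "dark matter in spiral galaxies"
	fmt.Println(prepareInput("bge-small-en-v1.5", query, true))
	fmt.Println(prepareInput("all-minilm-l6-v2", query, true))
}
```

Because the prefix changes the query embedding, a query should be embedded with the same model that indexed the documents it is compared against.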

Summary Models

Name	Notes
`flan-t5-small`	Default summary model for `1.0.0`. Small, local, and easy to ship in a plain Go CLI.

Models are downloaded from Hugging Face on first use and stored locally under ~/.texvec/models/.

How It Works

texvec loads a supported text document: a .txt, .md, or .markdown file.
texvec computes a content hash to determine whether indexing work needs to be refreshed.
The selected summary model generates a document summary.
The selected embedding model embeds both the summary and overlapping chunks from the original document.
Search compares the query embedding against stored summary embeddings and chunk embeddings.
texvec merges those scores and returns document-level results ordered by cosine distance.
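
Two of the steps above, overlapping chunks and merging per-document scores, can be sketched like this. Everything specific is an assumption made for illustration: the word-based splitting, the chunk size and overlap, and the "keep the best (smallest) distance per document" merge rule. The names `chunkWords`, `hit`, `docScore`, and `mergeHits` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// chunkWords splits text into windows of size words, where consecutive
// windows share overlap words. Invalid parameters or empty text yield nil.
func chunkWords(text string, size, overlap int) []string {
	words := strings.Fields(text)
	if size <= 0 || overlap < 0 || overlap >= size || len(words) == 0 {
		return nil
	}
	step := size - overlap
	var chunks []string
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // the final window reached the end of the text
		}
	}
	return chunks
}

// hit is one stored embedding compared against the query.
type hit struct {
	Path     string
	Kind     string // "summary" or "chunk"
	Distance float64
}

// docScore is a document-level result.
type docScore struct {
	Path     string
	Distance float64
}

// mergeHits keeps the smallest distance seen for each document and returns
// documents ordered from closest to farthest. Taking the minimum is an
// assumed merge rule, not necessarily the one texvec applies.
func mergeHits(hits []hit) []docScore {
	best := make(map[string]float64)
	for _, h := range hits {
		if d, ok := best[h.Path]; !ok || h.Distance < d {
			best[h.Path] = h.Distance
		}
	}
	out := make([]docScore, 0, len(best))
	for path, d := range best {
		out = append(out, docScore{Path: path, Distance: d})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Distance != out[j].Distance {
			return out[i].Distance < out[j].Distance
		}
		return out[i].Path < out[j].Path
	})
	return out
}

func main() {
	for _, c := range chunkWords("a b c d e f g", 4, 2) {
		fmt.Println(c)
	}
	hits := []hit{
		{Path: "galaxies.md", Kind: "summary", Distance: 0.42},
		{Path: "galaxies.md", Kind: "chunk", Distance: 0.18},
		{Path: "notes.md", Kind: "summary", Distance: 0.30},
	}
	for _, d := range mergeHits(hits) {
		fmt.Printf("%.2f  %s\n", d.Distance, d.Path)
	}
}
```

Under this merge rule, a document whose summary is only loosely related can still rank first if one of its chunks matches the query closely, which is the "gist or a specific section" behavior described earlier.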

Data Storage

All runtime data lives in ~/.texvec/:

~/.texvec/
  config.json       # Configuration such as the default models
  texvec.db         # libsql database
  models/           # Downloaded ONNX model files and tokenizer assets
  lib/              # ONNX Runtime shared library

Use texvec clean to remove everything.
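
For scripts or tools that need to locate these files, the layout above can be resolved with the standard library. This is a sketch of the documented layout only; `dataPaths` and `layoutFor` are hypothetical names and are not taken from the config/ package.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dataPaths mirrors the documented ~/.texvec/ layout.
type dataPaths struct {
	Root     string
	Config   string
	Database string
	Models   string
	Lib      string
}

// layoutFor builds the texvec data paths under the given home directory.
func layoutFor(home string) dataPaths {
	root := filepath.Join(home, ".texvec")
	return dataPaths{
		Root:     root,
		Config:   filepath.Join(root, "config.json"),
		Database: filepath.Join(root, "texvec.db"),
		Models:   filepath.Join(root, "models"),
		Lib:      filepath.Join(root, "lib"),
	}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot resolve home directory:", err)
		os.Exit(1)
	}
	p := layoutFor(home)
	fmt.Println(p.Config)
	fmt.Println(p.Database)
	fmt.Println(p.Models)
	fmt.Println(p.Lib)
}
```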

Repository Layout

cmd/ Cobra commands and user-facing output
core/ Model registry, runtime setup, downloads, text chunking, summarization, and embedding pipeline
store/ Schema migration, inserts, listing, and vector search queries
config/ ~/.texvec path helpers and config bootstrapping
test_texts/ Sample documents for manual testing

Platforms

OS	Published Release	Hardware Acceleration
macOS	`arm64`	CPU
Linux	`amd64`	CPU

texvec currently defaults to CPU execution for predictable CLI behavior and cleaner output across machines.

ONNX Runtime 1.24.3 is downloaded automatically on first run.

Development

go test ./...
go build -o texvec

See CONTRIBUTING.md for contribution workflow and AGENTS.md for repo-specific agent instructions.

Built by Arcnem AI.
