Local Embedding Knowledge Base — Step-by-Step Guide

This project demonstrates a full-stack, free, local vector search pipeline using:

Prisma (TypeScript ORM)
PostgreSQL with the pgvector extension
sentence-transformers (local Python embedding model)

No OpenAI API key or paid service is required. All embeddings and searches run on your machine.

1. Prerequisites

Node.js (v18+ recommended)
Python (3.8+)
Docker (for running PostgreSQL with pgvector)

2. Setup Instructions

a. Install Node and Python dependencies

npm install
pip install sentence-transformers torch numpy

b. Start PostgreSQL with pgvector (Docker)

docker run -d \
  --name postgres-pgvector \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=Russelle0 \
  -e POSTGRES_DB=adc_db \
  -p 5433:5432 \
  pgvector/pgvector:pg17

c. Configure environment

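Copy .env.example to .env and make sure it contains the following line. The user, password, port, and database name must match the Docker command above: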

DATABASE_URL="postgresql://postgres:Russelle0@localhost:5433/adc_db?schema=public"

No OpenAI key is needed for local mode.
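If the connection fails, it can help to check that each part of the URL matches the Docker command. The snippet below is a minimal sketch using only the Python standard library. The helper name describe_database_url is hypothetical and not part of this project.

```python
from urllib.parse import urlsplit, parse_qs


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL connection URL into the parts worth double-checking."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        # Prisma reads the schema from the query string (?schema=public)
        "schema": parse_qs(parts.query).get("schema", [None])[0],
    }


info = describe_database_url(
    "postgresql://postgres:Russelle0@localhost:5433/adc_db?schema=public"
)
print(info)
```

The port should be the host side of the Docker mapping (5433), not the container's internal 5432.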

d. Apply database migrations

npx prisma migrate dev --name init
npx prisma migrate dev --name fix_vector_dim

3. Usage: Seeding and Searching

a. Seed the knowledge base

This reads docs/myProject.md, splits it into ~500-character chunks, generates embeddings locally, and stores them in the database:

npm run seed
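To illustrate the chunking step, here is a minimal sketch of one way to split text into chunks of at most 500 characters. It is written in Python for brevity. The project's actual splitting logic lives in src/seedVectors.ts and may work differently (for example, it may cut at fixed offsets rather than on word boundaries). The function name chunk_text is hypothetical.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 500) -> List[str]:
    """Split text into chunks of at most max_chars, breaking on whitespace.

    Assumption: runs of whitespace (including newlines) are collapsed
    to single spaces, which is acceptable for embedding input.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # Hard-split any single word that is longer than a whole chunk.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = "Prisma stores each chunk next to its embedding. " * 40
for i, chunk in enumerate(chunk_text(sample)):
    print(i, len(chunk))
```

Smaller chunks give more precise matches but less context per result, so the 500-character target is worth experimenting with.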

b. Run a similarity search

This queries the database for the three snippets most relevant to your question:

npm run search

You can edit the query string in src/searchSimilar.ts to test different questions.

4. How It Works

Embedding: src/seedVectors.ts calls a local Python script (py_embed.py) using the all-MiniLM-L6-v2 model (384-dim vectors).
Storage: Each text chunk and its embedding are stored in the CodeSnippet table (embedding column is vector(384)).
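Search: src/searchSimilar.ts embeds your query locally, then ranks the stored chunks with pgvector's <=> operator. The operator returns cosine distance (1 minus cosine similarity), so the smallest values are the closest matches.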
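The ranking performed by the search step can be reproduced in plain Python to see what the database is computing. This is only an illustrative sketch: the real search runs inside PostgreSQL, and the names cosine_distance and top_k are hypothetical. The tiny 2-dimensional vectors stand in for the 384-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity. Assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k rows closest to the query, nearest first."""
    scored = [(text, cosine_distance(query, vec)) for text, vec in rows]
    scored.sort(key=lambda item: item[1])  # smaller distance = more similar
    return scored[:k]


rows = [
    ("same direction", [2.0, 0.0]),
    ("orthogonal", [0.0, 1.0]),
    ("in between", [1.0, 1.0]),
]
for text, distance in top_k([1.0, 0.0], rows, k=2):
    print(f"{distance:.3f}  {text}")
```

Because cosine distance ignores vector length, "same direction" scores a distance of 0 even though its magnitude differs from the query.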

5. What You Can Learn

How to use Prisma with custom Postgres types (pgvector)
How to run a local embedding model for free (no API limits)
How to build a vector search pipeline end-to-end
How to use Docker for local database development
How to structure a TypeScript + Python hybrid workflow

6. Troubleshooting

Python errors: Make sure you have Python 3.8+ and that all dependencies are installed in the Python environment (for example, a venv) that the scripts run in.
Docker errors: Ensure Docker Desktop is running and port 5433 is free.
Database errors: If you change the embedding dimension, update the Prisma schema and run a new migration. Vectors whose length does not match the column dimension are rejected on insert.
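A dimension mismatch is easier to diagnose if it is caught before the insert. The sketch below checks the vector length and formats it in pgvector's text input form ('[x1,x2,...]'). The function name to_vector_literal and the constant EXPECTED_DIM are hypothetical; the project may serialize embeddings differently.

```python
from typing import Sequence

# Assumption: this must match the vector(384) column described in this guide.
EXPECTED_DIM = 384


def to_vector_literal(
    embedding: Sequence[float], expected_dim: int = EXPECTED_DIM
) -> str:
    """Validate the embedding length and format it as a pgvector text literal."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(embedding)}"
        )
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


try:
    # A 1536-dim vector (the OpenAI size mentioned in this guide) fails the check.
    to_vector_literal([0.0] * 1536)
except ValueError as err:
    print(f"Rejected before reaching the database: {err}")
```

Passing the expected dimension as a parameter keeps the check in one place if you later switch models.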

7. Switching Back to OpenAI (Optional)

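If you want to use OpenAI embeddings instead, revert the generateEmbedding function in both seedVectors.ts and searchSimilar.ts to use the OpenAI API, and update the database column to vector(1536). Existing 384-dimensional embeddings will not fit the new column, so run a new migration and re-seed after the change.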

8. Next Steps & Experiments

Try adding your own markdown or text files to docs/ and re-seed.
Experiment with different queries in searchSimilar.ts.
Explore the CodeSnippet table with npx prisma studio.
Try other local embedding models (see Hugging Face for options).

Enjoy building and learning with your own local vector search stack!
