kiddogreed/adc-knowledge-base

Local Embedding Knowledge Base — Step-by-Step Guide

This project demonstrates a full-stack, free, local vector search pipeline using:

  • Prisma (TypeScript ORM)
  • PostgreSQL with the pgvector extension
  • sentence-transformers (local Python embedding model)

No OpenAI API key or paid service is required. All embeddings and searches run on your machine.


1. Prerequisites

  • Node.js (v18+ recommended)
  • Python (3.8+)
  • Docker (for running PostgreSQL with pgvector)

2. Setup Instructions

a. Install Node and Python dependencies

npm install
pip install sentence-transformers torch numpy

b. Start PostgreSQL with pgvector (Docker)

docker run -d \
  --name postgres-pgvector \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=Russelle0 \
  -e POSTGRES_DB=adc_db \
  -p 5433:5432 \
  pgvector/pgvector:pg17

c. Configure environment

Copy .env.example to .env and make sure it contains:

DATABASE_URL="postgresql://postgres:Russelle0@localhost:5433/adc_db?schema=public"

No OpenAI key is needed for local mode.
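Each part of the connection string has to match the flags passed to docker run in step b. As a quick sanity check, the sketch below splits the URL into its parts using only the Python standard library. The helper name describe_database_url is hypothetical and not part of the project.

```python
from urllib.parse import parse_qs, urlsplit

# Connection string from this guide; adjust it if you changed the Docker flags.
DATABASE_URL = "postgresql://postgres:Russelle0@localhost:5433/adc_db?schema=public"


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL connection URL into its named parts.

    Hypothetical helper for illustration; the password is deliberately
    left out so the result is safe to print.
    """
    parts = urlsplit(url)
    return {
        "user": parts.username,                # POSTGRES_USER
        "host": parts.hostname,
        "port": parts.port,                    # host side of -p 5433:5432
        "database": parts.path.lstrip("/"),    # POSTGRES_DB
        "schema": parse_qs(parts.query).get("schema", [None])[0],
    }


if __name__ == "__main__":
    for key, value in describe_database_url(DATABASE_URL).items():
        print(f"{key}: {value}")
```

If the port printed here is not 5433, or the database is not adc_db, Prisma will most likely fail to connect to the container started above.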

d. Apply database migrations

npx prisma migrate dev --name init
npx prisma migrate dev --name fix_vector_dim

3. Usage: Seeding and Searching

a. Seed the knowledge base

This reads docs/myProject.md, splits it into ~500-character chunks, generates embeddings locally, and stores them in the database:

npm run seed
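To give a feel for the chunking step, here is a minimal sketch of one way to split a document into pieces of at most 500 characters. It is written in Python for illustration only: the project's real chunking lives in src/seedVectors.ts and may work differently (for example, it may not split on paragraph boundaries). The function name split_into_chunks is an assumption.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 500) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    Illustrative only; not the project's actual implementation.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A single paragraph longer than the limit is cut into fixed-size pieces.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        # Otherwise, append the paragraph to the current chunk if it still fits.
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries first tends to keep related sentences together, which usually gives more meaningful embeddings than cutting at a fixed offset.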

b. Run a similarity search

This queries the database for the 3 snippets most relevant to your question:

npm run search

You can edit the query string in src/searchSimilar.ts to test different questions.
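Under the hood, "most relevant" means "smallest cosine distance between the query embedding and each stored embedding". The sketch below reproduces that ranking in plain Python on toy 3-dimensional vectors, so you can see what the database computes for you. The function names are hypothetical; in the real pipeline the vectors have 384 dimensions and the ranking is done by PostgreSQL, not in application code.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 = same direction, 2 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Rank (text, embedding) rows by ascending distance to the query."""
    scored = [(cosine_distance(query, emb), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


if __name__ == "__main__":
    rows = [
        ("same direction", [2.0, 0.0, 0.0]),
        ("orthogonal", [0.0, 1.0, 0.0]),
        ("opposite", [-1.0, 0.0, 0.0]),
        ("nearby", [1.0, 1.0, 0.0]),
    ]
    for distance, text in top_k([1.0, 0.0, 0.0], rows):
        print(f"{distance:.4f}  {text}")
```

Note that vector length does not matter here: [2, 0, 0] and [1, 0, 0] point the same way, so their distance is 0.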


4. How It Works

  1. Embedding: src/seedVectors.ts calls a local Python script (py_embed.py) using the all-MiniLM-L6-v2 model (384-dim vectors).
  2. Storage: Each text chunk and its embedding are stored in the CodeSnippet table (embedding column is vector(384)).
  3. Search: src/searchSimilar.ts embeds your query locally, then ranks the stored chunks with pgvector's <=> operator, which returns cosine distance (a smaller value means a closer match).
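The storage and search steps both need the embedding in pgvector's text format, which is a bracketed, comma-separated list such as [0.1,0.2,0.3]. The following sketch shows how a vector could be serialized and what a top-3 query might look like. The table name CodeSnippet and the embedding column come from this guide; the "content" column name and both helper names are assumptions, and the project's actual query in src/searchSimilar.ts may differ.

```python
from typing import Sequence

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2, as stated in this guide


def to_pgvector_literal(
    embedding: Sequence[float], expected_dim: int = EMBEDDING_DIM
) -> str:
    """Serialize a vector to pgvector's text format, e.g. "[0.5,1.0,-2.0]".

    Raises early if the length does not match the vector(N) column.
    """
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(embedding)}"
        )
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_search_sql(limit: int = 3) -> str:
    """Build a parameterized nearest-neighbour query ($1 = vector literal).

    "content" is a hypothetical column name used for illustration.
    """
    return (
        'SELECT "content", "embedding" <=> $1::vector AS distance '
        'FROM "CodeSnippet" '
        "ORDER BY distance "
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(to_pgvector_literal([0.5, 1, -2], expected_dim=3))
    print(build_search_sql())
```

Checking the dimension before sending the query gives a clearer error than the one PostgreSQL raises when a vector of the wrong size reaches a vector(384) column.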

5. What You Can Learn

  • How to use Prisma with custom Postgres types (pgvector)
  • How to run a local embedding model for free (no API limits)
  • How to build a vector search pipeline end-to-end
  • How to use Docker for local database development
  • How to structure a TypeScript + Python hybrid workflow

6. Troubleshooting

  • Python errors: Make sure you have Python 3.8+ and all dependencies installed in your venv.
  • Docker errors: Ensure Docker Desktop is running and port 5433 is free.
  • Database errors: If you change the embedding dimension, update the Prisma schema and run a new migration so that the vector column matches the model's output size.
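For the Docker case, one quick way to check whether port 5433 is already taken is to try binding to it yourself. This is a minimal sketch using the Python standard library; the helper name is_port_free is hypothetical, and the result can vary with how your operating system and Docker setup expose ports, so treat it as a hint rather than a definitive answer.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
        return True


if __name__ == "__main__":
    port = 5433  # host port used by the docker run command in this guide
    state = "free" if is_port_free(port) else "in use"
    print(f"Port {port} is {state}")
```

If the port is in use before you start the container, either stop the other process or change the host side of -p 5433:5432 and the port in DATABASE_URL to the same new value.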

7. Switching Back to OpenAI (Optional)

If you want to use OpenAI embeddings instead, revert the generateEmbedding function in both seedVectors.ts and searchSimilar.ts to use the OpenAI API, and update the database column to vector(1536).


8. Next Steps & Experiments

  • Try adding your own markdown or text files to docs/ and re-seed.
  • Experiment with different queries in searchSimilar.ts.
  • Explore the CodeSnippet table with npx prisma studio.
  • Try other local embedding models (see Hugging Face for options).

Enjoy building and learning with your own local vector search stack!



About

Phase 2 of the Autonomous DevOps Companion (ADC) — a fully local, free RAG knowledge base built with Prisma, PostgreSQL/pgvector, sentence-transformers, Express, and Ollama/Llama3. No API keys required.
