# Simple-RAG-API

A lightweight Retrieval-Augmented Generation (RAG) API that runs locally with a local LLM (via Ollama). Built for simplicity and ease of experimentation, and ideal for learning how to build vector-based search plus generation without cloud APIs.
You get a minimal RAG backend that loads a local model, ingests text, and serves responses via HTTP in a Docker container, with no cloud bills.
## Features
- Local LLM support using Ollama
- Document embedding and retrieval pipeline
- Simple REST API powered by FastAPI (likely; adjust if you didn't use FastAPI)
- Docker support for easy deployment
- No external service dependencies (no OpenAI keys, no paid APIs)
- Clean folder structure (e.g., app.py, embed.py)
## Quick Start

### Clone it

```shell
git clone https://github.com/Sekiro4321/Simple-RAG-API.git
cd Simple-RAG-API
```

### Build the Docker image

```shell
docker build -t simple-rag-api .
```

### Run the API

```shell
docker run -p 8000:8000 simple-rag-api
```
Now your API should be live at http://localhost:8000.
## How It Works (High-Level)

Here's the idea, no rocket science:
1. Embed incoming text using an embedding model in embed.py
2. Store vectors in memory (or a lightweight store)
3. Run a local LLM with Ollama to answer queries based on the nearest embeddings
4. Serve JSON responses through an HTTP API in app.py
This is basically a local RAG pipeline: simple, fast, and offline-friendly.
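The four steps above can be sketched in miniature. This is an illustrative toy, not the project's actual code: the hash-based `toy_embed` stands in for a real embedding model, and `answer` assumes Ollama's default `/api/generate` endpoint on port 11434 (so it only works with Ollama running).

```python
import hashlib
import json
import math
import urllib.request

def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Stand-in for a real embedding model: hash character trigrams into buckets."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i : i + 3].lower().encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

store: dict[str, list[float]] = {}  # the "in-memory vector store"

def ingest(doc: str) -> None:
    store[doc] = toy_embed(doc)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank stored docs by cosine similarity (vectors are unit-normalized)."""
    qv = toy_embed(query)

    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    return sorted(store, key=lambda d: dot(store[d], qv), reverse=True)[:k]

def answer(query: str, model: str = "llama3") -> str:
    """Retrieve context, then ask a local model via Ollama's /api/generate."""
    context = "\n".join(retrieve(query, k=2))
    prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

ingest("Ollama runs large language models locally.")
ingest("Docker packages applications into containers.")
# retrieve("how do I run an LLM locally?") ranks the Ollama sentence first;
# answer(...) would then feed that context to a running Ollama instance.
```

Swapping `toy_embed` for a real embedding model and `store` for a vector database is all it takes to scale this sketch up.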
## API Endpoints

Adjust these if your code uses different routes; just drop in the actual ones.
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET    | /health  | Check server status |
| POST   | /embed   | Submit text to create embeddings |
| POST   | /query   | Ask a question and get a RAG-powered reply |

## Configuration
No external API keys needed!
Just ensure:

- Ollama is installed and running locally
- The local model is available to your container
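A quick way to check the first requirement from Python (a hypothetical helper, not part of the repo; it relies only on the standard library and on Ollama's default port 11434, where a plain GET returns 200 when the server is up):

```python
import urllib.request

def ollama_available(url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False

if not ollama_available():
    print("Ollama is not reachable; start it with `ollama serve`.")
```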
Example .env (if used):
```
MODEL_NAME=your_local_model_here
PORT=8000
```
(If you're not using .env, keep configs in a config.py or similar.)
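A minimal config.py along those lines might look like this. The variable names mirror the .env above; `OLLAMA_URL` is an extra assumption, defaulting to Ollama's standard port:

```python
import os

# Read settings from the environment, with safe defaults for local runs.
MODEL_NAME = os.environ.get("MODEL_NAME", "llama3")  # any model pulled into Ollama
PORT = int(os.environ.get("PORT", "8000"))
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")  # Ollama default
```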
## Project Structure

```
Simple-RAG-API/
├── app.py        # RAG API server code
├── embed.py      # Embedding & vector logic
├── Dockerfile    # Container image build
├── k8s.txt       # Kubernetes example (optional)
└── README.md     # This file
```
## Development
Want to work on the code locally?
Create a Python virtual env:
```shell
python3 -m venv venv
source venv/bin/activate
```
Install deps:
```shell
pip install -r requirements.txt
```
Run locally:
```shell
uvicorn app:app --reload --host 0.0.0.0 --port 8000
```
Then test with curl or Postman.
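For a scripted check, a tiny standard-library client can build the /query call. The `question` field name is an assumption; swap in whatever payload your app.py actually expects:

```python
import json
import urllib.request

def build_query_request(question: str, base_url: str = "http://localhost:8000"):
    """Build a POST /query request; the 'question' field name is assumed."""
    payload = json.dumps({"question": question}).encode()
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What does embed.py do?")
# To actually send it (requires the API to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```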
## What This Is Great For

- Learning how RAG works end-to-end
- Experimenting with local LLMs (no API bills)
- Building prototypes that don't depend on the cloud
- A portfolio project to demo RAG basics