Self-host an entire production-grade AI stack with ONE command. LLMs • Vector DB • Web UI • Workflow Automation • RAG • Voice: all local, all yours.
Quick Start • What's Inside • Architecture • Use Cases
You don't have to choose between convenience and privacy. You don't have to pay OpenAI $0.03 per request for a chatbot. You don't have to leak company data through an API.
Run your own GPT: at home, on-prem, or in your private cloud.
git clone https://github.com/kasimmj/local-ai-stack
cd local-ai-stack
./start.sh

That's it. You now have a ChatGPT clone at http://localhost:3000, a vector database at http://localhost:6333, a workflow editor at http://localhost:5678, and a model API at http://localhost:11434.
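The model API on port 11434 is Ollama's HTTP API, so it can be scripted once a model has been pulled. The sketch below assumes `curl` is installed on the host and that the port is reachable as described above. `ask_ollama` and `OLLAMA_URL` are illustrative names, not part of this repo.

```shell
# Hypothetical helper (not shipped with the stack): send one prompt to
# Ollama's /api/generate endpoint and print the raw JSON reply.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

ask_ollama() {
  model="$1"
  prompt="$2"
  # "stream": false asks for one JSON object instead of a token stream.
  # The prompt is inserted verbatim, so in this simple sketch it must not
  # contain double quotes or backslashes.
  curl -fsS "$OLLAMA_URL/api/generate" \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"$model\", \"prompt\": \"$prompt\", \"stream\": false}"
}
```

For example, `ask_ollama llama3.2:3b "Why is the sky blue?"` should print a JSON object whose `response` field holds the generated text, provided that model has already been pulled.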
| Service | Purpose | Port |
|---|---|---|
| Ollama | Run LLMs locally (Llama, Mistral, Qwen, DeepSeek...) | 11434 |
| Open WebUI | ChatGPT-style UI, RAG, voice, image, multi-user | 3000 |
| Qdrant | Production vector database for embeddings/RAG | 6333 |
| n8n | Visual workflow automation (1000+ integrations) | 5678 |
| Postgres | Persistent storage for n8n and your apps | 5432 |
| Redis | Fast cache + job queue | 6379 |
| Caddy | Auto-HTTPS reverse proxy (optional) | 80/443 |
- Docker 24+ and Docker Compose v2
- 16GB RAM minimum (32GB recommended for larger models)
- ~50GB free disk space
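The requirements above can be checked before the first start. This is a minimal sketch, not a script from this repo: `version_ge`, `check`, and `preflight` are hypothetical names, the RAM check reads `/proc/meminfo` and so only works on Linux, and `sort -V` is assumed to be available.

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check LABEL CMD...: run CMD and print PASS or FAIL next to LABEL.
check() {
  label="$1"; shift
  if "$@"; then echo "PASS  $label"; else echo "FAIL  $label"; fi
}

preflight() {
  docker_v="$(docker version --format '{{.Server.Version}}' 2>/dev/null)" || docker_v=0
  compose_v="$(docker compose version --short 2>/dev/null)" || compose_v=0
  compose_v="${compose_v#v}"   # strip a leading "v" if one is printed
  # Total RAM in GB, rounded to the nearest integer (Linux only).
  ram_gb="$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024 + 0.5) }' /proc/meminfo)"
  # Free space in GB on the filesystem holding the current directory.
  free_gb="$(df -Pk . | awk 'NR == 2 { print int($4 / 1024 / 1024) }')"

  check "Docker 24+ (found $docker_v)"        version_ge "$docker_v" 24
  check "Compose v2 (found $compose_v)"       version_ge "$compose_v" 2
  check "16GB RAM (found ${ram_gb}GB)"        [ "$ram_gb" -ge 16 ]
  check "50GB free disk (found ${free_gb}GB)" [ "$free_gb" -ge 50 ]
}
```

Running `preflight` from the directory where you plan to clone the repo prints one PASS or FAIL line per requirement.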
git clone https://github.com/kasimmj/local-ai-stack
cd local-ai-stack
cp .env.example .env
./start.sh

Open http://localhost:3000 → create your admin account → start chatting.
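Containers can take a while to become reachable after the start script returns, especially on first run. One way to wait for the UI before opening the browser is a small polling loop. This is a sketch that assumes `curl` is installed; `wait_for_url` is a hypothetical helper, not part of this repo.

```shell
# Poll URL until it answers with a successful HTTP status, or give up.
wait_for_url() {
  url="$1"
  attempts="${2:-30}"   # maximum number of tries
  delay="${3:-2}"       # seconds to sleep between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: HTTP errors count as failure; -s: quiet; -o /dev/null: drop the body
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "ready: $url (attempt $i)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out: $url" >&2
  return 1
}
```

A typical call is `wait_for_url http://localhost:3000`, which tries up to 30 times at 2-second intervals.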
docker exec -it ollama ollama pull llama3.2:3b # Fast & small (2GB)
docker exec -it ollama ollama pull qwen2.5:7b # Great quality (4GB)
docker exec -it ollama ollama pull deepseek-r1:8b # Reasoning (5GB)

./stop.sh # Graceful shutdown
./reset.sh # Nuke everything (delete data)

┌───────────────────────────────────────┐
│         Caddy (Reverse Proxy)         │
└──────┬───────────────┬──────────┬─────┘
       │               │          │
┌──────▼──────┐ ┌──────▼─────┐ ┌──▼─────┐
│ Open WebUI  │ │    n8n     │ │ Qdrant │
│    :3000    │ │   :5678    │ │ :6333  │
└──────┬──────┘ └──────┬─────┘ └────────┘
       │               │
       │      ┌────────┴───────┐
       │      │    Postgres    │
       │      │     :5432      │
       │      └────────────────┘
┌──────▼──────┐
│   Ollama    │
│   :11434    │
└─────────────┘
All services share a private Docker network. Only Open WebUI, n8n, and Qdrant are exposed by default; everything else stays internal.
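Because Qdrant is exposed on port 6333, a collection for RAG experiments can be created from the host with its REST API. The sketch below assumes `curl` is installed and that no Qdrant API key is configured. `create_collection` and `QDRANT_URL` are illustrative names, not part of this repo.

```shell
# Hypothetical helper: create a Qdrant collection that uses cosine distance.
QDRANT_URL="${QDRANT_URL:-http://localhost:6333}"

create_collection() {
  name="$1"
  dim="$2"   # vector length produced by your embedding model
  curl -fsS -X PUT "$QDRANT_URL/collections/$name" \
    -H 'Content-Type: application/json' \
    -d "{\"vectors\": {\"size\": $dim, \"distance\": \"Cosine\"}}"
}
```

For instance, `create_collection docs 768` would suit an embedding model that emits 768-dimensional vectors. Check the dimension of the model you actually use, since a mismatch makes later inserts fail.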
- Replace ChatGPT Team with an internal AI that knows your docs and never leaks data.
- Index thousands of papers and chat with them: no API costs, full reproducibility.
- Ground a bot in your knowledge base, then deploy it via n8n webhooks to WhatsApp/Telegram/Slack.
- Your conversations, your data, your model. No telemetry.
- Bring AI to areas with limited internet: fully offline after first install.
Edit .env:
DEFAULT_MODEL=qwen2.5:7b
EMBEDDING_MODEL=nomic-embed-text

Open WebUI already supports RTL out of the box. Just pick العربية (Arabic) in Settings → Interface.
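To confirm which models the .env keys above select, or to reuse them in your own scripts, the values can be read back with standard tools. This is a minimal sketch: `env_get` is a hypothetical helper, and it only handles plain `KEY=value` lines with optional double quotes.

```shell
# Print the value of KEY from a dotenv-style file (default: ./.env).
env_get() {
  key="$1"
  file="${2:-.env}"
  # Use the last KEY=... line, keep everything after the first "=",
  # and strip one pair of surrounding double quotes if present.
  grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2- | sed -e 's/^"//' -e 's/"$//'
}
```

One possible use is `docker exec -it ollama ollama pull "$(env_get DEFAULT_MODEL)"`, which pulls whichever model the file currently names.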
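DOMAIN=ai.yourcompany.com ./start.sh --with-caddy

Caddy will auto-provision a Let's Encrypt certificate, provided the domain's DNS record points at this host and ports 80 and 443 are reachable from the internet.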
docker exec -it ollama ollama pull <model-name>
# Then refresh Open WebUI; it appears in the model picker.

| Model size | RAM | Disk | Speed (RTX 4090) |
|---|---|---|---|
| 3B | 4GB | 2GB | 80 tok/s |
| 7B | 8GB | 4GB | 50 tok/s |
| 13B | 16GB | 8GB | 28 tok/s |
| 34B | 32GB | 20GB | 12 tok/s |
| 70B | 64GB | 40GB | 6 tok/s |
CPU-only mode is supported (slower).
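The RAM column of the table can be turned into a quick lookup when deciding which model size to pull. This sketch simply encodes the thresholds listed above, which are rough guides rather than guarantees. `largest_model_for_ram` is a hypothetical helper name.

```shell
# Print the largest model tier from the table that fits in RAM_GB.
largest_model_for_ram() {
  ram_gb="$1"
  if   [ "$ram_gb" -ge 64 ]; then echo "70B"
  elif [ "$ram_gb" -ge 32 ]; then echo "34B"
  elif [ "$ram_gb" -ge 16 ]; then echo "13B"
  elif [ "$ram_gb" -ge 8 ];  then echo "7B"
  elif [ "$ram_gb" -ge 4 ];  then echo "3B"
  else echo "none"
  fi
}
```

With the 16GB minimum from the requirements, `largest_model_for_ram 16` prints `13B`. Leaving some headroom for the other services is sensible, so a 7B model may be the more comfortable choice on such a machine.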
Drop YAML files into extensions/ to add more services:
- voice/: Whisper STT + Piper TTS for voice chat
- vision/: Stable Diffusion for image generation
- search/: SearXNG for web-grounded answers
- monitoring/: Grafana + Loki for observability

./start.sh --enable voice,vision,search

- ⚠️ Default credentials are random per-install (stored in .env)
- ⚠️ Never expose ports directly to the public internet without Caddy + auth
- ✅ All inter-service traffic is on a private Docker network
- ✅ No telemetry, no analytics, no external calls (unless you add them)
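One way to act on the port-exposure warning above is to list which containers publish ports on all network interfaces. This is a sketch, not a tool shipped with the stack: `public_bindings` is a hypothetical filter, and the container names in the comment are sample data only.

```shell
# Read "name: ports" lines on stdin and keep those bound to every interface
# (0.0.0.0 for IPv4, [::] or ::: for IPv6). Loopback-only and unpublished
# ports are dropped. Sample input line: "open-webui: 0.0.0.0:3000->8080/tcp"
public_bindings() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E '0\.0\.0\.0:|\[::\]:|:::' || true
}
```

It is meant to be fed from Docker, for example `docker ps --format '{{.Names}}: {{.Ports}}' | public_bindings`. Anything it prints is reachable from other machines unless a firewall blocks it.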
PRs welcome! Especially:
- Additional Ollama model preset bundles
- n8n workflow templates
- Extensions for new services (TTS, vision, scrapers)
- Translations for Open WebUI
MIT © 2026 Kasim Mohammed