A modular Python framework for building and operating production AI pipelines — including end-to-end RAG, agentic tool use, a REST API, a CLI, and an evaluation harness — built on the Anthropic API, Voyage AI, and LangChain.
Ragnar is a personal AI engineering project focused on hands-on implementation of the core patterns underlying modern LLM applications. Rather than a monolithic application, it is intentionally structured as a set of focused, composable modules — each isolating a specific capability so it can be understood, tested, and extended independently.
The project reflects a deliberate approach to learning production AI engineering: build each layer cleanly, then compose them into a system.
chatbot.py — The baseline. Teaches stateful multi-turn conversation with the Anthropic API. Key insight: Claude has no memory — the full message history must be sent on every request.
stream.py — Extends the chatbot with real-time token streaming. Teaches that tokens can be received and displayed as they arrive, which is critical for responsive UX in production AI apps.
system.py — Teaches system prompt engineering. The Socratic mentor persona demonstrates how to fundamentally reshape model behavior without changing code — just the system prompt. Extracted as a module-level constant for easy swapping and testing.
structured.py — Teaches structured extraction. Uses stop sequences and prompt engineering to produce clean JSON from a model that naturally wants to add prose — foundational for any pipeline where downstream code consumes the model's response.
rag-embedding.py — Teaches what an embedding is: text converted into a vector of numbers encoding semantic meaning. The foundation of all retrieval-based systems.
rag-similarity-search.py — Teaches how retrieval works. Embeds a query and a corpus, computes cosine similarity, and ranks results — demonstrating that semantic search captures meaning rather than keywords.
lang-chain.py — Teaches document ingestion and chunking. Loads .txt, .pdf, and .docx files and splits them using RecursiveCharacterTextSplitter with configurable overlap to preserve cross-boundary context.
rag-pipeline.py — The end-to-end RAG pipeline as a RAGPipeline class. Demonstrates the two-phase pattern every RAG system uses: index time (load, chunk, embed once) and query time (embed query, retrieve, generate). Includes full observability via observability.py.
agent.py — A tool-use agent that routes questions autonomously to RAG search or arithmetic calculation. Teaches the core agentic pattern: the model reasons and decides what to call; your code executes. Includes a safe AST-based calculator to avoid code injection.
api.py — A FastAPI service wrapping the RAG pipeline. Teaches how to serve an AI pipeline over HTTP. The index is built once at startup and shared across all requests — the same pattern used in production inference services. Exposes /query, /health, and /stats endpoints.
cli.py — A command-line client for the API. Teaches the separation between API surface and client — the CLI knows nothing about RAG or embeddings, it just speaks HTTP. Supports single-question, interactive, and stats/health modes.
eval.py — An evaluation harness that benchmarks retrieval accuracy and answer correctness against a ground-truth test suite. Teaches that RAG quality degrades silently — an eval loop catches regressions before they reach users.
observability.py — Shared instrumentation layer. Tracks token usage, response latency, retrieval scores, and tool calls across all pipeline components. Imported by rag-pipeline.py, agent.py, and eval.py.
Document Ingestion & Chunking (lang-chain.py)
|
Vector Embedding - index time (rag-embedding.py)
|
Semantic Similarity Search - query time (rag-similarity-search.py)
|
Context-Augmented Generation (rag-pipeline.py)
|
+-----+------+
REST API Agent
(api.py) (agent.py)
|
CLI Client
(cli.py)
|
Evaluation Harness (eval.py)
| Layer | Technology |
|---|---|
| LLM | Anthropic API (Claude Sonnet 4) |
| Embeddings | Voyage AI (voyage-3-large) |
| Document Loading | LangChain Community |
| Text Splitting | LangChain Text Splitters |
| Similarity Search | NumPy (cosine similarity) |
| REST API | FastAPI + Uvicorn |
| CLI | argparse + requests |
| Environment | Python 3.12+, python-dotenv |
Prerequisites: Python 3.12+, an Anthropic API key, and a Voyage AI API key.
# Install dependencies
pip install uv
uv sync
# Configure environment
cp .env.example .env
# Add your keys to .env:
# ANTHROPIC_API_KEY=your_key_here
# VOYAGE_API_KEY=your_key_hereThese run as a conversation loop. Type quit, exit, or bye to stop. Type stats during a session to see live metrics.
uv run rag-pipeline.py # Full RAG pipeline - recommended starting point
uv run agent.py # Tool-use agent (RAG + calculator)
uv run chatbot.py # Basic multi-turn chatbot
uv run stream.py # Streaming chatbot
uv run system.py # Systems design mentoruv run api.py # Starts server at http://localhost:8000
# Endpoints: POST /query | GET /health | GET /statsuv run cli.py "What are your store hours?" # Single question
uv run cli.py --verbose "How do returns work?" # With observability metrics
uv run cli.py # Interactive mode
uv run cli.py --health # Server health check
uv run cli.py --stats # Session metricsThese run once and print their output.
uv run structured.py # Structured extraction (CloudFormation JSON)
uv run rag-embedding.py # Generate a vector embedding
uv run rag-similarity-search.py # Semantic similarity across a sample corpus
uv run lang-chain.py # Document ingestion and chunking
uv run eval.py # Run the evaluation harness
uv run eval.py --export results.json # Export results to JSONEach module is kept deliberately minimal — no unnecessary abstraction, no shared state between components. The goal is clarity: anyone reading the code should understand exactly what pattern it demonstrates and how to extend it. This is the same principle applied to production platform engineering: own each layer cleanly before composing them.
- End-to-end RAG pipeline (ingestion -> embedding -> retrieval -> generation)
- Agentic tool-use with RAG search and safe calculation
- REST API with observability endpoints
- CLI client for the API surface
- Evaluation harness with ground-truth test suite
- Session-level observability (token usage, latency, retrieval scoring)
- Persistent vector store integration (e.g. Pinecone, pgvector)
- Multi-document retrieval with metadata filtering
- MCP-compatible interface for agent-to-agent interoperability
- Fine-tuning pipeline for domain-specific model adaptation
- Benchmarking suite with variance analysis across chunk sizes and models