Ragnar

A modular Python framework for building and operating production AI pipelines — including end-to-end RAG, agentic tool use, a REST API, a CLI, and an evaluation harness — built on the Anthropic API, Voyage AI, and LangChain.

Overview

Ragnar is a personal AI engineering project focused on hands-on implementation of the core patterns underlying modern LLM applications. Rather than a monolithic application, it is intentionally structured as a set of focused, composable modules — each isolating a specific capability so it can be understood, tested, and extended independently.

The project reflects a deliberate approach to learning production AI engineering: build each layer cleanly, then compose them into a system.

Modules

Foundation — Core API Patterns

chatbot.py — The baseline. Teaches stateful multi-turn conversation with the Anthropic API. Key insight: Claude has no memory — the full message history must be sent on every request.

stream.py — Extends the chatbot with real-time token streaming. Teaches that tokens can be received and displayed as they arrive, which is critical for responsive UX in production AI apps.

system.py — Teaches system prompt engineering. The Socratic mentor persona demonstrates how to fundamentally reshape model behavior without changing code — just the system prompt. Extracted as a module-level constant for easy swapping and testing.

structured.py — Teaches structured extraction. Uses stop sequences and prompt engineering to produce clean JSON from a model that naturally wants to add prose — foundational for any pipeline where downstream code consumes the model's response.

RAG — Individual Layers

rag-embedding.py — Teaches what an embedding is: text converted into a vector of numbers encoding semantic meaning. The foundation of all retrieval-based systems.

rag-similarity-search.py — Teaches how retrieval works. Embeds a query and a corpus, computes cosine similarity, and ranks results — demonstrating that semantic search captures meaning rather than keywords.

lang-chain.py — Teaches document ingestion and chunking. Loads .txt, .pdf, and .docx files and splits them using RecursiveCharacterTextSplitter with configurable overlap to preserve cross-boundary context.

Production System

rag-pipeline.py — The end-to-end RAG pipeline as a RAGPipeline class. Demonstrates the two-phase pattern every RAG system uses: index time (load, chunk, embed once) and query time (embed query, retrieve, generate). Includes full observability via observability.py.

agent.py — A tool-use agent that routes questions autonomously to RAG search or arithmetic calculation. Teaches the core agentic pattern: the model reasons and decides what to call; your code executes. Includes a safe AST-based calculator to avoid code injection.

api.py — A FastAPI service wrapping the RAG pipeline. Teaches how to serve an AI pipeline over HTTP. The index is built once at startup and shared across all requests — the same pattern used in production inference services. Exposes /query, /health, and /stats endpoints.

cli.py — A command-line client for the API. Teaches the separation between API surface and client — the CLI knows nothing about RAG or embeddings, it just speaks HTTP. Supports single-question, interactive, and stats/health modes.

eval.py — An evaluation harness that benchmarks retrieval accuracy and answer correctness against a ground-truth test suite. Teaches that RAG quality degrades silently — an eval loop catches regressions before they reach users.

observability.py — Shared instrumentation layer. Tracks token usage, response latency, retrieval scores, and tool calls across all pipeline components. Imported by rag-pipeline.py, agent.py, and eval.py.

Architecture

Document Ingestion & Chunking  (lang-chain.py)
           |
  Vector Embedding - index time  (rag-embedding.py)
           |
Semantic Similarity Search - query time  (rag-similarity-search.py)
           |
  Context-Augmented Generation  (rag-pipeline.py)
           |
     +-----+------+
  REST API      Agent
  (api.py)    (agent.py)
     |
  CLI Client
  (cli.py)
           |
  Evaluation Harness  (eval.py)

Tech Stack

Layer	Technology
LLM	Anthropic API (Claude Sonnet 4)
Embeddings	Voyage AI (`voyage-3-large`)
Document Loading	LangChain Community
Text Splitting	LangChain Text Splitters
Similarity Search	NumPy (cosine similarity)
REST API	FastAPI + Uvicorn
CLI	argparse + requests
Environment	Python 3.12+, `python-dotenv`

Setup

Prerequisites: Python 3.12+, an Anthropic API key, and a Voyage AI API key.

# Install dependencies
pip install uv
uv sync

# Configure environment
cp .env.example .env
# Add your keys to .env:
# ANTHROPIC_API_KEY=your_key_here
# VOYAGE_API_KEY=your_key_here

Running the Modules

Interactive modules

These run as a conversation loop. Type quit, exit, or bye to stop. Type stats during a session to see live metrics.

uv run rag-pipeline.py     # Full RAG pipeline - recommended starting point
uv run agent.py            # Tool-use agent (RAG + calculator)
uv run chatbot.py          # Basic multi-turn chatbot
uv run stream.py           # Streaming chatbot
uv run system.py           # Systems design mentor

API server

uv run api.py              # Starts server at http://localhost:8000
# Endpoints: POST /query  |  GET /health  |  GET /stats

CLI client (requires API server running)

uv run cli.py "What are your store hours?"      # Single question
uv run cli.py --verbose "How do returns work?"  # With observability metrics
uv run cli.py                                   # Interactive mode
uv run cli.py --health                          # Server health check
uv run cli.py --stats                           # Session metrics

Non-interactive modules

These run once and print their output.

uv run structured.py             # Structured extraction (CloudFormation JSON)
uv run rag-embedding.py          # Generate a vector embedding
uv run rag-similarity-search.py  # Semantic similarity across a sample corpus
uv run lang-chain.py             # Document ingestion and chunking
uv run eval.py                   # Run the evaluation harness
uv run eval.py --export results.json  # Export results to JSON

Design Philosophy

Each module is kept deliberately minimal — no unnecessary abstraction, no shared state between components. The goal is clarity: anyone reading the code should understand exactly what pattern it demonstrates and how to extend it. This is the same principle applied to production platform engineering: own each layer cleanly before composing them.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ragnar

Overview

Modules

Foundation — Core API Patterns

RAG — Individual Layers

Production System

Architecture

Tech Stack

Setup

Running the Modules

Interactive modules

API server

CLI client (requires API server running)

Non-interactive modules

Design Philosophy

Roadmap

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
README.txt		README.txt
agent.py		agent.py
api.py		api.py
chatbot.py		chatbot.py
cli.py		cli.py
eval.py		eval.py
faq.txt		faq.txt
lang-chain.py		lang-chain.py
observability.py		observability.py
pyproject.toml		pyproject.toml
rag-embedding.py		rag-embedding.py
rag-pipeline.py		rag-pipeline.py
rag-similarity-search.py		rag-similarity-search.py
stream.py		stream.py
structured.py		structured.py
system.py		system.py

Folders and files

Latest commit

History

Repository files navigation

Ragnar

Overview

Modules

Foundation — Core API Patterns

RAG — Individual Layers

Production System

Architecture

Tech Stack

Setup

Running the Modules

Interactive modules

API server

CLI client (requires API server running)

Non-interactive modules

Design Philosophy

Roadmap

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages