Skip to content

AlexBasile123/ragnar

Repository files navigation

Ragnar

A modular Python framework for building and operating production AI pipelines — including end-to-end RAG, agentic tool use, a REST API, a CLI, and an evaluation harness — built on the Anthropic API, Voyage AI, and LangChain.


Overview

Ragnar is a personal AI engineering project focused on hands-on implementation of the core patterns underlying modern LLM applications. Rather than a monolithic application, it is intentionally structured as a set of focused, composable modules — each isolating a specific capability so it can be understood, tested, and extended independently.

The project reflects a deliberate approach to learning production AI engineering: build each layer cleanly, then compose them into a system.


Modules

Foundation — Core API Patterns

chatbot.py — The baseline. Teaches stateful multi-turn conversation with the Anthropic API. Key insight: Claude has no memory — the full message history must be sent on every request.

stream.py — Extends the chatbot with real-time token streaming. Teaches that tokens can be received and displayed as they arrive, which is critical for responsive UX in production AI apps.

system.py — Teaches system prompt engineering. The Socratic mentor persona demonstrates how to fundamentally reshape model behavior without changing code — just the system prompt. Extracted as a module-level constant for easy swapping and testing.

structured.py — Teaches structured extraction. Uses stop sequences and prompt engineering to produce clean JSON from a model that naturally wants to add prose — foundational for any pipeline where downstream code consumes the model's response.

RAG — Individual Layers

rag-embedding.py — Teaches what an embedding is: text converted into a vector of numbers encoding semantic meaning. The foundation of all retrieval-based systems.

rag-similarity-search.py — Teaches how retrieval works. Embeds a query and a corpus, computes cosine similarity, and ranks results — demonstrating that semantic search captures meaning rather than keywords.

lang-chain.py — Teaches document ingestion and chunking. Loads .txt, .pdf, and .docx files and splits them using RecursiveCharacterTextSplitter with configurable overlap to preserve cross-boundary context.

Production System

rag-pipeline.py — The end-to-end RAG pipeline as a RAGPipeline class. Demonstrates the two-phase pattern every RAG system uses: index time (load, chunk, embed once) and query time (embed query, retrieve, generate). Includes full observability via observability.py.

agent.py — A tool-use agent that routes questions autonomously to RAG search or arithmetic calculation. Teaches the core agentic pattern: the model reasons and decides what to call; your code executes. Includes a safe AST-based calculator to avoid code injection.

api.py — A FastAPI service wrapping the RAG pipeline. Teaches how to serve an AI pipeline over HTTP. The index is built once at startup and shared across all requests — the same pattern used in production inference services. Exposes /query, /health, and /stats endpoints.

cli.py — A command-line client for the API. Teaches the separation between API surface and client — the CLI knows nothing about RAG or embeddings, it just speaks HTTP. Supports single-question, interactive, and stats/health modes.

eval.py — An evaluation harness that benchmarks retrieval accuracy and answer correctness against a ground-truth test suite. Teaches that RAG quality degrades silently — an eval loop catches regressions before they reach users.

observability.py — Shared instrumentation layer. Tracks token usage, response latency, retrieval scores, and tool calls across all pipeline components. Imported by rag-pipeline.py, agent.py, and eval.py.


Architecture

Document Ingestion & Chunking  (lang-chain.py)
           |
  Vector Embedding - index time  (rag-embedding.py)
           |
Semantic Similarity Search - query time  (rag-similarity-search.py)
           |
  Context-Augmented Generation  (rag-pipeline.py)
           |
     +-----+------+
  REST API      Agent
  (api.py)    (agent.py)
     |
  CLI Client
  (cli.py)
           |
  Evaluation Harness  (eval.py)

Tech Stack

Layer Technology
LLM Anthropic API (Claude Sonnet 4)
Embeddings Voyage AI (voyage-3-large)
Document Loading LangChain Community
Text Splitting LangChain Text Splitters
Similarity Search NumPy (cosine similarity)
REST API FastAPI + Uvicorn
CLI argparse + requests
Environment Python 3.12+, python-dotenv

Setup

Prerequisites: Python 3.12+, an Anthropic API key, and a Voyage AI API key.

# Install dependencies
pip install uv
uv sync

# Configure environment
cp .env.example .env
# Add your keys to .env:
# ANTHROPIC_API_KEY=your_key_here
# VOYAGE_API_KEY=your_key_here

Running the Modules

Interactive modules

These run as a conversation loop. Type quit, exit, or bye to stop. Type stats during a session to see live metrics.

uv run rag-pipeline.py     # Full RAG pipeline - recommended starting point
uv run agent.py            # Tool-use agent (RAG + calculator)
uv run chatbot.py          # Basic multi-turn chatbot
uv run stream.py           # Streaming chatbot
uv run system.py           # Systems design mentor

API server

uv run api.py              # Starts server at http://localhost:8000
# Endpoints: POST /query  |  GET /health  |  GET /stats

CLI client (requires API server running)

uv run cli.py "What are your store hours?"      # Single question
uv run cli.py --verbose "How do returns work?"  # With observability metrics
uv run cli.py                                   # Interactive mode
uv run cli.py --health                          # Server health check
uv run cli.py --stats                           # Session metrics

Non-interactive modules

These run once and print their output.

uv run structured.py             # Structured extraction (CloudFormation JSON)
uv run rag-embedding.py          # Generate a vector embedding
uv run rag-similarity-search.py  # Semantic similarity across a sample corpus
uv run lang-chain.py             # Document ingestion and chunking
uv run eval.py                   # Run the evaluation harness
uv run eval.py --export results.json  # Export results to JSON

Design Philosophy

Each module is kept deliberately minimal — no unnecessary abstraction, no shared state between components. The goal is clarity: anyone reading the code should understand exactly what pattern it demonstrates and how to extend it. This is the same principle applied to production platform engineering: own each layer cleanly before composing them.


Roadmap

  • End-to-end RAG pipeline (ingestion -> embedding -> retrieval -> generation)
  • Agentic tool-use with RAG search and safe calculation
  • REST API with observability endpoints
  • CLI client for the API surface
  • Evaluation harness with ground-truth test suite
  • Session-level observability (token usage, latency, retrieval scoring)
  • Persistent vector store integration (e.g. Pinecone, pgvector)
  • Multi-document retrieval with metadata filtering
  • MCP-compatible interface for agent-to-agent interoperability
  • Fine-tuning pipeline for domain-specific model adaptation
  • Benchmarking suite with variance analysis across chunk sizes and models

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages