py-demo-api

A production-ready Python FastAPI server for LLM chat interactions using LangChain. It supports multiple providers (OpenAI, Anthropic, Google), streaming and batched responses, and intelligent in-memory caching.

🚀 Features

  • Multi-Provider Support: OpenAI, Anthropic (Claude), Google (Gemini)
  • OpenAI-Compatible API: Drop-in replacement for OpenAI API endpoints
  • Streaming & Batched Responses: Real-time streaming or complete responses
  • Smart Caching: In-memory cache with TTL for identical prompts/history
  • LangChain Integration: Leverages LangChain for robust LLM interactions
  • Type-Safe: Full Pydantic validation for requests and responses
  • Docker Ready: Containerized for easy cloud deployment
  • Production-Ready: Health checks, CORS, proper error handling

📋 What's Included

py-demo-api/
├── app/
│   ├── __init__.py           # Package initialization
│   ├── main.py               # FastAPI application entry point
│   ├── config.py             # Configuration settings
│   ├── api/
│   │   ├── __init__.py
│   │   └── routes.py         # API endpoints (chat, health, cache)
│   ├── models/
│   │   ├── __init__.py
│   │   └── chat.py           # Request/response models
│   ├── services/
│   │   ├── __init__.py
│   │   └── llm_service.py    # LLM provider integration
│   └── utils/
│       ├── __init__.py
│       └── cache.py          # In-memory caching implementation
├── requirements.txt          # Python dependencies
├── Dockerfile               # Docker image definition
├── docker-compose.yml       # Docker Compose configuration
├── .gitignore              # Git ignore rules
└── README.md               # This file

🛠️ Quick Start

Prerequisites

  • Python 3.11+
  • pip or poetry
  • API keys for desired providers (OpenAI, Anthropic, or Google)

Local Setup

  1. Clone and navigate to the directory:

    cd py-demo-api
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure environment variables: Create a .env file in the root directory:

    # Required: At least one provider API key
    OPENAI_API_KEY=sk-your-openai-key-here
    ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
    GOOGLE_API_KEY=your-google-key-here
    
    # Optional: Configuration
    DEFAULT_PROVIDER=openai
    DEFAULT_MODEL=gpt-4o-mini
    CACHE_MAX_SIZE=1000
    CACHE_TTL_SECONDS=3600
  5. Run the server:

    uvicorn app.main:app --reload

    Or using the Python script:

    python -m app.main
  6. Access the API: the server listens on http://localhost:8000, interactive docs are at http://localhost:8000/docs, and the health check at http://localhost:8000/api/health.
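
The python -m app.main variant assumes app/main.py ends with a small __main__ block along these lines (a sketch only, not necessarily the repo's exact code):

# Hypothetical tail of app/main.py: start uvicorn programmatically
# when the module is executed directly.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)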

🐳 Docker Deployment

Using Docker Compose (Recommended)

  1. Create .env file with your API keys (see above)

  2. Start the service:

    docker-compose up -d
  3. View logs:

    docker-compose logs -f
  4. Stop the service:

    docker-compose down

Using Docker directly

  1. Build the image:

    docker build -t py-demo-api .
  2. Run the container:

    docker run -d \
      -p 8000:8000 \
      -e OPENAI_API_KEY=your-key \
      -e ANTHROPIC_API_KEY=your-key \
      --name py-demo-api \
      py-demo-api

📡 API Usage

OpenAI-Compatible Endpoint

Batched Response:

curl -X POST "http://localhost:8000/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "temperature": 0.7
  }'

Streaming Response:

curl -X POST "http://localhost:8000/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true,
    "provider": "openai"
  }'

Python Client Example

import requests

# Batched response
response = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is machine learning?"}
        ],
        "provider": "openai",
        "temperature": 0.7
    }
)

data = response.json()
print(data["message"]["content"])
print(f"Cached: {data['cached']}")

# Streaming response
response = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Count to 10"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        print(line.decode())
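
Multi-turn conversations work the same way: include the earlier assistant turns in messages. Because the cache key covers the full history (see Caching below), repeating an identical conversation returns the cached reply. A minimal sketch using the same request schema as above:

import requests

# Hypothetical multi-turn history; prior assistant replies are sent back as context.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name one planet."},
    {"role": "assistant", "content": "Mars."},
    {"role": "user", "content": "Name another one."},
]

resp = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={"messages": history, "provider": "openai", "temperature": 0.7},
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
print(data["message"]["content"])  # assistant reply
print(data["cached"])              # True on an identical repeat request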

🎯 Supported Providers & Models

OpenAI (Latest as of Nov 2025)

  • gpt-4o - Latest GPT-4 Optimized model ⭐
  • gpt-4o-mini - Faster, cheaper GPT-4 (recommended for most use cases) ⭐
  • gpt-4-turbo - Previous generation GPT-4 Turbo
  • gpt-4 - Standard GPT-4
  • gpt-3.5-turbo - Legacy, cheaper option

Anthropic Claude (Latest as of Nov 2025)

  • claude-3-5-sonnet-20241022 - Latest Claude 3.5 Sonnet ⭐
  • claude-3-5-sonnet - Alias for latest 3.5 Sonnet
  • claude-3-opus-20240229 - Most capable Claude 3 model
  • claude-3-opus - Alias for Claude 3 Opus
  • claude-3-sonnet-20240229 - Balanced Claude 3 model
  • claude-3-sonnet - Alias for Claude 3 Sonnet
  • claude-3-haiku-20240307 - Fastest, most affordable Claude 3
  • claude-3-haiku - Alias for Claude 3 Haiku

Google Gemini (Latest as of Nov 2025)

  • gemini-1.5-pro - Most capable Gemini model ⭐
  • gemini-1.5-flash - Faster, more affordable Gemini ⭐
  • gemini-pro - Previous generation Gemini
  • gemini-pro-vision - Previous generation with vision support

⭐ = Recommended models

💾 Caching

The API implements intelligent in-memory caching:

  • Cache Key: Hash of messages + model + provider + temperature + max_tokens
  • TTL: Configurable (default: 1 hour)
  • Max Size: Configurable (default: 1000 entries)
  • Behavior: Identical requests return cached responses instantly

Clear cache manually:

curl -X POST "http://localhost:8000/api/cache/clear"
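
For illustration only, a minimal sketch of how a key-based TTL cache like this can be built with cachetools (function and variable names here are assumptions, not the repo's actual cache.py):

import hashlib
import json

from cachetools import TTLCache

# Bounded cache with per-entry TTL; defaults mirror CACHE_MAX_SIZE and CACHE_TTL_SECONDS.
cache = TTLCache(maxsize=1000, ttl=3600)

def make_cache_key(messages, model, provider, temperature, max_tokens):
    """Hash the request fields that define an identical prompt."""
    payload = json.dumps(
        {
            "messages": messages,
            "model": model,
            "provider": provider,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

key = make_cache_key(
    [{"role": "user", "content": "What is the capital of France?"}],
    "gpt-4o-mini", "openai", 0.7, None,
)
print(cache.get(key))               # None on a miss
cache[key] = {"content": "Paris"}   # store the response for next time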

🏗️ Architecture

Request Flow

  1. Client sends chat request
  2. Cache lookup by request hash
  3. If cached → return immediately
  4. If not → LangChain → LLM Provider
  5. Cache response for future use
  6. Return to client
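
A rough sketch of the flow above as a FastAPI handler (simplified and hypothetical; the actual routes.py and llm_service.py differ in detail):

from typing import Optional

from cachetools import TTLCache
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
cache = TTLCache(maxsize=1000, ttl=3600)

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    messages: list[Message]
    provider: str = "openai"
    model: Optional[str] = None
    temperature: float = 0.7

async def call_llm(req: ChatRequest) -> str:
    # Placeholder for the LangChain call to the selected provider.
    return "stubbed response"

@app.post("/api/v1/chat/completions")
async def chat_completions(req: ChatRequest):
    # Steps 2-3: look up the request hash, return immediately on a hit.
    key = hash((req.provider, req.model, req.temperature,
                tuple((m.role, m.content) for m in req.messages)))
    if key in cache:
        return {"message": {"role": "assistant", "content": cache[key]}, "cached": True}
    # Step 4: otherwise go through LangChain to the provider.
    content = await call_llm(req)
    # Steps 5-6: cache the response and return it to the client.
    cache[key] = content
    return {"message": {"role": "assistant", "content": content}, "cached": False}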

Key Components

  • FastAPI: Modern async web framework
  • LangChain: LLM orchestration and provider abstraction
  • Pydantic: Data validation and settings management
  • cachetools: TTL-based in-memory caching

🔧 Configuration

All configuration is done via environment variables (see the .env example above):

Variable            Description             Default
OPENAI_API_KEY      OpenAI API key          None
ANTHROPIC_API_KEY   Anthropic API key       None
GOOGLE_API_KEY      Google API key          None
DEFAULT_PROVIDER    Default LLM provider    openai
DEFAULT_MODEL       Default model name      gpt-4o-mini
CACHE_MAX_SIZE      Max cached responses    1000
CACHE_TTL_SECONDS   Cache entry lifetime    3600
HOST                Server host             0.0.0.0
PORT                Server port             8000
ENVIRONMENT         Environment mode        development
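
The config.py module presumably maps these variables onto a settings object; a hedged sketch with pydantic-settings (field names follow the table above, but the actual class may differ):

from typing import Optional

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Settings read from the process environment or the .env file."""
    model_config = SettingsConfigDict(env_file=".env")

    openai_api_key: Optional[str] = None
    anthropic_api_key: Optional[str] = None
    google_api_key: Optional[str] = None
    default_provider: str = "openai"
    default_model: str = "gpt-4o-mini"
    cache_max_size: int = 1000
    cache_ttl_seconds: int = 3600
    host: str = "0.0.0.0"
    port: int = 8000
    environment: str = "development"

settings = Settings()
print(settings.default_model)  # "gpt-4o-mini" unless DEFAULT_MODEL is set

Environment variable matching in pydantic-settings is case-insensitive by default, so OPENAI_API_KEY fills openai_api_key.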

🧪 Testing

Check health endpoint:

curl http://localhost:8000/api/health

Expected response:

{
  "status": "healthy",
  "cache_stats": {
    "size": 0,
    "max_size": 1000,
    "ttl_seconds": 3600
  }
}

🚀 Cloud Deployment

This API is ready for cloud deployment on:

  • AWS: ECS, EKS, or EC2 with Docker
  • Google Cloud: Cloud Run, GKE, or Compute Engine
  • Azure: Container Instances, AKS, or App Service
  • DigitalOcean: App Platform or Droplets
  • Railway, Render, Fly.io: Direct Docker deployment

Deployment Checklist

  • Set production environment variables
  • Configure CORS for your domain (see the sketch after this checklist)
  • Set up monitoring and logging
  • Enable HTTPS/TLS
  • Implement rate limiting (if needed)
  • Set appropriate cache sizes
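
For the CORS item, a minimal sketch of restricting the API to a specific origin in FastAPI (main.py already wires up CORS per the feature list; the origin below is a placeholder):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Replace the placeholder origin with your real frontend domain in production.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.example.com"],
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)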

📝 License

MIT License - Feel free to use in your projects!

🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.


Need help? Check the API docs at /docs or open an issue on GitHub.
