py-demo-api

A production-ready Python FastAPI server for LLM chat interactions using LangChain. It supports multiple providers (OpenAI, Anthropic, Google), streaming and batched responses, and intelligent in-memory caching.

🚀 Features

  • Multi-Provider Support: OpenAI, Anthropic (Claude), Google (Gemini)
  • OpenAI-Compatible API: Drop-in replacement for OpenAI API endpoints
  • Streaming & Batched Responses: Real-time streaming or complete responses
  • Smart Caching: In-memory cache with TTL for identical prompts/history
  • LangChain Integration: Leverages LangChain for robust LLM interactions
  • Type-Safe: Full Pydantic validation for requests and responses
  • Docker Ready: Containerized for easy cloud deployment
  • Production-Ready: Health checks, CORS, proper error handling

📋 What's Included

py-demo-api/
├── app/
│   ├── __init__.py           # Package initialization
│   ├── main.py               # FastAPI application entry point
│   ├── config.py             # Configuration settings
│   ├── api/
│   │   ├── __init__.py
│   │   └── routes.py         # API endpoints (chat, health, cache)
│   ├── models/
│   │   ├── __init__.py
│   │   └── chat.py           # Request/response models
│   ├── services/
│   │   ├── __init__.py
│   │   └── llm_service.py    # LLM provider integration
│   └── utils/
│       ├── __init__.py
│       └── cache.py          # In-memory caching implementation
├── requirements.txt          # Python dependencies
├── Dockerfile               # Docker image definition
├── docker-compose.yml       # Docker Compose configuration
├── .gitignore              # Git ignore rules
└── README.md               # This file

🛠️ Quick Start

Prerequisites

  • Python 3.11+
  • pip or poetry
  • API keys for desired providers (OpenAI, Anthropic, or Google)

Local Setup

  1. Clone and navigate to the directory:

    cd py-demo-api
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure environment variables: Create a .env file in the root directory:

    # Required: At least one provider API key
    OPENAI_API_KEY=sk-your-openai-key-here
    ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
    GOOGLE_API_KEY=your-google-key-here
    
    # Optional: Configuration
    DEFAULT_PROVIDER=openai
    DEFAULT_MODEL=gpt-4o-mini
    CACHE_MAX_SIZE=1000
    CACHE_TTL_SECONDS=3600
  5. Run the server:

    uvicorn app.main:app --reload

    Or using the Python script:

    python -m app.main
  6. Access the API: the server listens on http://localhost:8000, interactive docs are at http://localhost:8000/docs, and the health check at http://localhost:8000/api/health.
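
The python -m app.main variant assumes app/main.py ends with a small __main__ block along these lines (a sketch only, not necessarily the repo's exact code):

# Hypothetical tail of app/main.py: start uvicorn programmatically
# when the module is executed directly.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)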

🐳 Docker Deployment

Using Docker Compose (Recommended)

  1. Create .env file with your API keys (see above)

  2. Start the service:

    docker-compose up -d
  3. View logs:

    docker-compose logs -f
  4. Stop the service:

    docker-compose down

Using Docker directly

  1. Build the image:

    docker build -t py-demo-api .
  2. Run the container:

    docker run -d \
      -p 8000:8000 \
      -e OPENAI_API_KEY=your-key \
      -e ANTHROPIC_API_KEY=your-key \
      --name py-demo-api \
      py-demo-api

📡 API Usage

OpenAI-Compatible Endpoint

Batched Response:

curl -X POST "http://localhost:8000/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "temperature": 0.7
  }'

Streaming Response:

curl -X POST "http://localhost:8000/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true,
    "provider": "openai"
  }'

Python Client Example

import requests

# Batched response
response = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is machine learning?"}
        ],
        "provider": "openai",
        "temperature": 0.7
    }
)

data = response.json()
print(data["message"]["content"])
print(f"Cached: {data['cached']}")

# Streaming response
response = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Count to 10"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        print(line.decode())
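
Multi-turn conversations work the same way: include the earlier assistant turns in messages. Because the cache key covers the full history (see Caching below), repeating an identical conversation returns the cached reply. A minimal sketch using the same request schema as above:

import requests

# Hypothetical multi-turn history; prior assistant replies are sent back as context.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name one planet."},
    {"role": "assistant", "content": "Mars."},
    {"role": "user", "content": "Name another one."},
]

resp = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={"messages": history, "provider": "openai", "temperature": 0.7},
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
print(data["message"]["content"])  # assistant reply
print(data["cached"])              # True on an identical repeat request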

🎯 Supported Providers & Models

OpenAI (Latest as of Nov 2025)

  • gpt-4o - Latest GPT-4 Optimized model ⭐
  • gpt-4o-mini - Faster, cheaper GPT-4 (recommended for most use cases) ⭐
  • gpt-4-turbo - Previous generation GPT-4 Turbo
  • gpt-4 - Standard GPT-4
  • gpt-3.5-turbo - Legacy, cheaper option

Anthropic Claude (Latest as of Nov 2025)

  • claude-3-5-sonnet-20241022 - Latest Claude 3.5 Sonnet ⭐
  • claude-3-5-sonnet - Alias for latest 3.5 Sonnet
  • claude-3-opus-20240229 - Most capable Claude 3 model
  • claude-3-opus - Alias for Claude 3 Opus
  • claude-3-sonnet-20240229 - Balanced Claude 3 model
  • claude-3-sonnet - Alias for Claude 3 Sonnet
  • claude-3-haiku-20240307 - Fastest, most affordable Claude 3
  • claude-3-haiku - Alias for Claude 3 Haiku

Google Gemini (Latest as of Nov 2025)

  • gemini-1.5-pro - Most capable Gemini model ⭐
  • gemini-1.5-flash - Faster, more affordable Gemini ⭐
  • gemini-pro - Previous generation Gemini
  • gemini-pro-vision - Previous generation with vision support

⭐ = Recommended models

💾 Caching

The API implements intelligent in-memory caching:

  • Cache Key: Hash of messages + model + provider + temperature + max_tokens
  • TTL: Configurable (default: 1 hour)
  • Max Size: Configurable (default: 1000 entries)
  • Behavior: Identical requests return cached responses instantly

Clear cache manually:

curl -X POST "http://localhost:8000/api/cache/clear"
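
For illustration only, a minimal sketch of how a key-based TTL cache like this can be built with cachetools (function and variable names here are assumptions, not the repo's actual cache.py):

import hashlib
import json

from cachetools import TTLCache

# Bounded cache with per-entry TTL; defaults mirror CACHE_MAX_SIZE and CACHE_TTL_SECONDS.
cache = TTLCache(maxsize=1000, ttl=3600)

def make_cache_key(messages, model, provider, temperature, max_tokens):
    """Hash the request fields that define an identical prompt."""
    payload = json.dumps(
        {
            "messages": messages,
            "model": model,
            "provider": provider,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

key = make_cache_key(
    [{"role": "user", "content": "What is the capital of France?"}],
    "gpt-4o-mini", "openai", 0.7, None,
)
print(cache.get(key))               # None on a miss
cache[key] = {"content": "Paris"}   # store the response for next time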

🏗️ Architecture

Request Flow

  1. Client sends chat request
  2. Cache lookup by request hash
  3. If cached → return immediately
  4. If not → LangChain → LLM Provider
  5. Cache response for future use
  6. Return to client
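
A rough sketch of the flow above as a FastAPI handler (simplified and hypothetical; the actual routes.py and llm_service.py differ in detail):

from typing import Optional

from cachetools import TTLCache
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
cache = TTLCache(maxsize=1000, ttl=3600)

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    messages: list[Message]
    provider: str = "openai"
    model: Optional[str] = None
    temperature: float = 0.7

async def call_llm(req: ChatRequest) -> str:
    # Placeholder for the LangChain call to the selected provider.
    return "stubbed response"

@app.post("/api/v1/chat/completions")
async def chat_completions(req: ChatRequest):
    # Steps 2-3: look up the request hash, return immediately on a hit.
    key = hash((req.provider, req.model, req.temperature,
                tuple((m.role, m.content) for m in req.messages)))
    if key in cache:
        return {"message": {"role": "assistant", "content": cache[key]}, "cached": True}
    # Step 4: otherwise go through LangChain to the provider.
    content = await call_llm(req)
    # Steps 5-6: cache the response and return it to the client.
    cache[key] = content
    return {"message": {"role": "assistant", "content": content}, "cached": False}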

Key Components

  • FastAPI: Modern async web framework
  • LangChain: LLM orchestration and provider abstraction
  • Pydantic: Data validation and settings management
  • cachetools: TTL-based in-memory caching

🔧 Configuration

All configuration is done via environment variables (see the .env example above):

Variable            Description             Default
OPENAI_API_KEY      OpenAI API key          None
ANTHROPIC_API_KEY   Anthropic API key       None
GOOGLE_API_KEY      Google API key          None
DEFAULT_PROVIDER    Default LLM provider    openai
DEFAULT_MODEL       Default model name      gpt-4o-mini
CACHE_MAX_SIZE      Max cached responses    1000
CACHE_TTL_SECONDS   Cache entry lifetime    3600
HOST                Server host             0.0.0.0
PORT                Server port             8000
ENVIRONMENT         Environment mode        development
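
The config.py module presumably maps these variables onto a settings object; a hedged sketch with pydantic-settings (field names follow the table above, but the actual class may differ):

from typing import Optional

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Settings read from the process environment or the .env file."""
    model_config = SettingsConfigDict(env_file=".env")

    openai_api_key: Optional[str] = None
    anthropic_api_key: Optional[str] = None
    google_api_key: Optional[str] = None
    default_provider: str = "openai"
    default_model: str = "gpt-4o-mini"
    cache_max_size: int = 1000
    cache_ttl_seconds: int = 3600
    host: str = "0.0.0.0"
    port: int = 8000
    environment: str = "development"

settings = Settings()
print(settings.default_model)  # "gpt-4o-mini" unless DEFAULT_MODEL is set

Environment variable matching in pydantic-settings is case-insensitive by default, so OPENAI_API_KEY fills openai_api_key.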

🧪 Testing

Check health endpoint:

curl http://localhost:8000/api/health

Expected response:

{
  "status": "healthy",
  "cache_stats": {
    "size": 0,
    "max_size": 1000,
    "ttl_seconds": 3600
  }
}

🚀 Cloud Deployment

This API is ready for cloud deployment on:

  • AWS: ECS, EKS, or EC2 with Docker
  • Google Cloud: Cloud Run, GKE, or Compute Engine
  • Azure: Container Instances, AKS, or App Service
  • DigitalOcean: App Platform or Droplets
  • Railway, Render, Fly.io: Direct Docker deployment

Deployment Checklist

  • Set production environment variables
  • Configure CORS for your domain (see the sketch after this checklist)
  • Set up monitoring and logging
  • Enable HTTPS/TLS
  • Implement rate limiting (if needed)
  • Set appropriate cache sizes
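
For the CORS item, a minimal sketch of restricting the API to a specific origin in FastAPI (main.py already wires up CORS per the feature list; the origin below is a placeholder):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Replace the placeholder origin with your real frontend domain in production.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.example.com"],
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)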

📝 License

MIT License - Feel free to use in your projects!

🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.


Need help? Check the API docs at /docs or open an issue on GitHub.
