Listen, we've all been there. You've got a shiny new LLM running on your laptop, but it's basically a goldfish. It forgets what it did five minutes ago, and it keeps making the same dumb mistakes.
Enter Heidi CLI.
Heidi is a command-center for a Unified Learning Suite. It's not just some fancy wrapper for an API; it's a full-on "Closed-Loop Learning System." Basically, it's a way to turn those generic AI models into specialized, self-improving agents that actually learn from their own successes and failures. It's like a personal trainer, but for your LLMs.
Install Heidi CLI with a single command! The installer will automatically:
- Clone the latest version from GitHub
- Install all dependencies
- Build and install Heidi CLI
- Verify the installation
Linux/macOS:
# Quick install (one command)
curl -fsSL https://raw.githubusercontent.com/heidi-dang/heidi-cli/main/install | bash
# Or download first
wget https://raw.githubusercontent.com/heidi-dang/heidi-cli/main/install
chmod +x install
./install

Windows (PowerShell):
# Download and run
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/heidi-dang/heidi-cli/main/install.ps1" -OutFile "install.ps1"
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
.\install.ps1

# Verify installation
heidi --version
# Quick start
heidi setup
heidi api generate --name "My First Key"
heidi model serve

🎉 That's it! Heidi CLI is now installed and ready to use!
📖 Step-by-Step Guide: docs/how-to-use.md
From your first model download to enterprise deployment, our comprehensive guide covers:
- ✅ Quick Start - Get running in 5 minutes
- ✅ Setup & Configuration - Configure your environment
- ✅ Model Management - Download and manage models
- ✅ HuggingFace Integration - Access 100,000+ models
- ✅ Token & Cost Tracking - Monitor usage and costs
- ✅ Analytics & Monitoring - Performance insights
- ✅ Advanced Features - Power user capabilities
- ✅ Enterprise Deployment - Production setup
- ✅ Troubleshooting - Common issues and solutions
Think of Heidi like a "Perception-Action-Learning" loop. She's got five internal modules that play together like a well-oiled (and slightly sarcastic) machine:
Stop talking to the cloud! Heidi hosts your models right here on your machine (without Ollama and local transformers).
- What it does: Gives you a unified, OpenAI-compatible API (/v1/chat/completions).
- The "Secret Sauce": You can route requests to different models (say, a "stable" one for real work and an "experimental" one for when you're feeling spicy) without ever touching your app code.
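Because the endpoint speaks the OpenAI chat format, a client can be a handful of stdlib Python. The sketch below is illustrative, not Heidi's own client: `build_chat_request` and `send_chat` are hypothetical helpers, and the base URL is simply the address used by the curl examples in this README. Switching models is then just a different `model` string, assuming the server resolves that name.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # address taken from this README's curl examples


def build_chat_request(model: str, user_message: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With `heidi model serve` running, `send_chat(build_chat_request("microsoft_DialoGPT-small", "Hello!"))` would mirror the curl call shown later in this README.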
Heidi doesn't like starting from zero every time.
- What it does: Uses a SQLite database for both short-term "what just happened?" and long-term "wait, I remember this!" memory.
- The "Secret Sauce": Once a task is done, the Reflection Engine kicks in. It scores how well it did (Reward Scoring) and saves "pro-tips" that worked. Next time you ask something similar, Heidi whispers those successful strategies back into the prompt. It's basically cheating, but legal.
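To make the "score it, save it, whisper it back" idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. Everything in it is an assumption for illustration: the `strategies` table, the keyword lookup, and the 0.5 reward cutoff are made up, and Heidi's real schema and retrieval logic may look nothing like this.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) strategies table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS strategies ("
        "task TEXT NOT NULL, tip TEXT NOT NULL, reward REAL NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, task: str, tip: str, reward: float) -> None:
    """Store a tip together with the reward score its run earned."""
    conn.execute("INSERT INTO strategies VALUES (?, ?, ?)", (task, tip, reward))
    conn.commit()


def recall(conn: sqlite3.Connection, keyword: str,
           min_reward: float = 0.5, limit: int = 3) -> list[str]:
    """Return the highest-reward tips whose task text mentions the keyword."""
    rows = conn.execute(
        "SELECT tip FROM strategies WHERE task LIKE ? AND reward >= ? "
        "ORDER BY reward DESC LIMIT ?",
        (f"%{keyword}%", min_reward, limit),
    ).fetchall()
    return [tip for (tip,) in rows]


def build_prompt(conn: sqlite3.Connection, keyword: str, user_request: str) -> str:
    """Prepend recalled tips to the request; pass it through if none match."""
    tips = recall(conn, keyword)
    if not tips:
        return user_request
    bullet_list = "\n".join(f"- {tip}" for tip in tips)
    return f"Strategies that worked before:\n{bullet_list}\n\n{user_request}"
```

The reward threshold is what keeps low-scoring attempts out of future prompts; a real system would likely match on something smarter than a `LIKE` keyword.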
Clean data = happy AI.
- What it does: Grabs every single interaction and stuffs them into dated "Run Folders."
- The "Secret Sauce": A Curation Engine digests these runs, tosses out the garbage, and applies a Secret Redaction Layer. It scrubs your OpenAI keys, deep-rooted secrets, and embarrassing passwords before they ever touch the retraining loop. Privacy is cool, okay?
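A redaction pass of this kind usually boils down to pattern substitution. The sketch below shows the general idea only: the two regexes (an `sk-` style API key and a `password=` assignment) and the `redact` function are assumptions chosen for illustration, not Heidi's actual rule set, which presumably covers far more formats.

```python
import re

# Illustrative patterns only; not the rules Heidi ships with.
SECRET_PATTERNS = [
    # Tokens that look like "sk-" followed by 20+ key characters.
    (re.compile(r"sk-[A-Za-z0-9_-]{20,}"), "[REDACTED_API_KEY]"),
    # "password: value" or "password=value", case-insensitive; keeps the label.
    (re.compile(r"(?i)(password\s*[:=]\s*)\S+"), r"\1[REDACTED]"),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the scrubbed text."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running `redact` over each logged interaction before it is written to an export file keeps the raw secret out of anything a retraining job would read.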
When you've got enough data, it's time to level up.
- What it does: Manages a Model Registry with stable and candidate channels (think of it like "Production" vs. "Beta").
- The "Secret Sauce": After retraining, an Eval Harness checks if the new model is actually better or if it's just hallucinating harder. If it passes, Heidi does an Atomic Hot-Swap—reloading the new model in milliseconds with zero downtime.
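One common way to get an "atomic" switch is to keep a small pointer file naming the stable model and replace it with a single rename. The sketch below assumes that design purely for illustration: the pointer file, `should_promote`, and the score comparison are hypothetical, and this README does not say how Heidi's hot-swap is actually implemented. It covers only the on-disk pointer, not reloading weights in a running server.

```python
import json
import os
import tempfile


def should_promote(candidate_score: float, stable_score: float,
                   min_gain: float = 0.0) -> bool:
    """Promote only when the candidate beats stable by more than min_gain."""
    return candidate_score > stable_score + min_gain


def write_pointer_atomically(pointer_path: str, model_id: str) -> None:
    """Write the new pointer to a temp file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(pointer_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump({"stable": model_id}, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in one step on the same filesystem,
        # so readers see either the old pointer or the new one, never half of each.
        os.replace(tmp_path, pointer_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_pointer(pointer_path: str) -> str:
    """Return the model id currently marked as stable."""
    with open(pointer_path, encoding="utf-8") as handle:
        return json.load(handle)["stable"]
```

The eval gate and the swap stay separate on purpose: a failed evaluation never touches the pointer, so the stable channel keeps serving the old model.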
Discover, download, and manage models from the world's largest AI model hub.
- What it does: Seamlessly integrates with HuggingFace Hub for model discovery and management.
- The "Secret Sauce": Smart auto-configuration based on model metadata, usage analytics, and intelligent recommendations. Turns generic models into specialized tools with zero manual configuration.
| Feature | The Command | What's it for? |
|---|---|---|
| Model Hosting | heidi model serve | Spins up local server. Easy peasy. |
| Agent Memory | heidi memory search | Digging through the agent's brain for that one thing. |
| Reflection | heidi learning reflect | Forces the agent to think about what it just did. |
| Data Export | heidi learning export | Bags up curated/redacted data for retraining. |
| Promotion | heidi learning promote | Moves a "Candidate" model to "Stable" status. |
| System Health | heidi doctor | Makes sure everything isn't on fire. |
| HuggingFace Hub | heidi hf search <query> | Search models on HuggingFace Hub |
| HuggingFace Hub | heidi hf info <model> | Get detailed model information |
| HuggingFace Hub | heidi hf download <model> | Download model and auto-configure |
| HuggingFace Hub | heidi hf list-local | List downloaded models |
| HuggingFace Hub | heidi hf compare <model1> <model2> | Compare models with recommendations |
| HuggingFace Hub | heidi hf batch-download <model1> <model2> | Download multiple models in parallel |
| HuggingFace Hub | heidi hf analytics [model] | View usage analytics and performance |
| HuggingFace Hub | heidi hf remove <model> | Remove downloaded model |
Let's be real, the current AI world is a bit of a mess. Heidi fixes the big headaches:
- Privacy is King: Most learning happens in the cloud. Nope. Heidi keeps your training data, your memory, and your weights 100% on your machine. Your company secrets stay your secrets.
- Stopping the "Stupid Loop": We've all seen agents make the same mistake twice. Heidi's Redaction & Reflection layers make sure the model actually gets better at your specific job, not just weirder.
- MLOps for the rest of us: Usually, you need a team of engineers to build retraining pipelines. Heidi abstracts all that noise into a single CLI tool. Now you can run a professional-grade Model Lab from your bedroom.
- Access to Thousands of Models: Why limit yourself to a few models? Heidi's HuggingFace integration gives you access to thousands of models with smart auto-configuration and usage analytics.
1. Install the bits:
python -m pip install -e '.[dev]'
2. Check the vitals:
# This makes sure your state/ directories and docs are alive
heidi doctor
3. Fire it up:
# Start the model host and wait for the "Serving" message
heidi model serve
4. Check your status:
heidi status
5. Discover and Download Models:
# Search for models on HuggingFace Hub
heidi hf search "mistral" --limit 5
# Get detailed information about a model
heidi hf info "microsoft/DialoGPT-small"
# Download and auto-configure a model
heidi hf download "microsoft/DialoGPT-small" --add-to-config
# Compare multiple models
heidi hf compare "microsoft/DialoGPT-small" "mistralai/Mistral-7B-Instruct-v0.2"
# Batch download multiple models
heidi hf batch-download "model1" "model2" "model3"
# View usage analytics
heidi hf analytics
# List your downloaded models
heidi hf list-local

Model Discovery:
# Find the perfect model for your needs
heidi hf search "coding" --limit 10
# Compare models side-by-side
heidi hf compare "codellama/CodeLlama-7b-Instruct-hf" "WizardLM/WizardCoder-15B-V1.0"
# Get detailed model information
heidi hf info "meta-llama/Llama-2-7b-chat-hf"

Smart Downloads:
# Download with automatic configuration
heidi hf download "microsoft/DialoGPT-small" --add-to-config
# Batch download for efficiency
heidi hf batch-download "microsoft/DialoGPT-small" "meta-llama/Llama-3.2-1B-Instruct"
# Models are stored in ~/.heidi/models/huggingface/
# Automatically configured for immediate use

Usage Analytics:
# Track model performance and usage patterns
heidi hf analytics
# Get detailed analytics for a specific model
heidi hf analytics "microsoft_DialoGPT-small" --days 7
# Export analytics data for analysis
heidi hf analytics --export

Production Deployment:
# Start serving multiple models
heidi model serve
# Models appear in OpenAI-compatible API
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "microsoft_DialoGPT-small", "messages": [{"role": "user", "content": "Hello!"}]}'
# Rich metadata in model listings
curl http://127.0.0.1:8000/v1/models | jq '.data[] | {id, display_name, capabilities, huggingface}'

heidi-cli/
├── src/heidi_cli/
│   ├── model_host/          # Multi-model API server
│   │   ├── server.py        # FastAPI server with OpenAI endpoints
│   │   ├── manager.py       # Model routing and request handling
│   │   └── metadata.py      # Rich model metadata management
│   ├── runtime/             # Learning & memory system
│   │   ├── db.py            # SQLite database management
│   │   ├── reflection.py    # Performance analysis
│   │   └── curation.py      # Data cleaning and redaction
│   ├── pipeline/            # Data export pipeline
│   │   └── curation.py      # Smart data curation
│   ├── registry/            # Model versioning and hot-swap
│   │   ├── manager.py       # Model registry management
│   │   ├── eval.py          # Model evaluation
│   │   └── hotswap.py       # Zero-downtime model swapping
│   ├── integrations/        # External integrations
│   │   ├── huggingface.py   # HuggingFace Hub integration
│   │   └── analytics.py     # Usage analytics system
│   └── cli.py               # Unified command interface
└── state/                   # Local data storage
    ├── models/              # Downloaded models
    ├── registry/            # Model versions
    ├── memory/              # Agent memory
    ├── logs/                # System logs
    └── analytics/           # Usage analytics database
Model Discovery:
- Search thousands of models on HuggingFace Hub
- Filter by task, size, capabilities, and popularity
- Get detailed model information and metadata
- Compare models side-by-side with intelligent recommendations
Smart Configuration:
- Automatic model configuration based on metadata
- Capability detection (chat, coding, vision, embeddings)
- Device requirements based on model size
- Context length and token optimization
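For a feel of how "device requirements based on model size" can be estimated, the arithmetic is roughly parameters times bytes per parameter. The sketch below is a back-of-the-envelope calculation, not Heidi's logic: `weight_memory_gb`, `pick_device`, and the 1.2 headroom factor are assumptions for illustration, and it counts weights only (no activations or KV cache).

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, precision: str = "float16") -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def pick_device(num_params: float, precision: str,
                gpu_memory_gb: float, headroom: float = 1.2) -> str:
    """Return "cuda" if weights plus headroom fit in GPU memory, else "cpu"."""
    needed = weight_memory_gb(num_params, precision) * headroom
    return "cuda" if needed <= gpu_memory_gb else "cpu"
```

So a 7-billion-parameter model at float16 needs about 14 GB for weights, which is why the same model that fits a 24 GB GPU at half precision falls back to CPU at float32 in this sketch.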
Usage Analytics:
- Real-time request tracking and performance metrics
- Latency analysis (P95, P99, throughput)
- Error monitoring and success rates
- Token efficiency calculations
- Export functionality for external analysis
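If P95 and P99 are unfamiliar, they are just "the latency that 95% (or 99%) of requests came in under." Here is a small sketch of how such a summary can be computed with the nearest-rank method; the function names and the shape of the returned dict are made up for illustration and are not taken from Heidi's analytics module.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms: list[float], errors: int, window_s: float) -> dict:
    """Summarize one time window: successful latencies plus an error count."""
    total = len(latencies_ms) + errors
    return {
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "success_rate": len(latencies_ms) / total,
        "throughput_rps": total / window_s,
    }
```

Tail percentiles matter more than the average here: one slow request in twenty is invisible in a mean but obvious in P95.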
Batch Operations:
- Parallel model downloads (up to 3 concurrent)
- Progress tracking and error handling
- Automatic configuration for multiple models
- Comprehensive summary reports
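The "up to 3 concurrent" behaviour can be pictured as a bounded worker pool. This is a sketch of the pattern, not Heidi's downloader: `batch_download` is hypothetical, and `download_one` stands in for whatever actually fetches a model, so no real Hub call is made here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

MAX_CONCURRENT = 3  # the concurrency limit stated in the list above


def batch_download(model_ids: list[str],
                   download_one: Callable[[str], str]) -> dict:
    """Run download_one for each model with at most MAX_CONCURRENT in flight.

    Returns a summary with the result for each success and the error
    message for each failure.
    """
    summary: dict = {"ok": {}, "failed": {}}
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        futures = {pool.submit(download_one, mid): mid for mid in model_ids}
        for future in as_completed(futures):
            model_id = futures[future]
            try:
                summary["ok"][model_id] = future.result()
            except Exception as exc:  # one bad model should not abort the batch
                summary["failed"][model_id] = str(exc)
    return summary
```

Collecting failures instead of raising on the first one is what makes a summary report possible at the end of the batch.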
API Integration:
- OpenAI-compatible endpoints (/v1/models, /v1/chat/completions)
- Rich metadata in model listings
- Seamless integration with existing tools
- Automatic capability detection and routing
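Here is one way a client might use that metadata to pick a model by capability. Treat the data as a guess: only the field names (`id`, `display_name`, `capabilities`, `huggingface`) come from the jq filter elsewhere in this README, while the sample values, the second model, the `repo` key, and the idea that `capabilities` is a list of strings are all assumptions.

```python
import json

# Hypothetical /v1/models response body, for illustration only.
SAMPLE_LISTING = json.loads("""
{"data": [
  {"id": "microsoft_DialoGPT-small", "display_name": "DialoGPT Small",
   "capabilities": ["chat"], "huggingface": {"repo": "microsoft/DialoGPT-small"}},
  {"id": "example_embedder", "display_name": "Example Embedder",
   "capabilities": ["embeddings"], "huggingface": {"repo": "example/embedder"}}
]}
""")


def models_with(listing: dict, capability: str) -> list[str]:
    """Return the ids of all listed models that advertise the capability."""
    return [
        model["id"]
        for model in listing.get("data", [])
        if capability in model.get("capabilities", [])
    ]
```

Against a live server, the same function would take the parsed JSON from a GET on /v1/models instead of the sample.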
Developers:
- Test different models for specific tasks
- Compare performance across model families
- Batch download model collections
- Track usage patterns and optimize costs
Enterprises:
- Manage model fleets with analytics
- Enforce model governance and compliance
- Optimize resource allocation
- Track ROI on model investments
Researchers:
- Access thousands of models for experimentation
- Compare model architectures and capabilities
- Track performance metrics across models
- Export data for academic analysis
Model Storage:
# Models are stored in ~/.heidi/models/huggingface/
# Each model has its own directory with metadata
# HuggingFace cache structure for efficient storage

Analytics Database:
# SQLite database at ~/.heidi/analytics/usage.db
# Tracks requests, performance, errors, and trends
# Thread-safe for concurrent access
# Exportable for external analysis

Configuration:
# Heidi config at ~/.heidi/config/suite.json
# Automatic model configuration
# Capability detection and optimization
# Device and precision settings

- Privacy First: Your data, your models, your rules. Everything stays local.
- Smart Automation: From model discovery to performance tracking, it just works.
- Professional Grade: Enterprise-ready features with monitoring and analytics.
- Cost Effective: No cloud fees, no subscription traps.
- Open Source: Full transparency and extensibility.
- HuggingFace Powered: Access to thousands of models with one command.
- Complete Documentation: Step-by-step guide for all users.
Heidi is written by humans (mostly) to help machines act more like humans (the smart ones).
# 1. Install
pip install heidi-cli
# 2. Setup
heidi setup
# 3. Discover Models
heidi hf search "text-generation" --limit 5
# 4. Download Your Favorite
heidi hf download "microsoft/DialoGPT-small" --add-to-config
# 5. Start Serving
heidi model serve
# 6. Use It!
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "microsoft_DialoGPT-small", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it! You're now running a professional-grade AI model hosting platform with HuggingFace integration.