# Local Models Guide

**Status**: ✅ Complete
**Last Updated**: December 3, 2025

---

## Overview

Local models allow you to run AI models directly on your machine using Ollama, enabling offline development, enhanced privacy, and reduced costs. This guide covers everything you need to know about using local models with RiceCoder.

## What is Ollama?

Ollama is an open-source tool that makes it easy to run large language models locally. Instead of sending your code and conversations to cloud providers, Ollama runs models on your own hardware, giving you complete control over your data.

### Benefits of Local Models

**Privacy**: Your code and conversations never leave your machine. No data is sent to external servers.

**Offline Development**: Work without internet connectivity. Perfect for airplanes, trains, or areas with poor connectivity.

**Cost Savings**: No API fees. Run models as much as you like after the initial download.

**Customization**: Fine-tune models for your specific use cases and domains.

**Speed**: Instant responses without network latency (depending on your hardware).

**Control**: Complete control over model versions and behavior.
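How much hardware a model needs depends mostly on its parameter count and quantization. As a rough rule of thumb (a sketch, not an official Ollama figure): each parameter takes `bits / 8` bytes, plus runtime overhead for the KV cache and buffers. The helper name and the 1.2× overhead multiplier below are assumptions for illustration:

```python
def estimate_model_ram_gb(params_billion: float,
                          bits_per_weight: int = 4,
                          overhead: float = 1.2) -> float:
    """Rough RAM estimate for a quantized model.

    params_billion:  model size in billions of parameters (e.g. 7 for Mistral 7B)
    bits_per_weight: quantization level (Ollama models are commonly 4-bit)
    overhead:        multiplier for KV cache and runtime buffers (assumed here)
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 4-bit 7B model needs roughly 4 GB of RAM, which lines up with the
# ~4 GB download size quoted for Mistral later in this guide.
print(round(estimate_model_ram_gb(7), 1))  # 4.2
```

This is why 7B models are the sweet spot for 8GB machines, while 30B+ models need workstation-class memory.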
### When to Use Local Models

**Use local models when**:
- You need privacy and data security
- You work offline frequently
- You want to reduce costs
- You're developing locally and don't need cloud features
- You want to experiment with different models

**Use cloud providers when**:
- You need the latest, most powerful models
- You want to minimize hardware requirements
- You need guaranteed uptime and support
- You're working in a team with shared infrastructure

## Installation

### Prerequisites

Before installing Ollama, ensure your system meets the requirements:

**Hardware Requirements**:
- **Minimum**: 4GB RAM (8GB recommended)
- **GPU** (optional but recommended): NVIDIA, AMD, or Apple Silicon for faster inference
- **Disk Space**: 10-50GB depending on the models you want to run

**System Requirements**:
- Windows 10/11, macOS 11+, or Linux
- Administrator/sudo access for installation

### Windows Installation

#### Step 1: Download Ollama

1. Visit https://ollama.ai
2. Click "Download for Windows"
3. Run the installer (`OllamaSetup.exe`)

#### Step 2: Install

1. Follow the installation wizard
2. Accept the license agreement
3. Choose the installation location (default: `C:\Users\{username}\AppData\Local\Programs\Ollama`)
4. Click "Install"

#### Step 3: Verify Installation

Open PowerShell and run:

```powershell
ollama --version
```

You should see the version number.

#### Step 4: Start Ollama

Ollama runs as a background service on Windows and starts automatically after installation. To verify it's running:

```powershell
curl http://localhost:11434/api/tags
```

You should get a JSON response.

### macOS Installation

#### Step 1: Download Ollama

1. Visit https://ollama.ai
2. Click "Download for macOS"
3. Choose the appropriate version:
   - **Apple Silicon** (M1/M2/M3): `ollama-darwin-arm64.zip`
   - **Intel**: `ollama-darwin-amd64.zip`

#### Step 2: Install

1. Open the downloaded `.dmg` file
2. Drag Ollama to the Applications folder
3. Wait for the copy to complete

#### Step 3: Verify Installation

Open Terminal and run:

```bash
ollama --version
```

You should see the version number.

#### Step 4: Start Ollama

The macOS app keeps Ollama running in the background. To start the server manually from Terminal:

```bash
ollama serve
```

Or use Spotlight to launch Ollama from Applications.

### Linux Installation

#### Step 1: Download and Install

Open a terminal and run:

```bash
curl https://ollama.ai/install.sh | sh
```

This script will:
- Download Ollama
- Install it to `/usr/local/bin/ollama`
- Set up a systemd service

#### Step 2: Verify Installation

```bash
ollama --version
```

You should see the version number.

#### Step 3: Start Ollama

Start the Ollama service:

```bash
sudo systemctl start ollama
```

Enable it to start on boot:

```bash
sudo systemctl enable ollama
```

Verify it's running:

```bash
curl http://localhost:11434/api/tags
```

You should get a JSON response.

## Model Management

### Pulling Models

Before you can use a model, you need to pull it from the Ollama library.

#### Pull a Model

```bash
ollama pull mistral
```

This downloads the Mistral model (about 4GB). The first pull takes time depending on your internet speed.
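To avoid re-downloading, you can first check whether a model is already present. The `/api/tags` endpoint used in the verification steps above returns a JSON object with a `models` array; this sketch parses such a response offline (the helper name is hypothetical and the sample data is illustrative, not real output):

```python
import json

def model_is_pulled(tags_response: dict, model: str) -> bool:
    """Check a parsed /api/tags response for a model, ignoring the tag suffix."""
    installed = {m["name"].split(":")[0] for m in tags_response.get("models", [])}
    return model.split(":")[0] in installed

# Illustrative response shaped like /api/tags output:
sample = json.loads('{"models": [{"name": "mistral:latest"}, {"name": "llama2:latest"}]}')
print(model_is_pulled(sample, "mistral"))      # True
print(model_is_pulled(sample, "neural-chat"))  # False
```

In a script you would fetch the real response with `curl http://localhost:11434/api/tags` (or `urllib`) and pull only the models that are missing.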
#### Available Models

Popular models available on Ollama:

**Lightweight Models** (fast, low memory):
- `mistral` - 7B parameters, fast and efficient
- `orca-mini` - 3B parameters, very lightweight

**Balanced Models** (good quality, moderate resources):
- `llama2` - 7B parameters, general purpose
- `neural-chat` - 7B parameters, optimized for chat
- `dolphin-mixtral` - 8x7B mixture of experts

**Powerful Models** (high quality, high resource use):
- `mistral-large` - 34B parameters, very capable
- `llama2-uncensored` - 7B parameters, less restricted
- `neural-chat-7b-v3` - 7B parameters, latest version

**Specialized Models**:
- `codeup` - Optimized for code generation
- `phind-codellama` - Code-focused model
- `sqlcoder` - SQL generation

For a complete list, visit https://ollama.ai/library

#### Pull Multiple Models

```bash
ollama pull mistral
ollama pull llama2
ollama pull neural-chat
```

### Listing Models

View all models you've downloaded:

```bash
ollama list
```

Example output:

```text
NAME                 ID            SIZE    MODIFIED
mistral:latest       2ae6f6dd4470  4.1 GB  2 minutes ago
llama2:latest        78e26419b446  3.8 GB  1 hour ago
neural-chat:latest   42ab0d3f4b51  4.1 GB  3 hours ago
```

### Removing Models

Delete a model to free up disk space:

```bash
ollama rm mistral
```

Remove several models at once by listing them:

```bash
ollama rm mistral llama2
```

### Updating Models

Pull the latest version of a model:

```bash
ollama pull mistral
```

Pulling an already-installed model downloads the latest version if one is available.
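The sizes shown by `ollama list` can also be totalled programmatically: each entry in the `/api/tags` response carries a `size` field in bytes. A small sketch for deciding what to remove (the helper name is hypothetical and the sample sizes are illustrative):

```python
def total_model_disk_gb(tags_response: dict) -> float:
    """Sum the per-model 'size' fields (bytes) from /api/tags and convert to GB."""
    total_bytes = sum(m.get("size", 0) for m in tags_response.get("models", []))
    return total_bytes / 1e9

sample = {"models": [
    {"name": "mistral:latest", "size": 4_100_000_000},
    {"name": "llama2:latest", "size": 3_800_000_000},
]}
print(f"{total_model_disk_gb(sample):.1f} GB")  # 7.9 GB
```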
## Configuring RiceCoder for Ollama

### Step 1: Ensure Ollama is Running

First, make sure Ollama is running on your system:

```bash
curl http://localhost:11434/api/tags
```

This works in PowerShell on Windows as well as in macOS/Linux terminals. If you get a connection error, start Ollama:

```bash
# macOS
ollama serve

# Linux
sudo systemctl start ollama

# Windows - Ollama runs as a service automatically
```

### Step 2: Configure RiceCoder

Edit your RiceCoder configuration file:

**Global Configuration** (`~/.ricecoder/config.yaml`):

```yaml
provider: ollama
ollama-url: http://localhost:11434
model: mistral
```

**Project Configuration** (`.agent/config.yaml`):

```yaml
provider: ollama
model: mistral
```

### Step 3: Verify Configuration

Check that RiceCoder can connect to Ollama:

```bash
rice config show
```

You should see:

```text
provider: ollama
ollama-url: http://localhost:11434
model: mistral
```

### Step 4: Test the Connection

Start a chat session:

```bash
rice chat
```

Type a message and press Enter. If everything is configured correctly, you should get a response from the local model.

### Configuration Options

#### `provider`

Set to `ollama` to use local models:

```yaml
provider: ollama
```

#### `ollama-url`

The URL where Ollama is running (default: `http://localhost:11434`):

```yaml
ollama-url: http://localhost:11434
```

If Ollama is on a different machine:

```yaml
ollama-url: http://192.168.1.100:11434
```

#### `model`

Which model to use:

```yaml
model: mistral
```

Other options:

```yaml
model: llama2
model: neural-chat
model: codeup
```

#### `ollama-timeout`

Timeout for Ollama requests in seconds (default: 300):

```yaml
ollama-timeout: 600  # 10 minutes for complex tasks
```

## Usage Examples

### Basic Chat

Start a chat session with your local model:

```bash
rice chat
```

Then type your questions:

```text
> What is Rust?
```

The model will respond with information about Rust.
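Under the hood, a request to Ollama is just an HTTP POST. If you want to test the model independently of RiceCoder, you can call Ollama's `/api/generate` endpoint directly; the `model`, `prompt`, and `stream` fields and the `response` field below are from Ollama's API, while the helper names are hypothetical. This sketch only builds and prints the payload; actually sending it requires a running Ollama instance:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def build_generate_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for a single JSON reply
    # instead of a stream of partial ones.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str, timeout: int = 300) -> str:
    """POST to /api/generate and return the 'response' text.
    Requires Ollama to be running locally."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

payload = build_generate_request("mistral", "What is Rust?")
print(json.dumps(payload))
# With Ollama running: print(generate("mistral", "What is Rust?"))
```

Note the `timeout` parameter plays the same role as the `ollama-timeout` setting above: pass a larger value for long generations.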
### Code Generation

Generate code using your local model:

```bash
rice chat
> Generate a Rust function that validates email addresses
```

The model will generate code based on your request.

### Code Review

Ask the model to review code:

```bash
rice chat
> Review this function for potential issues:
>
> fn process_data(data: &str) -> String {
>     data.to_uppercase()
> }
```

### Spec-Driven Development

Use local models with specs:

```bash
rice spec create my-feature
rice spec design my-feature
rice gen my-feature
```

Generation will use your local model instead of cloud providers.

## Performance Optimization

### Hardware Considerations

**CPU-Only Performance**:
- Mistral 7B: ~5-10 tokens/second
- Llama2 7B: ~3-8 tokens/second
- Smaller models: ~10-20 tokens/second

**GPU Performance** (NVIDIA):
- Mistral 7B: ~50-100 tokens/second
- Llama2 7B: ~40-80 tokens/second
- Larger models: ~20-50 tokens/second

### Optimization Tips

#### 1. Use Smaller Models for Speed

If speed is critical, use a smaller model:

```yaml
model: orca-mini  # 3B, very fast
```

#### 2. Increase Timeout for Complex Tasks

For complex code generation, increase the timeout:

```yaml
ollama-timeout: 900  # 15 minutes
```

#### 3. Use GPU Acceleration

If you have an NVIDIA GPU, Ollama uses it automatically. Verify with:

```bash
ollama ps
```

The `PROCESSOR` column shows whether a running model is loaded on the GPU or the CPU.

#### 4. Adjust Model Parameters

For faster responses, use fewer tokens:

```bash
rice chat --max-tokens 500
```

#### 5. Run Ollama on a Dedicated Machine

For team environments, run Ollama on a dedicated server:

```yaml
ollama-url: http://ollama-server.local:11434
```

### Memory Management

**Monitor Memory Usage**:

```bash
# macOS/Linux
top  # Look for the ollama process

# Windows
tasklist | findstr ollama
```

**Reduce Memory Usage**:
1. Use smaller models
2. Reduce the context window
3. Run only one model at a time

**Free Up Memory**:

```bash
ollama rm model-name
```

## Troubleshooting

### "Connection refused" Error

**Problem**: RiceCoder can't connect to Ollama

**Solution**:
1. Check if Ollama is running:
   ```bash
   curl http://localhost:11434/api/tags
   ```
2. If not running, start Ollama:
   ```bash
   # macOS
   ollama serve

   # Linux
   sudo systemctl start ollama

   # Windows - restart the Ollama service
   ```
3. Check the URL in your configuration:
   ```bash
   rice config get ollama-url
   ```
4. If using a different URL, update it:
   ```bash
   rice config set ollama-url http://your-ollama-url:11434
   ```

### "Model not found" Error

**Problem**: The specified model doesn't exist

**Solution**:
1. Check available models:
   ```bash
   ollama list
   ```
2. Pull the model:
   ```bash
   ollama pull mistral
   ```
3. Update the configuration:
   ```bash
   rice config set model mistral
   ```

### "Out of memory" Error

**Problem**: Ollama runs out of memory

**Solution**:
1. Use a smaller model:
   ```bash
   ollama pull orca-mini
   rice config set model orca-mini
   ```
2. Increase system memory or swap
3. Close other applications
4. Reduce the context window size

### "Timeout" Error

**Problem**: Ollama takes too long to respond

**Solution**:
1. Increase the timeout:
   ```yaml
   ollama-timeout: 900  # 15 minutes
   ```
2. Use a faster model:
   ```bash
   ollama pull mistral
   rice config set model mistral
   ```
3. Check system resources (CPU, memory, disk)
4. Reduce model size or task complexity

### Slow Responses

**Problem**: Model responses are very slow

**Solution**:
1. Check whether the GPU is being used:
   ```bash
   ollama ps
   ```
2. If CPU-only, consider:
   - Using a smaller model
   - Getting a GPU
   - Running Ollama on a more powerful machine
3. Monitor system resources:
   ```bash
   # macOS/Linux
   top

   # Windows
   tasklist
   ```
4. Close other applications to free resources

### Model Download Fails

**Problem**: Can't download a model

**Solution**:
1. Check your internet connection:
   ```bash
   ping ollama.ai
   ```
2. Try again (it may be a temporary network issue):
   ```bash
   ollama pull mistral
   ```
3. Check disk space:
   ```bash
   # macOS/Linux
   df -h

   # Windows
   dir C:\
   ```
4. Try a different model:
   ```bash
   ollama pull llama2
   ```

### Ollama Won't Start

**Problem**: The Ollama service won't start

**Solution**:
1. Check if port 11434 is in use:
   ```bash
   # macOS/Linux
   lsof -i :11434

   # Windows
   netstat -ano | findstr :11434
   ```
2. If the port is in use, kill the process or change the port
3. Restart Ollama:
   ```bash
   # macOS
   killall ollama
   ollama serve

   # Linux
   sudo systemctl restart ollama

   # Windows - restart the Ollama service from Services
   ```
4. Check the logs for errors:
   ```bash
   # macOS/Linux
   tail -f ~/.ollama/logs/server.log

   # Windows - check Event Viewer
   ```

## Best Practices

### 1. Start with Mistral

Mistral offers a good balance of speed and quality. Start here:

```yaml
model: mistral
```

### 2. Use Project Configuration

Store model preferences in `.agent/config.yaml`:

```yaml
provider: ollama
model: mistral
```

### 3. Monitor Resource Usage

Keep an eye on CPU, memory, and disk:

```bash
# macOS/Linux
top

# Windows
tasklist
```

### 4. Keep Models Updated

Periodically pull the latest versions:

```bash
ollama pull mistral
```

### 5. Document Model Choices

Explain why you chose a specific model:

```yaml
# .agent/config.yaml
# Using Mistral for fast local development
# Switch to OpenAI for production
provider: ollama
model: mistral
```

### 6. Test Before Committing

Always test generated code before committing:

```bash
rice chat
> Generate a function to validate emails
# Review the generated code
# Test it locally
# Then commit
```

### 7. Use Appropriate Models for Each Task

- **Code generation**: `codeup`, `phind-codellama`
- **General chat**: `mistral`, `neural-chat`
- **Lightweight**: `orca-mini`
- **Powerful**: `mistral-large`, `llama2-uncensored`

## See Also

- [Configuration Guide](./Configuration.md) - All configuration options
- [AI Providers Guide](./AI-Providers.md) - Comparing providers
- [Quick Start Guide](./Quick-Start.md) - Get started quickly
- [Troubleshooting Guide](./Troubleshooting.md) - Common issues and solutions
- [Ollama Official Site](https://ollama.ai) - Official Ollama documentation

---

*Last updated: December 3, 2025*