# Local Models Guide

**Status**: ✅ Complete
**Last Updated**: December 3, 2025

---

## Overview

Local models allow you to run AI models directly on your machine using Ollama, enabling offline development, enhanced privacy, and reduced costs. This guide covers everything you need to know about using local models with RiceCoder.

## What is Ollama?

Ollama is an open-source tool that makes it easy to run large language models locally. Instead of sending your code and conversations to cloud providers, Ollama runs models on your own hardware, giving you complete control over your data.

### Benefits of Local Models

**Privacy**: Your code and conversations never leave your machine. No data is sent to external servers.

**Offline Development**: Work without internet connectivity. Perfect for airplanes, trains, or areas with poor connectivity.

**Cost Savings**: No API fees. Run models as much as you like after the initial download.

**Customization**: Fine-tune models for your specific use cases and domains.

**Speed**: Instant responses without network latency (depending on your hardware).

**Control**: Complete control over model versions and behavior.
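How much hardware a model needs depends mostly on its parameter count and quantization. As a rough rule of thumb (a sketch, not an official Ollama figure): each parameter takes `bits / 8` bytes, plus runtime overhead for the KV cache and buffers. The helper name and the 1.2× overhead multiplier below are assumptions for illustration:

```python
def estimate_model_ram_gb(params_billion: float,
                          bits_per_weight: int = 4,
                          overhead: float = 1.2) -> float:
    """Rough RAM estimate for a quantized model.

    params_billion:  model size in billions of parameters (e.g. 7 for Mistral 7B)
    bits_per_weight: quantization level (Ollama models are commonly 4-bit)
    overhead:        multiplier for KV cache and runtime buffers (assumed here)
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 4-bit 7B model needs roughly 4 GB of RAM, which lines up with the
# ~4 GB download size quoted for Mistral later in this guide.
print(round(estimate_model_ram_gb(7), 1))  # 4.2
```

This is why 7B models are the sweet spot for 8GB machines, while 30B+ models need workstation-class memory.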
### When to Use Local Models

**Use local models when**:
- You need privacy and data security
- You work offline frequently
- You want to reduce costs
- You're developing locally and don't need cloud features
- You want to experiment with different models

**Use cloud providers when**:
- You need the latest, most powerful models
- You want to minimize hardware requirements
- You need guaranteed uptime and support
- You're working in a team with shared infrastructure

## Installation

### Prerequisites

Before installing Ollama, ensure your system meets the requirements:

**Hardware Requirements**:
- **Minimum**: 4GB RAM (8GB recommended)
- **GPU** (optional but recommended): NVIDIA, AMD, or Apple Silicon for faster inference
- **Disk Space**: 10-50GB depending on the models you want to run

**System Requirements**:
- Windows 10/11, macOS 11+, or Linux
- Administrator/sudo access for installation

### Windows Installation

#### Step 1: Download Ollama

1. Visit https://ollama.ai
2. Click "Download for Windows"
3. Run the installer (`OllamaSetup.exe`)

#### Step 2: Install

1. Follow the installation wizard
2. Accept the license agreement
3. Choose the installation location (default: `C:\Users\{username}\AppData\Local\Programs\Ollama`)
4. Click "Install"

#### Step 3: Verify Installation

Open PowerShell and run:

```powershell
ollama --version
```

You should see the version number.

#### Step 4: Start Ollama

Ollama runs as a background service on Windows and starts automatically after installation. To verify it's running:

```powershell
curl http://localhost:11434/api/tags
```

You should get a JSON response.

### macOS Installation

#### Step 1: Download Ollama

1. Visit https://ollama.ai
2. Click "Download for macOS"
3. Choose the appropriate version:
   - **Apple Silicon** (M1/M2/M3): `ollama-darwin-arm64.zip`
   - **Intel**: `ollama-darwin-amd64.zip`

#### Step 2: Install

1. Open the downloaded `.dmg` file
2. Drag Ollama to the Applications folder
3. Wait for the copy to complete

#### Step 3: Verify Installation

Open Terminal and run:

```bash
ollama --version
```

You should see the version number.

#### Step 4: Start Ollama

The macOS app keeps Ollama running in the background. To start the server manually from Terminal:

```bash
ollama serve
```

Or use Spotlight to launch Ollama from Applications.

### Linux Installation

#### Step 1: Download and Install

Open a terminal and run:

```bash
curl https://ollama.ai/install.sh | sh
```

This script will:
- Download Ollama
- Install it to `/usr/local/bin/ollama`
- Set up a systemd service

#### Step 2: Verify Installation

```bash
ollama --version
```

You should see the version number.

#### Step 3: Start Ollama

Start the Ollama service:

```bash
sudo systemctl start ollama
```

Enable it to start on boot:

```bash
sudo systemctl enable ollama
```

Verify it's running:

```bash
curl http://localhost:11434/api/tags
```

You should get a JSON response.

## Model Management

### Pulling Models

Before you can use a model, you need to pull it from the Ollama library.

#### Pull a Model

```bash
ollama pull mistral
```

This downloads the Mistral model (about 4GB). The first pull takes time depending on your internet speed.
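To avoid re-downloading, you can first check whether a model is already present. The `/api/tags` endpoint used in the verification steps above returns a JSON object with a `models` array; this sketch parses such a response offline (the helper name is hypothetical and the sample data is illustrative, not real output):

```python
import json

def model_is_pulled(tags_response: dict, model: str) -> bool:
    """Check a parsed /api/tags response for a model, ignoring the tag suffix."""
    installed = {m["name"].split(":")[0] for m in tags_response.get("models", [])}
    return model.split(":")[0] in installed

# Illustrative response shaped like /api/tags output:
sample = json.loads('{"models": [{"name": "mistral:latest"}, {"name": "llama2:latest"}]}')
print(model_is_pulled(sample, "mistral"))      # True
print(model_is_pulled(sample, "neural-chat"))  # False
```

In a script you would fetch the real response with `curl http://localhost:11434/api/tags` (or `urllib`) and pull only the models that are missing.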
#### Available Models

Popular models available on Ollama:

**Lightweight Models** (fast, low memory):
- `mistral` - 7B parameters, fast and efficient
- `orca-mini` - 3B parameters, very lightweight

**Balanced Models** (good quality, moderate resources):
- `llama2` - 7B parameters, general purpose
- `neural-chat` - 7B parameters, optimized for chat
- `dolphin-mixtral` - 8x7B mixture of experts

**Powerful Models** (high quality, high resource use):
- `mistral-large` - 34B parameters, very capable
- `llama2-uncensored` - 7B parameters, less restricted
- `neural-chat-7b-v3` - 7B parameters, latest version

**Specialized Models**:
- `codeup` - Optimized for code generation
- `phind-codellama` - Code-focused model
- `sqlcoder` - SQL generation

For a complete list, visit https://ollama.ai/library

#### Pull Multiple Models

```bash
ollama pull mistral
ollama pull llama2
ollama pull neural-chat
```

### Listing Models

View all models you've downloaded:

```bash
ollama list
```

Example output:

```text
NAME                 ID            SIZE    MODIFIED
mistral:latest       2ae6f6dd4470  4.1 GB  2 minutes ago
llama2:latest        78e26419b446  3.8 GB  1 hour ago
neural-chat:latest   42ab0d3f4b51  4.1 GB  3 hours ago
```

### Removing Models

Delete a model to free up disk space:

```bash
ollama rm mistral
```

Remove several models at once by listing them:

```bash
ollama rm mistral llama2
```

### Updating Models

Pull the latest version of a model:

```bash
ollama pull mistral
```

Pulling an already-installed model downloads the latest version if one is available.
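The sizes shown by `ollama list` can also be totalled programmatically: each entry in the `/api/tags` response carries a `size` field in bytes. A small sketch for deciding what to remove (the helper name is hypothetical and the sample sizes are illustrative):

```python
def total_model_disk_gb(tags_response: dict) -> float:
    """Sum the per-model 'size' fields (bytes) from /api/tags and convert to GB."""
    total_bytes = sum(m.get("size", 0) for m in tags_response.get("models", []))
    return total_bytes / 1e9

sample = {"models": [
    {"name": "mistral:latest", "size": 4_100_000_000},
    {"name": "llama2:latest", "size": 3_800_000_000},
]}
print(f"{total_model_disk_gb(sample):.1f} GB")  # 7.9 GB
```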
## Configuring RiceCoder for Ollama

### Step 1: Ensure Ollama is Running

First, make sure Ollama is running on your system:

```bash
curl http://localhost:11434/api/tags
```

This works in PowerShell on Windows as well as in macOS/Linux terminals. If you get a connection error, start Ollama:

```bash
# macOS
ollama serve

# Linux
sudo systemctl start ollama

# Windows - Ollama runs as a service automatically
```

### Step 2: Configure RiceCoder

Edit your RiceCoder configuration file:

**Global Configuration** (`~/.ricecoder/config.yaml`):

```yaml
provider: ollama
ollama-url: http://localhost:11434
model: mistral
```

**Project Configuration** (`.agent/config.yaml`):

```yaml
provider: ollama
model: mistral
```

### Step 3: Verify Configuration

Check that RiceCoder can connect to Ollama:

```bash
rice config show
```

You should see:

```text
provider: ollama
ollama-url: http://localhost:11434
model: mistral
```

### Step 4: Test the Connection

Start a chat session:

```bash
rice chat
```

Type a message and press Enter. If everything is configured correctly, you should get a response from the local model.

### Configuration Options

#### `provider`

Set to `ollama` to use local models:

```yaml
provider: ollama
```

#### `ollama-url`

The URL where Ollama is running (default: `http://localhost:11434`):

```yaml
ollama-url: http://localhost:11434
```

If Ollama is on a different machine:

```yaml
ollama-url: http://192.168.1.100:11434
```

#### `model`

Which model to use:

```yaml
model: mistral
```

Other options:

```yaml
model: llama2
model: neural-chat
model: codeup
```

#### `ollama-timeout`

Timeout for Ollama requests in seconds (default: 300):

```yaml
ollama-timeout: 600  # 10 minutes for complex tasks
```

## Usage Examples

### Basic Chat

Start a chat session with your local model:

```bash
rice chat
```

Then type your questions:

```text
> What is Rust?
```

The model will respond with information about Rust.
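Under the hood, a request to Ollama is just an HTTP POST. If you want to test the model independently of RiceCoder, you can call Ollama's `/api/generate` endpoint directly; the `model`, `prompt`, and `stream` fields and the `response` field below are from Ollama's API, while the helper names are hypothetical. This sketch only builds and prints the payload; actually sending it requires a running Ollama instance:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def build_generate_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for a single JSON reply
    # instead of a stream of partial ones.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str, timeout: int = 300) -> str:
    """POST to /api/generate and return the 'response' text.
    Requires Ollama to be running locally."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

payload = build_generate_request("mistral", "What is Rust?")
print(json.dumps(payload))
# With Ollama running: print(generate("mistral", "What is Rust?"))
```

Note the `timeout` parameter plays the same role as the `ollama-timeout` setting above: pass a larger value for long generations.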
### Code Generation

Generate code using your local model:

```bash
rice chat
> Generate a Rust function that validates email addresses
```

The model will generate code based on your request.

### Code Review

Ask the model to review code:

```bash
rice chat
> Review this function for potential issues:
>
> fn process_data(data: &str) -> String {
>     data.to_uppercase()
> }
```

### Spec-Driven Development

Use local models with specs:

```bash
rice spec create my-feature
rice spec design my-feature
rice gen my-feature
```

Generation will use your local model instead of cloud providers.

## Performance Optimization

### Hardware Considerations

**CPU-Only Performance**:
- Mistral 7B: ~5-10 tokens/second
- Llama2 7B: ~3-8 tokens/second
- Smaller models: ~10-20 tokens/second

**GPU Performance** (NVIDIA):
- Mistral 7B: ~50-100 tokens/second
- Llama2 7B: ~40-80 tokens/second
- Larger models: ~20-50 tokens/second

### Optimization Tips

#### 1. Use Smaller Models for Speed

If speed is critical, use a smaller model:

```yaml
model: orca-mini  # 3B, very fast
```

#### 2. Increase Timeout for Complex Tasks

For complex code generation, increase the timeout:

```yaml
ollama-timeout: 900  # 15 minutes
```

#### 3. Use GPU Acceleration

If you have an NVIDIA GPU, Ollama uses it automatically. Verify with:

```bash
ollama ps
```

The `PROCESSOR` column shows whether a running model is loaded on the GPU or the CPU.

#### 4. Adjust Model Parameters

For faster responses, use fewer tokens:

```bash
rice chat --max-tokens 500
```

#### 5. Run Ollama on a Dedicated Machine

For team environments, run Ollama on a dedicated server:

```yaml
ollama-url: http://ollama-server.local:11434
```

### Memory Management

**Monitor Memory Usage**:

```bash
# macOS/Linux
top  # Look for the ollama process

# Windows
tasklist | findstr ollama
```

**Reduce Memory Usage**:
1. Use smaller models
2. Reduce the context window
3. Run only one model at a time

**Free Up Memory**:

```bash
ollama rm model-name
```

## Troubleshooting

### "Connection refused" Error

**Problem**: RiceCoder can't connect to Ollama

**Solution**:
1. Check if Ollama is running:
   ```bash
   curl http://localhost:11434/api/tags
   ```
2. If not running, start Ollama:
   ```bash
   # macOS
   ollama serve

   # Linux
   sudo systemctl start ollama

   # Windows - restart the Ollama service
   ```
3. Check the URL in your configuration:
   ```bash
   rice config get ollama-url
   ```
4. If using a different URL, update it:
   ```bash
   rice config set ollama-url http://your-ollama-url:11434
   ```

### "Model not found" Error

**Problem**: The specified model doesn't exist

**Solution**:
1. Check available models:
   ```bash
   ollama list
   ```
2. Pull the model:
   ```bash
   ollama pull mistral
   ```
3. Update the configuration:
   ```bash
   rice config set model mistral
   ```

### "Out of memory" Error

**Problem**: Ollama runs out of memory

**Solution**:
1. Use a smaller model:
   ```bash
   ollama pull orca-mini
   rice config set model orca-mini
   ```
2. Increase system memory or swap
3. Close other applications
4. Reduce the context window size

### "Timeout" Error

**Problem**: Ollama takes too long to respond

**Solution**:
1. Increase the timeout:
   ```yaml
   ollama-timeout: 900  # 15 minutes
   ```
2. Use a faster model:
   ```bash
   ollama pull mistral
   rice config set model mistral
   ```
3. Check system resources (CPU, memory, disk)
4. Reduce model size or task complexity

### Slow Responses

**Problem**: Model responses are very slow

**Solution**:
1. Check whether the GPU is being used:
   ```bash
   ollama ps
   ```
2. If CPU-only, consider:
   - Using a smaller model
   - Getting a GPU
   - Running Ollama on a more powerful machine
3. Monitor system resources:
   ```bash
   # macOS/Linux
   top

   # Windows
   tasklist
   ```
4. Close other applications to free resources

### Model Download Fails

**Problem**: Can't download a model

**Solution**:
1. Check your internet connection:
   ```bash
   ping ollama.ai
   ```
2. Try again (it may be a temporary network issue):
   ```bash
   ollama pull mistral
   ```
3. Check disk space:
   ```bash
   # macOS/Linux
   df -h

   # Windows
   dir C:\
   ```
4. Try a different model:
   ```bash
   ollama pull llama2
   ```

### Ollama Won't Start

**Problem**: The Ollama service won't start

**Solution**:
1. Check if port 11434 is in use:
   ```bash
   # macOS/Linux
   lsof -i :11434

   # Windows
   netstat -ano | findstr :11434
   ```
2. If the port is in use, kill the process or change the port
3. Restart Ollama:
   ```bash
   # macOS
   killall ollama
   ollama serve

   # Linux
   sudo systemctl restart ollama

   # Windows - restart the Ollama service from Services
   ```
4. Check the logs for errors:
   ```bash
   # macOS/Linux
   tail -f ~/.ollama/logs/server.log

   # Windows - check Event Viewer
   ```

## Best Practices

### 1. Start with Mistral

Mistral offers a good balance of speed and quality. Start here:

```yaml
model: mistral
```

### 2. Use Project Configuration

Store model preferences in `.agent/config.yaml`:

```yaml
provider: ollama
model: mistral
```

### 3. Monitor Resource Usage

Keep an eye on CPU, memory, and disk:

```bash
# macOS/Linux
top

# Windows
tasklist
```

### 4. Keep Models Updated

Periodically pull the latest versions:

```bash
ollama pull mistral
```

### 5. Document Model Choices

Explain why you chose a specific model:

```yaml
# .agent/config.yaml
# Using Mistral for fast local development
# Switch to OpenAI for production
provider: ollama
model: mistral
```

### 6. Test Before Committing

Always test generated code before committing:

```bash
rice chat
> Generate a function to validate emails
# Review the generated code
# Test it locally
# Then commit
```

### 7. Use Appropriate Models for Each Task

- **Code generation**: `codeup`, `phind-codellama`
- **General chat**: `mistral`, `neural-chat`
- **Lightweight**: `orca-mini`
- **Powerful**: `mistral-large`, `llama2-uncensored`

## See Also

- [Configuration Guide](./Configuration.md) - All configuration options
- [AI Providers Guide](./AI-Providers.md) - Comparing providers
- [Quick Start Guide](./Quick-Start.md) - Get started quickly
- [Troubleshooting Guide](./Troubleshooting.md) - Common issues and solutions
- [Ollama Official Site](https://ollama.ai) - Official Ollama documentation

---

*Last updated: December 3, 2025*