
Local Models

Mo Abualruz edited this page Dec 3, 2025 · 1 revision

Local Models Guide

Status: ✅ Complete

Last Updated: December 3, 2025


Overview

Local models allow you to run AI models directly on your machine using Ollama, enabling offline development, enhanced privacy, and reduced costs. This guide covers everything you need to know about using local models with RiceCoder.

What is Ollama?

Ollama is an open-source tool that makes it easy to run large language models locally. Instead of sending your code and conversations to cloud providers, Ollama runs models on your own hardware, giving you complete control over your data.

Benefits of Local Models

Privacy: Your code and conversations never leave your machine. No data is sent to external servers.

Offline Development: Work without internet connectivity. Perfect for airplanes, trains, or areas with poor connectivity.

Cost Savings: No API fees. Run unlimited models after the initial download.

Customization: Fine-tune models for your specific use cases and domains.

Speed: Low-latency responses with no network round-trips (actual throughput depends on your hardware).

Control: Complete control over model versions and behavior.

When to Use Local Models

Use local models when:

  • You need privacy and data security
  • You work offline frequently
  • You want to reduce costs
  • You're developing locally and don't need cloud features
  • You want to experiment with different models

Use cloud providers when:

  • You need the latest, most powerful models
  • You want to minimize hardware requirements
  • You need guaranteed uptime and support
  • You're working in a team with shared infrastructure

Installation

Prerequisites

Before installing Ollama, ensure your system meets the requirements:

Hardware Requirements:

  • Minimum: 4GB RAM (8GB recommended)
  • GPU (optional but recommended): NVIDIA, AMD, or Apple Silicon for faster inference
  • Disk Space: 10-50GB depending on models you want to run

System Requirements:

  • Windows 10/11, macOS 11+, or Linux
  • Administrator/sudo access for installation

Windows Installation

Step 1: Download Ollama

  1. Visit https://ollama.ai
  2. Click "Download for Windows"
  3. Run the installer (OllamaSetup.exe)

Step 2: Install

  1. Follow the installation wizard
  2. Accept the license agreement
  3. Choose installation location (default: C:\Users\{username}\AppData\Local\Programs\Ollama)
  4. Click "Install"

Step 3: Verify Installation

Open PowerShell and run:

ollama --version

You should see the version number.

Step 4: Start Ollama

Ollama runs as a background service on Windows. It starts automatically after installation.

To verify it's running:

curl http://localhost:11434/api/tags

You should get a JSON response.
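The exact contents depend on which models you have pulled; with none pulled, Ollama returns `{"models":[]}`. A trimmed example response (fields abbreviated, values illustrative):

```json
{
  "models": [
    {
      "name": "mistral:latest",
      "size": 4109000000,
      "modified_at": "2025-12-03T10:00:00Z"
    }
  ]
}
```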

macOS Installation

Step 1: Download Ollama

  1. Visit https://ollama.ai
  2. Click "Download for macOS"
  3. Choose the appropriate version:
    • Apple Silicon (M1/M2/M3): ollama-darwin-arm64.zip
    • Intel: ollama-darwin-amd64.zip

Step 2: Install

  1. Open the downloaded archive
  2. Drag Ollama to the Applications folder
  3. Wait for the copy process to complete

Step 3: Verify Installation

Open Terminal and run:

ollama --version

You should see the version number.

Step 4: Start Ollama

Ollama runs as a background service on macOS. Start it with:

ollama serve

Or launch Ollama from the Applications folder (for example via Spotlight).

Linux Installation

Step 1: Download and Install

Open Terminal and run:

curl https://ollama.ai/install.sh | sh

This script will:

  • Download Ollama
  • Install it to /usr/local/bin/ollama
  • Set up systemd service

Step 2: Verify Installation

ollama --version

You should see the version number.

Step 3: Start Ollama

Start the Ollama service:

sudo systemctl start ollama

Enable it to start on boot:

sudo systemctl enable ollama

Verify it's running:

curl http://localhost:11434/api/tags

You should get a JSON response.

Model Management

Pulling Models

Before you can use a model, you need to pull it from the Ollama library.

Pull a Model

ollama pull mistral

This downloads the Mistral model (about 4GB). The first pull takes time depending on your internet speed.

Available Models

Popular models available on Ollama:

Lightweight Models (fast, low memory):

  • mistral - 7B parameters, fast and efficient
  • neural-chat - 7B parameters, optimized for chat
  • orca-mini - 3B parameters, very lightweight

Balanced Models (good quality, moderate resources):

  • llama2 - 7B parameters, general purpose
  • dolphin-mixtral - 8x7B mixture of experts

Powerful Models (high quality, high resource):

  • mistral-large - 34B parameters, very capable
  • llama2-uncensored - 7B parameters, less restricted
  • neural-chat-7b-v3 - 7B parameters, latest version

Specialized Models:

  • codeup - Optimized for code generation
  • phind-codellama - Code-focused model
  • sqlcoder - SQL generation

For a complete list, visit https://ollama.ai/library

Pull Multiple Models

ollama pull mistral
ollama pull llama2
ollama pull neural-chat

Listing Models

View all models you've downloaded:

ollama list

Output example:

NAME                    ID              SIZE    MODIFIED
mistral:latest          2ae6f6dd4470    4.1 GB  2 minutes ago
llama2:latest           78e26419b446    3.8 GB  1 hour ago
neural-chat:latest      42ab0d3f4b51    4.1 GB  3 hours ago
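The same inventory is available programmatically from Ollama's `/api/tags` endpoint, which returns a JSON object containing a `models` array. A minimal sketch that extracts model names from such a response (the sample payload below is illustrative, not real output):

```python
import json

# Trimmed example of an /api/tags response (illustrative values).
SAMPLE_TAGS = json.loads("""
{
  "models": [
    {"name": "mistral:latest", "size": 4109000000},
    {"name": "llama2:latest", "size": 3800000000}
  ]
}
""")

def local_model_names(tags_response: dict) -> list:
    """Return the names of all models in the local Ollama store."""
    return [m["name"] for m in tags_response.get("models", [])]

print(local_model_names(SAMPLE_TAGS))  # ['mistral:latest', 'llama2:latest']
```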

Removing Models

Delete a model to free up disk space:

ollama rm mistral

Remove multiple models at once by listing their names:

ollama rm mistral llama2

Updating Models

Pull the latest version of a model:

ollama pull mistral

Ollama automatically downloads the latest version if available.

Configuring RiceCoder for Ollama

Step 1: Ensure Ollama is Running

First, make sure Ollama is running on your system:

# All platforms
curl http://localhost:11434/api/tags

If you get a connection error, start Ollama:

# macOS
ollama serve

# Linux
sudo systemctl start ollama

# Windows - Ollama runs as a service automatically

Step 2: Configure RiceCoder

Edit your RiceCoder configuration file:

Global Configuration (~/.ricecoder/config.yaml):

provider: ollama
ollama-url: http://localhost:11434
model: mistral

Project Configuration (.agent/config.yaml):

provider: ollama
model: mistral

Step 3: Verify Configuration

Check that RiceCoder can connect to Ollama:

rice config show

You should see:

provider: ollama
ollama-url: http://localhost:11434
model: mistral

Step 4: Test the Connection

Start a chat session:

rice chat

Type a message and press Enter. If everything is configured correctly, you should get a response from the local model.
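Under the hood, RiceCoder talks to Ollama over its local HTTP API, so you can also exercise that API directly to confirm Ollama itself is responding, independent of RiceCoder. A minimal sketch using only the Python standard library (`/api/generate` and this request shape are Ollama's documented API; `stream: false` asks for one complete JSON reply instead of streamed chunks):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama endpoint

def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return its reply text."""
    body = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("mistral", "What is Rust?"))
```

If this script returns text, Ollama is healthy and any remaining problem lies in the RiceCoder configuration.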

Configuration Options

provider

Set to ollama to use local models:

provider: ollama

ollama-url

URL where Ollama is running (default: http://localhost:11434):

ollama-url: http://localhost:11434

If Ollama is on a different machine:

ollama-url: http://192.168.1.100:11434

model

Which model to use:

model: mistral

Other options:

model: llama2
model: neural-chat
model: codeup

ollama-timeout

Timeout for Ollama requests in seconds (default: 300):

ollama-timeout: 600  # 10 minutes for complex tasks
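Putting the options together, a complete global configuration for local-only development might look like this (values illustrative):

```yaml
# ~/.ricecoder/config.yaml
provider: ollama
ollama-url: http://localhost:11434
model: mistral
ollama-timeout: 600  # generous limit for slower hardware
```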

Usage Examples

Basic Chat

Start a chat session with your local model:

rice chat

Then type your questions:

> What is Rust?

The model will respond with information about Rust.

Code Generation

Generate code using your local model:

rice chat
> Generate a Rust function that validates email addresses

The model will generate code based on your request.

Code Review

Ask the model to review code:

rice chat
> Review this function for potential issues:
> 
> fn process_data(data: &str) -> String {
>     data.to_uppercase()
> }

Spec-Driven Development

Use local models with specs:

rice spec create my-feature
rice spec design my-feature
rice gen my-feature

The generation will use your local model instead of cloud providers.

Performance Optimization

Hardware Considerations

CPU-Only Performance:

  • Mistral 7B: ~5-10 tokens/second
  • Llama2 7B: ~3-8 tokens/second
  • Smaller models: ~10-20 tokens/second

GPU Performance (NVIDIA):

  • Mistral 7B: ~50-100 tokens/second
  • Llama2 7B: ~40-80 tokens/second
  • Larger models: ~20-50 tokens/second
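These throughput figures translate directly into wait time for a reply: divide the reply length in tokens by the generation rate. A quick sketch (the rates are the rough figures above, not benchmarks):

```python
def response_seconds(reply_tokens: int, tokens_per_second: float) -> float:
    """Approximate wall-clock time to generate a reply of the given length."""
    return reply_tokens / tokens_per_second

# A ~400-token answer from a 7B model:
print(response_seconds(400, 8))   # CPU-only at ~8 tok/s: 50.0 seconds
print(response_seconds(400, 80))  # GPU at ~80 tok/s: 5.0 seconds
```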

Optimization Tips

1. Use Smaller Models for Speed

If speed is critical, use smaller models:

model: orca-mini  # 3B, very fast

2. Increase Timeout for Complex Tasks

For complex code generation, increase the timeout:

ollama-timeout: 900  # 15 minutes

3. Use GPU Acceleration

If you have an NVIDIA GPU, Ollama uses it automatically. Verify which processor a loaded model is using:

ollama ps

The PROCESSOR column shows GPU when acceleration is active.

4. Adjust Model Parameters

For faster responses, use fewer tokens:

rice chat --max-tokens 500

5. Run Ollama on a Dedicated Machine

For team environments, run Ollama on a dedicated server:

ollama-url: http://ollama-server.local:11434

Memory Management

Monitor Memory Usage:

# macOS/Linux
top  # Look for ollama process

# Windows
tasklist | findstr ollama

Reduce Memory Usage:

  1. Use smaller models
  2. Reduce context window
  3. Run only one model at a time

Free Up Memory:

ollama rm model-name

Troubleshooting

"Connection refused" Error

Problem: RiceCoder can't connect to Ollama

Solution:

  1. Check if Ollama is running:
curl http://localhost:11434/api/tags
  2. If it's not running, start Ollama:
# macOS
ollama serve

# Linux
sudo systemctl start ollama

# Windows - restart the Ollama service
  3. Check the URL in configuration:
rice config get ollama-url
  4. If using a different URL, update it:
rice config set ollama-url http://your-ollama-url:11434

"Model not found" Error

Problem: The specified model doesn't exist

Solution:

  1. Check available models:
ollama list
  2. Pull the model:
ollama pull mistral
  3. Update configuration:
rice config set model mistral

"Out of memory" Error

Problem: Ollama runs out of memory

Solution:

  1. Use a smaller model:
ollama pull orca-mini
rice config set model orca-mini
  2. Increase system memory or swap
  3. Close other applications
  4. Reduce context window size

"Timeout" Error

Problem: Ollama takes too long to respond

Solution:

  1. Increase the timeout:
ollama-timeout: 900  # 15 minutes
  2. Use a faster model:
ollama pull mistral
rice config set model mistral
  3. Check system resources (CPU, memory, disk)
  4. Reduce model size or complexity

"Slow Response" Issue

Problem: Model responses are very slow

Solution:

  1. Check if the GPU is being used:
ollama ps
The PROCESSOR column shows whether a loaded model is on the GPU.
  2. If CPU-only, consider:

    • Using a smaller model
    • Getting a GPU
    • Running Ollama on a more powerful machine
  3. Monitor system resources:

# macOS/Linux
top

# Windows
tasklist
  4. Close other applications to free resources

"Model Download Fails"

Problem: Can't download a model

Solution:

  1. Check internet connection:
ping ollama.ai
  2. Try again (it may have been a temporary network issue):
ollama pull mistral
  3. Check disk space:
# macOS/Linux
df -h

# Windows
dir C:\
  4. Try a different model:
ollama pull llama2

"Ollama Won't Start"

Problem: Ollama service won't start

Solution:

  1. Check if port 11434 is in use:
# macOS/Linux
lsof -i :11434

# Windows
netstat -ano | findstr :11434
  2. If in use, kill the process or change the port
  3. Restart Ollama:
# macOS
killall ollama
ollama serve

# Linux
sudo systemctl restart ollama

# Windows
# Restart the Ollama service from Services
  4. Check logs for errors:
# macOS/Linux
tail -f ~/.ollama/logs/server.log

# Windows
# Check Event Viewer

Best Practices

1. Start with Mistral

Mistral is a good balance of speed and quality. Start here:

model: mistral

2. Use Project Configuration

Store model preferences in .agent/config.yaml:

provider: ollama
model: mistral

3. Monitor Resource Usage

Keep an eye on CPU, memory, and disk:

# macOS/Linux
top

# Windows
tasklist

4. Keep Models Updated

Periodically pull the latest versions:

ollama pull mistral

5. Document Model Choices

Explain why you chose a specific model:

# .agent/config.yaml
# Using Mistral for fast local development
# Switch to OpenAI for production
provider: ollama
model: mistral

6. Test Before Committing

Always test generated code before committing:

rice chat
> Generate a function to validate emails
# Review the generated code
# Test it locally
# Then commit

7. Use Appropriate Models for Tasks

  • Code generation: codeup, phind-codellama
  • General chat: mistral, neural-chat
  • Lightweight: orca-mini
  • Powerful: mistral-large, llama2-uncensored
