Local Models
Status: ✅ Complete
Last Updated: December 3, 2025
Local models allow you to run AI models directly on your machine using Ollama, enabling offline development, enhanced privacy, and reduced costs. This guide covers everything you need to know about using local models with RiceCoder.
Ollama is an open-source tool that makes it easy to run large language models locally. Instead of sending your code and conversations to cloud providers, Ollama runs models on your own hardware, giving you complete control over your data.
Privacy: Your code and conversations never leave your machine. No data is sent to external servers.
Offline Development: Work without internet connectivity. Perfect for airplanes, trains, or areas with poor connectivity.
Cost Savings: No API fees. Run unlimited models after the initial download.
Customization: Fine-tune models for your specific use cases and domains.
Speed: No network latency; response speed depends only on your hardware.
Control: Complete control over model versions and behavior.
Use local models when:
- You need privacy and data security
- You work offline frequently
- You want to reduce costs
- You're developing locally and don't need cloud features
- You want to experiment with different models
Use cloud providers when:
- You need the latest, most powerful models
- You want to minimize hardware requirements
- You need guaranteed uptime and support
- You're working in a team with shared infrastructure
Before installing Ollama, ensure your system meets the requirements:
Hardware Requirements:
- Minimum: 4GB RAM (8GB recommended)
- GPU (optional but recommended): NVIDIA, AMD, or Apple Silicon for faster inference
- Disk Space: 10-50GB depending on models you want to run
System Requirements:
- Windows 10/11, macOS 11+, or Linux
- Administrator/sudo access for installation
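Disk headroom is the requirement most often underestimated, since each 7B model weighs roughly 4 GB. As a quick sketch using only Python's standard library, you can check free space against the low end of the 10-50GB range (the 10 GB threshold and function names below are just illustrative):

```python
import shutil

def free_disk_gb(path: str = ".") -> float:
    """Return free disk space at `path` in gigabytes."""
    usage = shutil.disk_usage(path)
    return usage.free / 1e9

def has_room_for_models(path: str = ".", needed_gb: float = 10.0) -> bool:
    """Check whether there is at least `needed_gb` of free space."""
    return free_disk_gb(path) >= needed_gb

if __name__ == "__main__":
    print(f"Free space: {free_disk_gb():.1f} GB")
    print("Enough for Ollama models:", has_room_for_models())
```

`shutil.disk_usage` works on Windows, macOS, and Linux, so the same check runs on all supported platforms.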
- Visit https://ollama.ai
- Click "Download for Windows"
- Run the installer (`OllamaSetup.exe`)
- Follow the installation wizard
- Accept the license agreement
- Choose installation location (default: `C:\Users\{username}\AppData\Local\Programs\Ollama`)
- Click "Install"
Open PowerShell and run:
```
ollama --version
```

You should see the version number.
Ollama runs as a background service on Windows. It starts automatically after installation.
To verify it's running:
```
curl http://localhost:11434/api/tags
```

You should get a JSON response.
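If you prefer a script over `curl`, a small standard-library Python check can tell you whether anything is listening on Ollama's port before you try the API. This is a sketch: `11434` is Ollama's default port, so adjust it if you changed yours.

```python
import socket

def port_open(host: str = "localhost", port: int = 11434,
              timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if port_open():
        print("Something is listening on port 11434")
    else:
        print("Nothing is listening on port 11434 - is Ollama running?")
```

A successful connection only proves the port is open; the `/api/tags` request above confirms it is actually Ollama responding.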
- Visit https://ollama.ai
- Click "Download for macOS"
- Choose the appropriate version:
  - Apple Silicon (M1/M2/M3): `ollama-darwin-arm64.zip`
  - Intel: `ollama-darwin-amd64.zip`
- Open the downloaded `.dmg` file
- Drag Ollama to the Applications folder
- Wait for the copy process to complete
Open Terminal and run:
```
ollama --version
```

You should see the version number.
Ollama runs as a background service on macOS. Start it with:
```
ollama serve
```

Or use Spotlight to launch Ollama from Applications.
Open Terminal and run:
```
curl https://ollama.ai/install.sh | sh
```

This script will:
- Download Ollama
- Install it to `/usr/local/bin/ollama`
- Set up a systemd service

Verify the installation:

```
ollama --version
```

You should see the version number.
Start the Ollama service:

```
sudo systemctl start ollama
```

Enable it to start on boot:

```
sudo systemctl enable ollama
```

Verify it's running:

```
curl http://localhost:11434/api/tags
```

You should get a JSON response.
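The `/api/tags` endpoint returns a JSON document describing your installed models. As a sketch of what a healthy response looks like and how you might read it programmatically (the sample payload below is illustrative, with made-up sizes):

```python
import json

# Illustrative sample of an /api/tags response body
sample = """
{"models": [
  {"name": "mistral:latest", "size": 4109865159},
  {"name": "llama2:latest", "size": 3825819519}
]}
"""

def model_names(tags_json: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    data = json.loads(tags_json)
    return [m["name"] for m in data.get("models", [])]

print(model_names(sample))  # -> ['mistral:latest', 'llama2:latest']
```

An empty `models` list is still a healthy response; it just means no models have been pulled yet.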
Before you can use a model, you need to pull it from the Ollama library.
```
ollama pull mistral
```

This downloads the Mistral model (about 4 GB). The first pull takes time depending on your internet speed.
Popular models available on Ollama:
Lightweight Models (fast, low memory):
- `mistral` - 7B parameters, fast and efficient
- `neural-chat` - 7B parameters, optimized for chat
- `orca-mini` - 3B parameters, very lightweight

Balanced Models (good quality, moderate resources):
- `llama2` - 7B parameters, general purpose
- `neural-chat` - 7B parameters, chat optimized
- `dolphin-mixtral` - 8x7B mixture of experts

Powerful Models (high quality, high resource):
- `mistral-large` - 34B parameters, very capable
- `llama2-uncensored` - 7B parameters, less restricted
- `neural-chat-7b-v3` - 7B parameters, latest version

Specialized Models:
- `codeup` - optimized for code generation
- `phind-codellama` - code-focused model
- `sqlcoder` - SQL generation
For a complete list, visit https://ollama.ai/library
```
ollama pull mistral
ollama pull llama2
ollama pull neural-chat
```

View all models you've downloaded:

```
ollama list
```

Output example:
```
NAME                 ID            SIZE    MODIFIED
mistral:latest       2ae6f6dd4470  4.1 GB  2 minutes ago
llama2:latest        78e26419b446  3.8 GB  1 hour ago
neural-chat:latest   42ab0d3f4b51  4.1 GB  3 hours ago
```
Delete a model to free up disk space:
```
ollama rm mistral
```

Remove all models:

```
ollama rm -a
```

Pull the latest version of a model:

```
ollama pull mistral
```

Ollama automatically downloads the latest version if available.
First, make sure Ollama is running on your system:
```
# Windows (PowerShell)
curl http://localhost:11434/api/tags

# macOS/Linux
curl http://localhost:11434/api/tags
```

If you get a connection error, start Ollama:

```
# macOS
ollama serve

# Linux
sudo systemctl start ollama

# Windows - Ollama runs as a service automatically
```

Edit your RiceCoder configuration file:
Global Configuration (`~/.ricecoder/config.yaml`):

```
provider: ollama
ollama-url: http://localhost:11434
model: mistral
```

Project Configuration (`.agent/config.yaml`):

```
provider: ollama
model: mistral
```

Check that RiceCoder can connect to Ollama:

```
rice config show
```

You should see:

```
provider: ollama
ollama-url: http://localhost:11434
model: mistral
```
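Under the hood, requests go to Ollama's HTTP API, and you can query the model directly for debugging or your own tooling. The endpoint and field names below match Ollama's `/api/generate` REST API, but treat the wrapper itself as a sketch rather than part of RiceCoder:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "mistral") -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "mistral",
             url: str = "http://localhost:11434") -> str:
    """Send a one-shot prompt to a running Ollama instance."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{url}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]

# Example (requires Ollama running locally):
# print(generate("What is Rust?"))
```

Setting `"stream": False` asks Ollama to return a single JSON object instead of a stream of partial responses, which keeps the parsing simple.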
Start a chat session:
```
rice chat
```

Type a message and press Enter. If everything is configured correctly, you should get a response from the local model.
Set to `ollama` to use local models:

```
provider: ollama
```

URL where Ollama is running (default: `http://localhost:11434`):

```
ollama-url: http://localhost:11434
```

If Ollama is on a different machine:

```
ollama-url: http://192.168.1.100:11434
```

Which model to use:

```
model: mistral
```

Other options:

```
model: llama2
model: neural-chat
model: codeup
```

Timeout for Ollama requests in seconds (default: 300):

```
ollama-timeout: 600  # 10 minutes for complex tasks
```

Start a chat session with your local model:
```
rice chat
```

Then type your questions:

```
> What is Rust?
```

The model will respond with information about Rust.
Generate code using your local model:
```
rice chat
> Generate a Rust function that validates email addresses
```

The model will generate code based on your request.
Ask the model to review code:
```
rice chat
> Review this function for potential issues:
>
> fn process_data(data: &str) -> String {
>     data.to_uppercase()
> }
```

Use local models with specs:

```
rice spec create my-feature
rice spec design my-feature
rice gen my-feature
```

The generation will use your local model instead of cloud providers.
CPU-Only Performance:
- Mistral 7B: ~5-10 tokens/second
- Llama2 7B: ~3-8 tokens/second
- Smaller models: ~10-20 tokens/second
GPU Performance (NVIDIA):
- Mistral 7B: ~50-100 tokens/second
- Llama2 7B: ~40-80 tokens/second
- Larger models: ~20-50 tokens/second
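These rates translate directly into wait times, which is worth a moment of arithmetic when choosing hardware. A 500-token answer at 5 tokens/second takes about 100 seconds on CPU, versus under 10 seconds at typical GPU rates:

```python
def response_time_s(tokens: int, tokens_per_second: float) -> float:
    """Estimated wall-clock time to generate `tokens` at a given rate."""
    return tokens / tokens_per_second

# 500-token response on Mistral 7B:
print(response_time_s(500, 5))   # CPU-only low end -> 100.0 seconds
print(response_time_s(500, 75))  # mid-range GPU estimate -> ~6.7 seconds
```

This ignores prompt-processing time, which adds a fixed delay before the first token, so real responses take somewhat longer than the estimate.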
If speed is critical, use smaller models:
```
model: orca-mini  # 3B, very fast
```

For complex code generation, increase the timeout:

```
ollama-timeout: 900  # 15 minutes
```

If you have an NVIDIA GPU, Ollama automatically uses it. Verify:

```
ollama ps
```

Look for GPU usage in the output.

For faster responses, use fewer tokens:

```
rice chat --max-tokens 500
```

For team environments, run Ollama on a dedicated server:

```
ollama-url: http://ollama-server.local:11434
```

Monitor Memory Usage:

```
# macOS/Linux
top  # look for the ollama process

# Windows
tasklist | findstr ollama
```

Reduce Memory Usage:
- Use smaller models
- Reduce context window
- Run only one model at a time
Free Up Memory:
```
ollama rm model-name
```

Problem: RiceCoder can't connect to Ollama
Solution:
- Check if Ollama is running:
  ```
  curl http://localhost:11434/api/tags
  ```

- If not running, start Ollama:

  ```
  # macOS
  ollama serve

  # Linux
  sudo systemctl start ollama

  # Windows - restart the Ollama service
  ```

- Check the URL in configuration:

  ```
  rice config get ollama-url
  ```

- If using a different URL, update it:

  ```
  rice config set ollama-url http://your-ollama-url:11434
  ```

Problem: The specified model doesn't exist
Solution:
- Check available models:
  ```
  ollama list
  ```

- Pull the model:

  ```
  ollama pull mistral
  ```

- Update configuration:

  ```
  rice config set model mistral
  ```

Problem: Ollama runs out of memory
Solution:
- Use a smaller model:
  ```
  ollama pull orca-mini
  rice config set model orca-mini
  ```

- Increase system memory or swap
- Close other applications
- Reduce context window size
Problem: Ollama takes too long to respond
Solution:
- Increase timeout:
  ```
  ollama-timeout: 900  # 15 minutes
  ```

- Use a faster model:

  ```
  ollama pull mistral
  rice config set model mistral
  ```

- Check system resources (CPU, memory, disk)
- Reduce model size or complexity
Problem: Model responses are very slow
Solution:
- Check if GPU is being used:

  ```
  ollama ps
  ```

- If CPU-only, consider:
  - Using a smaller model
  - Getting a GPU
  - Running Ollama on a more powerful machine
- Monitor system resources:

  ```
  # macOS/Linux
  top

  # Windows
  tasklist
  ```

- Close other applications to free resources
Problem: Can't download a model
Solution:
- Check internet connection:
  ```
  ping ollama.ai
  ```

- Try again (it may be a temporary network issue):

  ```
  ollama pull mistral
  ```

- Check disk space:

  ```
  # macOS/Linux
  df -h

  # Windows
  dir C:\
  ```

- Try a different model:

  ```
  ollama pull llama2
  ```

Problem: Ollama service won't start
Solution:
- Check if port 11434 is in use:
  ```
  # macOS/Linux
  lsof -i :11434

  # Windows
  netstat -ano | findstr :11434
  ```

- If the port is in use, kill the process or change the port
- Restart Ollama:

  ```
  # macOS
  killall ollama
  ollama serve

  # Linux
  sudo systemctl restart ollama

  # Windows
  # Restart the Ollama service from Services
  ```

- Check logs for errors:

  ```
  # macOS/Linux
  tail -f ~/.ollama/logs/server.log

  # Windows
  # Check Event Viewer
  ```

Mistral is a good balance of speed and quality. Start here:
```
model: mistral
```

Store model preferences in `.agent/config.yaml`:
```
provider: ollama
model: mistral
```

Keep an eye on CPU, memory, and disk:

```
# macOS/Linux
top

# Windows
tasklist
```

Periodically pull the latest versions:

```
ollama pull mistral
```

Explain why you chose a specific model:
```
# .agent/config.yaml
# Using Mistral for fast local development
# Switch to OpenAI for production
provider: ollama
model: mistral
```

Always test generated code before committing:
```
rice chat
> Generate a function to validate emails
# Review the generated code
# Test it locally
# Then commit
```
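For example, if the model produced a regex-based Python email validator like the hypothetical one below, a few assertions catch obvious gaps before you commit:

```python
import re

# Hypothetical model output: a simple regex-based validator
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(address: str) -> bool:
    """Loose syntactic check; does not verify the mailbox exists."""
    return EMAIL_RE.fullmatch(address) is not None

# Quick sanity checks before committing
assert is_valid_email("user@example.com")
assert not is_valid_email("no-at-sign.example.com")
assert not is_valid_email("user@no-tld")
print("all checks passed")
```

A couple of deliberately invalid inputs are often enough to expose a validator that is too permissive, which local models produce more often than larger cloud models.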
- Code generation: `codeup`, `phind-codellama`
- General chat: `mistral`, `neural-chat`
- Lightweight: `orca-mini`
- Powerful: `mistral-large`, `llama2-uncensored`
- Configuration Guide - All configuration options
- AI Providers Guide - Comparing providers
- Quick Start Guide - Get started quickly
- Troubleshooting Guide - Common issues and solutions
- Ollama Official Site - Official Ollama documentation