A tool for benchmarking local LLM (Large Language Model) performance via the Ollama API, with additional support for AMD STRIX Halo GPUs via llama.cpp. The tool measures the token generation speed of different models and presents the results in an interactive web interface with tables and charts.
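For orientation, the core measurement can be sketched as follows: Ollama's /api/generate endpoint returns eval_count (tokens generated) and eval_duration (generation time in nanoseconds) in its final response, from which tokens per second can be derived. The snippet below is a minimal illustration, not the project's actual benchmark.js code, and assumes Node 18+ for the built-in fetch.

```javascript
// Minimal sketch: run one generation against a local Ollama instance and
// compute tokens/second from the response metadata (illustrative only).
async function measureTokensPerSecond(model, prompt) {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  const data = await res.json();
  // eval_count: tokens generated; eval_duration: generation time in nanoseconds
  return data.eval_count / (data.eval_duration / 1e9);
}

measureTokensPerSecond('llama2', 'Explain what a benchmark is in one paragraph.')
  .then((tps) => console.log(`${tps.toFixed(2)} tokens/s`));
```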

- 📊 Benchmark multiple LLM models with a single command
- ⚡ Measure tokens per second for each model
- 💾 Store results in CSV format and SQLite database
- 🌐 Beautiful web interface with:
  - Interactive bar charts
  - Detailed results table
  - Performance statistics
  - Real-time refresh capability
- 🎮 NEW: AMD STRIX Halo GPU support for llama.cpp benchmarking
- Node.js (v14 or higher)
- For Ollama benchmarks:
  - Ollama installed and running locally
  - At least one LLM model downloaded in Ollama
- For AMD STRIX Halo benchmarks:
  - AMD Ryzen AI Max "Strix Halo" APU
  - Fedora 42/43 or Ubuntu 24.04
  - Toolbox (podman-toolbox) installed
  - See STRIX_HALO.md for detailed requirements
- Clone this repository:

```bash
git clone https://github.com/companionintelligence/Local-Bench.git
cd Local-Bench
```

- Install dependencies:

```bash
npm install
```

- Make sure Ollama is running:

```bash
# Ollama should be running on http://localhost:11434
# Start it if it's not already running
ollama serve
```

- Download some models (if you haven't already):

```bash
ollama pull llama2
ollama pull mistral
ollama pull codellama
```

Default models:

```bash
npm run benchmark
```

This will benchmark the default models: llama2, mistral, codellama, and phi.
Custom models:

```bash
node benchmark.js llama2 mistral phi
```

You can specify any models you have installed in Ollama.

Custom Ollama API URL:

```bash
OLLAMA_API_URL=http://localhost:11434 npm run benchmark
```

For AMD Ryzen AI Max "Strix Halo" GPU benchmarking with llama.cpp:
Detect STRIX Halo GPU:

```bash
npm run strix-halo detect
```

Setup toolbox (one-time):

```bash
npm run strix-halo setup llama-rocm-7.2
```

Run benchmark:

```bash
npm run strix-halo benchmark /path/to/model.gguf --toolbox llama-rocm-7.2
```

See full documentation: STRIX_HALO.md - Complete guide for AMD STRIX Halo integration
- Start the web server:

```bash
npm start
```

- Open your browser and navigate to:

http://localhost:3000
- The web interface will display:
  - Summary statistics (total models, average speed, fastest model)
  - Interactive bar chart comparing model performance
  - Detailed results table with all benchmark data
You can also directly open index.html in your browser if the CSV file is in the same directory.
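As a rough idea of what the page does with that file (a sketch only; the real logic lives in index.html), the CSV can be fetched and split into row objects before being handed to the chart and table:

```javascript
// Illustrative sketch of loading benchmark_results.csv in the browser.
// Works when the page is served by `npm start`; the actual index.html may differ.
async function loadResults() {
  const text = await (await fetch('benchmark_results.csv')).text();
  const [header, ...rows] = text.trim().split('\n');
  const columns = header.split(',');
  return rows.map((row) => {
    const values = row.split(',');
    return Object.fromEntries(columns.map((col, i) => [col, values[i]]));
  });
}

loadResults().then((results) => console.table(results));
```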
To run the server on a different port:

```bash
PORT=8080 npm start
```

Results are saved to benchmark_results.csv in the following format:

```csv
Model,Tokens Per Second,Total Tokens,Duration (s),Timestamp,Status
llama2,45.23,120,2.65,2024-01-15T10:30:00.000Z,Success
mistral,52.18,125,2.40,2024-01-15T10:32:30.000Z,Success
```

You can customize the following by editing benchmark.js:

- OLLAMA_API_URL: Ollama API endpoint (default: http://localhost:11434)
- TEST_PROMPT: The prompt used for benchmarking
- DEFAULT_MODELS: Default models to benchmark
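For illustration, these settings typically sit near the top of benchmark.js and look roughly like the following (the exact prompt text is an assumption, not the file's verbatim contents):

```javascript
// Configuration knobs described above (sketch; check benchmark.js for the real values).
const OLLAMA_API_URL = process.env.OLLAMA_API_URL || 'http://localhost:11434';
const TEST_PROMPT = 'Explain what a benchmark is in one paragraph.'; // example prompt
const DEFAULT_MODELS = ['llama2', 'mistral', 'codellama', 'phi'];
```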
Cannot connect to Ollama API:

- Make sure Ollama is running: `ollama serve`
- Check that the API is accessible at http://localhost:11434 (see the check below)
- Try setting a custom URL: `OLLAMA_API_URL=http://your-url npm run benchmark`
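To quickly confirm the API is reachable, Ollama's /api/tags endpoint (which lists locally installed models) can be queried. This is a hedged sketch, assuming Node 18+ for the built-in fetch:

```javascript
// Connectivity check: /api/tags returns the models installed locally.
const url = process.env.OLLAMA_API_URL || 'http://localhost:11434';

fetch(`${url}/api/tags`)
  .then((res) => res.json())
  .then((data) => console.log('Ollama reachable. Models:', data.models.map((m) => m.name)))
  .catch((err) => console.error(`Cannot reach Ollama at ${url}:`, err.message));
```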
Model not found:

- List available models: `ollama list`
- Pull the model: `ollama pull <model-name>`
Benchmark takes too long:
- Some models are slower than others
- The tool has a 2-minute timeout per model (see the sketch after this list)
- Consider benchmarking fewer models at once
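The 2-minute per-model timeout can be reproduced with a standard AbortController pattern; the snippet below is a generic sketch of that technique, not the code benchmark.js actually uses, and assumes Node 18+ for the built-in fetch:

```javascript
// Generic per-request timeout sketch (illustrative; not the project's exact code).
const TIMEOUT_MS = 2 * 60 * 1000; // 2 minutes, mirroring the limit described above

async function generateWithTimeout(model, prompt) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), TIMEOUT_MS);
  try {
    const res = await fetch('http://localhost:11434/api/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt, stream: false }),
      signal: controller.signal, // aborting rejects the fetch with an AbortError
    });
    return await res.json();
  } finally {
    clearTimeout(timer);
  }
}
```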
Contributions are welcome! Please feel free to submit a Pull Request.
Model reference (size, context window, and input modalities):

| Name | Size | Context | Input |
|---|---|---|---|
| gemma3:270m | 292MB | 32K | Text |
| qwen3:0.6b | 523MB | 40K | Text |
| gemma3:1b | 815MB | 32K | Text |
| deepseek-r1:1.5b | 1.1GB | 128K | Text |
| llama3.2:1b | 1.3GB | 128K | Text |
| qwen3:1.7b | 1.4GB | 40K | Text |
| qwen3-vl:2b | 1.9GB | 256K | Text, Image |
| llama3.2:3b latest | 2.0GB | 128K | Text |
| qwen3:4b | 2.5GB | 256K | Text |
| gemma3:4b latest | 3.3GB | 128K | Text, Image |
| qwen3-vl:4b | 3.3GB | 256K | Text, Image |
| deepseek-r1:7b | 4.7GB | 128K | Text |
| llama3.1:8b latest | 4.9GB | 128K | Text |
| deepseek-r1:8b latest | 5.2GB | 128K | Text |
| qwen3:8b latest | 5.2GB | 40K | Text |
| qwen3-vl:8b latest | 6.1GB | 256K | Text, Image |
| gemma3:12b | 8.1GB | 128K | Text, Image |
| deepseek-r1:14b | 9.0GB | 128K | Text |
| qwen3:14b | 9.3GB | 40K | Text |
| gpt-oss:20b | 14GB | 128K | Text |
| gemma3:27b | 17GB | 128K | Text, Image |
| qwen3-coder:latest | 19GB | 256K | Text |
| qwen3-coder:30b latest | 19GB | 256K | Text |
| qwen3:30b | 19GB | 256K | Text |
| deepseek-r1:32b | 20GB | 128K | Text |
| qwen3:32b | 20GB | 40K | Text |
| qwen3-vl:30b | 20GB | 256K | Text, Image |
| qwen3-vl:32b | 21GB | 256K | Text, Image |
| deepseek-r1:70b | 43GB | 128K | Text |
| llama3.1:70b | 43GB | 128K | Text |
| gpt-oss:120b | 65GB | 128K | Text |
| llama4:16x17b latest | 67GB | 10M | Text, Image |
| GLM-4.6:TQ1_0 | 84GB | 198K | Text |
| qwen3:235b | 142GB | 256K | Text |
| qwen3-vl:235b | 143GB | 256K | Text, Image |
| GLM-4.6:Q4_K_M | 216GB | 198K | Text |
| llama3.1:405b | 243GB | 128K | Text |
| llama4:128x17b | 245GB | 1M | Text, Image |
| qwen3-coder:480b | 290GB | 256K | Text |
| deepseek-v3.1:671b latest | 404GB | 160K | Text |
| deepseek-r1:671b | 404GB | 160K | Text |
| minimax-m2 | 968GB | 200K | Text |