Instantly install llama.cpp.
installama.sh is a simple script that downloads and sets up a prebuilt llama-server binary for your system.
It automatically detects your OS, architecture, and GPU capabilities, so you can start using llama.cpp in seconds.
- Supported architectures: x86_64, aarch64.
- Supported OS: Linux, macOS, FreeBSD, Windows.
- Automatic detection for CPU acceleration.
- Automatic detection for GPU acceleration: CUDA, ROCm, Vulkan, Metal.
- Builds are kept as lightweight as possible without compromising performance.
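Under the hood, this kind of detection usually boils down to a few `uname` and driver probes. The snippet below is a simplified, hypothetical illustration of the idea, not installama.sh's actual logic:

```sh
# Simplified sketch of OS/arch/GPU probing (hypothetical, not the real installer).
os=$(uname -s)        # e.g. Linux, Darwin, FreeBSD
arch=$(uname -m)      # e.g. x86_64, aarch64

backend=cpu
if command -v nvidia-smi >/dev/null 2>&1; then
  backend=cuda        # NVIDIA driver present
elif command -v rocminfo >/dev/null 2>&1; then
  backend=rocm        # AMD ROCm stack present
elif command -v vulkaninfo >/dev/null 2>&1; then
  backend=vulkan      # generic Vulkan loader available
elif [ "$os" = "Darwin" ]; then
  backend=metal       # Apple GPUs use Metal
fi

echo "Would fetch a llama-server build for $os/$arch with $backend acceleration"
```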
See PRESETS.md for the full list of supported hardware and build configurations, and REQUIREMENTS.md for detailed requirements, including minimum OS versions and runtime library dependencies.
Run the following command in your terminal:
```sh
curl https://installama.sh | sh
```
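If you prefer to inspect the script before executing it (always a good idea with curl-pipe installers), download it first and run it manually:

```sh
# Download, review, then run the installer yourself.
curl -fsSL https://installama.sh -o installama.sh
less installama.sh
sh installama.sh
```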
Launch the server:
```sh
~/.installama/server -hf unsloth/Qwen3-4B-GGUF:Q4_0
```
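To verify the server from another terminal, you can query its OpenAI-compatible API. This assumes llama-server's default listen address of http://127.0.0.1:8080 (configurable with `--host` and `--port`):

```sh
# Send a chat request to the already-loaded model.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }'
```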
In some scenarios, you may want to skip detection for specific backends.
You can do this by setting environment variables before piping to sh:
```sh
curl https://installama.sh | SKIP_CUDA=1 sh
```
Available options: SKIP_CUDA=1, SKIP_ROCM=1, SKIP_VULKAN=1.
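Since these are ordinary environment variables, they should compose; for example, to skip all GPU backends and fall back to a CPU-only build:

```sh
curl https://installama.sh | SKIP_CUDA=1 SKIP_ROCM=1 SKIP_VULKAN=1 sh
```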
Run the following command in PowerShell:
```powershell
irm https://installama.sh | iex
```
Launch the server:
```powershell
& $env:USERPROFILE\installama\server.exe -hf unsloth/Qwen3-4B-GGUF:Q4_0
```
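Once it's up, you can sanity-check it from another PowerShell window; this assumes llama-server's default listen address of http://127.0.0.1:8080:

```powershell
# Query llama-server's built-in health endpoint (default address assumed).
Invoke-RestMethod http://127.0.0.1:8080/health
```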
Once the server is running with your chosen model, open your browser and navigate to http://127.0.0.1:8080 (llama-server's default address) to use the built-in web UI.
If it doesn't work on your system, please create an issue.