Summary
Docker Model Runner's bundled llama.cpp binary is compiled for CUDA architectures
750, 800, 860, 890, 1200, 1210 (compute capability 7.5 through 12.1, i.e. Turing
and newer), which excludes Pascal GPUs (GTX 1080, 1080 Ti, 1070, 1060, etc.;
sm_61/sm_62).
As a result, users with Pascal GPUs see the following at startup:
ggml_cuda_init: failed to initialize CUDA: forward compatibility was attempted on non supported HW
warning: no usable GPU found, --gpu-layers option will be ignored
The runner then logs gpuSupport=false and falls back entirely to CPU inference.
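The mismatch between the GPU and the compiled architecture list can be checked with a small shell helper. This is only a sketch: `gpu_covered` is a hypothetical name, and it assumes a simplified rule that a GPU is covered when at least one compiled architecture is at or below its own. Real coverage also depends on whether PTX or only native code was embedded for that architecture.

```shell
#!/bin/sh
# gpu_covered CAP ARCHS   (hypothetical helper, not part of Docker Model Runner)
#   CAP   : GPU compute capability, e.g. "6.1"
#   ARCHS : comma-separated list in the form used in the Summary,
#           e.g. "750,800,860,890,1200,1210"
# Exits 0 if at least one listed architecture is at or below the GPU's own
# (simplified coverage rule, see the assumption stated above).
gpu_covered() {
    cap_major=${1%%.*}
    cap_minor=${1#*.}
    # 6.1 -> 610, 7.5 -> 750, 12.1 -> 1210
    gpu_arch=$((cap_major * 100 + cap_minor * 10))
    old_ifs=$IFS
    IFS=,
    for arch in $2; do
        if [ "$arch" -le "$gpu_arch" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For example, `gpu_covered 6.1 "750,800,860,890,1200,1210" || echo "not covered"` reports that a GTX 1080 Ti is not covered by the current list. On recent drivers the capability can be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`.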
Environment
- GPU: NVIDIA GeForce GTX 1080 Ti (Pascal, sm_61)
- Driver: 580.159.03 / CUDA 13.0
- Docker Model Runner: v1.1.38 (Docker Engine, Linux)
- llama.cpp backend: Running (fc2b0053f)
Expected behavior
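llama.cpp should use the GPU for inference on Pascal hardware, as nvcc in
CUDA 12.x can still compile for sm_60, sm_61, and sm_62. Note that the CUDA 13
toolkit removes pre-Turing compile targets, so this requires a 12.x build toolkit.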
Root cause
The llamacpp/native/cuda.Dockerfile does not set -DCMAKE_CUDA_ARCHITECTURES,
so CMake uses the CUDA SDK defaults, which no longer include Pascal in recent
versions.
Suggested fix
Add Pascal architectures to the CMake flags in cuda.Dockerfile:
RUN echo "-B build \
-DGGML_CUDA=ON \
-DCMAKE_CUDA_ARCHITECTURES='61;62;70;75;80;86;89;120;121' \
..." > cmake-flags
This would extend GPU support to a large installed base of Pascal GPUs with no
functional downside — only a slightly larger binary and longer build time.
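After editing the Dockerfile, a quick sanity check on the generated cmake-flags file can confirm that the Pascal entries made it into the architecture list. The helper below is a sketch under assumptions: `flags_include_arch` is a hypothetical name, and it assumes the list is written as plain semicolon-separated numbers on one line, as in the snippet above.

```shell
#!/bin/sh
# flags_include_arch FILE ARCH   (hypothetical helper for illustration)
# Exits 0 if FILE contains a -DCMAKE_CUDA_ARCHITECTURES=... setting whose
# semicolon-separated list includes ARCH (e.g. 61), and 1 otherwise.
flags_include_arch() {
    # Extract the value after the flag, dropping an optional opening quote.
    list=$(sed -n "s/.*-DCMAKE_CUDA_ARCHITECTURES=['\"]\{0,1\}\([0-9;a-z-]*\).*/\1/p" "$1" | head -n 1)
    # Wrap in semicolons so that "6" does not match "61".
    case ";$list;" in
        *";$2;"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `flags_include_arch cmake-flags 61 && echo "Pascal enabled"` prints the message only when 61 is in the list. A file that does not set the flag at all is treated as not including the architecture.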