Add Pascal GPU (sm_61/sm_62) support to llama.cpp CUDA build #929

@FlorentPoinsaut

Summary

Docker Model Runner's bundled llama.cpp binary is compiled for CUDA architectures
750, 800, 860, 890, 1200, and 1210 (compute capability 7.5 and up, i.e. Turing and
newer). This excludes Pascal GPUs such as the GTX 1080, 1080 Ti, 1070, and 1060
(sm_61/sm_62).

As a result, users with Pascal GPUs see the following at startup:

ggml_cuda_init: failed to initialize CUDA: forward compatibility was attempted on non supported HW
warning: no usable GPU found, --gpu-layers option will be ignored

The runner then logs gpuSupport=false and falls back entirely to CPU inference.
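To make the mismatch concrete, here is a minimal sketch of the coverage check. The function name arch_covered is hypothetical, and the rule is simplified: it only considers compiled (SASS) code, which runs on devices of the same major compute capability with an equal or higher minor version, and it ignores any PTX that could be JIT-compiled.

```shell
#!/bin/sh
# Hypothetical helper: report whether a compute capability such as "6.1"
# is covered by a list of arch values written in the 750/800/860 style.
arch_covered() {
    cc="$1"; shift
    major="${cc%%.*}"
    minor="${cc#*.}"
    device=$((major * 100 + minor * 10))
    for arch in "$@"; do
        # Simplified rule: same major version, and built for a minor
        # version that is not newer than the device.
        if [ $((arch / 100)) -eq "$major" ] && [ "$arch" -le "$device" ]; then
            echo "covered by $arch"
            return 0
        fi
    done
    echo "not covered"
    return 1
}

# GTX 1080 Ti (6.1) against the architectures listed above
arch_covered 6.1 750 800 860 890 1200 1210 || true
# An Ampere card (8.6) for comparison
arch_covered 8.6 750 800 860 890 1200 1210
```

On drivers that expose the compute_cap query field, the first argument can be read from the device with nvidia-smi --query-gpu=compute_cap --format=csv,noheader.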

Environment

  • GPU: NVIDIA GeForce GTX 1080 Ti (Pascal, sm_61)
  • Driver: 580.159.03 / CUDA 13.0
  • Docker Model Runner: v1.1.38 (Docker Engine, Linux)
  • llama.cpp backend: Running (fc2b0053f)

Expected behavior

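llama.cpp should use the GPU for inference on Pascal hardware. CUDA 12.x nvcc
can still compile for sm_60, sm_61, and sm_62. As far as I can tell from the
release notes, CUDA 13.0 removed offline compilation for pre-Turing targets,
so the Pascal build would need a 12.x toolkit.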

Root cause

The llamacpp/native/cuda.Dockerfile does not set -DCMAKE_CUDA_ARCHITECTURES,
so the build falls back to the default architecture list, which in recent
versions no longer includes Pascal.

Suggested fix

Add Pascal architectures to the CMake flags in cuda.Dockerfile:

RUN echo "-B build \
    -DGGML_CUDA=ON \
    -DCMAKE_CUDA_ARCHITECTURES='61;62;70;75;80;86;89;120;121' \
    ..." > cmake-flags

This would extend GPU support to a large installed base of Pascal GPUs, provided
the CUDA toolkit in the build image can still target sm_61/sm_62. I see no
functional downside, only a slightly larger binary and a longer build time.
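Whether nvcc accepts 61 and 62 depends on the toolkit version in the build image. The sketch below shows one way the flag value could be chosen per toolkit major version. The function name cuda_arch_list is hypothetical, and the version cut-offs are assumptions to verify against the CUDA release notes: that CUDA 12.x can still target Pascal and Volta, that 120 and 121 need a late 12.x release, and that CUDA 13 rejects targets below 75.

```shell
#!/bin/sh
# Hypothetical helper: pick a CMAKE_CUDA_ARCHITECTURES value from the
# CUDA toolkit major version. Cut-offs are assumptions, not verified here.
cuda_arch_list() {
    modern='75;80;86;89;120;121'
    case "$1" in
        12)
            # Assumed: a late CUDA 12.x nvcc still accepts Pascal (61, 62)
            # and Volta (70) alongside the newer targets.
            echo "61;62;70;$modern"
            ;;
        *)
            # Assumed: CUDA 13 nvcc no longer accepts targets below 75.
            echo "$modern"
            ;;
    esac
}

# Prints the flag as it would appear in the cmake-flags file for a 12.x image
echo "-DCMAKE_CUDA_ARCHITECTURES='$(cuda_arch_list 12)'"
```

If the image stays on CUDA 13, a separate 12.x build variant for Pascal might be the simpler route than a single binary covering everything.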
