Nondeterministic CPU inference with --image despite --threads 1 due to OpenMP in OpenBLAS #22956

@nh2

Hi, I tried to get fully deterministic CPU inference, using --temp 0 with --threads 1.

It gave deterministic output with small text prompts, but was nondeterministic when passing --image to llama-cli.

I found a workaround: setting OMP_NUM_THREADS=1.
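A minimal sketch of applying that workaround from a shell. The wrapper name `run_single_omp` is hypothetical and not part of llama.cpp; it only sets the standard OpenMP variable `OMP_NUM_THREADS` for the wrapped command.

```shell
# Hypothetical wrapper (not part of llama.cpp): run any command with
# OpenMP limited to one thread via the OMP_NUM_THREADS environment variable.
run_single_omp() {
  env OMP_NUM_THREADS=1 "$@"
}

# Usage, assuming llama-cli is on PATH and the remaining flags are taken
# from the invocation example in this issue:
#   run_single_omp llama-cli --temp 0 --threads 1 --n-gpu-layers 0 --model ./gemma-4-E2B-it-Q4_0.gguf
```

Using `env` keeps the setting scoped to that one process, so the rest of the shell session is unaffected.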

That was weird: why wouldn't --threads 1 already achieve that, given that llama-cpp propagates it into an openblas_set_num_threads(ctx->n_threads) call?

I dug into OpenBLAS and found a bug: openblas_set_num_threads() is ineffective, so OpenBLAS uses OpenMP's default number of threads anyway.

I reported that as an OpenBLAS issue:

I also made an OpenBLAS PR to fix it:

With those, llama-cpp should hopefully be deterministic when run on 1 thread on the CPU.

So this ticket mainly tracks whether those get merged, and ideally should be closed once they are. I still have some open questions regarding CPU determinism, though:

  • Is this enough? How deterministic does llama.cpp expect to be on CPUs?
  • There is code that seems to invoke BLAS even without --image, if matrices are large enough. So text-only prompts may have been nondeterministic as well?
  • Can we have a multi-threaded implementation that is fully deterministic (e.g. one that does parallel maps with deterministic reductions) so that deterministic runs aren't so slow?
  • It would be great to have:
    • Some docs that describe what's already deterministic (CPU?) and what isn't (GPU?).
    • Some tests that check whether --temp 0 --threads 1 is deterministic, so that this issue I found would have been caught.
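The question about deterministic reductions comes down to floating-point addition not being associative: if the order in which partial sums are combined changes, the result can change in its last bits. The sketch below shows only that general effect using awk, which computes in double precision. It is an illustration under that assumption and does not claim to reproduce what OpenBLAS does internally; `sum_orders` is a hypothetical name.

```shell
# Sum the same three values with two different groupings and print both
# results with 17 significant digits. The function name is illustrative only.
sum_orders() {
  awk 'BEGIN {
    a = 0.1; b = 0.2; c = 0.3
    # First line: (a + b) + c, second line: a + (b + c)
    printf "%.17g\n%.17g\n", (a + b) + c, a + (b + c)
  }'
}

sum_orders
```

The two printed lines differ, which is why a reduction that always combines partial results in the same fixed order is needed for bit-identical output across runs.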

Environment:

  • NixOS Linux 25.11
  • llama-cpp 8983

Invocation example:

llama-cli \
  --single-turn --no-display-prompt --log-verbosity 0 \
  --jinja --temp 0 --threads 1 --n-gpu-layers 0 \
  --model ./gemma-4-E2B-it-Q4_0.gguf \
  --mmproj ./mmproj-gemma-4-E2B-it-F16.gguf \
  --image myimage.png \
  -p 'Describe the image'

Pinned model URLs: gemma-4-E2B-it-Q4_0.gguf, mmproj-gemma-4-E2B-it-F16.gguf
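A determinism check of the kind asked for above could be sketched as a small shell helper that repeats a command and compares the outputs. This is a sketch, not an existing llama.cpp test; the name `check_deterministic` and its argument order are hypothetical.

```shell
# Hypothetical helper: run a command several times and compare stdout.
# Arguments: number of runs, then the command and its arguments.
# Returns 0 if every run printed the same stdout as the first, 1 otherwise.
check_deterministic() {
  runs="$1"
  shift
  first=$("$@" 2>/dev/null)
  i=1
  while [ "$i" -lt "$runs" ]; do
    current=$("$@" 2>/dev/null)
    if [ "$current" != "$first" ]; then
      echo "run $((i + 1)) differs from run 1" >&2
      return 1
    fi
    i=$((i + 1))
  done
  return 0
}

# Usage, assuming llama-cli is on PATH and the flags from the invocation
# example above are passed in full:
#   check_deterministic 5 llama-cli --temp 0 --threads 1 --n-gpu-layers 0 --model ./gemma-4-E2B-it-Q4_0.gguf
```

Comparing stdout only is a deliberate simplification: it ignores timing and log output on stderr, which would differ between runs even when the generated text is identical.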
