
server: "prompt cache is enabled, size limit: 8192 MiB" still logged when --cache-ram 0 is set #22127

@rmurray484

Summary

llama-server prints an informational log at model-load time claiming the prompt cache is enabled with an 8192 MiB limit, even when the user has explicitly disabled the cache via --cache-ram 0. The message also suggests passing --cache-ram 0 to disable the cache, although in our run that flag is already set. The log message does not reflect the effective runtime state.

This is cosmetic (no functional impact) but actively misleading when diagnosing cache-related issues, because a user debugging heap/memory problems will read the log and conclude the cache is still active when it isn't.

Environment

  • llama.cpp SHA: 9e5647aff (build b8840)
  • Flag: --cache-ram 0 on the command line (also LLAMA_ARG_CACHE_RAM=0 in env, belt-and-suspenders)
  • Also passing --no-cache-prompt (a separate flag)

Reproducer

Start the server with the cache explicitly disabled:

llama-server -m <any model> --cache-ram 0 --host 127.0.0.1 --port 8080

Observe in the stdout log around model-load time:

srv    load_model: prompt cache is enabled, size limit: 8192 MiB
srv    load_model: use `--cache-ram 0` to disable the prompt cache

Expected vs. actual

Expected (when --cache-ram 0):

srv    load_model: prompt cache is disabled (--cache-ram 0)

or equivalent that reflects the effective config.

Actual:
The message prints unconditionally with the default 8192 MiB text, regardless of the --cache-ram argument.

Confirmation the cache is actually disabled

We confirmed via runtime RSS that with --cache-ram 0 the cache is in fact off:

Config                      Qwen2.5-VL 7B + mmproj RSS under load (600 req)
default --cache-ram 8192    ~11.9 GB
--cache-ram 0               ~1.0 GB

So the flag works; only the log line is wrong.

Suggested fix direction

Move the log print to after the cache is actually initialized, and branch on the effective size:

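// cache_ram_mib is assumed to hold the effective value after argument parsing.
if (cache_ram_mib == 0) {
    // Cache explicitly disabled: report that instead of the default limit.
    LOG_INF("srv    load_model: prompt cache disabled (--cache-ram 0)\n");
} else {
    LOG_INF("srv    load_model: prompt cache enabled, size limit: %d MiB\n", cache_ram_mib);
    LOG_INF("srv    load_model: use `--cache-ram 0` to disable the prompt cache\n");
}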

(Exact location likely in the code introduced by PR #16391.)
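
As a self-contained illustration of the branch above, the sketch below builds the message text from the effective value in a plain function, so the wording can be checked without starting the server. The helper name prompt_cache_log_lines is hypothetical and not part of llama.cpp. Treating a negative value as "no limit" is an assumption that should be verified against the actual --cache-ram argument handling.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): returns the log lines that
// describe the effective prompt-cache setting.
// Assumptions: 0 disables the cache, a negative value means "no limit",
// and a positive value is the size limit in MiB.
std::vector<std::string> prompt_cache_log_lines(int cache_ram_mib) {
    if (cache_ram_mib == 0) {
        // Disabled: do not print a size limit or the "use --cache-ram 0" hint.
        return {"prompt cache is disabled (--cache-ram 0)"};
    }
    const std::string limit = cache_ram_mib < 0
        ? std::string("no limit")
        : std::to_string(cache_ram_mib) + " MiB";
    return {
        "prompt cache is enabled, size limit: " + limit,
        "use `--cache-ram 0` to disable the prompt cache",
    };
}
```

With this shape, the server code would only need to iterate over the returned lines and pass each one to its existing logging macro, which keeps the decision logic testable in isolation.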

Why this matters

Right now, anybody trying to debug an issue that points at the prompt cache (e.g. issue #21336, the heap-corruption companion issue we're filing alongside this) will look at the log, see "prompt cache is enabled", and waste time assuming the flag didn't take effect. This is a small change that meaningfully improves the diagnostic surface.

Good first issue

This looks like a clean self-contained change in the logging code path for somebody new to the project — one file, one function, trivial to test (start server with/without --cache-ram 0, grep the log).
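
The grep-style check described above can also be written as a small predicate over captured log text, which may be handy in an automated test. This is only a sketch: the function name log_contradicts_cache_flag is hypothetical, and the matched string is taken from the log lines quoted in this report.

```cpp
#include <string>

// Hypothetical check (not part of llama.cpp): returns true when the captured
// server log claims the prompt cache is enabled even though the effective
// --cache-ram value is 0 (cache disabled).
bool log_contradicts_cache_flag(const std::string & log_text, int cache_ram_mib) {
    const bool claims_enabled =
        log_text.find("prompt cache is enabled") != std::string::npos;
    return cache_ram_mib == 0 && claims_enabled;
}
```

A test would start the server once with --cache-ram 0 and once without it, capture stdout, and expect this predicate to return false in both runs.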

Reported by Claude and Richard Murray (colossus-ia.org)


Suggested labels for triage: bug, good first issue, area server.
