Summary
llama-server prints an informational log message at model-load time claiming that the prompt cache is enabled with an 8192 MiB limit, even when the user has explicitly disabled the cache via --cache-ram 0. The message also suggests passing --cache-ram 0 to disable the cache, which is ironic, since in our run that flag is already set. The log message simply does not reflect the effective runtime state.
This is cosmetic (no functional impact) but actively misleading when diagnosing cache-related issues: a user debugging heap or memory problems will read the log and conclude that the cache is still active when it is not.
Environment
- llama.cpp SHA: 9e5647aff (build b8840)
- Flag: --cache-ram 0 on the command line (also LLAMA_ARG_CACHE_RAM=0 in the environment, belt-and-suspenders)
- Also passing: --no-cache-prompt (a separate flag, also set)
Reproducer
Start the server with the cache explicitly disabled:
```sh
llama-server -m <any model> --cache-ram 0 --host 127.0.0.1 --port 8080
```
Observe in the stdout log around model-load time:
```
srv load_model: prompt cache is enabled, size limit: 8192 MiB
srv load_model: use `--cache-ram 0` to disable the prompt cache
```
Expected vs. actual
Expected (when --cache-ram 0):
```
srv load_model: prompt cache is disabled (--cache-ram 0)
```
or an equivalent message that reflects the effective configuration.
Actual:
The message is printed unconditionally with the default 8192 MiB text, regardless of the --cache-ram argument.
Confirmation the cache is actually disabled
We confirmed via runtime RSS that with --cache-ram 0 the cache is in fact off:
| Config | Qwen2.5-VL 7B + mmproj RSS under load (600 requests) |
|---|---|
| default (--cache-ram 8192) | ~11.9 GB |
| --cache-ram 0 | ~1.0 GB |
So the flag works; only the log line lies.
Suggested fix direction
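Move the log statement to after the cache is actually initialized, and branch on the effective size:
```cpp
// cache_ram_mib: effective --cache-ram value in MiB, read after the prompt
// cache has been initialized. 0 means the cache is disabled.
// If negative values are accepted (e.g. as "no limit"), they need their own branch.
if (cache_ram_mib == 0) {
    LOG_INF("srv load_model: prompt cache is disabled (--cache-ram 0)\n");
} else {
    LOG_INF("srv load_model: prompt cache is enabled, size limit: %d MiB\n", cache_ram_mib);
    LOG_INF("srv load_model: use `--cache-ram 0` to disable the prompt cache\n");
}
```
(The exact location is likely in the code introduced by PR #16391.)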
Why this matters
Right now, anybody trying to debug an issue that points at the prompt cache (e.g. issue #21336, the heap-corruption companion issue we're filing alongside this) will look at the log, see "prompt cache is enabled", and waste time assuming the flag didn't take effect. This is a small change that meaningfully improves the diagnostic surface.
Good first issue
This looks like a clean, self-contained change in the logging code path for somebody new to the project: one file, one function, and trivial to test (start the server with and without --cache-ram 0 and grep the log).
Related
Reported by Claude and Richard Murray (colossus-ia.org)
Suggested labels for triage: bug, good first issue, area server.