server: "prompt cache is enabled, size limit: 8192 MiB" still logged when --cache-ram 0 is set #22127

## Summary

`llama-server` prints an informational log at model-load time claiming the prompt cache is enabled with an 8192 MiB limit, even when the user has explicitly disabled the cache via `--cache-ram 0`. The message also suggests passing `--cache-ram 0` to disable the cache, even though in our run the flag is already set. The log message does not reflect the effective runtime state.

This is cosmetic (no functional impact) but actively misleading when diagnosing cache-related issues: a user debugging heap or memory problems will read the log and conclude the cache is still active when it isn't.

## Environment

- **llama.cpp SHA**: `9e5647aff` (build `b8840`)
- Flag: `--cache-ram 0` on the command line (also `LLAMA_ARG_CACHE_RAM=0` in env, belt-and-suspenders)
- Also passing `--no-cache-prompt` (separate flag, also set)

## Reproducer

Start the server with the cache explicitly disabled:

```
llama-server -m <any model> --cache-ram 0 --host 127.0.0.1 --port 8080
```

Observe in the stdout log around model-load time:

```
srv    load_model: prompt cache is enabled, size limit: 8192 MiB
srv    load_model: use `--cache-ram 0` to disable the prompt cache
```

## Expected vs. actual

**Expected** (when `--cache-ram 0`):
```
srv    load_model: prompt cache is disabled (--cache-ram 0)
```
or an equivalent message that reflects the effective config.

**Actual**:
The message prints unconditionally with the default 8192 MiB text, regardless of the `--cache-ram` argument.

## Confirmation the cache is actually disabled

We confirmed via runtime RSS that with `--cache-ram 0` the cache is in fact off:

| Config | Qwen2.5-VL 7B + mmproj RSS under load (600 req) |
|---|---|
| default `--cache-ram 8192` | ~11.9 GB |
| `--cache-ram 0` | ~1.0 GB |

So the flag *works*; only the log line lies.

## Suggested fix direction

Move the log print to after the cache is actually initialized, and branch on the effective size:

```cpp
if (cache_ram_mib == 0) {
    LOG_INF("srv    load_model: prompt cache disabled (--cache-ram 0)\n");
} else {
    LOG_INF("srv    load_model: prompt cache enabled, size limit: %d MiB\n", cache_ram_mib);
    LOG_INF("srv    load_model: use `--cache-ram 0` to disable the prompt cache\n");
}
```

(Exact location likely in the code introduced by PR #16391.)
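
One way to make this branch easy to unit-test is to separate choosing the message from printing it. The following is a minimal sketch, not the actual llama.cpp code: `prompt_cache_log_lines` is a hypothetical helper, and treating negative values as "no limit" is an assumption that should be checked against the `--cache-ram` help text before use.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): builds the prompt-cache log
// lines from the effective --cache-ram value instead of a fixed default.
std::vector<std::string> prompt_cache_log_lines(int32_t cache_ram_mib) {
    if (cache_ram_mib == 0) {
        // Cache explicitly disabled: say so, and do not suggest the flag again.
        return {"prompt cache is disabled (--cache-ram 0)"};
    }

    // Assumption: a negative value means "no limit"; verify against the
    // flag's documentation.
    const std::string limit = cache_ram_mib < 0
        ? std::string("no limit")
        : std::to_string(cache_ram_mib) + " MiB";

    return {
        "prompt cache is enabled, size limit: " + limit,
        "use `--cache-ram 0` to disable the prompt cache",
    };
}
```

The caller would then pass each returned line to whatever logging macro the surrounding code already uses, so the prefix (`srv    load_model:`) stays consistent with the rest of the server log.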

## Why this matters

Right now, anybody trying to debug an issue that points at the prompt cache (e.g. issue #21336, the heap-corruption companion issue we're filing alongside this) will look at the log, see "prompt cache is enabled", and waste time assuming the flag didn't take effect. This is a small change that meaningfully improves the diagnostic surface.

## Good first issue

This looks like a clean self-contained change in the logging code path for somebody new to the project — one file, one function, trivial to test (start server with/without `--cache-ram 0`, grep the log).
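
The manual check described above (start the server, grep the log) could also be expressed as a small predicate over captured log text. This is only a sketch: `cache_log_matches_flag` is a hypothetical name, and it assumes the enabled message keeps the wording "prompt cache is enabled" quoted in this report.

```cpp
#include <cstdint>
#include <string>

// Hypothetical check (not part of llama.cpp's test suite): returns true when
// the captured server log agrees with the --cache-ram value that was passed.
bool cache_log_matches_flag(const std::string & log_text, int32_t cache_ram_mib) {
    const bool says_enabled =
        log_text.find("prompt cache is enabled") != std::string::npos;

    // With --cache-ram 0 the log must not claim the cache is enabled;
    // with any other value it should.
    return cache_ram_mib == 0 ? !says_enabled : says_enabled;
}
```

Fed the two log lines from the reproducer together with a flag value of `0`, the predicate returns `false`, which is the mismatch this issue describes.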

## Related

- #16391 — PR that introduced the prompt cache and this log line
- Companion issue (being filed at the same time) on heap corruption in the same cache under load

---

*Reported by Claude and Richard Murray (colossus-ia.org)*

---

*Suggested labels for triage: `bug`, `good first issue`, area `server`.*
