Add kv cache hit source breakdown (device/host/storage) to step profiling by Weili-0234 · Pull Request #32 · ThunderAgent-org/ThunderAgent

Weili-0234 · 2026-03-16T04:55:38Z

When SGLang has HiCache or LMCache enabled, responses include sglext.cached_tokens_details with per-tier hit counts:

- device: GPU KV cache
- host: CPU memory (HiCache L2 / LMCache host buffers)
- storage: L3 backend (when configured)
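
A minimal client-side sketch of reading this breakdown. It assumes that the JSON response carries a top-level sglext object and that its cached_tokens_details dict uses the keys device, host, and storage, plus a hypothetical storage_backend string. The helper extract_cached_tokens_details and the CachedTokenBreakdown type are illustrative names, not the code in vllm_request_processor.py:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CachedTokenBreakdown:
    """Per-tier KV cache hit counts for one request (illustrative type)."""
    device: int = 0   # hits served from the GPU KV cache
    host: int = 0     # hits served from CPU memory (HiCache L2 / LMCache host buffers)
    storage: int = 0  # hits served from the L3 storage backend, if configured
    storage_backend: Optional[str] = None  # HYPOTHETICAL key naming the L3 backend

    @property
    def total(self) -> int:
        # Total cached prompt tokens across all tiers.
        return self.device + self.host + self.storage


def extract_cached_tokens_details(response: Dict[str, Any]) -> Optional[CachedTokenBreakdown]:
    """Return the per-tier breakdown, or None if the server did not report one.

    ASSUMED payload shape:
    {"sglext": {"cached_tokens_details": {"device": .., "host": .., "storage": ..}}}
    """
    sglext = response.get("sglext") or {}
    details = sglext.get("cached_tokens_details")
    if not isinstance(details, dict):
        # HiCache/LMCache disabled, or the details were not requested.
        return None
    return CachedTokenBreakdown(
        device=int(details.get("device", 0) or 0),
        host=int(details.get("host", 0) or 0),
        storage=int(details.get("storage", 0) or 0),
        storage_backend=details.get("storage_backend"),
    )
```

Missing tiers default to 0, so a payload such as {"sglext": {"cached_tokens_details": {"device": 96, "host": 32}}} parses to device=96, host=32, storage=0, for a total of 128 cached tokens; a response without sglext yields None.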

Changes:

- vllm_request_processor.py: extract cached_tokens_details from sglext; add maybe_enable_cached_token_details(), which sets return_cached_tokens_details on the request when profiling is enabled; extend the on_usage callback signature to carry the details
- profile/state.py: add cached_tokens_device, cached_tokens_host, cached_tokens_storage, and cached_storage_backend fields to StepMetrics and to the CSV output
- app.py: pass cached_tokens_details through the on_usage callback
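
The changes above can be sketched end to end as follows. This is an illustration under stated assumptions, not the PR's code: the signature of maybe_enable_cached_token_details, the top-level placement of the return_cached_tokens_details flag, the trimmed StepMetrics (step and prompt_tokens are hypothetical stand-ins for its other fields), the storage_backend key, the standalone on_usage signature, and summing hits across all requests in a step are all assumed.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, List, Optional


def maybe_enable_cached_token_details(payload: Dict[str, Any], profiling_enabled: bool) -> Dict[str, Any]:
    """Ask the backend for per-tier cache hits only while profiling.

    ASSUMPTION: the flag is a top-level boolean in the request body.
    """
    if profiling_enabled:
        payload.setdefault("return_cached_tokens_details", True)
    return payload


@dataclass
class StepMetrics:
    """Trimmed stand-in for StepMetrics in profile/state.py (hypothetical fields aside from cached_*)."""
    step: int
    prompt_tokens: int = 0
    cached_tokens_device: int = 0
    cached_tokens_host: int = 0
    cached_tokens_storage: int = 0
    cached_storage_backend: str = ""


def on_usage(metrics: StepMetrics, usage: Dict[str, Any],
             cached_tokens_details: Optional[Dict[str, Any]] = None) -> None:
    """Accumulate one request's usage into its step; the details argument is optional."""
    metrics.prompt_tokens += int(usage.get("prompt_tokens", 0))
    if cached_tokens_details:
        metrics.cached_tokens_device += int(cached_tokens_details.get("device", 0))
        metrics.cached_tokens_host += int(cached_tokens_details.get("host", 0))
        metrics.cached_tokens_storage += int(cached_tokens_details.get("storage", 0))
        backend = cached_tokens_details.get("storage_backend")  # HYPOTHETICAL key
        if backend:
            metrics.cached_storage_backend = str(backend)


def to_csv(rows: List[StepMetrics]) -> str:
    """Serialize StepMetrics rows; columns follow the dataclass field order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(StepMetrics)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()
```

Gating the flag on profiling leaves ordinary requests untouched, and defaulting the new columns to 0 and an empty string keeps the CSV schema stable when a backend reports no breakdown.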




Weili-0234 wants to merge 1 commit into ThunderAgent-org:main from Weili-0234:add-cached-tokens-details

