
Add KV cache hit source breakdown (device/host/storage) to step profiling #32

Open
Weili-0234 wants to merge 1 commit into ThunderAgent-org:main from Weili-0234:add-cached-tokens-details

@Weili-0234 · Collaborator

When SGLang has HiCache or LMCache enabled, responses include sglext.cached_tokens_details with per-tier hit counts:

  • device: GPU KV cache
  • host: CPU memory (HiCache L2 / LMCache host buffers)
  • storage: L3 backend (when configured)
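Here is a minimal sketch of reading these per-tier counts. It assumes the response JSON has a top-level `sglext` object whose `cached_tokens_details` maps `device`, `host`, and `storage` to integer token counts, plus an optional `storage_backend` string. The exact payload shape and the helper name `parse_cached_tokens_details` are hypothetical. They are not taken from SGLang's or this PR's code.

```python
from typing import Any, Optional

# Tier names as listed above; the key names in the payload are an assumption.
TIERS = ("device", "host", "storage")


def parse_cached_tokens_details(response: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Hypothetical helper: return per-tier hit counts, or None if not reported."""
    sglext = response.get("sglext") or {}
    details = sglext.get("cached_tokens_details")
    if not isinstance(details, dict):
        # HiCache/LMCache disabled, or the breakdown was not requested.
        return None
    # Missing or null tiers are treated as zero hits.
    parsed: dict[str, Any] = {tier: int(details.get(tier) or 0) for tier in TIERS}
    # Assumed key for the L3 backend identifier (None when no storage tier).
    parsed["storage_backend"] = details.get("storage_backend")
    return parsed


# Example: 512 tokens hit in GPU memory, 256 in host memory, none in storage.
resp = {"sglext": {"cached_tokens_details": {"device": 512, "host": 256, "storage": 0}}}
details = parse_cached_tokens_details(resp)
print(details)
```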

Changes:

  • vllm_request_processor.py: extract cached_tokens_details from sglext; add maybe_enable_cached_token_details(), which requests return_cached_tokens_details when profiling is enabled; extend the on_usage callback signature
  • profile/state.py: add cached_tokens_device/host/storage and cached_storage_backend fields to StepMetrics and CSV output
  • app.py: pass cached_tokens_details through on_usage callback
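The changes above can be sketched end to end as follows. This is illustrative only. It assumes `return_cached_tokens_details` is a top-level boolean in the request body and that the details dict uses the keys `device`/`host`/`storage`/`storage_backend`. The `prompt_tokens`/`cached_tokens` fields, `apply_details`, and `write_csv` are hypothetical stand-ins for whatever StepMetrics and the CSV writer in profile/state.py actually contain.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Any, Optional


def maybe_enable_cached_token_details(payload: dict[str, Any], profiling_enabled: bool) -> dict[str, Any]:
    """Ask the server for per-tier hit counts only when profiling is on."""
    if profiling_enabled:
        # Assumption: the flag is a top-level field in the request body.
        payload = {**payload, "return_cached_tokens_details": True}
    return payload


@dataclass
class StepMetrics:
    step: int
    prompt_tokens: int = 0  # hypothetical pre-existing field
    cached_tokens: int = 0  # hypothetical pre-existing field
    # New per-tier fields; None when the server did not report a breakdown.
    cached_tokens_device: Optional[int] = None
    cached_tokens_host: Optional[int] = None
    cached_tokens_storage: Optional[int] = None
    cached_storage_backend: Optional[str] = None

    def apply_details(self, details: Optional[dict[str, Any]]) -> None:
        """Hypothetical hook called from on_usage with cached_tokens_details."""
        if not details:
            return
        self.cached_tokens_device = details.get("device")
        self.cached_tokens_host = details.get("host")
        self.cached_tokens_storage = details.get("storage")
        self.cached_storage_backend = details.get("storage_backend")


def write_csv(rows: list[StepMetrics]) -> str:
    """Serialize metrics with one CSV column per dataclass field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(StepMetrics)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))  # None is written as an empty cell
    return buf.getvalue()


# Usage: enable the flag, record one step, and dump it as CSV.
payload = maybe_enable_cached_token_details({"model": "m", "prompt": "hi"}, profiling_enabled=True)
m = StepMetrics(step=0, prompt_tokens=1024, cached_tokens=768)
m.apply_details({"device": 512, "host": 256, "storage": 0})
print(write_csv([m]))
```

Leaving the per-tier fields as None rather than 0 keeps "not reported" distinguishable from "zero hits" in the CSV.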

ergt10 pushed a commit to ergt10/ThunderAgent that referenced this pull request Apr 4, 2026
…g#32)

# What does this PR do?


Improves installation instructions in readme and docs.

The quick start in the README is meant for a single node with 4 GPUs. (Ray
need not even be installed, because `ray.init` will start a cluster if
one is not running.)

The installation guide is also updated. I have verified that the
quickstart works on a fresh instance.

---------

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>