
Add SGLang scheduler to SimAI Vidur: chunked prefill + RadixAttention prefix caching#223

Draft
Copilot wants to merge 2 commits into master from copilot/merge-sglang-for-simulation


Copilot AI commented Feb 28, 2026

SimAI had no way to simulate SGLang's runtime scheduling behavior. This adds a first-class sglang replica scheduler to the Vidur inference simulator that models SGLang's two core performance features.

Changes

New: SglangReplicaScheduler

  • Chunked prefill – identical semantics to Sarathi-Serve; breaks long prompts into chunk_size-token chunks interleaved with decode iterations
  • RadixAttention prefix caching – parameterized via prefix_cache_hit_rate (0.0–1.0):
    • Reduces KV-block allocation: only ceil((1 − r) × prefill_tokens / block_size) fresh blocks are allocated per request
    • Fast-forwards num_processed_tokens past the cached portion in the first iteration, reducing the number of prefill chunks proportionally
    • Decode-phase block tracking accounts for the "virtual" cached token capacity
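The fresh-block allocation rule above can be sketched in a few lines of Python. This is illustrative arithmetic only, not the actual Vidur scheduler internals; the function name and signature are hypothetical:

```python
import math

def fresh_blocks_needed(prefill_tokens: int, hit_rate: float, block_size: int) -> int:
    """KV blocks that must be newly allocated when a fraction `hit_rate`
    of the prompt's KV cache is served from the radix-tree prefix cache."""
    uncached_tokens = (1.0 - hit_rate) * prefill_tokens
    return math.ceil(uncached_tokens / block_size)

# 2048-token prompt, 16-token blocks, 75% prefix hit rate:
# only ceil(0.25 * 2048 / 16) = 32 fresh blocks instead of 128.
print(fresh_blocks_needed(2048, 0.75, 16))  # -> 32
print(fresh_blocks_needed(2048, 0.0, 16))   # -> 128
```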

Config: SglangSchedulerConfig

New dataclass registered under ReplicaSchedulerType.SGLANG = 7:

| Field | Default | Purpose |
| --- | --- | --- |
| `chunk_size` | 512 | Prefill chunk size (tokens) |
| `enable_prefix_caching` | `True` | RadixAttention toggle |
| `prefix_cache_hit_rate` | 0.0 | Fraction of prefill tokens served from cache |
| `max_tokens_in_batch` | 4096 | Batch token budget |
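A minimal sketch of what such a config dataclass might look like, with defaults taken from the table above. The validation in `__post_init__` and the exact base class/registration hook are assumptions, not the actual Vidur code:

```python
from dataclasses import dataclass

@dataclass
class SglangSchedulerConfig:
    # Defaults mirror the table above.
    chunk_size: int = 512                # prefill chunk size in tokens
    enable_prefix_caching: bool = True   # RadixAttention toggle
    prefix_cache_hit_rate: float = 0.0   # fraction of prefill tokens served from cache
    max_tokens_in_batch: int = 4096      # per-iteration token budget

    def __post_init__(self) -> None:
        # Hypothetical sanity check: the hit rate is a fraction in [0, 1].
        if not 0.0 <= self.prefix_cache_hit_rate <= 1.0:
            raise ValueError("prefix_cache_hit_rate must be in [0.0, 1.0]")
```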

Example

python -m vidur.main \
  --replica_config_model_name meta-llama/Meta-Llama-3-8B \
  --replica_scheduler_config_type sglang \
  --sglang_scheduler_config_chunk_size 512 \
  --sglang_scheduler_config_enable_prefix_caching \
  --sglang_scheduler_config_prefix_cache_hit_rate 0.7 \
  --sglang_scheduler_config_max_tokens_in_batch 4096 \
  ...

Choosing prefix_cache_hit_rate: 0.0 for random prompts, 0.3–0.5 for few-shot workloads, 0.7–0.95 for workloads with long shared system prompts. See README-vidur.md for the full guidance table.
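As a worked example of how the hit rate shrinks the prefill schedule via the fast-forward described above (illustrative arithmetic, not the simulator's actual code path):

```python
import math

def prefill_chunks(prefill_tokens: int, chunk_size: int, hit_rate: float = 0.0) -> int:
    """Number of chunked-prefill iterations after fast-forwarding
    num_processed_tokens past the cached prefix."""
    cached = int(hit_rate * prefill_tokens)
    remaining = prefill_tokens - cached
    return math.ceil(remaining / chunk_size)

# 8192-token prompt with 512-token chunks:
print(prefill_chunks(8192, 512))       # no caching -> 16 chunks
print(prefill_chunks(8192, 512, 0.7))  # 70% hit rate -> 5 chunks
```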

Notes

  • Since the simulator operates at token-count granularity (not actual token values), exact radix-tree prefix matching is approximated by the hit-rate parameter. The memory savings are modeled correctly; execution time for the first prefill chunk is conservatively overestimated by the cached-token portion.
  • README.md and README-vidur.md updated with CLI reference and usage guidance.



Co-authored-by: tianhao909 <48342395+tianhao909@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Integrate sglang for seamless simulation" to "Add SGLang scheduler to SimAI Vidur: chunked prefill + RadixAttention prefix caching" on Feb 28, 2026.
