[FEAT] Add replay from trace strategy #620

Draft
VincentG1234 wants to merge 4 commits into vllm-project:main from VincentG1234:add-strategy-replay-from-trace

Conversation


@VincentG1234 commented Mar 4, 2026

Summary

  • Add a new replay benchmarking strategy that reproduces real-world request patterns from trace log files (.jsonl)
  • Enable time-based request rate replay with precise timestamp scheduling
  • Support synthetic prompt generation that matches token counts from trace files
  • Add max_requests and max_seconds CLI options to limit the number of requests processed from a trace
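As a sketch of the trace format described above (the three field names come from this PR; the concrete values and any additional schema details are assumptions), a `.jsonl` trace holds one JSON object per line and can be parsed directly:

```python
import json

# Hypothetical example of the .jsonl trace format: one JSON object per
# line with timestamp, input_length, and output_length fields.
TRACE_JSONL = """\
{"timestamp": 0.0, "input_length": 512, "output_length": 128}
{"timestamp": 0.8, "input_length": 256, "output_length": 64}
{"timestamp": 2.5, "input_length": 1024, "output_length": 256}
"""

# Parse each non-empty line into a dict.
records = [json.loads(line) for line in TRACE_JSONL.splitlines() if line.strip()]
print(len(records))                 # 3
print(records[0]["input_length"])   # 512
```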

Motivation

This change addresses issue #597 by enabling users to benchmark their vLLM servers using real production traces. Instead of synthetic load patterns, users can now replay exact request arrival times and token distributions from their actual workloads for more realistic performance testing.

Changes

  • Add TraceReplayStrategy scheduler strategy for timestamp-based request dispatching
  • Add ReplayProfile class for configuring trace-based benchmarking parameters
  • Add TraceSyntheticDatasetDeserializer to generate prompts matching trace input/output lengths
  • Add TraceReader utility for reading .jsonl trace files with timestamp, input_length, output_length fields
  • Update Entrypoint to handle replay profile and dataset configuration
  • Add max_requests and max_seconds truncation support to limit trace replay length
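The reading-plus-truncation and timestamp-based dispatch described in the bullets above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the function names `read_trace` and `replay` and the `TraceRecord` dataclass are hypothetical stand-ins for `TraceReader` and `TraceReplayStrategy`, and timestamps are assumed to be seconds relative to trace start.

```python
import json
import time
from dataclasses import dataclass


@dataclass
class TraceRecord:
    timestamp: float      # seconds relative to trace start (assumed)
    input_length: int     # prompt tokens
    output_length: int    # completion tokens


def read_trace(path, max_requests=None, max_seconds=None):
    """Sketch of trace reading: parse .jsonl and apply truncation limits."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            obj = json.loads(line)
            rec = TraceRecord(obj["timestamp"], obj["input_length"], obj["output_length"])
            # Stop once a record falls outside the max_seconds window.
            if max_seconds is not None and rec.timestamp > max_seconds:
                break
            records.append(rec)
            # Stop once max_requests records have been collected.
            if max_requests is not None and len(records) >= max_requests:
                break
    return records


def replay(records, dispatch):
    """Sketch of timestamp-based scheduling: sleep until each record's
    offset from replay start, then dispatch the request."""
    start = time.monotonic()
    for rec in records:
        delay = rec.timestamp - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        dispatch(rec)
```

In this sketch `dispatch` would be whatever callable sends the synthetic request; the real strategy would integrate with GuideLLM's scheduler rather than sleeping in a loop.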

Testing

  • pytest tests/unit/scheduler/test_trace_replay.py (pass)

  • pytest tests/unit/benchmark/test_replay_profile.py (pass)

  • pytest tests/unit/data/deserializers/test_trace_synthetic.py (pass)

  • Added tests: scheduling accuracy, boundary conditions, malformed trace handling, empty trace cases, max_requests truncation

  • Verified quickly in practice with a Colab notebook

Notes / Risks

  • Minimal implementation focused on core trace replay functionality
  • Requires manual verification with real trace files before merging

Next Steps (this PR)

  1. Apply reviewer feedback and address any comments
  2. Add handwritten integration tests with sample trace files
  3. Add E2E tests verifying end-to-end trace replay flow
  4. Add CLI usage examples in PR description and docs
  5. Update documentation with trace replay guide

Out of Scope (future PRs)

  • Mooncake trace format support (token-level traces)
  • Helper utilities for timestamp format conversions (Unix epoch, ISO8601, relative timestamps)
  • Support for request payload traces (not just token counts)
  • Trace file validation and schema verification tools
  • Performance optimizations for large trace files (streaming, chunked processing)
  • Metrics export formatted for trace analysis comparison
  • Support for trace file compression formats (.gz, .bz2)

Add trace replay capability to GuideLLM for reproducing real-world request
patterns from trace files. This enables time-based request rate replay
and synthetic prompt generation matching trace token counts.

- Add TraceReplayStrategy for scheduling requests at precise timestamps
- Add ReplayProfile for configuring trace-based benchmarking
- Add TraceSyntheticDatasetDeserializer for generating prompts from traces
- Support max_requests truncation to limit trace length

This is a minimal implementation to address issue 597. Full Mooncake
format support, E2E tests, and documentation will follow in subsequent PRs.

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
@VincentG1234 force-pushed the add-strategy-replay-from-trace branch from 008633f to a66034b on March 4, 2026 13:32