[FEAT] Add replay from trace strategy #620
Draft
VincentG1234 wants to merge 4 commits into vllm-project:main from
Conversation
Add trace replay capability to GuideLLM for reproducing real-world request patterns from trace files. This enables time-based request rate replay and synthetic prompt generation matching trace token counts.

- Add TraceReplayStrategy for scheduling requests at precise timestamps
- Add ReplayProfile for configuring trace-based benchmarking
- Add TraceSyntheticDatasetDeserializer for generating prompts from traces
- Support max_requests truncation to limit trace length

This is a minimal implementation to address issue #597. Full Mooncake format support, E2E tests, and documentation will follow in subsequent PRs.

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
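The "synthetic prompt generation matching trace token counts" mentioned above can be illustrated with a minimal sketch. The helper name `synthetic_prompt` and the whitespace-token approximation are assumptions for illustration; the actual deserializer presumably counts tokens with the model's tokenizer rather than by splitting on spaces.

```python
def synthetic_prompt(input_length: int, filler: str = "the") -> str:
    """Hypothetical helper: build a placeholder prompt whose whitespace
    token count matches the trace record's input_length."""
    return " ".join([filler] * input_length)

prompt = synthetic_prompt(4)
print(prompt)               # four filler tokens joined by spaces
print(len(prompt.split()))  # 4, matching the requested input_length
```

In the real implementation the generated prompt would be paired with the trace's output_length as the requested completion budget for that request.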
Summary
- `replay` benchmarking strategy that reproduces real-world request patterns from trace log files (`.jsonl`)
- `max_requests` and `max_seconds` CLI options to limit the number of requests processed from a trace

Motivation
This change addresses issue #597 by enabling users to benchmark their vLLM servers using real production traces. Instead of synthetic load patterns, users can now replay exact request arrival times and token distributions from their actual workloads for more realistic performance testing.
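Replaying exact request arrival times reduces to normalizing the trace's timestamps to offsets from the first request, then sleeping until each offset relative to the replay start. This is a minimal sketch of that idea, not the PR's actual TraceReplayStrategy; the function names and the injectable `clock`/`sleep` parameters are assumptions for illustration.

```python
import time

def replay_offsets(timestamps):
    # Normalize absolute trace timestamps to offsets from the first request.
    t0 = timestamps[0]
    return [t - t0 for t in timestamps]

def replay(timestamps, dispatch, clock=time.monotonic, sleep=time.sleep):
    # Dispatch one request per trace timestamp, preserving the original
    # inter-arrival gaps relative to the moment replay starts.
    start = clock()
    for offset in replay_offsets(timestamps):
        delay = start + offset - clock()
        if delay > 0:  # skip sleeping if we are already running late
            sleep(delay)
        dispatch(offset)
```

Using a monotonic clock avoids drift from wall-clock adjustments, and skipping negative delays lets a slow dispatcher catch up instead of stalling.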
Changes
- `TraceReplayStrategy` scheduler strategy for timestamp-based request dispatching
- `ReplayProfile` class for configuring trace-based benchmarking parameters
- `TraceSyntheticDatasetDeserializer` to generate prompts matching trace input/output lengths
- `TraceReader` utility for reading `.jsonl` trace files with `timestamp`, `input_length`, `output_length` fields
- `Entrypoint` to handle replay profile and dataset configuration
- `max_requests` and `max_seconds` truncation support to limit trace replay length

Testing
- `pytest tests/unit/scheduler/test_trace_replay.py` (pass)
- `pytest tests/unit/benchmark/test_replay_profile.py` (pass)
- `pytest tests/unit/data/deserializers/test_trace_synthetic.py` (pass)
- Added tests: scheduling accuracy, boundary conditions, malformed trace handling, empty trace cases, max_requests truncation
- Tested in practice with a quick Colab notebook run
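The max_requests truncation and malformed-trace cases exercised by the tests above can be sketched with a minimal `.jsonl` reader. This is an illustrative stand-in for the PR's TraceReader, assuming only the field names stated in the changes (`timestamp`, `input_length`, `output_length`); the function name and return shape are hypothetical.

```python
import io
import json

def read_trace(fp, max_requests=None):
    """Parse a .jsonl trace: one JSON object per line with timestamp,
    input_length, and output_length fields. Stops after max_requests
    records when a limit is given (the truncation behavior under test)."""
    records = []
    for line in fp:
        line = line.strip()
        if not line:  # tolerate blank lines between records
            continue
        obj = json.loads(line)  # raises on malformed JSON lines
        records.append((obj["timestamp"], obj["input_length"], obj["output_length"]))
        if max_requests is not None and len(records) >= max_requests:
            break
    return records

trace = io.StringIO(
    '{"timestamp": 0.0, "input_length": 128, "output_length": 64}\n'
    '{"timestamp": 0.5, "input_length": 256, "output_length": 32}\n'
    '{"timestamp": 1.2, "input_length": 64, "output_length": 16}\n'
)
print(read_trace(trace, max_requests=2))  # [(0.0, 128, 64), (0.5, 256, 32)]
```

An empty file yields an empty list, and a malformed line surfaces as a `json.JSONDecodeError`, matching the empty-trace and malformed-trace test cases listed above.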
Notes / Risks
Next Steps (this PR)
Out of Scope (future PRs)