Signed-off-by: Samuel Monson <smonson@redhat.com>
jaredoconnell left a comment
I'm not 100% sure of the arg name. I can see arguments both ways.
But I think this would be a good opportunity to add a markdown file detailing all of the constraints.
I added two comments.
    """
    Constraint that limits execution based on minimum request counts.

    Like MinNumberConstraint but instead of stopping request generation after reaching
Mistake: It should say "Like MaxNumberConstraint"
I think this wording doesn't emphasize the nuances of this implementation enough. Maybe clarify generation and processing, and why this may be helpful. It's identical except that it doesn't stop queueing until the max processed quantity is reached.
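To make the generation/processing distinction concrete, here is a minimal sketch. It is not guidellm's implementation: `Counts`, `max_style_should_queue`, and `min_style_should_queue` are hypothetical names used only for illustration, and the real constraint classes may track state differently.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical scheduler counters, not guidellm's real state object."""

    created: int = 0    # requests handed to the queue so far
    processed: int = 0  # requests that have finished


def max_style_should_queue(counts: Counts, limit: int) -> bool:
    # Max-style: stop generating once `limit` requests have been created,
    # so the remaining in-flight requests drain under falling load.
    return counts.created < limit


def min_style_should_queue(counts: Counts, limit: int) -> bool:
    # Min-style: keep generating until `limit` requests have been processed,
    # so load stays steady while the measured requests finish.
    return counts.processed < limit


# 50 requests created, 8 of them still in flight.
counts = Counts(created=50, processed=42)
print(max_style_should_queue(counts, 50))  # False
print(min_style_should_queue(counts, 50))  # True
```

With the same counters, the max-style check has already stopped queueing while the min-style check keeps going, which is the nuance the docstring could spell out.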
    @ConstraintsInitializerFactory.register(  # type: ignore[arg-type]
        ["min_number", "min_num", "min_requests", "min_req"]
It may make sense to instead rename this to max-processed. I think this would be less confusing. But I can see the argument for min, since it's going to keep scheduling past that until the max-processed is reached. So I'm not sure what should be done.
There was a problem hiding this comment.
Yeah... max-processed is both a little too vague and also incorrect since we can end up processing more requests than set. I think min is fine actually. I'll just add some notes to the docs that clarify constraints are OR not AND. Maybe in the future we can support AND constraint combinations.
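As a rough illustration of the OR behavior described here, the sketch below treats each constraint as a predicate that returns True when it wants the run to stop. The names (`should_stop`, `min_requests_reached`, `max_seconds_reached`) and the dictionary-based state are assumptions made for this example, not guidellm's API.

```python
from typing import Callable, Dict, List

# A hypothetical constraint: returns True when it wants the benchmark to stop.
Constraint = Callable[[Dict[str, float]], bool]


def min_requests_reached(limit: int) -> Constraint:
    # Fires once at least `limit` requests have been processed.
    return lambda state: state["processed"] >= limit


def max_seconds_reached(limit: float) -> Constraint:
    # Fires once at least `limit` seconds have elapsed.
    return lambda state: state["elapsed"] >= limit


def should_stop(constraints: List[Constraint], state: Dict[str, float]) -> bool:
    # OR semantics: any single constraint firing ends the run.
    return any(check(state) for check in constraints)


constraints = [min_requests_reached(50), max_seconds_reached(10.0)]

# Only 20 requests processed, but the time limit has been hit.
state = {"processed": 20, "elapsed": 10.0}
print(should_stop(constraints, state))  # True
```

Under these assumptions, the time constraint ends the run before 50 requests are processed, which is why "min" only holds when no other constraint fires first. An AND combination would swap `any` for `all`.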
Summary

Adds a --min-requests constraint that acts like --max-requests but keeps scheduling requests until the last request under the threshold completes.

Details

The normal --max-requests variant can have unexpectedly low per-request throughput / latency due to request trail-off at the end of the benchmark. The current solution to this problem is to set --max-requests so high that the proportion of trail-off time to total benchmark time is small. If we continue to schedule requests even after hitting the constraint we ensure that the requested rate is maintained for the entire duration of measurement.

Note that --min-requests is a bit of a misnomer when combined with other constraints, since any other active constraints can trigger the benchmark to end before --min-requests. Other name suggestions are welcome.

Test Plan

Here is an example benchmark which should cause --min-requests to behave differently from --max-requests:

    guidellm benchmark run \
        --target http://127.0.0.1:8000 \
        --request-format /v1/completions \
        --profile concurrent \
        --rate 30 \
        --data prompt_tokens=50,output_tokens=50 \
        --min-requests 50

Use of AI
## WRITTEN BY AI ##)