To test end-to-end performance for QLM across different benchmarks, QLM needs to be set up as a server so it can serve requests.
I have a sample implementation that I've forked for my use case:
https://gist.github.com/vikranth22446/b2544d1a83e9f69401442661a2c579cf
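For orientation, a minimal sketch of what such a server wrapper looks like is below. This is a hypothetical stand-in, not the gist's actual code: it exposes a completions-style HTTP endpoint and echoes the prompt back, where a real QLM server would invoke the engine instead.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class CompletionHandler(BaseHTTPRequestHandler):
    """Hypothetical completions endpoint; a real QLM wrapper would call
    the QLM engine here instead of echoing the prompt."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        req = json.loads(self.rfile.read(length) or b"{}")
        # Return a completion-shaped JSON body so an HTTP benchmark
        # client has a well-formed response to measure against.
        body = json.dumps(
            {"choices": [{"text": f"echo: {req.get('prompt', '')}"}]}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging to keep benchmark output clean.
        pass

def serve(port: int = 8000) -> HTTPServer:
    """Start the server on a background thread and return it,
    so the caller can shut it down when the benchmark finishes."""
    server = HTTPServer(("127.0.0.1", port), CompletionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The benchmark client then only needs the host, port, and endpoint path to drive load against it.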
Ideally, this can be tested with a benchmark like:
https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py
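A rough sketch of running that benchmark against a local server follows; the exact flags vary between vLLM versions, and the model name and port here are placeholders, so check `python benchmark_serving.py --help` in your checkout before running.

```shell
# Hypothetical invocation: point the benchmark at the locally running
# server (host/port and model name are assumptions, not from the source).
python benchmark_serving.py \
  --backend openai \
  --host localhost \
  --port 8000 \
  --model my-model \
  --num-prompts 100 \
  --request-rate 2
```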