To test end-to-end performance for QLM across different benchmarks, QLM needs to be set up as a server so it can serve requests.
I have a sample implementation that I've forked for my use case:
https://gist.github.com/vikranth22446/b2544d1a83e9f69401442661a2c579cf
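For orientation, a minimal sketch of what such a server wrapper looks like is below. This is a hypothetical stand-in, not the gist's actual code: it exposes a completions-style HTTP endpoint and echoes the prompt back, where a real QLM server would invoke the engine instead.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class CompletionHandler(BaseHTTPRequestHandler):
    """Hypothetical completions endpoint; a real QLM wrapper would call
    the QLM engine here instead of echoing the prompt."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        req = json.loads(self.rfile.read(length) or b"{}")
        # Return a completion-shaped JSON body so an HTTP benchmark
        # client has a well-formed response to measure against.
        body = json.dumps(
            {"choices": [{"text": f"echo: {req.get('prompt', '')}"}]}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging to keep benchmark output clean.
        pass

def serve(port: int = 8000) -> HTTPServer:
    """Start the server on a background thread and return it,
    so the caller can shut it down when the benchmark finishes."""
    server = HTTPServer(("127.0.0.1", port), CompletionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The benchmark client then only needs the host, port, and endpoint path to drive load against it.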
Ideally, this can be tested with a benchmark like:
https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py
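A rough sketch of running that benchmark against a local server follows; the exact flags vary between vLLM versions, and the model name and port here are placeholders, so check `python benchmark_serving.py --help` in your checkout before running.

```shell
# Hypothetical invocation: point the benchmark at the locally running
# server (host/port and model name are assumptions, not from the source).
python benchmark_serving.py \
  --backend openai \
  --host localhost \
  --port 8000 \
  --model my-model \
  --num-prompts 100 \
  --request-rate 2
```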