Lack of a server implementation for end-to-end performance testing #1

@vikranth22446

Description

To test end-to-end performance for QLM against different benchmarks, qlm needs to be set up as a server that can serve requests.

I have a sample implementation that I've forked for my use case:
https://gist.github.com/vikranth22446/b2544d1a83e9f69401442661a2c579cf

Ideally, this can be tested with a benchmark like:
https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py
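For reference, a serving front end along these lines can be sketched with the Python standard library alone. This is a minimal illustration, not QLM's actual API: the `/generate` route, the request/response fields, and the `echo:` placeholder standing in for real inference are all assumptions for the sketch.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class QLMRequestHandler(BaseHTTPRequestHandler):
    """Minimal HTTP front end. The completion below is a placeholder
    where a real QLM inference call would go."""

    def do_POST(self):
        if self.path != "/generate":  # hypothetical route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        prompt = payload.get("prompt", "")
        # Placeholder for the actual model call; a real server would
        # run inference here and stream or return generated tokens.
        body = json.dumps({"text": f"echo: {prompt}"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), QLMRequestHandler)


if __name__ == "__main__":
    serve().serve_forever()
```

A benchmark like benchmark_serving.py drives such an endpoint with concurrent requests and measures throughput and latency, so the main design concern for a real implementation is request batching rather than the HTTP plumbing shown here.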
