Data-parallel deployments with the same (model, slo) on QLM are sent to the same virtual queue engine #2

@vikranth22446

Description

If a model has more than one deployment running in data parallel, the current code won't send requests to the second instance:

if (request.model, request.slo) in self.model_slo_group_bimap:

Data-parallel model deployments are a very common use case when trying to scale up a system.
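To make the request concrete, here is a minimal sketch of one possible fix. It is not QLM's code. It assumes the problem is that the `(model, slo)` lookup resolves to a single virtual queue, and it swaps in a plain round-robin over all queues registered for that key. `ReplicaRouter`, `register`, `route`, and the `vq-*` queue ids are all hypothetical names made up for illustration.

```python
from collections import defaultdict
from itertools import cycle


class ReplicaRouter:
    """Hypothetical router: one (model, slo) key maps to many queue ids."""

    def __init__(self):
        self._replicas = defaultdict(list)  # (model, slo) -> [queue_id, ...]
        self._cursors = {}                  # (model, slo) -> round-robin iterator

    def register(self, model, slo, queue_id):
        """Add one data-parallel replica for this (model, slo) pair."""
        key = (model, slo)
        self._replicas[key].append(queue_id)
        # Rebuild the iterator so the new replica joins the rotation.
        self._cursors[key] = cycle(list(self._replicas[key]))

    def route(self, model, slo):
        """Return the next queue id for this (model, slo), round-robin."""
        key = (model, slo)
        if key not in self._cursors:
            raise KeyError(f"no deployment registered for {key}")
        return next(self._cursors[key])


router = ReplicaRouter()
router.register("llama", 1.0, "vq-0")  # first replica (hypothetical ids)
router.register("llama", 1.0, "vq-1")  # second replica, same (model, slo)
picks = [router.route("llama", 1.0) for _ in range(4)]
print(picks)
```

With two replicas registered under the same key, consecutive requests alternate between them instead of all landing on the first one. A real fix would probably want load-aware selection rather than strict round-robin, but that depends on how the virtual queues report their state.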
