If more than one deployment of the same model is running in data parallel, the current code won't send requests to the second instance.
    if (request.model, request.slo) in self.model_slo_group_bimap:
Data-parallel model deployments are a very common use case when trying to scale up a system.
QLM/qlm/queue/virtual_queue_engine.py
Line 42 in eea5b62
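To illustrate the behavior I'd expect, here is a minimal sketch of round-robin dispatch across replicas that share the same (model, slo) key. `ReplicaRouter`, `register`, and `pick` are hypothetical names for illustration only and are not part of QLM; how this would hook into the virtual queue engine is an assumption on my side.

```python
from collections import defaultdict
from itertools import count


class ReplicaRouter:
    """Hypothetical helper (not QLM code): tracks every replica per (model, slo)."""

    def __init__(self):
        # (model, slo) -> list of instance ids serving that key
        self._replicas = defaultdict(list)
        # (model, slo) -> monotonically increasing counter used for round-robin
        self._counters = defaultdict(count)

    def register(self, model, slo, instance_id):
        # Keep all replicas instead of only the first one seen for this key.
        self._replicas[(model, slo)].append(instance_id)

    def pick(self, model, slo):
        # .get() avoids creating an empty entry for unknown keys.
        replicas = self._replicas.get((model, slo))
        if not replicas:
            raise KeyError(f"no instance registered for {(model, slo)!r}")
        # Rotate through the replicas so each one receives requests in turn.
        idx = next(self._counters[(model, slo)]) % len(replicas)
        return replicas[idx]
```

With two instances registered under the same key, successive `pick` calls alternate between them, so the second instance also receives traffic.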