I have adapted FireredASR-LLM to run on the community release of vLLM 0.10.1.
In testing on a local single-GPU H20 environment, throughput reaches approximately 1200 QPM (queries per minute) with a batch size of 16. I would appreciate your feedback on these results.
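For reference, here is a rough sketch of how the figure above translates into per-second throughput and an implied per-request latency. It assumes the batch size of 16 corresponds to 16 requests in flight at steady state, which is an assumption on my part and not something I measured directly. The helper names are made up for illustration.

```python
def qpm_to_rps(qpm: float) -> float:
    """Convert queries per minute to requests per second."""
    return qpm / 60.0


def implied_latency_s(qpm: float, concurrency: int) -> float:
    """Estimate mean per-request latency via Little's law (L = lambda * W).

    Assumes `concurrency` requests are in flight at steady state,
    so W = L / lambda.
    """
    return concurrency / qpm_to_rps(qpm)


if __name__ == "__main__":
    qpm = 1200.0     # approximate figure reported above
    batch_size = 16  # assumed equal to steady-state concurrency
    print(f"{qpm_to_rps(qpm):.1f} req/s")
    print(f"{implied_latency_s(qpm, batch_size):.2f} s per request")
```

Under that assumption, 1200 QPM works out to about 20 requests per second and roughly 0.8 s per request on average.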
Additionally, is it possible to modify the FireredASR-LLM structure to better fit vLLM's loading workflow?