[bugfix] fix download model vllm_engine#8062
Conversation
Summary of Changes
Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request refactors the initialization logic across several inference engines, including LmdeployEngine, SglangEngine, and VllmEngine, so that the model is downloaded even when a template is provided.
Activity
Code Review
This pull request fixes a bug where the model was not being downloaded when a pre-existing template was provided to the inference engines. The fix involves calling _get_processor() unconditionally in LmdeployEngine, SglangEngine, and VllmEngine to ensure the model files are always available locally.
While this effectively resolves the bug, I've noted a small inefficiency in the implementation. The processor object created by _get_processor() is discarded if a template is already provided. I've left a comment with a suggestion for a future refactoring to improve this.
Overall, the change is a good and simple fix for the problem.
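To make the described fix concrete, here is a minimal sketch of the pattern the review refers to; the class and method names (VllmEngine, _get_processor, _get_template) come from the discussion, while the surrounding __init__ body is an assumption, not the actual PR diff:

```python
class VllmEngine:
    """Hypothetical, simplified sketch of the fix described above."""

    def __init__(self, model_id_or_path, template=None):
        self.model_id_or_path = model_id_or_path
        # The fix: build the processor unconditionally, which forces the model
        # files to be downloaded even when a ready-made template is passed in.
        processor = self._get_processor()
        if template is None:
            template = self._get_template(processor)
        # else: the processor is discarded -- the inefficiency noted in the review
        self.default_template = template

    def _get_processor(self):
        ...  # engine-specific: downloads the model and returns its tokenizer/processor

    def _get_template(self, processor):
        ...  # builds a default chat template from the processor
```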
…vllm' into fix_download_model_vllm
/gemini review
Code Review
This pull request addresses a bug where the model was not being downloaded when a template was provided to the inference engines. The fix involves calling get_model_info_meta with download_model=True in LmdeployEngine, SglangEngine, and VllmEngine. The change correctly resolves the issue. However, this fix introduces code duplication across the three engine classes. I've added a comment with a suggestion to refactor this duplicated logic into the base InferEngine class to improve maintainability.
```python
else:
    get_model_info_meta(
        model_id_or_path, hub_token=hub_token, use_hf=use_hf, revision=revision, download_model=True)
```
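Read in context, the quoted else branch presumably sits inside each engine's template setup, roughly as in the sketch below; the if branch and the attribute name default_template are assumptions based on the review comments, not the exact PR diff:

```python
# Hypothetical surrounding context for the quoted else branch.
if template is None:
    processor = self._get_processor()  # building the processor downloads the model
    self.default_template = self._get_template(processor)
else:
    # A template was supplied, so the processor path is skipped; fetch the model
    # metadata explicitly with download_model=True to force the local download.
    get_model_info_meta(
        model_id_or_path, hub_token=hub_token, use_hf=use_hf, revision=revision, download_model=True)
    self.default_template = template
```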
This logic for ensuring the model is downloaded when a template is provided is duplicated across LmdeployEngine, SglangEngine, and VllmEngine. This code duplication can make future maintenance more difficult.
To improve maintainability, consider refactoring this logic into a shared method in the InferEngine base class. For example, you could create a method that handles template initialization and model downloading, which can then be called from each engine's __init__ method.
Example of a refactored method in InferEngine:
```python
# In swift/infer_engine/infer_engine.py
from abc import abstractmethod
from typing import Optional

from swift.model import get_model_info_meta


class InferEngine(BaseInferEngine, ProcessorMixin):
    # ...

    def _init_template_and_model(self, template: Optional[Template]) -> Template:
        if template is None:
            processor = self._get_processor()
            template = self._get_template(processor)
        else:
            # Ensure the model is downloaded even when a template is provided
            get_model_info_meta(
                self.model_id_or_path,
                hub_token=self.hub_token,
                use_hf=self.use_hf,
                revision=self.revision,
                download_model=True)
        return template

    @abstractmethod
    def _get_processor(self):
        raise NotImplementedError
```

Each engine subclass could then simplify its initialization. While this refactoring would involve changes to a file not in this PR, it's a valuable improvement to consider for a follow-up.
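As a hedged illustration of that suggested follow-up, a subclass might then delegate to the base-class helper; the helper's name comes from the example above, while the subclass's __init__ signature is an assumption:

```python
class VllmEngine(InferEngine):
    """Hypothetical sketch; the real engine takes many more parameters."""

    def __init__(self, model_id_or_path, template=None, *,
                 hub_token=None, use_hf=None, revision=None, **kwargs):
        self.model_id_or_path = model_id_or_path
        self.hub_token = hub_token
        self.use_hf = use_hf
        self.revision = revision
        # One shared call replaces the download/template logic duplicated in each engine.
        self.default_template = self._init_template_and_model(template)

    def _get_processor(self):
        ...  # engine-specific: build the processor, downloading the model files if needed
```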
No description provided.