Dear authors,
Thanks for your great work! I have noticed some differences between the training code and the description in the paper.
- For the projector pre-training stage, the paper states that all model weights are frozen except those in the projector. However, I noticed that the code still trains the embed_tokens and saves them in the model.initialize_vision_tokenizer part.
- For the VLM fine-tuning stage, the paper states that only the LoRA modules and the projector are updated while all other weights stay frozen. However, I noticed that the code still trains other parameters, including "vision_tower", "mm_projector", "embed_tokens", and "lm_head".
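For reference, here is a minimal sketch of how one can check which parameters actually receive gradients. The toy model and the freezing logic are my own illustration (assuming a standard PyTorch module with the attribute names mentioned above), not the authors' actual code:

```python
import torch.nn as nn

# Hypothetical stand-in for the VLM; attribute names mirror those in the issue.
class ToyVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_tower = nn.Linear(8, 8)
        self.mm_projector = nn.Linear(8, 8)
        self.embed_tokens = nn.Embedding(10, 8)
        self.lm_head = nn.Linear(8, 10)

def trainable_names(model: nn.Module) -> list[str]:
    """List the names of all parameters with requires_grad=True."""
    return sorted(n for n, p in model.named_parameters() if p.requires_grad)

model = ToyVLM()

# Intended setup for the projector pre-training stage per the paper:
# freeze everything, then unfreeze only the projector.
for p in model.parameters():
    p.requires_grad = False
for p in model.mm_projector.parameters():
    p.requires_grad = True

print(trainable_names(model))
# → ['mm_projector.bias', 'mm_projector.weight']
# If e.g. 'embed_tokens.weight' also appears when run against the real
# training setup, the code diverges from the paper's description.
```

Running an equivalent check against the actual training script before the optimizer step would make the discrepancy (or its absence) easy to confirm.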
Could you explain these discrepancies, please?