Dear authors,
Thanks for your great work! I have noticed some differences between the training code and the description in the paper.
- For the projector pre-training stage, the paper states that all model weights are frozen except those in the projector. However, I noticed that the code still trains the embed_tokens and saves them in the model.initialize_vision_tokenizer part.
- For the VLM fine-tuning stage, the paper states that only the LoRA modules and the projector are updated while all other weights stay frozen. However, I noticed that the code still trains other parameters, including "vision_tower", "mm_projector", "embed_tokens", and "lm_head".
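For reference, here is a minimal sketch of how one can check which parameters actually receive gradients. The toy model and the freezing logic are my own illustration (assuming a standard PyTorch module with the attribute names mentioned above), not the authors' actual code:

```python
import torch.nn as nn

# Hypothetical stand-in for the VLM; attribute names mirror those in the issue.
class ToyVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_tower = nn.Linear(8, 8)
        self.mm_projector = nn.Linear(8, 8)
        self.embed_tokens = nn.Embedding(10, 8)
        self.lm_head = nn.Linear(8, 10)

def trainable_names(model: nn.Module) -> list[str]:
    """List the names of all parameters with requires_grad=True."""
    return sorted(n for n, p in model.named_parameters() if p.requires_grad)

model = ToyVLM()

# Intended setup for the projector pre-training stage per the paper:
# freeze everything, then unfreeze only the projector.
for p in model.parameters():
    p.requires_grad = False
for p in model.mm_projector.parameters():
    p.requires_grad = True

print(trainable_names(model))
# → ['mm_projector.bias', 'mm_projector.weight']
# If e.g. 'embed_tokens.weight' also appears when run against the real
# training setup, the code diverges from the paper's description.
```

Running an equivalent check against the actual training script before the optimizer step would make the discrepancy (or its absence) easy to confirm.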
Could you explain these discrepancies, please?