Difference between code and training description in paper #10

@baileyyeah0326

Description

Dear authors,

Thanks for your great work! I have noticed some differences between the training code and the description in the paper.

  1. For the projector pre-training stage, you state that all model weights are frozen except those in the projector. However, the code also trains the embed_tokens and saves them in the model.initialize_vision_tokenizer part.
  2. For the VLM fine-tuning stage, you state that only the LoRA modules and the projector are updated while all other weights remain frozen. However, the code also trains other parameters, including "vision_tower", "mm_projector", "embed_tokens", and "lm_head".

Could you explain this discrepancy, please?
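For reference, the check I ran looks roughly like the following. This is a minimal PyTorch sketch, not the actual repository code: `ToyVLM` is a hypothetical stand-in, and only the submodule names (`vision_tower`, `mm_projector`, `embed_tokens`, `lm_head`) are taken from the real model. It shows what "freeze everything except the projector" would mean in terms of `requires_grad`, which is how I compared the code against the paper.

```python
import torch
from torch import nn


class ToyVLM(nn.Module):
    """Hypothetical stand-in for the VLM; submodule names mirror the issue."""

    def __init__(self):
        super().__init__()
        self.vision_tower = nn.Linear(16, 16)
        self.mm_projector = nn.Linear(16, 16)
        self.embed_tokens = nn.Embedding(32, 16)
        self.lm_head = nn.Linear(16, 32)


def freeze_all_but_projector(model: nn.Module) -> None:
    """Freeze every parameter, then re-enable only the projector,
    matching the paper's description of the pre-training stage."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.mm_projector.parameters():
        p.requires_grad = True


def trainable_names(model: nn.Module):
    """Top-level submodules that still have trainable parameters."""
    return sorted({n.split(".")[0]
                   for n, p in model.named_parameters() if p.requires_grad})


model = ToyVLM()
freeze_all_but_projector(model)
print(trainable_names(model))  # only 'mm_projector' should remain trainable
```

In the repository, running an analogous `trainable_names` check on the actual model during each stage is what surfaced the extra trainable modules listed above.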
