Influence of ViT #11

@jiazhen-code

Description

Thank you for this insightful work. I have a question regarding the influence of the ViT. If you take a pre-trained ViT and freeze it, then train only the added adapter layers while also keeping the LLaMA block frozen, does adding the LLaMA block still consistently improve performance? Concretely, I have in mind a training setup like the sketch below.
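Here is a minimal PyTorch sketch of that setup; the submodule names (`model.vit`, `model.llama_block`, `model.adapter`) are hypothetical placeholders and would need to be adjusted to the actual model definition in this repo:

```python
import torch
import torch.nn as nn

def freeze_all_but_adapter(model: nn.Module) -> None:
    """Freeze the pre-trained ViT and the inserted LLaMA block;
    leave only the adapter layers trainable.

    Assumes (hypothetically) that the model exposes submodules
    named `vit`, `llama_block`, and `adapter`.
    """
    # Freeze every parameter first ...
    for p in model.parameters():
        p.requires_grad = False
    # ... then re-enable gradients only for the adapter layers.
    for p in model.adapter.parameters():
        p.requires_grad = True

# Only the adapter parameters would then be passed to the optimizer:
# trainable = [p for p in model.parameters() if p.requires_grad]
# optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```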

Additionally, would a multimodally aligned LLM, such as the LLaMA used in LLaVA, achieve better performance than the original LLaMA? I find these questions fascinating to explore, as the answers could provide clearer guidance on how to use LLM blocks in vision components.
