Hey Lukas,
Hope you're well :) I believe the weights you learned for the OpenCLIP ViT-L/14 laion400m_e32 are for the module visual.ln_post, not visual as the folder name in this repo suggests.
Using visual.ln_post with this model in thingsvision returns 768-dimensional features, whereas visual returns 512-dimensional features. The weight matrices in transforms/OpenCLIP_ViT-L-14_laion400m_e32/visual/transform.npz are 768 by 768, which matches visual.ln_post.
I'm pointing this out because the name mismatch causes issues when calling the .align method in thingsvision.
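In case it helps to reproduce, here is a rough sketch of how the shapes can be checked with plain NumPy. The helper names npz_shapes and matches_feature_dim are made up for illustration, and I'm not assuming anything about the array names stored in transform.npz, so the code just lists whatever is in the file.

```python
import numpy as np


def npz_shapes(path):
    """Return {array_name: shape} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}


def matches_feature_dim(shapes, feature_dim):
    """Return True if every 2-D array has feature_dim as one of its dimensions.

    This is only a heuristic (an assumption of this sketch): it treats each
    2-D array as a weight matrix and ignores 1-D arrays such as biases.
    """
    matrices = [shape for shape in shapes.values() if len(shape) == 2]
    return bool(matrices) and all(feature_dim in shape for shape in matrices)


# Example usage against a local checkout of the repo:
# shapes = npz_shapes("transforms/OpenCLIP_ViT-L-14_laion400m_e32/visual/transform.npz")
# print(shapes)
# print(matches_feature_dim(shapes, 768), matches_feature_dim(shapes, 512))
```

If the 2-D arrays come back as 768 by 768, the transform fits the visual.ln_post features and not the 512-dimensional visual ones.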
cheers :)
Can