Tested it on random noise and it runs. I'll try adapting it to webdataset on some clusters and see how it does!
|
Thanks to Laion (Ryu), I found https://arxiv.org/abs/2212.03185, which improves on movq.
|
|
I'm starting to add the projected GAN technique from here. It still seems to be state-of-the-art on quite a few datasets, although it is from 2021. The main idea is that instead of feeding raw images to the discriminator, you feed it hierarchical features computed by a timm model, which makes training converge faster.
|
In other news, I was finally able to add the ImageNet training dataset to the cluster, so I will be testing the f16 pre-trained model with movq and spectral norm added soon.
|
I'll add Finite Scalar Quantization: VQ-VAE Made Simple, since that seems very interesting. It seems to lead to Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation, which seems to get a better FID than diffusion models.
This is a draft PR for adding vqgan training. It's still quite rough around the edges, but it might do OK after some bug fixes.