Can BERT be used as a language model for generating captions instead of GPT-2?