Implemented a syntax-aware LSTM model that generates captions for video clips, combining computer vision and NLP.
Preprocessed the MSVD dataset and used a pretrained Inception V4 model as the encoder to extract features from each video frame.
Trained an LSTM with attention from scratch in PyTorch and used it as the decoder, obtaining an overall BLEU score of 34%.
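To illustrate how an attention decoder can weight per-frame encoder features at a single decoding step, here is a minimal, framework-free sketch. It assumes dot-product scoring between the decoder hidden state and each frame feature; the project's actual scoring function is not stated here, and the names `attend`, `frame_features`, and `decoder_state` are hypothetical. In practice the same computation would be expressed with PyTorch tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frame_features, decoder_state):
    """Return (weights, context) for one decoding step.

    frame_features: list of per-frame feature vectors (stand-ins for
                    encoder outputs; real features are much larger).
    decoder_state:  current decoder hidden state, same dimension.
    Assumption: dot-product scoring, chosen only for illustration.
    """
    # One relevance score per frame.
    scores = [
        sum(f * h for f, h in zip(frame, decoder_state))
        for frame in frame_features
    ]
    # Normalise scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted average of the frame features.
    dim = len(decoder_state)
    context = [
        sum(w * frame[i] for w, frame in zip(weights, frame_features))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three frames with 2-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(frames, state)
```

Frames whose features align with the decoder state receive larger weights, so the context vector fed to the LSTM at each step emphasises the frames most relevant to the next word.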
About
This project is an end-to-end video captioning system that bridges computer vision and natural language processing. It automatically generates descriptive text for video content, in effect teaching a computer to "watch" a video and describe what is happening in English.