Multimodal video analysis — Whisper transcription, LLM chapter summarisation, CLIP key-frame extraction, and natural language Q&A over uploaded video content.
nlp computer-vision deep-learning video-understanding temporal-grounding large-language-models generative-ai multimodal-ai content-summarization youtube-transcript-analysis
Updated Mar 15, 2026 - Python