High-Performance Multimodal Video Analysis & Narrative Synthesis
English | 🇫🇷 Voir le README en Français
Zenith AI is a cutting-edge multimodal intelligence system designed to "understand" video content like a human would. By combining Computer Vision (YOLOv8), Speech-to-Text (Whisper), and Large Language Models (LLM), it transforms any video or URL into a structured, professional narrative report.
- 🎥 Universal Input: Upload local files or paste links (YouTube, TikTok, Twitter, etc.).
- 👁️ Vision Intelligence: Real-time object detection and scene analysis using YOLOv8.
- 🎙️ Audio Transcription: High-fidelity speech-to-text with automatic language detection.
- 🧠 Narrative Synthesis: Generates a deep, contextual analysis report in French (or your preferred language).
- 💎 Luxury UI: A sleek, dark-mode dashboard built with Gradio.
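The features above form a three-stage pipeline: vision, audio, then narrative synthesis. A minimal sketch of how such an orchestrator could be wired, with hypothetical placeholder functions standing in for the YOLOv8, faster-whisper, and LLM calls (the names and report shape are illustrative, not the actual `main.py` API):

```python
from dataclasses import dataclass

@dataclass
class VideoReport:
    """Structured result of one analysis run (hypothetical shape)."""
    objects: list[str]   # labels from the vision stage
    transcript: str      # text from the audio stage
    narrative: str       # report from the LLM stage

def detect_objects(video_path: str) -> list[str]:
    # Placeholder for the YOLOv8 detection stage.
    return ["person", "car"]

def transcribe_audio(video_path: str) -> str:
    # Placeholder for the faster-whisper transcription stage.
    return "Hello, this is a demo."

def synthesize_narrative(objects: list[str], transcript: str) -> str:
    # Placeholder for the LLM stage: fuses both modalities into one report.
    return f"Scene contains {', '.join(objects)}. Speech: {transcript}"

def analyze(video_path: str) -> VideoReport:
    # Run the three stages in order and bundle the results.
    objects = detect_objects(video_path)
    transcript = transcribe_audio(video_path)
    return VideoReport(objects, transcript,
                       synthesize_narrative(objects, transcript))
```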
Follow these simple steps to get Zenith AI running in minutes:
Go to Google Colab and create a new Python 3 notebook.
For maximum performance:
- Go to Runtime > Change runtime type
- Select T4 GPU (or any available GPU)
- Click Save
Copy the entire content of main.py into a cell.
Before running the cell, locate the API_CONFIG section at the top of the script and enter your credentials:
```python
API_CONFIG = {
    "url": "YOUR_API_ENDPOINT",
    "key": "YOUR_API_KEY",
    "model": "YOUR_MODEL_NAME"
}
```
- Execute the cell (Ctrl + Enter).
- Wait for the dependencies to install.
- Click the public URL (ending in .gradio.live) to open the dashboard.
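The exact request format depends on your LLM provider. Assuming an OpenAI-compatible chat endpoint, here is a sketch of how the `API_CONFIG` credentials could be validated and turned into a request payload; the helper names are illustrative, not part of `main.py`:

```python
API_CONFIG = {
    "url": "YOUR_API_ENDPOINT",
    "key": "YOUR_API_KEY",
    "model": "YOUR_MODEL_NAME",
}

def validate_config(config: dict) -> None:
    # Fail fast if any placeholder value was left unchanged.
    placeholders = [k for k, v in config.items() if v.startswith("YOUR_")]
    if placeholders:
        raise ValueError(f"Fill in API_CONFIG values: {placeholders}")

def build_request(config: dict, prompt: str) -> tuple[dict, dict]:
    # Headers and JSON body for an OpenAI-compatible chat endpoint (assumption).
    headers = {"Authorization": f"Bearer {config['key']}"}
    body = {
        "model": config["model"],
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, body
```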
- gradio: Web Interface
- ultralytics: YOLOv8 Vision
- faster-whisper: Audio Transcription
- yt-dlp: Video Downloader
- decord: High-speed frame extraction
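decord reads frames by index, so pipelines like this one typically pick a uniform subset of frame indices before decoding rather than analyzing every frame. A stdlib-only sketch of that sampling step (the function name and parameters are illustrative):

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples indices spread evenly across [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Fewer frames than requested samples: take them all.
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the midpoint of each of num_samples equal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]
```

The resulting index list would be passed to something like decord's batched frame getter, so only the sampled frames are ever decoded.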
Distributed under the MIT License. See LICENSE for more information.
Built with ❤️ by Shadrak BESSANH