A powerful web application that downloads YouTube channel videos, extracts transcripts with timestamps, analyzes speech patterns and debate techniques, and creates an AI-powered training bot to help you learn effective communication and argumentation skills.
- Download all videos from any YouTube channel
- Extract video metadata (title, duration, views, etc.)
- Support for processing 1000+ videos
- Automatic transcript extraction with timestamps
- Support for both manual and auto-generated captions
- Whisper-powered fallback generates subtitles when no captions exist (requires OpenAI API key)
- Resume-friendly: previously processed videos are skipped automatically (use `--force` to regenerate)
- Multiple export formats:
- JSON: Full transcript data with timestamps
- TXT: Plain text with formatted timestamps
- SRT: Standard subtitle format
- VTT: Web video text tracks
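As a sketch of what the SRT export boils down to, the helpers below format one transcript segment; the function names are illustrative, not the project's actual exporter API. VTT uses the same cue layout but a period instead of a comma in timestamps.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_block(index: int, start: float, end: float, text: str) -> str:
    """One numbered SRT cue: index, timing line, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```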
- Logical Fallacy Detection: Identify common logical fallacies
- Ad hominem attacks
- Straw man arguments
- Appeal to authority
- Slippery slope
- False dichotomy
- And more...
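Detectors like this are typically cue-based: they scan for surface phrases that often signal a fallacy. The sketch below shows the general idea with a few illustrative regex cues; the phrase lists and function name are hypothetical, not the project's actual patterns, and real detection needs context a regex can't see.

```python
import re

# Surface cues only; these phrase lists are illustrative examples.
FALLACY_CUES = {
    "ad hominem": [r"\byou('re| are) (an? )?(idiot|liar|fraud)\b"],
    "appeal to authority": [r"\bexperts (say|agree)\b", r"\bscientists say\b"],
    "slippery slope": [r"\bnext thing you know\b", r"\bwhere does it end\b"],
}

def detect_fallacy_cues(text: str) -> list[str]:
    """Return the fallacy labels whose cue patterns match the text."""
    lowered = text.lower()
    return [label for label, patterns in FALLACY_CUES.items()
            if any(re.search(p, lowered) for p in patterns)]
```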
- Rhetorical Device Analysis:
- Rhetorical questions
- Repetition patterns
- Analogies and metaphors
- Contrast and emphasis
- Rule of three
- Speaking Style Metrics:
- Formality score
- Assertiveness level
- Emotional language usage
- Question frequency
- Average sentence length
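Two of these metrics are simple enough to sketch directly. The function below is illustrative only, assuming sentences are split on terminal punctuation; the project's actual analyzer may differ.

```python
import re

def style_metrics(text: str) -> dict:
    """Compute question frequency and average sentence length (in words)."""
    # Split on runs of terminal punctuation, dropping empty fragments.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    questions = text.count("?")
    words = sum(len(s.split()) for s in sentences)
    n = len(sentences)
    return {
        "sentence_count": n,
        "question_frequency": questions / n if n else 0.0,
        "avg_sentence_length": words / n if n else 0.0,
    }
```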
- Key Phrase Extraction: Most common phrases and language patterns
- Chat with an AI that emulates the YouTuber's speaking style
- Three training modes:
- Practice Mode: Engage in debates with the AI
- Analyze Mode: Get feedback on your arguments
- Learn Mode: Learn specific techniques and strategies
- Personalized based on channel analysis
- View example responses from the actual YouTuber
- Track conversation history
- Python 3.8 or higher
- pip package manager
- ffmpeg installed and on your PATH (recommended; without it audio stays in source format but Whisper still works)
- Clone or download this project
- Create a virtual environment (recommended):

  ```bash
  cd youtube-debate-trainer
  python -m venv venv
  # On macOS/Linux:
  source venv/bin/activate
  # On Windows:
  venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure environment variables:

  ```bash
  cp .env.example .env
  ```

  Edit the `.env` file and add your API keys:

  ```bash
  OPENAI_API_KEY=your_openai_api_key_here
  # OR
  ANTHROPIC_API_KEY=your_anthropic_api_key_here
  ```
Note: You need at least ONE API key (OpenAI or Anthropic) for AI trainer features.
- Set `ENABLE_WHISPER_FALLBACK=true` (the default) in `.env` to auto-generate subtitles when a video has no captions
- Requires `OPENAI_API_KEY` and `ffmpeg` installed on your system
- Optional overrides: `WHISPER_MODEL=whisper-1` and `WHISPER_LANGUAGE=en`
- If you drop a standalone `ffmpeg` binary into `youtube-debate-trainer/bin`, the app automatically adds it to `PATH` so you don't need a system-wide install
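The local-`bin` lookup presumably amounts to prepending that directory to `PATH` before ffmpeg is invoked. A minimal sketch, with a hypothetical function name:

```python
import os
from pathlib import Path

def ensure_local_ffmpeg_on_path(project_root: str) -> bool:
    """Prepend <project_root>/bin to PATH if it contains an ffmpeg binary.

    Returns True when the local bin directory was added.
    """
    bin_dir = Path(project_root) / "bin"
    if any((bin_dir / name).exists() for name in ("ffmpeg", "ffmpeg.exe")):
        os.environ["PATH"] = str(bin_dir) + os.pathsep + os.environ.get("PATH", "")
        return True
    return False
```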
To get API keys:
- OpenAI: https://platform.openai.com/api-keys
- Anthropic: https://console.anthropic.com/
- Verify setup (recommended):

  ```bash
  python check_setup.py
  ```

  This preflight script checks dependencies, ffmpeg, environment variables, and Whisper fallback settings. Fix any reported issues before continuing. If you only get a warning about missing ffmpeg, Whisper can still run (it will just use the original audio container).
- Run the application:

  ```bash
  python app.py
  ```

- Open in browser: navigate to `http://localhost:5000`
- Go to the home page
- Enter the YouTube channel URL (e.g., `https://youtube.com/@channelname`)
- Give it a name (e.g., `debater_john`)
- Set maximum videos to process (start with 10-50 for testing)
- Select export formats
- Click "Start Processing"
- Re-running the same channel skips videos you've already processed; add `--force` (CLI) or `"force": true` (API) if you need to rebuild transcripts
- Web/API calls run the same preflight checks as `python check_setup.py`, so you'll get actionable errors if something is missing
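The resume logic likely reduces to an existence check on the per-video transcript file. A sketch, with hypothetical names and assuming transcripts are stored as `<video_id>.json`:

```python
from pathlib import Path

def needs_processing(video_id: str, transcripts_dir: str, force: bool = False) -> bool:
    """Return True when a video still needs a transcript.

    A video is skipped when <transcripts_dir>/<video_id>.json already
    exists, unless force=True requests a rebuild.
    """
    if force:
        return True
    return not (Path(transcripts_dir) / f"{video_id}.json").exists()
```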
The app will:
- Download channel information
- Extract transcripts from all videos
- Analyze speech patterns and techniques
- Save everything in the `data/` directory
Processing time: Approximately 2-5 seconds per video
- Go to the "AI Trainer" page
- Select your processed channel
- Click "Initialize Trainer"
- Choose a training mode:
  - Practice: Debate with the AI
  - Analyze: Get feedback on your arguments
  - Learn: Learn techniques
- Start chatting!
- Scroll to the "Single Video Mode" card on the home page
- Paste any YouTube video URL and click Get Transcript (Free)
- Preview the transcript inline or download in TXT/MD/SRT/VTT/JSON/CSV
- The same functionality is available via `POST /api/transcribe` with body `{ "url": "<youtube url>" }`
- Process a channel of a skilled debater
- Use "Learn Mode" to understand their techniques
- Practice arguments in "Practice Mode"
- Type your argument in "Analyze Mode"
- Get feedback on logical fallacies, rhetorical effectiveness
- Improve your reasoning
- Study the speaking patterns of charismatic speakers
- See their formality, assertiveness, and language patterns
- Practice emulating their style
```
youtube-debate-trainer/
├── app.py                        # Flask web application
├── config.py                     # Configuration settings
├── requirements.txt              # Python dependencies
├── .env                          # Environment variables (create from .env.example)
│
├── app/                          # Core modules
│   ├── youtube_downloader.py     # YouTube video & metadata downloader
│   ├── transcript_extractor.py   # Transcript extraction & export
│   ├── speech_analyzer.py        # Speech pattern & fallacy analysis
│   └── ai_trainer.py             # AI chatbot trainer
│
├── templates/                    # HTML templates
│   ├── base.html                 # Base template
│   ├── index.html                # Home page
│   └── chat.html                 # AI trainer chat interface
│
└── data/                         # Generated data (created automatically)
    ├── videos/                   # Downloaded videos (if enabled)
    ├── audio/                    # Cached audio for Whisper fallback
    ├── transcripts/              # Extracted transcripts
    └── exports/                  # Analysis results & exports
```
- `POST /api/process-channel` - Start processing a channel
- `GET /api/job-status/<job_id>` - Check processing status
- `GET /api/channels` - List all processed channels
- `GET /api/channel/<channel_name>` - Get channel details
- `POST /api/trainer/init/<channel_name>` - Initialize trainer
- `POST /api/trainer/chat/<trainer_id>` - Chat with AI
- `POST /api/trainer/reset/<trainer_id>` - Reset conversation
- `GET /api/trainer/examples/<trainer_id>` - Get example responses
- `GET /api/export/<channel_name>/<format>` - Export data
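A typical client loop polls the job-status endpoint until processing finishes. The sketch below assumes the status JSON carries a `status` field with terminal values `completed`/`failed` (an assumption; check the actual response shape). The fetch function is injected, so you can pass in a real HTTP call against `GET /api/job-status/<job_id>` or a stub for testing:

```python
import time

def wait_for_job(job_id, fetch_status, poll_seconds=2.0, max_polls=1000):
    """Poll the job-status endpoint until the job reaches a terminal state.

    fetch_status is any callable taking a job id and returning the decoded
    JSON status dict; injecting it keeps this helper easy to test.
    """
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish")
```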
Edit the `.env` file to customize:

```bash
# Maximum videos to process per channel
MAX_VIDEOS_PER_CHANNEL=1000

# Download actual video files (requires storage space)
DOWNLOAD_VIDEO_FILES=false

# AI model to use
DEFAULT_AI_MODEL=gpt-4-turbo-preview
# or for Anthropic: claude-3-5-sonnet-20241022

# Whisper fallback controls
ENABLE_WHISPER_FALLBACK=true
WHISPER_MODEL=whisper-1
WHISPER_LANGUAGE=en
```
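For reference, loading these variables with the documented defaults might look like the following. This is a sketch, not the project's actual `config.py`:

```python
import os

def load_settings(env=os.environ):
    """Read the settings above from the environment, falling back to the
    documented defaults when a variable is unset."""
    truthy = {"1", "true", "yes", "on"}
    return {
        "max_videos_per_channel": int(env.get("MAX_VIDEOS_PER_CHANNEL", "1000")),
        "download_video_files": env.get("DOWNLOAD_VIDEO_FILES", "false").lower() in truthy,
        "default_ai_model": env.get("DEFAULT_AI_MODEL", "gpt-4-turbo-preview"),
        "enable_whisper_fallback": env.get("ENABLE_WHISPER_FALLBACK", "true").lower() in truthy,
        "whisper_model": env.get("WHISPER_MODEL", "whisper-1"),
        "whisper_language": env.get("WHISPER_LANGUAGE", "en"),
    }
```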
- Start Small: Process 10-20 videos first to test, then scale up
- Video Selection: The more videos you process, the better the AI understands the speaking style
- API Costs:
  - OpenAI GPT-4: ~$0.01-0.03 per conversation
  - Anthropic Claude: ~$0.015 per conversation
  - Transcript extraction is FREE (no API needed)
- Storage: Each transcript is ~10-50KB, so 100 videos ≈ 1-5MB
- Best Channels to Analyze:
- Debate channels
- Philosophy discussions
- Educational content creators
- Public speakers
- Podcast hosts
- Some videos don't have captions enabled
- Check if the video has manual or auto-generated captions on YouTube
- Verify API key is set correctly in `.env`
- Check API key has credits/quota
- Check console for error messages
- Normal: 2-5 seconds per video
- Depends on video length and transcript availability
- Run in background and check progress
- Missing dependencies: reinstall with `pip install -r requirements.txt`
- The clip is age or region restricted: provide cookies to `yt-dlp`, log in, or remove that video from the batch. The progress card surfaces this warning immediately and moves on to the next video automatically.
- ffmpeg not found: Install via Homebrew (`brew install ffmpeg`), apt (`sudo apt install ffmpeg`), or download binaries and ensure the executable is on your PATH.
- Whisper fallback misconfigured: Provide `OPENAI_API_KEY` in `.env` or set `ENABLE_WHISPER_FALLBACK=false` if you don't want automatic subtitles.
- No AI API keys: Add either `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` before using the AI trainer or Whisper fallback features.
Create a Python script:

```python
from app.youtube_downloader import YouTubeDownloader
from app.transcript_extractor import TranscriptExtractor
from app.speech_analyzer import SpeechAnalyzer
from app.ai_trainer import AITrainer

# Download channel metadata
downloader = YouTubeDownloader()
videos = downloader.get_channel_videos('CHANNEL_URL', max_videos=50)
downloader.save_video_metadata(videos, 'channel_name')

# Extract transcripts
extractor = TranscriptExtractor()
results = extractor.process_channel_transcripts(videos, ['json', 'txt'])

# Analyze speech patterns
analyzer = SpeechAnalyzer()
analysis = analyzer.analyze_channel('channel_name')

# Chat with the trained AI
trainer = AITrainer('channel_name')
response = trainer.chat("What's your view on free speech?", mode='practice')
print(response)
```

- This tool is for educational purposes only
- Respect copyright and fair use
- Get permission before training on private content
- Use responsibly for learning and self-improvement
- Don't use to impersonate or mislead others
This project is provided as-is for educational purposes.
Built with:
- Flask - Web framework
- yt-dlp - YouTube downloader
- youtube-transcript-api - Transcript extraction
- OpenAI / Anthropic - AI capabilities
For issues, questions, or feature requests, please create an issue in the repository.
Happy learning and debating! 🎯