AnthonyMadia/yt_transcripts

YouTube Channel Transcriber

A Python script that collects transcripts for the videos on a YouTube channel. For each video, it first tries YouTube's built-in transcript API; if no transcript is available, it downloads the video and transcribes it with OpenAI's Whisper.
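The transcript-first, Whisper-second order can be sketched as below. This is a minimal illustration, not the code in main.py: the function name transcribe_with_fallback and the two callables fetch_captions and run_whisper are hypothetical stand-ins for whatever the script uses to query YouTube and to run Whisper.

```python
from typing import Callable, Optional, Tuple


def transcribe_with_fallback(
    video_id: str,
    fetch_captions: Callable[[str], Optional[str]],
    run_whisper: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (transcript_text, source) for one video.

    fetch_captions and run_whisper are hypothetical callables supplied by
    the caller; they are assumptions for this sketch, not the project's API.
    """
    try:
        text = fetch_captions(video_id)
    except Exception:
        # Assumption: any caption lookup failure means "no transcript available".
        text = None

    if text:
        return text, "youtube"

    # No usable captions, so fall back to local speech-to-text.
    return run_whisper(video_id), "whisper"
```

Passing the two steps in as callables keeps the fallback decision separate from network and model code, which also makes it easy to exercise with stubs.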

Features

  • Fetches the playlists on a YouTube channel that match the specified keywords
  • Downloads and processes videos in batches
  • Uses YouTube's transcript API when available
  • Falls back to OpenAI's Whisper for videos without transcripts
  • Stores transcripts in a SQLite database
  • Shows progress with tqdm progress bars
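Batch processing and SQLite storage could fit together roughly as follows. This is a sketch under stated assumptions: the table name transcripts, its columns, and the helpers batched, init_db, and save_batch are all hypothetical, since the README does not describe the actual schema.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) transcripts table if it does not exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS transcripts (
            video_id TEXT PRIMARY KEY,
            source   TEXT NOT NULL,
            text     TEXT NOT NULL
        )
        """
    )


def save_batch(
    conn: sqlite3.Connection,
    rows: Iterable[Tuple[str, str, str]],
) -> None:
    """Insert or overwrite one batch of (video_id, source, text) rows."""
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO transcripts (video_id, source, text) "
            "VALUES (?, ?, ?)",
            rows,
        )
```

Committing once per batch means an interrupted run loses at most the batch in progress, and keying on video_id lets a rerun overwrite rows instead of duplicating them.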

Prerequisites

  • Python 3.7+
  • ffmpeg (required for Whisper)
  • Chrome/Chromium browser (for Selenium)
  • YouTube Data API key

Installation

  1. Clone this repository
  2. Install required packages:
pip install -r requirements.txt
  3. Install ffmpeg (if not already installed):

    • Ubuntu: sudo apt install ffmpeg
    • macOS: brew install ffmpeg
    • Windows: Download from the ffmpeg website
  4. Copy config.example.json to config.json and update it with your settings:

    • Get a YouTube API key from Google Cloud Console
    • Set your target channel URL
    • Define keywords to match playlists

Usage

  1. Configure your settings in config.json
  2. Run the script:
python main.py

Configuration

Edit config.json with your settings:

  • youtube_api_key: Your YouTube Data API key
  • channel_url: URL of the YouTube channel to process
  • playlist_keywords: List of keywords to match playlists
  • whisper_model: Whisper model to use (tiny, base, small, medium, large)
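The four settings above can be checked before a long run starts. The sketch below is illustrative only: parse_config and matches_keywords are hypothetical helpers, and case-insensitive substring matching against a playlist title is an assumption about how playlist_keywords is applied.

```python
import json
from typing import Iterable

# Keys and model names taken from the Configuration list above.
REQUIRED_KEYS = ("youtube_api_key", "channel_url", "playlist_keywords", "whisper_model")
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def parse_config(raw: str) -> dict:
    """Parse config.json text and fail early on missing or invalid settings."""
    config = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    if config["whisper_model"] not in WHISPER_MODELS:
        raise ValueError("unknown whisper_model: " + str(config["whisper_model"]))
    return config


def matches_keywords(title: str, keywords: Iterable[str]) -> bool:
    """Assumed rule: a playlist matches if any keyword appears in its title,
    ignoring case."""
    title_lower = title.lower()
    return any(keyword.lower() in title_lower for keyword in keywords)
```

In practice the text would come from reading config.json, for example parse_config(open("config.json").read()).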

License

MIT License

About

A Python script that takes a YouTube channel and ingests its video transcripts. It uses OpenAI's Whisper to generate transcripts when they are not readily available.
