Universal Video Transcriber

Universal Video Transcriber is a native macOS application built with SwiftUI that provides powerful, privacy-focused video transcription using OpenAI's Whisper engine. All transcription happens locally on your Mac - no internet required after the initial model download.

Copyright © 2026 mpcode. All Rights Reserved. This project and its contents are proprietary and confidential.

Key Features

Whisper Medium Model: High-accuracy local transcription (~94% accuracy, 1.5GB)
Advanced Parameter Tuning: Fine-tune transcription quality with temperature, beam size, and noise suppression controls
99 Languages Supported: Transcribe videos in 99 different languages with automatic language detection
Auto-Detect Intelligence: Automatically detects the spoken language and transcribes in that language (NOT translated to English)
Privacy-First: All transcription runs locally on your Mac - no cloud services, no data sent externally
Auto-Setup: Automatic model download on first launch with progress tracking
Video Playback: Integrated video player with synchronized transcript and optional subtitle overlay (videos load paused, not auto-playing)
Transcript Editor: Full-featured editor to merge, split, delete, and correct transcript segments
Search: Full-text search within generated transcripts
Export: Export transcripts as plain text (.txt) or SubRip Subtitles (.srt)
Comprehensive Help: Built-in help system with keyboard shortcuts and troubleshooting guide
Fast Processing: ~30 seconds to transcribe a 5-minute video on modern Macs

How It Works

Transcription Process:

Select a video file from your Mac
Choose a language (or use "Auto-detect" to let Whisper identify it automatically)
Click "Transcribe" - the app will:
- Extract audio from the video
- Convert to 16kHz mono WAV format (Whisper requirement)
- Run local Whisper Medium model on your CPU
- Parse results into timestamped segments
View the transcript synchronized with video playback
Edit segments if needed (merge, split, correct text/timestamps)
Export to text or SRT subtitle format
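
The "parse results into timestamped segments" step can be sketched as follows. This is a minimal illustration, assuming the app reads the console-style lines that whisper-cli prints (for example `[00:00:00.000 --> 00:00:04.500]  Hello there`); the real app may read a different output format. The `Segment` type and the function names are hypothetical.

```swift
import Foundation

/// One timestamped piece of transcript. Type and field names are illustrative.
struct Segment: Equatable {
    let start: Double   // seconds
    let end: Double     // seconds
    let text: String
}

/// Converts "HH:MM:SS.mmm" into seconds, or returns nil if the text is malformed.
func seconds(fromTimestamp stamp: String) -> Double? {
    let parts = stamp.split(separator: ":")
    guard parts.count == 3,
          let hours = Double(String(parts[0])),
          let minutes = Double(String(parts[1])),
          let secs = Double(String(parts[2])) else { return nil }
    return hours * 3600 + minutes * 60 + secs
}

/// Parses one line such as "[00:00:00.000 --> 00:00:04.500]  Hello there".
func parseSegment(line: String) -> Segment? {
    guard line.hasPrefix("["),
          let close = line.firstIndex(of: "]") else { return nil }
    let inside = String(line[line.index(after: line.startIndex)..<close])
    let stamps = inside.components(separatedBy: " --> ")
    guard stamps.count == 2,
          let start = seconds(fromTimestamp: stamps[0]),
          let end = seconds(fromTimestamp: stamps[1]) else { return nil }
    let text = String(line[line.index(after: close)...])
        .trimmingCharacters(in: .whitespaces)
    return Segment(start: start, end: end, text: text)
}

/// Parses a whole output dump, skipping any line that is not a segment.
func parseSegments(output: String) -> [Segment] {
    output.split(separator: "\n").compactMap { parseSegment(line: String($0)) }
}
```

Lines that do not start with a bracketed time range (such as log messages) are ignored rather than treated as errors.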

Important: When using "Auto-detect", Whisper will:

✅ Detect the spoken language (e.g., Lithuanian, Spanish, French)
✅ Transcribe in that detected language (Lithuanian text output)
❌ NOT translate to English
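
The auto-detect behaviour above maps naturally onto whisper-cli arguments. The sketch below is an assumption about how such a call could be assembled, not the app's actual code: `WhisperOptions` and `whisperArguments` are hypothetical names, and the flags (`-m`, `-f`, `-l`, `-bs`, `-tp`, `-sns`) should be checked against `whisper-cli --help` for the whisper.cpp version you built.

```swift
import Foundation

/// Hypothetical settings bundle; not the app's own type.
struct WhisperOptions {
    var language: String? = nil        // nil means "Auto-detect"
    var temperature: Double = 0.0
    var beamSize: Int = 5
    var suppressNonSpeech: Bool = true
}

/// Builds an argument list for whisper-cli from the chosen options.
func whisperArguments(modelPath: String, wavPath: String, options: WhisperOptions) -> [String] {
    var args = ["-m", modelPath, "-f", wavPath]
    // "auto" asks whisper.cpp to detect the spoken language itself.
    args += ["-l", options.language ?? "auto"]
    args += ["-bs", String(options.beamSize)]
    args += ["-tp", String(options.temperature)]
    if options.suppressNonSpeech {
        args.append("-sns")
    }
    // The translate flag is deliberately never added, so the output
    // stays in the spoken language instead of being translated to English.
    return args
}
```

The resulting array could be assigned to the `arguments` property of a Foundation `Process` whose `executableURL` points at the bundled whisper-cli binary.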

Technical Architecture

Platform: macOS 13.0+ (Ventura or later)
Language: Swift 5.9+
Framework: SwiftUI
Architecture: MVVM (Model-View-ViewModel)
Transcription Engine: Whisper.cpp (local, on-device)
Model: Fixed to Whisper Medium (optimal balance of speed/accuracy)

Core Components

TranscriptionManager: Central coordinator for transcription workflow
WhisperService: Manages local whisper-cli execution, automatic model download, and audio conversion with dynamic parameter support
DownloadStateManager: Global singleton for tracking model download progress
SettingsManager: Settings management with advanced Whisper parameters (temperature, beam size, suppress non-speech)
VideoPlayerView: AVKit-based video player with subtitle overlay
TranscriptEditorView: Full-featured transcript editing interface
HelpView: Comprehensive help system with usage guide and troubleshooting
PersistenceManager: JSON-based storage for transcriptions
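
As an illustration of the JSON-based storage mentioned for PersistenceManager, here is a minimal sketch using `Codable`. The type names, field names, and file layout are assumptions made for the example; the app's real schema is not documented here.

```swift
import Foundation

/// Hypothetical on-disk shape of one transcript segment.
struct StoredSegment: Codable, Equatable {
    var start: Double
    var end: Double
    var text: String
}

/// Hypothetical on-disk shape of one saved transcription.
struct StoredTranscription: Codable, Equatable {
    var videoFileName: String
    var language: String
    var segments: [StoredSegment]
}

/// Encodes the transcription as JSON and writes it atomically.
func save(_ transcription: StoredTranscription, to url: URL) throws {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    let data = try encoder.encode(transcription)
    try data.write(to: url, options: .atomic)
}

/// Reads a JSON file back into a transcription value.
func loadTranscription(from url: URL) throws -> StoredTranscription {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode(StoredTranscription.self, from: data)
}
```

Writing atomically means a crash during save should leave the previous file intact rather than a half-written one.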

Prerequisites

macOS 13.0 or later
Xcode 15.0 or later (for building from source; Swift 5.9 ships with Xcode 15)
1.5GB free disk space (for Whisper Medium model)
Internet connection (for initial model download only, then fully offline)

Build & Installation

1. Clone the Repository

git clone https://github.com/mp-c0de/UniversalVideoTranscriber.git
cd UniversalVideoTranscriber

2. Whisper Integration (Critical)

This app relies on the whisper.cpp command-line tool for local transcription. You must compile it and add it to the project.

Clone and Build whisper.cpp:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
make

Add to Xcode:
- Locate the compiled whisper-cli binary (usually in the root or build/bin folder).
- Drag and drop whisper-cli into the UniversalVideoTranscriber/Resources group in Xcode.
- Important: Ensure "Copy items if needed" is checked and "Add to targets" includes UniversalVideoTranscriber.
Add Libraries:
- Locate libggml.dylib, libwhisper.dylib, and related .dylib files generated by the build.
- Add them to the project in the same way.

3. Build and Run

Open UniversalVideoTranscriber.xcodeproj in Xcode.
Select the UniversalVideoTranscriber scheme.
Press Cmd+R to build and run.

4. First Launch

On first launch, the app will automatically:

Detect that the Whisper Medium model is not present
Download the model (1.5GB) from HuggingFace with progress tracking
Save it to ~/Library/Application Support/UniversalVideoTranscriber/WhisperModels/
Make the app ready for transcription

Note: The download happens only once. Subsequent launches will skip this step.
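
The "download only once" check can be pictured as a simple file-existence test. This is a sketch under stated assumptions: the function names are hypothetical, and the file name `ggml-medium.bin` is the name whisper.cpp conventionally uses for the Medium model, which may differ from what the app actually stores.

```swift
import Foundation

/// Returns the model folder under a given Application Support directory.
func modelDirectory(applicationSupport: URL) -> URL {
    applicationSupport
        .appendingPathComponent("UniversalVideoTranscriber", isDirectory: true)
        .appendingPathComponent("WhisperModels", isDirectory: true)
}

/// True when the model file already exists, so no download is needed.
/// The default file name is an assumption, not taken from the app.
func isModelInstalled(applicationSupport: URL, fileName: String = "ggml-medium.bin") -> Bool {
    let file = modelDirectory(applicationSupport: applicationSupport)
        .appendingPathComponent(fileName)
    return FileManager.default.fileExists(atPath: file.path)
}
```

On macOS the base directory can be obtained with `FileManager.default.urls(for: .applicationSupportDirectory, in: .userDomainMask).first`.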

Usage Guide

Basic Workflow

First Launch: Wait for the Whisper Medium model to download automatically (1.5GB, one-time).
Select Video: Click "Select Video" button to choose a video file from your computer.
Choose Language:
- Use "Auto-detect" (default) to let Whisper identify the language automatically
- Or manually select from 99 supported languages in the dropdown
Transcribe: Click the "Transcribe" button.
- Progress bar will show conversion and transcription progress
- Wait ~30 seconds for a 5-minute video (varies by video length and Mac performance)
Review: The transcript appears on the right panel, synchronized with video playback.
Play Video: Click the play button (video loads paused, not auto-playing).
Edit (Optional): Click "Edit" button to open the transcript editor:
- Merge: Select multiple segments and click "Merge Selected"
- Split: Position cursor in text and click "Split at Cursor"
- Delete: Select segments and click "Delete Selected"
- Edit Text: Click any segment to edit text inline
- Adjust Timestamps: Click timestamps to modify with time picker
Export: Use the "Export" menu to save:
- Text Export: Plain text with timestamps, in the form [HH:MM:SS.mmm] Transcript text
- SRT Export: SubRip subtitle format for video players
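
To make the two export formats concrete, here is a small sketch that renders both. It assumes the text export uses each segment's start time and one segment per line; the type and function names are hypothetical. Note that SubRip uses a comma before the milliseconds, while the text format above uses a dot.

```swift
import Foundation

/// Hypothetical segment type used only for this example.
struct ExportSegment {
    let start: Double   // seconds
    let end: Double     // seconds
    let text: String
}

/// Left-pads a non-negative integer with zeros to the given width.
func pad(_ value: Int, width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

/// Formats seconds as HH:MM:SS<separator>mmm.
func timestamp(_ seconds: Double, millisecondSeparator: String) -> String {
    let totalMs = Int((seconds * 1000).rounded())
    let ms = totalMs % 1000
    let s = (totalMs / 1000) % 60
    let m = (totalMs / 60_000) % 60
    let h = totalMs / 3_600_000
    return pad(h, width: 2) + ":" + pad(m, width: 2) + ":" + pad(s, width: 2)
        + millisecondSeparator + pad(ms, width: 3)
}

/// Plain text: "[HH:MM:SS.mmm] Transcript text", one segment per line.
func textExport(_ segments: [ExportSegment]) -> String {
    segments.map { segment -> String in
        let start = timestamp(segment.start, millisecondSeparator: ".")
        return "[" + start + "] " + segment.text
    }.joined(separator: "\n")
}

/// SubRip: numbered cues, "start --> end" line, text, then a blank line.
func srtExport(_ segments: [ExportSegment]) -> String {
    segments.enumerated().map { (index, segment) -> String in
        let start = timestamp(segment.start, millisecondSeparator: ",")
        let end = timestamp(segment.end, millisecondSeparator: ",")
        return "\(index + 1)\n\(start) --> \(end)\n\(segment.text)\n"
    }.joined(separator: "\n")
}
```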

Advanced Features

Advanced Whisper Parameters (Settings → Advanced):

Temperature (0.0-1.0): Control transcription randomness
- Lower (0.0-0.3): More accurate, deterministic results
- Higher (0.5-0.8): Better at capturing unclear speech
- Default: 0.0 (maximum accuracy)
Suppress Non-Speech Tokens (On/Off): Keep non-speech tokens, such as music and noise markers, out of the transcript
- Keep ON for clean transcripts
- Turn OFF if speech is being incorrectly filtered
- Default: ON
Beam Size (1-8): Control search width during decoding
- Higher values (5-8): Better quality, slower processing
- Lower values (1-3): Faster processing, lower quality
- Default: 5 (balanced)
Reset to Defaults: Quick restore of optimal settings
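
The ranges and defaults listed above could be modelled as a small value type. This is only a sketch: `AdvancedSettings` is a hypothetical name, and clamping out-of-range input (rather than rejecting it) is a design choice made for the example, not documented app behaviour.

```swift
/// Hypothetical container for the three advanced parameters described above.
struct AdvancedSettings: Equatable {
    static let temperatureRange: ClosedRange<Double> = 0.0...1.0
    static let beamSizeRange: ClosedRange<Int> = 1...8
    /// The documented defaults: temperature 0.0, beam size 5, suppression ON.
    static let defaults = AdvancedSettings(temperature: 0.0, beamSize: 5, suppressNonSpeech: true)

    let temperature: Double
    let beamSize: Int
    let suppressNonSpeech: Bool

    /// Out-of-range values are clamped to the nearest allowed value.
    init(temperature: Double, beamSize: Int, suppressNonSpeech: Bool) {
        let t = AdvancedSettings.temperatureRange
        let b = AdvancedSettings.beamSizeRange
        self.temperature = min(max(temperature, t.lowerBound), t.upperBound)
        self.beamSize = min(max(beamSize, b.lowerBound), b.upperBound)
        self.suppressNonSpeech = suppressNonSpeech
    }
}
```

With this shape, "Reset to Defaults" amounts to replacing the current value with `AdvancedSettings.defaults`.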

Subtitle Overlay:

Toggle the "Subtitles" button in video player controls
Subtitles display synchronized with video playback
Subtitles appear at the bottom of the video on a semi-transparent background
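
Keeping subtitles in sync comes down to finding the segment that covers the current playback time. The sketch below uses a binary search and assumes the cues are sorted by start time and do not overlap; `Cue` and `activeCue` are hypothetical names, not the app's API.

```swift
/// Hypothetical subtitle cue used only for this example.
struct Cue {
    let start: Double   // seconds, inclusive
    let end: Double     // seconds, exclusive
    let text: String
}

/// Returns the cue covering `time`, or nil during gaps between cues.
/// Assumes `cues` is sorted by start time and cues do not overlap.
func activeCue(at time: Double, in cues: [Cue]) -> Cue? {
    var low = 0
    var high = cues.count - 1
    while low <= high {
        let mid = (low + high) / 2
        let cue = cues[mid]
        if time < cue.start {
            high = mid - 1          // target is earlier
        } else if time >= cue.end {
            low = mid + 1           // target is later
        } else {
            return cue              // start <= time < end
        }
    }
    return nil
}
```

A player could call this from a periodic time observer and show the returned text, or hide the overlay when the result is nil.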

Search:

Switch to "Search" tab in right panel
Enter keywords to find specific parts of the transcript
Click results to jump to that timestamp in the video
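
A transcript search of this kind can be sketched as a case-insensitive filter over segments, returning matches together with their start times so the player can seek to them. The names below are hypothetical and the matching rule (plain substring, case-insensitive) is an assumption.

```swift
import Foundation

/// Hypothetical segment type used only for this example.
struct SearchableSegment {
    let start: Double   // seconds; where the player would jump to
    let text: String
}

/// Returns the segments whose text contains the query, ignoring case.
func search(_ query: String, in segments: [SearchableSegment]) -> [SearchableSegment] {
    let needle = query.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !needle.isEmpty else { return [] }
    return segments.filter { segment in
        segment.text.range(of: needle, options: [.caseInsensitive]) != nil
    }
}
```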

Keyboard Shortcuts:

Cmd+O: Select Video
Cmd+,: Open Settings
Cmd+?: Show Help
Cmd+Shift+E: Export Text
Space: Play/Pause Video

Settings (Cmd+,): Three tabs for organized configuration:

General: Whisper model management (download/re-download)
Advanced: Fine-tune transcription parameters
About: Version info, credits, copyright

Help System (Cmd+?):

Comprehensive getting started guide
Full feature documentation
Keyboard shortcuts reference
Advanced settings explanations
Troubleshooting common issues

Language Support

The Whisper Medium model supports 99 languages including:

Common Languages (Quick Access):

Auto-detect (recommended - detects language automatically)
English, Spanish, French, German, Italian
Portuguese, Dutch, Polish, Lithuanian
And 90+ more in the full dropdown list

How Auto-Detect Works:

Whisper analyzes the opening audio (up to the first 30 seconds)
Identifies the spoken language (e.g., Lithuanian)
Transcribes the entire video in that language (Lithuanian text output)
Does NOT translate to English - preserves the original language

Performance:

~94% accuracy (results vary by language and audio quality)
Takes ~30 seconds to process a 5-minute video
Runs entirely offline after initial model download

Performance

Transcription Speed:

5-minute video: ~30 seconds
30-minute video: ~3 minutes
2-hour video: ~12 minutes
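
All three figures above imply the same ratio: roughly 10 times faster than real time (300 s / 30 s = 10). A rough estimate for other lengths can therefore be sketched as below. The factor is derived from these figures only and will vary with hardware and settings; the names are hypothetical.

```swift
/// Speed factor implied by the figures above (5 minutes in about 30 seconds).
let assumedSpeedFactor = 10.0

/// Rough processing-time estimate, in seconds, for a video of the given length.
func estimatedProcessingSeconds(videoSeconds: Double,
                                speedFactor: Double = assumedSpeedFactor) -> Double {
    videoSeconds / speedFactor
}
```

For example, a 45-minute video (2,700 seconds) would come out at about 270 seconds, or 4.5 minutes, under this assumption.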

Hardware Requirements:

Runs on CPU (no GPU required)
Optimised for Apple Silicon (M1/M2/M3/M4)
Intel Macs supported (slower processing)

Model Size:

Whisper Medium: 1.5GB
Stored in: ~/Library/Application Support/UniversalVideoTranscriber/WhisperModels/

Troubleshooting

Video doesn't play:

Ensure video format is supported (MP4, MOV, M4V, AVI)
Check video file isn't corrupted
Try re-selecting the video

Transcription fails:

Verify Whisper Medium model is downloaded (check Settings → General)
Ensure video has audio track
Try re-downloading model via Settings → General → Re-download Model

Transcription is inaccurate:

Adjust Advanced Settings (Settings → Advanced):
- Increase Beam Size to 8 for maximum quality
- Adjust Temperature (try 0.5-0.8 if speech is unclear)
- Ensure correct Language is selected (or use Auto-detect)

Transcription misses words or cuts out speech:

Turn OFF "Suppress Non-Speech Tokens" in Settings → Advanced
This filter may be removing speech that sounds like noise

Auto-detect translates to English (instead of transcribing):

This is fixed in v1.1 and later
Update to the latest version, or select the language manually

App is slow or unresponsive:

Whisper transcription is CPU-intensive
Higher Beam Size values (8) significantly increase processing time
Try lowering Beam Size to 3-5 in Settings → Advanced for faster results

First launch download stuck:

Check internet connection
Ensure 1.5GB free disk space
Restart app and try again

Need help?

Press Cmd+? to open the built-in Help system
Full documentation with troubleshooting guide available in-app

Contribution & Licence

All rights reserved. This code is for educational and review purposes only. Redistribution, modification, or commercial use of this source code without explicit written permission from the owner is strictly prohibited.

If you wish to contribute, please fork the repository and submit a Pull Request for review. Note that submitting a PR implies you grant the owner rights to use your contributions.

Maintained by mpcode
