AI Media Studio — Desktop App (v1.0.0)

One workspace that summarizes and compares websites, YouTube videos, documents, images, audio, and video — powered by a local Python extraction pipeline and LiteLLM model routing.

Status: beta desktop/web AI workspace. The repo name still reflects the original scraper/Discord-bot roots, but the current product is AI Media Studio.

⚡ TL;DR — what you need to know


What it is	A multimodal AI chat workspace: paste a URL, YouTube link, or drop a file, and get an AI summary/analysis — with per-chat memory.
Install	Download a release installer, or run from source (see Quick Start).
Inputs	Websites · YouTube · text · PDF/DOCX/PPTX/XLSX/RTF/MD/CSV/JSON/HTML/XML · images · audio · video.
Requires	Python 3.10+, an AI provider key in `.env`; for full features: `ffmpeg`, `tesseract`, Playwright Chromium.
Models	22+ models across 5 providers (OpenAI, Anthropic, Google, Together/Llama, MiniMax) — or any LiteLLM string. Set `TEXT_AI_MODEL`.
Why it's robust	Layered extraction with fallback chains (esp. YouTube transcripts) so a summary still works when one path is blocked.
Drag & drop	Drop files straight onto the desktop window to analyze them.
Local-first	DB, downloads, transcripts, and uploads stay in user-writable local locations. You pay your own provider API usage.
Honesty layer	The app carries which path actually succeeded into the summary, to avoid faking that a fallback worked.

Who it's for: anyone who wants a single tool to read/watch/listen for them — researchers comparing a paper against an article, people summarizing long videos, or anyone tired of juggling separate scraper / transcript / OCR / vision tools.

Jump to: Screenshots · Quick Start · Architecture · What it can do · Supported inputs & file types · Extraction pipeline · Models · Desktop install · Run from source · Build installers · Config defaults · Troubleshooting

Why use AI Media Studio?

Handles many input types in one workspace instead of separate tools.
Uses extraction fallbacks so summaries can still work when one source path fails.
Keeps per-chat memory and uploaded context local to the app workflow.
Supports multiple AI providers through LiteLLM.

Current limitations

Summaries are only as reliable as the extracted source content and the selected model.
Some websites, videos, or files may block extraction or require fallback analysis.
API keys and local dependencies are required for full multimodal functionality.

Screenshots

Workspace — Multi-Chat Interface with Document Analysis

Sidebar chat management, hero banner, chat area with an AI-summarized research paper, and the input bar with file attachment.

YouTube Video Analysis

A YouTube link is pasted in chat; the AI extracts the transcript, summarizes key points, and answers follow-ups.

Website Scraping + File Comparison

A website is scraped and summarized, then a PDF is uploaded and compared against the article — multi-step analysis in one chat.

🚀 Quick Start

Option A — Desktop installer (recommended)

Get the latest from the Releases page:

Platform	File
macOS (Apple Silicon — M1/M2/M3/M4)	`AI-Media-Studio-1.0.0-macOS-Apple-Silicon.dmg`
macOS (Intel)	`AI-Media-Studio-1.0.0-macOS-Intel.dmg`
Windows (64-bit)	`AI-Media-Studio-Setup-1.0.0-Windows-x64.exe`

See Installation (Desktop App) for prerequisites and first-launch steps.

Option B — Run from source

git clone https://github.com/Evan1108-Coder/Website-Youtube-File-AI-Scraper.git
cd Website-Youtube-File-AI-Scraper
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with the AI provider keys you want to use
uvicorn src.ai_scraper_bot.webapp:app --reload

The web server runs at http://127.0.0.1:8000 (or port 18919 when launched via Electron).

Desktop App Features

Everything from the web version, plus:

Native app — launches from Applications/Start Menu, with its own dock icon.
Drag-and-drop — drop files directly onto the window to analyze them.
Auto-starts backend — no need to manually run a Python server.
Hidden inset titlebar (macOS) — clean, modern appearance.
Persistent storage — database and downloads in user-writable locations.

What this app can do

Normal AI chat with memory per chat.
Multiple chats in a left sidebar (limit 10), with rename / clear / delete / clear-all controls.
Upload files directly in the browser or by drag-and-drop.
Summarize websites and YouTube links.
Analyze documents, images, audio, and video.
Per-chat drafts & attachment state, a pause button while the AI works, local SQLite history, and a banner hide/show toggle.

It's designed to feel like a normal AI chat screen, with the project's multimodal extraction pipeline behind it.

Supported inputs & file types

Inputs: websites · YouTube links · text questions · uploaded documents / images / audio / video.

Category	Extensions
Text & markup	`.txt` `.md` `.csv` `.json` `.html` `.xml`
Documents	`.pdf` `.docx` `.pptx` `.xlsx` `.rtf`
Images	`.png` `.jpg` `.jpeg` `.avif`
Audio	`.mp3` `.wav` `.m4a` `.aac` `.flac` `.ogg`
Video	`.mp4` `.mov`

Core extraction pipeline

The project avoids dead-end failures by using layered extraction.

Websites

Page-text extraction → related useful-URL collection → image review when relevant → directly-downloadable website-video review when possible → a final summary focused on the subject matter, not just page structure.

YouTube (transcript-first, with fallbacks)

optional YouTube Data API metadata
youtube-transcript-api
yt-dlp subtitle attempt
DownSub + Playwright
SaveSubs + Playwright
metadata fallback

So the app can still produce something useful even if direct YouTube access is partially blocked.

Images & video frames

The active visual-description path uses your configured AI model — image descriptions and video key-frame descriptions both come from your model (the older BLIP caption path is no longer the normal active flow).

Audio & video

The media pipeline separates transcript/speech analysis, visual analysis, and music analysis — so a silent video can still be reviewed visually, and a music-heavy file can still produce music analysis even if speech transcription is weak.

Music analysis (free/local-friendly by default)

Essentia — default music feature layer (BPM, key, loudness-like values).
AcoustID — optional song ID via local fingerprinting + API key.
MIRFLEX — optional repo hook for future music tagging/classification.

If one music stage fails, the others continue.

Honesty & failure handling

The app carries extra runtime/extraction context into the summary pipeline — which YouTube path succeeded, which music libraries were attempted vs produced output, recent runtime diary lines, and which media was actually reviewed — to reduce fake claims like "a fallback worked" when it didn't.

Supported AI Models

Set TEXT_AI_MODEL in your .env to any of these (or any LiteLLM-compatible model string):

Provider	Models
OpenAI	`gpt-5.5-pro`, `gpt-5.5`, `gpt-5.5-mini`, `gpt-5.4-pro`, `gpt-5.4-mini`, `gpt-4o`, `gpt-4o-mini`
Anthropic	`claude-opus-4-7`, `claude-sonnet-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`, `claude-3.5-sonnet`
Google	`gemini-3.1-pro`, `gemini-3-flash`, `gemini-2.5-flash-lite`
Together AI (Llama)	`llama-4-maverick`, `llama-4-scout`, `llama-3.3-70b`
MiniMax	`minimax-m3`, `minimax-m2.7`, `minimax-m2.5`

Installation (Desktop App)

macOS

Download the .dmg for your chip (Apple Silicon or Intel).
Open the DMG and drag AI Media Studio to Applications.
On first launch, right-click the app → Open (to bypass Gatekeeper — the app is unsigned).
The app auto-starts the Python backend.

Windows

Download the .exe installer.
Run it and follow the prompts.
Launch AI Media Studio from the Start Menu.

Prerequisites

Python 3.10+ in PATH, with pip install -r requirements.txt.
System tools: ffmpeg, tesseract (audio/OCR features).
Playwright Chromium: playwright install chromium (website scraping).
.env with your AI API keys (see .env.example).

Where the app stores data

Database: ~/Library/Application Support/ai-media-studio/webapp.sqlite (macOS) / %APPDATA%/ai-media-studio/webapp.sqlite (Windows).
Downloads: ~/Documents/AI Media Studio Downloads/.

Development (Run from Source)

Full guide in SETUP.md.

pip install -r requirements.txt
npm install

npm start                                   # Electron app in dev mode
# or run just the web server:
PYTHONPATH=src python -m ai_scraper_bot.webapp

Main files

Area	Files
Web entrypoint	`src/ai_scraper_bot/webapp.py`
Web backend	`src/ai_scraper_bot/web/service.py`, `web/store.py`
Web frontend	`src/ai_scraper_bot/web/static/index.html`, `app.css`, `app.js`
Shared config / prompts	`src/ai_scraper_bot/config.py`, `prompts.py`
Summarizer (LiteLLM)	`src/ai_scraper_bot/services/summarizer.py`
YouTube extraction	`src/ai_scraper_bot/services/youtube.py`
Website extraction	`src/ai_scraper_bot/services/website.py`
Transcript-site fallbacks	`services/downsub.py`, `services/savesubs.py`
Transcription	`src/ai_scraper_bot/services/transcription.py`
Local video / vision / music	`services/video_analysis.py`, `services/vision.py`, `services/music_analysis.py`
File parsing	`src/ai_scraper_bot/parsers/file_parser.py`

Building Installers

npm run build:mac    # macOS (arm64 + x64)
npm run build:win    # Windows (x64)
npm run build:all    # both platforms

Output goes to dist-electron/.

Recommended Defaults

ENABLE_LOCAL_VISION=true
ENABLE_MUSIC_DETECTION=true
MUSIC_ESSENTIA_ENABLED=true
MUSIC_ACOUSTID_ENABLED=false      # until AcoustID is configured
MUSIC_MIRFLEX_ENABLED=false       # until MIRFLEX is actually set up
YOUTUBE_COOKIE_MODE_ENABLED=false
YOUTUBE_DOWNSUB_ENABLED=true
YOUTUBE_SAVESUBS_ENABLED=true
YOUTUBE_TRANSCRIPT_SITE_HEADLESS=true

Full env reference: ENVREADME.md.

Diagrams

System Architecture

Web UI ↔ FastAPI server ↔ specialized extractors (web, YouTube, documents, vision, audio, music), with LLM calls routed through LiteLLM to multiple providers.

Processing Pipeline

How URLs, YouTube links, documents, images, audio, and video flow through layered extraction with fallback chains, converging into a final LiteLLM summary stored in per-chat SQLite memory.

Privacy & data handling

API keys belong in .env and should never be committed.
Uploaded files, scraped content, transcripts, and summaries may be stored locally by the app.
Content can be sent to the configured AI provider when you ask the app to summarize/analyze it.
Don't upload private/sensitive files unless you understand your local setup and selected provider.
Before sharing the repo, do not expose .env, .venv, downloaded test media, cookies files, browser profile exports, real API keys, or machine-specific secrets (.gitignore covers these — double-check anyway).
See SECURITY.md for vulnerability reporting and secret handling.

Troubleshooting Notes

YouTube still fails — that doesn't mean the whole app is broken; check which stage failed (youtube-transcript-api, yt-dlp, DownSub, SaveSubs, metadata fallback).

Audio transcription fails with NumPy / torch issues — the local Whisper setup is currently intended to run with numpy<2.

A video has no audio — some .mp4/.mov files are silent: transcript audio analysis won't run, but visual review still can; music analysis only runs if an audio stream exists.

MIRFLEX enabled but not working — it's an optional repo hook; the rest of the music chain continues regardless.

More: TROUBLESHOOTING.md.

Reference docs

Topic	Document
Full setup & configuration	SETUP.md
Every environment variable	ENVREADME.md
Troubleshooting	TROUBLESHOOTING.md
Security policy	SECURITY.md
Contributing	CONTRIBUTING.md

License

Real visual snapshot

These visuals are generated from the actual repository structure and project workflow, not placeholders.

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
.github		.github
docs		docs
electron		electron
src/ai_scraper_bot		src/ai_scraper_bot
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
ENVREADME.md		ENVREADME.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
SETUP.md		SETUP.md
TROUBLESHOOTING.md		TROUBLESHOOTING.md
package-lock.json		package-lock.json
package.json		package.json
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

AI Media Studio — Desktop App (v1.0.0)

⚡ TL;DR — what you need to know

Why use AI Media Studio?

Current limitations

Screenshots

Workspace — Multi-Chat Interface with Document Analysis

YouTube Video Analysis

Website Scraping + File Comparison

🚀 Quick Start

Option A — Desktop installer (recommended)

Option B — Run from source

Desktop App Features

What this app can do

Supported inputs & file types

Core extraction pipeline

Websites

YouTube (transcript-first, with fallbacks)

Images & video frames

Audio & video

Music analysis (free/local-friendly by default)

Honesty & failure handling

Supported AI Models

Installation (Desktop App)

macOS

Windows

Prerequisites

Where the app stores data

Development (Run from Source)

Main files

Building Installers

Recommended Defaults

Diagrams

System Architecture

Processing Pipeline

Privacy & data handling

Troubleshooting Notes

Reference docs

License

Real visual snapshot

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages