feat: visual description mode for videos without speech by Mapleeeeeeeeeee · Pull Request #11 · Mapleeeeeeeeeee/bilingualsub

Mapleeeeeeeeeee · 2026-04-28T03:37:53Z

Summary

Add visual description mode that uses Gemini 3.1 Flash Lite Preview to analyze video frames and generate timestamped description subtitles for speechless videos
Frontend toggle between "Speech Subtitles" and "Visual Description" modes with full pipeline support
Mode switch between visual→subtitle and subtitle→visual at any phase (with on-the-fly audio extraction fallback)

Changes

Backend: visual_describer.py core module, pipeline branching in pipeline.py, ProcessingMode enum, upload validation reordering
Frontend: mode toggle in UrlInput, ProcessingMode type, DownloadLinks filtering (hide ASS/Audio in visual mode), ProgressTracker source badge, useJob state tracking, startSubtitle mode forwarding
Config: GEMINI_API_KEY + VISUAL_DESCRIPTION_MODEL env vars
Docs: design doc + architecture doc

Test plan

166 unit tests pass with no regressions
ruff, mypy, TypeScript, prettier all clean
curl E2E: create job (both modes), upload with mode, invalid mode rejection
curl E2E: visual→subtitle mode switch (C-B2 fix verified — audio extracted on-the-fly)
curl E2E: subtitle→visual mode switch
curl E2E: SRT download with bilingual content verified
4 rounds of code review (correctness, convention, simplicity, cleanliness, efficiency)
QA verification of all bug fixes

🤖 Generated with Claude Code

New pipeline mode that uses Gemini to analyze video frames and generate translated subtitles from visual content (on-screen text, UI elements, scene descriptions). Users toggle between speech subtitles and visual description via a new UI switch.

Backend: core/visual_describer.py (Gemini File API), pipeline branch on processing_mode, configurable model via VISUAL_DESCRIPTION_MODEL env var.
Frontend: Toggle in UrlInput, i18n keys, processing_mode in request types.
Tests: 7 unit tests + 3 integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ciples

Restructured DESCRIBE_PROMPT using XML sections with why-what-how flow: pacing (3-8s segments), on_screen_text (quote actual text for translation), ui_actions (narrate purpose not labels), skip (omit logo cards). Also fixed file processing wait loop and mypy type issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add ProcessingMode StrEnum to eliminate magic strings
- Add polling timeout (600s) and FAILED state handling for Gemini file upload
- Clean up uploaded files from Gemini after use (try/finally)
- Fix error handling in _run_visual_description_subtitle (no re-raise)
- Fix check order: _genai import before API key validation
- Add processing_mode to upload route
- Extract _require_api_key helper (Rule of Three)
- Use source_lang in prompt instead of ignoring it
- Pass work_dir as parameter to _serialize_translated_only
- Tighten test assertions (remove assertion roulette)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
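
The polling timeout, FAILED handling, and try/finally cleanup can be sketched without touching the Gemini SDK by injecting the operations as callables. Everything named here (`wait_for_active`, `run_with_cleanup`, `FileProcessingError`, the state strings, the 2-second poll interval) is a hypothetical stand-in; only the 600s timeout and the overall behavior are taken from the commit message.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
H = TypeVar("H")


class FileProcessingError(RuntimeError):
    """Raised when the remote service reports that file processing failed."""


def wait_for_active(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until the state is ACTIVE; raise on FAILED or once the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise FileProcessingError("uploaded file entered FAILED state")
        if clock() >= deadline:
            raise TimeoutError(f"file not ACTIVE after {timeout_s:.0f}s")
        sleep(poll_interval_s)


def run_with_cleanup(
    upload: Callable[[], H],
    process: Callable[[H], T],
    delete: Callable[[H], None],
) -> T:
    """Upload, process, and always delete the uploaded handle if one was created."""
    handle: Optional[H] = None
    try:
        handle = upload()
        return process(handle)
    finally:
        # Guard: if upload() itself raised, there is nothing to delete.
        if handle is not None:
            delete(handle)
```

Injecting `sleep` and `clock` lets tests cover the FAILED and timeout paths instantly instead of waiting on real time.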

Code review fixes:
- Remove duplicate [dependency-groups] section from pyproject.toml
- Validate processing_mode in upload route (prevent 500 on invalid input)
- Use ProcessingMode.SUBTITLE as Form default instead of magic string
- Change _genai-missing check to raise ValueError for correct error routing
- Guard client in finally block to prevent potential UnboundLocalError
- Refactor _require_api_key to accept value directly (type-safe, no getattr)
- Skip audio extraction for visual description mode (saves 30-60s)
- Align pre-commit mypy (v1.10→v1.19.1) and add google-genai to its additional_dependencies so both environments resolve the same types

Test review fixes:
- Add MM:SS and HH:MM:SS timestamp parsing tests
- Add start>=end boundary value test
- Add FAILED state and timeout path tests for _wait_for_active
- Verify describe→translate causal chain in IT

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
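
The timestamp cases covered by the new tests can be illustrated with a small parser. This is a sketch under assumptions: `parse_timestamp` and `keep_segment` are hypothetical names, and dropping segments where start >= end is one plausible policy; the PR only states that this boundary is tested, not how it is handled.

```python
def parse_timestamp(text: str) -> float:
    """Parse 'MM:SS' or 'HH:MM:SS' (seconds may be fractional) into seconds."""
    parts = text.strip().split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unsupported timestamp format: {text!r}")
    total = 0.0
    for part in parts:
        # Accept only non-negative decimal numbers such as '07' or '30.5'.
        if not part.replace(".", "", 1).isdigit():
            raise ValueError(f"invalid timestamp component in {text!r}")
        total = total * 60 + float(part)
    return total


def keep_segment(start: str, end: str) -> bool:
    """Assumed policy: keep a segment only if it has positive duration."""
    return parse_timestamp(start) < parse_timestamp(end)
```

Accumulating with `total * 60 + part` handles both formats in one loop, since each additional leading field is worth 60 times the one after it.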

Round-3: early-fail on missing file name, audio extraction guard for mode switch, validation before file upload, error message fixes, frontend ProcessingMode type alias, upload FormData forwards mode.

Round-4: ProgressTracker visual description label, DownloadLinks hides ASS/Audio in visual mode, startSubtitle forwards processingMode, consolidate get_settings calls, update docs to Gemini 3.1 Flash Lite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Mapleeeeeeeeeee and others added 5 commits April 24, 2026 14:12

Mapleeeeeeeeeee merged commit dede41b into main Apr 28, 2026
2 of 3 checks passed

Mapleeeeeeeeeee deleted the feat/visual-description-mode branch April 28, 2026 03:38
