feat: visual description mode for videos without speech #11
Merged
New pipeline mode that uses Gemini to analyze video frames and generate translated subtitles from visual content (on-screen text, UI elements, scene descriptions). Users toggle between speech subtitles and visual description via a new UI switch.

- Backend: core/visual_describer.py (Gemini File API), pipeline branch on processing_mode, configurable model via VISUAL_DESCRIPTION_MODEL env var
- Frontend: Toggle in UrlInput, i18n keys, processing_mode in request types
- Tests: 7 unit tests + 3 integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
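The mode branch described above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: the enum values, the step names, the `plan_steps` and `visual_model` helpers, and the default model string are all hypothetical; only `ProcessingMode`, `processing_mode`, and the `VISUAL_DESCRIPTION_MODEL` variable come from the PR text.

```python
import os

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # older interpreters: equivalent str-mixin fallback
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class ProcessingMode(StrEnum):
    # Member values are assumptions; the PR does not state them.
    SUBTITLE = "subtitle"
    VISUAL_DESCRIPTION = "visual_description"


def plan_steps(mode: str) -> list:
    """Return illustrative pipeline steps for the requested mode."""
    parsed = ProcessingMode(mode)  # raises ValueError on unknown input
    if parsed is ProcessingMode.VISUAL_DESCRIPTION:
        # No audio extraction or transcription in the visual branch.
        return ["download", "describe_frames", "translate", "serialize"]
    return ["download", "extract_audio", "transcribe", "translate", "serialize"]


def visual_model(default: str = "hypothetical-default-model") -> str:
    """Read the model name from the environment, falling back to a default."""
    return os.environ.get("VISUAL_DESCRIPTION_MODEL", default)
```

Parsing the raw string into the enum at the boundary means an invalid mode fails early with a `ValueError` rather than deep inside the pipeline.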
…ciples

Restructured DESCRIBE_PROMPT using XML sections with why-what-how flow: pacing (3-8s segments), on_screen_text (quote actual text for translation), ui_actions (narrate purpose not labels), skip (omit logo cards). Also fixed file processing wait loop and mypy type issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add ProcessingMode StrEnum to eliminate magic strings
- Add polling timeout (600s) and FAILED state handling for Gemini file upload
- Clean up uploaded files from Gemini after use (try/finally)
- Fix error handling in _run_visual_description_subtitle (no re-raise)
- Fix check order: _genai import before API key validation
- Add processing_mode to upload route
- Extract _require_api_key helper (Rule of Three)
- Use source_lang in prompt instead of ignoring it
- Pass work_dir as parameter to _serialize_translated_only
- Tighten test assertions (remove assertion roulette)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
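The polling timeout, FAILED handling, and try/finally cleanup listed above might look something like the sketch below. It deliberately avoids the Gemini client: `get_state`, `upload`, `describe`, and `delete` are hypothetical injected callables, the `"PROCESSING"` state name is an assumption, and the exception types are illustrative rather than the ones this PR raises.

```python
import time
from typing import Callable


def wait_for_active(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until the file is ACTIVE; raise on FAILED or when the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("file processing failed")
        if clock() >= deadline:
            raise TimeoutError(f"file not active after {timeout_s:.0f}s")
        sleep(poll_interval_s)


def describe_with_cleanup(upload, describe, delete):
    """Upload, describe, and always delete the uploaded file afterwards."""
    handle = None  # bound before try so the finally block never sees an unbound name
    try:
        handle = upload()
        return describe(handle)
    finally:
        if handle is not None:
            delete(handle)
```

Injecting `sleep` and `clock` keeps the timeout path testable without waiting ten minutes, which is one way the FAILED and timeout tests mentioned later in the PR could be written.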
Code review fixes:
- Remove duplicate [dependency-groups] section from pyproject.toml
- Validate processing_mode in upload route (prevent 500 on invalid input)
- Use ProcessingMode.SUBTITLE as Form default instead of magic string
- Change _genai-missing check to raise ValueError for correct error routing
- Guard client in finally block to prevent potential UnboundLocalError
- Refactor _require_api_key to accept value directly (type-safe, no getattr)
- Skip audio extraction for visual description mode (saves 30-60s)
- Align pre-commit mypy (v1.10→v1.19.1) and add google-genai to its additional_dependencies so both environments resolve the same types

Test review fixes:
- Add MM:SS and HH:MM:SS timestamp parsing tests
- Add start>=end boundary value test
- Add FAILED state and timeout path tests for _wait_for_active
- Verify describe→translate causal chain in IT

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
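For readers unfamiliar with the timestamp cases named in the test fixes, here is a rough sketch of MM:SS and HH:MM:SS parsing with a start>=end check. The function names `parse_timestamp` and `is_valid_segment` are hypothetical, and treating a start>=end segment as invalid is an assumption; the PR only says the boundary is tested, not how the parser responds.

```python
def parse_timestamp(text: str) -> float:
    """Convert 'MM:SS' or 'HH:MM:SS' into seconds."""
    parts = text.strip().split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unsupported timestamp format: {text!r}")
    seconds = 0.0
    for part in parts:
        # Each field is worth 60x the next one: hours -> minutes -> seconds.
        seconds = seconds * 60 + float(part)
    return seconds


def is_valid_segment(start: str, end: str) -> bool:
    """Assumed rule: a segment is usable only if it starts strictly before it ends."""
    return parse_timestamp(start) < parse_timestamp(end)
```

For example, `parse_timestamp("01:02")` gives 62.0 seconds and `parse_timestamp("01:02:03")` gives 3723.0 seconds, while a segment whose start equals its end is rejected under the assumed rule.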
Round-3: early-fail on missing file name, audio extraction guard for mode switch, validation before file upload, error message fixes, frontend ProcessingMode type alias, upload FormData forwards mode.

Round-4: ProgressTracker visual description label, DownloadLinks hides ASS/Audio in visual mode, startSubtitle forwards processingMode, consolidate get_settings calls, update docs to Gemini 3.1 Flash Lite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Changes
- visual_describer.py core module, pipeline branching in pipeline.py, ProcessingMode enum, upload validation reordering
- UrlInput, ProcessingMode type, DownloadLinks filtering (hide ASS/Audio in visual mode), ProgressTracker source badge, useJob state tracking, startSubtitle mode forwarding
- GEMINI_API_KEY + VISUAL_DESCRIPTION_MODEL env vars

Test plan
🤖 Generated with Claude Code