feat: support any yt-dlp site + readable download filenames by Mapleeeeeeeeeee · Pull Request #9 · Mapleeeeeeeeeee/bilingualsub

Mapleeeeeeeeeee · 2026-04-20T03:06:55Z

Summary

Rename youtube_url → source_url across all layers (schema, jobs, pipeline, routes, frontend)
Replace YouTube-only URL validation with yt-dlp extractor-based _is_supported_url (supports X/Twitter, TikTok, Bilibili, etc.)
Add _sanitize_filename + _build_download_filename for human-readable download filenames (e.g., Video Title (en_to_zh-TW).srt)
Prefer yt-dlp info_dict title over ffprobe MP4 tags (fixes missing titles on non-YouTube platforms)
Add @lru_cache to _is_supported_url for per-request performance (1,862 extractors)
Delete 103-line run_pipeline dead code, clean up bare except Exception, extract constants
Add API contract IT + live-download E2E tests; fix test quality issues (remove Mockery/Implementation-Detail anti-patterns)

Test plan

Unit tests: 126 passed
Ruff / mypy / vulture / prettier all pass
Manual test: X/Twitter video download + full pipeline (transcribe + translate) verified
CI: integration + E2E tests

🤖 Generated with Claude Code

- Rename youtube_url -> source_url across backend schemas, Job, routes, pipeline and all tests, plus frontend types, component and i18n copy. Reflects that the tool now supports any platform yt-dlp can handle (X, Twitter, TikTok, etc.), not just YouTube. - Replace regex-based _is_youtube_url() with _is_supported_url(), which delegates to yt-dlp extractors (excluding the generic one) so URL validation always matches yt-dlp's actual capabilities. - Download filenames now use the video title plus a language-intent suffix: "<title> (original).srt" for source-only outputs and "<title> (en_to_zh-TW).srt" for translated ones, with illegal chars stripped and length capped at 120 chars. - Frontend copy and URL validation relaxed to accept any https URL.

Correctness / Cleanliness: - Rename download_youtube_video -> download_video (leaked abstraction); update all call sites in pipeline, core/__init__ and tests. - Swap Chinese error string for English ("URL not supported by downloader (yt-dlp): ...") to match the rest of the file and avoid leaking a UI concern into core. - Strip trailing " ." after 120-char truncation in _sanitize_filename to avoid Windows-invalid names ending in space. - Update module docstring in downloader.py to drop "YouTube". Efficiency (Blocker): - Cache yt-dlp extractor classes at module load; _is_supported_url now iterates the cached list and calls suitable() on classes (~11ms/call -> ~0ms). Removes the deferred import + PLC0415 noqa. Simplicity / Convention: - Merge _EXTENSIONS and _MEDIA_TYPES into a single _FILE_META dict keyed by FileType (NamedTuple of ext + media_type). - Extract magic strings: _DEFAULT_FILENAME, _SUFFIX_ORIGINAL, _LANG_SEPARATOR. Frontend: - Tighten URL pattern from /^https?:\/\/.+/ to /^https?:\/\/\S+$/ so whitespace-only suffixes are rejected client-side. API defensiveness (QA F2): - JobCreateRequest now uses extra="forbid"; clients that send youtube_url (old field) alongside source_url get 422 instead of a silent success. Tests: - Strengthen two assertion-roulette cases to assert exact strings / legal content preservation, so a mutation that returns the fallback "video" would now fail the tests.

download_video was populating VideoMetadata.title from ffprobe's MP4 container tags, which for a freshly-downloaded video is empty — ffprobe falls back to the filename stem `video`. The yt-dlp info_dict carries the upstream platform title (YouTube / X / TikTok…), matching the pre-existing treatment of `description`. Previously unobservable because the download filename was hardcoded to `bilingualsub.{ext}`. After the rename to use `video_title` in the Content-Disposition filename, every downloaded file showed up as `video (original).{ext}`. E2E run with "Me at the zoo" (YouTube's first video) now produces `Me at the zoo (original).mp4` as expected.

@e2e

tests/integration/test_api_contract.py (19 tests, @integration): - JobCreateRequest schema: source_url accepted; youtube_url (old field), both fields simultaneously, non-URL strings, and empty bodies all get 422 via HttpUrl + extra="forbid". - HTTP-layer acceptance of any valid HttpUrl (YouTube, youtu.be, X, Twitter, TikTok, Vimeo) — the test is intentionally scoped to schema layer only; extractor-level coverage is in the unit tests and in the async failure test below. - Async unsupported-URL path: example.com creates a job then fails via the pipeline with error.code=invalid_input and a yt-dlp-specific detail. 10-second polling window to survive CI thread contention. - Content-Disposition filename format: parametrized over title, language pair, and file type, covering translated outputs, same-lang original suffix, empty-title fallback, and illegal-char sanitisation. - Both fixtures use `with TestClient(app) as c: yield c` so Starlette actually runs the lifespan and populates app.state.job_manager. tests/e2e/test_filename_e2e.py (2 tests, @e2e, gated by ENABLE_LIVE_DOWNLOAD=1): - Real yt-dlp download of "Me at the zoo" through FastAPI, then asserts Content-Disposition filename == "Me at the zoo (original).{mp4|mp3}". - Specifically exists to catch regression of the bug where video_title was read from ffprobe MP4 container tags (empty) instead of yt-dlp info_dict.title. If the live download fails we pytest.fail — silent skip would hide the regression the test is built to detect. tests/unit/core/test_downloader.py: - Rename test_ffprobe_missing_title_uses_filename -> test_info_dict_ title_overrides_ffprobe. The old expectation (fall back to filename stem) tested behaviour that was itself the bug; the new assertion matches the documented contract that info_dict.title always wins.

Production code: - Delete run_pipeline dead code (103 lines) - Replace list[Any] with list[type] for extractor classes - Add @lru_cache to _is_supported_url (1862 extractors) - Replace bare except Exception with specific exceptions - Add _FILENAME_BAD_CHARS comment, extract _MAX_UPLOAD_BYTES constant - Remove WHAT comments, remove dead url_placeholder i18n key Test code: - Delete 3 Implementation-Detail/Mockery tests in test_pipeline - Migrate all tests from run_pipeline to run_download/run_subtitle - Remove mock internal assertions in test_downloader - Use exact value assertion in test_routes - Rename misleading IT to clarify PlayRes fixed design - Add coverage: extract_audio failure + sanitize edge cases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Mapleeeeeeeeeee and others added 5 commits April 18, 2026 09:50

Mapleeeeeeeeeee merged commit 767a2b8 into main Apr 20, 2026
7 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support any yt-dlp site + readable download filenames#9

feat: support any yt-dlp site + readable download filenames#9
Mapleeeeeeeeeee merged 5 commits into
mainfrom
feat/readable-filename-multiplatform

Mapleeeeeeeeeee commented Apr 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Mapleeeeeeeeeee commented Apr 20, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant