feat: support any yt-dlp site + readable download filenames#9
Merged
Conversation
- Rename youtube_url -> source_url across backend schemas, Job, routes, pipeline and all tests, plus frontend types, component and i18n copy. Reflects that the tool now supports any platform yt-dlp can handle (X, Twitter, TikTok, etc.), not just YouTube. - Replace regex-based _is_youtube_url() with _is_supported_url(), which delegates to yt-dlp extractors (excluding the generic one) so URL validation always matches yt-dlp's actual capabilities. - Download filenames now use the video title plus a language-intent suffix: "<title> (original).srt" for source-only outputs and "<title> (en_to_zh-TW).srt" for translated ones, with illegal chars stripped and length capped at 120 chars. - Frontend copy and URL validation relaxed to accept any https URL.
Correctness / Cleanliness:
- Rename download_youtube_video -> download_video (leaked abstraction);
update all call sites in pipeline, core/__init__ and tests.
- Swap Chinese error string for English ("URL not supported by
downloader (yt-dlp): ...") to match the rest of the file and avoid
leaking a UI concern into core.
- Strip trailing " ." after 120-char truncation in _sanitize_filename
to avoid Windows-invalid names ending in space.
- Update module docstring in downloader.py to drop "YouTube".
Efficiency (Blocker):
- Cache yt-dlp extractor classes at module load; _is_supported_url
now iterates the cached list and calls suitable() on classes
(~11ms/call -> ~0ms). Removes the deferred import + PLC0415 noqa.
Simplicity / Convention:
- Merge _EXTENSIONS and _MEDIA_TYPES into a single _FILE_META dict
keyed by FileType (NamedTuple of ext + media_type).
- Extract magic strings: _DEFAULT_FILENAME, _SUFFIX_ORIGINAL,
_LANG_SEPARATOR.
Frontend:
- Tighten URL pattern from /^https?:\/\/.+/ to /^https?:\/\/\S+$/
so whitespace-only suffixes are rejected client-side.
API defensiveness (QA F2):
- JobCreateRequest now uses extra="forbid"; clients that send
youtube_url (old field) alongside source_url get 422 instead of a
silent success.
Tests:
- Strengthen two assertion-roulette cases to assert exact strings /
legal content preservation, so a mutation that returns the fallback
"video" would now fail the tests.
download_video was populating VideoMetadata.title from ffprobe's MP4
container tags, which for a freshly-downloaded video is empty — ffprobe
falls back to the filename stem `video`. The yt-dlp info_dict carries
the upstream platform title (YouTube / X / TikTok…), matching the
pre-existing treatment of `description`.
Previously unobservable because the download filename was hardcoded to
`bilingualsub.{ext}`. After the rename to use `video_title` in the
Content-Disposition filename, every downloaded file showed up as
`video (original).{ext}`. E2E run with "Me at the zoo" (YouTube's
first video) now produces `Me at the zoo (original).mp4` as expected.
tests/integration/test_api_contract.py (19 tests, @integration): - JobCreateRequest schema: source_url accepted; youtube_url (old field), both fields simultaneously, non-URL strings, and empty bodies all get 422 via HttpUrl + extra="forbid". - HTTP-layer acceptance of any valid HttpUrl (YouTube, youtu.be, X, Twitter, TikTok, Vimeo) — the test is intentionally scoped to schema layer only; extractor-level coverage is in the unit tests and in the async failure test below. - Async unsupported-URL path: example.com creates a job then fails via the pipeline with error.code=invalid_input and a yt-dlp-specific detail. 10-second polling window to survive CI thread contention. - Content-Disposition filename format: parametrized over title, language pair, and file type, covering translated outputs, same-lang original suffix, empty-title fallback, and illegal-char sanitisation. - Both fixtures use `with TestClient(app) as c: yield c` so Starlette actually runs the lifespan and populates app.state.job_manager. tests/e2e/test_filename_e2e.py (2 tests, @e2e, gated by ENABLE_LIVE_DOWNLOAD=1): - Real yt-dlp download of "Me at the zoo" through FastAPI, then asserts Content-Disposition filename == "Me at the zoo (original).{mp4|mp3}". - Specifically exists to catch regression of the bug where video_title was read from ffprobe MP4 container tags (empty) instead of yt-dlp info_dict.title. If the live download fails we pytest.fail — silent skip would hide the regression the test is built to detect. tests/unit/core/test_downloader.py: - Rename test_ffprobe_missing_title_uses_filename -> test_info_dict_ title_overrides_ffprobe. The old expectation (fall back to filename stem) tested behaviour that was itself the bug; the new assertion matches the documented contract that info_dict.title always wins.
Production code: - Delete run_pipeline dead code (103 lines) - Replace list[Any] with list[type] for extractor classes - Add @lru_cache to _is_supported_url (1862 extractors) - Replace bare except Exception with specific exceptions - Add _FILENAME_BAD_CHARS comment, extract _MAX_UPLOAD_BYTES constant - Remove WHAT comments, remove dead url_placeholder i18n key Test code: - Delete 3 Implementation-Detail/Mockery tests in test_pipeline - Migrate all tests from run_pipeline to run_download/run_subtitle - Remove mock internal assertions in test_downloader - Use exact value assertion in test_routes - Rename misleading IT to clarify PlayRes fixed design - Add coverage: extract_audio failure + sanitize edge cases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
youtube_url→source_urlacross all layers (schema, jobs, pipeline, routes, frontend)_is_supported_url(supports X/Twitter, TikTok, Bilibili, etc.)_sanitize_filename+_build_download_filenamefor human-readable download filenames (e.g.,Video Title (en_to_zh-TW).srt)info_dicttitle over ffprobe MP4 tags (fixes missing titles on non-YouTube platforms)@lru_cacheto_is_supported_urlfor per-request performance (1,862 extractors)run_pipelinedead code, clean up bareexcept Exception, extract constantsTest plan
🤖 Generated with Claude Code