Fix duplicated subtitle issue--core deduplication logic and screen-display part #1448
TransZAllen wants to merge 4 commits into TeamNewPipe:dev
TransZAllen commented on Jan 30, 2026
- [ √ ] I carefully read the contribution guidelines and agree to them.
- [ √ ] I have tested the API against NewPipe.
- [ √ ] I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API.
… URL parameters.
- Add `V`, `LANG`, `TLANG` constants to `YoutubeParsingHelper`
- Implement `extractVideoId()`, `extractLanguageCode()`, `extractTranslationCode()`
- Add `extractQueryParam()` utility in `Utils.java`
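To illustrate the kind of helper this commit describes, here is a minimal sketch of reading a single query parameter from a subtitle URL. It is not the code from the PR: the class name `QueryParamSketch`, the method signature, the constant values `"v"`, `"lang"`, `"tlang"`, and the example URL are all assumptions made for illustration. It needs Java 10 or later for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and signatures are assumptions, not the PR's code.
final class QueryParamSketch {
    // Assumed values for the constants named in the commit message.
    static final String V = "v";
    static final String LANG = "lang";
    static final String TLANG = "tlang";

    /** Returns the decoded value of the first parameter called {@code name}, or null if absent. */
    static String extractQueryParam(final String url, final String name) {
        final String query = URI.create(url).getRawQuery();
        if (query == null) {
            return null;
        }
        for (final String pair : query.split("&")) {
            final int eq = pair.indexOf('=');
            final String key = eq >= 0 ? pair.substring(0, eq) : pair;
            if (key.equals(name)) {
                final String raw = eq >= 0 ? pair.substring(eq + 1) : "";
                return URLDecoder.decode(raw, StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(final String[] args) {
        // Illustrative URL only; real subtitle URLs carry more parameters.
        final String url = "https://www.youtube.com/api/timedtext?v=b7vmW_5HSpE&lang=en&tlang=de";
        System.out.println(extractQueryParam(url, V));
        System.out.println(extractQueryParam(url, LANG));
        System.out.println(extractQueryParam(url, TLANG));
    }
}
```

Returning `null` for a missing parameter is one possible choice; a helper in the extractor might just as well return an empty string or throw a parsing exception.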
- Add core deduplication logic/method
- Reproduce the bug with the YouTube video: https://www.youtube.com/watch?v=b7vmW_5HSpE
- Introduce `SubtitleDeduplicator.java` to check and remove duplicates, storing results in cache.
- Add `SubtitleOrigin` and `SubtitleState` enums to model subtitle type and state.
- Ensure the cache directory is recreated if missing.
…ntegrate deduplicated subtitles, calling `checkAndDeduplicate()` to remove duplicates and store results in cache.
…ubtitleDeduplicatorTest.java`.
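The commits above describe removing duplicate entries from subtitle text. As a rough sketch of what deduplication on raw TTML text can look like, the example below drops any `<p>` element that is character-for-character identical to one already seen (same timing attributes and same text). This is an assumed criterion chosen for illustration; the actual rules in `SubtitleDeduplicator`, and its caching, are not shown in this conversation. The class and method names here are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the PR's SubtitleDeduplicator.
final class TtmlDedupSketch {
    // Matches one <p ...>...</p> element plus any whitespace that follows it.
    private static final Pattern P_ELEMENT =
            Pattern.compile("<p\\b[^>]*>.*?</p>\\s*", Pattern.DOTALL);

    /** Keeps the first occurrence of each identical <p> element and drops later copies. */
    static String removeExactDuplicates(final String ttml) {
        final Set<String> seen = new HashSet<>();
        final Matcher matcher = P_ELEMENT.matcher(ttml);
        final StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            // Copy everything between the previous element and this one unchanged.
            out.append(ttml, last, matcher.start());
            // Trim so that trailing whitespace does not affect the comparison.
            if (seen.add(matcher.group().trim())) {
                out.append(matcher.group());
            }
            last = matcher.end();
        }
        out.append(ttml, last, ttml.length());
        return out.toString();
    }

    public static void main(final String[] args) {
        final String ttml = "<tt><body><div>"
                + "<p begin=\"00:00:01.000\" end=\"00:00:02.000\">Hello</p>"
                + "<p begin=\"00:00:01.000\" end=\"00:00:02.000\">Hello</p>"
                + "<p begin=\"00:00:02.000\" end=\"00:00:03.000\">World</p>"
                + "</div></body></tt>";
        System.out.println(removeExactDuplicates(ttml));
    }
}
```

Working on the text with a regular expression keeps the sketch short, but it only catches exact copies; entries that differ in attribute order or whitespace inside the element would need a real XML parser or a looser comparison.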
Related issue
Scope of changes
This PR involves two repositories:
Reproduction case
Android device, duplicated subtitles visible during playback
YouTube video used for testing:
Subtitle cache location
Cached subtitle files (
The directory name corresponds to
Cache file naming
Cached subtitle filenames are intentionally descriptive,
Cache lifecycle & storage impact
Do cached subtitle files need to be deleted?
No.
Why keep cached subtitles?
Storage considerations
Unit tests
Tests focus on the core deduplication logic:
Why SubtitleDeduplicator operates on raw TTML text
This design is intended to be practical and simple. At this stage, the goal is only to detect obviously
Difference from
The fix has been tested with a YouTube video link: https://www.youtube.com/watch?v=b7vmW_5HSpE
Before the fix, the subtitle is shown as follows:
After applying the fix, the subtitle is displayed as follows:
AudricV left a comment
I think we don't want NewPipe Extractor to download files directly, so your approach must be changed, especially as you do not delete the files. Also, I would avoid downloading each subtitle, so that we do not hit rate limits.
The extractor is not an Android library, so Android-specific comments should be removed.
If YouTube provides incorrect subtitles, it should not be up to the extractor to fix them, in my opinion. To me, it makes more sense to fix this with a custom ExoPlayer component on the app side.
Thanks for the feedback, it’s helpful for me to better understand the intended boundaries of NewPipeExtractor. I’m preparing some follow-up comments to explain these commits, especially around subtitle downloading. I’m also taking some time to think about whether this design makes sense. I’ll add more comments soon.
About
: Just to make sure I understand correctly: currently, the extractor only provides
My original idea was to fix duplicated subtitles as early as possible and
However, I now realize that my changes effectively moved the subtitle downloading
At first, I thought this was acceptable since subtitles are eventually downloaded
So, performing file downloads inside NewPipeExtractor crosses its intended boundary, right?