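- Add a video subtitle translation toggle (on by default)
- Initialize the video subtitle translation module in content.ts
- Add the enableVideoSubtitle option to Config
- Inject a MAIN world script via manifest content_scripts to work around YouTube's CSP restrictions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>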
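Security fix (XSS):
- overlay.ts: replace innerHTML with createElement + textContent, eliminating the XSS risk entirely
- manager.ts: buildBtnSvg now returns an SVGElement (built with createElementNS) instead of using innerHTML
Performance:
- overlay.ts: findCue now uses binary search instead of a linear scan, greatly reducing per-frame cost on long videos
Narrower injection scope:
- wxt.config.ts: the MAIN world script's matches are narrowed from <all_urls> to the specific domains of supported platforms
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>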
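The binary-search version of findCue can be sketched as follows. This is a standalone function, not the PR's SubtitleOverlay method. It assumes that cues are sorted by start time, do not overlap, and use half-open [start, end) ranges in seconds, which matches video.currentTime.

```typescript
// Cue shape assumed to mirror the PR's SubtitleCue; times in seconds, like video.currentTime.
interface SubtitleCue {
  start: number
  end: number
  text: string
  translatedText?: string
}

// Binary search for the cue whose half-open range [start, end) contains `time`.
// Assumes `cues` is sorted by start and that cues do not overlap.
function findCue(cues: SubtitleCue[], time: number): SubtitleCue | null {
  let lo = 0
  let hi = cues.length - 1
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    const cue = cues[mid]
    if (time < cue.start) {
      hi = mid - 1 // target lies to the left
    } else if (time >= cue.end) {
      lo = mid + 1 // target lies to the right
    } else {
      return cue
    }
  }
  return null // time falls in a gap between cues
}

const cues: SubtitleCue[] = [
  { start: 0, end: 2, text: 'hello' },
  { start: 2, end: 4.5, text: 'world' },
  { start: 6, end: 8, text: 'again' },
]
console.log(findCue(cues, 3)?.text) // world
console.log(findCue(cues, 5)) // null
```

Each animation frame then costs O(log n) comparisons instead of O(n), which is what matters on long videos with thousands of cues.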
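When translation is turned off, call restoreNativeSubtitle(); when it is turned on, call hideNativeSubtitle().
This prevents native subtitles from staying at display:none after the user disables translation, which previously left no subtitles visible at all.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>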
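Before mount() changes a static element to position:relative, it now saves the element's original inline position value. cleanup() restores that value, so leftover styles do not affect the host page's layout.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>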
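- manager.ts: batch translation now uses [N] numbered markers instead of the ⌿ separator. This prevents the translation API from mangling the separator and causing misaligned splits or missing translations.
- overlay.ts: findCue supports overlapping cues. When several cues cover the same point in time, it picks the one that started last. This handles YouTube's rolling subtitles, where the previous line lingers while a new line appears.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>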
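One way to combine binary search with the overlap rule from this commit is sketched below. findCueOverlap and prefixMaxEnd are hypothetical names, and cues are assumed to be sorted by start. The sketch works in two steps:

- A binary search finds the last cue that has already started.
- A backward scan returns the first cue that is still running.

A running maximum of end times, computed once whenever the cues change, lets the scan stop as soon as no earlier cue can still be on screen. Without it, a time that falls in a gap would cost a full linear scan.

```typescript
interface SubtitleCue {
  start: number // seconds
  end: number // seconds
  text: string
  translatedText?: string
}

// Running maximum of `end`, computed once per cue list so the backward scan can stop early.
function prefixMaxEnd(cues: SubtitleCue[]): number[] {
  const out: number[] = []
  let max = -Infinity
  for (const cue of cues) {
    max = Math.max(max, cue.end)
    out.push(max)
  }
  return out
}

// Among the cues that contain `time`, return the one that started last.
// Assumes `cues` is sorted by start.
function findCueOverlap(cues: SubtitleCue[], maxEnd: number[], time: number): SubtitleCue | null {
  // Binary search for the last index whose start <= time.
  let lo = 0
  let hi = cues.length - 1
  let k = -1
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    if (cues[mid].start <= time) {
      k = mid
      lo = mid + 1
    } else {
      hi = mid - 1
    }
  }
  // Walk backwards: the first cue still running is the latest-started one.
  for (let i = k; i >= 0; i--) {
    if (maxEnd[i] <= time) break // no cue at or before i is still on screen
    if (cues[i].end > time) return cues[i]
  }
  return null
}

const rolling: SubtitleCue[] = [
  { start: 0, end: 4, text: 'line one' },
  { start: 2, end: 6, text: 'line two' },
  { start: 7, end: 9, text: 'line three' },
]
const maxEnd = prefixMaxEnd(rolling)
console.log(findCueOverlap(rolling, maxEnd, 3)?.text) // line two (both cover t=3; it started last)
console.log(findCueOverlap(rolling, maxEnd, 6.5)) // null
```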
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
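Add mergeSentenceGroups(). Consecutive fragment cues are merged into complete sentences before translation. A cue counts as a fragment when it does not end in sentence punctuation and the next cue starts with a lowercase letter. The translation is then written back to every cue in the group, so the fragments' time ranges also show the full sentence's translation.
Rules:
- ends with .!?。!?… → complete sentence, break
- next cue starts with an uppercase letter → new sentence, break
- more than MAX_GROUP_SIZE (6) cues → force a break
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>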
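When MAX_GROUP_SIZE is reached and the last cue is still a fragment (no sentence-ending punctuation), carry it over to the start of the next group. It is then merged and translated together with its continuation.
Before: ["...but the seeds"] | ["of animosity..."] translated separately
After: ["..."] | ["but the seeds" + "of animosity..."] translated together
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>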
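The grouping rules from the two commits above can be sketched as follows: sentence-ending punctuation, an uppercase next cue, and the MAX_GROUP_SIZE carry-over. This is an illustration, not the PR's code. It makes these assumptions:

- cues are already sorted;
- the punctuation class includes both ASCII and full-width `!`/`?`;
- texts are joined with a single space.

```typescript
interface SubtitleCue {
  start: number
  end: number
  text: string
  translatedText?: string
}

interface SentenceGroup {
  cues: SubtitleCue[]
  text: string
}

const MAX_GROUP_SIZE = 6
// Sentence-ending punctuation (ASCII plus full-width forms).
const SENTENCE_END = /[.!?。！？…]$/

function isSentenceEnd(text: string): boolean {
  return SENTENCE_END.test(text.trim())
}

// True when the first character is an uppercase letter (CJK characters have no case).
function startsUppercase(text: string): boolean {
  const first = text.trim().charAt(0)
  return first !== first.toLowerCase()
}

function mergeSentenceGroups(cues: SubtitleCue[]): SentenceGroup[] {
  const groups: SentenceGroup[] = []
  let current: SubtitleCue[] = []
  const flush = (items: SubtitleCue[]): void => {
    if (items.length === 0) return
    groups.push({ cues: items, text: items.map(c => c.text.trim()).join(' ') })
  }

  for (let i = 0; i < cues.length; i++) {
    const cue = cues[i]
    const next = cues[i + 1]
    current.push(cue)
    if (next === undefined || isSentenceEnd(cue.text) || startsUppercase(next.text)) {
      // Natural sentence boundary: close the group including this cue.
      flush(current)
      current = []
    } else if (current.length >= MAX_GROUP_SIZE) {
      // Size limit hit on a fragment: close the group without it and carry it
      // forward so it is translated together with its continuation.
      const carried = current.pop() as SubtitleCue
      flush(current)
      current = [carried]
    }
  }
  return groups
}

const fragments: SubtitleCue[] = [
  'we saw', 'the war', 'begin', 'and', 'everything', 'but the seeds', 'of animosity', 'remained.',
].map((text, i) => ({ start: i, end: i + 1, text }))
const groups = mergeSentenceGroups(fragments)
console.log(groups.map(g => g.text))
// ['we saw the war begin and everything', 'but the seeds of animosity remained.']
```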
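- Prepend a subtitle-specific instruction to each batch. It tells the model that the lines are subtitle fragments, that it should keep the meaning coherent across adjacent lines, and that the output must match the input line for line.
- Include the last 12 words of the previous batch's final group as [previous context]. This lets the model follow sentences that continue across batch boundaries, so the first line is not translated with garbled word order for lack of preceding context.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>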
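A sketch of the prompt layout described above follows. The instruction wording, the function names, and the exact line format are assumptions. Only three elements come from the PR history: the [previous context] label, the 12-word window, and the [N] markers from the earlier commit.

parseNumberedResponse reads the markers back by number. If the model reorders or drops a line, the affected slot stays empty instead of shifting every later translation by one.

```typescript
// Hypothetical instruction text; the PR's actual wording is not shown.
const SUBTITLE_INSTRUCTION =
  'The numbered lines below are subtitle fragments. Keep the meaning coherent with adjacent ' +
  'lines and return exactly one translated line per input line, keeping each [N] marker.'
const PREVIOUS_CONTEXT_WORDS = 12

function lastWords(text: string, count: number): string {
  return text.trim().split(/\s+/).slice(-count).join(' ')
}

// Builds one batch prompt: instruction, optional [previous context], then [N]-numbered lines.
function buildBatchPrompt(lines: string[], previousGroupText?: string): string {
  const parts: string[] = [SUBTITLE_INSTRUCTION]
  if (previousGroupText && previousGroupText.trim()) {
    parts.push(`[previous context] ${lastWords(previousGroupText, PREVIOUS_CONTEXT_WORDS)}`)
  }
  lines.forEach((line, i) => parts.push(`[${i + 1}] ${line}`))
  return parts.join('\n')
}

// Maps "[N] text" lines back to input positions; unmatched slots stay empty strings.
function parseNumberedResponse(response: string, count: number): string[] {
  const out = Array.from({ length: count }, () => '')
  for (const raw of response.split('\n')) {
    const match = raw.match(/^\s*\[(\d+)\]\s*(.*)$/)
    if (!match) continue
    const index = Number(match[1]) - 1
    if (index >= 0 && index < count) out[index] = match[2].trim()
  }
  return out
}

const prompt = buildBatchPrompt(['of animosity', 'remained'], 'the war began but the seeds')
console.log(prompt)
console.log(parseNumberedResponse('[2] 依然存在\n[1] 敌意的', 2)) // ['敌意的', '依然存在']
```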
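Remove SentenceGroup, mergeSentenceGroups, and the related constants. Each cue now maps directly to one translation, so English and Chinese subtitles correspond strictly one-to-one.
Coherence across batch boundaries is left to the model, via the [previous context] prefix and the subtitle-specific instruction, which greatly simplifies the code.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>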
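1. parser.ts: parseYouTubeXML gains mergeOverlappingCues(), which merges adjacent cues whose time ranges overlap (a characteristic of YouTube's rolling subtitles). Fragments are thus assembled into full sentences at parse time.
2. manager.ts: restore the mergeSentenceGroups() sentence-merging logic, and raise MAX_GROUP_SIZE from 6 to 8 to reduce forced breaks.
3. manager.ts: lower BATCH_SIZE from 15 to 5, so each batch's context is more focused and translations are more coherent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>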
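The overlap merge in item 1 can be sketched as follows. The sketch assumes cues are sorted by start time. It also assumes that "overlap" means a cue starts before the previous merged cue ends, so cues that merely touch stay separate. The function copies cues, leaving the parser's input array unchanged.

```typescript
interface SubtitleCue {
  start: number // seconds
  end: number // seconds
  text: string
  translatedText?: string
}

// Merges each cue into the previous merged cue when their time ranges overlap.
function mergeOverlappingCues(cues: SubtitleCue[]): SubtitleCue[] {
  const merged: SubtitleCue[] = []
  for (const cue of cues) {
    const last = merged[merged.length - 1]
    if (last !== undefined && cue.start < last.end) {
      last.end = Math.max(last.end, cue.end)
      last.text = `${last.text} ${cue.text}`.trim()
    } else {
      merged.push({ ...cue }) // copy so the caller's cues are not mutated
    }
  }
  return merged
}

const raw: SubtitleCue[] = [
  { start: 0, end: 2, text: 'the united' },
  { start: 1.5, end: 3, text: 'states' },
  { start: 3, end: 5, text: 'of america' },
]
console.log(mergeOverlappingCues(raw).map(c => c.text)) // ['the united states', 'of america']
```

Chains of overlapping cues keep merging. On heavily rolling tracks this can yield very long cues, which is consistent with the later commits moving sentence boundaries to time gaps instead.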
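Remove sentence merging and translate cue by cue, in batches of 5:
- Each batch is wrapped with CONTEXT_SIZE=2 cues of [context before/after]
- The prompt explicitly allows the model to borrow meaning across lines to handle sentence fragments
- English and Chinese cues correspond strictly one-to-one, so there are no alignment issues
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>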
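The batching scheme above can be sketched as follows: 5 cues per batch, plus CONTEXT_SIZE = 2 neighbors on each side. Batch, buildBatches and formatBatch are hypothetical. The [context before]/[context after] labels come from the commit message; combining them with [N] markers is an assumption. Only the numbered lines are expected back, and `offset` maps them onto the full cue list, which keeps the one-to-one correspondence.

```typescript
const BATCH_SIZE = 5
const CONTEXT_SIZE = 2

interface Batch {
  offset: number // index of the first translated line in the full cue list
  before: string[] // context only, not translated
  lines: string[] // lines to translate, numbered [1]..[n]
  after: string[] // context only, not translated
}

function buildBatches(texts: string[]): Batch[] {
  const batches: Batch[] = []
  for (let offset = 0; offset < texts.length; offset += BATCH_SIZE) {
    const end = Math.min(offset + BATCH_SIZE, texts.length)
    batches.push({
      offset,
      before: texts.slice(Math.max(0, offset - CONTEXT_SIZE), offset),
      lines: texts.slice(offset, end),
      after: texts.slice(end, end + CONTEXT_SIZE),
    })
  }
  return batches
}

function formatBatch(batch: Batch): string {
  const out: string[] = []
  for (const line of batch.before) out.push(`[context before] ${line}`)
  batch.lines.forEach((line, i) => out.push(`[${i + 1}] ${line}`))
  for (const line of batch.after) out.push(`[context after] ${line}`)
  return out.join('\n')
}

const texts = Array.from({ length: 12 }, (_, i) => `cue ${i}`)
const batches = buildBatches(texts)
console.log(batches.length) // 3
console.log(formatBatch(batches[1]))
```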
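Add mergeByTimeGap(), which uses pause length to decide sentence boundaries:
- gap between adjacent cues < 1500ms → same sentence, merge
- gap ≥ 1500ms or more than 19 words → new sentence, break
Unlike textual features (punctuation/capitalization), the timing signal works just as well on YouTube ASR subtitles, which have no punctuation and are all lowercase. It also correctly merges phrases split across cues (e.g. "the united" + "states").
Reference: the same strategy as Immersive Translate's ytAsrConfig mergeConfig.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>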
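- Lower MERGE_GAP_MS from 1500 to 600ms. Different sentences separated by brief pauses are no longer merged into one long block, which had garbled the translation.
- When MAX_WORDS is hit and the next cue still follows after a short gap, carry the last cue over to the next group, so fragments (such as "but the seeds") are not translated in isolation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>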
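The time-gap grouping from the two commits above might look like this, using the tuned values (MERGE_GAP_MS = 600, a 19-word cap) and the carry-over rule. It is a sketch, not the PR's code. Cue times are assumed to be in seconds, so gaps are converted to milliseconds. Words are counted by splitting on whitespace, which only suits space-delimited languages such as English.

```typescript
interface SubtitleCue {
  start: number // seconds
  end: number // seconds
  text: string
  translatedText?: string
}

interface SentenceGroup {
  cues: SubtitleCue[]
  text: string
}

const MERGE_GAP_MS = 600
const MAX_WORDS = 19

function wordCount(cues: SubtitleCue[]): number {
  return cues.reduce((n, c) => n + c.text.trim().split(/\s+/).filter(Boolean).length, 0)
}

function mergeByTimeGap(cues: SubtitleCue[]): SentenceGroup[] {
  const groups: SentenceGroup[] = []
  let current: SubtitleCue[] = []
  const flush = (items: SubtitleCue[]): void => {
    if (items.length === 0) return
    groups.push({ cues: items, text: items.map(c => c.text.trim()).join(' ') })
  }

  for (let i = 0; i < cues.length; i++) {
    const cue = cues[i]
    const next = cues[i + 1]
    current.push(cue)
    if (next === undefined) break
    const gapMs = (next.start - cue.end) * 1000
    if (gapMs >= MERGE_GAP_MS) {
      // A long pause marks a sentence boundary.
      flush(current)
      current = []
    } else if (wordCount(current) > MAX_WORDS) {
      // Too long, but speech continues into `next`: carry the trailing cue
      // forward so it is translated with its continuation.
      if (current.length > 1) {
        const carried = current.pop() as SubtitleCue
        flush(current)
        current = [carried]
      } else {
        flush(current)
        current = []
      }
    }
  }
  flush(current)
  return groups
}

const asr: SubtitleCue[] = [
  { start: 0, end: 1, text: 'the united' },
  { start: 1.2, end: 2, text: 'states' },
  { start: 3, end: 4, text: 'is large' },
]
console.log(mergeByTimeGap(asr).map(g => g.text)) // ['the united states', 'is large']
```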
Reviewer's Guide
Implements a new video subtitle translation pipeline for YouTube and generic VTT-based platforms by injecting a MAIN-world network hook script, parsing multiple subtitle formats, grouping and batch-translating cues, and rendering a bilingual overlay synchronized to the video timeline, with a UI toggle and configuration flag to control the feature.
Sequence diagram for video subtitle interception and translation pipeline
sequenceDiagram
actor User
participant YouTubePage
participant MainWorldScript as MainWorldScript_video_subtitle_inject
participant ContentScript as ContentScript
participant VideoManager as VideoSubtitleManager
participant SubtitleParser as SubtitleParser
participant Overlay as SubtitleOverlay
participant TranslateApi as TranslateApi
participant Config as ConfigStore
User->>YouTubePage: Enable native subtitles
ContentScript->>Config: Read enableVideoSubtitle
Config-->>ContentScript: enableVideoSubtitle=true
ContentScript->>VideoManager: initVideoSubtitle()
VideoManager->>MainWorldScript: window.postMessage config(patterns)
VideoManager->>VideoManager: attachMessageListener()
VideoManager->>VideoManager: watchNavigation()
Note over MainWorldScript,YouTubePage: Network interception in MAIN world
YouTubePage->>MainWorldScript: XHR or fetch to subtitle URL
MainWorldScript->>MainWorldScript: isSubtitleUrl(url)
MainWorldScript-->>YouTubePage: Proceed with request
YouTubePage-->>MainWorldScript: Response with subtitle data
MainWorldScript->>MainWorldScript: Cache lastCapture
MainWorldScript->>ContentScript: window.postMessage subtitle-captured(url,data)
ContentScript->>VideoManager: window message event
VideoManager->>VideoManager: handleSubtitleData(url,data)
VideoManager->>SubtitleParser: detectSubtitleFormat(url,data)
SubtitleParser-->>VideoManager: format
VideoManager->>SubtitleParser: parseYouTubeXML/parseYouTubeJSON3/parseVTT
SubtitleParser-->>VideoManager: SubtitleCue[]
VideoManager->>YouTubePage: findVideo()
YouTubePage-->>VideoManager: HTMLVideoElement
VideoManager->>YouTubePage: findMountTarget(video)
YouTubePage-->>VideoManager: mountTarget
VideoManager->>Overlay: mount(video,mountTarget)
VideoManager->>Overlay: setCues(cues) with original text
VideoManager->>YouTubePage: hideNativeSubtitle()
VideoManager->>YouTubePage: mountQuickButton()
loop Playback
YouTubePage->>Overlay: video.currentTime via requestAnimationFrame
Overlay->>Overlay: findCue(time) via binary search
Overlay->>Overlay: render(cue) bilingual overlay
end
loop Batch translation
VideoManager->>VideoManager: mergeByTimeGap(cues)
VideoManager->>TranslateApi: translateText(batchWithContext,document.title)
TranslateApi-->>VideoManager: translatedLines
VideoManager->>VideoManager: Fill translatedText on SentenceGroup.cues
VideoManager->>Overlay: setCues(updatedCues)
end
User->>YouTubePage: Click quick toggle button
YouTubePage->>VideoManager: Toggle subtitleEnabled
alt subtitleEnabled
VideoManager->>YouTubePage: hideNativeSubtitle()
VideoManager->>Overlay: show()
else not subtitleEnabled
VideoManager->>Overlay: hide()
VideoManager->>YouTubePage: restoreNativeSubtitle()
end
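The detectSubtitleFormat and parseYouTubeJSON3 steps in the diagram could be sketched as below. The following are assumptions about YouTube's timedtext responses, not details taken from the PR:

- the detection heuristics: a WEBVTT header, a fmt=json3 query parameter or a leading '{', and a leading '<';
- the JSON3 field names: events, tStartMs, dDurationMs, and segs[].utf8.

Check them against real captured responses.

```typescript
interface SubtitleCue {
  start: number // seconds
  end: number // seconds
  text: string
  translatedText?: string
}

type SubtitleFormat = 'vtt' | 'youtube-json3' | 'youtube-xml' | 'unknown'

// Heuristic format detection from the request URL and the response body.
function detectSubtitleFormat(url: string, data: string): SubtitleFormat {
  const head = data.replace(/^\s+/, '')
  if (head.startsWith('WEBVTT')) return 'vtt'
  if (url.includes('fmt=json3') || head.startsWith('{')) return 'youtube-json3'
  if (head.startsWith('<')) return 'youtube-xml'
  return 'unknown'
}

// Assumed JSON3 event shape: start and duration in milliseconds, text split into segments.
interface Json3Event {
  tStartMs?: number
  dDurationMs?: number
  segs?: { utf8?: string }[]
}

function parseYouTubeJSON3(jsonText: string): SubtitleCue[] {
  const parsed = JSON.parse(jsonText) as { events?: Json3Event[] }
  const cues: SubtitleCue[] = []
  for (const event of parsed.events ?? []) {
    if (event.tStartMs === undefined || !event.segs) continue // e.g. window or style events
    const text = event.segs.map(s => s.utf8 ?? '').join('').replace(/\s+/g, ' ').trim()
    if (!text) continue // skip newline-only events
    const start = event.tStartMs / 1000
    cues.push({ start, end: start + (event.dDurationMs ?? 0) / 1000, text })
  }
  return cues
}

const sample = JSON.stringify({
  events: [
    { tStartMs: 0, dDurationMs: 1500, segs: [{ utf8: 'the united' }, { utf8: ' states' }] },
    { tStartMs: 1500, dDurationMs: 1000, segs: [{ utf8: '\n' }] },
  ],
})
const url = 'https://www.youtube.com/api/timedtext?v=abc&fmt=json3'
console.log(detectSubtitleFormat(url, sample)) // youtube-json3
console.log(parseYouTubeJSON3(sample)) // one cue from 0 to 1.5 s: "the united states"
```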
Updated class diagram for subtitle translation types and configuration
classDiagram
class Config {
+boolean on
+boolean translationStatus
+string inputBoxTranslationTrigger
+string inputBoxTranslationTarget
+boolean enableVideoSubtitle
+Config()
}
class SubtitleCue {
+number start
+number end
+string text
+string translatedText
}
class SubtitleOverlay {
-HTMLElement container
-HTMLVideoElement video
-SubtitleCue[] cues
-number rafId
-string lastCueKey
-HTMLElement mountTarget
-string originalMountPosition
+mount(video, mountTarget)
+setCues(cues)
+show()
+hide()
+cleanup()
-startLoop()
-findCue(time)
-render(cue)
}
class PlatformConfig {
+string id
+string[] matches
+string[] subtitleUrlPatterns
+string format
+string videoSelector
+string containerSelector
+string hideNativeSelector
}
class PlatformsModule {
+PlatformConfig[] platforms
+detectPlatform(hostname)
+getAllSubtitlePatterns()
}
class ParserModule {
+parseYouTubeXML(xmlText) SubtitleCue[]
+parseYouTubeJSON3(jsonText) SubtitleCue[]
+parseVTT(vttText) SubtitleCue[]
+detectSubtitleFormat(url, data) string
-mergeOverlappingCues(cues) SubtitleCue[]
-vttTimeToSeconds(t) number
-stripVttTags(text) string
-decodeEntities(text) string
}
class VideoSubtitleManager {
-SubtitleOverlay overlay
-boolean listenerAttached
-string processingUrl
-boolean subtitleEnabled
+initVideoSubtitle()
-sendConfig()
-attachMessageListener()
-handleSubtitleData(url, rawData)
-mergeByTimeGap(cues) SentenceGroup[]
-translateCuesBatched(cues, onProgress)
-mountQuickButton()
-buildBtnSvg(active) SVGElement
-waitForElement(selector, callback, maxMs)
-findVideo() HTMLVideoElement
-findMountTarget(video) HTMLElement
-hideNativeSubtitle()
-restoreNativeSubtitle()
-watchNavigation()
}
class SentenceGroup {
+SubtitleCue[] cues
+string text
}
Config <.. VideoSubtitleManager : reads
SubtitleCue <.. SentenceGroup : element
SubtitleOverlay o-- SubtitleCue : renders
VideoSubtitleManager o-- SubtitleOverlay : owns
VideoSubtitleManager ..> ParserModule : uses
VideoSubtitleManager ..> PlatformsModule : uses
PlatformsModule o-- PlatformConfig : aggregates
ParserModule o-- SubtitleCue : creates
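The PlatformsModule in the diagram could be implemented roughly as follows. Every concrete value here (hostnames, URL fragments, CSS selectors) is an illustrative placeholder, not the PR's configuration. Matching hostnames by suffix and URLs by substring is also an assumed semantics.

The generic fallback has an empty hideNativeSelector, which is exactly the case hideNativeSubtitle() guards against with its early return. isSubtitleUrl is shown in TypeScript for consistency, although in the PR the URL check runs inside the plain-JS MAIN-world script.

```typescript
interface PlatformConfig {
  id: string
  matches: string[] // hostname suffixes (assumed semantics)
  subtitleUrlPatterns: string[] // substrings of subtitle request URLs (assumed semantics)
  format: string
  videoSelector: string
  containerSelector: string
  hideNativeSelector: string
}

// Placeholder entry for illustration only.
const platforms: PlatformConfig[] = [
  {
    id: 'youtube',
    matches: ['youtube.com'],
    subtitleUrlPatterns: ['/api/timedtext'],
    format: 'youtube-json3',
    videoSelector: '#movie_player video',
    containerSelector: '#movie_player',
    hideNativeSelector: '.ytp-caption-window-container',
  },
]

const GENERIC_PLATFORM: PlatformConfig = {
  id: 'generic',
  matches: [],
  subtitleUrlPatterns: ['.vtt'],
  format: 'vtt',
  videoSelector: 'video',
  containerSelector: '',
  hideNativeSelector: '', // nothing platform-specific to hide
}

function detectPlatform(hostname: string): PlatformConfig {
  const found = platforms.find(p =>
    p.matches.some(m => hostname === m || hostname.endsWith(`.${m}`)),
  )
  return found ?? GENERIC_PLATFORM
}

// Patterns posted to the MAIN-world script so it knows which requests to capture.
function getAllSubtitlePatterns(): string[] {
  const all = [...platforms, GENERIC_PLATFORM].map(p => p.subtitleUrlPatterns)
  return Array.from(new Set(([] as string[]).concat(...all)))
}

function isSubtitleUrl(url: string, patterns: string[]): boolean {
  return patterns.some(p => url.includes(p))
}

console.log(detectPlatform('www.youtube.com').id) // youtube
console.log(detectPlatform('example.com').id) // generic
console.log(isSubtitleUrl('https://www.youtube.com/api/timedtext?v=abc', getAllSubtitlePatterns())) // true
```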
Hey - I've found 2 issues, and left some high level feedback:
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The YouTube-specific behavior is scattered (e.g., hostname checks in both `initVideoSubtitle` and `watchNavigation` plus CSS selectors in `platforms.ts`); consider centralizing platform-specific logic (including whether to mount the quick button) into the `PlatformConfig` so that adding/removing platforms doesn’t require touching multiple files.
- In `SubtitleOverlay.mount`, you call `cleanup()` after assigning `this.video` but before `this.mountTarget`, and `cleanup()` removes `#fr-subtitle-overlay` globally; if multiple overlays are ever mounted (e.g., multiple videos or future features), this global removal could become surprising—consider scoping cleanup to the instance’s container instead of using `document.getElementById`.
## Individual Comments
### Comment 1
<location path="entrypoints/video/manager.ts" line_range="290-295" />
<code_context>
+ return (video.parentElement as HTMLElement) || document.body
+}
+
+function hideNativeSubtitle() {
+ const platform = detectPlatform(window.location.hostname)
+ if (!platform.hideNativeSelector) return
+ // Use display:none to hide completely; visibility:hidden still reserves space and is sometimes reset by YouTube
+ document.querySelectorAll<HTMLElement>(platform.hideNativeSelector)
+ .forEach(el => el.style.setProperty('display', 'none', 'important'))
+}
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Hiding native subtitles overwrites `display` without preserving original values, which can interfere with site styling.
Because `hideNativeSubtitle` sets `display: none !important` and the corresponding restore logic only calls `removeProperty('display')`, any pre-existing inline `display` value is lost after the first toggle, which can change how the page lays out those elements.
Consider storing the previous inline `display` value (e.g. in a `data-` attribute) before overriding it, and restoring from that when subtitles are re-enabled, so the page’s original layout is preserved.
</issue_to_address>
### Comment 2
<location path="entrypoints/video/overlay.ts" line_range="67" />
<code_context>
+ cancelAnimationFrame(this.rafId)
+ this.rafId = null
+ }
+ document.getElementById(OVERLAY_ID)?.remove()
+ if (this.mountTarget !== undefined && this.originalMountPosition !== undefined) {
+ this.mountTarget.style.position = this.originalMountPosition
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Cleanup removes any element with the overlay ID globally instead of just this instance’s container.
Using `document.getElementById(OVERLAY_ID)?.remove()` works now but risks removing an unrelated element if the host page reuses that ID, and it’s not tied to this instance. Calling `this.container?.remove()` would scope cleanup to the overlay created by this instance and better support multiple overlays or non-document mount points.
```suggestion
this.container?.remove()
```
</issue_to_address>
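- manager.ts: before hiding, hideNativeSubtitle saves the original inline display value to data-fr-orig-display, and restoreNativeSubtitle restores it from there. Display styles preset by the host page are therefore not overwritten.
- overlay.ts: cleanup now uses this.container?.remove() instead of document.getElementById, so elements on the host page that happen to share the overlay's ID are not removed by mistake.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>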
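The fix in this commit can be sketched as the pair of helpers below. hideElement and restoreElement are hypothetical names. They take a minimal structural type that a real HTMLElement satisfies, so the logic can be exercised without a DOM. Only the data-fr-orig-display attribute (dataset.frOrigDisplay) comes from the commit.

Skipping the save when the attribute already exists is an added assumption. Without it, a second hide (for example after the player re-renders) would record 'none' as the original value.

```typescript
// Minimal structural types; HTMLElement and CSSStyleDeclaration satisfy them.
interface StyleLike {
  getPropertyValue(name: string): string
  setProperty(name: string, value: string, priority?: string): void
  removeProperty(name: string): string
}

interface ElementLike {
  style: StyleLike
  dataset: { [key: string]: string | undefined }
}

function hideElement(el: ElementLike): void {
  // Save the original inline value only once, so repeated hides don't record 'none'.
  if (el.dataset.frOrigDisplay === undefined) {
    el.dataset.frOrigDisplay = el.style.getPropertyValue('display')
  }
  el.style.setProperty('display', 'none', 'important')
}

function restoreElement(el: ElementLike): void {
  const original = el.dataset.frOrigDisplay
  if (original === undefined) return // never hidden by us
  if (original) {
    el.style.setProperty('display', original) // put back the page's own inline value
  } else {
    el.style.removeProperty('display') // there was no inline value to begin with
  }
  delete el.dataset.frOrigDisplay
}
```

In the extension these helpers would be applied the same way as the snippet in Comment 1, for example `document.querySelectorAll<HTMLElement>(platform.hideNativeSelector).forEach(hideElement)`.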
@Bistutu Hi, I've added support for YouTube subtitle translation.
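Overview
Adds video subtitle translation. After native subtitles are turned on in YouTube and other supported platforms, the extension automatically intercepts the subtitle data, translates it, and overlays bilingual subtitles on the video.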
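New files
- entrypoints/video/manager.ts: core orchestration (message interception, parsing, translation, rendering)
- entrypoints/video/overlay.ts: subtitle overlay rendering, with RAF timeline sync and binary search
- entrypoints/video/parser.ts: supports the YouTube XML, JSON3, and WebVTT formats
- entrypoints/video/platforms.ts: platform configuration (domains, subtitle URL rules)
- public/video-subtitle-inject.js: MAIN world injected script that hooks XHR/fetch to intercept subtitle requests
Modified files
- components/Main.vue: adds the video subtitle translation toggle (on by default)
- entrypoints/content.ts: initializes the subtitle translation module
- entrypoints/utils/model.ts: adds the enableVideoSubtitle config option
- wxt.config.ts: registers the MAIN world content script, limited to the supported platforms' domains
Technical approach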
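Subtitle interception
A MAIN world content script is injected at document_start, which bypasses the page's CSP restrictions. It hooks XHR/fetch to intercept subtitle requests and passes the data to the content script via postMessage.
Translation quality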
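YouTube ASR subtitles have no punctuation and are all lowercase, so the gap between cues (< 600ms) is used as the sentence-boundary signal. Consecutive fragment cues are merged into complete sentence groups before translation, which fixes cross-cue phrases such as "united states" being cut apart.
A carry-over mechanism also keeps sentence-final fragments (such as "but the seeds") from being isolated by the word limit.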
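Translation calls
Each batch holds 5 groups plus preceding context, and a subtitle-specific prompt tells the model it may borrow meaning across lines.
Each translation is written back to every cue in its group, so all cues in a group share the same translation.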
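The batching and write-back steps can be sketched as below. Groups are split into batches of 5, and each group's single translation is copied onto every cue in that group, so whichever fragment is on screen shows the full sentence's translation. chunk and applyGroupTranslations are hypothetical names. The sketch assumes translations[i] belongs to groups[i] within a batch, for example after the numbered markers have been parsed.

```typescript
interface SubtitleCue {
  start: number
  end: number
  text: string
  translatedText?: string
}

interface SentenceGroup {
  cues: SubtitleCue[]
  text: string
}

const BATCH_SIZE = 5

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

// Copies each group's translation onto all of its cues and returns the cues in order.
function applyGroupTranslations(groups: SentenceGroup[], translations: string[]): SubtitleCue[] {
  const updated: SubtitleCue[] = []
  groups.forEach((group, i) => {
    const translated = translations[i]
    for (const cue of group.cues) {
      if (translated) cue.translatedText = translated // missing translations leave the cue as-is
      updated.push(cue)
    }
  })
  return updated
}

const groups: SentenceGroup[] = [
  {
    cues: [
      { start: 0, end: 1, text: 'the united' },
      { start: 1.2, end: 2, text: 'states' },
    ],
    text: 'the united states',
  },
  { cues: [{ start: 3, end: 4, text: 'is large' }], text: 'is large' },
]
const batches = chunk(groups, BATCH_SIZE)
// Stand-in for the translator's output for the single batch.
const updated = applyGroupTranslations(batches[0], ['美国', '很大'])
console.log(updated.map(c => c.translatedText)) // ['美国', '美国', '很大']
```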
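Rendering
requestAnimationFrame drives timeline sync, binary search locates the current cue, and createElement + textContent write to the DOM safely. A quick toggle button is injected into the YouTube player toolbar to show or hide the translation in real time.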
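Security
- The overlay is built with createElement + textContent, eliminating the XSS risk
- The toggle-button SVG is built with createElementNS, and innerHTML is removed
Summary by Sourcery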
Add a video subtitle translation pipeline for supported video platforms and wire it into the extension configuration and content scripts.
New Features:
Enhancements: