🎬 AI Video Localization Platform

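An end-to-end, AI-driven platform for video translation and localization. It automatically translates video content into multiple languages and generates dubbed audio.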

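✨ Core Features

Feature	Description
🎙️ Speech recognition	High-accuracy ASR with WhisperX; multilingual, with word-level timestamps
🎼 Source separation	Demucs separates vocals, background sound, and music
📝 Subtitle processing	Automatic subtitle generation, cleaning, and translation (Dify workflows)
🗣️ Speech synthesis	Voice-cloning TTS with OpenVoice/MeloTTS
🎬 Video composition	Audio/video mixing and muxing with FFmpeg
🖥️ GUI	Interactive pipeline control with PyQt6
⚡ GPU scheduling	Smart VRAM management, model sharing, priority queue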

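Technical Highlights

Interactive pipeline: steps can be confirmed, retried, or skipped, so a person can intervene in the processing flow
Smart GPU scheduling: automatic VRAM management, a shared model pool, and a dynamic thread pool
Signal bus: a decoupled, event-driven architecture with cross-thread communication
Hot-reloadable configuration: configuration changes take effect at runtime without a restart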
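
The signal bus idea can be illustrated with a minimal publish/subscribe sketch. This is not the project's bus.py: the `SignalBus` class, its `subscribe`/`emit` methods, and the `step_finished` signal name are all hypothetical, and the sketch is single-threaded. A real cross-thread bus in a PyQt6 application would probably rely on Qt signals or a thread-safe queue.

```python
from collections import defaultdict
from typing import Any, Callable


class SignalBus:
    """Toy publish/subscribe bus (illustrative only, not the project's bus.py)."""

    def __init__(self) -> None:
        # signal name -> list of handlers registered for that signal
        self._handlers: dict[str, list[Callable[..., Any]]] = defaultdict(list)

    def subscribe(self, signal: str, handler: Callable[..., Any]) -> None:
        """Register `handler` to be called whenever `signal` is emitted."""
        self._handlers[signal].append(handler)

    def emit(self, signal: str, **payload: Any) -> int:
        """Call every handler registered for `signal`; return how many ran."""
        handlers = list(self._handlers.get(signal, []))
        for handler in handlers:
            handler(**payload)
        return len(handlers)


# The emitter and the listener only share the signal name, not a reference
# to each other, which is what keeps them decoupled.
bus = SignalBus()
received: list[str] = []
bus.subscribe("step_finished", lambda step_id: received.append(step_id))
bus.emit("step_finished", step_id="audio_extraction")
```

Emitting a signal that nobody subscribed to is a no-op here, so emitters do not need to know whether any listener exists.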

Supported Languages

System Requirements

Requirement	Minimum	Recommended
Python	3.11.x	3.11.0
GPU	NVIDIA, 6 GB+ VRAM	NVIDIA, 8 GB+ VRAM
CUDA	11.8+	12.x
RAM	16 GB	32 GB
Disk	50 GB	100 GB SSD
OS	Windows 10/Linux/macOS	Windows 11/Ubuntu 22.04

⚠️ Important: this project uses Python 3.11. Python 3.12+ is not supported (OpenVoice/MeloTTS compatibility).
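
A startup guard along these lines could surface the version constraint before any heavy import fails. It is only a sketch: `is_supported_python` is a hypothetical helper, not something the project is known to ship.

```python
import sys


def is_supported_python(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True only for Python 3.11.x, the version this README requires."""
    return version == (3, 11)


if not is_supported_python():
    # Warn instead of exiting so the snippet stays safe to run anywhere.
    print(f"Warning: Python 3.11 is required, found {sys.version.split()[0]}")
```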

Project Status

Phase	Module	Status
P0	Core infrastructure	✅ 100%
P1	Low-level service modules	✅ 100%
P2.1	Vocal separation (Demucs)	✅ 100%
P2.2	ASR service (WhisperX)	✅ 100%
P2.3	TTS service (OpenVoice/MeloTTS)	✅ 100%
P2.4	GPU scheduler	✅ 100%
P2.5	Signal bus	✅ 100%
P3.1	Pipeline framework	✅ 100%
P3.2	PyQt6 GUI	✅ 100%

Test coverage: 86%+

Quick Start

1. Clone the project

git clone <repository-url>
cd video-localization-platform

2. Create a virtual environment

# Windows
py -3.11 -m venv venv
venv\Scripts\activate

# Linux/macOS
python3.11 -m venv venv
source venv/bin/activate

3. Install dependencies

# Base dependencies
pip install -r requirements.txt

# TTS dependencies (a C++ compiler is required)
pip install git+https://github.com/myshell-ai/OpenVoice.git
pip install git+https://github.com/myshell-ai/MeloTTS.git

4. Install system dependencies

# Windows
winget install FFmpeg

# Linux
sudo apt install ffmpeg libsndfile1 sox

# macOS
brew install ffmpeg sox

5. Configure the environment

# Copy the configuration template
cp .env.example .env

# Edit .env and set the Dify API keys
DIFY_SUBTITLE_CLEAN_API_KEY=app-xxx
DIFY_TRANSLATION_API_KEY=app-xxx

6. Create directories

mkdir -p data/input data/output data/temp models logs
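
`mkdir -p` with forward-slash paths may not behave the same way in Windows shells. As a portable alternative, the same directories can be created with the standard library. This is a sketch; the `ensure_dirs` helper is hypothetical and not part of the project.

```python
from pathlib import Path

# Same directories as the mkdir command above, relative to the project root.
REQUIRED_DIRS = ["data/input", "data/output", "data/temp", "models", "logs"]


def ensure_dirs(root: Path) -> list[Path]:
    """Create every required directory under `root`; existing ones are left alone."""
    created = []
    for rel in REQUIRED_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_dirs(Path.cwd())` from the project root has the same effect as the shell command, and it is safe to run repeatedly.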

7. Start the application

# GUI mode
python main.py

# Or command-line mode (not yet implemented)
# python -m src.cli process video.mp4 --target-lang en

Project Architecture

├── src/
│   ├── config/                     # Configuration management
│   │   └── settings.py             # Pydantic Settings
│   ├── core/
│   │   └── signal_bus/             # Signal bus (publish/subscribe)
│   │       ├── bus.py              # Core bus
│   │       ├── signals.py          # Signal definitions
│   │       └── mixins.py           # Listener/Emitter mixins
│   ├── GpuScheduler/               # GPU scheduler
│   │   ├── gpu_scheduler.py        # Core scheduler
│   │   ├── pytorch_task.py         # PyTorch tasks
│   │   └── model_pool.py           # Shared model pool
│   ├── Gui/                        # PyQt6 GUI
│   │   ├── app.py                  # Application entry point
│   │   ├── pipeline_window.py      # Main window
│   │   ├── controllers/            # MVC controllers
│   │   └── widgets/                # UI widgets
│   ├── models/                     # Data models
│   │   ├── task.py                 # Task, the main data container
│   │   ├── enums.py                # Enum types
│   │   └── subtitle.py             # Subtitle model
│   ├── Pipeline/                   # Pipeline framework
│   │   ├── core/                   # Pipeline core
│   │   │   ├── interactive_pipeline.py  # Interactive pipeline base class
│   │   │   ├── pipeline_step.py         # Step base class
│   │   │   └── concurrent_step.py       # Concurrent step base class
│   │   ├── pipelines/              # Pipeline implementations ⭐ editable
│   │   │   └── _pipeline_template.py    # Pipeline template
│   │   └── steps/                  # Step implementations ⭐ editable
│   │       ├── _step_template.py        # Step template
│   │       └── _concurrent_step_template.py
│   ├── Services/                   # AI services
│   │   ├── asr_service.py          # WhisperX speech recognition
│   │   ├── vocal_separator.py      # Demucs vocal separation
│   │   ├── voice_cloning_tts.py    # OpenVoice/MeloTTS
│   │   ├── audio_denoiser.py       # AI audio denoising
│   │   ├── speaker_diarization.py  # Speaker diarization
│   │   ├── singing_detection.py    # Singing detection
│   │   ├── audio_extractor.py      # FFmpeg audio extraction
│   │   └── dify_client.py          # Dify API client
│   └── utils/                      # Utility functions
├── tests/                          # Tests
│   ├── unit/                       # Unit tests
│   ├── integration/                # Integration tests
│   └── ml/                         # ML service tests
├── data/
│   ├── input/                      # Input videos
│   ├── output/                     # Output results
│   └── temp/                       # Temporary files
├── models/                         # AI model files
├── logs/                           # Logs
├── docs/                           # Documentation
└── config.yaml                     # Configuration file

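Processing Flow

Input video
    ↓
┌─────────────────────────────────────────────────────────────┐
│  1. Audio extraction (FFmpeg)                               │
│  2. Vocal separation (Demucs) → vocals + background         │
│  3. Speech recognition (WhisperX) → subtitles + timestamps  │
│  4. Subtitle cleaning (Dify) → improved segmentation        │
│  5. Subtitle translation (Dify) → multilingual subtitles    │
│  6. TTS dubbing (OpenVoice) → cloned voice                  │
│  7. Audio mixing (FFmpeg) → dubbing + background            │
│  8. Video composition (FFmpeg) → final video                │
└─────────────────────────────────────────────────────────────┘
    ↓
Output video (multilingual versions)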
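
Because steps declare their prerequisites through `depends_on`, the eight stages can be viewed as a dependency graph. The sketch below derives a valid execution order from such a graph using the standard library. Apart from `audio_extraction`, every step id is hypothetical, and so are the dependency edges (for example, that TTS needs the separated vocals as a cloning reference). Whether the pipeline framework orders steps this way is an assumption.

```python
from graphlib import TopologicalSorter

# step id -> ids of the steps it depends on (hypothetical, for illustration)
DEPENDS_ON = {
    "audio_extraction": [],
    "vocal_separation": ["audio_extraction"],
    "asr": ["vocal_separation"],
    "subtitle_cleaning": ["asr"],
    "translation": ["subtitle_cleaning"],
    "tts": ["translation", "vocal_separation"],
    "audio_mixing": ["tts", "vocal_separation"],
    "video_composition": ["audio_mixing"],
}


def execution_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return the step ids so that every step comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


for position, step_id in enumerate(execution_order(DEPENDS_ON), start=1):
    print(position, step_id)
```

`TopologicalSorter` raises `graphlib.CycleError` if two steps depend on each other, which makes a misconfigured `depends_on` fail early rather than hang.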

Development Guide

Running tests

# Unit tests
pytest tests/unit -v --cov=src

# ML service tests
pytest tests/ml -v -s

# Integration tests
pytest tests/integration -v -m integration

Code style

# Formatting
black src/ tests/
isort src/ tests/

# Type checking
mypy src/

# Linting
flake8 src/ tests/

Creating a new step

Start from the template files:

src/Pipeline/steps/_step_template.py - basic step template
src/Pipeline/steps/_concurrent_step_template.py - concurrent step template

from src.Pipeline.core import PipelineStep, PipelineStepMixin

class MyStep(PipelineStepMixin, PipelineStep):
    step_id = "my_step"
    step_name = "My Step"
    depends_on = ["audio_extraction"]

    async def execute(self, task, config, **kwargs):
        self.setup(task, config)

        # Read the data produced by the dependency
        audio_path = self.inputs["audio_extraction"]["audio_path"]

        # Processing logic (`process` is a placeholder for your own coroutine)
        result = await process(audio_path)

        # Save the result
        self.save_step_result(data={"output": result})
        return task

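Creating a new pipeline

Start from the template: src/Pipeline/pipelines/_pipeline_template.py

from src.Pipeline.core import InteractivePipeline

# AudioExtractionStep, MyStep, and OutputStep must be imported from
# wherever you define your steps.
class MyPipeline(InteractivePipeline):
    pipeline_name = "My Pipeline"

    def register_steps(self):
        return [
            AudioExtractionStep(),
            MyStep(),
            OutputStep(),
        ]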

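Configuration

config.yaml

app:
  name: "Video Localization Platform"
  debug: false

gpu:
  max_memory_percent: 0.8    # Maximum fraction of VRAM to use
  model_cache_size: 3        # Number of models kept in the cache

pipeline:
  auto_confirm: false        # Confirm steps automatically
  max_retries: 3             # Maximum number of retries

dify:
  base_url: "https://api.dify.ai/v1"
  timeout: 300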
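
To make the `max_retries` setting concrete, here is a minimal retry loop. It is a sketch under one stated assumption: that `max_retries: 3` means one initial attempt plus up to three retries. The project may count attempts differently, and `run_with_retries` is a hypothetical helper.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(step: Callable[[], T], max_retries: int = 3) -> T:
    """Run `step`; on failure retry up to `max_retries` more times, then re-raise."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return step()
        except Exception:
            # Out of retries: let the last error propagate to the caller.
            if attempt == max_retries:
                raise
    raise AssertionError("unreachable")  # the loop always returns or raises
```

In an interactive pipeline the final failure would presumably be handed back to the user, who can then choose to retry or skip the step.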

Environment variables (.env)

# Dify API
DIFY_SUBTITLE_CLEAN_API_KEY=app-xxx
DIFY_TRANSLATION_API_KEY=app-xxx

# GPU
CUDA_VISIBLE_DEVICES=0

# Hugging Face (optional, speeds up downloads)
HF_ENDPOINT=https://hf-mirror.com
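
A quick check like the following can catch a missing or placeholder Dify key before a pipeline run. It is a sketch that assumes the .env values have already been loaded into the process environment (for example by the project's settings layer). The `missing_keys` helper is hypothetical, and treating `app-xxx` as unset is a choice made for this example.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = ("DIFY_SUBTITLE_CLEAN_API_KEY", "DIFY_TRANSLATION_API_KEY")
PLACEHOLDER = "app-xxx"  # the template value shown above


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required keys that are unset, empty, or still the placeholder."""
    env = os.environ if env is None else env
    return [
        key for key in REQUIRED_KEYS
        if not env.get(key) or env[key] == PLACEHOLDER
    ]


problems = missing_keys()
if problems:
    print("Set these variables in .env:", ", ".join(problems))
```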

FAQ

GPU

Q: CUDA/GPU not found

# Check CUDA
nvidia-smi

# Check PyTorch
python -c "import torch; print(torch.cuda.is_available())"

Q: Out of VRAM

# config.yaml: lower the VRAM cap (reduces concurrency)
gpu:
  max_memory_percent: 0.6
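
The effect of lowering `max_memory_percent` can be estimated with simple arithmetic. The sketch below assumes the scheduler treats the value as a fraction of total VRAM that all loaded models together must stay under. The README does not describe the actual accounting, and both helper functions are hypothetical.

```python
GIB = 1024 ** 3


def memory_budget_bytes(total_bytes: int, max_memory_percent: float) -> int:
    """VRAM the scheduler may use, given the configured fraction (0 < f <= 1)."""
    if not 0.0 < max_memory_percent <= 1.0:
        raise ValueError("max_memory_percent must be in (0, 1]")
    return int(total_bytes * max_memory_percent)


def fits(model_bytes: int, used_bytes: int, total_bytes: int,
         max_memory_percent: float) -> bool:
    """True if loading a model of `model_bytes` stays within the budget."""
    budget = memory_budget_bytes(total_bytes, max_memory_percent)
    return used_bytes + model_bytes <= budget


# On an 8 GiB card, compare the budgets at the two settings shown in this README.
for fraction in (0.8, 0.6):
    budget_gib = memory_budget_bytes(8 * GIB, fraction) / GIB
    print(f"max_memory_percent={fraction}: about {budget_gib:.1f} GiB usable")
```

Under this reading, a lower fraction leaves more headroom for other processes but allows fewer models to stay resident at once.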

Dependencies

Q: TTS installation fails

Make sure you are using Python 3.11
Install Visual Studio Build Tools (Windows)
See docs/TTS依赖安装说明.md for details

Q: FFmpeg not found

ffmpeg -version  # Test the installation
# Windows: add FFmpeg to PATH
# Linux: sudo apt install ffmpeg
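
The same check can be done from Python, which helps when the shell and the application see different PATH values. A minimal sketch using the standard library follows; `find_tool` is a hypothetical helper name.

```python
import shutil
from typing import Optional


def find_tool(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_tool()
if path is None:
    print("ffmpeg was not found on PATH")
else:
    print(f"ffmpeg found at {path}")
```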

Slow model downloads

# Use mirrors hosted in mainland China (PyPI and Hugging Face)
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
export HF_ENDPOINT=https://hf-mirror.com

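Changelog

v2.0.0 (2026-01-24)

✅ Added the smart GPU scheduler (model sharing, priority queue, dynamic thread pool)
✅ Added the signal bus event system
✅ Added the interactive pipeline framework (confirm/retry/skip)
✅ Added the PyQt6 GUI
✅ Restructured the project architecture (MVC pattern)
✅ Added step and pipeline templates

v1.0.0 (2025-12-29)

✅ Initial release
✅ Multilingual translation support
✅ Dify workflow integration
✅ Runs in a local Python environment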

License

MIT License
