LRriver · May 31, 2026 · Apr 2, 2026 · Apr 5, 2026 · Apr 8, 2026
diff --git a/.gitignore b/.gitignore
@@ -18,6 +18,7 @@ node_modules/
 # Output directories
 output/
 test_output/
+.playwright-cli/
 temp_buid_note/
 
 # Generated files
@@ -27,6 +28,16 @@ temp_buid_note/
 *.pdf
 test_prompts.json
 
+# README demo assets must stay in the repository so the results can be viewed right after cloning
+!docs/
+!docs/assets/
+!docs/assets/*.png
+!docs/assets/*.jpg
+!docs/assets/*.jpeg
+!docs/assets/*.gif
+!docs/assets/*.mp4
+!docs/assets/*.webm
+
 # IDE
 .vscode/
 .idea/
@@ -43,4 +54,5 @@ Thumbs.db
 
 # Local configuration files
 config.yaml
+.codex/
 .kiro/
diff --git a/README.md b/README.md
@@ -4,12 +4,20 @@
 
 Recreate NotebookLM's AI PPT feature to automatically convert papers, documents, and other materials into polished PPT images.
 
+<video src="docs/assets/aippt-demo.webm" poster="docs/assets/aippt-demo-poster.png" controls muted playsinline width="100%"></video>
+
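+[Watch the demo video](docs/assets/aippt-demo.webm)
+
+The demo video covers uploading `doc/L9.md`, entering user requirements, generating and editing the design outline, confirming the per-page designs, generating a 6-page PPT, editing a single page, confirming the replacement, and exporting PDF/PPTX; the model waiting stages have been fast-forwarded.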
+
 ## ✨ Features
 
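-- 🎨 **AI Image Generation**: Use AI models to convert document content into polished PPT images
-- 🌐 **WebUI Interface**: A friendly web interface with file upload, real-time preview, editing, and export
-- 📝 **Multi-format Support**: Markdown document input, PDF/PPTX export
-- ✏️ **Image-to-Image Editing**: Make follow-up edits to generated slides
+- 🎨 **Per-slide Image Generation**: Generate an editable design outline and per-page designs first, then use AI models to convert them into PPT page images
+- 🌐 **PPT Workbench**: Supports source upload, model configuration, a large preview of the current page, a thumbnail list, edit history, and export
+- 📝 **Multi-format Parsing**: Supports `.md/.txt/.pdf/.docx/.pptx` input, converting everything to Markdown before generation
+- ✏️ **Full-page Image Editing**: Edit each slide individually, roll back through history, and confirm replacements
+- 🔀 **Three Model Roles**: Configure `prompt_model`, `image_model`, and `edit_model` separately
+- 🖼️ **Image Result Compatibility**: Accepts URLs, Markdown image links, data URLs, `b64_json`, and raw base64
 - 💾 **State Persistence**: Automatically saves work progress and supports session recovery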
 
 ## 🚀 Quick Start
@@ -56,23 +64,26 @@ cd web && npm install && npm run dev
 pip install -r requirements.txt
 
 # Basic usage
-python main.py -i doc/sample_paper.txt -n 5
+python main.py -i doc/L9.md -n 5
 
 # Generate prompts only
-python main.py -i doc/sample_paper.txt -n 5 --prompt-only -o prompts.json
+python main.py -i doc/L9.md -n 5 --prompt-only -o prompts.json
 
 # Generate from a prompt file
 python main.py --from-prompt prompts.json
 ```
 
 ### 3. WebUI Usage Flow
 
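-1. **Upload Document**: Drag and drop or click to upload a Markdown file in the left panel
-2. **Configure API**: Fill in the API Key and Base URL in the center panel
-3. **Set Parameters**: Choose page count, resolution, aspect ratio, and other generation parameters
-4. **Generate PPT**: Click the "Start Generation" button and watch the progress in real time
-5. **Preview & Edit**: Preview the generated slides in the right panel and click one to edit it
-6. **Export**: Choose PDF or PPTX as the export format
+1. **Upload Document**: Drag and drop or click to upload a source file in the left panel
+2. **Configure Models**: Configure the text, image generation, and editing models in the center panel
+3. **Set Parameters & Requirements**: Choose page count, resolution, aspect ratio, language, style, and audience, and enter custom user requirements
+4. **Confirm Design**: Generate the design outline first; after the user edits and confirms it, generate the per-page design preview
+5. **Generate PPT**: After confirming the per-page designs, generate the PPT images and watch the progress in real time
+6. **Preview & Edit**: Preview the generated slides in the right panel and click one to edit that single page
+7. **Export**: Choose PDF or PPTX as the export format
+
+The demo source bundled with the repository is `doc/L9.md`. This is a repository-relative path, so after cloning it can be used directly for WebUI upload or the command-line examples.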
 
 ## 📁 Project Structure
 
@@ -83,6 +94,7 @@ OpenNotebookLM-AIPPT/
 ├── web/                    # React frontend
 ├── tests/                  # Tests
 ├── doc/                    # Input document directory
+│   └── L9.md               # Default demo source
 ├── config.yaml             # Configuration file
 ├── start.sh                # One-click startup script
 └── main.py                 # Command-line entry point
@@ -91,7 +103,7 @@ OpenNotebookLM-AIPPT/
 ## ⚙️ Configuration
 
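 All configuration is managed in `config.yaml`, including:
-- API configuration (image generation, text generation)
+- API configuration (models for the three roles: text prompt, image generation, editing)
 - PPT defaults (language, style, page count)
 - Timeout and retry settings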
 
@@ -101,11 +113,22 @@ OpenNotebookLM-AIPPT/
 
 ```yaml
 api:
-  text:
-    format: "openai"
-    model: "gpt-4o"
-    base_url: "https://api.openai.com/v1"
-    api_key: "sk-xxx"
+  models:
+    prompt_model:
+      adapter: "openai_chat"
+      model: "gpt-4o"
+      base_url: "https://api.openai.com/v1"
+      api_key: "sk-xxx"
+    image_model:
+      adapter: "raw_chat_multimodal"
+      model: "gpt-image-2"
+      base_url: "https://api.example.com/v1"
+      api_key: "sk-xxx"
+    edit_model:
+      adapter: "raw_chat_multimodal"
+      model: "gpt-image-2"
+      base_url: "https://api.example.com/v1"
+      api_key: "sk-xxx"
 ```
 
 ## 📤 Output Structure
@@ -121,8 +144,8 @@ output/ppt_20241201_123456/
 
 ## 📋 TODO
 
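-- [ ] 🔌 Support more LLM API interfaces
-- [ ] 💾 Support more input formats and more export formats
+- [ ] Support editing a box-selected region of a slide
+- [ ] Add more provider profile templates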
 
 ## 📄 License
 

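The `api.models` block in the hunk above replaces the single `api.text` entry with three role-keyed profiles. As a rough illustration of how such a block could be read, here is a minimal sketch. `load_profiles` and `ModelProfile` are hypothetical names, not part of this repository, and the fallback from `edit_model` to `image_model` is an assumption made for the example rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Dict

# Role and key names are taken from the config example in the diff above.
ROLES = ("prompt_model", "image_model", "edit_model")
REQUIRED_KEYS = ("adapter", "model", "base_url", "api_key")


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical container for one role's settings."""
    adapter: str
    model: str
    base_url: str
    api_key: str


def load_profiles(config: dict) -> Dict[str, ModelProfile]:
    """Build one ModelProfile per configured role from a parsed config dict."""
    models = config.get("api", {}).get("models", {})
    profiles: Dict[str, ModelProfile] = {}
    for role in ROLES:
        raw = models.get(role)
        if raw is None:
            continue  # role not configured; handled after the loop
        missing = [key for key in REQUIRED_KEYS if not raw.get(key)]
        if missing:
            raise ValueError(f"{role} is missing: {', '.join(missing)}")
        profiles[role] = ModelProfile(**{key: raw[key] for key in REQUIRED_KEYS})
    # Assumption: reuse the image model for edits when no edit_model is given.
    if "edit_model" not in profiles and "image_model" in profiles:
        profiles["edit_model"] = profiles["image_model"]
    return profiles


if __name__ == "__main__":
    example = {"api": {"models": {
        "prompt_model": {"adapter": "openai_chat", "model": "gpt-4o",
                         "base_url": "https://api.openai.com/v1", "api_key": "sk-xxx"},
        "image_model": {"adapter": "raw_chat_multimodal", "model": "gpt-image-2",
                        "base_url": "https://api.example.com/v1", "api_key": "sk-xxx"},
    }}}
    for role, profile in load_profiles(example).items():
        print(role, profile.adapter, profile.model)
```

Validating every role up front means a missing key is reported with the role name before any model call is attempted.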
diff --git a/README_en.md b/README_en.md
@@ -4,12 +4,20 @@
 
 Recreate NotebookLM's AI PPT feature to automatically convert papers, documents, and other materials into beautiful PPT images.
 
+<video src="docs/assets/aippt-demo.webm" poster="docs/assets/aippt-demo-poster.png" controls muted playsinline width="100%"></video>
+
+[Watch the demo video](docs/assets/aippt-demo.webm)
+
+The demo covers uploading `doc/L9.md`, entering custom requirements, generating and editing the design outline, confirming page designs, generating a 6-slide deck, editing one slide, confirming the replacement, and exporting PDF/PPTX. Model waiting time is fast-forwarded.
+
 ## ✨ Features
 
-- 🎨 **AI Image Generation**: Use AI models to convert document content into beautiful PPT images
-- 🌐 **WebUI Interface**: User-friendly web interface with file upload, real-time preview, editing, and export
-- 📝 **Multi-format Support**: Markdown document input, PDF/PPTX format export
-- ✏️ **Image-to-Image Editing**: Support secondary editing of generated slides
+- 🎨 **Per-slide image generation**: Create an editable outline and page designs before converting them into PPT page images
+- 🌐 **PPT Workbench**: Upload sources, configure model roles, preview slides, edit pages, track history, and export
+- 📝 **Multi-format parsing**: Supports `.md/.txt/.pdf/.docx/.pptx` input and converts content to Markdown
+- ✏️ **Full-page image editing**: Edit each generated slide independently, revert history, and confirm replacements
+- 🔀 **Three model roles**: Configure `prompt_model`, `image_model`, and `edit_model` separately
+- 🖼️ **Image result compatibility**: Accepts URLs, Markdown image links, data URLs, `b64_json`, and raw base64
 - 💾 **State Persistence**: Auto-save work progress with session recovery
 
 ## 🚀 Quick Start
@@ -56,23 +64,26 @@ cd web && npm install && npm run dev
 pip install -r requirements.txt
 
 # Basic usage
-python main.py -i doc/sample_paper.txt -n 5
+python main.py -i doc/L9.md -n 5
 
 # Generate prompts only
-python main.py -i doc/sample_paper.txt -n 5 --prompt-only -o prompts.json
+python main.py -i doc/L9.md -n 5 --prompt-only -o prompts.json
 
 # Generate from prompt file
 python main.py --from-prompt prompts.json
 ```
 
 ### 3. WebUI Usage Flow
 
-1. **Upload Document**: Drag and drop or click to upload Markdown files in the left panel
-2. **Configure API**: Fill in API Key and Base URL in the center panel
-3. **Set Parameters**: Choose page count, resolution, aspect ratio, etc.
-4. **Generate PPT**: Click "Start Generation" button and watch real-time progress
-5. **Preview & Edit**: Preview generated slides in the right panel, click to edit
-6. **Export**: Choose PDF or PPTX format to export
+1. **Upload Document**: Drag and drop or click to upload a source file in the left panel
+2. **Configure Models**: Configure text, image generation, and image editing model roles
+3. **Set Parameters & Requirements**: Choose page count, resolution, aspect ratio, language, style, audience, and custom requirements
+4. **Confirm Design**: Generate an editable outline, confirm it, then review the generated page designs
+5. **Generate PPT**: Generate slide images after page-design confirmation and watch real-time progress
+6. **Preview & Edit**: Preview generated slides in the right panel and edit a single page when needed
+7. **Export**: Export to PDF or PPTX
+
+The built-in demo source is `doc/L9.md`. This is a repository-relative path, so a fresh clone can use it directly in the WebUI or CLI examples.
 
 ## 📁 Project Structure
 
@@ -83,6 +94,7 @@ OpenNotebookLM-AIPPT/
 ├── web/                    # React frontend
 ├── tests/                  # Tests
 ├── doc/                    # Input documents directory
+│   └── L9.md               # Default demo source
 ├── config.yaml             # Configuration file
 ├── start.sh                # One-click startup script
 └── main.py                 # CLI entry point
@@ -91,7 +103,7 @@ OpenNotebookLM-AIPPT/
 ## ⚙️ Configuration
 
 All configurations are managed in `config.yaml`, including:
-- API configuration (image generation, text generation)
+- API configuration (`prompt_model`, `image_model`, `edit_model`)
 - PPT default settings (language, style, page count)
 - Timeout and retry settings
 
@@ -101,11 +113,22 @@ See `config.example.yaml` for detailed configuration examples.
 
 ```yaml
 api:
-  text:
-    format: "openai"
-    model: "gpt-4o"
-    base_url: "https://api.openai.com/v1"
-    api_key: "sk-xxx"
+  models:
+    prompt_model:
+      adapter: "openai_chat"
+      model: "gpt-4o"
+      base_url: "https://api.openai.com/v1"
+      api_key: "sk-xxx"
+    image_model:
+      adapter: "raw_chat_multimodal"
+      model: "gpt-image-2"
+      base_url: "https://api.example.com/v1"
+      api_key: "sk-xxx"
+    edit_model:
+      adapter: "raw_chat_multimodal"
+      model: "gpt-image-2"
+      base_url: "https://api.example.com/v1"
+      api_key: "sk-xxx"
 ```
 
 ## 📤 Output Structure
@@ -121,8 +144,8 @@ output/ppt_20241201_123456/
 
 ## 📋 TODO
 
-- [ ] 🔌 Support more LLM API interfaces
-- [ ] 💾 Support multiple input formats and export formats
+- [ ] Support region selection for partial slide editing
+- [ ] Add more provider profile templates
 
 ## 📄 License
 

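The feature list above says image results may arrive as URLs, Markdown image links, data URLs, `b64_json`, or raw base64. The sketch below shows one way those five shapes could be reduced to either a URL or decoded bytes. `normalize_image_result` is a hypothetical helper written for this note; it is not the repository's `ImageResultNormalizer`, whose actual interface is not shown in this diff.

```python
import base64
import binascii
import re
from typing import Tuple, Union

# ![alt](target) with a target that contains no whitespace.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\((\S+?)\)")
# data:image/<subtype>;base64,<payload>
DATA_URL = re.compile(r"^data:image/[\w.+-]+;base64,(.+)$", re.DOTALL)


def normalize_image_result(result: Union[str, dict]) -> Tuple[str, Union[str, bytes]]:
    """Return ("url", url) or ("bytes", decoded_image_bytes)."""
    if isinstance(result, dict):
        # Dict shape assumed here: {"b64_json": ...} or {"url": ...}.
        if result.get("b64_json"):
            return "bytes", base64.b64decode(result["b64_json"])
        if result.get("url"):
            return "url", result["url"]
        raise ValueError("dict result has neither b64_json nor url")

    text = result.strip()
    markdown = MARKDOWN_IMAGE.search(text)
    if markdown:
        # Unwrap the link target; it may itself be a URL or a data URL.
        text = markdown.group(1)
    data_url = DATA_URL.match(text)
    if data_url:
        return "bytes", base64.b64decode(data_url.group(1))
    if text.startswith(("http://", "https://")):
        return "url", text
    try:
        # validate=True rejects strings with characters outside the base64 alphabet.
        return "bytes", base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise ValueError("unrecognized image result") from exc
```

Checking the Markdown and data URL shapes before the raw base64 fallback keeps ordinary text from being misread as an image payload.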
diff --git a/api/README.md b/api/README.md
@@ -4,10 +4,11 @@ FastAPI backend service providing PPT generation, editing, and export features for the WebUI frontend
 
 ## Features
 
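-- **File Upload**: Supports uploading Markdown files as input material
+- **File Upload**: Supports `.md/.txt/.pdf/.docx/.pptx`, all parsed into Markdown
 - **PPT Generation**: Generates PPT slides from the input material, with streamed progress updates
 - **Image-to-Image Editing**: Modifies a single slide
 - **Export**: Supports export to PDF and PPTX
+- **Model Routing**: Supports model profiles for the three roles prompt/image/edit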
 
 ## API Endpoints
 
@@ -22,7 +23,7 @@ POST /api/upload
 Content-Type: multipart/form-data
 
 Parameters:
-- file: Markdown file (.md)
+- file: document file (.md/.txt/.pdf/.docx/.pptx)
 
 Response:
 {
@@ -42,8 +43,20 @@ Content-Type: application/json
 {
   "content": "Markdown content",
   "config": {
-    "api_key": "your-api-key",
-    "base_url": "https://api.example.com",
+    "model_profiles": {
+      "prompt_model": {
+        "model": "DeepSeek-V4-Pro",
+        "base_url": "https://api.example.com/v1",
+        "api_key": "your-text-key",
+        "adapter": "openai_chat"
+      },
+      "image_model": {
+        "model": "gpt-image-2",
+        "base_url": "https://api.example.com/v1",
+        "api_key": "your-image-key",
+        "adapter": "raw_chat_multimodal"
+      }
+    },
     "page_count": 10,
     "quality": "1K",
     "aspect_ratio": "16:9"
@@ -67,8 +80,11 @@ Content-Type: application/json
   "image_base64": "base64-encoded image",
   "instruction": "edit instruction",
   "config": {
-    "api_key": "your-api-key",
-    "base_url": "https://api.example.com",
+    "model_profiles": {
+      "prompt_model": {"model": "DeepSeek-V4-Pro", "base_url": "https://api.example.com/v1", "api_key": "key", "adapter": "openai_chat"},
+      "image_model": {"model": "gpt-image-2", "base_url": "https://api.example.com/v1", "api_key": "key", "adapter": "raw_chat_multimodal"},
+      "edit_model": {"model": "gpt-image-2", "base_url": "https://api.example.com/v1", "api_key": "key", "adapter": "raw_chat_multimodal"}
+    },
     "quality": "1K",
     "aspect_ratio": "16:9"
   }
@@ -92,7 +108,8 @@ Content-Type: application/json
   "slides": [
     {"image_base64": "base64-encoded image"}
   ],
-  "format": "pdf" | "pptx"
+  "format": "pdf" | "pptx",
+  "aspect_ratio": "16:9" | "4:3"
 }
 
 Response: file download
@@ -138,6 +155,7 @@ api/
 │   ├── upload.py        # File upload route
 │   ├── generate.py      # PPT generation route
 │   ├── edit.py          # Image-to-image editing route
+│   ├── models.py        # Model profile route
 │   └── export.py        # Export route
 └── README.md
 ```
@@ -146,8 +164,9 @@ api/
 
 The API server directly uses the project's existing Python modules:
 
-- `src.generator.PPTGenerator` - PPT generator
-- `src.client.AIClient` - AI client
+- `src.model_router.ModelRouter` - routing for the three model roles
+- `src.image_result.ImageResultNormalizer` - normalization of URL/base64 image results
+- `src.document_parser.DocumentParser` - document parsing
 - `src.exporter.PDFExporter` - PDF exporter
 - `src.config` - configuration management
 

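The export request documented above gains an `aspect_ratio` field alongside `format`. The sketch below shows how a handler might validate that payload and derive a page size from the ratio. Both function names are hypothetical, the 7.5 inch slide height is an assumption chosen to match PowerPoint's standard page sizes, and defaulting to `16:9` when the field is absent is also an assumption; none of this is taken from `api/routes/export.py`.

```python
from typing import Tuple

# Values listed in the export request documented above.
ASPECT_RATIOS = {"16:9": (16, 9), "4:3": (4, 3)}
EXPORT_FORMATS = ("pdf", "pptx")


def slide_size_inches(aspect_ratio: str, height: float = 7.5) -> Tuple[float, float]:
    """Return (width, height) in inches for a fixed slide height.

    Assumption: 7.5 in height, which gives 10 x 7.5 in for 4:3 and
    about 13.333 x 7.5 in for 16:9.
    """
    ratio_w, ratio_h = ASPECT_RATIOS[aspect_ratio]
    return (height * ratio_w / ratio_h, height)


def validate_export_request(payload: dict) -> dict:
    """Check the payload shape and return the normalized export settings."""
    fmt = payload.get("format")
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"format must be one of {EXPORT_FORMATS}, got {fmt!r}")
    # Assumption: fall back to 16:9 so requests without the new field still work.
    ratio = payload.get("aspect_ratio", "16:9")
    if ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}, got {ratio!r}")
    slides = payload.get("slides") or []
    if not slides or any(not slide.get("image_base64") for slide in slides):
        raise ValueError("slides must be a non-empty list of {'image_base64': ...} items")
    return {
        "format": fmt,
        "aspect_ratio": ratio,
        "slide_size": slide_size_inches(ratio),
        "slide_count": len(slides),
    }
```

Deriving the width from the ratio instead of hard-coding two sizes keeps the table of supported ratios as the single place to extend.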
diff --git a/api/main.py b/api/main.py
@@ -9,7 +9,7 @@
 from pathlib import Path
 
 # Import routes
-from .routes import upload, generate, edit, export
+from .routes import upload, generate, edit, export, models
 
 # Create the FastAPI app
 app = FastAPI(
@@ -39,6 +39,7 @@ async def health_check():
 app.include_router(generate.router)
 app.include_router(edit.router)
 app.include_router(export.router)
+app.include_router(models.router)
 
 # Static file serving (for the frontend)
 # Check whether the web/dist directory exists; this must come last