12 changes: 12 additions & 0 deletions .gitignore
@@ -18,6 +18,7 @@ node_modules/
# Output directories
output/
test_output/
.playwright-cli/
temp_buid_note/

# Generated files
@@ -27,6 +28,16 @@ temp_buid_note/
*.pdf
test_prompts.json

# README demo assets must stay in the repository so the results can be viewed right after cloning
!docs/
!docs/assets/
!docs/assets/*.png
!docs/assets/*.jpg
!docs/assets/*.jpeg
!docs/assets/*.gif
!docs/assets/*.mp4
!docs/assets/*.webm

# IDE
.vscode/
.idea/
@@ -43,4 +54,5 @@ Thumbs.db

# Local configuration files
config.yaml
.codex/
.kiro/
63 changes: 43 additions & 20 deletions README.md
@@ -4,12 +4,20 @@

Recreates NotebookLM's AI PPT feature: it automatically converts papers, documents, and other source material into polished PPT images.

<video src="docs/assets/aippt-demo.webm" poster="docs/assets/aippt-demo-poster.png" controls muted playsinline width="100%"></video>

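[Watch the demo video](docs/assets/aippt-demo.webm)

The demo video covers uploading `doc/L9.md`, entering user requirements, generating and editing the design outline, confirming the per-page designs, generating a 6-page PPT, editing a single page, confirming the replacement, and exporting PDF/PPTX. The model waiting stages have been fast-forwarded.

## ✨ Features

- 🎨 **AI image generation**: Uses AI models to convert document content into polished PPT images
- 🌐 **WebUI**: Provides a friendly web interface with file upload, real-time preview, editing, and export
- 📝 **Multi-format support**: Accepts Markdown documents as input and exports to PDF/PPTX
- ✏️ **Image-to-image editing**: Supports follow-up edits to generated slides
- 🎨 **Per-page image generation**: Generates an editable design outline and per-page designs first, then uses AI models to turn them into PPT page images
- 🌐 **PPT workbench**: Supports source upload, model configuration, a large preview of the current page, a thumbnail list, edit history, and export
- 📝 **Multi-format parsing**: Supports `.md/.txt/.pdf/.docx/.pptx` input, converting everything to Markdown before generation
- ✏️ **Full-page image editing**: Supports editing each slide individually, reverting through history, and confirming replacements
- 🔀 **Three model roles**: `prompt_model`, `image_model`, and `edit_model` can be configured separately
- 🖼️ **Image result compatibility**: Handles URLs, Markdown image links, data URLs, `b64_json`, and raw base64
- 💾 **State persistence**: Automatically saves work progress and supports session recovery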

## 🚀 Quick Start
@@ -56,23 +64,26 @@ cd web && npm install && npm run dev
pip install -r requirements.txt

# Basic usage
python main.py -i doc/sample_paper.txt -n 5
python main.py -i doc/L9.md -n 5

# Generate prompts only
python main.py -i doc/sample_paper.txt -n 5 --prompt-only -o prompts.json
python main.py -i doc/L9.md -n 5 --prompt-only -o prompts.json

# Generate from a prompt file
python main.py --from-prompt prompts.json
```

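### 3. WebUI Workflow

1. **Upload a document**: Drag and drop or click to upload a Markdown file in the left panel
2. **Configure the API**: Enter the API Key and Base URL in the center panel
3. **Set parameters**: Choose generation parameters such as page count, resolution, and aspect ratio
4. **Generate the PPT**: Click the "Start Generation" button and watch progress in real time
5. **Preview and edit**: Preview the generated slides in the right panel and click one to edit it
6. **Export**: Choose PDF or PPTX as the export format
1. **Upload a document**: Drag and drop or click to upload a source file in the left panel
2. **Configure models**: Configure the text, image generation, and editing models in the center panel
3. **Set parameters and requirements**: Choose page count, resolution, aspect ratio, language, style, and audience, and enter custom user requirements
4. **Confirm the design**: Generate the design outline first, edit it if needed, confirm it, and then generate the per-page design preview
5. **Generate the PPT**: After confirming the per-page designs, generate the PPT images and watch progress in real time
6. **Preview and edit**: Preview the generated slides in the right panel and click one to edit that single page
7. **Export**: Choose PDF or PPTX as the export format

The built-in demo source is `doc/L9.md`. This is a repository-relative path, so after cloning it can be used directly for WebUI upload or in the command-line examples.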

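## 📁 Project Structure

@@ -83,6 +94,7 @@ OpenNotebookLM-AIPPT/
├── web/ # React frontend
├── tests/ # Tests
├── doc/ # Input documents directory
│ └── L9.md # Default demo source
├── config.yaml # Configuration file
├── start.sh # One-click startup script
└── main.py # CLI entry point
@@ -91,7 +103,7 @@ OpenNotebookLM-AIPPT/
## ⚙️ Configuration

All configuration is managed in `config.yaml`, including:
- API configuration (image generation, text generation)
- API configuration (models for three roles: text prompt, image generation, and editing)
- PPT default settings (language, style, page count)
- Timeout and retry settings

@@ -101,11 +113,22 @@ OpenNotebookLM-AIPPT/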

```yaml
api:
text:
format: "openai"
model: "gpt-4o"
base_url: "https://api.openai.com/v1"
api_key: "sk-xxx"
models:
prompt_model:
adapter: "openai_chat"
model: "gpt-4o"
base_url: "https://api.openai.com/v1"
api_key: "sk-xxx"
image_model:
adapter: "raw_chat_multimodal"
model: "gpt-image-2"
base_url: "https://api.example.com/v1"
api_key: "sk-xxx"
edit_model:
adapter: "raw_chat_multimodal"
model: "gpt-image-2"
base_url: "https://api.example.com/v1"
api_key: "sk-xxx"
```

## 📤 Output Structure
@@ -121,8 +144,8 @@ output/ppt_20241201_123456/

## 📋 TODO

- [ ] 🔌 Support more LLM API interfaces
- [ ] 💾 Support multiple input formats and more export formats
- [ ] Support editing a box-selected region of a slide
- [ ] Add more provider profile templates

## 📄 License

63 changes: 43 additions & 20 deletions README_en.md
@@ -4,12 +4,20 @@

Recreate NotebookLM's AI PPT feature to automatically convert papers, documents, and other materials into beautiful PPT images.

<video src="docs/assets/aippt-demo.webm" poster="docs/assets/aippt-demo-poster.png" controls muted playsinline width="100%"></video>

[Watch the demo video](docs/assets/aippt-demo.webm)

The demo covers uploading `doc/L9.md`, entering custom requirements, generating and editing the design outline, confirming page designs, generating a 6-slide deck, editing one slide, confirming the replacement, and exporting PDF/PPTX. Model waiting time is fast-forwarded.

## ✨ Features

- 🎨 **AI Image Generation**: Use AI models to convert document content into beautiful PPT images
- 🌐 **WebUI Interface**: User-friendly web interface with file upload, real-time preview, editing, and export
- 📝 **Multi-format Support**: Markdown document input, PDF/PPTX format export
- ✏️ **Image-to-Image Editing**: Support secondary editing of generated slides
- 🎨 **Per-slide image generation**: Create an editable outline and page designs before converting them into PPT page images
- 🌐 **PPT Workbench**: Upload sources, configure model roles, preview slides, edit pages, track history, and export
- 📝 **Multi-format parsing**: Supports `.md/.txt/.pdf/.docx/.pptx` input and converts content to Markdown
- ✏️ **Full-page image editing**: Edit each generated slide independently, revert history, and confirm replacements
- 🔀 **Three model roles**: Configure `prompt_model`, `image_model`, and `edit_model` separately
- 🖼️ **Image result compatibility**: Accepts URLs, Markdown image links, data URLs, `b64_json`, and raw base64
- 💾 **State Persistence**: Auto-save work progress with session recovery
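
The image result compatibility item above lists five shapes a model response may take. The following is a minimal sketch of how such results could be reduced to either a URL or a base64 payload. The function name `classify_image_result` and its return convention are assumptions made for illustration; the project's own normalizer may work differently.

```python
import base64
import re

# Hypothetical helper for illustration only; not the project's actual API.
# Matches a Markdown image such as ![alt](target) and captures the target.
_MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def classify_image_result(result):
    """Return (kind, payload), where kind is "url" or "base64"."""
    # Dict-shaped results: prefer inline base64, fall back to a URL.
    if isinstance(result, dict):
        if result.get("b64_json"):
            return "base64", result["b64_json"]
        if result.get("url"):
            return "url", result["url"]
        raise ValueError("dict result has neither 'b64_json' nor 'url'")

    text = result.strip()
    if not text:
        raise ValueError("empty image result")

    # Unwrap a Markdown image link if one is present.
    match = _MD_IMAGE.search(text)
    if match:
        text = match.group(1)

    # data URL: keep only the part after the comma.
    if text.startswith("data:"):
        header, _, payload = text.partition(",")
        if ";base64" not in header:
            raise ValueError("only base64 data URLs are handled in this sketch")
        return "base64", payload

    if text.startswith(("http://", "https://")):
        return "url", text

    # Anything else must decode as strict base64, otherwise an error is raised.
    base64.b64decode(text, validate=True)
    return "base64", text
```

Reducing every shape to two kinds keeps downstream code simple: a caller only has to decide whether to download a URL or decode a payload.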

## 🚀 Quick Start
@@ -56,23 +64,26 @@ cd web && npm install && npm run dev
pip install -r requirements.txt

# Basic usage
python main.py -i doc/sample_paper.txt -n 5
python main.py -i doc/L9.md -n 5

# Generate prompts only
python main.py -i doc/sample_paper.txt -n 5 --prompt-only -o prompts.json
python main.py -i doc/L9.md -n 5 --prompt-only -o prompts.json

# Generate from prompt file
python main.py --from-prompt prompts.json
```

### 3. WebUI Usage Flow

1. **Upload Document**: Drag and drop or click to upload Markdown files in the left panel
2. **Configure API**: Fill in API Key and Base URL in the center panel
3. **Set Parameters**: Choose page count, resolution, aspect ratio, etc.
4. **Generate PPT**: Click "Start Generation" button and watch real-time progress
5. **Preview & Edit**: Preview generated slides in the right panel, click to edit
6. **Export**: Choose PDF or PPTX format to export
1. **Upload Document**: Drag and drop or click to upload a source file in the left panel
2. **Configure Models**: Configure text, image generation, and image editing model roles
3. **Set Parameters & Requirements**: Choose page count, resolution, aspect ratio, language, style, audience, and custom requirements
4. **Confirm Design**: Generate an editable outline, confirm it, then review the generated page designs
5. **Generate PPT**: Generate slide images after page-design confirmation and watch real-time progress
6. **Preview & Edit**: Preview generated slides in the right panel and edit a single page when needed
7. **Export**: Export to PDF or PPTX

The built-in demo source is `doc/L9.md`. This is a repository-relative path, so a fresh clone can use it directly in the WebUI or CLI examples.
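
The revised flow above places two confirmation steps (outline, then page designs) before slide images are generated. A minimal sketch of that gating, assuming a hypothetical `DeckSession` class that is not part of the repository:

```python
from dataclasses import dataclass


@dataclass
class DeckSession:
    """Hypothetical session state; tracks the two confirmations in the flow."""

    outline_confirmed: bool = False
    designs_confirmed: bool = False

    def confirm_outline(self):
        self.outline_confirmed = True

    def confirm_designs(self):
        # Page designs are derived from the outline, so it must be confirmed first.
        if not self.outline_confirmed:
            raise RuntimeError("confirm the outline before the page designs")
        self.designs_confirmed = True

    def can_generate(self):
        # Slide images are only generated once both steps are confirmed.
        return self.outline_confirmed and self.designs_confirmed


session = DeckSession()
session.confirm_outline()
session.confirm_designs()
ready = session.can_generate()
```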

## 📁 Project Structure

@@ -83,6 +94,7 @@ OpenNotebookLM-AIPPT/
├── web/ # React frontend
├── tests/ # Tests
├── doc/ # Input documents directory
│ └── L9.md # Default demo source
├── config.yaml # Configuration file
├── start.sh # One-click startup script
└── main.py # CLI entry point
@@ -91,7 +103,7 @@ OpenNotebookLM-AIPPT/
## ⚙️ Configuration

All configurations are managed in `config.yaml`, including:
- API configuration (image generation, text generation)
- API configuration (`prompt_model`, `image_model`, `edit_model`)
- PPT default settings (language, style, page count)
- Timeout and retry settings

@@ -101,11 +113,22 @@ See `config.example.yaml` for detailed configuration examples.

```yaml
api:
text:
format: "openai"
model: "gpt-4o"
base_url: "https://api.openai.com/v1"
api_key: "sk-xxx"
models:
prompt_model:
adapter: "openai_chat"
model: "gpt-4o"
base_url: "https://api.openai.com/v1"
api_key: "sk-xxx"
image_model:
adapter: "raw_chat_multimodal"
model: "gpt-image-2"
base_url: "https://api.example.com/v1"
api_key: "sk-xxx"
edit_model:
adapter: "raw_chat_multimodal"
model: "gpt-image-2"
base_url: "https://api.example.com/v1"
api_key: "sk-xxx"
```
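
The diff view above has lost the YAML indentation, so the nesting is not visible. The sketch below assumes the new `models` map sits under `api` with one entry per role, and shows one way the parsed config could be validated. `ModelProfile` and `load_profiles` are hypothetical names, and a plain dict stands in for the parsed YAML.

```python
from dataclasses import dataclass

ROLES = ("prompt_model", "image_model", "edit_model")


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical container for one role's settings."""

    adapter: str
    model: str
    base_url: str
    api_key: str


def load_profiles(config):
    """Build one ModelProfile per role, failing early if a role is missing."""
    models = config["api"]["models"]  # assumed nesting: api -> models -> role
    missing = [role for role in ROLES if role not in models]
    if missing:
        raise KeyError("missing model roles: " + ", ".join(missing))
    return {role: ModelProfile(**models[role]) for role in ROLES}


# Stand-in for the parsed config.yaml, using the placeholder values from the diff.
image_settings = {
    "adapter": "raw_chat_multimodal",
    "model": "gpt-image-2",
    "base_url": "https://api.example.com/v1",
    "api_key": "sk-xxx",
}
config = {
    "api": {
        "models": {
            "prompt_model": {
                "adapter": "openai_chat",
                "model": "gpt-4o",
                "base_url": "https://api.openai.com/v1",
                "api_key": "sk-xxx",
            },
            "image_model": dict(image_settings),
            "edit_model": dict(image_settings),
        }
    }
}

profiles = load_profiles(config)
```

Keeping the three roles as separate entries lets each one point at a different provider, key, or adapter, which is the purpose of splitting the old single `text` block.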

## 📤 Output Structure
@@ -121,8 +144,8 @@ output/ppt_20241201_123456/

## 📋 TODO

- [ ] 🔌 Support more LLM API interfaces
- [ ] 💾 Support multiple input formats and export formats
- [ ] Support region selection for partial slide editing
- [ ] Add more provider profile templates

## 📄 License

37 changes: 28 additions & 9 deletions api/README.md
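@@ -4,10 +4,11 @@ FastAPI backend service that provides PPT generation, editing, and export for the WebUI frontend

## Features

- **File upload**: Supports uploading Markdown files as input material
- **File upload**: Supports `.md/.txt/.pdf/.docx/.pptx`, all parsed into Markdown
- **PPT generation**: Generates PPT slides from the input material, with streamed progress updates
- **Image-to-image editing**: Modifies a single slide
- **Export**: Supports exporting to PDF and PPTX
- **Model routing**: Supports model profiles for the three roles: prompt, image, and edit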
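
With model routing, the request `config` carries a `model_profiles` map instead of a single `api_key` and `base_url`. A minimal sketch of assembling a generation request body in that shape follows. The field names come from the request examples in this README; the helper names `make_profile` and `build_generate_request` are assumptions for illustration.

```python
import json


def make_profile(model, base_url, api_key, adapter):
    """Hypothetical helper: one role's profile as a plain dict."""
    return {
        "model": model,
        "base_url": base_url,
        "api_key": api_key,
        "adapter": adapter,
    }


def build_generate_request(content, profiles, page_count=10,
                           quality="1K", aspect_ratio="16:9"):
    """Assemble the JSON body for a generation request."""
    return {
        "content": content,
        "config": {
            "model_profiles": profiles,
            "page_count": page_count,
            "quality": quality,
            "aspect_ratio": aspect_ratio,
        },
    }


payload = build_generate_request(
    "# Lecture notes",
    {
        "prompt_model": make_profile(
            "DeepSeek-V4-Pro", "https://api.example.com/v1",
            "your-text-key", "openai_chat"),
        "image_model": make_profile(
            "gpt-image-2", "https://api.example.com/v1",
            "your-image-key", "raw_chat_multimodal"),
    },
)
body = json.dumps(payload, ensure_ascii=False)
```

The edit request uses the same `model_profiles` shape with an additional `edit_model` entry.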

## API Endpoints

@@ -22,7 +23,7 @@ POST /api/upload
Content-Type: multipart/form-data

Parameters:
- file: Markdown file (.md)
- file: Document file (.md/.txt/.pdf/.docx/.pptx)

Response:
{
@@ -42,8 +43,20 @@ Content-Type: application/json
{
"content": "Markdown content",
"config": {
"api_key": "your-api-key",
"base_url": "https://api.example.com",
"model_profiles": {
"prompt_model": {
"model": "DeepSeek-V4-Pro",
"base_url": "https://api.example.com/v1",
"api_key": "your-text-key",
"adapter": "openai_chat"
},
"image_model": {
"model": "gpt-image-2",
"base_url": "https://api.example.com/v1",
"api_key": "your-image-key",
"adapter": "raw_chat_multimodal"
}
},
"page_count": 10,
"quality": "1K",
"aspect_ratio": "16:9"
@@ -67,8 +80,11 @@ Content-Type: application/json
"image_base64": "base64-encoded image",
"instruction": "edit instruction",
"config": {
"api_key": "your-api-key",
"base_url": "https://api.example.com",
"model_profiles": {
"prompt_model": {"model": "DeepSeek-V4-Pro", "base_url": "https://api.example.com/v1", "api_key": "key", "adapter": "openai_chat"},
"image_model": {"model": "gpt-image-2", "base_url": "https://api.example.com/v1", "api_key": "key", "adapter": "raw_chat_multimodal"},
"edit_model": {"model": "gpt-image-2", "base_url": "https://api.example.com/v1", "api_key": "key", "adapter": "raw_chat_multimodal"}
},
"quality": "1K",
"aspect_ratio": "16:9"
}
@@ -92,7 +108,8 @@ Content-Type: application/json
"slides": [
{"image_base64": "base64-encoded image"}
],
"format": "pdf" | "pptx"
"format": "pdf" | "pptx",
"aspect_ratio": "16:9" | "4:3"
}

Response: file download
@@ -138,6 +155,7 @@ api/
│ ├── upload.py # File upload routes
│ ├── generate.py # PPT generation routes
│ ├── edit.py # Image-to-image editing routes
│ ├── models.py # Model profile routes
│ └── export.py # Export routes
└── README.md
```
@@ -146,8 +164,9 @@ api/

The API server directly uses the project's existing Python modules:

- `src.generator.PPTGenerator` - PPT generator
- `src.client.AIClient` - AI client
- `src.model_router.ModelRouter` - Routing across the three model roles
- `src.image_result.ImageResultNormalizer` - Normalization of URL/base64 image results
- `src.document_parser.DocumentParser` - Document parsing
- `src.exporter.PDFExporter` - PDF exporter
- `src.config` - Configuration management

3 changes: 2 additions & 1 deletion api/main.py
@@ -9,7 +9,7 @@
from pathlib import Path

# Import routers
from .routes import upload, generate, edit, export
from .routes import upload, generate, edit, export, models

# Create the FastAPI app
app = FastAPI(
@@ -39,6 +39,7 @@ async def health_check():
app.include_router(generate.router)
app.include_router(edit.router)
app.include_router(export.router)
app.include_router(models.router)

# Static file serving (for the frontend)
# Check whether the web/dist directory exists; this must come last