From 389e9c819f8663c516769ef907187628e52a9d0c Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Mon, 18 May 2026 11:00:23 +0800 Subject: [PATCH 01/11] docs: add spec for SeeDance2.0 video generation CLI support Part of marswaveai/listenhub-ralph#127 --- docs/specs/listenhub-cli--127-design.md | 129 ++++++++++++++++++++++++ 1 file changed, 129 insertions(+) create mode 100644 docs/specs/listenhub-cli--127-design.md diff --git a/docs/specs/listenhub-cli--127-design.md b/docs/specs/listenhub-cli--127-design.md new file mode 100644 index 0000000..b7df943 --- /dev/null +++ b/docs/specs/listenhub-cli--127-design.md @@ -0,0 +1,129 @@ +# Spec: CLI 支持 SeeDance2.0 视频生成 + +> Issue: marswaveai/listenhub-ralph#127 + +## 背景 + +SDK 0.0.6 已封装 SeeDance2.0 视频生成 API(`v1/video-generation/*`),CLI 需要同步暴露对应命令,让用户通过终端即可创建视频任务、查看任务状态、列出历史任务和预估积分消耗。 + +## 目标 + +在 `listenhub-cli` 中新增 `video` 命令组,覆盖 SeeDance2.0 全部核心操作。 + +## 新增模块 + +``` +source/video/_cli.ts — Commander 注册 +source/video/video.ts — 业务逻辑 +``` + +`source/cli.ts` 新增 `registerVideo` 导入。 + +## 命令设计 + +### `listenhub video create` + +创建视频生成任务。 + +| 参数 | 类型 | 必填 | 默认值 | 说明 | +|------|------|------|--------|------| +| `--prompt ` | string | 是 | — | 视频描述文本 | +| `--model ` | string | 否 | `doubao-seedance-2-pro` | 模型:`doubao-seedance-2-pro` / `doubao-seedance-2-fast` | +| `--resolution ` | string | 否 | `720p` | 分辨率:`480p` / `720p` / `1080p` | +| `--ratio ` | string | 否 | `16:9` | 画面比例:`16:9` / `4:3` / `1:1` / `3:4` / `9:16` / `21:9` | +| `--duration ` | number | 否 | — | 视频时长(秒),不传则使用服务端默认 | +| `--first-frame ` | string | 否 | — | 首帧图片,本地文件或 URL | +| `--last-frame ` | string | 否 | — | 末帧图片 | +| `--reference-image ` | string | 否 | — | 参考图,可重复 | +| `--reference-video ` | string | 否 | — | 参考视频,本地文件或 URL | +| `--reference-audio ` | string | 否 | — | 参考音频 | +| `--generate-audio` | boolean | 否 | `false` | 是否生成音轨 | +| `--seed ` | number | 否 | — | 随机种子 | +| `--no-wait` | boolean | — | — | 提交后立即返回,不轮询 | +| `--timeout ` | number | 否 | `600` | 轮询超时 | +| 
`-j, --json` | boolean | — | — | JSON 输出 | + +**行为:** +1. 解析 `--prompt` 为 `VideoContentText`。 +2. 依据 `--first-frame`/`--last-frame`/`--reference-image`/`--reference-video`/`--reference-audio` 构建 `content[]`,本地文件通过 `resolveFileOrUrl` 上传后取 URL。 +3. 调用 `client.createVideoGeneration(params)`。 +4. 若 `--no-wait`,打印 `taskId` 后退出。 +5. 否则轮询 `client.getVideoGenerationTask(taskId)` 直到 `success` / `failed` / 超时。 +6. 成功打印视频 URL 与基本信息。 + +### `listenhub video get ` + +获取单个任务详情。 + +| 参数 | 说明 | +|------|------| +| `taskId` | 位置参数 | +| `-j, --json` | JSON 输出 | + +### `listenhub video list` + +列出视频生成任务。 + +| 参数 | 默认 | 说明 | +|------|------|------| +| `--page ` | 1 | 页码 | +| `--page-size ` | 20 | 每页条数 | +| `--status ` | — | 可选筛选:`pending` / `generating` / `uploading` / `success` / `failed` | +| `-j, --json` | — | JSON 输出 | + +### `listenhub video estimate` + +预估积分消耗。 + +| 参数 | 必填 | 默认 | 说明 | +|------|------|------|------| +| `--model ` | 是 | — | 模型 | +| `--resolution ` | 是 | — | 分辨率 | +| `--duration ` | 是 | — | 时长 | +| `--ratio ` | 否 | `16:9` | 比例 | +| `--has-video-input` | 否 | `false` | 是否有参考视频 | +| `--input-video-duration ` | 否 | — | 参考视频时长 | +| `-j, --json` | — | — | JSON 输出 | + +## 改动点 + +| 文件 | 改动 | +|------|------| +| `source/video/_cli.ts` | 新增 — Commander 命令注册 | +| `source/video/video.ts` | 新增 — create / get / list / estimate 逻辑 | +| `source/cli.ts` | 添加 `registerVideo` | +| `source/_shared/polling.ts` | 新增 `pollVideoTaskUntilDone` | +| `source/_shared/upload.ts` | 扩展支持 `video` 类型(`.mp4`/`.mov`/`.webm`,上限 100MB) | +| `package.json` | 升级 `@marswave/listenhub-sdk` 到 `^0.0.6` | +| `README.md` | 添加 `video` 命令说明与示例 | + +## 上传扩展 + +`resolveFileOrUrl` 需要新增 `video` 文件类型: +- 允许后缀:`.mp4`、`.mov`、`.webm` +- 最大体积:100 MB +- MIME:`video/mp4`、`video/quicktime`、`video/webm` +- category:`banana`(同 image) + +## 轮询策略 + +视频生成较慢,采用: +- 间隔 10s(同现有全局 `pollIntervalMs`) +- 默认超时 600s(10 分钟) +- 终态:`success` / `failed` + +## 错误处理 + +- SDK 返回的 `VideoGenerationErrorCode` 映射为可读消息输出。 +- 
与现有模块保持一致:`handleError(error, options.json)` 统一格式。 + +## 验收标准 + +1. `listenhub video create --prompt "..." ` 可成功创建任务并轮询到最终结果。 +2. `listenhub video list` 正确展示历史任务列表。 +3. `listenhub video get ` 输出任务详情。 +4. `listenhub video estimate --model ... --resolution ... --duration ...` 输出积分预估。 +5. 本地文件(图片/视频/音频)通过 `--first-frame`/`--reference-video` 等参数上传成功。 +6. `--json` 模式输出合法 JSON。 +7. `pnpm lint` 无错误。 +8. README 包含 `video` 命令最小可用示例。 From 94723f036c778da03ab9fa5e4949071564958760 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Mon, 18 May 2026 11:11:19 +0800 Subject: [PATCH 02/11] =?UTF-8?q?docs:=20fix=20spec=20=E2=80=94=20add=20in?= =?UTF-8?q?put-video-duration,=20fix=20audio=20default,=20add=20constraint?= =?UTF-8?q?s?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add --input-video-duration (required with --reference-video, range 2-15) - Change --generate-audio to --no-generate-audio (preserve server default: true) - Clarify URLs must be local files or platform asset URLs (not arbitrary external) - Add full parameter validation rules (duration 4-15, seed range, 1080p pro-only, frame/reference mode mutual exclusion, max counts) - Align default model to server default (doubao-seedance-2-fast) - Increase polling timeout to 1200s Part of marswaveai/listenhub-ralph#127 --- docs/specs/listenhub-cli--127-design.md | 98 +++++++++++++++++-------- 1 file changed, 67 insertions(+), 31 deletions(-) diff --git a/docs/specs/listenhub-cli--127-design.md b/docs/specs/listenhub-cli--127-design.md index b7df943..24bbed0 100644 --- a/docs/specs/listenhub-cli--127-design.md +++ b/docs/specs/listenhub-cli--127-design.md @@ -28,28 +28,41 @@ source/video/video.ts — 业务逻辑 | 参数 | 类型 | 必填 | 默认值 | 说明 | |------|------|------|--------|------| | `--prompt ` | string | 是 | — | 视频描述文本 | -| `--model ` | string | 否 | `doubao-seedance-2-pro` | 模型:`doubao-seedance-2-pro` / `doubao-seedance-2-fast` | -| `--resolution ` | string | 否 | `720p` | 分辨率:`480p` / `720p` / 
`1080p` | -| `--ratio ` | string | 否 | `16:9` | 画面比例:`16:9` / `4:3` / `1:1` / `3:4` / `9:16` / `21:9` | -| `--duration ` | number | 否 | — | 视频时长(秒),不传则使用服务端默认 | -| `--first-frame ` | string | 否 | — | 首帧图片,本地文件或 URL | -| `--last-frame ` | string | 否 | — | 末帧图片 | -| `--reference-image ` | string | 否 | — | 参考图,可重复 | -| `--reference-video ` | string | 否 | — | 参考视频,本地文件或 URL | -| `--reference-audio ` | string | 否 | — | 参考音频 | -| `--generate-audio` | boolean | 否 | `false` | 是否生成音轨 | -| `--seed ` | number | 否 | — | 随机种子 | +| `--model ` | string | 否 | 不传(服务端默认 `doubao-seedance-2-fast`) | 模型:`doubao-seedance-2-pro` / `doubao-seedance-2-fast` | +| `--resolution ` | string | 否 | 不传(服务端默认 `720p`) | 分辨率:`480p` / `720p` / `1080p`(注意:`1080p` 仅 pro 模型支持) | +| `--ratio ` | string | 否 | 不传(服务端默认 `16:9`) | 画面比例:`16:9` / `4:3` / `1:1` / `3:4` / `9:16` / `21:9` | +| `--duration ` | number | 否 | — | 视频时长,范围 4–15 秒 | +| `--first-frame ` | string | 否 | — | 首帧图片,本地文件或平台资产 URL | +| `--last-frame ` | string | 否 | — | 末帧图片(必须同时指定 `--first-frame`) | +| `--reference-image ` | string | 否 | — | 参考图(可重复,最多 9 张),本地文件或平台资产 URL | +| `--reference-video ` | string | 否 | — | 参考视频(可重复,最多 3 个),本地文件或平台资产 URL | +| `--reference-audio ` | string | 否 | — | 参考音频(可重复,最多 3 个),本地文件或平台资产 URL | +| `--input-video-duration ` | number | 否 | — | 参考视频时长,范围 2–15 秒;使用 `--reference-video` 时**必填** | +| `--no-generate-audio` | boolean | — | — | 禁用音轨生成(服务端默认生成音轨) | +| `--seed ` | number | 否 | — | 随机种子,范围 -1 到 4294967295 | | `--no-wait` | boolean | — | — | 提交后立即返回,不轮询 | -| `--timeout ` | number | 否 | `600` | 轮询超时 | +| `--timeout ` | number | 否 | `1200` | 轮询超时 | | `-j, --json` | boolean | — | — | JSON 输出 | +**输入模式互斥规则(CLI 端校验,不满足直接报错退出):** + +- **帧控制模式**(`--first-frame`/`--last-frame`)与**参考模式**(`--reference-image`/`--reference-video`)不可混用。 +- `--last-frame` 必须搭配 `--first-frame`。 +- `--reference-audio` 不能单独使用,必须搭配 prompt 或其他视觉素材。 +- 数量上限:image ≤ 9,video ≤ 3,audio ≤ 3。 + +**URL 约束:** 所有 `` 参数仅接受本地文件路径或 ListenHub 平台资产 URL(GCS 
bucket / CDN)。外部 URL(如 `https://example.com/v.mp4`)会被后端拒绝。 + **行为:** -1. 解析 `--prompt` 为 `VideoContentText`。 -2. 依据 `--first-frame`/`--last-frame`/`--reference-image`/`--reference-video`/`--reference-audio` 构建 `content[]`,本地文件通过 `resolveFileOrUrl` 上传后取 URL。 -3. 调用 `client.createVideoGeneration(params)`。 -4. 若 `--no-wait`,打印 `taskId` 后退出。 -5. 否则轮询 `client.getVideoGenerationTask(taskId)` 直到 `success` / `failed` / 超时。 -6. 成功打印视频 URL 与基本信息。 +1. 校验输入模式互斥规则和参数范围。 +2. 解析 `--prompt` 为 `VideoContentText`。 +3. 依据素材参数构建 `content[]`,本地文件通过 `resolveFileOrUrl` 上传后取平台 URL。 +4. 仅在用户显式传了 `--model`/`--resolution`/`--ratio`/`--duration`/`--seed` 时才放入请求体,其余由服务端默认。`generateAudio` 仅在 `--no-generate-audio` 时传 `false`。 +5. 若有 `--reference-video`,将 `--input-video-duration` 作为 `inputVideoDuration` 传入(缺失则报错)。 +6. 调用 `client.createVideoGeneration(params)`。 +7. 若 `--no-wait`,打印 `taskId` 后退出。 +8. 否则轮询 `client.getVideoGenerationTask(taskId)` 直到 `success` / `failed` / 超时。 +9. 成功打印视频 URL 与基本信息。 ### `listenhub video get ` @@ -79,10 +92,10 @@ source/video/video.ts — 业务逻辑 |------|------|------|------| | `--model ` | 是 | — | 模型 | | `--resolution ` | 是 | — | 分辨率 | -| `--duration ` | 是 | — | 时长 | +| `--duration ` | 是 | — | 时长(4–15) | | `--ratio ` | 否 | `16:9` | 比例 | | `--has-video-input` | 否 | `false` | 是否有参考视频 | -| `--input-video-duration ` | 否 | — | 参考视频时长 | +| `--input-video-duration ` | 否 | — | 参考视频时长(2–15,`--has-video-input` 时必填) | | `-j, --json` | — | — | JSON 输出 | ## 改动点 @@ -90,7 +103,7 @@ source/video/video.ts — 业务逻辑 | 文件 | 改动 | |------|------| | `source/video/_cli.ts` | 新增 — Commander 命令注册 | -| `source/video/video.ts` | 新增 — create / get / list / estimate 逻辑 | +| `source/video/video.ts` | 新增 — create / get / list / estimate 逻辑 + 输入校验 | | `source/cli.ts` | 添加 `registerVideo` | | `source/_shared/polling.ts` | 新增 `pollVideoTaskUntilDone` | | `source/_shared/upload.ts` | 扩展支持 `video` 类型(`.mp4`/`.mov`/`.webm`,上限 100MB) | @@ -109,21 +122,44 @@ source/video/video.ts — 业务逻辑 视频生成较慢,采用: - 间隔 10s(同现有全局 
`pollIntervalMs`) -- 默认超时 600s(10 分钟) +- 默认超时 1200s(20 分钟) - 终态:`success` / `failed` +## CLI 端参数校验 + +在调用 SDK 之前,CLI 需拦截以下非法输入并给出明确错误提示: + +| 规则 | 错误信息 | +|------|----------| +| `--duration` 不在 4–15 | `Duration must be between 4 and 15 seconds` | +| `--seed` 不在 -1 到 4294967295 | `Seed must be between -1 and 4294967295` | +| `--resolution 1080p` + model 非 pro | `1080p resolution requires --model doubao-seedance-2-pro` | +| `--reference-video` 存在但缺 `--input-video-duration` | `--input-video-duration is required when using --reference-video` | +| `--input-video-duration` 不在 2–15 | `Input video duration must be between 2 and 15 seconds` | +| `--last-frame` 无 `--first-frame` | `--last-frame requires --first-frame` | +| 帧控制 + 参考混用 | `Cannot mix frame mode (--first-frame/--last-frame) with reference mode (--reference-image/--reference-video)` | +| `--reference-audio` 无其他视觉素材 | `--reference-audio cannot be used alone` | +| `--reference-image` 超过 9 | `Too many reference images (max 9)` | +| `--reference-video` 超过 3 | `Too many reference videos (max 3)` | +| `--reference-audio` 超过 3 | `Too many reference audios (max 3)` | + ## 错误处理 -- SDK 返回的 `VideoGenerationErrorCode` 映射为可读消息输出。 +- CLI 端校验失败:直接抛 Error,由 `handleError` 统一输出。 +- SDK/后端返回的 `VideoGenerationErrorCode` 映射为可读消息。 - 与现有模块保持一致:`handleError(error, options.json)` 统一格式。 ## 验收标准 -1. `listenhub video create --prompt "..." ` 可成功创建任务并轮询到最终结果。 -2. `listenhub video list` 正确展示历史任务列表。 -3. `listenhub video get ` 输出任务详情。 -4. `listenhub video estimate --model ... --resolution ... --duration ...` 输出积分预估。 -5. 本地文件(图片/视频/音频)通过 `--first-frame`/`--reference-video` 等参数上传成功。 -6. `--json` 模式输出合法 JSON。 -7. `pnpm lint` 无错误。 -8. README 包含 `video` 命令最小可用示例。 +1. `listenhub video create --prompt "..."` 可成功创建任务并轮询到最终结果。 +2. `listenhub video create --prompt "..." --reference-video ./clip.mp4 --input-video-duration 5` 正常工作。 +3. `listenhub video list` 正确展示历史任务列表。 +4. `listenhub video get ` 输出任务详情。 +5. `listenhub video estimate --model ... 
--resolution ... --duration ...` 输出积分预估。 +6. 本地文件(图片/视频/音频)通过对应参数上传成功。 +7. 传入外部非平台 URL 时后端拒绝,CLI 错误提示清晰。 +8. 输入模式互斥校验:混用帧控制 + 参考模式时 CLI 直接报错。 +9. `--no-generate-audio` 正确禁用音轨;不传时服务端默认生成音轨。 +10. `--json` 模式输出合法 JSON。 +11. `pnpm lint` 无错误。 +12. README 包含 `video` 命令最小可用示例。 From 4b976b239424992b708fd424e8b08c668380aa71 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Mon, 18 May 2026 11:35:18 +0800 Subject: [PATCH 03/11] docs: fix audio rule and upload category in spec - reference-audio requires reference-image or reference-video (not just prompt) - reference-audio is part of reference mode, cannot mix with frame mode - video command uploads use category=episode (private upload) instead of banana - resolveFileOrUrl gets category override param; existing image cmd unchanged Part of marswaveai/listenhub-ralph#127 --- docs/specs/listenhub-cli--127-design.md | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/docs/specs/listenhub-cli--127-design.md b/docs/specs/listenhub-cli--127-design.md index 24bbed0..6929e12 100644 --- a/docs/specs/listenhub-cli--127-design.md +++ b/docs/specs/listenhub-cli--127-design.md @@ -46,9 +46,9 @@ source/video/video.ts — 业务逻辑 **输入模式互斥规则(CLI 端校验,不满足直接报错退出):** -- **帧控制模式**(`--first-frame`/`--last-frame`)与**参考模式**(`--reference-image`/`--reference-video`)不可混用。 +- **帧控制模式**(`--first-frame`/`--last-frame`)与**参考模式**(`--reference-image`/`--reference-video`/`--reference-audio`)不可混用。 - `--last-frame` 必须搭配 `--first-frame`。 -- `--reference-audio` 不能单独使用,必须搭配 prompt 或其他视觉素材。 +- `--reference-audio` 不能单独使用,必须搭配 `--reference-image` 或 `--reference-video`(纯 prompt + audio 不合法)。 - 数量上限:image ≤ 9,video ≤ 3,audio ≤ 3。 **URL 约束:** 所有 `` 参数仅接受本地文件路径或 ListenHub 平台资产 URL(GCS bucket / CDN)。外部 URL(如 `https://example.com/v.mp4`)会被后端拒绝。 @@ -112,11 +112,20 @@ source/video/video.ts — 业务逻辑 ## 上传扩展 -`resolveFileOrUrl` 需要新增 `video` 文件类型: +`resolveFileOrUrl` 签名扩展为支持 category override: + +```ts +resolveFileOrUrl(client, input, { accept: 'video', 
category: 'episode' }) +``` + +**新增 `video` 文件类型:** - 允许后缀:`.mp4`、`.mov`、`.webm` - 最大体积:100 MB - MIME:`video/mp4`、`video/quicktime`、`video/webm` -- category:`banana`(同 image) + +**video 命令的 upload category:** 所有素材(image/video/audio)统一使用 `category=episode`(private upload)。后端 `resolveMediaUrl` 对 private bucket URL 通过 `UserFileDao` 校验所有权后签名,这是最稳妥的路径。现有 image 命令继续用 `category=banana` 不受影响。 + +> 技术原因:后端 `resolveMediaUrl` 只接受三种 URL —— public CDN、private bucket(需 UserFileDao 记录)、已签名 URL。虽然 banana public bucket 碰巧在白名单中,但 private upload 语义更明确且不依赖隐式行为。 ## 轮询策略 @@ -137,8 +146,8 @@ source/video/video.ts — 业务逻辑 | `--reference-video` 存在但缺 `--input-video-duration` | `--input-video-duration is required when using --reference-video` | | `--input-video-duration` 不在 2–15 | `Input video duration must be between 2 and 15 seconds` | | `--last-frame` 无 `--first-frame` | `--last-frame requires --first-frame` | -| 帧控制 + 参考混用 | `Cannot mix frame mode (--first-frame/--last-frame) with reference mode (--reference-image/--reference-video)` | -| `--reference-audio` 无其他视觉素材 | `--reference-audio cannot be used alone` | +| 帧控制 + 参考混用 | `Cannot mix frame mode (--first-frame/--last-frame) with reference mode (--reference-image/--reference-video/--reference-audio)` | +| `--reference-audio` 无 image/video 素材 | `--reference-audio requires --reference-image or --reference-video` | | `--reference-image` 超过 9 | `Too many reference images (max 9)` | | `--reference-video` 超过 3 | `Too many reference videos (max 3)` | | `--reference-audio` 超过 3 | `Too many reference audios (max 3)` | From 25f6e875522ade5bd122102e67dc5b48323b255b Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Mon, 18 May 2026 12:13:40 +0800 Subject: [PATCH 04/11] docs: add implementation plan for video generation CLI 7-step plan covering SDK upgrade, upload extension, polling, video module (create/get/list/estimate), CLI registration, and README. 
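The polling step this plan describes (fixed interval, attempt count derived from the timeout, terminal states `success` / `failed`) can be sketched roughly as below. This is only an illustration under stated assumptions: `maxAttemptsFor`, `pollUntilDone`, and the injected `fetchTask` / `sleep` callbacks are hypothetical names, not the CLI's or SDK's actual helpers.

```typescript
// Hypothetical sketch; none of these names come from the real CLI or SDK.
type TaskStatus = 'pending' | 'generating' | 'uploading' | 'success' | 'failed';
type Task = { status: TaskStatus };

// Number of polls that fit into the timeout at a fixed interval.
function maxAttemptsFor(timeoutS: number, intervalMs: number): number {
  return Math.ceil(timeoutS / (intervalMs / 1000));
}

// Polls until a terminal status is seen or the attempt budget is used up.
async function pollUntilDone<T extends Task>(
  fetchTask: () => Promise<T>,
  options: { timeoutS: number; intervalMs: number; sleep?: (ms: number) => Promise<void> },
): Promise<T> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const attempts = maxAttemptsFor(options.timeoutS, options.intervalMs);
  for (let i = 0; i < attempts; i++) {
    // Wait between polls, but not before the first one.
    if (i > 0) await sleep(options.intervalMs);
    const task = await fetchTask();
    if (task.status === 'success') return task;
    if (task.status === 'failed') throw new Error('Video creation failed');
  }
  throw new Error(`Timed out after ${options.timeoutS}s`);
}
```

Injecting `sleep` is a design choice made for this sketch so the loop can be exercised without real delays; the plan itself uses a shared `pollIntervalMs` constant.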
Part of marswaveai/listenhub-ralph#127
---
 docs/plans/listenhub-cli--127-plan.md | 302 ++++++++++++++++++++++++++
 1 file changed, 302 insertions(+)
 create mode 100644 docs/plans/listenhub-cli--127-plan.md

diff --git a/docs/plans/listenhub-cli--127-plan.md b/docs/plans/listenhub-cli--127-plan.md
new file mode 100644
index 0000000..401035b
--- /dev/null
+++ b/docs/plans/listenhub-cli--127-plan.md
@@ -0,0 +1,302 @@
+# Plan: CLI 支持 SeeDance2.0 视频生成
+
+> Issue: marswaveai/listenhub-ralph#127
+> Spec: docs/specs/listenhub-cli--127-design.md
+
+## 实现步骤
+
+### Step 1: 升级 SDK + 扩展 upload 工具
+
+**文件:`package.json`**
+- `@marswave/listenhub-sdk` 从 `^0.0.4` 改为 `^0.0.6`
+- 运行 `pnpm install` 更新 lockfile
+
+**文件:`source/_shared/upload.ts`**
+
+1. 新增 `video` accept type:
+   ```ts
+   type FileAcceptType = 'audio' | 'image' | 'video';
+   ```
+
+2. 新增视频相关常量:
+   ```ts
+   const videoExtensions = new Set(['.mp4', '.mov', '.webm']);
+   // maxSizeBytes
+   video: 100 * 1024 * 1024,
+   // categoryForType
+   video: 'episode',
+   // mimeTypes
+   ['.mp4', 'video/mp4'],
+   ['.mov', 'video/quicktime'],
+   ['.webm', 'video/webm'],
+   ```
+
+3. `allowedExtensions` 函数扩展 video 分支。
+
+4. `resolveFileOrUrl` 签名增加可选 `category` override:
+   ```ts
+   export async function resolveFileOrUrl(
+     client: ListenHubClient,
+     input: string,
+     options: { accept: FileAcceptType; category?: string },
+   ): Promise
+   ```
+   内部 `const category = options.category ?? categoryForType[options.accept];`
+
+---
+
+### Step 2: 新增视频轮询函数
+
+**文件:`source/_shared/polling.ts`**
+
+在文件末尾新增 `pollVideoTaskUntilDone`:
+
+```ts
+import type { VideoGenerationTaskDetail } from '@marswave/listenhub-sdk';
+
+export async function pollVideoTaskUntilDone(
+  client: ListenHubClient,
+  taskId: string,
+  options: { timeout?: number; json?: boolean },
+): Promise<VideoGenerationTaskDetail> {
+  const timeoutS = options.timeout ?? 1200;
+  const maxAttempts = Math.ceil(timeoutS / (pollIntervalMs / 1000));
+  const spinner = options.json
+    ? undefined
+    : ora({ text: `Generating video... (1/${maxAttempts})` }).start();
+
+  for (let i = 0; i < maxAttempts; i++) {
+    if (i > 0) await sleep(pollIntervalMs);
+    const task = await client.getVideoGenerationTask(taskId);
+    if (task.status === 'success') {
+      spinner?.succeed('Video created successfully');
+      return task;
+    }
+    if (task.status === 'failed') {
+      spinner?.fail('Video creation failed');
+      throw new Error('Video creation failed');
+    }
+    if (spinner) {
+      spinner.text = `Generating video... (${String(i + 2)}/${maxAttempts})`;
+    }
+  }
+  spinner?.fail('Timed out');
+  throw new CliTimeoutError(`Timed out after ${timeoutS}s`);
+}
+```
+
+需在顶部 import 区域添加 `VideoGenerationTaskDetail` 类型。
+
+---
+
+### Step 3: 新增 `source/video/video.ts` — 业务逻辑
+
+导出四个函数:`createVideo`、`getVideo`、`listVideos`、`estimateCredits`。
+
+**类型定义:**
+
+```ts
+export type VideoCreateOptions = {
+  prompt: string;
+  model?: string;
+  resolution?: string;
+  ratio?: string;
+  duration?: number;
+  firstFrame?: string;
+  lastFrame?: string;
+  referenceImage: string[];
+  referenceVideo: string[];
+  referenceAudio: string[];
+  inputVideoDuration?: number;
+  generateAudio: boolean; // Commander --no-generate-audio 会反转为 generateAudio: false
+  seed?: number;
+  wait: boolean;
+  timeout: number;
+  json: boolean;
+};
+```
+
+**`createVideo` 逻辑:**
+
+1. **校验阶段** — 调用 `validateCreateOptions(options)` 内部函数,按 spec 校验表逐条检查,不满足直接 `throw new Error(msg)`。
+
+2. **构建 content 数组:**
+   ```ts
+   const content: VideoContentItem[] = [];
+   // prompt → { type: 'text', text: options.prompt }
+   // firstFrame → resolveFileOrUrl(client, path, { accept: 'image', category: 'episode' })
+   //   → { type: 'image_url', image_url: { url }, role: 'first_frame' }
+   // lastFrame → 同上,role: 'last_frame'
+   // referenceImage[] → 同上,role: 'reference_image'
+   // referenceVideo[] → resolveFileOrUrl(client, path, { accept: 'video', category: 'episode' })
+   //   → { type: 'video_url', video_url: { url }, role: 'reference_video' }
+   // referenceAudio[] → resolveFileOrUrl(client, path, { accept: 'audio', category: 'episode' })
+   //   → { type: 'audio_url', audio_url: { url }, role: 'reference_audio' }
+   ```
+
+3. **构建请求参数:** 只传用户显式指定的字段。
+   ```ts
+   const params: CreateVideoGenerationParams = {
+     content,
+     ...(options.model && { model: options.model }),
+     ...(options.resolution && { resolution: options.resolution }),
+     ...(options.ratio && { ratio: options.ratio }),
+     ...(options.duration !== undefined && { duration: options.duration }),
+     ...(!options.generateAudio && { generateAudio: false }),
+     ...(options.seed !== undefined && { seed: options.seed }),
+     ...(options.inputVideoDuration !== undefined && { inputVideoDuration: options.inputVideoDuration }),
+   };
+   ```
+
+4. **调用 SDK + 轮询/即时返回。**
+
+5. **输出:** 成功时 `printDetail` 展示 taskId、videoUrl、duration、resolution、ratio、seed、creditCharged。
+
+**`getVideo`:** 调用 `client.getVideoGenerationTask(taskId)` → `printDetail` / `printJson`。
+
+**`listVideos`:** 调用 `client.listVideoGenerationTasks(params)` → `printTable` 显示 ID / Model / Status / Duration / Created。
+
+**`estimateCredits`:** 调用 `client.estimateVideoGenerationCredits(params)` → 输出 tokens 和 credits。
+
+---
+
+### Step 4: 新增 `source/video/_cli.ts` — Commander 注册
+
+```ts
+import { type Command, Option } from 'commander';
+import { getClient } from '../_shared/client.js';
+import { handleError } from '../_shared/output.js';
+import { createVideo, getVideo, listVideos, estimateCredits } from './video.js';
+
+function collect(value: string, previous: string[]): string[] {
+  return [...previous, value];
+}
+
+export function register(program: Command) {
+  const cmd = program.command('video').description('SeeDance video generation');
+
+  cmd.command('create')
+    .description('Create a video generation task')
+    .requiredOption('--prompt ', 'Video description')
+    .option('--model ', 'Model: doubao-seedance-2-pro, doubao-seedance-2-fast')
+    .option('--resolution ', 'Resolution: 480p, 720p, 1080p')
+    .option('--ratio ', 'Aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9')
+    .option('--duration ', 'Video duration in seconds (4-15)', Number)
+    .option('--first-frame ', 'First frame image')
+    .option('--last-frame ', 'Last frame image (requires --first-frame)')
+    .option('--reference-image ', 'Reference image (repeatable, max 9)', collect, [])
+    .option('--reference-video ', 'Reference video (repeatable, max 3)', collect, [])
+    .option('--reference-audio ', 'Reference audio (repeatable, max 3)', collect, [])
+    .option('--input-video-duration ', 'Reference video duration (2-15, required with --reference-video)', Number)
+    .option('--no-generate-audio', 'Disable audio generation')
+    .option('--seed ', 'Random seed (-1 to 4294967295)', Number)
+    .option('--no-wait', 'Return immediately without polling')
+    .option('--timeout ', 'Polling timeout', Number, 1200)
+    .option('-j, --json', 'Output JSON', false)
+    .action(async (options) => { ... });
+
+  cmd.command('get ')
+    .description('Get video task details')
+    .option('-j, --json', 'Output JSON', false)
+    .action(async (taskId, options) => { ... });
+
+  cmd.command('list')
+    .description('List video generation tasks')
+    .option('--page ', 'Page number', Number, 1)
+    .option('--page-size ', 'Items per page', Number, 20)
+    .option('--status ', 'Filter: pending, generating, uploading, success, failed')
+    .option('-j, --json', 'Output JSON', false)
+    .action(async (options) => { ... });
+
+  cmd.command('estimate')
+    .description('Estimate credit cost')
+    .requiredOption('--model ', 'Model name')
+    .requiredOption('--resolution ', 'Resolution')
+    .requiredOption('--duration ', 'Duration (4-15)', Number)
+    .option('--ratio ', 'Aspect ratio', '16:9')
+    .option('--has-video-input', 'Has reference video input', false)
+    .option('--input-video-duration ', 'Reference video duration', Number)
+    .option('-j, --json', 'Output JSON', false)
+    .action(async (options) => { ... });
+}
+```
+
+---
+
+### Step 5: 注册到主入口
+
+**文件:`source/cli.ts`**
+
+```ts
+import { register as registerVideo } from './video/_cli.js';
+// ...
+registerVideo(program); // 放在 registerCreation 之前
+```
+
+---
+
+### Step 6: 更新 README
+
+**文件:`README.md`**
+
+1. Commands 表新增 Video 部分:
+   ```
+   ### Video Generation
+
+   | Command | Description |
+   | -------------------------- | ----------------------------- |
+   | `listenhub video create` | Create a video generation task |
+   | `listenhub video list` | List video tasks |
+   | `listenhub video get ` | Get video task details |
+   | `listenhub video estimate` | Estimate credit cost |
+   ```
+
+2. 
Examples 新增: + ```bash + ### Video generation + + ```bash + # Text-to-video + listenhub video create --prompt "A cat playing piano in a jazz bar" + + # Image-to-video (first frame) + listenhub video create --prompt "Camera slowly zooms out" --first-frame ./scene.png + + # With reference video + listenhub video create --prompt "Same style dancing" \ + --reference-video ./clip.mp4 --input-video-duration 8 + + # Estimate credits + listenhub video estimate --model doubao-seedance-2-pro --resolution 1080p --duration 10 + ``` + +--- + +### Step 7: Lint 检查 + 修复 + +```bash +pnpm lint +# 若有问题则修复后重新运行 +``` + +--- + +## 文件清单 + +| 文件 | 操作 | 行数估算 | +|------|------|----------| +| `package.json` | 修改 | ~1 行 | +| `source/_shared/upload.ts` | 修改 | +15 行 | +| `source/_shared/polling.ts` | 修改 | +30 行 | +| `source/video/video.ts` | 新增 | ~180 行 | +| `source/video/_cli.ts` | 新增 | ~90 行 | +| `source/cli.ts` | 修改 | +2 行 | +| `README.md` | 修改 | +25 行 | + +总新增约 340 行代码。 + +## 风险点 + +1. **SDK 0.0.6 兼容性** — CLI 当前锁定 `^0.0.4`,升级后确认其他命令不受影响(SDK 是向后兼容的增量新增)。 +2. **视频文件上传体积** — 100MB 本地文件上传到 GCS 可能耗时较长,`resolveFileOrUrl` 当前无进度条,大文件体验需留意(不在本次范围内解决)。 +3. 
**Commander `--no-generate-audio` 语义** — Commander 会自动创建 `generateAudio` 布尔值,默认 `true`,传 `--no-generate-audio` 后变 `false`。需确认 Commander 版本行为。 From cf6af0707a2c0bc05f20d5fca3819031f942cb4c Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Mon, 18 May 2026 12:29:19 +0800 Subject: [PATCH 05/11] =?UTF-8?q?docs:=20fix=20spec+plan=20=E2=80=94=20tig?= =?UTF-8?q?hten=20format=20limits,=20add=20validations,=20fix=20README?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Video: mp4/mov only, 50MB max (no webm) - Audio for video cmd: mp3/wav only, 15MB max - Add --input-video-duration without --reference-video validation (both directions) - estimate: --input-video-duration and --has-video-input must pair - Step 7: add pnpm build + smoke check (video --help) - Fix README nested fence, add README.zh-CN.md to plan Part of marswaveai/listenhub-ralph#127 --- docs/plans/listenhub-cli--127-plan.md | 47 ++++++++++++++++++------- docs/specs/listenhub-cli--127-design.md | 12 +++++-- 2 files changed, 44 insertions(+), 15 deletions(-) diff --git a/docs/plans/listenhub-cli--127-plan.md b/docs/plans/listenhub-cli--127-plan.md index 401035b..9939f96 100644 --- a/docs/plans/listenhub-cli--127-plan.md +++ b/docs/plans/listenhub-cli--127-plan.md @@ -18,19 +18,26 @@ type FileAcceptType = 'audio' | 'image' | 'video'; ``` -2. 新增视频相关常量: +2. 新增视频相关常量(SeeDance 仅支持 mp4/mov,单文件 < 50MB): ```ts - const videoExtensions = new Set(['.mp4', '.mov', '.webm']); + const videoExtensions = new Set(['.mp4', '.mov']); // maxSizeBytes - video: 100 * 1024 * 1024, + video: 50 * 1024 * 1024, // categoryForType video: 'episode', // mimeTypes ['.mp4', 'video/mp4'], ['.mov', 'video/quicktime'], - ['.webm', 'video/webm'], ``` +3. 
video 命令中音频素材限制为 `mp3/wav`(SeeDance 支持范围),单文件 < 15MB。 + 在 `resolveFileOrUrl` 调用时,video 命令对 audio 类型传 `{ accept: 'audio', category: 'episode' }` —— + 但需新增一个 `videoAudioExtensions` 集合做额外校验(或在 video.ts 校验层先过滤后缀), + 避免用户传 `.flac`/`.ogg` 等 CLI 层面放行但 provider 拒绝的格式。 + + 实现方式:在 `video.ts` 的 `validateCreateOptions` 中检查 `--reference-audio` 文件后缀, + 不在 `['.mp3', '.wav']` 内的直接报错:`Reference audio must be .mp3 or .wav`。 + 3. `allowedExtensions` 函数扩展 video 分支。 4. `resolveFileOrUrl` 签名增加可选 `category` override: @@ -119,6 +126,10 @@ export type VideoCreateOptions = { **`createVideo` 逻辑:** 1. **校验阶段** — 调用 `validateCreateOptions(options)` 内部函数,按 spec 校验表逐条检查,不满足直接 `throw new Error(msg)`。 + 额外规则: + - 没有 `--reference-video` 时传了 `--input-video-duration` → 报错 `--input-video-duration requires --reference-video` + - `--reference-audio` 文件后缀不在 `.mp3`/`.wav` 内 → 报错 `Reference audio must be .mp3 or .wav` + - `--reference-video` 文件后缀不在 `.mp4`/`.mov` 内 → 报错 `Reference video must be .mp4 or .mov` 2. **构建 content 数组:** ```ts @@ -157,6 +168,7 @@ export type VideoCreateOptions = { **`listVideos`:** 调用 `client.listVideoGenerationTasks(params)` → `printTable` 显示 ID / Model / Status / Duration / Created。 **`estimateCredits`:** 调用 `client.estimateVideoGenerationCredits(params)` → 输出 tokens 和 credits。 +校验:`--input-video-duration` 和 `--has-video-input` 必须成对出现,缺一报错。 --- @@ -251,9 +263,7 @@ registerVideo(program); // 放在 registerCreation 之前 | `listenhub video estimate` | Estimate credit cost | ``` -2. Examples 新增: - ```bash - ### Video generation +2. 
Examples 新增 Video generation 小节: ```bash # Text-to-video @@ -270,15 +280,27 @@ registerVideo(program); // 放在 registerCreation 之前 listenhub video estimate --model doubao-seedance-2-pro --resolution 1080p --duration 10 ``` +**文件:`README.zh-CN.md`** + +同步更新中文 README,添加对应的 Video Generation 命令表和示例(与英文版对齐)。 + --- -### Step 7: Lint 检查 + 修复 +### Step 7: 构建验证 + Lint + Smoke check ```bash -pnpm lint -# 若有问题则修复后重新运行 +pnpm lint # xo 代码规范 +pnpm run build # TypeScript 编译,确认无类型错误 + +# Smoke check — 确认命令注册正确 +node dist/cli.js video --help +node dist/cli.js video create --help +node dist/cli.js video list --help +node dist/cli.js video estimate --help ``` +若有问题则修复后重新运行。 + --- ## 文件清单 @@ -288,15 +310,16 @@ pnpm lint | `package.json` | 修改 | ~1 行 | | `source/_shared/upload.ts` | 修改 | +15 行 | | `source/_shared/polling.ts` | 修改 | +30 行 | -| `source/video/video.ts` | 新增 | ~180 行 | +| `source/video/video.ts` | 新增 | ~200 行 | | `source/video/_cli.ts` | 新增 | ~90 行 | | `source/cli.ts` | 修改 | +2 行 | | `README.md` | 修改 | +25 行 | +| `README.zh-CN.md` | 修改 | +25 行 | 总新增约 340 行代码。 ## 风险点 1. **SDK 0.0.6 兼容性** — CLI 当前锁定 `^0.0.4`,升级后确认其他命令不受影响(SDK 是向后兼容的增量新增)。 -2. **视频文件上传体积** — 100MB 本地文件上传到 GCS 可能耗时较长,`resolveFileOrUrl` 当前无进度条,大文件体验需留意(不在本次范围内解决)。 +2. **视频文件上传体积** — 50MB 本地文件上传到 GCS 可能耗时较长,`resolveFileOrUrl` 当前无进度条,大文件体验需留意(不在本次范围内解决)。 3. 
**Commander `--no-generate-audio` 语义** — Commander 会自动创建 `generateAudio` 布尔值,默认 `true`,传 `--no-generate-audio` 后变 `false`。需确认 Commander 版本行为。 diff --git a/docs/specs/listenhub-cli--127-design.md b/docs/specs/listenhub-cli--127-design.md index 6929e12..605a079 100644 --- a/docs/specs/listenhub-cli--127-design.md +++ b/docs/specs/listenhub-cli--127-design.md @@ -119,9 +119,14 @@ resolveFileOrUrl(client, input, { accept: 'video', category: 'episode' }) ``` **新增 `video` 文件类型:** -- 允许后缀:`.mp4`、`.mov`、`.webm` -- 最大体积:100 MB -- MIME:`video/mp4`、`video/quicktime`、`video/webm` +- 允许后缀:`.mp4`、`.mov` +- 最大体积:50 MB +- MIME:`video/mp4`、`video/quicktime` + +**video 命令中音频素材限制:** +- 允许后缀:`.mp3`、`.wav`(SeeDance 支持范围,不含 `.flac`/`.ogg` 等) +- 最大体积:15 MB +- 在 `video.ts` 校验层额外检查后缀,不合格直接报错 **video 命令的 upload category:** 所有素材(image/video/audio)统一使用 `category=episode`(private upload)。后端 `resolveMediaUrl` 对 private bucket URL 通过 `UserFileDao` 校验所有权后签名,这是最稳妥的路径。现有 image 命令继续用 `category=banana` 不受影响。 @@ -144,6 +149,7 @@ resolveFileOrUrl(client, input, { accept: 'video', category: 'episode' }) | `--seed` 不在 -1 到 4294967295 | `Seed must be between -1 and 4294967295` | | `--resolution 1080p` + model 非 pro | `1080p resolution requires --model doubao-seedance-2-pro` | | `--reference-video` 存在但缺 `--input-video-duration` | `--input-video-duration is required when using --reference-video` | +| `--input-video-duration` 存在但无 `--reference-video` | `--input-video-duration requires --reference-video` | | `--input-video-duration` 不在 2–15 | `Input video duration must be between 2 and 15 seconds` | | `--last-frame` 无 `--first-frame` | `--last-frame requires --first-frame` | | 帧控制 + 参考混用 | `Cannot mix frame mode (--first-frame/--last-frame) with reference mode (--reference-image/--reference-video/--reference-audio)` | From d888635233ebffd8ba284158e58d63de74d4ace7 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Mon, 18 May 2026 12:31:56 +0800 Subject: [PATCH 06/11] 
=?UTF-8?q?docs:=20plan=20=E2=80=94=20bump=20CLI=20v?= =?UTF-8?q?ersion=20to=200.0.5,=20use=20vp=20check=20for=20validation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Part of marswaveai/listenhub-ralph#127 --- docs/plans/listenhub-cli--127-plan.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/docs/plans/listenhub-cli--127-plan.md b/docs/plans/listenhub-cli--127-plan.md index 9939f96..2e2ec85 100644 --- a/docs/plans/listenhub-cli--127-plan.md +++ b/docs/plans/listenhub-cli--127-plan.md @@ -9,6 +9,7 @@ **文件:`package.json`** - `@marswave/listenhub-sdk` 从 `^0.0.4` 改为 `^0.0.6` +- `"version"` 从 `"0.0.4"` 升为 `"0.0.5"`(新增功能,minor bump) - 运行 `pnpm install` 更新 lockfile **文件:`source/_shared/upload.ts`** @@ -286,20 +287,21 @@ registerVideo(program); // 放在 registerCreation 之前 --- -### Step 7: 构建验证 + Lint + Smoke check +### Step 7: `vp check` + Smoke check ```bash -pnpm lint # xo 代码规范 -pnpm run build # TypeScript 编译,确认无类型错误 +# vp check = fmt --check + lint + type check(三合一) +pnpm check # Smoke check — 确认命令注册正确 +pnpm build node dist/cli.js video --help node dist/cli.js video create --help node dist/cli.js video list --help node dist/cli.js video estimate --help ``` -若有问题则修复后重新运行。 +若有问题则修复后重新运行。`vp check` 必须全通过才能提交 PR。 --- From 70093220989a6bbe383a5eb2fb90c7956cc6df77 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Mon, 18 May 2026 13:01:32 +0800 Subject: [PATCH 07/11] feat: add video generation commands (SeeDance2.0) New `listenhub video` command group with create, get, list, and estimate subcommands. Upgrades SDK to 0.0.6 and extends upload utility to support video files (.mp4/.mov, max 50MB). 
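For reviewers, the video upload limits named above (.mp4/.mov, max 50MB) can be sketched as a standalone check. This is only an illustration under stated assumptions: `checkVideoUpload` is a hypothetical helper that does not exist in this patch (the real check lives inside `resolveFileOrUrl`), and both the case-insensitive extension match and the choice to accept a file of exactly 50 MB are assumptions rather than confirmed behavior.

```typescript
import {extname} from 'node:path';

// Limits taken from this patch: .mp4/.mov only, at most 50 MB per file.
const videoMimeTypes = new Map<string, string>([
	['.mp4', 'video/mp4'],
	['.mov', 'video/quicktime'],
]);
const maxVideoBytes = 50 * 1024 * 1024;

// Hypothetical helper (not part of the patch): returns the MIME type for an
// acceptable local video file, or throws an Error with a readable message.
function checkVideoUpload(fileName: string, sizeBytes: number): string {
	// Assumption: extensions are compared case-insensitively.
	const ext = extname(fileName).toLowerCase();
	const mime = videoMimeTypes.get(ext);
	if (mime === undefined) {
		throw new Error(`Unsupported video format "${ext}" (expected .mp4 or .mov)`);
	}

	// Assumption: a file of exactly 50 MB is still accepted.
	if (sizeBytes > maxVideoBytes) {
		throw new Error('Video file is too large (max 50 MB)');
	}

	return mime;
}
```

Keeping the check as a pure function of file name and size makes it possible to exercise the limits without touching the file system or the upload API.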
--- README.md | 31 +++- README.zh-CN.md | 33 +++- package.json | 4 +- pnpm-lock.yaml | 10 +- source/_shared/polling.ts | 37 +++++ source/_shared/upload.ts | 15 +- source/cli.ts | 2 + source/video/_cli.ts | 100 ++++++++++++ source/video/video.ts | 312 ++++++++++++++++++++++++++++++++++++++ 9 files changed, 530 insertions(+), 14 deletions(-) create mode 100644 source/video/_cli.ts create mode 100644 source/video/video.ts diff --git a/README.md b/README.md index 648f0a4..03c52b4 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # ListenHub CLI -Command-line interface for [ListenHub](https://listenhub.ai) — create podcasts, text-to-speech, explainer videos, slides, AI images, and music from your terminal. +Command-line interface for [ListenHub](https://listenhub.ai) — create podcasts, text-to-speech, explainer videos, slides, AI images, music, and videos from your terminal. [中文文档](README.zh-CN.md) @@ -76,6 +76,15 @@ listenhub tts create --text "Hello, world" --lang en | `listenhub image list` | List AI images | | `listenhub image get ` | Get image details | +### Video Generation + +| Command | Description | +| -------------------------- | ------------------------------ | +| `listenhub video create` | Create a video generation task | +| `listenhub video list` | List video tasks | +| `listenhub video get ` | Get video task details | +| `listenhub video estimate` | Estimate credit cost | + ### Other | Command | Description | @@ -109,6 +118,9 @@ listenhub music cover --audio ./song.mp3 # Local image for reference (jpg, png, webp, gif; max 10MB) listenhub image create --prompt "inspired by this" --reference ./photo.jpg +# Local video for reference (mp4, mov; max 50MB) +listenhub video create --prompt "same style" --reference-video ./clip.mp4 --input-video-duration 5 + # URLs are passed through directly listenhub music cover --audio https://example.com/song.mp3 ``` @@ -158,6 +170,23 @@ listenhub image create \ --aspect-ratio 16:9 --size 4K ``` +### Video generation + 
+```bash +# Text-to-video +listenhub video create --prompt "A cat playing piano in a jazz bar" + +# Image-to-video (first frame) +listenhub video create --prompt "Camera slowly zooms out" --first-frame ./scene.png + +# With reference video +listenhub video create --prompt "Same style dancing" \ + --reference-video ./clip.mp4 --input-video-duration 8 + +# Estimate credits +listenhub video estimate --model doubao-seedance-2-pro --resolution 1080p --duration 10 +``` + ### JSON output for scripting ```bash diff --git a/README.zh-CN.md b/README.zh-CN.md index fb08f75..eefa0cf 100644 --- a/README.zh-CN.md +++ b/README.zh-CN.md @@ -1,6 +1,6 @@ # ListenHub CLI -[ListenHub](https://listenhub.ai) 的命令行工具 — 在终端里创建播客、语音合成、讲解视频、幻灯片、AI 图片和音乐。 +[ListenHub](https://listenhub.ai) 的命令行工具 — 在终端里创建播客、语音合成、讲解视频、幻灯片、AI 图片、音乐和视频。 [English](README.md) @@ -76,6 +76,15 @@ listenhub tts create --text "你好世界" --lang zh | `listenhub image list` | 列出图片 | | `listenhub image get ` | 查看图片详情 | +### 视频生成 + +| 命令 | 说明 | +| -------------------------- | ---------------- | +| `listenhub video create` | 创建视频生成任务 | +| `listenhub video list` | 列出视频任务 | +| `listenhub video get ` | 查看视频任务详情 | +| `listenhub video estimate` | 预估积分消耗 | + ### 其他 | 命令 | 说明 | @@ -100,7 +109,7 @@ listenhub tts create --text "你好世界" --lang zh ## 本地文件上传 -`music cover` 和 `image create` 支持引用本地文件。CLI 自动检测本地路径,校验格式和大小,上传到云存储后传给 API。 +`music cover`、`image create` 和 `video create` 支持引用本地文件。CLI 自动检测本地路径,校验格式和大小,上传到云存储后传给 API。 ```bash # 本地音频文件用于翻唱(mp3, wav, flac, m4a, ogg, aac;最大 20MB) @@ -109,6 +118,9 @@ listenhub music cover --audio ./song.mp3 # 本地图片用于参考(jpg, png, webp, gif;最大 10MB) listenhub image create --prompt "以此为灵感" --reference ./photo.jpg +# 本地视频用于参考(mp4, mov;最大 50MB) +listenhub video create --prompt "同样风格" --reference-video ./clip.mp4 --input-video-duration 5 + # URL 直接透传 listenhub music cover --audio https://example.com/song.mp3 ``` @@ -158,6 +170,23 @@ listenhub image create \ --aspect-ratio 16:9 --size 4K ``` +### 视频生成 + +```bash 
+# 文字生成视频 +listenhub video create --prompt "一只猫在爵士酒吧弹钢琴" + +# 图生视频(首帧) +listenhub video create --prompt "镜头缓缓拉远" --first-frame ./scene.png + +# 带参考视频 +listenhub video create --prompt "相同风格的舞蹈" \ + --reference-video ./clip.mp4 --input-video-duration 8 + +# 预估积分 +listenhub video estimate --model doubao-seedance-2-pro --resolution 1080p --duration 10 +``` + ### 脚本中使用 JSON 输出 ```bash diff --git a/package.json b/package.json index c0f244e..6e8c5af 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "@marswave/listenhub-cli", - "version": "0.0.4", + "version": "0.0.5", "description": "Command-line interface for ListenHub", "license": "MIT", "repository": "marswaveai/listenhub-cli", @@ -25,7 +25,7 @@ "prepublishOnly": "pnpm run build" }, "dependencies": { - "@marswave/listenhub-sdk": "^0.0.4", + "@marswave/listenhub-sdk": "^0.0.6", "commander": "^14.0.3", "open": "^10.0.0", "ora": "^8.0.0" diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index f0468a9..e935df0 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -9,8 +9,8 @@ importers: .: dependencies: '@marswave/listenhub-sdk': - specifier: ^0.0.4 - version: 0.0.4 + specifier: ^0.0.6 + version: 0.0.6 commander: specifier: ^14.0.3 version: 14.0.3 @@ -57,8 +57,8 @@ packages: '@jridgewell/sourcemap-codec@1.5.5': resolution: {integrity: sha512-cYQ9310grqxueWbl+WuIUIaiUaDcj7WOq5fVhEljNVgRfOUhY9fy2zTvfoqWsnebh8Sl70VScFbICvJnLKB0Og==} - '@marswave/listenhub-sdk@0.0.4': - resolution: {integrity: sha512-24kmN+TS2xuIObGAN9LzYG4LNucbPsjlbOivRPCvh49w0dmuNO1UPQRVK3bNWyYyT/cdFfajENrtYAfIv+3Atw==} + '@marswave/listenhub-sdk@0.0.6': + resolution: {integrity: sha512-XHb/RqZWPFj4pPrPg9bADBayttGi7wNojGJO5Rm05RheRMUdX7Yrw1g7p3kDqfbtij9hvEFnOXRqC4YFUCMztQ==} engines: {node: '>=20'} '@napi-rs/wasm-runtime@1.1.4': @@ -1118,7 +1118,7 @@ snapshots: '@jridgewell/sourcemap-codec@1.5.5': {} - '@marswave/listenhub-sdk@0.0.4': + '@marswave/listenhub-sdk@0.0.6': dependencies: ky: 1.14.3 diff --git a/source/_shared/polling.ts 
b/source/_shared/polling.ts index 42cc3dc..9d65dc8 100644 --- a/source/_shared/polling.ts +++ b/source/_shared/polling.ts @@ -4,6 +4,7 @@ import type { ListenHubClient, LyricsTaskDetail, MusicTaskDetail, + VideoGenerationTaskDetail, } from '@marswave/listenhub-sdk'; import ora from 'ora'; import {CliTimeoutError} from './output.js'; @@ -127,6 +128,42 @@ export async function pollMusicTaskUntilDone( throw new CliTimeoutError(`Timed out after ${timeoutS}s`); } +export async function pollVideoTaskUntilDone( + client: ListenHubClient, + taskId: string, + options: {timeout?: number; json?: boolean}, +): Promise { + const timeoutS = options.timeout ?? 1200; + const maxAttempts = Math.ceil(timeoutS / (pollIntervalMs / 1000)); + const spinner = options.json + ? undefined + : ora({text: `Generating video... (1/${maxAttempts})`}).start(); + + for (let i = 0; i < maxAttempts; i++) { + if (i > 0) { + await sleep(pollIntervalMs); // eslint-disable-line no-await-in-loop + } + + const task = await client.getVideoGenerationTask(taskId); // eslint-disable-line no-await-in-loop + if (task.status === 'success') { + spinner?.succeed('Video created successfully'); + return task; + } + + if (task.status === 'failed') { + spinner?.fail('Video creation failed'); + throw new Error('Video creation failed'); + } + + if (spinner) { + spinner.text = `Generating video... 
(${String(i + 2)}/${maxAttempts})`; + } + } + + spinner?.fail('Timed out'); + throw new CliTimeoutError(`Timed out after ${timeoutS}s`); +} + const lyricsIntervalMs = 5000; export async function pollLyricsTaskUntilDone( diff --git a/source/_shared/upload.ts b/source/_shared/upload.ts index ac88fbe..61ae777 100644 --- a/source/_shared/upload.ts +++ b/source/_shared/upload.ts @@ -2,19 +2,22 @@ import {access, readFile, stat} from 'node:fs/promises'; import path from 'node:path'; import type {ListenHubClient} from '@marswave/listenhub-sdk'; -type FileAcceptType = 'audio' | 'image'; +type FileAcceptType = 'audio' | 'image' | 'video'; const audioExtensions = new Set(['.mp3', '.wav', '.flac', '.m4a', '.ogg', '.aac']); const imageExtensions = new Set(['.jpg', '.jpeg', '.png', '.webp', '.gif']); +const videoExtensions = new Set(['.mp4', '.mov']); const maxSizeBytes: Record = { audio: 20 * 1024 * 1024, image: 10 * 1024 * 1024, + video: 50 * 1024 * 1024, }; const categoryForType: Record = { audio: 'episode', image: 'banana', + video: 'episode', }; const mimeTypes = new Map([ @@ -29,16 +32,20 @@ const mimeTypes = new Map([ ['.png', 'image/png'], ['.webp', 'image/webp'], ['.gif', 'image/gif'], + ['.mp4', 'video/mp4'], + ['.mov', 'video/quicktime'], ]); function allowedExtensions(accept: FileAcceptType): Set { - return accept === 'audio' ? audioExtensions : imageExtensions; + if (accept === 'audio') return audioExtensions; + if (accept === 'video') return videoExtensions; + return imageExtensions; } export async function resolveFileOrUrl( client: ListenHubClient, input: string, - options: {accept: FileAcceptType}, + options: {accept: FileAcceptType; category?: string}, ): Promise { const trimmed = input.trim(); @@ -77,7 +84,7 @@ export async function resolveFileOrUrl( // Get presigned upload URL const contentType = mimeTypes.get(ext)!; const fileKey = path.basename(filePath); - const category = categoryForType[options.accept]; + const category = options.category ?? 
categoryForType[options.accept]; const {presignedUrl, fileUrl} = await client.createFileUpload({ fileKey, contentType, diff --git a/source/cli.ts b/source/cli.ts index 7390544..d1f800e 100644 --- a/source/cli.ts +++ b/source/cli.ts @@ -10,6 +10,7 @@ import {register as registerPodcast} from './podcast/_cli.js'; import {register as registerSlides} from './slides/_cli.js'; import {register as registerSpeakers} from './speakers/_cli.js'; import {register as registerTts} from './tts/_cli.js'; +import {register as registerVideo} from './video/_cli.js'; const program = new Command(); program.name('listenhub').description('ListenHub CLI').version('0.1.0'); @@ -23,6 +24,7 @@ registerImage(program); registerMusic(program); registerLyrics(program); registerSpeakers(program); +registerVideo(program); registerCreation(program); program.parse(); diff --git a/source/video/_cli.ts b/source/video/_cli.ts new file mode 100644 index 0000000..7e4d8fb --- /dev/null +++ b/source/video/_cli.ts @@ -0,0 +1,100 @@ +import type {Command} from 'commander'; +import {getClient} from '../_shared/client.js'; +import {handleError} from '../_shared/output.js'; +import { + type VideoCreateOptions, + type VideoEstimateOptions, + type VideoListOptions, + createVideo, + estimateCredits, + getVideo, + listVideos, +} from './video.js'; + +function collect(value: string, previous: string[]): string[] { + return [...previous, value]; +} + +export function register(program: Command) { + const cmd = program.command('video').description('SeeDance video generation'); + + cmd + .command('create') + .description('Create a video generation task') + .requiredOption('--prompt ', 'Video description') + .option('--model ', 'Model: doubao-seedance-2-pro, doubao-seedance-2-fast') + .option('--resolution ', 'Resolution: 480p, 720p, 1080p') + .option('--ratio ', 'Aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9') + .option('--duration ', 'Video duration in seconds (4-15)', Number) + .option('--first-frame ', 'First frame 
image') + .option('--last-frame ', 'Last frame image (requires --first-frame)') + .option('--reference-image ', 'Reference image (repeatable, max 9)', collect, []) + .option('--reference-video ', 'Reference video (repeatable, max 3)', collect, []) + .option('--reference-audio ', 'Reference audio (repeatable, max 3)', collect, []) + .option( + '--input-video-duration ', + 'Reference video duration (2-15, required with --reference-video)', + Number, + ) + .option('--no-generate-audio', 'Disable audio generation') + .option('--seed ', 'Random seed (-1 to 4294967295)', Number) + .option('--no-wait', 'Return immediately without polling') + .option('--timeout ', 'Polling timeout', Number, 1200) + .option('-j, --json', 'Output JSON', false) + .action(async (options: VideoCreateOptions) => { + try { + const client = await getClient(); + await createVideo(client, options); + } catch (error) { + handleError(error, options.json); + } + }); + + cmd + .command('get ') + .description('Get video task details') + .option('-j, --json', 'Output JSON', false) + .action(async (taskId: string, options: {json: boolean}) => { + try { + const client = await getClient(); + await getVideo(client, taskId, options.json); + } catch (error) { + handleError(error, options.json); + } + }); + + cmd + .command('list') + .description('List video generation tasks') + .option('--page ', 'Page number', Number, 1) + .option('--page-size ', 'Items per page', Number, 20) + .option('--status ', 'Filter: pending, generating, uploading, success, failed') + .option('-j, --json', 'Output JSON', false) + .action(async (options: VideoListOptions) => { + try { + const client = await getClient(); + await listVideos(client, options); + } catch (error) { + handleError(error, options.json); + } + }); + + cmd + .command('estimate') + .description('Estimate credit cost') + .requiredOption('--model ', 'Model name') + .requiredOption('--resolution ', 'Resolution') + .requiredOption('--duration ', 'Duration (4-15)', 
Number) + .option('--ratio ', 'Aspect ratio', '16:9') + .option('--has-video-input', 'Has reference video input', false) + .option('--input-video-duration ', 'Reference video duration', Number) + .option('-j, --json', 'Output JSON', false) + .action(async (options: VideoEstimateOptions) => { + try { + const client = await getClient(); + await estimateCredits(client, options); + } catch (error) { + handleError(error, options.json); + } + }); +} diff --git a/source/video/video.ts b/source/video/video.ts new file mode 100644 index 0000000..263bc0b --- /dev/null +++ b/source/video/video.ts @@ -0,0 +1,312 @@ +import path from 'node:path'; +import type { + CreateVideoGenerationParams, + EstimateVideoGenerationCreditsParams, + ListenHubClient, + VideoContentItem, + VideoGenerationModel, + VideoGenerationRatio, + VideoGenerationResolution, + VideoGenerationTaskStatus, +} from '@marswave/listenhub-sdk'; +import {printDetail, printJson, printTable} from '../_shared/output.js'; +import {pollVideoTaskUntilDone} from '../_shared/polling.js'; +import {resolveFileOrUrl} from '../_shared/upload.js'; + +export type VideoCreateOptions = { + prompt: string; + model?: string; + resolution?: string; + ratio?: string; + duration?: number; + firstFrame?: string; + lastFrame?: string; + referenceImage: string[]; + referenceVideo: string[]; + referenceAudio: string[]; + inputVideoDuration?: number; + generateAudio: boolean; + seed?: number; + wait: boolean; + timeout: number; + json: boolean; +}; + +export type VideoListOptions = { + page: number; + pageSize: number; + status?: string; + json: boolean; +}; + +export type VideoEstimateOptions = { + model: string; + resolution: string; + duration: number; + ratio: string; + hasVideoInput: boolean; + inputVideoDuration?: number; + json: boolean; +}; + +const allowedVideoAudioExtensions = new Set(['.mp3', '.wav']); +const allowedVideoExtensions = new Set(['.mp4', '.mov']); + +function validateCreateOptions(options: VideoCreateOptions): void { 
+ if (options.duration !== undefined && (options.duration < 4 || options.duration > 15)) { + throw new Error('Duration must be between 4 and 15 seconds'); + } + + if (options.seed !== undefined && (options.seed < -1 || options.seed > 4_294_967_295)) { + throw new Error('Seed must be between -1 and 4294967295'); + } + + if ( + options.resolution === '1080p' && + options.model && + options.model !== 'doubao-seedance-2-pro' + ) { + throw new Error('1080p resolution requires --model doubao-seedance-2-pro'); + } + + if (options.lastFrame && !options.firstFrame) { + throw new Error('--last-frame requires --first-frame'); + } + + const hasFrameMode = Boolean(options.firstFrame || options.lastFrame); + const hasReferenceMode = + options.referenceImage.length > 0 || + options.referenceVideo.length > 0 || + options.referenceAudio.length > 0; + + if (hasFrameMode && hasReferenceMode) { + throw new Error( + 'Cannot mix frame mode (--first-frame/--last-frame) with reference mode (--reference-image/--reference-video/--reference-audio)', + ); + } + + if (options.referenceVideo.length > 0 && options.inputVideoDuration === undefined) { + throw new Error('--input-video-duration is required when using --reference-video'); + } + + if (options.inputVideoDuration !== undefined && options.referenceVideo.length === 0) { + throw new Error('--input-video-duration requires --reference-video'); + } + + if ( + options.inputVideoDuration !== undefined && + (options.inputVideoDuration < 2 || options.inputVideoDuration > 15) + ) { + throw new Error('Input video duration must be between 2 and 15 seconds'); + } + + if ( + options.referenceAudio.length > 0 && + options.referenceImage.length === 0 && + options.referenceVideo.length === 0 + ) { + throw new Error('--reference-audio requires --reference-image or --reference-video'); + } + + if (options.referenceImage.length > 9) { + throw new Error('Too many reference images (max 9)'); + } + + if (options.referenceVideo.length > 3) { + throw new 
Error('Too many reference videos (max 3)'); + } + + if (options.referenceAudio.length > 3) { + throw new Error('Too many reference audios (max 3)'); + } + + for (const file of options.referenceAudio) { + if (!file.startsWith('http://') && !file.startsWith('https://')) { + const ext = path.extname(file).toLowerCase(); + if (!allowedVideoAudioExtensions.has(ext)) { + throw new Error('Reference audio must be .mp3 or .wav'); + } + } + } + + for (const file of options.referenceVideo) { + if (!file.startsWith('http://') && !file.startsWith('https://')) { + const ext = path.extname(file).toLowerCase(); + if (!allowedVideoExtensions.has(ext)) { + throw new Error('Reference video must be .mp4 or .mov'); + } + } + } +} + +export async function createVideo( + client: ListenHubClient, + options: VideoCreateOptions, +): Promise { + validateCreateOptions(options); + + const content: VideoContentItem[] = [{type: 'text', text: options.prompt}]; + + if (options.firstFrame) { + const url = await resolveFileOrUrl(client, options.firstFrame, { + accept: 'image', + category: 'episode', + }); + content.push({type: 'image_url', image_url: {url}, role: 'first_frame'}); + } + + if (options.lastFrame) { + const url = await resolveFileOrUrl(client, options.lastFrame, { + accept: 'image', + category: 'episode', + }); + content.push({type: 'image_url', image_url: {url}, role: 'last_frame'}); + } + + for (const ref of options.referenceImage) { + const url = await resolveFileOrUrl(client, ref, {accept: 'image', category: 'episode'}); // eslint-disable-line no-await-in-loop + content.push({type: 'image_url', image_url: {url}, role: 'reference_image'}); + } + + for (const ref of options.referenceVideo) { + const url = await resolveFileOrUrl(client, ref, {accept: 'video', category: 'episode'}); // eslint-disable-line no-await-in-loop + content.push({type: 'video_url', video_url: {url}, role: 'reference_video'}); + } + + for (const ref of options.referenceAudio) { + const url = await 
resolveFileOrUrl(client, ref, {accept: 'audio', category: 'episode'}); // eslint-disable-line no-await-in-loop + content.push({type: 'audio_url', audio_url: {url}, role: 'reference_audio'}); + } + + const params: CreateVideoGenerationParams = { + content, + ...(options.model && {model: options.model as VideoGenerationModel}), + ...(options.resolution && {resolution: options.resolution as VideoGenerationResolution}), + ...(options.ratio && {ratio: options.ratio as VideoGenerationRatio}), + ...(options.duration !== undefined && {duration: options.duration}), + ...(!options.generateAudio && {generateAudio: false}), + ...(options.seed !== undefined && {seed: options.seed}), + ...(options.inputVideoDuration !== undefined && { + inputVideoDuration: options.inputVideoDuration, + }), + }; + + const {taskId} = await client.createVideoGeneration(params); + + if (!options.wait) { + if (options.json) { + printJson({taskId}); + } else { + console.log(`✓ Video task submitted: ${taskId}`); + } + + return; + } + + const task = await pollVideoTaskUntilDone(client, taskId, { + timeout: options.timeout, + json: options.json, + }); + + if (options.json) { + printJson(task); + } else { + printDetail('Video created', [ + ['ID:', task.id], + ['Video:', task.videoUrl], + ['Duration:', task.duration ? `${String(task.duration)}s` : undefined], + ['Resolution:', task.resolution], + ['Ratio:', task.ratio], + ['Seed:', task.seed], + ['Credits:', task.creditCharged], + ]); + } +} + +export async function getVideo( + client: ListenHubClient, + taskId: string, + json: boolean, +): Promise { + const task = await client.getVideoGenerationTask(taskId); + + if (json) { + printJson(task); + return; + } + + printDetail('Video task details', [ + ['ID:', task.id], + ['Status:', task.status], + ['Model:', task.model], + ['Video:', task.videoUrl], + ['Duration:', task.duration ? 
`${String(task.duration)}s` : undefined], + ['Resolution:', task.resolution], + ['Ratio:', task.ratio], + ['Seed:', task.seed], + ['Credits:', task.creditCharged], + ['Created:', new Date(task.createdAt).toISOString()], + ]); +} + +export async function listVideos( + client: ListenHubClient, + options: VideoListOptions, +): Promise { + const {items} = await client.listVideoGenerationTasks({ + page: options.page, + pageSize: options.pageSize, + ...(options.status && {status: options.status as VideoGenerationTaskStatus}), + }); + + if (options.json) { + printJson(items); + return; + } + + const headers = ['ID', 'Model', 'Status', 'Duration', 'Created']; + const rows = items.map((item) => [ + item.id, + item.model, + item.status, + item.params.duration ? `${String(item.params.duration)}s` : '-', + new Date(item.createdAt).toISOString().slice(0, 10), + ]); + printTable(headers, rows); +} + +export async function estimateCredits( + client: ListenHubClient, + options: VideoEstimateOptions, +): Promise { + if (options.hasVideoInput && options.inputVideoDuration === undefined) { + throw new Error('--input-video-duration is required when using --has-video-input'); + } + + if (!options.hasVideoInput && options.inputVideoDuration !== undefined) { + throw new Error('--input-video-duration requires --has-video-input'); + } + + const params: EstimateVideoGenerationCreditsParams = { + model: options.model as VideoGenerationModel, + resolution: options.resolution as VideoGenerationResolution, + duration: options.duration, + ...(options.ratio && {ratio: options.ratio as VideoGenerationRatio}), + ...(options.hasVideoInput && {hasVideoInput: true}), + ...(options.inputVideoDuration !== undefined && { + inputVideoDuration: options.inputVideoDuration, + }), + }; + + const result = await client.estimateVideoGenerationCredits(params); + + if (options.json) { + printJson(result); + return; + } + + printDetail('Credit estimate', [ + ['Tokens:', result.tokens], + ['Credits:', 
result.credits], + ]); +} From 0a703e74cf7d546dd3b072f868cb3dd7a5a85ad7 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Mon, 18 May 2026 13:01:39 +0800 Subject: [PATCH 08/11] style: format spec and plan docs --- docs/plans/listenhub-cli--127-plan.md | 148 +++++++++++++----------- docs/specs/listenhub-cli--127-design.md | 124 ++++++++++---------- 2 files changed, 142 insertions(+), 130 deletions(-) diff --git a/docs/plans/listenhub-cli--127-plan.md b/docs/plans/listenhub-cli--127-plan.md index 2e2ec85..457df28 100644 --- a/docs/plans/listenhub-cli--127-plan.md +++ b/docs/plans/listenhub-cli--127-plan.md @@ -8,6 +8,7 @@ ### Step 1: 升级 SDK + 扩展 upload 工具 **文件:`package.json`** + - `@marswave/listenhub-sdk` 从 `^0.0.4` 改为 `^0.0.6` - `"version"` 从 `"0.0.4"` 升为 `"0.0.5"`(新增功能,minor bump) - 运行 `pnpm install` 更新 lockfile @@ -15,11 +16,13 @@ **文件:`source/_shared/upload.ts`** 1. 新增 `video` accept type: + ```ts type FileAcceptType = 'audio' | 'image' | 'video'; ``` 2. 新增视频相关常量(SeeDance 仅支持 mp4/mov,单文件 < 50MB): + ```ts const videoExtensions = new Set(['.mp4', '.mov']); // maxSizeBytes @@ -35,19 +38,19 @@ 在 `resolveFileOrUrl` 调用时,video 命令对 audio 类型传 `{ accept: 'audio', category: 'episode' }` —— 但需新增一个 `videoAudioExtensions` 集合做额外校验(或在 video.ts 校验层先过滤后缀), 避免用户传 `.flac`/`.ogg` 等 CLI 层面放行但 provider 拒绝的格式。 - + 实现方式:在 `video.ts` 的 `validateCreateOptions` 中检查 `--reference-audio` 文件后缀, 不在 `['.mp3', '.wav']` 内的直接报错:`Reference audio must be .mp3 or .wav`。 -3. `allowedExtensions` 函数扩展 video 分支。 +4. `allowedExtensions` 函数扩展 video 分支。 -4. `resolveFileOrUrl` 签名增加可选 `category` override: +5. `resolveFileOrUrl` 签名增加可选 `category` override: ```ts export async function resolveFileOrUrl( - client: ListenHubClient, - input: string, - options: { accept: FileAcceptType; category?: string }, - ): Promise + client: ListenHubClient, + input: string, + options: {accept: FileAcceptType; category?: string}, + ): Promise; ``` 内部 `const category = options.category ?? 
categoryForType[options.accept];` @@ -60,36 +63,36 @@ 在文件末尾新增 `pollVideoTaskUntilDone`: ```ts -import type { VideoGenerationTaskDetail } from '@marswave/listenhub-sdk'; +import type {VideoGenerationTaskDetail} from '@marswave/listenhub-sdk'; export async function pollVideoTaskUntilDone( - client: ListenHubClient, - taskId: string, - options: { timeout?: number; json?: boolean }, + client: ListenHubClient, + taskId: string, + options: {timeout?: number; json?: boolean}, ): Promise { - const timeoutS = options.timeout ?? 1200; - const maxAttempts = Math.ceil(timeoutS / (pollIntervalMs / 1000)); - const spinner = options.json - ? undefined - : ora({ text: `Generating video... (1/${maxAttempts})` }).start(); - - for (let i = 0; i < maxAttempts; i++) { - if (i > 0) await sleep(pollIntervalMs); - const task = await client.getVideoGenerationTask(taskId); - if (task.status === 'success') { - spinner?.succeed('Video created successfully'); - return task; - } - if (task.status === 'failed') { - spinner?.fail('Video creation failed'); - throw new Error('Video creation failed'); - } - if (spinner) { - spinner.text = `Generating video... (${String(i + 2)}/${maxAttempts})`; - } - } - spinner?.fail('Timed out'); - throw new CliTimeoutError(`Timed out after ${timeoutS}s`); + const timeoutS = options.timeout ?? 1200; + const maxAttempts = Math.ceil(timeoutS / (pollIntervalMs / 1000)); + const spinner = options.json + ? undefined + : ora({text: `Generating video... (1/${maxAttempts})`}).start(); + + for (let i = 0; i < maxAttempts; i++) { + if (i > 0) await sleep(pollIntervalMs); + const task = await client.getVideoGenerationTask(taskId); + if (task.status === 'success') { + spinner?.succeed('Video created successfully'); + return task; + } + if (task.status === 'failed') { + spinner?.fail('Video creation failed'); + throw new Error('Video creation failed'); + } + if (spinner) { + spinner.text = `Generating video... 
(${String(i + 2)}/${maxAttempts})`; + } + } + spinner?.fail('Timed out'); + throw new CliTimeoutError(`Timed out after ${timeoutS}s`); } ``` @@ -105,22 +108,22 @@ export async function pollVideoTaskUntilDone( ```ts export type VideoCreateOptions = { - prompt: string; - model?: string; - resolution?: string; - ratio?: string; - duration?: number; - firstFrame?: string; - lastFrame?: string; - referenceImage: string[]; - referenceVideo: string[]; - referenceAudio: string[]; - inputVideoDuration?: number; - generateAudio: boolean; // Commander --no-generate-audio 会反转为 generateAudio: false - seed?: number; - wait: boolean; - timeout: number; - json: boolean; + prompt: string; + model?: string; + resolution?: string; + ratio?: string; + duration?: number; + firstFrame?: string; + lastFrame?: string; + referenceImage: string[]; + referenceVideo: string[]; + referenceAudio: string[]; + inputVideoDuration?: number; + generateAudio: boolean; // Commander --no-generate-audio 会反转为 generateAudio: false + seed?: number; + wait: boolean; + timeout: number; + json: boolean; }; ``` @@ -133,6 +136,7 @@ export type VideoCreateOptions = { - `--reference-video` 文件后缀不在 `.mp4`/`.mov` 内 → 报错 `Reference video must be .mp4 or .mov` 2. **构建 content 数组:** + ```ts const content: VideoContentItem[] = []; // prompt → { type: 'text', text: options.prompt } @@ -147,16 +151,19 @@ export type VideoCreateOptions = { ``` 3. 
**构建请求参数:** 只传用户显式指定的字段。 + ```ts const params: CreateVideoGenerationParams = { - content, - ...(options.model && { model: options.model }), - ...(options.resolution && { resolution: options.resolution }), - ...(options.ratio && { ratio: options.ratio }), - ...(options.duration !== undefined && { duration: options.duration }), - ...(!options.generateAudio && { generateAudio: false }), - ...(options.seed !== undefined && { seed: options.seed }), - ...(options.inputVideoDuration !== undefined && { inputVideoDuration: options.inputVideoDuration }), + content, + ...(options.model && {model: options.model}), + ...(options.resolution && {resolution: options.resolution}), + ...(options.ratio && {ratio: options.ratio}), + ...(options.duration !== undefined && {duration: options.duration}), + ...(!options.generateAudio && {generateAudio: false}), + ...(options.seed !== undefined && {seed: options.seed}), + ...(options.inputVideoDuration !== undefined && { + inputVideoDuration: options.inputVideoDuration, + }), }; ``` @@ -241,9 +248,9 @@ export function register(program: Command) { **文件:`source/cli.ts`** ```ts -import { register as registerVideo } from './video/_cli.js'; +import {register as registerVideo} from './video/_cli.js'; // ... -registerVideo(program); // 放在 registerCreation 之前 +registerVideo(program); // 放在 registerCreation 之前 ``` --- @@ -253,6 +260,7 @@ registerVideo(program); // 放在 registerCreation 之前 **文件:`README.md`** 1. 
Commands 表新增 Video 部分: + ``` ### Video Generation @@ -307,16 +315,16 @@ node dist/cli.js video estimate --help ## 文件清单 -| 文件 | 操作 | 行数估算 | -|------|------|----------| -| `package.json` | 修改 | ~1 行 | -| `source/_shared/upload.ts` | 修改 | +15 行 | -| `source/_shared/polling.ts` | 修改 | +30 行 | -| `source/video/video.ts` | 新增 | ~200 行 | -| `source/video/_cli.ts` | 新增 | ~90 行 | -| `source/cli.ts` | 修改 | +2 行 | -| `README.md` | 修改 | +25 行 | -| `README.zh-CN.md` | 修改 | +25 行 | +| 文件 | 操作 | 行数估算 | +| --------------------------- | ---- | -------- | +| `package.json` | 修改 | ~1 行 | +| `source/_shared/upload.ts` | 修改 | +15 行 | +| `source/_shared/polling.ts` | 修改 | +30 行 | +| `source/video/video.ts` | 新增 | ~200 行 | +| `source/video/_cli.ts` | 新增 | ~90 行 | +| `source/cli.ts` | 修改 | +2 行 | +| `README.md` | 修改 | +25 行 | +| `README.zh-CN.md` | 修改 | +25 行 | 总新增约 340 行代码。 diff --git a/docs/specs/listenhub-cli--127-design.md b/docs/specs/listenhub-cli--127-design.md index 605a079..ba3e13e 100644 --- a/docs/specs/listenhub-cli--127-design.md +++ b/docs/specs/listenhub-cli--127-design.md @@ -25,24 +25,24 @@ source/video/video.ts — 业务逻辑 创建视频生成任务。 -| 参数 | 类型 | 必填 | 默认值 | 说明 | -|------|------|------|--------|------| -| `--prompt ` | string | 是 | — | 视频描述文本 | -| `--model ` | string | 否 | 不传(服务端默认 `doubao-seedance-2-fast`) | 模型:`doubao-seedance-2-pro` / `doubao-seedance-2-fast` | -| `--resolution ` | string | 否 | 不传(服务端默认 `720p`) | 分辨率:`480p` / `720p` / `1080p`(注意:`1080p` 仅 pro 模型支持) | -| `--ratio ` | string | 否 | 不传(服务端默认 `16:9`) | 画面比例:`16:9` / `4:3` / `1:1` / `3:4` / `9:16` / `21:9` | -| `--duration ` | number | 否 | — | 视频时长,范围 4–15 秒 | -| `--first-frame ` | string | 否 | — | 首帧图片,本地文件或平台资产 URL | -| `--last-frame ` | string | 否 | — | 末帧图片(必须同时指定 `--first-frame`) | -| `--reference-image ` | string | 否 | — | 参考图(可重复,最多 9 张),本地文件或平台资产 URL | -| `--reference-video ` | string | 否 | — | 参考视频(可重复,最多 3 个),本地文件或平台资产 URL | -| `--reference-audio ` | string | 否 | — | 参考音频(可重复,最多 3 个),本地文件或平台资产 URL | -| 
`--input-video-duration ` | number | 否 | — | 参考视频时长,范围 2–15 秒;使用 `--reference-video` 时**必填** | -| `--no-generate-audio` | boolean | — | — | 禁用音轨生成(服务端默认生成音轨) | -| `--seed ` | number | 否 | — | 随机种子,范围 -1 到 4294967295 | -| `--no-wait` | boolean | — | — | 提交后立即返回,不轮询 | -| `--timeout ` | number | 否 | `1200` | 轮询超时 | -| `-j, --json` | boolean | — | — | JSON 输出 | +| 参数 | 类型 | 必填 | 默认值 | 说明 | +| ---------------------------------- | ------- | ---- | ------------------------------------------- | ------------------------------------------------------------------ | +| `--prompt ` | string | 是 | — | 视频描述文本 | +| `--model ` | string | 否 | 不传(服务端默认 `doubao-seedance-2-fast`) | 模型:`doubao-seedance-2-pro` / `doubao-seedance-2-fast` | +| `--resolution ` | string | 否 | 不传(服务端默认 `720p`) | 分辨率:`480p` / `720p` / `1080p`(注意:`1080p` 仅 pro 模型支持) | +| `--ratio ` | string | 否 | 不传(服务端默认 `16:9`) | 画面比例:`16:9` / `4:3` / `1:1` / `3:4` / `9:16` / `21:9` | +| `--duration ` | number | 否 | — | 视频时长,范围 4–15 秒 | +| `--first-frame ` | string | 否 | — | 首帧图片,本地文件或平台资产 URL | +| `--last-frame ` | string | 否 | — | 末帧图片(必须同时指定 `--first-frame`) | +| `--reference-image ` | string | 否 | — | 参考图(可重复,最多 9 张),本地文件或平台资产 URL | +| `--reference-video ` | string | 否 | — | 参考视频(可重复,最多 3 个),本地文件或平台资产 URL | +| `--reference-audio ` | string | 否 | — | 参考音频(可重复,最多 3 个),本地文件或平台资产 URL | +| `--input-video-duration ` | number | 否 | — | 参考视频时长,范围 2–15 秒;使用 `--reference-video` 时**必填** | +| `--no-generate-audio` | boolean | — | — | 禁用音轨生成(服务端默认生成音轨) | +| `--seed ` | number | 否 | — | 随机种子,范围 -1 到 4294967295 | +| `--no-wait` | boolean | — | — | 提交后立即返回,不轮询 | +| `--timeout ` | number | 否 | `1200` | 轮询超时 | +| `-j, --json` | boolean | — | — | JSON 输出 | **输入模式互斥规则(CLI 端校验,不满足直接报错退出):** @@ -54,6 +54,7 @@ source/video/video.ts — 业务逻辑 **URL 约束:** 所有 `` 参数仅接受本地文件路径或 ListenHub 平台资产 URL(GCS bucket / CDN)。外部 URL(如 `https://example.com/v.mp4`)会被后端拒绝。 **行为:** + 1. 校验输入模式互斥规则和参数范围。 2. 解析 `--prompt` 为 `VideoContentText`。 3. 
依据素材参数构建 `content[]`,本地文件通过 `resolveFileOrUrl` 上传后取平台 URL。 @@ -68,62 +69,64 @@ source/video/video.ts — 业务逻辑 获取单个任务详情。 -| 参数 | 说明 | -|------|------| -| `taskId` | 位置参数 | +| 参数 | 说明 | +| ------------ | --------- | +| `taskId` | 位置参数 | | `-j, --json` | JSON 输出 | ### `listenhub video list` 列出视频生成任务。 -| 参数 | 默认 | 说明 | -|------|------|------| -| `--page ` | 1 | 页码 | -| `--page-size ` | 20 | 每页条数 | -| `--status ` | — | 可选筛选:`pending` / `generating` / `uploading` / `success` / `failed` | -| `-j, --json` | — | JSON 输出 | +| 参数 | 默认 | 说明 | +| ------------------- | ---- | ----------------------------------------------------------------------- | +| `--page ` | 1 | 页码 | +| `--page-size ` | 20 | 每页条数 | +| `--status ` | — | 可选筛选:`pending` / `generating` / `uploading` / `success` / `failed` | +| `-j, --json` | — | JSON 输出 | ### `listenhub video estimate` 预估积分消耗。 -| 参数 | 必填 | 默认 | 说明 | -|------|------|------|------| -| `--model ` | 是 | — | 模型 | -| `--resolution ` | 是 | — | 分辨率 | -| `--duration ` | 是 | — | 时长(4–15) | -| `--ratio ` | 否 | `16:9` | 比例 | -| `--has-video-input` | 否 | `false` | 是否有参考视频 | -| `--input-video-duration ` | 否 | — | 参考视频时长(2–15,`--has-video-input` 时必填) | -| `-j, --json` | — | — | JSON 输出 | +| 参数 | 必填 | 默认 | 说明 | +| ---------------------------- | ---- | ------- | ------------------------------------------------ | +| `--model ` | 是 | — | 模型 | +| `--resolution ` | 是 | — | 分辨率 | +| `--duration ` | 是 | — | 时长(4–15) | +| `--ratio ` | 否 | `16:9` | 比例 | +| `--has-video-input` | 否 | `false` | 是否有参考视频 | +| `--input-video-duration ` | 否 | — | 参考视频时长(2–15,`--has-video-input` 时必填) | +| `-j, --json` | — | — | JSON 输出 | ## 改动点 -| 文件 | 改动 | -|------|------| -| `source/video/_cli.ts` | 新增 — Commander 命令注册 | -| `source/video/video.ts` | 新增 — create / get / list / estimate 逻辑 + 输入校验 | -| `source/cli.ts` | 添加 `registerVideo` | -| `source/_shared/polling.ts` | 新增 `pollVideoTaskUntilDone` | -| `source/_shared/upload.ts` | 扩展支持 `video` 类型(`.mp4`/`.mov`/`.webm`,上限 100MB) | -| 
`package.json` | 升级 `@marswave/listenhub-sdk` 到 `^0.0.6` | -| `README.md` | 添加 `video` 命令说明与示例 | +| 文件 | 改动 | +| --------------------------- | ---------------------------------------------------------- | +| `source/video/_cli.ts` | 新增 — Commander 命令注册 | +| `source/video/video.ts` | 新增 — create / get / list / estimate 逻辑 + 输入校验 | +| `source/cli.ts` | 添加 `registerVideo` | +| `source/_shared/polling.ts` | 新增 `pollVideoTaskUntilDone` | +| `source/_shared/upload.ts` | 扩展支持 `video` 类型(`.mp4`/`.mov`/`.webm`,上限 100MB) | +| `package.json` | 升级 `@marswave/listenhub-sdk` 到 `^0.0.6` | +| `README.md` | 添加 `video` 命令说明与示例 | ## 上传扩展 `resolveFileOrUrl` 签名扩展为支持 category override: ```ts -resolveFileOrUrl(client, input, { accept: 'video', category: 'episode' }) +resolveFileOrUrl(client, input, {accept: 'video', category: 'episode'}); ``` **新增 `video` 文件类型:** + - 允许后缀:`.mp4`、`.mov` - 最大体积:50 MB - MIME:`video/mp4`、`video/quicktime` **video 命令中音频素材限制:** + - 允许后缀:`.mp3`、`.wav`(SeeDance 支持范围,不含 `.flac`/`.ogg` 等) - 最大体积:15 MB - 在 `video.ts` 校验层额外检查后缀,不合格直接报错 @@ -135,6 +138,7 @@ resolveFileOrUrl(client, input, { accept: 'video', category: 'episode' }) ## 轮询策略 视频生成较慢,采用: + - 间隔 10s(同现有全局 `pollIntervalMs`) - 默认超时 1200s(20 分钟) - 终态:`success` / `failed` @@ -143,20 +147,20 @@ resolveFileOrUrl(client, input, { accept: 'video', category: 'episode' }) 在调用 SDK 之前,CLI 需拦截以下非法输入并给出明确错误提示: -| 规则 | 错误信息 | -|------|----------| -| `--duration` 不在 4–15 | `Duration must be between 4 and 15 seconds` | -| `--seed` 不在 -1 到 4294967295 | `Seed must be between -1 and 4294967295` | -| `--resolution 1080p` + model 非 pro | `1080p resolution requires --model doubao-seedance-2-pro` | -| `--reference-video` 存在但缺 `--input-video-duration` | `--input-video-duration is required when using --reference-video` | -| `--input-video-duration` 存在但无 `--reference-video` | `--input-video-duration requires --reference-video` | -| `--input-video-duration` 不在 2–15 | `Input video duration must be between 2 and 15 seconds` | -| 
`--last-frame` 无 `--first-frame` | `--last-frame requires --first-frame` | -| 帧控制 + 参考混用 | `Cannot mix frame mode (--first-frame/--last-frame) with reference mode (--reference-image/--reference-video/--reference-audio)` | -| `--reference-audio` 无 image/video 素材 | `--reference-audio requires --reference-image or --reference-video` | -| `--reference-image` 超过 9 | `Too many reference images (max 9)` | -| `--reference-video` 超过 3 | `Too many reference videos (max 3)` | -| `--reference-audio` 超过 3 | `Too many reference audios (max 3)` | +| 规则 | 错误信息 | +| ----------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | +| `--duration` 不在 4–15 | `Duration must be between 4 and 15 seconds` | +| `--seed` 不在 -1 到 4294967295 | `Seed must be between -1 and 4294967295` | +| `--resolution 1080p` + model 非 pro | `1080p resolution requires --model doubao-seedance-2-pro` | +| `--reference-video` 存在但缺 `--input-video-duration` | `--input-video-duration is required when using --reference-video` | +| `--input-video-duration` 存在但无 `--reference-video` | `--input-video-duration requires --reference-video` | +| `--input-video-duration` 不在 2–15 | `Input video duration must be between 2 and 15 seconds` | +| `--last-frame` 无 `--first-frame` | `--last-frame requires --first-frame` | +| 帧控制 + 参考混用 | `Cannot mix frame mode (--first-frame/--last-frame) with reference mode (--reference-image/--reference-video/--reference-audio)` | +| `--reference-audio` 无 image/video 素材 | `--reference-audio requires --reference-image or --reference-video` | +| `--reference-image` 超过 9 | `Too many reference images (max 9)` | +| `--reference-video` 超过 3 | `Too many reference videos (max 3)` | +| `--reference-audio` 超过 3 | `Too many reference audios (max 3)` | ## 错误处理 From ea019650bb097f8d7c6f54c8d42b148fd30fcf6d Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Mon, 18 May 2026 13:44:06 +0800 Subject: 
[PATCH 09/11] feat: auto-detect video duration from local mp4/mov files

When --reference-video points to a local file and --input-video-duration
is omitted, the CLI now reads the mvhd atom to extract the duration
automatically. URLs still require explicit --input-video-duration.
---
 source/_shared/mp4-duration.ts | 91 ++++++++++++++++++++++++++++++++++
 source/video/video.ts          | 18 ++++++-
 2 files changed, 108 insertions(+), 1 deletion(-)
 create mode 100644 source/_shared/mp4-duration.ts

diff --git a/source/_shared/mp4-duration.ts b/source/_shared/mp4-duration.ts
new file mode 100644
index 0000000..2ac6b16
--- /dev/null
+++ b/source/_shared/mp4-duration.ts
@@ -0,0 +1,91 @@
+import {open} from 'node:fs/promises';
+
+export async function getMp4Duration(filePath: string): Promise<number> {
+	const file = await open(filePath, 'r');
+	try {
+		const moovOffset = await findAtom(file, 'moov', 0, await fileSize(file));
+		if (moovOffset === undefined) {
+			throw new Error(`Cannot read video duration: moov atom not found in ${filePath}`);
+		}
+
+		const moovHeader = await readAtomHeader(file, moovOffset);
+		const moovEnd = moovOffset + moovHeader.size;
+		const mvhdOffset = await findAtom(file, 'mvhd', moovOffset + 8, moovEnd);
+		if (mvhdOffset === undefined) {
+			throw new Error(`Cannot read video duration: mvhd atom not found in ${filePath}`);
+		}
+
+		const dataOffset = mvhdOffset + 8;
+
+		const versionBuf = Buffer.alloc(1);
+		await file.read(versionBuf, 0, 1, dataOffset);
+		const version = versionBuf[0]!;
+
+		let timescale: number;
+		let duration: bigint;
+
+		if (version === 0) {
+			const buf = Buffer.alloc(8);
+			await file.read(buf, 0, 8, dataOffset + 4 + 8);
+			timescale = buf.readUInt32BE(0);
+			duration = BigInt(buf.readUInt32BE(4));
+		} else if (version === 1) {
+			const buf = Buffer.alloc(12);
+			await file.read(buf, 0, 12, dataOffset + 4 + 16);
+			timescale = buf.readUInt32BE(0);
+			duration = buf.readBigUInt64BE(4);
+		} else {
+			throw new Error(`Cannot read video duration: unsupported mvhd version ${String(version)}`);
+		}
+
+		if (timescale === 0) {
+			throw new Error(`Cannot read video duration: timescale is 0`);
+		}
+
+		return Math.round(Number(duration) / timescale);
+	} finally {
+		await file.close();
+	}
+}
+
+interface AtomHeader {
+	size: number;
+	type: string;
+}
+
+async function readAtomHeader(
+	file: Awaited<ReturnType<typeof open>>,
+	offset: number,
+): Promise<AtomHeader> {
+	const buf = Buffer.alloc(8);
+	const {bytesRead} = await file.read(buf, 0, 8, offset);
+	if (bytesRead < 8) {
+		return {size: 0, type: ''};
+	}
+
+	const size = buf.readUInt32BE(0);
+	const type = buf.toString('ascii', 4, 8);
+	return {size, type};
+}
+
+async function findAtom(
+	file: Awaited<ReturnType<typeof open>>,
+	target: string,
+	start: number,
+	end: number,
+): Promise<number | undefined> {
+	let offset = start;
+	while (offset < end) {
+		const header = await readAtomHeader(file, offset); // eslint-disable-line no-await-in-loop
+		if (header.size === 0) break;
+		if (header.type === target) return offset;
+		offset += header.size;
+	}
+
+	return undefined;
+}
+
+async function fileSize(file: Awaited<ReturnType<typeof open>>): Promise<number> {
+	const stat = await file.stat();
+	return stat.size;
+}
diff --git a/source/video/video.ts b/source/video/video.ts
index 263bc0b..e7d4546 100644
--- a/source/video/video.ts
+++ b/source/video/video.ts
@@ -9,6 +9,7 @@ import type {
 	VideoGenerationResolution,
 	VideoGenerationTaskStatus,
 } from '@marswave/listenhub-sdk';
+import {getMp4Duration} from '../_shared/mp4-duration.js';
 import {printDetail, printJson, printTable} from '../_shared/output.js';
 import {pollVideoTaskUntilDone} from '../_shared/polling.js';
 import {resolveFileOrUrl} from '../_shared/upload.js';
@@ -86,7 +87,12 @@ function validateCreateOptions(options: VideoCreateOptions): void {
 	}
 
 	if (options.referenceVideo.length > 0 && options.inputVideoDuration === undefined) {
-		throw new Error('--input-video-duration is required when using --reference-video');
+		const hasLocalVideo = options.referenceVideo.some(
+			(v) => !v.startsWith('http://') && !v.startsWith('https://'),
+		);
+		if (!hasLocalVideo) {
+			throw new Error('--input-video-duration is required when using --reference-video with URLs');
+		}
 	}
 
 	if (options.inputVideoDuration !== undefined && options.referenceVideo.length === 0) {
@@ -143,6 +149,16 @@ export async function createVideo(
 	client: ListenHubClient,
 	options: VideoCreateOptions,
 ): Promise {
+	if (options.referenceVideo.length > 0 && options.inputVideoDuration === undefined) {
+		const localVideo = options.referenceVideo.find(
+			(v) => !v.startsWith('http://') && !v.startsWith('https://'),
+		);
+		if (localVideo) {
+			const filePath = path.resolve(localVideo.trim());
+			options.inputVideoDuration = await getMp4Duration(filePath);
+		}
+	}
+
 	validateCreateOptions(options);
 
 	const content: VideoContentItem[] = [{type: 'text', text: options.prompt}];

From f55bc9e389bdad5fe6448a999f09bf3ea8348287 Mon Sep 17 00:00:00 2001
From: 0XFANGO
Date: Mon, 18 May 2026 14:58:46 +0800
Subject: [PATCH 10/11] fix: require --input-video-duration for videos longer than 15s

When auto-detected duration exceeds the 2-15s API range, prompt the user
to specify how much of the reference video to use instead of silently
passing an invalid value.
--- source/video/video.ts | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/source/video/video.ts b/source/video/video.ts index e7d4546..0f89034 100644 --- a/source/video/video.ts +++ b/source/video/video.ts @@ -155,7 +155,14 @@ export async function createVideo( ); if (localVideo) { const filePath = path.resolve(localVideo.trim()); - options.inputVideoDuration = await getMp4Duration(filePath); + const detected = await getMp4Duration(filePath); + if (detected >= 2 && detected <= 15) { + options.inputVideoDuration = detected; + } else { + throw new Error( + `Reference video is ${String(detected)}s long; --input-video-duration (2-15) is required to specify how much to use`, + ); + } } } From fbe183b4799e06119337b4b847ce9f2d57368214 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Mon, 18 May 2026 16:05:43 +0800 Subject: [PATCH 11/11] chore: remove spec and plan docs from PR These documents served their purpose during planning and are not needed in the final deliverable. --- docs/plans/listenhub-cli--127-plan.md | 335 ------------------------ docs/specs/listenhub-cli--127-design.md | 184 ------------- 2 files changed, 519 deletions(-) delete mode 100644 docs/plans/listenhub-cli--127-plan.md delete mode 100644 docs/specs/listenhub-cli--127-design.md diff --git a/docs/plans/listenhub-cli--127-plan.md b/docs/plans/listenhub-cli--127-plan.md deleted file mode 100644 index 457df28..0000000 --- a/docs/plans/listenhub-cli--127-plan.md +++ /dev/null @@ -1,335 +0,0 @@ -# Plan: CLI 支持 SeeDance2.0 视频生成 - -> Issue: marswaveai/listenhub-ralph#127 -> Spec: docs/specs/listenhub-cli--127-design.md - -## 实现步骤 - -### Step 1: 升级 SDK + 扩展 upload 工具 - -**文件:`package.json`** - -- `@marswave/listenhub-sdk` 从 `^0.0.4` 改为 `^0.0.6` -- `"version"` 从 `"0.0.4"` 升为 `"0.0.5"`(新增功能,minor bump) -- 运行 `pnpm install` 更新 lockfile - -**文件:`source/_shared/upload.ts`** - -1. 新增 `video` accept type: - - ```ts - type FileAcceptType = 'audio' | 'image' | 'video'; - ``` - -2. 
新增视频相关常量(SeeDance 仅支持 mp4/mov,单文件 < 50MB): - - ```ts - const videoExtensions = new Set(['.mp4', '.mov']); - // maxSizeBytes - video: 50 * 1024 * 1024, - // categoryForType - video: 'episode', - // mimeTypes - ['.mp4', 'video/mp4'], - ['.mov', 'video/quicktime'], - ``` - -3. video 命令中音频素材限制为 `mp3/wav`(SeeDance 支持范围),单文件 < 15MB。 - 在 `resolveFileOrUrl` 调用时,video 命令对 audio 类型传 `{ accept: 'audio', category: 'episode' }` —— - 但需新增一个 `videoAudioExtensions` 集合做额外校验(或在 video.ts 校验层先过滤后缀), - 避免用户传 `.flac`/`.ogg` 等 CLI 层面放行但 provider 拒绝的格式。 - - 实现方式:在 `video.ts` 的 `validateCreateOptions` 中检查 `--reference-audio` 文件后缀, - 不在 `['.mp3', '.wav']` 内的直接报错:`Reference audio must be .mp3 or .wav`。 - -4. `allowedExtensions` 函数扩展 video 分支。 - -5. `resolveFileOrUrl` 签名增加可选 `category` override: - ```ts - export async function resolveFileOrUrl( - client: ListenHubClient, - input: string, - options: {accept: FileAcceptType; category?: string}, - ): Promise; - ``` - 内部 `const category = options.category ?? categoryForType[options.accept];` - ---- - -### Step 2: 新增视频轮询函数 - -**文件:`source/_shared/polling.ts`** - -在文件末尾新增 `pollVideoTaskUntilDone`: - -```ts -import type {VideoGenerationTaskDetail} from '@marswave/listenhub-sdk'; - -export async function pollVideoTaskUntilDone( - client: ListenHubClient, - taskId: string, - options: {timeout?: number; json?: boolean}, -): Promise { - const timeoutS = options.timeout ?? 1200; - const maxAttempts = Math.ceil(timeoutS / (pollIntervalMs / 1000)); - const spinner = options.json - ? undefined - : ora({text: `Generating video... 
(1/${maxAttempts})`}).start(); - - for (let i = 0; i < maxAttempts; i++) { - if (i > 0) await sleep(pollIntervalMs); - const task = await client.getVideoGenerationTask(taskId); - if (task.status === 'success') { - spinner?.succeed('Video created successfully'); - return task; - } - if (task.status === 'failed') { - spinner?.fail('Video creation failed'); - throw new Error('Video creation failed'); - } - if (spinner) { - spinner.text = `Generating video... (${String(i + 2)}/${maxAttempts})`; - } - } - spinner?.fail('Timed out'); - throw new CliTimeoutError(`Timed out after ${timeoutS}s`); -} -``` - -需在顶部 import 区域添加 `VideoGenerationTaskDetail` 类型。 - ---- - -### Step 3: 新增 `source/video/video.ts` — 业务逻辑 - -导出四个函数:`createVideo`、`getVideo`、`listVideos`、`estimateCredits`。 - -**类型定义:** - -```ts -export type VideoCreateOptions = { - prompt: string; - model?: string; - resolution?: string; - ratio?: string; - duration?: number; - firstFrame?: string; - lastFrame?: string; - referenceImage: string[]; - referenceVideo: string[]; - referenceAudio: string[]; - inputVideoDuration?: number; - generateAudio: boolean; // Commander --no-generate-audio 会反转为 generateAudio: false - seed?: number; - wait: boolean; - timeout: number; - json: boolean; -}; -``` - -**`createVideo` 逻辑:** - -1. **校验阶段** — 调用 `validateCreateOptions(options)` 内部函数,按 spec 校验表逐条检查,不满足直接 `throw new Error(msg)`。 - 额外规则: - - 没有 `--reference-video` 时传了 `--input-video-duration` → 报错 `--input-video-duration requires --reference-video` - - `--reference-audio` 文件后缀不在 `.mp3`/`.wav` 内 → 报错 `Reference audio must be .mp3 or .wav` - - `--reference-video` 文件后缀不在 `.mp4`/`.mov` 内 → 报错 `Reference video must be .mp4 or .mov` - -2. 
**构建 content 数组:** - - ```ts - const content: VideoContentItem[] = []; - // prompt → { type: 'text', text: options.prompt } - // firstFrame → resolveFileOrUrl(client, path, { accept: 'image', category: 'episode' }) - // → { type: 'image_url', image_url: { url }, role: 'first_frame' } - // lastFrame → 同上,role: 'last_frame' - // referenceImage[] → 同上,role: 'reference_image' - // referenceVideo[] → resolveFileOrUrl(client, path, { accept: 'video', category: 'episode' }) - // → { type: 'video_url', video_url: { url }, role: 'reference_video' } - // referenceAudio[] → resolveFileOrUrl(client, path, { accept: 'audio', category: 'episode' }) - // → { type: 'audio_url', audio_url: { url }, role: 'reference_audio' } - ``` - -3. **构建请求参数:** 只传用户显式指定的字段。 - - ```ts - const params: CreateVideoGenerationParams = { - content, - ...(options.model && {model: options.model}), - ...(options.resolution && {resolution: options.resolution}), - ...(options.ratio && {ratio: options.ratio}), - ...(options.duration !== undefined && {duration: options.duration}), - ...(!options.generateAudio && {generateAudio: false}), - ...(options.seed !== undefined && {seed: options.seed}), - ...(options.inputVideoDuration !== undefined && { - inputVideoDuration: options.inputVideoDuration, - }), - }; - ``` - -4. **调用 SDK + 轮询/即时返回。** - -5. 
**输出:** 成功时 `printDetail` 展示 taskId、videoUrl、duration、resolution、ratio、seed、creditCharged。 - -**`getVideo`:** 调用 `client.getVideoGenerationTask(taskId)` → `printDetail` / `printJson`。 - -**`listVideos`:** 调用 `client.listVideoGenerationTasks(params)` → `printTable` 显示 ID / Model / Status / Duration / Created。 - -**`estimateCredits`:** 调用 `client.estimateVideoGenerationCredits(params)` → 输出 tokens 和 credits。 -校验:`--input-video-duration` 和 `--has-video-input` 必须成对出现,缺一报错。 - ---- - -### Step 4: 新增 `source/video/_cli.ts` — Commander 注册 - -```ts -import { type Command, Option } from 'commander'; -import { getClient } from '../_shared/client.js'; -import { handleError } from '../_shared/output.js'; -import { createVideo, getVideo, listVideos, estimateCredits } from './video.js'; - -function collect(value: string, previous: string[]): string[] { - return [...previous, value]; -} - -export function register(program: Command) { - const cmd = program.command('video').description('SeeDance video generation'); - - cmd.command('create') - .description('Create a video generation task') - .requiredOption('--prompt ', 'Video description') - .option('--model ', 'Model: doubao-seedance-2-pro, doubao-seedance-2-fast') - .option('--resolution ', 'Resolution: 480p, 720p, 1080p') - .option('--ratio ', 'Aspect ratio: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9') - .option('--duration ', 'Video duration in seconds (4-15)', Number) - .option('--first-frame ', 'First frame image') - .option('--last-frame ', 'Last frame image (requires --first-frame)') - .option('--reference-image ', 'Reference image (repeatable, max 9)', collect, []) - .option('--reference-video ', 'Reference video (repeatable, max 3)', collect, []) - .option('--reference-audio ', 'Reference audio (repeatable, max 3)', collect, []) - .option('--input-video-duration ', 'Reference video duration (2-15, required with --reference-video)', Number) - .option('--no-generate-audio', 'Disable audio generation') - .option('--seed ', 'Random seed 
(-1 to 4294967295)', Number) - .option('--no-wait', 'Return immediately without polling') - .option('--timeout ', 'Polling timeout', Number, 1200) - .option('-j, --json', 'Output JSON', false) - .action(async (options) => { ... }); - - cmd.command('get ') - .description('Get video task details') - .option('-j, --json', 'Output JSON', false) - .action(async (taskId, options) => { ... }); - - cmd.command('list') - .description('List video generation tasks') - .option('--page ', 'Page number', Number, 1) - .option('--page-size ', 'Items per page', Number, 20) - .option('--status ', 'Filter: pending, generating, uploading, success, failed') - .option('-j, --json', 'Output JSON', false) - .action(async (options) => { ... }); - - cmd.command('estimate') - .description('Estimate credit cost') - .requiredOption('--model ', 'Model name') - .requiredOption('--resolution ', 'Resolution') - .requiredOption('--duration ', 'Duration (4-15)', Number) - .option('--ratio ', 'Aspect ratio', '16:9') - .option('--has-video-input', 'Has reference video input', false) - .option('--input-video-duration ', 'Reference video duration', Number) - .option('-j, --json', 'Output JSON', false) - .action(async (options) => { ... }); -} -``` - ---- - -### Step 5: 注册到主入口 - -**文件:`source/cli.ts`** - -```ts -import {register as registerVideo} from './video/_cli.js'; -// ... -registerVideo(program); // 放在 registerCreation 之前 -``` - ---- - -### Step 6: 更新 README - -**文件:`README.md`** - -1. Commands 表新增 Video 部分: - - ``` - ### Video Generation - - | Command | Description | - | -------------------------- | ----------------------------- | - | `listenhub video create` | Create a video generation task | - | `listenhub video list` | List video tasks | - | `listenhub video get ` | Get video task details | - | `listenhub video estimate` | Estimate credit cost | - ``` - -2. 
Examples 新增 Video generation 小节: - - ```bash - # Text-to-video - listenhub video create --prompt "A cat playing piano in a jazz bar" - - # Image-to-video (first frame) - listenhub video create --prompt "Camera slowly zooms out" --first-frame ./scene.png - - # With reference video - listenhub video create --prompt "Same style dancing" \ - --reference-video ./clip.mp4 --input-video-duration 8 - - # Estimate credits - listenhub video estimate --model doubao-seedance-2-pro --resolution 1080p --duration 10 - ``` - -**文件:`README.zh-CN.md`** - -同步更新中文 README,添加对应的 Video Generation 命令表和示例(与英文版对齐)。 - ---- - -### Step 7: `vp check` + Smoke check - -```bash -# vp check = fmt --check + lint + type check(三合一) -pnpm check - -# Smoke check — 确认命令注册正确 -pnpm build -node dist/cli.js video --help -node dist/cli.js video create --help -node dist/cli.js video list --help -node dist/cli.js video estimate --help -``` - -若有问题则修复后重新运行。`vp check` 必须全通过才能提交 PR。 - ---- - -## 文件清单 - -| 文件 | 操作 | 行数估算 | -| --------------------------- | ---- | -------- | -| `package.json` | 修改 | ~1 行 | -| `source/_shared/upload.ts` | 修改 | +15 行 | -| `source/_shared/polling.ts` | 修改 | +30 行 | -| `source/video/video.ts` | 新增 | ~200 行 | -| `source/video/_cli.ts` | 新增 | ~90 行 | -| `source/cli.ts` | 修改 | +2 行 | -| `README.md` | 修改 | +25 行 | -| `README.zh-CN.md` | 修改 | +25 行 | - -总新增约 340 行代码。 - -## 风险点 - -1. **SDK 0.0.6 兼容性** — CLI 当前锁定 `^0.0.4`,升级后确认其他命令不受影响(SDK 是向后兼容的增量新增)。 -2. **视频文件上传体积** — 50MB 本地文件上传到 GCS 可能耗时较长,`resolveFileOrUrl` 当前无进度条,大文件体验需留意(不在本次范围内解决)。 -3. 
**Commander `--no-generate-audio` 语义** — Commander 会自动创建 `generateAudio` 布尔值,默认 `true`,传 `--no-generate-audio` 后变 `false`。需确认 Commander 版本行为。 diff --git a/docs/specs/listenhub-cli--127-design.md b/docs/specs/listenhub-cli--127-design.md deleted file mode 100644 index ba3e13e..0000000 --- a/docs/specs/listenhub-cli--127-design.md +++ /dev/null @@ -1,184 +0,0 @@ -# Spec: CLI 支持 SeeDance2.0 视频生成 - -> Issue: marswaveai/listenhub-ralph#127 - -## 背景 - -SDK 0.0.6 已封装 SeeDance2.0 视频生成 API(`v1/video-generation/*`),CLI 需要同步暴露对应命令,让用户通过终端即可创建视频任务、查看任务状态、列出历史任务和预估积分消耗。 - -## 目标 - -在 `listenhub-cli` 中新增 `video` 命令组,覆盖 SeeDance2.0 全部核心操作。 - -## 新增模块 - -``` -source/video/_cli.ts — Commander 注册 -source/video/video.ts — 业务逻辑 -``` - -`source/cli.ts` 新增 `registerVideo` 导入。 - -## 命令设计 - -### `listenhub video create` - -创建视频生成任务。 - -| 参数 | 类型 | 必填 | 默认值 | 说明 | -| ---------------------------------- | ------- | ---- | ------------------------------------------- | ------------------------------------------------------------------ | -| `--prompt ` | string | 是 | — | 视频描述文本 | -| `--model ` | string | 否 | 不传(服务端默认 `doubao-seedance-2-fast`) | 模型:`doubao-seedance-2-pro` / `doubao-seedance-2-fast` | -| `--resolution ` | string | 否 | 不传(服务端默认 `720p`) | 分辨率:`480p` / `720p` / `1080p`(注意:`1080p` 仅 pro 模型支持) | -| `--ratio ` | string | 否 | 不传(服务端默认 `16:9`) | 画面比例:`16:9` / `4:3` / `1:1` / `3:4` / `9:16` / `21:9` | -| `--duration ` | number | 否 | — | 视频时长,范围 4–15 秒 | -| `--first-frame ` | string | 否 | — | 首帧图片,本地文件或平台资产 URL | -| `--last-frame ` | string | 否 | — | 末帧图片(必须同时指定 `--first-frame`) | -| `--reference-image ` | string | 否 | — | 参考图(可重复,最多 9 张),本地文件或平台资产 URL | -| `--reference-video ` | string | 否 | — | 参考视频(可重复,最多 3 个),本地文件或平台资产 URL | -| `--reference-audio ` | string | 否 | — | 参考音频(可重复,最多 3 个),本地文件或平台资产 URL | -| `--input-video-duration ` | number | 否 | — | 参考视频时长,范围 2–15 秒;使用 `--reference-video` 时**必填** | -| `--no-generate-audio` | boolean | — | — | 禁用音轨生成(服务端默认生成音轨) | -| `--seed ` | number | 否 
| — | 随机种子,范围 -1 到 4294967295 | -| `--no-wait` | boolean | — | — | 提交后立即返回,不轮询 | -| `--timeout ` | number | 否 | `1200` | 轮询超时 | -| `-j, --json` | boolean | — | — | JSON 输出 | - -**输入模式互斥规则(CLI 端校验,不满足直接报错退出):** - -- **帧控制模式**(`--first-frame`/`--last-frame`)与**参考模式**(`--reference-image`/`--reference-video`/`--reference-audio`)不可混用。 -- `--last-frame` 必须搭配 `--first-frame`。 -- `--reference-audio` 不能单独使用,必须搭配 `--reference-image` 或 `--reference-video`(纯 prompt + audio 不合法)。 -- 数量上限:image ≤ 9,video ≤ 3,audio ≤ 3。 - -**URL 约束:** 所有 `` 参数仅接受本地文件路径或 ListenHub 平台资产 URL(GCS bucket / CDN)。外部 URL(如 `https://example.com/v.mp4`)会被后端拒绝。 - -**行为:** - -1. 校验输入模式互斥规则和参数范围。 -2. 解析 `--prompt` 为 `VideoContentText`。 -3. 依据素材参数构建 `content[]`,本地文件通过 `resolveFileOrUrl` 上传后取平台 URL。 -4. 仅在用户显式传了 `--model`/`--resolution`/`--ratio`/`--duration`/`--seed` 时才放入请求体,其余由服务端默认。`generateAudio` 仅在 `--no-generate-audio` 时传 `false`。 -5. 若有 `--reference-video`,将 `--input-video-duration` 作为 `inputVideoDuration` 传入(缺失则报错)。 -6. 调用 `client.createVideoGeneration(params)`。 -7. 若 `--no-wait`,打印 `taskId` 后退出。 -8. 否则轮询 `client.getVideoGenerationTask(taskId)` 直到 `success` / `failed` / 超时。 -9. 
成功打印视频 URL 与基本信息。 - -### `listenhub video get ` - -获取单个任务详情。 - -| 参数 | 说明 | -| ------------ | --------- | -| `taskId` | 位置参数 | -| `-j, --json` | JSON 输出 | - -### `listenhub video list` - -列出视频生成任务。 - -| 参数 | 默认 | 说明 | -| ------------------- | ---- | ----------------------------------------------------------------------- | -| `--page ` | 1 | 页码 | -| `--page-size ` | 20 | 每页条数 | -| `--status ` | — | 可选筛选:`pending` / `generating` / `uploading` / `success` / `failed` | -| `-j, --json` | — | JSON 输出 | - -### `listenhub video estimate` - -预估积分消耗。 - -| 参数 | 必填 | 默认 | 说明 | -| ---------------------------- | ---- | ------- | ------------------------------------------------ | -| `--model ` | 是 | — | 模型 | -| `--resolution ` | 是 | — | 分辨率 | -| `--duration ` | 是 | — | 时长(4–15) | -| `--ratio ` | 否 | `16:9` | 比例 | -| `--has-video-input` | 否 | `false` | 是否有参考视频 | -| `--input-video-duration ` | 否 | — | 参考视频时长(2–15,`--has-video-input` 时必填) | -| `-j, --json` | — | — | JSON 输出 | - -## 改动点 - -| 文件 | 改动 | -| --------------------------- | ---------------------------------------------------------- | -| `source/video/_cli.ts` | 新增 — Commander 命令注册 | -| `source/video/video.ts` | 新增 — create / get / list / estimate 逻辑 + 输入校验 | -| `source/cli.ts` | 添加 `registerVideo` | -| `source/_shared/polling.ts` | 新增 `pollVideoTaskUntilDone` | -| `source/_shared/upload.ts` | 扩展支持 `video` 类型(`.mp4`/`.mov`/`.webm`,上限 100MB) | -| `package.json` | 升级 `@marswave/listenhub-sdk` 到 `^0.0.6` | -| `README.md` | 添加 `video` 命令说明与示例 | - -## 上传扩展 - -`resolveFileOrUrl` 签名扩展为支持 category override: - -```ts -resolveFileOrUrl(client, input, {accept: 'video', category: 'episode'}); -``` - -**新增 `video` 文件类型:** - -- 允许后缀:`.mp4`、`.mov` -- 最大体积:50 MB -- MIME:`video/mp4`、`video/quicktime` - -**video 命令中音频素材限制:** - -- 允许后缀:`.mp3`、`.wav`(SeeDance 支持范围,不含 `.flac`/`.ogg` 等) -- 最大体积:15 MB -- 在 `video.ts` 校验层额外检查后缀,不合格直接报错 - -**video 命令的 upload category:** 所有素材(image/video/audio)统一使用 `category=episode`(private upload)。后端 
`resolveMediaUrl` 对 private bucket URL 通过 `UserFileDao` 校验所有权后签名,这是最稳妥的路径。现有 image 命令继续用 `category=banana` 不受影响。 - -> 技术原因:后端 `resolveMediaUrl` 只接受三种 URL —— public CDN、private bucket(需 UserFileDao 记录)、已签名 URL。虽然 banana public bucket 碰巧在白名单中,但 private upload 语义更明确且不依赖隐式行为。 - -## 轮询策略 - -视频生成较慢,采用: - -- 间隔 10s(同现有全局 `pollIntervalMs`) -- 默认超时 1200s(20 分钟) -- 终态:`success` / `failed` - -## CLI 端参数校验 - -在调用 SDK 之前,CLI 需拦截以下非法输入并给出明确错误提示: - -| 规则 | 错误信息 | -| ----------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | -| `--duration` 不在 4–15 | `Duration must be between 4 and 15 seconds` | -| `--seed` 不在 -1 到 4294967295 | `Seed must be between -1 and 4294967295` | -| `--resolution 1080p` + model 非 pro | `1080p resolution requires --model doubao-seedance-2-pro` | -| `--reference-video` 存在但缺 `--input-video-duration` | `--input-video-duration is required when using --reference-video` | -| `--input-video-duration` 存在但无 `--reference-video` | `--input-video-duration requires --reference-video` | -| `--input-video-duration` 不在 2–15 | `Input video duration must be between 2 and 15 seconds` | -| `--last-frame` 无 `--first-frame` | `--last-frame requires --first-frame` | -| 帧控制 + 参考混用 | `Cannot mix frame mode (--first-frame/--last-frame) with reference mode (--reference-image/--reference-video/--reference-audio)` | -| `--reference-audio` 无 image/video 素材 | `--reference-audio requires --reference-image or --reference-video` | -| `--reference-image` 超过 9 | `Too many reference images (max 9)` | -| `--reference-video` 超过 3 | `Too many reference videos (max 3)` | -| `--reference-audio` 超过 3 | `Too many reference audios (max 3)` | - -## 错误处理 - -- CLI 端校验失败:直接抛 Error,由 `handleError` 统一输出。 -- SDK/后端返回的 `VideoGenerationErrorCode` 映射为可读消息。 -- 与现有模块保持一致:`handleError(error, options.json)` 统一格式。 - -## 验收标准 - -1. `listenhub video create --prompt "..."` 可成功创建任务并轮询到最终结果。 -2. 
`listenhub video create --prompt "..." --reference-video ./clip.mp4 --input-video-duration 5` 正常工作。 -3. `listenhub video list` 正确展示历史任务列表。 -4. `listenhub video get ` 输出任务详情。 -5. `listenhub video estimate --model ... --resolution ... --duration ...` 输出积分预估。 -6. 本地文件(图片/视频/音频)通过对应参数上传成功。 -7. 传入外部非平台 URL 时后端拒绝,CLI 错误提示清晰。 -8. 输入模式互斥校验:混用帧控制 + 参考模式时 CLI 直接报错。 -9. `--no-generate-audio` 正确禁用音轨;不传时服务端默认生成音轨。 -10. `--json` 模式输出合法 JSON。 -11. `pnpm lint` 无错误。 -12. README 包含 `video` 命令最小可用示例。
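
The input-mode rules listed in the removed spec can be sketched as one pure check. This is an illustrative sketch, not the `validateCreateOptions` from `source/video/video.ts`: the `MaterialOptions` type, the `checkInputModes` name, and the order in which the rules are tested are assumptions, while the messages are copied from the spec's validation table.

```typescript
// Hypothetical subset of the create options that carry media inputs.
type MaterialOptions = {
	firstFrame?: string;
	lastFrame?: string;
	referenceImage: string[];
	referenceVideo: string[];
	referenceAudio: string[];
};

// Returns the first violated rule's message, or undefined when the inputs are valid.
function checkInputModes(options: MaterialOptions): string | undefined {
	const frameMode = options.firstFrame !== undefined || options.lastFrame !== undefined;
	const referenceMode =
		options.referenceImage.length > 0 ||
		options.referenceVideo.length > 0 ||
		options.referenceAudio.length > 0;

	if (options.lastFrame !== undefined && options.firstFrame === undefined) {
		return '--last-frame requires --first-frame';
	}

	if (frameMode && referenceMode) {
		return 'Cannot mix frame mode (--first-frame/--last-frame) with reference mode (--reference-image/--reference-video/--reference-audio)';
	}

	// Audio may only accompany image or video references.
	if (
		options.referenceAudio.length > 0 &&
		options.referenceImage.length === 0 &&
		options.referenceVideo.length === 0
	) {
		return '--reference-audio requires --reference-image or --reference-video';
	}

	if (options.referenceImage.length > 9) return 'Too many reference images (max 9)';
	if (options.referenceVideo.length > 3) return 'Too many reference videos (max 3)';
	if (options.referenceAudio.length > 3) return 'Too many reference audios (max 3)';

	return undefined;
}
```

Returning a message instead of throwing keeps the check easy to unit-test; a caller could wrap it as `const message = checkInputModes(options); if (message) throw new Error(message);`.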