📋 Previous Releases
| Version | Highlights |
| :--- | :--- |
+| **v2.18.0** | Kiwi-Edit AI video editing, SAM3 + Kiwi-Edit, RTX Video Super Resolution, SeedVR AI Upscaling, FacePoke expression presets |
+| **v2.17.0** | FacePoke interactive face editor, driving video reference, shader effects system, Flux Klein FP8, onion skin compositing, unified audio output mode |
+| **v2.16.0** | ACE-Step AI music generation, SAM-Audio source separation, Video Editor v2 (10 panels), AudioX vocal enhancement, NormalCrafter, Video Depth Anything |
+| **v2.15.0** | MiniMax-Remover, 5 new no-LLM modes, auto-VRAM tile sizing, VRAM management overhaul |
| **v2.14.0** | Video Editor NLE node with timeline, razor, crop, transitions, text overlays, keyboard shortcuts |
| **v2.13.0** | AI Background Removal (BRIA RMBG), FLUX Klein toggle, Edit FFmpeg fallback, smarter defaults |
| **v2.12.0** | AI Face Animation (LivePortrait), MMAudio in-process inference, MCP progressive disclosure, LaMa safetensors conversion |
@@ -90,7 +95,7 @@ Works with **Ollama** (local, free), **OpenAI**, **Anthropic**, **Google Gemini*
### 🎨 200+ Skills
-200+ video editing skills across visual effects, audio processing, spatial transforms, temporal edits, encoding, cinematic presets, vintage looks, social media, creative effects, text animations, editing & composition, audio visualization, multi-input operations, transitions, concat, split screen, and AI-powered skills (Whisper transcription, SAM3 masking, MiniMax-Remover object removal, MMAudio generation, MuseTalk lip sync, LivePortrait face animation, Video Depth estimation, AI Upscaling, Marigold dense vision).
+200+ video editing skills across visual effects, audio processing, spatial transforms, temporal edits, encoding, cinematic presets, vintage looks, social media, creative effects, text animations, editing & composition, audio visualization, multi-input operations, transitions, concat, split screen, and AI-powered skills (DreamID-Omni talking-head generation, FaceCam camera control, Fish Speech TTS, Foundation-1 music samples, ACE-Step music generation, SAM-Audio source separation, AudioX vocal enhancement, Whisper transcription, SAM3 masking, MiniMax-Remover object removal, MMAudio generation, MuseTalk lip sync, LivePortrait face animation, NormalCrafter surface normals, Video Depth estimation, AI Upscaling, Marigold dense vision).
|
@@ -512,6 +517,15 @@ All models are mirrored to first-party [AEmotionStudio](https://huggingface.co/A
| **Marigold** (Dense Vision) | ~2.5 GB per mode | Auto-downloaded by diffusers | `marigold` no-LLM mode (depth/normals/appearance/lighting) | [AEmotionStudio/marigold-depth-v1-1](https://huggingface.co/AEmotionStudio/marigold-depth-v1-1) |
| **AI Upscaler** (Real-ESRGAN / HAT / DAT / SwinIR) | ~17–170 MB per model | `ComfyUI/models/upscale_models/` | `ai_upscale` skill, `ai_upscale` no-LLM mode | [AEmotionStudio/ai-upscale-models](https://huggingface.co/AEmotionStudio/ai-upscale-models) |
| **BRIA RMBG** (rembg) | ~270 MB | `~/.u2net/` | `remove_background` skill | Install with `pip install 'comfyui-ffmpega[masking]'` — model auto-fetched by rembg |
+| **ACE-Step 1.5** (Music Generation) | ~5 GB | `ComfyUI/models/acestep/` | `ace_step` no-LLM mode, `generate_music` skill | [AEmotionStudio/ACE-Step](https://huggingface.co/AEmotionStudio/ACE-Step) |
+| **SAM-Audio** (Source Separation) | ~1.2 GB (large) / ~600 MB (fp8) | `ComfyUI/models/sam_audio/` | `audio_separate` no-LLM mode | [AEmotionStudio/sam-audio](https://huggingface.co/AEmotionStudio/sam-audio) |
+| **AudioX** (Vocal Enhancement) | ~1 GB | `ComfyUI/models/audiox/` | AudioX chaining with ACE-Step | [AEmotionStudio/audiox](https://huggingface.co/AEmotionStudio/audiox) |
+| **NormalCrafter** (Surface Normals) | ~2 GB | `ComfyUI/models/normalcrafter/` | `normalcrafter` no-LLM mode | [AEmotionStudio/NormalCrafter](https://huggingface.co/AEmotionStudio/NormalCrafter) |
+| **Kiwi-Edit** (AI Video Editing) | ~5 GB (FP8) / ~10 GB (BF16) | `ComfyUI/models/kiwi_edit_*` | `kiwi_edit` no-LLM mode | [AEmotionStudio/Kiwi-Edit-Instruct](https://huggingface.co/AEmotionStudio/Kiwi-Edit-Instruct) |
+| **DreamID-Omni** (Talking Head) ⚠️ WIP | ~12 GB (FP8) / ~23 GB (BF16) | `ComfyUI/models/dreamid_omni/` | `dreamid_omni` no-LLM mode | [AEmotionStudio/dreamid-omni](https://huggingface.co/AEmotionStudio/dreamid-omni) |
+| **FaceCam** (Camera Control) | ~16.8 GB (high+low bf16) | `ComfyUI/models/diffusion_models/` | FaceCam node | [AEmotionStudio/facecam-wan2.2-14b-bf16](https://huggingface.co/AEmotionStudio/facecam-wan2.2-14b-bf16) |
+| **Fish Speech S2 Pro** (TTS) | ~6.5 GB (FP8) / ~10.4 GB (BF16) | `ComfyUI/models/fish_speech/` | `fish_speech` no-LLM mode | [AEmotionStudio/fish-speech-s2-pro](https://huggingface.co/AEmotionStudio/fish-speech-s2-pro) |
+| **Foundation-1** (Music Samples) | ~2 GB | `ComfyUI/models/foundation1/` | `foundation1` no-LLM mode | [AEmotionStudio/foundation1-models](https://huggingface.co/AEmotionStudio/foundation1-models) |
> [!NOTE]
> Models are downloaded only the first time you use the corresponding skill. The 200+ core FFmpeg editing skills require **zero model downloads**.
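
Because downloads happen lazily, it can be useful to see which optional models are already on disk. The following is a minimal sketch, not part of FFMPEGA: the subdirectory names are copied from the table above, while `MODEL_DIRS`, `downloaded_models`, and the `comfy_root` argument are hypothetical names introduced for illustration. It assumes a model counts as downloaded when its directory exists and is non-empty.

```python
from pathlib import Path

# Subdirectories under ComfyUI/models/, taken from the table above.
# The mapping itself is illustrative and covers only a few entries.
MODEL_DIRS = {
    "ACE-Step 1.5": "acestep",
    "SAM-Audio": "sam_audio",
    "AudioX": "audiox",
    "NormalCrafter": "normalcrafter",
    "DreamID-Omni": "dreamid_omni",
    "Fish Speech S2 Pro": "fish_speech",
    "Foundation-1": "foundation1",
}


def downloaded_models(comfy_root):
    """Return names of models whose directory exists and contains files.

    Hypothetical helper: assumes a non-empty directory means the model
    has already been fetched.
    """
    models_dir = Path(comfy_root) / "models"
    found = []
    for name, subdir in MODEL_DIRS.items():
        path = models_dir / subdir
        if path.is_dir() and any(path.iterdir()):
            found.append(name)
    return found
```

Calling `downloaded_models("/path/to/ComfyUI")` returns a list of names in table order; an empty list means none of the listed models has been fetched yet.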
@@ -520,7 +534,7 @@ All models are mirrored to first-party [AEmotionStudio](https://huggingface.co/A
## 🎛️ Nodes
-FFMPEGA provides **11 nodes** that work together:
+FFMPEGA provides **14 nodes** that work together:
> [!TIP]
> **One task per run.** Instead of cramming multiple edits into a single prompt, focus each run on one editing task — then feed the output back into FFMPEGA for the next. This keeps context low and model focus high, leading to significantly better results. Chain FFMPEGA Agent → Save Video → Load Video Path → FFMPEGA Agent for multi-step workflows.
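
The chaining pattern in the tip above can be sketched as a simple loop. This is an illustration only: `run_chain` and `run_once` are hypothetical names, and `run_once` stands in for one full FFMPEGA Agent → Save Video → Load Video Path pass that takes a video path and a prompt and returns the path of the saved output.

```python
def run_chain(video_path, prompts, run_once):
    """Apply one prompt per run, feeding each output into the next run.

    `run_once(path, prompt)` is a hypothetical callable representing a
    single agent run; it must return the path of the video it produced.
    """
    current = video_path
    for prompt in prompts:
        # One editing task per run keeps each prompt small and focused.
        current = run_once(current, prompt)
    return current
```

For example, `run_chain("in.mp4", ["Add cinematic letterbox", "Speed up 2x"], run_once)` performs two separate runs rather than one combined prompt.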
@@ -539,7 +553,7 @@ FFMPEGA provides **11 nodes** that work together:
| `video_path` | STRING | Absolute path to source video. Used as ffmpeg input unless `images_a` is connected. |
| `prompt` | STRING | Natural language editing instruction (e.g. *"Add cinematic letterbox"*, *"Speed up 2x"*). Not required in `manual` mode. |
| `llm_model` | DROPDOWN | AI model selection — local Ollama models, CLI tools, or cloud APIs. Select `none` for no-LLM mode. |
-| `no_llm_mode` | DROPDOWN | Mode when `llm_model` is `none`: `manual` (Effects Builder, default), `sam3_masking`, `transcribe`, `karaoke_subtitles`, `generate_audio`, `lip_sync`, `animate_portrait`, `marigold`, `video_depth`, `flux_klein`, `minimax_remover`, `ai_upscale`. |
+| `no_llm_mode` | DROPDOWN | Mode when `llm_model` is `none`: `manual` (Effects Builder, default), `sam3_masking`, `transcribe`, `karaoke_subtitles`, `generate_audio`, `generate_music`, `foundation1`, `fish_speech`, `audio_inpaint`, `audio_separate`, `ace_step`, `lip_sync`, `animate_portrait`, `marigold`, `normalcrafter`, `video_depth`, `flux_klein`, `kiwi_edit`, `minimax_remover`, `dreamid_omni`, `ai_upscale`, `rembg`, `onion_skin`, `comparison`. |
| `quality_preset` | DROPDOWN | Output quality: `draft`, `standard`, `high`, `lossless`. |
| `seed` | INT | Change to force re-execution with the same prompt. Supports randomize control. |
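
The interaction between `llm_model` and `no_llm_mode` described in the table can be sketched as follows. This is not the node's actual implementation: `resolve_mode` and `NO_LLM_MODES` are hypothetical names, and the mode list is copied from the updated `no_llm_mode` row. The sketch assumes `no_llm_mode` is ignored whenever an LLM is selected.

```python
# Mode names copied from the `no_llm_mode` row above.
NO_LLM_MODES = (
    "manual", "sam3_masking", "transcribe", "karaoke_subtitles",
    "generate_audio", "generate_music", "foundation1", "fish_speech",
    "audio_inpaint", "audio_separate", "ace_step", "lip_sync",
    "animate_portrait", "marigold", "normalcrafter", "video_depth",
    "flux_klein", "kiwi_edit", "minimax_remover", "dreamid_omni",
    "ai_upscale", "rembg", "onion_skin", "comparison",
)


def resolve_mode(llm_model, no_llm_mode="manual"):
    """Return "llm" when a model is selected, else the validated no-LLM mode.

    Hypothetical helper: `no_llm_mode` only matters when `llm_model`
    is the literal string "none".
    """
    if llm_model != "none":
        return "llm"
    if no_llm_mode not in NO_LLM_MODES:
        raise ValueError(f"unknown no_llm_mode: {no_llm_mode!r}")
    return no_llm_mode
```

With `llm_model` set to `none` and no explicit mode, the sketch falls back to `manual`, matching the documented default.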
@@ -847,11 +861,74 @@ Scans ComfyUI's output/temp directories for the newest video. Decodes frames (wi
+