Lego mode for acestep-cpp#21
Conversation
* Add LEGO mode: --lego <track> flag for dit-vae, example files, README docs
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new "Lego mode" ( Changes
Sequence Diagram(s)sequenceDiagram
participant User as User/Script
participant CLI as dit-vae (CLI)
participant Request as Request Parser
participant Model as Base Model Checker
participant DitVaeCore as Dit-VAE Processor
participant Files as Audio Files
User->>CLI: run `dit-vae --lego <track> --src-audio ...`
CLI->>Request: parse request (includes `lego` field)
Request-->>CLI: lego value set / validated
CLI->>Model: validate model compatibility (no turbo with lego)
Model-->>CLI: validation result
CLI->>DitVaeCore: build instruction "Generate the {TRACK} track based on the audio context:"
DitVaeCore->>Files: read src-audio, produce lego-transformed .wav
Files-->>User: output .wav ready
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/lego.json`:
- Around line 1-5: This preset must explicitly set non-vocal conditioning so
dit-vae triggers the documented instrumental path; add a top-level "lyrics"
property with the value "[Instrumental]" in the same JSON object alongside
"caption", "audio_cover_strength", "inference_steps", and "guidance_scale" so
the guitar-stem example is forced to non-vocal conditioning.
In `@README.md`:
- Around line 261-269: Update the dit-vae CLI reference to include the new
--lego <track> flag and its required --src-audio option, mirroring the Lego
section: show that only the base model "acestep-v15-base" supports lego mode,
recommend setting audio_cover_strength=1.0 in the request, list the available
track names (vocals, backing_vocals, drums, bass, guitar, keyboard, percussion,
strings, synth, fx, brass, woodwinds), and reference examples/lego.json and
examples/lego.sh as usage examples; ensure the dit-vae usage/flags block
documents the dependency between --lego and --src-audio.
In `@tools/dit-vae.cpp`:
- Around line 116-117: Validate the provided --lego argument against the
documented whitelist instead of accepting any string: in the command-line
parsing branch that checks strcmp(argv[i], "--lego") and assigns lego_track (and
the similar block around the 429-440 range), check the value against a
predefined array or set of allowed track names (e.g., allowed_lego_tracks) and
if it is not present, print a clear error message and exit with a non-zero
status; perform this validation immediately when parsing (or right after
assignment) so typos like "guitarr" fail fast.
- Around line 155-158: The check that --lego requires --src-audio is incomplete:
also fail fast if --lego is used with a non-base DiT; update the validation (the
block that checks lego_track and src_audio_path) to additionally inspect the
variable that holds the chosen DiT variant/name (the same place that later loads
model weights) and return an error when the variant is not the base DiT (e.g.
print "[CLI] ERROR: --lego only supported with base DiT weights" and return 1).
Apply the same augmented validation at the other site mentioned (the later
validation around lines 432-440) so both early checks reject unsupported
configurations before loading turbo/SFT weights.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3475192d-227e-4c60-8fcd-ebe16a66fb53
📒 Files selected for processing (5)
.github/workflows/ci-build.ymlREADME.mdexamples/lego.jsonexamples/lego.shtools/dit-vae.cpp
Removed the echo statement for ensuring the base model.
Removed echo statements for steps in the script.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/lego.sh (1)
14-16: Script assumes execution from theexamples/directory.The relative paths (
../build/,../models/) require the script to be run from theexamples/directory. Consider adding acdto the script's directory or documenting this requirement.🔧 Suggested fix to handle working directory
set -eu +# Ensure we're running from the examples directory +cd "$(dirname "$0")" + # Step 0: ensure the base model is availableAlso applies to: 18-23
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/lego.sh` around lines 14 - 16, The script uses relative paths like ../build/ and ../models/ which require running from the examples/ directory; update the script entrypoint (e.g., in examples/lego.sh) to compute the script directory (using dirname "$0" / script path resolution) and cd to that directory or build absolute paths from that script directory before invoking ../build/ace-qwen3 and the model path so the script works regardless of the current working directory.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/lego.sh`:
- Around line 4-5: The "Step zero" comment in examples/lego.sh promises
downloading/ensuring the base DiT model but has no implementation; either add an
idempotent check-and-download block that verifies the base model file or
directory exists and if missing uses curl/wget (or an env var) to fetch it,
validates success and exits non-zero on failure, or else remove the misleading
"step zero"/"ensure the base model is available" comments; update the script to
perform the check before any use of the base model so subsequent steps (the code
referencing the base model) never run without it.
---
Nitpick comments:
In `@examples/lego.sh`:
- Around line 14-16: The script uses relative paths like ../build/ and
../models/ which require running from the examples/ directory; update the script
entrypoint (e.g., in examples/lego.sh) to compute the script directory (using
dirname "$0" / script path resolution) and cd to that directory or build
absolute paths from that script directory before invoking ../build/ace-qwen3 and
the model path so the script works regardless of the current working directory.
|
Quick demo of lego-mode-demo.mp4 |
Add error handling for --lego option requiring base DiT model
There was a problem hiding this comment.
♻️ Duplicate comments (1)
tools/dit-vae.cpp (1)
199-205:⚠️ Potential issue | 🟠 MajorReject unsupported SFT/non-base DiT weights here, not just turbo.
Line 199 only checks
acestep.is_turbo, so LEGO mode still runs with any non-turbo variant. The PR contract is stricter than that: it is base-model-only (acestep-v15-base), with turbo and SFT variants unsupported. Please validate the actual model identity in this guard and fail before proceeding with LEGO mode.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/dit-vae.cpp` around lines 199 - 205, The current guard only checks is_turbo and allows LEGO mode with non-base or SFT variants; update the early-exit that uses lego_track and is_turbo to also validate the model identity and SFT status from the model metadata (read the model id string metadata field and an SFT flag via gf_get_string/gf_get_bool on the gf object) and only allow LEGO when the model id exactly equals "acestep-v15-base" and the SFT flag is false; if the checks fail, emit the same stderr error and perform gf_close(&gf) and dit_ggml_free(&model) then return 1 (same behavior/location as the existing lego_track/is_turbo guard).
🧹 Nitpick comments (1)
tools/dit-vae.cpp (1)
435-442: Avoid silently truncating free-form--legoprompts.Line 436 caps the generated instruction at 255 bytes. Since
--legointentionally accepts free-form values, longer inputs will be clipped before tokenization and the model will see a different instruction than the user supplied. Building this prompt withstd::stringwould remove that hidden limit.Based on learnings, in
tools/dit-vae.cppthe--lego <track>CLI argument intentionally accepts any string without validation against the documented track list.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/dit-vae.cpp` around lines 435 - 442, The code currently builds the prompt into a fixed-size char buffer (lego_instruction) using snprintf, which silently truncates long free-form --lego inputs (lego_track); replace this with a std::string-based construction so the full user-supplied track string is preserved: remove lego_instruction and snprintf and instead concatenate into a std::string (e.g., using std::string instruction = std::string("Generate the ") + lego_track + " track based on the audio context:") and update any references to use that std::string (instruction) so no implicit 255-byte limit remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tools/dit-vae.cpp`:
- Around line 199-205: The current guard only checks is_turbo and allows LEGO
mode with non-base or SFT variants; update the early-exit that uses lego_track
and is_turbo to also validate the model identity and SFT status from the model
metadata (read the model id string metadata field and an SFT flag via
gf_get_string/gf_get_bool on the gf object) and only allow LEGO when the model
id exactly equals "acestep-v15-base" and the SFT flag is false; if the checks
fail, emit the same stderr error and perform gf_close(&gf) and
dit_ggml_free(&model) then return 1 (same behavior/location as the existing
lego_track/is_turbo guard).
---
Nitpick comments:
In `@tools/dit-vae.cpp`:
- Around line 435-442: The code currently builds the prompt into a fixed-size
char buffer (lego_instruction) using snprintf, which silently truncates long
free-form --lego inputs (lego_track); replace this with a std::string-based
construction so the full user-supplied track string is preserved: remove
lego_instruction and snprintf and instead concatenate into a std::string (e.g.,
using std::string instruction = std::string("Generate the ") + lego_track + "
track based on the audio context:") and update any references to use that
std::string (instruction) so no implicit 255-byte limit remains.
|
We'll integrate that quickly :) a little caustic polish and it'll be perfect |
…t field (#21) * apply requested changes
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tools/dit-vae.cpp (1)
455-458: Consider usingstd::transformfor the uppercase conversion.The pipeline flagged this raw loop. Using
std::transformis more idiomatic and concise.Suggested fix
- std::string track_upper = req.lego; - for (char & c : track_upper) { - c = (char) toupper((unsigned char) c); - } + std::string track_upper = req.lego; + std::transform(track_upper.begin(), track_upper.end(), track_upper.begin(), + [](unsigned char c) { return std::toupper(c); });This requires adding
#include <algorithm>at the top of the file.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/dit-vae.cpp` around lines 455 - 458, Replace the manual loop that uppercases req.lego (variable track_upper) with an idiomatic std::transform call; include <algorithm> at the top of the file and use std::transform(track_upper.begin(), track_upper.end(), track_upper.begin(), [](unsigned char ch){ return (char)std::toupper(ch); }) to preserve the same unsigned-char casting behavior as the original loop. Ensure the transformation operates on track_upper and preserves the existing variable name and semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tools/dit-vae.cpp`:
- Around line 455-458: Replace the manual loop that uppercases req.lego
(variable track_upper) with an idiomatic std::transform call; include
<algorithm> at the top of the file and use std::transform(track_upper.begin(),
track_upper.end(), track_upper.begin(), [](unsigned char ch){ return
(char)std::toupper(ch); }) to preserve the same unsigned-char casting behavior
as the original loop. Ensure the transformation operates on track_upper and
preserves the existing variable name and semantics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b7bdc21d-d5ad-418a-9e72-673e5db73d49
📒 Files selected for processing (6)
README.mdexamples/lego.jsonexamples/lego.shsrc/request.cppsrc/request.htools/dit-vae.cpp
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/lego.sh
Implements ACE-Step LEGO mode in
dit-vae: given a source audio file, generate a new instrument stem conditioned on it via--lego <track>+--src-audio. Requires the base model (acestep-v15-base); turbo/SFT do not support it.tools/dit-vae.cpp—--lego <track>flag; builds a"Generate the {track} track based on the audio context:"instruction injected into the DiT alongside--src-audioexamples/lego.json— request preset (audio_cover_strength=1.0,inference_steps=50,guidance_scale=7.0)examples/lego.sh— end-to-end demo: download base model → generate source track → apply legoREADME.md— usage docs and full track name reference./build/dit-vae \ --src-audio backing-track.wav \ --lego guitar \ --request examples/lego.json \ --text-encoder models/Qwen3-Embedding-0.6B-Q8_0.gguf \ --dit models/acestep-v15-base-Q8_0.gguf \ --vae models/vae-BF16.gguf \ --wavAvailable tracks:
vocals,backing_vocals,drums,bass,guitar,keyboard,percussion,strings,synth,fx,brass,woodwindsSummary by CodeRabbit
New Features
Documentation
Examples
Chores