
Lego mode for acestep-cpp#21

Merged
ServeurpersoCom merged 6 commits into ServeurpersoCom:master from audiohacking:lego-mode
Mar 10, 2026

Conversation

@lmangani
Contributor

@lmangani lmangani commented Mar 10, 2026

Implements ACE-Step LEGO mode in dit-vae: given a source audio file, generate a new instrument stem conditioned on it via --lego <track> + --src-audio. Requires the base model (acestep-v15-base); turbo/SFT do not support it.

  • tools/dit-vae.cpp: --lego <track> flag; builds a "Generate the {track} track based on the audio context:" instruction injected into the DiT alongside --src-audio
  • examples/lego.json — request preset (audio_cover_strength=1.0, inference_steps=50, guidance_scale=7.0)
  • examples/lego.sh — end-to-end demo: download base model → generate source track → apply lego
  • README.md — usage docs and full track name reference
./build/dit-vae \
    --src-audio backing-track.wav \
    --lego guitar \
    --request examples/lego.json \
    --text-encoder models/Qwen3-Embedding-0.6B-Q8_0.gguf \
    --dit models/acestep-v15-base-Q8_0.gguf \
    --vae models/vae-BF16.gguf \
    --wav

Available tracks: vocals, backing_vocals, drums, bass, guitar, keyboard, percussion, strings, synth, fx, brass, woodwinds

Summary by CodeRabbit

  • New Features

    • Added a Lego mode for audio transformation via a new --lego option (requires source audio and is limited to base DiT models).
  • Documentation

    • Added Lego usage, configuration notes, example request JSON, required flags, and list of supported track names to the README.
  • Examples

    • Added a sample lego.json and a script demonstrating a three-step Lego audio workflow.
  • Chores

    • CI linting updated to skip additional vendor directories during checks.

* Add LEGO mode: --lego <track> flag for dit-vae, example files, README docs
@coderabbitai
Contributor

coderabbitai bot commented Mar 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a new "Lego mode" (--lego <track>) with validation (requires --src-audio, model restrictions, and track-name checks), adds a lego field to requests, updates README and examples for Lego usage, and expands CI clang-format pruning to skip vendor and mp3.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CI Configuration: .github/workflows/ci-build.yml | Expanded clang-format prune/exclude to also skip vendor and mp3 directories when locating files to check. |
| Documentation: README.md | Added Lego mode documentation, usage notes, example JSON snippet, and updated Request JSON reference to include "lego": "". |
| Examples: examples/lego.json, examples/lego.sh | Added examples/lego.json with Lego parameters and examples/lego.sh, a 3-step pipeline demonstrating base model download, source generation, and dit-vae --lego <track> --src-audio usage. |
| Tooling / Core CLI: tools/dit-vae.cpp | Implemented --lego <track> handling and validation: enforces --src-audio, disallows turbo with Lego, validates track against a 12-entry list, forces audio_cover_strength=1.0, logs Lego actions, and builds an instruction string: "Generate the {TRACK} track based on the audio context:". |
| Request Model: src/request.h, src/request.cpp | Added public std::string lego to AceRequest, initialized to empty, parsed from and emitted to JSON when non-empty, and displayed in request dumps when present. |

Sequence Diagram(s)

sequenceDiagram
    participant User as User/Script
    participant CLI as dit-vae (CLI)
    participant Request as Request Parser
    participant Model as Base Model Checker
    participant DitVaeCore as Dit-VAE Processor
    participant Files as Audio Files

    User->>CLI: run `dit-vae --lego <track> --src-audio ...`
    CLI->>Request: parse request (includes `lego` field)
    Request-->>CLI: lego value set / validated
    CLI->>Model: validate model compatibility (no turbo with lego)
    Model-->>CLI: validation result
    CLI->>DitVaeCore: build instruction "Generate the {TRACK} track based on the audio context:"
    DitVaeCore->>Files: read src-audio, produce lego-transformed .wav
    Files-->>User: output .wav ready

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I chewed a spec and stitched a tune,
A Lego track beneath the moon.
Readme shines and examples hum,
CI skips clutter—off we run.
Small changes, big-imagined drum. 🎸

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Lego mode for acestep-cpp' is fully related to the main change: implementing LEGO mode functionality across multiple files (dit-vae.cpp, request files, examples, and documentation). |


@lmangani lmangani marked this pull request as ready for review March 10, 2026 13:14
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/lego.json`:
- Around line 1-5: This preset must explicitly set non-vocal conditioning so
dit-vae triggers the documented instrumental path; add a top-level "lyrics"
property with the value "[Instrumental]" in the same JSON object alongside
"caption", "audio_cover_strength", "inference_steps", and "guidance_scale" so
the guitar-stem example is forced to non-vocal conditioning.

In `@README.md`:
- Around line 261-269: Update the dit-vae CLI reference to include the new
--lego <track> flag and its required --src-audio option, mirroring the Lego
section: show that only the base model "acestep-v15-base" supports lego mode,
recommend setting audio_cover_strength=1.0 in the request, list the available
track names (vocals, backing_vocals, drums, bass, guitar, keyboard, percussion,
strings, synth, fx, brass, woodwinds), and reference examples/lego.json and
examples/lego.sh as usage examples; ensure the dit-vae usage/flags block
documents the dependency between --lego and --src-audio.

In `@tools/dit-vae.cpp`:
- Around line 116-117: Validate the provided --lego argument against the
documented whitelist instead of accepting any string: in the command-line
parsing branch that checks strcmp(argv[i], "--lego") and assigns lego_track (and
the similar block around the 429-440 range), check the value against a
predefined array or set of allowed track names (e.g., allowed_lego_tracks) and
if it is not present, print a clear error message and exit with a non-zero
status; perform this validation immediately when parsing (or right after
assignment) so typos like "guitarr" fail fast.
- Around line 155-158: The check that --lego requires --src-audio is incomplete:
also fail fast if --lego is used with a non-base DiT; update the validation (the
block that checks lego_track and src_audio_path) to additionally inspect the
variable that holds the chosen DiT variant/name (the same place that later loads
model weights) and return an error when the variant is not the base DiT (e.g.
print "[CLI] ERROR: --lego only supported with base DiT weights" and return 1).
Apply the same augmented validation at the other site mentioned (the later
validation around lines 432-440) so both early checks reject unsupported
configurations before loading turbo/SFT weights.
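
The fail-fast checks requested in the comments above can be sketched in C++ as follows. This is only an illustration under stated assumptions: `check_lego_options`, `kLegoTracks`, and the error strings are hypothetical names, not code from tools/dit-vae.cpp, and the `is_turbo` flag stands in for whatever the CLI uses to detect the model variant. The track names are the twelve listed in the PR description.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <string_view>

// Track names as listed in the PR description.
constexpr std::array<std::string_view, 12> kLegoTracks = {
    "vocals", "backing_vocals", "drums", "bass", "guitar", "keyboard",
    "percussion", "strings", "synth", "fx", "brass", "woodwinds"};

// Hypothetical helper: returns an empty string when the options are
// acceptable, otherwise a message describing the first problem found.
std::string check_lego_options(const std::string& lego_track,
                               const std::string& src_audio_path,
                               bool is_turbo) {
    if (lego_track.empty()) {
        return "";  // LEGO mode was not requested, nothing to check
    }
    if (src_audio_path.empty()) {
        return "--lego requires --src-audio";
    }
    if (is_turbo) {
        return "--lego is not supported with turbo weights";
    }
    const bool known =
        std::find(kLegoTracks.begin(), kLegoTracks.end(),
                  std::string_view(lego_track)) != kLegoTracks.end();
    if (!known) {
        return "unknown --lego track: " + lego_track;  // e.g. "guitarr"
    }
    return "";
}
```

A caller would print any non-empty result to stderr and return a non-zero exit code before model weights are loaded, so that a typo or an unsupported combination is rejected early.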

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3475192d-227e-4c60-8fcd-ebe16a66fb53

📥 Commits

Reviewing files that changed from the base of the PR and between a56c9c6 and fc318db.

📒 Files selected for processing (5)
  • .github/workflows/ci-build.yml
  • README.md
  • examples/lego.json
  • examples/lego.sh
  • tools/dit-vae.cpp

Removed the echo statement for ensuring the base model.
Removed echo statements for steps in the script.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/lego.sh (1)

14-16: Script assumes execution from the examples/ directory.

The relative paths (../build/, ../models/) require the script to be run from the examples/ directory. Consider adding a cd to the script's directory or documenting this requirement.

🔧 Suggested fix to handle working directory
 set -eu
 
+# Ensure we're running from the examples directory
+cd "$(dirname "$0")"
+
 # Step 0: ensure the base model is available

Also applies to: 18-23

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/lego.sh` around lines 14 - 16, The script uses relative paths like
../build/ and ../models/ which require running from the examples/ directory;
update the script entrypoint (e.g., in examples/lego.sh) to compute the script
directory (using dirname "$0" / script path resolution) and cd to that directory
or build absolute paths from that script directory before invoking
../build/ace-qwen3 and the model path so the script works regardless of the
current working directory.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/lego.sh`:
- Around line 4-5: The "Step zero" comment in examples/lego.sh promises
downloading/ensuring the base DiT model but has no implementation; either add an
idempotent check-and-download block that verifies the base model file or
directory exists and if missing uses curl/wget (or an env var) to fetch it,
validates success and exits non-zero on failure, or else remove the misleading
"step zero"/"ensure the base model is available" comments; update the script to
perform the check before any use of the base model so subsequent steps (the code
referencing the base model) never run without it.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e447849b-e792-4896-9e84-5b61292cd832

📥 Commits

Reviewing files that changed from the base of the PR and between fc318db and 715be9a.

📒 Files selected for processing (1)
  • examples/lego.sh

@lmangani
Contributor Author

Quick demo of lego.sh generating a random track (bottom) and a matching guitar lego track (top)

lego-mode-demo.mp4

Add error handling for --lego option requiring base DiT model
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
tools/dit-vae.cpp (1)

199-205: ⚠️ Potential issue | 🟠 Major

Reject unsupported SFT/non-base DiT weights here, not just turbo.

Line 199 only checks acestep.is_turbo, so LEGO mode still runs with any non-turbo variant. The PR contract is stricter than that: it is base-model-only (acestep-v15-base), with turbo and SFT variants unsupported. Please validate the actual model identity in this guard and fail before proceeding with LEGO mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/dit-vae.cpp` around lines 199 - 205, The current guard only checks
is_turbo and allows LEGO mode with non-base or SFT variants; update the
early-exit that uses lego_track and is_turbo to also validate the model identity
and SFT status from the model metadata (read the model id string metadata field
and an SFT flag via gf_get_string/gf_get_bool on the gf object) and only allow
LEGO when the model id exactly equals "acestep-v15-base" and the SFT flag is
false; if the checks fail, emit the same stderr error and perform gf_close(&gf)
and dit_ggml_free(&model) then return 1 (same behavior/location as the existing
lego_track/is_turbo guard).
🧹 Nitpick comments (1)
tools/dit-vae.cpp (1)

435-442: Avoid silently truncating free-form --lego prompts.

Line 436 caps the generated instruction at 255 bytes. Since --lego intentionally accepts free-form values, longer inputs will be clipped before tokenization and the model will see a different instruction than the user supplied. Building this prompt with std::string would remove that hidden limit.

Based on learnings, in tools/dit-vae.cpp the --lego <track> CLI argument intentionally accepts any string without validation against the documented track list.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/dit-vae.cpp` around lines 435 - 442, The code currently builds the
prompt into a fixed-size char buffer (lego_instruction) using snprintf, which
silently truncates long free-form --lego inputs (lego_track); replace this with
a std::string-based construction so the full user-supplied track string is
preserved: remove lego_instruction and snprintf and instead concatenate into a
std::string (e.g., using std::string instruction = std::string("Generate the ")
+ lego_track + " track based on the audio context:") and update any references
to use that std::string (instruction) so no implicit 255-byte limit remains.
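
To make the truncation concern concrete, here is a small C++ sketch comparing the two approaches. The function names are hypothetical, and the 256-byte buffer is an assumption inferred from the 255-byte cap mentioned in the comment, not a copy of the code in tools/dit-vae.cpp.

```cpp
#include <cstdio>
#include <string>

// Fixed-buffer version: snprintf writes at most sizeof(buf) - 1 characters
// plus a terminating NUL, so a long track string is cut off without an error.
std::string instruction_fixed(const std::string& lego_track) {
    char buf[256];  // assumed size, matching the 255-byte cap in the review
    std::snprintf(buf, sizeof(buf),
                  "Generate the %s track based on the audio context:",
                  lego_track.c_str());
    return std::string(buf);
}

// std::string version: the result grows to fit the full input.
std::string instruction_string(const std::string& lego_track) {
    return "Generate the " + lego_track + " track based on the audio context:";
}
```

For short inputs such as "guitar" both functions return the same text. With a 300-character track string the fixed-buffer result stops at 255 characters and loses the trailing "track based on the audio context:" part, while the std::string result keeps it.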

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1aa6596a-bf33-437c-b0e7-39339f03d1c4

📥 Commits

Reviewing files that changed from the base of the PR and between f774da5 and 90c365a.

📒 Files selected for processing (1)
  • tools/dit-vae.cpp

@ServeurpersoCom
Owner

We'll integrate that quickly :) a little caustic polish and it'll be perfect

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tools/dit-vae.cpp (1)

455-458: Consider using std::transform for the uppercase conversion.

The pipeline flagged this raw loop. Using std::transform is more idiomatic and concise.

Suggested fix
-            std::string track_upper = req.lego;
-            for (char & c : track_upper) {
-                c = (char) toupper((unsigned char) c);
-            }
+            std::string track_upper = req.lego;
+            std::transform(track_upper.begin(), track_upper.end(), track_upper.begin(),
+                           [](unsigned char c) { return std::toupper(c); });

This requires adding #include <algorithm> at the top of the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/dit-vae.cpp` around lines 455 - 458, Replace the manual loop that
uppercases req.lego (variable track_upper) with an idiomatic std::transform
call; include <algorithm> at the top of the file and use
std::transform(track_upper.begin(), track_upper.end(), track_upper.begin(),
[](unsigned char ch){ return (char)std::toupper(ch); }) to preserve the same
unsigned-char casting behavior as the original loop. Ensure the transformation
operates on track_upper and preserves the existing variable name and semantics.
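
A self-contained C++ version of this suggestion, combined with the uppercase instruction format described in the walkthrough, might look like the sketch below. `to_upper_ascii` and `lego_instruction` are hypothetical helper names introduced for illustration; the PR itself performs the conversion inline on `track_upper`.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Uppercase a copy of the input. The lambda takes unsigned char because
// passing a negative char value to std::toupper is undefined behavior.
std::string to_upper_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char ch) {
        return static_cast<char>(std::toupper(ch));
    });
    return s;
}

// Build the instruction string in the "{TRACK}" form from the walkthrough.
std::string lego_instruction(const std::string& track) {
    return "Generate the " + to_upper_ascii(track) +
           " track based on the audio context:";
}
```

In the default "C" locale, `lego_instruction("guitar")` yields "Generate the GUITAR track based on the audio context:", and characters without an uppercase form, such as the underscore in backing_vocals, pass through unchanged.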

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b7bdc21d-d5ad-418a-9e72-673e5db73d49

📥 Commits

Reviewing files that changed from the base of the PR and between 90c365a and aaaa027.

📒 Files selected for processing (6)
  • README.md
  • examples/lego.json
  • examples/lego.sh
  • src/request.cpp
  • src/request.h
  • tools/dit-vae.cpp
✅ Files skipped from review due to trivial changes (1)
  • README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/lego.sh

@ServeurpersoCom ServeurpersoCom merged commit a891944 into ServeurpersoCom:master Mar 10, 2026
3 of 4 checks passed

3 participants