Eliza/token trie sampler#16
Merged
Merged
Conversation
…::all
Two unrelated CI failures uncovered by the wave-9 CI greening pass:
1. ggml_istft layout mismatch — caused arm64-cpu-high-perf SIGABRT
ggml_istft (ggml.c:5105) declares the mag_phase input as
ne[0]=2 (mag/phase channel), ne[1]=F (n_fft/2+1), ne[2]=T (frames)
but the CPU op (ggml-cpu/ops.cpp:11505) AND the CUDA kernel
(ggml-cuda/istft.cu:137) were both reading T from ne[0] and asserting
T_ne == T, where T = ne[2] (= 2 in the new test). That fired
GGML_ASSERT(T_ne == T) inside test-backend-ops -b CPU on arm64,
aborting the entire test suite immediately after the ATTN_SCORE_POLAR
cases.
Fix the kernel + the test together to match the API contract:
- tests/test-backend-ops.cpp: ggml_new_tensor_3d(F32, 2, F, n_frames)
(was wrongly (n_frames, F, 2) — backwards from the API)
- ggml-cpu/ops.cpp: read CH=ne[0], F=ne[1], T=ne[2], and index the
interleaved [ch, f, t] storage as data[t*(2*F) + f*2 + ch].
- ggml-cuda/istft.cu: same change, plus take a single mag_phase
base pointer (channels are interleaved, not split planes).
Verified: ./build-codex-merge/bin/test-backend-ops -b CPU passes
15822/15822 tests (4 ISTFT cases now run for real, not aborting).
2. ggml::all missing from ggml-config.cmake — broke build-cmake-pkg/linux
examples/simple-cmake-pkg links against `ggml::all`, but our
ggml/cmake/ggml-config.cmake.in (rewritten to use ggml-targets.cmake)
dropped the synthetic INTERFACE target that upstream provides.
Add it back: aggregate every ggml::<backend> from GGML_AVAILABLE_BACKENDS
into a single INTERFACE IMPORTED target, matching upstream behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three independent failures on every push to eliza/token-trie-sampler: 1. Check vendor — vendor/sheredom/subprocess.h was stale vs the pinned upstream URL (b49c56e9fe21...). The fork's earlier "stopping_thread hang on child process exit" diff (67a7818) has already been incorporated upstream at that pin, so the local file just needed to be re-synced with `scripts/sync_vendor.py`. Drop the 5 obsolete lines so check-vendor reports a clean tree. 2. flake8 — convert_hf_to_gguf.py imported `transformers.AutoConfig` but only referenced it in comments. F401 unused-import. Removed. 3. Python Type-Check (ty) — conversion/qwen.py defines `_Qwen35MtpMixin` whose `super().*` chain resolves at runtime via the composed Model subclasses in convert_hf_to_gguf.py. ty cannot see that composition at the mixin level, so it flagged super().filter_tensors, super().set_gguf_parameters, super().prepare_metadata, super().modify_tensors and the inherited `ftype` / `metadata` attributes as unresolved. Add ./conversion/** to the existing unresolved-* override block (same treatment as tools/kokoro/tools). Verified locally: - `python3 scripts/sync_vendor.py` leaves vendor/ unchanged. - `python3 -m flake8 convert_hf_to_gguf.py` is clean for F401. - ty override covers the previously-flagged lines. Refs: PR #15
The vulkan-x64 Windows job fails with 'SPIRV-Headers config not found' because LunarG's Vulkan SDK installer (1.4.313.2) does not ship a CMake config for SPIRV-Headers on Windows, but ggml/src/ggml-vulkan/ CMakeLists.txt has 'find_package(SPIRV-Headers REQUIRED)'. Fix: - Clone + install KhronosGroup/SPIRV-Headers from the matching SDK tag (vulkan-sdk-<VULKAN_VERSION>) into RUNNER_TEMP and expose SPIRV-Headers_DIR + CMAKE_PREFIX_PATH so find_package succeeds. - Add 'fail-fast: false' to the windows-latest matrix so a single failing variant no longer cancels every other Windows build.
Adds the missing Metal backend dispatch for the iSTFT op so test-backend-ops on MTL0 now runs (and passes) all four ISTFT parity cases instead of reporting "not supported". Concrete changes: - ggml/src/ggml-metal/eliza-shipped/istft.metal: NEW corrected kernel. Uses the actual ggml_istft tensor layout [2, F, T] (ne[0]=2 mag/phase channel, interleaved as data[t*(2*F) + f*2 + ch]) — matches the CPU reference at ggml/src/ggml-cpu/ops.cpp fixed in 3de6dc8. The previously-staged version under eliza-kernels/istft.metal read mag_phase from split mag/phase half-planes which never matched the ggml_istft contract. Supports an optional `use_window` flag so the with_window=false test cases synthesise a periodic Hann inline. - ggml/src/ggml-metal/CMakeLists.txt: compile eliza-shipped/istft.metal into default.metallib for both the embedded-library path (iOS) and the on-disk metallib path (macOS). - ggml/src/ggml-metal/ggml-metal-impl.h: ggml_metal_kargs_istft (mirrors IstftParams in the shader). - ggml/src/ggml-metal/ggml-metal-device.{h,cpp}: ggml_metal_library_get_pipeline_istft kernel-name lookup. - ggml/src/ggml-metal/ggml-metal-ops.{h,cpp}: ggml_metal_op_istft dispatch (binds src0/src1/dst + IstftParams, dispatches one thread per output sample in a 32-wide threadgroup capped at n_out). - ggml/src/ggml-metal/ggml-metal-device.m: GGML_OP_ISTFT case in ggml_metal_supports_op, gated on F32 contiguous src0 with ne[0]==2 and (optional) F32 contiguous src1. Verified on Apple M4 Max: build-codex-merge/bin/test-backend-ops -b MTL0 -o ISTFT -> 4/4 PASS (n_fft=20/256/512, with_window=0 and 1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…OICE Cross-platform compile fixes that broke the previous default-on attempt (7c36249, reverted in 5b53366): - Windows MSVC: kokoro-istft.cpp used M_PI (a POSIX/GNU extension behind _USE_MATH_DEFINES). Replaced with a local K_PI constexpr, same pattern as the prior ggml/src/ggml-cpu/ops.cpp iSTFT fix (91c580a). Also added an explicit #include <string> to kokoro-predictor.h, which references std::string in its public signature. - Android NDK: tools/omnivoice/CMakeLists.txt unconditionally linked omnivoice-dac-parity against ggml-cpu / ggml-base. When those CMake targets are not defined on a given cross-compile config, CMake silently falls back to literal `-lggml-cpu` link flags against a phantom library. Guarded both link calls with `if(TARGET ggml-cpu)` / `if(TARGET ggml-base)`, matching tools/kokoro/CMakeLists.txt. - iOS XCFramework (LLAMA_BUILD_TOOLS=OFF, LLAMA_BUILD_MTMD=OFF): the fused `elizainference` library links mtmd_* symbols via eliza-inference-ffi.cpp (the ASR pipeline). Gated the whole `elizainference` target on `if(TARGET mtmd)` so iOS builds skip it entirely. omnivoice_lib + omnivoice-tts continue to build (they don't touch mtmd); the Eliza iOS runtime can opt back in by setting LLAMA_BUILD_MTMD=ON when ASR is needed. K5 Kokoro + OmniVoice now build clean on Darwin (verified locally: kokoro-tts, omnivoice-tts, omnivoice-dac-parity, elizainference, plus llama-cli smoke test passing). Cross-platform CI (Windows, Android, Apple/Xcode) will confirm the other three platforms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pairs with 28603bd (Metal ISTFT dispatch). The Vulkan iSTFT path had two correctness bugs that left it at 0/4 on test-backend-ops parity vs CPU despite the dispatch + supports_op being fully wired: 1. Tensor layout mismatch. Both istft.comp and the .cpp dispatch were reading T from src0->ne[0] and indexing mag_phase as two split planes (mag at [0..F*T], phase at [F*T..2*F*T]). ggml_istft (ggml.c:5105) actually lays mag_phase out as [2, F, T] with ne[0]=2 (mag/phase channel), ne[1]=F, ne[2]=T and interleaves channels — element [ch,f,t] sits at data[t*(2*F) + f*2 + ch]. Same fix that landed for CPU + CUDA in 3de6dc8, now applied to Vulkan. 2. Window tensor was ignored. When the user passes an explicit window (test_istft with_window=true), the shader silently substituted a periodic Hann, so the parity check fed two different windows to the two backends and naturally diverged. The shader now binds the window buffer at binding 1 and gates on a new use_window push constant; the dispatch path passes src1 through to ggml_vk_op_f32 unconditionally (aliasing src0 when no window tensor exists, which keeps the descriptor set valid since the shader never reads it in that case). Side-cleanup: the unused vk_op_istft_ola_push_constants struct is gone; the OLA-pass shader (istft_ola.comp) was never registered or dispatched and has been left untouched — it remains dead but out-of-scope for this fix. Verified on Apple M4 Max (MoltenVK): build-vulkan-validate/bin/test-backend-ops -b Vulkan0 -o ISTFT -> 4/4 PASS (n_fft=20/256/512, with_window=0 and 1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that the Metal (28603bd) and Vulkan (e59d53f) iSTFT dispatch paths are wired and pass all 4 test-backend-ops parity cases against the CPU reference at ggml-cpu/ops.cpp:ggml_compute_forward_istft_f32, promote ISTFT from "no CI coverage" straight to hard-fail in both validation workflows. - eliza-metal-validation.yml: add ISTFT to the test-backend-ops op sweep and add a gated check block that fails the job on any ISTFT FAIL line or "not supported" line. - eliza-vulkan-validation.yml: same pattern, against build-vulkan. A FAIL means the GPU iSTFT diverges from the CPU reference (typically a layout regression or a missing Hermitian-symmetry term). A "not supported" line means the supports_op gate regressed and the parity test silently stopped running — also a hard fail because we need to know the moment GPU ISTFT coverage drops out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds scripts/cuda-mtp-validate.sh and extends the cuda-runtime-validation job in eliza-cuda-validation.yml to actually exercise the MTP K-snapshot path on a real NVIDIA GPU. Closes the TODO(cuda-mtp-validation) marker in ggml/src/ggml-cuda/gated_delta_net.cu (from commit 142e7ac -- port of upstream PR #22673 multi-token-prediction state snapshots). What's covered: - Full test-backend-ops -b CUDA0 -o GATED_DELTA_NET sweep (CPU reference vs CUDA, all registered K=1 + K>1 cases including KDA + permuted). - Assertion that >=4 K>1 cases were actually scheduled (guards against silent skip / kernel registration regressions). - Optional llama-cli end-to-end smoke with --spec-type draft-mtp on the unsloth Qwen3.5-2B MTP GGUF (skipped cleanly when the model is absent). The script works in two modes: bash scripts/cuda-mtp-validate.sh --list-tests # no GPU, dry-run bash scripts/cuda-mtp-validate.sh --build-only # compile only bash scripts/cuda-mtp-validate.sh # real GPU required The CI runtime job remains gated on the self-hosted-cuda runner label and is auto-skipped when that runner isn't provisioned -- no false positives on hosts without a GPU. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Kokoro decode path at tools/kokoro/src/kokoro.cpp:547 built the iSTFT
input tensor as ne={T,F,2,1} (time, freq, channel, batch), but commit
3de6dc8 fixed ggml_istft + Metal/Vulkan/CPU kernels to use the API
contract ne={2,F,T} (channel-first interleaved). The K5 Kokoro caller
wasn't updated then, so kokoro-tts aborted with
GGML_ASSERT(CH_ne == 2) failed in ggml_compute_forward_istft.
This commit aligns the caller. Verified end-to-end: kokoro-tts now
produces a valid WAV from the staged GGUF in both eliza-1-2b and
eliza-1-4b bundles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ing decl ggml_graph_compute_with_ctx was removed from upstream ggml in favor of the ggml-backend interface. omnivoice-dac-parity referenced the removed symbol -> undefined reference on CI (android, vulkan, 3rd-party). Swap to ggml_backend_cpu_init + ggml_backend_cpu_set_n_threads + ggml_backend_graph_compute + ggml_backend_free. Tensors live in the user-managed ctx buffer; the CPU backend computes against them directly. Also adds a forward declaration for kokoro_model_ggml_ctx immediately before its definition in kokoro.cpp so -Wmissing-declarations sees a prior declaration at the definition site. The matching extern lives in the sibling TU (kokoro-predictor.cpp). CI (3rd-party / android / vulkan) now build clean. Smoke test verified: kokoro-tts still produces real audio (36480 samples @ 24kHz, peak 0.0861) from the staged GGUF in eliza-1-2b.bundle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
Additional information
Requirements