Optimize forward DFT with Cooley-Tukey FFT #125
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #125 +/- ##
==========================================
+ Coverage 82.66% 82.88% +0.21%
==========================================
Files 28 29 +1
Lines 6046 6110 +64
==========================================
+ Hits 4998 5064 +66
+ Misses 812 811 -1
+ Partials 236 235 -1
2f99996 to 1f7b4bd
1f7b4bd to fa9b6b9
JoTurk left a comment
Thank you! It would be cool to add a few benchmarks.
Pull request overview
This PR speeds up the CELT encoder’s forward complex DFT (used in forwardMDCT) by replacing the previous O(N²) implementation with a Cooley–Tukey mixed-radix FFT optimized for CELT frame sizes (2^k × 15). This aligns the encoder path with the existing FFT-based approach already used on the decoder side.
Changes:
- Replaced forwardComplexDFT’s naive DFT with an FFT call plus 1/N scaling.
- Added a mixed-radix FFT implementation (radix-2 stages + direct DFT for the odd factor).
- Added unit tests and benchmarks validating FFT correctness vs a naive DFT and round-trip properties.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| internal/celt/mdct.go | Switches encoder-side forward complex DFT to FFT + scaling. |
| internal/celt/fft.go | Introduces mixed-radix FFT implementation used by forwardComplexDFT. |
| internal/celt/fft_test.go | Adds correctness tests (vs naive DFT / round-trip) and benchmarks for the new FFT. |
35a38b8 to fd5be78
Description
forwardComplexDFT in the encoder path was doing an O(N²) naive DFT; at N=480 that's ~230k multiply-accumulates per call.
Added a Cooley-Tukey mixed-radix FFT in fft.go. CELT frame sizes factor as 2^k × 15, so it handles the radix-2 part with bit-reversal + butterfly stages and uses a direct O(K²) DFT for the odd factor K=15, which is small enough that the quadratic cost doesn't matter.
Benchmarks on my machine (N=480): ~6080µs naive vs ~233µs with FFT, roughly 26x.
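As a sanity check on the size of that speedup, a back-of-the-envelope count of complex multiplications can be sketched as below. This is only an operation-count model with hypothetical helper names (`naiveMACs`, `mixedRadixMuls`); it ignores additions, memory traffic, and twiddle computation, so it is not expected to reproduce the measured ratio exactly.

```go
package main

// naiveMACs is the multiply-accumulate count of a direct N-point DFT.
func naiveMACs(n int) int {
	return n * n
}

// mixedRadixMuls counts complex multiplications for a transform that
// splits by 2 while the length is even and falls back to a direct DFT
// on the odd remainder:
//   C(n) = 2*C(n/2) + n/2   for even n (one twiddle multiply per butterfly)
//   C(m) = m*m              for odd m
func mixedRadixMuls(n int) int {
	if n%2 == 1 {
		return n * n
	}
	return 2*mixedRadixMuls(n/2) + n/2
}
```

For N=480 = 2^5 × 15 this model gives `naiveMACs(480) = 230400` and `mixedRadixMuls(480) = 8400` (32 direct 15-point DFTs at 225 multiplies each, plus 5 stages of 240 butterflies), a ratio of about 27x, which is in the same range as the benchmark above.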
The decoder side (inverseComplexDFT) already had the butterfly FFT from a previous pass; this just closes the gap on the encoder side.
Reference issue
Fixes #...