Optimize CELT inverse FFT #121
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #121 +/- ##
==========================================
+ Coverage 82.87% 82.98% +0.11%
==========================================
Files 26 26
Lines 5711 5825 +114
==========================================
+ Hits 4733 4834 +101
- Misses 751 765 +14
+ Partials 227 226 -1
RFC 6716 / 8251 conformance
Status: pass
The action extracts the RFC 6716 reference implementation, applies the RFC 8251 decoder update patch, and then builds the patched reference tools.
Legend: numeric cells are
Inputs use the shared RFC 6716 / RFC 8251 bitstream corpus; accepted references follow RFC 8251 Section 11.
Pull request overview
Replaces CELT's recursive complex inverse FFT with an iterative mixed-radix (2/3/4/5) implementation that precomputes the FFT factorization, twiddle table, and bit-reversal permutation when the inverse-transform plan is constructed. The surrounding inverse-MDCT pipeline, decoder API, and bitstream handling are unchanged. The author reports a ~33% throughput improvement on the 48 kHz stereo conformance benchmark, with go test ./... and the full RFC 6716 conformance matrix passing.
Changes:
- Replaces `fftCos`/`fftSin` arrays with a unified `fftTwiddles []complex32` table, and adds `fftBitrev` and `fftFactors` to the plan.
- Removes the recursive driver (`inverseComplexFFTRecursive`, `fftRadix`, `fftTwiddle`) and replaces it with an iterative driver plus specialized radix-2/3/4/5 butterflies and a generic fallback.
- Adds plan-construction helpers (`newFFTFactors`, `newFFTBitrev`, `fillFFTBitrev`) and small `complex32` arithmetic helpers.
Summary
Replace CELT's recursive complex inverse FFT with an iterative mixed-radix implementation.
This PR changes only `internal/celt/synthesis.go`. It is a pure-Go optimization: the surrounding inverse MDCT mapping, CELT synthesis behavior, decoder API, and encoded-bitstream handling remain unchanged.
Context
CELT reconstructs time-domain audio through the following synthesis pipeline:
This PR changes only the complex inverse FFT in the middle of that pipeline.
Previous algorithm
The previous implementation recursively divided each FFT into smaller sub-transforms:
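The removed code is not reproduced here. As a rough illustration of the recursive shape, the Go sketch below handles radix 2 only, whereas the PR describes the real driver as mixed-radix. The name `inverseFFTRecursive` and the use of `complex128` are assumptions made for this example.

```go
package main

import "math"

// inverseFFTRecursive is a hypothetical radix-2-only illustration of a
// recursive inverse FFT; it is not the code this PR removes.
// len(x) must be a power of two (at least 1). The result is unnormalized.
func inverseFFTRecursive(x []complex128) []complex128 {
	n := len(x)
	if n == 1 {
		return []complex128{x[0]}
	}
	// In this sketch every level allocates and fills two half-size inputs.
	even := make([]complex128, n/2)
	odd := make([]complex128, n/2)
	for i := 0; i < n/2; i++ {
		even[i] = x[2*i]
		odd[i] = x[2*i+1]
	}
	e := inverseFFTRecursive(even)
	o := inverseFFTRecursive(odd)

	out := make([]complex128, n)
	for k := 0; k < n/2; k++ {
		// The twiddle is recomputed on every call; the inverse transform
		// uses the positive exponent exp(+2*pi*i*k/n).
		angle := 2 * math.Pi * float64(k) / float64(n)
		w := complex(math.Cos(angle), math.Sin(angle))
		out[k] = e[k] + w*o[k]
		out[k+n/2] = e[k] - w*o[k]
	}
	return out
}
```

In this sketch, the per-level allocations, copies, and trigonometric calls all happen while decoding, which is the kind of work a precomputed plan can move out of the hot path.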
The recursive implementation is mathematically correct, but it adds avoidable work in a CELT decode hotspot:
New algorithm
The new implementation performs the same mixed-radix inverse FFT iteratively.
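When the inverse-transform plan is created, it now precomputes the FFT factorization (`fftFactors`), the twiddle table (`fftTwiddles`), and the bit-reversal permutation (`fftBitrev`).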
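A minimal sketch of what such plan-time precomputation could look like follows. The type `fftPlan`, the constructor `newPlan`, the radix preference order, and the use of Go's built-in `complex64` in place of the PR's `complex32` type are all assumptions for illustration, not the PR's actual layout.

```go
package main

import "math"

// fftPlan is a hypothetical stand-in for the PR's plan type. The field names
// echo the PR, but the layout and element types here are assumptions.
type fftPlan struct {
	factors  []int       // radix of each stage, outermost first
	twiddles []complex64 // twiddles[k] = exp(+2*pi*i*k/n), the inverse-transform sign
	bitrev   []int       // bitrev[i] = output slot that receives input sample i
}

// newPlan precomputes everything that depends only on the transform size n.
func newPlan(n int) fftPlan {
	p := fftPlan{twiddles: make([]complex64, n), bitrev: make([]int, n)}

	// Peel off radices, trying 4 first, then 2, 3, and 5 (an assumed order).
	// Any leftover factor becomes a single generic stage.
	rem := n
	for _, r := range []int{4, 2, 3, 5} {
		for rem%r == 0 && rem > 1 {
			p.factors = append(p.factors, r)
			rem /= r
		}
	}
	if rem > 1 {
		p.factors = append(p.factors, rem)
	}

	// One table of n-th roots of unity serves every stage.
	for k := range p.twiddles {
		angle := 2 * math.Pi * float64(k) / float64(n)
		p.twiddles[k] = complex(float32(math.Cos(angle)), float32(math.Sin(angle)))
	}

	// Mixed-radix digit reversal: read the digits of i with the stage
	// radices, least significant first, and write them most significant first.
	for i := range p.bitrev {
		idx, span, pos := i, n, 0
		for _, r := range p.factors {
			span /= r
			pos += (idx % r) * span
			idx /= r
		}
		p.bitrev[i] = pos
	}
	return p
}
```

For a size that is a power of two with only radix-2 stages, the permutation reduces to ordinary bit reversal; with mixed radices it is a digit reversal, and the direction of the mapping (input index to output slot) matters.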
During decoding, the hot path becomes:
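One plausible shape for that hot path is sketched below: scatter the input through the precomputed permutation, then run the butterfly stages from the smallest transforms outward. The function `inverseFFT`, its parameter layout, the fixed scratch size, and the use of `complex64` are assumptions; this version uses only a generic butterfly and is not the PR's driver.

```go
package main

// inverseFFT is a hypothetical iterative mixed-radix driver, not the PR's code.
// factors lists the stage radices outermost first, perm[i] is the output slot
// for input sample i, and tw[k] must hold exp(+2*pi*i*k/n) for n = len(in).
// The result is unnormalized. No allocation or trigonometry happens here.
func inverseFFT(out, in []complex64, factors, perm []int, tw []complex64) {
	n := len(in)

	// Step 1: scatter the input into digit-reversed order.
	for i, v := range in {
		out[perm[i]] = v
	}

	// Step 2: combine transforms stage by stage, innermost radix first.
	var t [8]complex64 // scratch; this sketch assumes every radix is at most 8
	sub := 1           // size of the transforms that are already complete
	for s := len(factors) - 1; s >= 0; s-- {
		p := factors[s]
		size := p * sub
		for base := 0; base < n; base += size {
			for k := 0; k < sub; k++ {
				// Gather the k-th output of each sub-transform and apply
				// its twiddle. j*k < size, so the index stays below n.
				for j := 0; j < p; j++ {
					t[j] = out[base+k+j*sub] * tw[j*k*(n/size)]
				}
				// Generic radix-p butterfly: a p-point inverse DFT.
				for q := 0; q < p; q++ {
					var acc complex64
					for j := 0; j < p; j++ {
						acc += t[j] * tw[((j*q)%p)*(n/p)]
					}
					out[base+k+q*sub] = acc
				}
			}
		}
		sub = size
	}
}
```

For example, with `n = 4`, `factors = []int{2, 2}`, `perm = []int{0, 2, 1, 3}`, and the table `{1, i, -1, -i}`, an impulse at input index 1 produces `{1, i, -1, -i}`.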
A butterfly combines smaller transforms into a larger one. For radix 2, the operation is conceptually:
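The textbook radix-2 butterfly can be written as a two-line Go function. `butterfly2` is an illustrative name, and `complex64` stands in for the PR's `complex32` type.

```go
package main

// butterfly2 is an illustrative radix-2 butterfly, not the PR's code.
// a and b are matching outputs of two half-size transforms, and w is the
// twiddle factor for that output index.
func butterfly2(a, b, w complex64) (lo, hi complex64) {
	t := w * b
	return a + t, a - t
}
```

The single product `w * b` is shared by both outputs, so each pair of results costs one complex multiply and two complex additions.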
The radix-3, radix-4, and radix-5 paths apply equivalent specialized formulas. These avoid the generic inner summation loop for the normal Opus transform sizes.
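To illustrate what "specialized" can mean, the sketch below contrasts a generic butterfly, which evaluates the small inverse DFT by direct summation, with a hand-unrolled radix-4 version. Both functions are hypothetical and assume their inputs have already been multiplied by the stage twiddles; the PR's actual formulas may be arranged differently.

```go
package main

// butterflyGeneric computes a p-point inverse DFT by direct summation.
// roots[k] must hold exp(+2*pi*i*k/p). It costs p*p complex multiplies.
func butterflyGeneric(t, roots []complex64) []complex64 {
	p := len(t)
	out := make([]complex64, p)
	for q := 0; q < p; q++ {
		for j := 0; j < p; j++ {
			out[q] += t[j] * roots[(j*q)%p]
		}
	}
	return out
}

// butterfly4 computes the same 4-point inverse DFT with additions only.
// The fourth roots of unity are 1, i, -1, -i, so every product collapses
// into a sign change or a swap of real and imaginary parts.
func butterfly4(t0, t1, t2, t3 complex64) (x0, x1, x2, x3 complex64) {
	s02, d02 := t0+t2, t0-t2
	s13, d13 := t1+t3, t1-t3
	// Multiplying by i maps (re, im) to (-im, re).
	id13 := complex(-imag(d13), real(d13))
	return s02 + s13, d02 + id13, s02 - s13, d02 - id13
}
```

The unrolled version needs no table lookups, no modulo arithmetic, and no complex multiplies inside the butterfly itself.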
Why this is faster
The asymptotic complexity remains O(N log N). The improvement comes from reducing constant costs in the decode hot path:
Performance
Measured with the production-shaped RFC conformance-vector benchmark: 48 kHz stereo serial `DecodeToInt16`, with verification excluded from benchmark timing.
The comparison used alternating full-corpus runs to reduce machine-load bias.
Validation
`go test ./...`
`GOLANGCI_LINT_CACHE=/private/tmp/golangci-lint-celt-iterative-fft-pr golangci-lint run`
All checks pass.