
Optimize CELT inverse FFT #121

Merged: zshang-oai merged 1 commit into main from codex/optimize-celt-iterative-fft on Jun 1, 2026


@zshang-oai

Contributor

Summary

Replace CELT's recursive complex inverse FFT with an iterative mixed-radix implementation.

This PR changes only internal/celt/synthesis.go. It is a pure-Go optimization: the surrounding inverse MDCT mapping, CELT synthesis behavior, decoder API, and encoded-bitstream handling remain unchanged.

Context

CELT reconstructs time-domain audio through the following synthesis pipeline:

frequency coefficients
  -> MDCT pre-rotation
  -> complex inverse FFT
  -> MDCT post-rotation
  -> windowing and overlap-add
  -> PCM samples

This PR changes only the complex inverse FFT in the middle of that pipeline.

Previous algorithm

The previous implementation recursively divided each FFT into smaller sub-transforms:

FFT(N)
  -> FFT(N / radix)
  -> FFT(N / radix)
  -> ...
  -> combine sub-transforms with twiddle factors

The recursive implementation is mathematically correct, but it adds avoidable work in a CELT decode hotspot:

  • recursive function calls for every transform stage
  • repeated slice construction while descending into sub-transforms
  • generic nested summation loops for every radix
  • repeated twiddle-factor index calculations
  • less predictable memory access during recursive traversal

New algorithm

The new implementation performs the same mixed-radix inverse FFT iteratively.

When the inverse-transform plan is created, it now precomputes:

  1. The FFT factorization into small radices.
  2. The twiddle-factor table.
  3. The input permutation used to arrange values for iterative processing.

During decoding, the hot path becomes:

1. Reorder inputs once using the cached permutation.
2. Process each FFT stage iteratively.
3. Use a specialized butterfly for radix 2, 3, 4, or 5.
4. Fall back to the generic butterfly only for unexpected radices.

A butterfly combines smaller transforms into a larger one. For radix 2, the operation is conceptually:

a = input0
b = input1 * twiddle

output0 = a + b
output1 = a - b
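In Go, that radix-2 butterfly and one use of it (building a size-4 inverse FFT from two size-2 transforms) might look like the following. `radix2Butterfly` is a hypothetical helper, not a function from the PR, and `complex128` stands in for the decoder's float32 type.

```go
package main

import (
	"fmt"
	"math"
	"math/cmplx"
)

// radix2Butterfly applies a = x0, b = x1*twiddle, x0 = a+b, x1 = a-b in place.
func radix2Butterfly(x0, x1 *complex128, twiddle complex128) {
	a := *x0
	b := *x1 * twiddle
	*x0, *x1 = a+b, a-b
}

func main() {
	// Size-4 inverse FFT of x from size-2 inverse FFTs of its even and odd halves.
	x := []complex128{1, 2, 3, 4}
	even := []complex128{x[0] + x[2], x[0] - x[2]}
	odd := []complex128{x[1] + x[3], x[1] - x[3]}
	out := make([]complex128, 4)
	for k := 0; k < 2; k++ {
		a, b := even[k], odd[k]
		w := cmplx.Rect(1, 2*math.Pi*float64(k)/4) // inverse twiddle exp(+2*pi*i*k/4)
		radix2Butterfly(&a, &b, w)
		out[k], out[k+2] = a, b
	}
	fmt.Println(out)
}
```

For input `[1 2 3 4]` the result matches the direct inverse DFT, `[10, -2-2i, -2, -2+2i]`, up to floating-point rounding in the `k = 1` twiddle.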

The radix-3, radix-4, and radix-5 paths apply analogous specialized formulas, so the normal Opus transform sizes avoid the generic inner summation loop.

Why this is faster

The asymptotic complexity remains O(N log N). The improvement comes from reducing constant costs in the decode hot path:

  • removes recursive traversal
  • caches FFT factorization
  • caches twiddle values
  • caches the input permutation
  • uses specialized small-radix arithmetic
  • processes iterative stages with more predictable memory access

Performance

Measured with the production-shaped RFC conformance-vector benchmark: 48 kHz stereo serial DecodeToInt16, with verification excluded from benchmark timing.

BenchmarkPionDecodeToInt16Conformance48kStereoSerial
merged main median: 20,517 packets/s
this PR median:     27,313 packets/s
improvement:        +33.1%
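As a quick check on the reported figure, this sketch converts the two medians into a throughput gain and the equivalent reduction in time per packet (the helper name `speedup` is illustrative).

```go
package main

import "fmt"

// speedup turns two throughput measurements (packets/s) into the throughput
// gain and the matching reduction in time per packet, both in percent.
func speedup(before, after float64) (throughputGain, timeReduction float64) {
	return (after/before - 1) * 100, (1 - before/after) * 100
}

func main() {
	gain, cut := speedup(20517, 27313) // merged main vs this PR, medians above
	fmt.Printf("throughput %+.1f%%, time per packet -%.1f%%\n", gain, cut)
	fmt.Printf("%.1f us -> %.1f us per packet\n", 1e6/20517.0, 1e6/27313.0)
}
```

It prints a +33.1% throughput gain, which corresponds to about 24.9% less time per packet (roughly 48.7 µs down to 36.6 µs).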

The comparison used alternating full-corpus runs to reduce machine-load bias.

Validation

  • go test ./...
  • GOLANGCI_LINT_CACHE=/private/tmp/golangci-lint-celt-iterative-fft-pr golangci-lint run
  • Full RFC conformance matrix:
OPUS_RFC6716_REFERENCE=/Users/zshang/code/archieve/opus-rfc6716 \
OPUS_RFC6716_TESTVECTORS=/Users/zshang/code/archieve/opus-rfc6716/testvectors \
go test -tags conformance -run '^TestRFC6716Conformance$' -count=1

All checks pass.

@codecov

codecov Bot commented Jun 1, 2026


Codecov Report

❌ Patch coverage is 89.37500% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.98%. Comparing base (2920812) to head (97ba6a8).

Files with missing lines     Patch %   Lines
internal/celt/synthesis.go   89.37%    17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
+ Coverage   82.87%   82.98%   +0.11%     
==========================================
  Files          26       26              
  Lines        5711     5825     +114     
==========================================
+ Hits         4733     4834     +101     
- Misses        751      765      +14     
+ Partials      227      226       -1     
Flag   Coverage Δ
go     82.98% <89.37%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.


@github-actions

github-actions Bot commented Jun 1, 2026


RFC 6716 / 8251 conformance

Status: pass

The action extracts the RFC 6716 reference implementation, applies the RFC 8251 decoder update patch, and then builds the patched reference tools.

Legend: numeric cells are opus_compare quality percentages; FAIL means the vector did not pass.

Inputs use the shared RFC 6716 / RFC 8251 bitstream corpus; accepted references follow RFC 8251 Section 11.

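+-------+----+------+------+------+------+------+------+------+------+------+------+------+------+
| rate  | ch | 01   | 02   | 03   | 04   | 05   | 06   | 07   | 08   | 09   | 10   | 11   | 12   |
+-------+----+------+------+------+------+------+------+------+------+------+------+------+------+
| 8000  | 1  | 91.4 | 59.7 | 66.3 | 75.1 | 75.0 | 67.8 | 76.0 | 70.0 | 75.5 | 85.9 | 91.0 | 43.4 |
| 8000  | 2  | 93.3 | 57.6 | 66.1 | 75.3 | 75.2 | 67.9 | 76.0 | 70.4 | 76.2 | 86.0 | 93.0 | 43.7 |
| 12000 | 1  | 95.6 | 83.4 | 71.8 | 79.1 | 77.0 | 69.0 | 85.1 | 81.6 | 84.8 | 88.1 | 94.9 | 66.0 |
| 12000 | 2  | 96.0 | 83.3 | 71.3 | 79.2 | 77.3 | 69.1 | 85.1 | 81.8 | 85.2 | 87.0 | 95.8 | 66.1 |
| 16000 | 1  | 95.3 | 91.4 | 88.1 | 81.6 | 77.2 | 68.9 | 89.9 | 86.2 | 78.8 | 89.5 | 96.3 | 56.5 |
| 16000 | 2  | 94.7 | 90.7 | 88.1 | 80.6 | 77.6 | 69.1 | 89.8 | 87.6 | 78.9 | 87.5 | 96.4 | 56.7 |
| 24000 | 1  | 96.7 | 92.0 | 83.2 | 85.9 | 77.5 | 68.4 | 93.9 | 92.4 | 89.2 | 95.4 | 97.9 | 68.5 |
| 24000 | 2  | 96.8 | 90.6 | 82.8 | 86.1 | 77.8 | 68.8 | 93.9 | 93.5 | 92.1 | 87.7 | 98.1 | 68.6 |
| 48000 | 1  | 98.4 | 92.1 | 87.7 | 85.9 | 77.4 | 68.3 | 98.1 | 96.2 | 95.9 | 96.0 | 98.4 | 88.8 |
| 48000 | 2  | 99.8 | 90.6 | 87.8 | 86.1 | 77.7 | 68.6 | 99.6 | 93.7 | 94.4 | 87.7 | 99.7 | 88.9 |
+-------+----+------+------+------+------+------+------+------+------+------+------+------+------+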
Run output
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector07
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector09: Opus quality metric: 95.9 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector06
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector10: Opus quality metric: 96.0 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector05
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector08: Opus quality metric: 96.2 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector04
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector07: Opus quality metric: 98.1 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector03
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector06: Opus quality metric: 68.3 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector02
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector05: Opus quality metric: 77.4 %
=== CONT  TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector01
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector03: Opus quality metric: 87.7 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector12
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector04: Opus quality metric: 85.9 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector11
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector02: Opus quality metric: 90.6 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector10
TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector01: Opus quality metric: 98.4 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector09
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector12: Opus quality metric: 68.6 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector08
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector11: Opus quality metric: 98.1 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector07
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector10: Opus quality metric: 87.7 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector06
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector09: Opus quality metric: 92.1 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector05
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector08: Opus quality metric: 93.5 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector04
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector07: Opus quality metric: 93.9 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector03
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector06: Opus quality metric: 68.8 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector08
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector03: Opus quality metric: 82.8 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector01
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector04: Opus quality metric: 86.1 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector12
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector05: Opus quality metric: 77.8 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector11
TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector08: Opus quality metric: 92.4 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector10
TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector12: Opus quality metric: 68.5 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector09
TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector11: Opus quality metric: 97.9 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector05
TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector01: Opus quality metric: 96.8 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector07
TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector09: Opus quality metric: 89.2 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector06
TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector10: Opus quality metric: 95.4 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector04
TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector05: Opus quality metric: 77.5 %
=== CONT  TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector03
TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector07: Opus quality metric: 93.9 %
TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector03: Opus quality metric: 83.2 %
TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector06: Opus quality metric: 68.4 %
TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector04: Opus quality metric: 85.9 %
--- PASS: TestRFC6716Conformance (129.85s)
    --- PASS: TestRFC6716Conformance/vectors (0.00s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector02 (2.24s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector08 (2.25s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector01 (2.36s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector02 (3.55s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector07 (1.90s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector06 (2.05s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector05 (2.22s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector04 (2.09s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector03 (1.68s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector02 (1.93s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector01 (2.59s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector12 (3.79s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector11 (4.52s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector10 (4.98s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector09 (4.34s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector08 (4.03s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector07 (3.40s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector06 (3.77s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector05 (4.01s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector03 (3.03s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector04 (3.82s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector01 (2.90s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector12 (4.03s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector11 (4.71s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector09 (4.66s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector10 (5.27s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector08 (4.27s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector07 (3.56s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector06 (3.93s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector05 (4.26s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector04 (4.03s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector03 (3.20s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector12 (2.06s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector02 (3.76s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector11 (2.51s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_2/testvector01 (4.71s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector09 (2.52s)
        --- PASS: TestRFC6716Conformance/vectors/rate_16000/channels_1/testvector10 (2.85s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector12 (1.93s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector11 (2.39s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_2/testvector01 (4.49s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector10 (2.73s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector09 (2.39s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector07 (1.79s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector08 (2.09s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector06 (1.98s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector05 (2.11s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector03 (1.54s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector04 (1.97s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector02 (1.82s)
        --- PASS: TestRFC6716Conformance/vectors/rate_12000/channels_1/testvector01 (2.42s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector12 (3.68s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector11 (4.32s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector10 (4.82s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector09 (4.23s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector08 (3.87s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector07 (3.25s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector06 (3.59s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector05 (3.89s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector03 (2.91s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector04 (3.67s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector02 (3.43s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector12 (1.92s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector11 (2.28s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_2/testvector01 (4.29s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector09 (2.33s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector10 (2.64s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector08 (2.00s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector07 (1.72s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector06 (1.90s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector05 (2.03s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector03 (1.49s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector04 (1.88s)
        --- PASS: TestRFC6716Conformance/vectors/rate_8000/channels_1/testvector02 (1.73s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector02 (3.38s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector12 (7.10s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector11 (8.14s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector10 (8.92s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector09 (7.72s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector08 (7.33s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector07 (6.15s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector06 (6.84s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector05 (7.40s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector03 (5.64s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector04 (7.10s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector12 (3.60s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector02 (6.64s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector11 (4.18s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_2/testvector01 (8.05s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector09 (4.09s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector10 (4.70s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector08 (3.76s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector07 (3.15s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector06 (3.51s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector05 (3.80s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector03 (2.88s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector04 (3.61s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector02 (4.31s)
        --- PASS: TestRFC6716Conformance/vectors/rate_48000/channels_1/testvector01 (4.23s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector12 (4.63s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector11 (5.38s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector10 (5.95s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector09 (5.26s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector08 (4.85s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector07 (4.04s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector06 (4.51s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector03 (3.68s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector04 (4.65s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector05 (4.91s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector08 (2.50s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector12 (2.39s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector11 (2.82s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_2/testvector01 (5.40s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector09 (2.79s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector10 (3.21s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector05 (2.53s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector07 (2.14s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector03 (1.87s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector06 (2.27s)
        --- PASS: TestRFC6716Conformance/vectors/rate_24000/channels_1/testvector04 (2.22s)
PASS
ok  	github.com/pion/opus	129.858s

@zshang-oai zshang-oai marked this pull request as ready for review June 1, 2026 17:07
@zshang-oai zshang-oai requested review from JoTurk, Sean-Der and Copilot June 1, 2026 17:07

Copilot AI left a comment


Pull request overview

Replaces CELT's recursive complex inverse FFT with an iterative mixed-radix (2/3/4/5) implementation that precomputes the FFT factorization, twiddle table, and input (digit-reversal) permutation when the inverse-transform plan is constructed. The surrounding inverse-MDCT pipeline, decoder API, and bitstream handling are unchanged. The author reports a ~33% throughput improvement on the 48 kHz stereo conformance benchmark, with go test ./... and the full RFC 6716 conformance matrix passing.

Changes:

  • Replaces fftCos/fftSin arrays with a unified fftTwiddles []complex32 table, and adds fftBitrev and fftFactors to the plan.
  • Removes the recursive driver (inverseComplexFFTRecursive, fftRadix, fftTwiddle) and replaces it with an iterative driver plus specialized radix-2/3/4/5 butterflies and a generic fallback.
  • Adds plan-construction helpers (newFFTFactors, newFFTBitrev, fillFFTBitrev) and small complex32 arithmetic helpers.
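The overview mentions small complex32 arithmetic helpers without showing them. A plausible shape, assuming a plain struct of two float32 fields (the actual pion/opus type layout and method names may differ), is sketched here; `mulI` is a hypothetical helper showing why a radix-4 inverse butterfly's multiply by +i is cheap.

```go
package main

import "fmt"

// complex32 is a hypothetical stand-in for the PR's complex32 type: a pair of
// float32 values (the real field layout in pion/opus may differ).
type complex32 struct {
	r, i float32
}

func (a complex32) add(b complex32) complex32 { return complex32{a.r + b.r, a.i + b.i} }
func (a complex32) sub(b complex32) complex32 { return complex32{a.r - b.r, a.i - b.i} }

// mul is the full complex product (a.r + i*a.i)(b.r + i*b.i).
func (a complex32) mul(b complex32) complex32 {
	return complex32{a.r*b.r - a.i*b.i, a.r*b.i + a.i*b.r}
}

// mulI multiplies by +i: (r, i) -> (-i, r), a swap plus one negation.
func (a complex32) mulI() complex32 { return complex32{-a.i, a.r} }

func main() {
	a, b := complex32{1, 2}, complex32{3, -1}
	fmt.Println(a.add(b), a.sub(b), a.mul(b), a.mulI())
}
```

With these inputs it prints `{4 1} {-2 3} {5 5} {-2 1}`.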


@zshang-oai zshang-oai merged commit 71d5847 into main Jun 1, 2026
21 checks passed
@zshang-oai zshang-oai deleted the codex/optimize-celt-iterative-fft branch June 1, 2026 21:48

3 participants