Performance update for permutations.jl #205

Open
depial wants to merge 13 commits into JuliaMath:master from
depial:permutations-update

depial commented Dec 18, 2025

This update improves the performance of permutations.jl while keeping the underlying algorithm (nextpermutation()) in place. Various benchmarks are given below; more can be found in Issue #204, where the benchmarking after the first post is directly relevant to this implementation.

The main strategy was to move potential overhead out of the performance-critical methods nextpermutation and iterate and into the constructors by standardizing input. One step separates Permutations from MultiSetPermutations by reverting to a previous version of the iterate method for Permutations, which results in a large performance boost.

Special attention was paid to reducing the number of allocations made by both permutations() and multiset_permutations(). To this end, one of the larger changes is that the state is now modified in place during iteration (while the data in the structs remains unchanged). I don't currently see this causing problems, since the algorithm is serial and can't be parallelized.
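The in-place state idea can be illustrated with a minimal sketch. It is not the code from this PR: `LexPerms` and `advance!` are hypothetical names, and the sketch assumes the state is simply a vector of indices that is advanced lexicographically. The struct's `data` field is only read, while the index buffer allocated on the first `iterate` call is reused as the state on every later call.

```julia
# Hypothetical type for illustration; not the PR's `Permutations` struct.
struct LexPerms{T}
    data::Vector{T}   # only read during iteration, never modified
end

Base.eltype(::Type{LexPerms{T}}) where {T} = Vector{T}
Base.length(p::LexPerms) = factorial(length(p.data))

# Advance `idx` to the next lexicographic arrangement in place.
# Returns false when `idx` was already the last arrangement.
function advance!(idx::Vector{Int})
    n = length(idx)
    i = n - 1
    while i >= 1 && idx[i] >= idx[i + 1]
        i -= 1
    end
    i < 1 && return false
    j = n
    while idx[j] <= idx[i]
        j -= 1
    end
    idx[i], idx[j] = idx[j], idx[i]
    reverse!(idx, i + 1, n)
    return true
end

# First call: allocate the index buffer once; it doubles as the state.
function Base.iterate(p::LexPerms)
    idx = collect(1:length(p.data))
    return (p.data[idx], idx)
end

# Later calls: mutate the same buffer instead of copying it.
function Base.iterate(p::LexPerms, idx::Vector{Int})
    advance!(idx) || return nothing
    return (p.data[idx], idx)
end
```

With this shape, each step allocates only the returned permutation (`p.data[idx]`). The trade-off is that a state value cannot be saved and replayed later, because the next call overwrites it.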

In total, these modifications cut allocations to roughly 1/3 and 1/2 of their current numbers for permutations() and multiset_permutations(), respectively, and bring their performance in line with Heap's permutation algorithm.

Other notes:

  • multiset_permutations() and permutations() now have the same performance on collections with unique elements (whereas in v1.1.0 multiset_permutations() outperforms permutations()).
  • The constructors now homogenize the input to the structs, with data and m always being a Vector{T} where T is the element type of the input.
  • Care has been taken to accept any input which is indexable (i.e. it does not need to be iterable).
  • Tested and working with Vectors, Multidimensional Arrays, Sparse Arrays and Offset Arrays (i.e. covering LinearIndex, CartesianIndex and offset indices).
  • permutations() is now type safe (in line with multiset_permutations()), always returning a Permutations type.
  • multiset_permutations(m, t) is now linear (vs the current $O(n^2)$ version); however, this is likely not terribly important, since input size is highly limited.
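
As a rough sketch of what homogenizing indexable input could look like, the helper below copies a collection into a plain `Vector{T}` using only `eltype`, `length`, `eachindex` and indexing. The name `to_vector` is hypothetical and the constructors in this PR may do this differently.

```julia
# Hypothetical helper, not part of Combinatorics.jl: copy any indexable
# collection into a plain Vector{T}, visiting elements in index order.
function to_vector(a)
    T = eltype(a)
    out = Vector{T}(undef, length(a))
    k = 0
    for i in eachindex(a)   # works for linear, Cartesian and offset indices
        k += 1
        out[k] = a[i]
    end
    return out
end

to_vector(1:3)            # a range becomes a Vector{Int}
to_vector([1 3; 2 4])     # a matrix is flattened in column-major order
```

Doing this once in the constructor means the hot loop only ever sees a 1-based `Vector`, whatever the caller passed in.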
Benchmarks

collect(permutations(1:10))

Before (v1.1.0)

BenchmarkTools.Trial: 6 samples with 1 evaluation per sample.
 Range (min … max):  570.045 ms …    1.280 s  ┊ GC (min … max):  0.00% … 55.55%
 Time  (median):     940.828 ms               ┊ GC (median):    37.16%
 Time  (mean ± σ):   917.961 ms ± 231.521 ms  ┊ GC (mean ± σ):  36.47% ± 18.44%

  ▁                   ▁         ▁  █                          ▁  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  570 ms           Histogram: frequency by time          1.28 s <

 Memory estimate: 1.49 GiB, allocs estimate: 21772839.

After

BenchmarkTools.Trial: 12 samples with 1 evaluation per sample.
 Range (min … max):  359.795 ms … 588.693 ms  ┊ GC (min … max): 35.43% … 50.36%
 Time  (median):     377.968 ms               ┊ GC (median):    35.71%
 Time  (mean ± σ):   418.433 ms ±  75.727 ms  ┊ GC (mean ± σ):  39.63% ±  6.85%

  ▁▁█ █▁   ▁             ▁ ▁                      ▁           ▁  
  ███▁██▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁█▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█ ▁
  360 ms           Histogram: frequency by time          589 ms <

 Memory estimate: 526.03 MiB, allocs estimate: 7257607.

collect(multiset_permutations(1:10))

Before (v1.1.0)

BenchmarkTools.Trial: 8 samples with 1 evaluation per sample.
 Range (min … max):  321.493 ms … 807.490 ms  ┊ GC (min … max):  0.00% … 45.44%
 Time  (median):     670.635 ms               ┊ GC (median):    41.88%
 Time  (mean ± σ):   625.491 ms ± 149.446 ms  ┊ GC (mean ± σ):  38.67% ± 15.86%

  ▁                     ▁                 ▁▁   ▁█             ▁  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  321 ms           Histogram: frequency by time          807 ms <

 Memory estimate: 1.00 GiB, allocs estimate: 14515235.

After

BenchmarkTools.Trial: 13 samples with 1 evaluation per sample.
 Range (min … max):  257.694 ms … 564.993 ms  ┊ GC (min … max):  0.00% … 56.52%
 Time  (median):     378.982 ms               ┊ GC (median):    37.36%
 Time  (mean ± σ):   396.513 ms ±  80.342 ms  ┊ GC (mean ± σ):  38.71% ± 13.44%

  █         █  █       █████     █  █      █    █             █  
  █▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁█████▁▁▁▁▁█▁▁█▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  258 ms           Histogram: frequency by time          565 ms <

 Memory estimate: 526.03 MiB, allocs estimate: 7257644.

collect(multiset_permutations([2,4,5,6,3,4,1,8,9,3,2]))

Before (v1.1.0)

BenchmarkTools.Trial: 6 samples with 1 evaluation per sample.
 Range (min … max):  518.051 ms …    1.083 s  ┊ GC (min … max):  0.00% … 47.98%
 Time  (median):     805.651 ms               ┊ GC (median):    36.66%
 Time  (mean ± σ):   834.476 ms ± 200.360 ms  ┊ GC (mean ± σ):  35.33% ± 16.90%

  █                           ██  █                    █      █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁█ ▁
  518 ms           Histogram: frequency by time          1.08 s <

 Memory estimate: 1.38 GiB, allocs estimate: 19958439.

After

BenchmarkTools.Trial: 10 samples with 1 evaluation per sample.
 Range (min … max):  494.901 ms … 668.593 ms  ┊ GC (min … max): 29.28% … 49.77%
 Time  (median):     533.120 ms               ┊ GC (median):    36.22%
 Time  (mean ± σ):   556.938 ms ±  60.735 ms  ┊ GC (mean ± σ):  38.80% ±  6.97%

  ▁▁▁        █  ▁           ▁                ▁ ▁              ▁  
  ███▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  495 ms           Histogram: frequency by time          669 ms <

 Memory estimate: 723.29 MiB, allocs estimate: 9979241.
Heap's Algorithm performance comparison

heapermutations(1:10)

BenchmarkTools.Trial: 13 samples with 1 evaluation per sample.
 Range (min … max):  206.685 ms … 598.413 ms  ┊ GC (min … max):  0.00% … 65.33%
 Time  (median):     382.252 ms               ┊ GC (median):    41.20%
 Time  (mean ± σ):   403.536 ms ±  94.463 ms  ┊ GC (mean ± σ):  45.00% ± 15.82%

  ▁                   █   ▁▁▁▁ ▁         ▁▁█                  ▁  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁████▁█▁▁▁▁▁▁▁▁▁███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  207 ms           Histogram: frequency by time          598 ms <

 Memory estimate: 570.24 MiB, allocs estimate: 7257632.

collect(permutations(1:10))

BenchmarkTools.Trial: 13 samples with 1 evaluation per sample.
 Range (min … max):  246.140 ms … 585.436 ms  ┊ GC (min … max):  0.00% … 59.40%
 Time  (median):     371.256 ms               ┊ GC (median):    35.27%
 Time  (mean ± σ):   395.530 ms ±  89.370 ms  ┊ GC (mean ± σ):  39.57% ± 15.16%

  ▁           ▁   ▁▁  ▁ █▁      ▁ ▁      ▁         ▁          ▁  
  █▁▁▁▁▁▁▁▁▁▁▁█▁▁▁██▁▁█▁██▁▁▁▁▁▁█▁█▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁█ ▁
  246 ms           Histogram: frequency by time          585 ms <

 Memory estimate: 526.03 MiB, allocs estimate: 7257607.

Note: Below is the implementation of Heap's algorithm which I used to compare performance. It was written to be comparable in structure to nextpermutation() used in this update, but it's not actually correct for permlen < length(data), since it always produces all permutations (i.e. there are potential duplicates).

function heapermutations(data, permlen=length(data), perm=collect(eachindex(data)), datalen=length(data))
    # state[i] is the 1-based loop counter for level i of Heap's algorithm
    state = ones(Int, datalen)
    output = [data[view(perm, 1:permlen)]]
    i = 1
    while i ≤ datalen
        @inbounds(if state[i] < i
            # Swap choice depends on the parity of the level
            if isodd(i)
                perm[1], perm[i] = perm[i], perm[1]
            else
                perm[state[i]], perm[i] = perm[i], perm[state[i]]
            end
            push!(output, data[view(perm, 1:permlen)])
            state[i] += 1
            i = 1
        else
            # Level i is exhausted: reset its counter and move up a level
            state[i] = 1
            i += 1
        end)
    end
    output
end

Update docstrings
Update for string comparison in `derangements`

codecov bot commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 98.87640% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 97.19%. Comparing base (b808ce2) to head (b9479c4).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/permutations.jl | 98.87% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #205      +/-   ##
==========================================
+ Coverage   97.17%   97.19%   +0.02%     
==========================================
  Files           8        8              
  Lines         813      857      +44     
==========================================
+ Hits          790      833      +43     
- Misses         23       24       +1     

Delete non-mutating `nextpermutation()`
@depial changed the title from "Update permutations.jl" to "Performance update for permutations.jl" on Dec 18, 2025
Provide a derangement-specific implementation, improving performance and providing further functionality.

depial commented Dec 19, 2025

I noticed that derangements() was a filter of multiset_permutations(), so I've coded up a derangement-specific implementation which mirrors that of permutations and multiset permutations. It now has its own type, Derangements, and a nextderangement() method, which is an iterative version of Rohl's algorithm described here in recursive form.

Some benefits of the new implementation:

  • More performant in both time and space.
  • The performance advantage grows as element multiplicity in multisets increases.
  • Includes support for derangements of size t.
  • Similar type implementation to Permutations and MultiSetPermutations.
  • Only one inner constructor.
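
To illustrate why generating derangements directly can beat filtering all permutations, here is a plain recursive backtracking sketch. It is neither the iterative `nextderangement()` from this PR nor a faithful rendering of Rohl's algorithm; `derangements_sketch` is a hypothetical name. It abandons a branch as soon as an element would stay in its original position, so fixed-point arrangements are never fully built.

```julia
# Hypothetical helper: all distinct rearrangements `p` of the input
# such that p[i] != a[i] at every position i.
function derangements_sketch(itr)
    a = collect(itr)
    T = eltype(a)
    vals = unique(a)                          # distinct values, first-appearance order
    left = [count(==(v), a) for v in vals]    # copies of each value still unused
    out = Vector{Vector{T}}()
    cur = Vector{T}(undef, length(a))
    function place!(pos)
        if pos > length(a)
            push!(out, copy(cur))
            return
        end
        for k in eachindex(vals)
            # Skip exhausted values and values that would stay in place.
            (left[k] == 0 || vals[k] == a[pos]) && continue
            cur[pos] = vals[k]
            left[k] -= 1
            place!(pos + 1)
            left[k] += 1
        end
    end
    place!(1)
    return out
end

derangements_sketch([1, 2, 3])   # → [[2, 3, 1], [3, 1, 2]]
```

With repeated elements more branches are cut early, which is consistent with the larger gap seen in the multiset benchmark below.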
Benchmarking

Unique set elements

Current implementation (v1.1.0) run on collect(derangements(1:10))

BenchmarkTools.Trial: 10 samples with 1 evaluation per sample.
 Range (min … max):  404.076 ms … 703.335 ms  ┊ GC (min … max): 24.18% … 44.09%
 Time  (median):     496.590 ms               ┊ GC (median):    28.75%
 Time  (mean ± σ):   517.083 ms ±  91.147 ms  ┊ GC (mean ± σ):  31.96% ±  6.32%

  █     █    █  █   ██    █ █                    █            █  
  █▁▁▁▁▁█▁▁▁▁█▁▁█▁▁▁██▁▁▁▁█▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  404 ms           Histogram: frequency by time          703 ms <

 Memory estimate: 1.16 GiB, allocs estimate: 19479014.

New Derangements implementation run on collect(derangements(1:10))

BenchmarkTools.Trial: 31 samples with 1 evaluation per sample.
 Range (min … max):  112.177 ms … 278.214 ms  ┊ GC (min … max):  0.00% … 59.36%
 Time  (median):     159.637 ms               ┊ GC (median):    27.81%
 Time  (mean ± σ):   168.818 ms ±  43.172 ms  ┊ GC (mean ± σ):  32.69% ± 16.97%

    ▁▁         ▁▄  █                  ▁ ▁                        
  ▆▁██▆▆▁▁▆▁▁▆▁██▆▁█▆▁▁▁▁▆▁▁▁▆▁▁▁▁▆▆▁▁█▁█▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▁▁▁▆ ▁
  112 ms           Histogram: frequency by time          278 ms <

 Memory estimate: 218.85 MiB, allocs estimate: 2669983.

Multiset elements

Current implementation (v1.1.0) run on collect(derangements([1, 1, 2, 4, 5, 5, 6, 7, 7, 9]))

BenchmarkTools.Trial: 90 samples with 1 evaluation per sample.
 Range (min … max):  31.160 ms … 86.085 ms  ┊ GC (min … max):  0.00% … 38.91%
 Time  (median):     57.960 ms              ┊ GC (median):    29.18%
 Time  (mean ± σ):   55.699 ms ± 13.952 ms  ┊ GC (mean ± σ):  24.51% ± 15.05%

   ▃       ▁   ▃        ▁     ▆  ▁█ ▁        ▁      ▃          
  ▄█▄▄▇▄▇▁▁█▄▇▁█▄▁▄▇▇▄▁▇█▁▇▄▄▁█▁▄██▄█▇▄▇▄▇▇▇▄█▇▄▁▁▇▁█▄▁▁▄▄▄▄▇ ▁
  31.2 ms         Histogram: frequency by time          81 ms <

 Memory estimate: 142.82 MiB, allocs estimate: 2352077.

New Derangements implementation run on collect(derangements([1, 1, 2, 4, 5, 5, 6, 7, 7, 9]))

BenchmarkTools.Trial: 731 samples with 1 evaluation per sample.
 Range (min … max):  4.366 ms … 21.888 ms  ┊ GC (min … max):  0.00% … 71.92%
 Time  (median):     5.298 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   6.828 ms ±  3.301 ms  ┊ GC (mean ± σ):  17.23% ± 19.83%

  ██▆▆▄▃▂▂▁▄▃▁▂▂ ▁ ▁                                          
  ██████████████████████▆▁▅▄▄█▅▇▇▅▇▆▆▇▆▇▇▅▇▅▅▆▆▇▆▆▅▆▄▁▅▇▄▁▄▅ █
  4.37 ms      Histogram: log(frequency) by time     18.2 ms <

 Memory estimate: 13.37 MiB, allocs estimate: 168114.

Iteration has been streamlined a bit more and some comments have been added to help with future maintenance.

depial commented Dec 21, 2025

Note: I've changed the three-argument multiset_permutations(m::Vector, f::Vector{<:Integer}, t::Integer) to be more clearly an outer constructor, MultiSetPermutations(m::Vector, f::Vector{<:Integer}, t::Integer), since it appears this method is not actually meant to be exported.

If it is meant to be exported, I believe it would need a docstring which explains how to construct m and f in the way it is done in the two-argument method multiset_permutations(a, t::Integer).
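
For reference, one way such a docstring could describe the construction is sketched below. It assumes that m holds the distinct elements of a and that f[i] is the multiplicity of m[i], which is my reading of the two-argument method rather than something stated in this thread. The helper name `elements_and_counts` is hypothetical, and the single-pass `Dict` approach is just one way to get the counts in expected linear time.

```julia
# Hypothetical helper: distinct elements `m` (in order of first appearance)
# and their multiplicities `f`, built in a single pass over `a`.
function elements_and_counts(a)
    T = eltype(a)
    m = T[]
    f = Int[]
    slot = Dict{T,Int}()          # element => its position in `m`
    for x in a
        k = get!(slot, x) do
            # First time `x` is seen: give it a new slot.
            push!(m, x)
            push!(f, 0)
            length(m)
        end
        f[k] += 1
    end
    return m, f
end

m, f = elements_and_counts([2, 4, 5, 6, 3, 4, 1, 8, 9, 3, 2])
# m == [2, 4, 5, 6, 3, 1, 8, 9] and f == [2, 2, 1, 1, 2, 1, 1, 1]
```

The resulting pair could then be passed to the three-argument constructor together with t.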

Limits constructors for the types `Derangements` and `DerangementsIterState` to one inner constructor apiece.