performance: distrname: use @generated #2050

Open
fonsp wants to merge 1 commit into JuliaStats:master from fonsp:distrname-generated

Conversation


@fonsp fonsp commented Apr 1, 2026

Hi! This is my first contribution to Distributions.jl :)

This PR improves the performance of distrname (and other display methods that use it). By using @generated, the result can be computed at JIT time for each input type and cached, making the function instant and non-allocating.

In my logging application, I found that calls to Distributions.distrname were a bottleneck. This PR improves my logging performance by 7x.
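
The idea can be sketched with a self-contained toy (the names below are illustrative, not the actual PR diff): the body of an @generated function runs once per concrete input type, so the string can be built at compile time and the runtime method reduces to returning a cached literal.

```julia
# Self-contained sketch of the @generated caching idea (illustrative
# names; not the actual PR diff). The generator body runs once per
# concrete type T; the compiled method just returns the string literal.
struct Wrapped{T}
    value::T
end

@generated function typestring(::Wrapped{T}) where {T}
    s = string(T)      # computed at compile (JIT) time, once per T
    return :($s)       # generated method body is a plain string literal
end
```

With this sketch, `typestring(Wrapped(1.0))` returns `"Float64"` without allocating after the first (compiling) call, since the method body is just a literal.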

Benchmark

Example distributions:

d1 = Normal(1,2)

d2 = MixtureModel(
    [
        truncated(
            LocationScale(
                2.0,
                0.5,
                MixtureModel(
                    [
                        Normal(0.0, 1.0),
                        GeneralizedExtremeValue(0.0, 1.0, 0.5),
                        LocationScale(-1.0, 2.0, Gamma(2.0, 3.0)),
                    ],
                    Categorical([0.4, 0.35, 0.25]),
                ),
            );
            lower = -20.0,
            upper = 20.0,
        ),

        censored(
            LocationScale(
                -3.0,
                1.5,
                MixtureModel(
                    [
                        Levy(0.0, 1.0),
                        InverseGaussian(1.0, 3.0),
                        truncated(LogNormal(0.0, 0.5); lower = 0.1, upper = 50.0),
                    ],
                    [0.3, 0.3, 0.4],
                ),
            );
            lower = -100.0,
            upper = 100.0,
        ),

        LocationScale(
            10.0,
            0.1,
            LocationScale(
                -5.0,
                2.0,
                LocationScale(0.0, 1.0, Beta(2.0, 5.0)),
            ),
        ),
    ],
    [0.5, 0.3, 0.2],
)

Benchmark before the PR:

julia> @benchmark Distributions.distrname($d1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  40.041 μs …  7.643 ms  ┊ GC (min … max): 0.00% … 96.07%
 Time  (median):     41.458 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   43.412 μs ± 76.519 μs  ┊ GC (mean ± σ):  1.69% ±  0.96%

   █▆▂                                                         
  ▆███▆▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂▁▂▂▂▂▂▂▁▁▁▂▂▁▂▂ ▃
  40 μs           Histogram: frequency by time        71.6 μs <

 Memory estimate: 13.30 KiB, allocs estimate: 43.

julia> @benchmark Distributions.distrname($d2)
BenchmarkTools.Trial: 9498 samples with 1 evaluation per sample.
 Range (min … max):  463.083 μs …   3.380 ms  ┊ GC (min … max): 0.00% … 73.34%
 Time  (median):     482.667 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   524.898 μs ± 197.804 μs  ┊ GC (mean ± σ):  2.04% ±  5.54%

  █▆▆▅▄▃▃▃▂▁▁                                                   ▂
  ███████████▇▇▇▇▇▇▅▆▅▅▅▆▆▆▄▅▅▄▅▅▅▃▃▃▄▃▄▄▄▅▅▅▄▅▄▄▄▅▄▁▄▃▄▃▃▃▅▄▃▅ █
  463 μs        Histogram: log(frequency) by time       1.49 ms <

 Memory estimate: 144.47 KiB, allocs estimate: 432.

After the PR:

julia> @benchmark Distributions.distrname($d1)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  2.166 ns … 35.375 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.250 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.365 ns ±  1.091 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▆█ ▇▅▅ ▃▂ ▁▁                                              ▂
  ▇██▁███▁██▁██▇▁▅▅▁▁▃▃▁▃▁▇▁▇▅▄▄▅▁▁▄▄▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▅▄ █
  2.17 ns      Histogram: log(frequency) by time     3.92 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Distributions.distrname($d2)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  2.250 ns … 996.625 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.375 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.082 ns ±  11.668 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃ ▁   ▂▄                                                   ▁
  ██▆█▇▄▁██▅▄▁▄▃▄▄▃▁▄▄▄▄▃▄▃▄▄▄▄▁▃▄▃▃▄▄▁▃▁▄▅▃▃▅▄▄▅▄▄▅▄▄▆▄▄▃▄▄▅ █
  2.25 ns      Histogram: log(frequency) by time      14.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

First call time

The first call time also improves. Here are the benchmarks above, run with @time instead of @benchmark in a cold session:

Before the PR:

  0.005251 seconds (873 allocations: 53.484 KiB, 63.19% compilation time)
  0.016345 seconds (1.26 k allocations: 184.703 KiB, 14.42% compilation time)

After the PR:

  0.003518 seconds (597 allocations: 38.656 KiB, 98.91% compilation time)
  0.013299 seconds (1.66 k allocations: 201.078 KiB, 99.74% compilation time)

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.46%. Comparing base (196d79b) to head (b0b5082).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2050      +/-   ##
==========================================
- Coverage   86.46%   86.46%   -0.01%     
==========================================
  Files         147      147              
  Lines        8837     8836       -1     
==========================================
- Hits         7641     7640       -1     
  Misses       1196     1196              


@devmotion
Member

Generally, IMO we should try to keep the number of @generated functions to a minimum. A priori I don't see a strong reason for this change, so I would lean towards not introducing a new @generated function: the existing implementation doesn't suffer from the limitations of @generated functions, it's only used in show methods that are not performance-critical anyway, and the compiler should be able to optimise the existing code.

@devmotion
Member

devmotion commented Apr 1, 2026

There's one obvious inefficiency here, namely constructing a string that is only ever used in print(io, ...). This method should just write directly to io.
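
A minimal illustration of that point (hypothetical names, not the actual Distributions.jl code): building an intermediate String allocates a temporary, while passing the pieces straight to print streams them directly to io.

```julia
# Hypothetical illustration (not the actual Distributions.jl code).
struct Thing{T} end

# Allocates: builds "Thing{T}" as an intermediate String, then prints it.
show_alloc(io::IO, ::Thing{T}) where {T} = print(io, string("Thing{", T, "}"))

# Streams: each fragment is written directly to io, no intermediate String.
show_stream(io::IO, ::Thing{T}) where {T} = print(io, "Thing{", T, "}")
```

Both produce identical output; the second form simply skips the temporary allocation.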

Apart from that, I would usually consider the performance of show methods irrelevant and not worth adding a @generated function for. In base Julia, show methods are even explicitly @nospecialized on their input arguments. IIRC the fallback show method is not super-performant (I think I read it's due to looking up type aliases), so if this is a problem in your use case, upstream Julia might be the right place for further improvements.

Last but not least, this change breaks tests.

@fonsp
Author

fonsp commented Apr 5, 2026

Hi! I tested your PR #2051 and in my case it does not improve performance. My app still spends 90% of its CPU time in the type-to-string mechanism. I believe the PR still converts types to strings; the work just gets delegated to Base.print internals.

Could you say a bit more about why you don't want to use @generated?

Performance of show functionality is relevant in interactive work and observability, which I find important.
