bench.py's output is sub-par #523

@xmo-odoo

Because bench.py just runs pyperf and prints to stdout, its output is pretty annoying to read:

.....................
short escape native: Mean +- std dev: 546 ns +- 14 ns
.....................
short escape speedups: Mean +- std dev: 363 ns +- 12 ns
.....................
long escape native: Mean +- std dev: 21.8 us +- 0.5 us
.....................
long escape speedups: Mean +- std dev: 8.39 us +- 0.56 us
.....................
short plain native: Mean +- std dev: 423 ns +- 11 ns
.....................
short plain speedups: Mean +- std dev: 291 ns +- 14 ns
.....................
long plain native: Mean +- std dev: 22.1 us +- 0.9 us
.....................
long plain speedups: Mean +- std dev: 8.44 us +- 0.52 us
.....................
long suffix native: Mean +- std dev: 171 us +- 3 us
.....................
long suffix speedups: Mean +- std dev: 138 us +- 5 us

It gets even worse when lots of bench stability warnings are in the mix:

.....................
WARNING: the benchmark result may be unstable

  • Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

short escape native: Mean +- std dev: 546 ns +- 14 ns
.....................
WARNING: the benchmark result may be unstable

  • Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

short escape speedups: Mean +- std dev: 363 ns +- 12 ns
.....................
long escape native: Mean +- std dev: 21.8 us +- 0.5 us
.....................
WARNING: the benchmark result may be unstable

  • Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

long escape speedups: Mean +- std dev: 8.39 us +- 0.56 us
.....................
WARNING: the benchmark result may be unstable

  • Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

short plain native: Mean +- std dev: 423 ns +- 11 ns
.....................
WARNING: the benchmark result may be unstable

  • Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

short plain speedups: Mean +- std dev: 291 ns +- 14 ns
.....................
WARNING: the benchmark result may be unstable

  • Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

long plain native: Mean +- std dev: 22.1 us +- 0.9 us
.....................
WARNING: the benchmark result may be unstable

  • Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

long plain speedups: Mean +- std dev: 8.44 us +- 0.52 us
.....................
long suffix native: Mean +- std dev: 171 us +- 3 us
.....................
WARNING: the benchmark result may be unstable

  • Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

long suffix speedups: Mean +- std dev: 138 us +- 5 us

By using pyperf's ability to store benchmark reports as JSON and then comparing them with compare_to, the bench's output could be significantly cleaner, whether using the default format:

short escape: Mean +- std dev: [native] 552 ns +- 16 ns -> [speedups] 369 ns +- 15 ns: 1.50x faster
long escape: Mean +- std dev: [native] 113 us +- 6 us -> [speedups] 57.1 us +- 2.4 us: 1.97x faster
short plain: Mean +- std dev: [native] 459 ns +- 68 ns -> [speedups] 299 ns +- 21 ns: 1.54x faster
long plain: Mean +- std dev: [native] 22.0 us +- 0.9 us -> [speedups] 8.51 us +- 0.54 us: 2.58x faster
long prefix: Mean +- std dev: [native] 22.7 us +- 0.6 us -> [speedups] 19.5 us +- 1.1 us: 1.16x faster
long suffix: Mean +- std dev: [native] 171 us +- 3 us -> [speedups] 140 us +- 8 us: 1.22x faster

Geometric mean: 1.60x faster

or the table format:

+----------------+---------+-----------------------+
| Benchmark      | native  | speedups              |
+================+=========+=======================+
| short escape   | 573 ns  | 376 ns: 1.53x faster  |
+----------------+---------+-----------------------+
| long escape    | 124 us  | 57.7 us: 2.15x faster |
+----------------+---------+-----------------------+
| short plain    | 456 ns  | 296 ns: 1.54x faster  |
+----------------+---------+-----------------------+
| long plain     | 23.4 us | 8.65 us: 2.70x faster |
+----------------+---------+-----------------------+
| long prefix    | 23.7 us | 19.5 us: 1.22x faster |
+----------------+---------+-----------------------+
| long suffix    | 176 us  | 143 us: 1.23x faster  |
+----------------+---------+-----------------------+
| Geometric mean | (ref)   | 1.65x faster          |
+----------------+---------+-----------------------+
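As a sanity check on the "Geometric mean" row, here is a minimal sketch that recomputes it from the per-benchmark speedups in the table above. It works from the rounded ratios as displayed, so it can only be expected to agree with pyperf's figure up to rounding; the function name `geometric_mean` is just an illustrative helper, not part of pyperf.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logs for stability."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Speedup ratios copied from the table above (already rounded by pyperf).
speedups = [1.53, 2.15, 1.54, 2.70, 1.22, 1.23]

print(f"{geometric_mean(speedups):.2f}x faster")  # prints "1.65x faster"
```

The geometric mean is the natural summary here because the values are ratios: a 2x speedup and a 2x slowdown cancel out to 1x, which an arithmetic mean would not give.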

pyperf even supports a Markdown table output, so bench changes can be posted straight to GitHub nicely formatted:

| Benchmark      | native  | speedups              |
|----------------|---------|-----------------------|
| short escape   | 567 ns  | 364 ns: 1.56x faster  |
| long escape    | 114 us  | 56.6 us: 2.01x faster |
| short plain    | 445 ns  | 325 ns: 1.37x faster  |
| long plain     | 22.9 us | 8.27 us: 2.77x faster |
| long prefix    | 26.8 us | 19.3 us: 1.39x faster |
| long suffix    | 177 us  | 139 us: 1.27x faster  |
| Geometric mean | (ref)   | 1.66x faster          |
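For reference, a rough sketch of how the comparison step could be driven from Python. It assumes the two implementations have already been benchmarked into `native.json` and `speedups.json` (for example via the `-o` option that pyperf-based scripts accept) and that both files use the same benchmark names, such as "short escape", so that compare_to can pair them up. The file names and the `compare_command` helper are hypothetical; only the `pyperf compare_to` subcommand and its `--table` / `--table-format` options come from pyperf itself.

```python
import sys


def compare_command(reference, changed, table=False, table_format=None):
    """Build the argv for `python -m pyperf compare_to`.

    `reference` and `changed` are paths to pyperf JSON result files.
    `table_format` is passed through to --table-format (e.g. "md").
    """
    cmd = [sys.executable, "-m", "pyperf", "compare_to", reference, changed]
    if table:
        cmd.append("--table")
        if table_format is not None:
            cmd += ["--table-format", table_format]
    return cmd


# Hypothetical file names; show the command rather than running it.
print(" ".join(compare_command("native.json", "speedups.json",
                               table=True, table_format="md")))
```

Passing the resulting list to `subprocess.run(..., check=True)` would print the comparison, provided pyperf is installed and both JSON files exist. Building the argv as a list keeps the helper testable without needing pyperf at all.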

Note that these improvements do translate over to adding new speedups (e.g. #438).

Also, the "long escape" bench is the same as "long plain", and there's no dependency on pyperf declared anywhere, so the bench is a bit annoying to run.
