`bench.py`'s output is sub-par

Because `bench.py` just lets `pyperf` print to stdout, its output is hard to read, especially once benchmark stability warnings get mixed in:

> .....................
> short escape native: Mean +- std dev: 546 ns +- 14 ns
> .....................
> short escape speedups: Mean +- std dev: 363 ns +- 12 ns
> .....................
> long escape native: Mean +- std dev: 21.8 us +- 0.5 us
> .....................
> long escape speedups: Mean +- std dev: 8.39 us +- 0.56 us
> .....................
> short plain native: Mean +- std dev: 423 ns +- 11 ns
> .....................
> short plain speedups: Mean +- std dev: 291 ns +- 14 ns
> .....................
> long plain native: Mean +- std dev: 22.1 us +- 0.9 us
> .....................
> long plain speedups: Mean +- std dev: 8.44 us +- 0.52 us
> .....................
> long suffix native: Mean +- std dev: 171 us +- 3 us
> .....................
> long suffix speedups: Mean +- std dev: 138 us +- 5 us

<details><summary>even more so with lots of bench stability warnings</summary>

.....................
WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

short escape native: Mean +- std dev: 546 ns +- 14 ns
.....................
WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

short escape speedups: Mean +- std dev: 363 ns +- 12 ns
.....................
long escape native: Mean +- std dev: 21.8 us +- 0.5 us
.....................
WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

long escape speedups: Mean +- std dev: 8.39 us +- 0.56 us
.....................
WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

short plain native: Mean +- std dev: 423 ns +- 11 ns
.....................
WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

short plain speedups: Mean +- std dev: 291 ns +- 14 ns
.....................
WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

long plain native: Mean +- std dev: 22.1 us +- 0.9 us
.....................
WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

long plain speedups: Mean +- std dev: 8.44 us +- 0.52 us
.....................
long suffix native: Mean +- std dev: 171 us +- 3 us
.....................
WARNING: the benchmark result may be unstable
* Not enough samples to get a stable result (95% certainly of less than 1% variation)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

long suffix speedups: Mean +- std dev: 138 us +- 5 us
</details>

By having pyperf store each benchmark report as JSON and then comparing the reports with `compare_to`, the bench's output could be significantly cleaner, whether in the default format:

> short escape: Mean +- std dev: [native] 552 ns +- 16 ns -> [speedups] 369 ns +- 15 ns: 1.50x faster
> long escape: Mean +- std dev: [native] 113 us +- 6 us -> [speedups] 57.1 us +- 2.4 us: 1.97x faster
> short plain: Mean +- std dev: [native] 459 ns +- 68 ns -> [speedups] 299 ns +- 21 ns: 1.54x faster
> long plain: Mean +- std dev: [native] 22.0 us +- 0.9 us -> [speedups] 8.51 us +- 0.54 us: 2.58x faster
> long prefix: Mean +- std dev: [native] 22.7 us +- 0.6 us -> [speedups] 19.5 us +- 1.1 us: 1.16x faster
> long suffix: Mean +- std dev: [native] 171 us +- 3 us -> [speedups] 140 us +- 8 us: 1.22x faster
>
> Geometric mean: 1.60x faster

or the table format:
```
+----------------+---------+-----------------------+
| Benchmark      | native  | speedups              |
+================+=========+=======================+
| short escape   | 573 ns  | 376 ns: 1.53x faster  |
+----------------+---------+-----------------------+
| long escape    | 124 us  | 57.7 us: 2.15x faster |
+----------------+---------+-----------------------+
| short plain    | 456 ns  | 296 ns: 1.54x faster  |
+----------------+---------+-----------------------+
| long plain     | 23.4 us | 8.65 us: 2.70x faster |
+----------------+---------+-----------------------+
| long prefix    | 23.7 us | 19.5 us: 1.22x faster |
+----------------+---------+-----------------------+
| long suffix    | 176 us  | 143 us: 1.23x faster  |
+----------------+---------+-----------------------+
| Geometric mean | (ref)   | 1.65x faster          |
+----------------+---------+-----------------------+
```
pyperf even supports Markdown table output, so benchmark changes can be posted straight to GitHub nicely formatted:

| Benchmark      | native  | speedups              |
|----------------|:-------:|:---------------------:|
| short escape   | 567 ns  | 364 ns: 1.56x faster  |
| long escape    | 114 us  | 56.6 us: 2.01x faster |
| short plain    | 445 ns  | 325 ns: 1.37x faster  |
| long plain     | 22.9 us | 8.27 us: 2.77x faster |
| long prefix    | 26.8 us | 19.3 us: 1.39x faster |
| long suffix    | 177 us  | 139 us: 1.27x faster  |
| Geometric mean | (ref)   | 1.66x faster          |
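
As a quick sanity check on the summary row, here is a small sketch that recomputes it from the rounded per-benchmark factors in the table above. It assumes the row is the plain geometric mean of those speedup factors; pyperf works from unrounded values, so the last digit may differ on other runs.

```python
import math

# Speedup factors copied from the Markdown table above (already rounded).
speedups = [1.56, 2.01, 1.37, 2.77, 1.39, 1.27]


def geometric_mean(values):
    """Geometric mean computed as exp of the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(f"{geometric_mean(speedups):.2f}x faster")  # prints "1.66x faster"
```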

Note that these output improvements also carry over when adding new speedups (e.g. #438).

Also, the "long escape" bench is currently identical to "long plain", and pyperf isn't declared as a dependency anywhere, which makes the benchmark a bit annoying to run.
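
For reference, the comparison step could look roughly like the sketch below. It assumes the native and speedups runs have already been written to two pyperf JSON files using the same benchmark names in both (for example "short escape"), since `compare_to` matches benchmarks by name. The file names `native.json` and `speedups.json` and the helper names are hypothetical, not something `bench.py` does today.

```python
import subprocess
import sys


def compare_command(ref_json, changed_json, table_format=None):
    """Build the `pyperf compare_to` command line for two result files.

    table_format=None keeps pyperf's default one-line-per-benchmark output;
    "rest" or "md" asks for a table in that format.
    """
    cmd = [sys.executable, "-m", "pyperf", "compare_to", ref_json, changed_json]
    if table_format is not None:
        cmd += ["--table", "--table-format", table_format]
    return cmd


def run_compare(ref_json="native.json", changed_json="speedups.json",
                table_format=None):
    """Run the comparison; requires pyperf to be installed."""
    subprocess.run(compare_command(ref_json, changed_json, table_format),
                   check=True)
```

Calling `run_compare(table_format="md")` would then print a Markdown table like the one above, ready to paste into an issue or PR.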

`bench.py`'s output is sub-par #523

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Benchmark	native	speedups
short escape	567 ns	364 ns: 1.56x faster
long escape	114 us	56.6 us: 2.01x faster
short plain	445 ns	325 ns: 1.37x faster
long plain	22.9 us	8.27 us: 2.77x faster
long prefix	26.8 us	19.3 us: 1.39x faster
long suffix	177 us	139 us: 1.27x faster
Geometric mean	(ref)	1.66x faster

Uh oh!

bench.py's output is sub-par #523

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`bench.py`'s output is sub-par #523