Add compare-between mode by akx · Pull Request #302 · ionelmc/pytest-benchmark

akx · 2026-02-26T11:23:44Z

This PR adds a compare --between mode, effectively a pivot table between 2..N result files.
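
For readers unfamiliar with the idea, the pivot can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `pivot_stat` is a hypothetical helper, and the payload layout (a top-level `benchmarks` list whose entries carry `name` and `stats`) is assumed to match the JSON files pytest-benchmark saves.

```python
def pivot_stat(runs, stat):
    """Pivot {run_label: payload} into {benchmark_name: {run_label: value}}.

    Hypothetical helper for illustration only. `runs` maps a run label to an
    already-loaded result payload (assumed layout, see the lead-in).
    """
    table = {}
    for run_label, payload in runs.items():
        for bench in payload["benchmarks"]:
            # One row per benchmark name, one cell per run.
            table.setdefault(bench["name"], {})[run_label] = bench["stats"][stat]
    return table


runs = {
    "0001": {"benchmarks": [{"name": "test_fast", "stats": {"min": 2.0}}]},
    "0002": {"benchmarks": [{"name": "test_fast", "stats": {"min": 1.5}}]},
}
table = pivot_stat(runs, "min")
```

Each row of `table` then holds one benchmark, with one value per result file.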

I needed this for django/asgiref#551 and cleaned it up for general use :)

Example output with color:

codecov · 2026-02-26T15:59:08Z

Codecov Report

❌ Patch coverage is 93.93939% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.72%. Comparing base (86ae1ed) to head (25ff537).
⚠️ Report is 10 commits behind head on master.

Files with missing lines	Patch %	Lines
src/pytest_benchmark/table.py	91.57%	4 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #302      +/-   ##
==========================================
+ Coverage   90.19%   90.72%   +0.53%     
==========================================
  Files          28       28              
  Lines        2936     3052     +116     
  Branches      319      336      +17     
==========================================
+ Hits         2648     2769     +121     
+ Misses        213      208       -5     
  Partials       75       75

Flag	Coverage Δ
py310-pytest84-nodist-cover (macos/arm64)	`86.40% <93.93%> (+0.36%)`	⬆️
py310-pytest84-nodist-cover (ubuntu/x64)	`86.69% <93.93%> (+0.32%)`	⬆️
py310-pytest84-nodist-cover (windows/x64)	`86.17% <93.93%> (+0.37%)`	⬆️
py310-pytest84-xdist-cover (macos/arm64)	`86.73% <93.93%> (?)`
py310-pytest84-xdist-cover (ubuntu/x64)	`87.05% <93.93%> (?)`
py310-pytest84-xdist-cover (windows/x64)	`86.50% <93.93%> (?)`
py310-pytest90-nodist-cover (macos/arm64)	`86.36% <93.93%> (+0.36%)`	⬆️
py310-pytest90-nodist-cover (ubuntu/x64)	`86.86% <93.93%> (+0.55%)`	⬆️
py310-pytest90-nodist-cover (windows/x64)	`85.94% <93.93%> (+0.38%)`	⬆️
py310-pytest90-xdist-cover (macos/arm64)	`86.69% <93.93%> (?)`
py310-pytest90-xdist-cover (ubuntu/x64)	`87.18% <93.93%> (?)`
py310-pytest90-xdist-cover (windows/x64)	`86.46% <93.93%> (?)`
py311-pytest84-nodist-cover (macos/arm64)	`86.40% <93.93%> (+0.36%)`	⬆️
py311-pytest84-nodist-cover (ubuntu/x64)	`86.73% <93.93%> (+0.38%)`	⬆️
py311-pytest84-nodist-cover (windows/x64)	`86.17% <93.93%> (+0.58%)`	⬆️
py311-pytest84-xdist-cover (macos/arm64)	`86.73% <93.93%> (?)`
py311-pytest84-xdist-cover (ubuntu/x64)	`87.05% <93.93%> (?)`
py311-pytest84-xdist-cover (windows/x64)	`86.50% <93.93%> (?)`
py311-pytest90-nodist-cover (macos/arm64)	`86.36% <93.93%> (+0.36%)`	⬆️
py311-pytest90-nodist-cover (ubuntu/x64)	`86.69% <93.93%> (+0.15%)`	⬆️
py311-pytest90-nodist-cover (windows/x64)	`86.14% <93.93%> (+0.58%)`	⬆️
py311-pytest90-xdist-cover (macos/arm64)	`86.69% <93.93%> (?)`
py311-pytest90-xdist-cover (ubuntu/x64)	`86.99% <93.93%> (?)`
py311-pytest90-xdist-cover (windows/x64)	`86.46% <93.93%> (?)`
py312-pytest84-nodist-cover (macos/arm64)	`86.40% <93.93%> (+0.36%)`	⬆️
py312-pytest84-nodist-cover (ubuntu/x64)	`86.69% <93.93%> (+0.35%)`	⬆️
py312-pytest84-nodist-cover (windows/x64)	`86.17% <93.93%> (+0.37%)`	⬆️
py312-pytest84-xdist-cover (macos/arm64)	`86.53% <93.93%> (?)`
py312-pytest84-xdist-cover (ubuntu/x64)	`87.02% <93.93%> (?)`
py312-pytest84-xdist-cover (windows/x64)	`86.50% <93.93%> (?)`
py312-pytest90-nodist-cover (macos/arm64)	`86.36% <93.93%> (+0.36%)`	⬆️
py312-pytest90-nodist-cover (ubuntu/x64)	`86.86% <93.93%> (+0.51%)`	⬆️
py312-pytest90-nodist-cover (windows/x64)	`86.14% <93.93%> (+0.58%)`	⬆️
py312-pytest90-xdist-cover (macos/arm64)	`86.69% <93.93%> (?)`
py312-pytest90-xdist-cover (ubuntu/x64)	`86.99% <93.93%> (?)`
py312-pytest90-xdist-cover (windows/x64)	`86.27% <93.93%> (?)`
py313-pytest84-nodist-cover (macos/arm64)	`86.20% <93.93%> (+0.17%)`	⬆️
py313-pytest84-nodist-cover (ubuntu/x64)	`86.73% <93.93%> (+0.35%)`	⬆️
py313-pytest84-nodist-cover (windows/x64)	`86.17% <93.93%> (+0.37%)`	⬆️
py313-pytest84-xdist-cover (macos/arm64)	`86.73% <93.93%> (?)`
py313-pytest84-xdist-cover (ubuntu/x64)	`87.05% <93.93%> (?)`
py313-pytest84-xdist-cover (windows/x64)	`86.50% <93.93%> (?)`
py313-pytest90-nodist-cover (macos/arm64)	`86.36% <93.93%> (+0.36%)`	⬆️
py313-pytest90-nodist-cover (ubuntu/x64)	`86.69% <93.93%> (+0.38%)`	⬆️
py313-pytest90-nodist-cover (windows/x64)	`86.14% <93.93%> (+0.58%)`	⬆️
py313-pytest90-xdist-cover (macos/arm64)	`86.69% <93.93%> (?)`
py313-pytest90-xdist-cover (ubuntu/x64)	`86.99% <93.93%> (?)`
py313-pytest90-xdist-cover (windows/x64)	`86.27% <93.93%> (?)`
py314-pytest84-nodist-cover (macos/arm64)	`89.60% <93.93%> (+0.24%)`	⬆️
py314-pytest84-nodist-cover (ubuntu/x64)	`90.10% <93.93%> (+0.45%)`	⬆️
py314-pytest84-nodist-cover (windows/x64)	`89.34% <93.93%> (+0.25%)`	⬆️
py314-pytest84-xdist-cover (macos/arm64)	`89.93% <93.93%> (?)`
py314-pytest84-xdist-cover (ubuntu/x64)	`90.39% <93.93%> (?)`
py314-pytest84-xdist-cover (windows/x64)	`89.67% <93.93%> (?)`
py314-pytest90-nodist-cover (macos/arm64)	`89.60% <93.93%> (+0.24%)`	⬆️
py314-pytest90-nodist-cover (ubuntu/x64)	`90.06% <93.93%> (+0.42%)`	⬆️
py314-pytest90-nodist-cover (windows/x64)	`89.34% <93.93%> (+0.25%)`	⬆️
py314-pytest90-xdist-cover (macos/arm64)	`89.93% <93.93%> (?)`
py314-pytest90-xdist-cover (ubuntu/x64)	`90.19% <93.93%> (?)`
py314-pytest90-xdist-cover (windows/x64)	`89.67% <93.93%> (?)`
pypy311-pytest84-nodist-cover (macos/arm64)	`85.58% <93.93%> (+0.39%)`	⬆️
pypy311-pytest84-nodist-cover (ubuntu/x64)	`86.10% <93.93%> (+0.61%)`	⬆️
pypy311-pytest84-nodist-cover (windows/x64)	`85.35% <93.93%> (+0.61%)`	⬆️
pypy311-pytest84-xdist-cover (macos/arm64)	`85.91% <93.93%> (?)`
pypy311-pytest84-xdist-cover (ubuntu/x64)	`86.40% <93.93%> (?)`
pypy311-pytest84-xdist-cover (windows/x64)	`85.68% <93.93%> (?)`
pypy311-pytest90-nodist-cover (macos/arm64)	`85.58% <93.93%> (+0.39%)`	⬆️
pypy311-pytest90-nodist-cover (ubuntu/x64)	`85.91% <93.93%> (+0.42%)`	⬆️
pypy311-pytest90-nodist-cover (windows/x64)	`85.35% <93.93%> (+0.40%)`	⬆️
pypy311-pytest90-xdist-cover (macos/arm64)	`85.91% <93.93%> (?)`
pypy311-pytest90-xdist-cover (ubuntu/x64)	`86.20% <93.93%> (?)`
pypy311-pytest90-xdist-cover (windows/x64)	`85.68% <93.93%> (?)`

ionelmc · 2026-02-26T17:20:01Z

Hmmm nice, looks like you did some refactoring. I'll try to find time this week to review.

akx · 2026-02-26T18:13:27Z

> looks like you did some refactoring

Very tiny ones, separated into the first commit for ease of review. ef81697

ionelmc · 2026-03-17T16:13:51Z

src/pytest_benchmark/cli.py

    add_display_options(compare_command.add_argument, prefix='')
    add_histogram_options(compare_command.add_argument, prefix='')
+    compare_command.add_argument(
+        '--compare-between',


ionelmc · Mar 17, 2026

This should use the prefix.

akx · Mar 18, 2026

The prefix seems to be forced to '' for all compare subcommand options (and there's no prefix available in this function, as far as I can see?).

ionelmc · Mar 18, 2026

Ah, oops, I kinda forgot what this function was for. What I wanted is for --compare-between to be --between instead (the command already has "compare", so it's pointless to repeat "compare" all over).
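
The prefix pattern being discussed can be sketched like this. It is only an illustration under stated assumptions: `add_between_option` is a hypothetical helper, and the namespaced `--benchmark-between` flag is invented here to show what a non-empty prefix would produce; neither is claimed to exist in pytest-benchmark.

```python
import argparse


def add_between_option(add_argument, prefix="benchmark-"):
    """Register a between-style option through any add_argument-like callable.

    Hypothetical helper: the option name and help text are illustrative only.
    """
    add_argument(
        f"--{prefix}between",
        metavar="COLUMNS",
        help="Comma-separated stats to compare between runs.",
    )


# With the default prefix the flag is namespaced (hypothetical flag name).
plugin_parser = argparse.ArgumentParser()
add_between_option(plugin_parser.add_argument)

# Passing prefix='' yields the short flag, as on the compare subcommand.
compare_parser = argparse.ArgumentParser(prog="pytest-benchmark compare")
add_between_option(compare_parser.add_argument, prefix="")

plugin_args = plugin_parser.parse_args(["--benchmark-between", "ops"])
compare_args = compare_parser.parse_args(["--between=ops"])
```

With an empty prefix the option is simply `--between`, which is the spelling requested in the review.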

ionelmc · 2026-03-17T16:20:27Z

Sorry for the delays, I finally got to try this and I have an example to discuss. I have this stuff locally (bunch of crappy stats for 2 platforms):

> pytest-benchmark list

/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0001_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005508_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0001_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141615_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0002_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005552_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0002_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141718_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0003_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005844_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0003_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141813_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0004_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_010137_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0004_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_143038_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0005_4330f5597d413b9c0d0e54928bac300679822cac_20190107_010839_uncommited-changes.json

If I run pytest-benchmark compare --compare-between --columns=min I get this (without --columns= it's even worse):

--------------------------------------------------------------------------------------------------------------------------------------------- benchmark: 9 tests, 9 sources ---------------------------------------------------------------------------------------------------------------------------------------------
Name (time in ns)        0001_9aa5319 Min  0001_bf76dd3 Min  0002_9aa5319 Min  0002_bf76dd3 Min  0003_9aa5319 Min  0003_bf76dd3 Min  0004_9aa5319 Min  0004_bf76dd3 Min  0005_4330f55 Min  Chg:0001_b/Min  Chg:0002_9/Min  Chg:0002_b/Min  Chg:0003_9/Min  Chg:0003_b/Min  Chg:0004_9/Min  Chg:0004_b/Min  Chg:0005_4/Min
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_xfast                        59.6046           34.2931           61.9888           34.3178           61.9888           34.3048           61.9888           34.3178           59.6046          -42.5%           +4.0%          -42.4%           +4.0%          -42.4%           +4.0%          -42.4%           +0.0%
test_fast                     17,200.0000        6,499.9999       15,400.0008        9,600.0003       14,900.0007        9,400.0002       14,799.9999        9,299.9999       15,799.9966          -62.2%          -10.5%          -44.2%          -13.4%          -45.3%          -14.0%          -45.9%           -8.1%
test_parametrized[2]          17,900.0017       69,300.0002       17,700.0002       51,900.0000       42,400.0027       24,800.0001       21,399.9992       28,399.9998       19,399.9986         +287.2%           -1.1%         +189.9%         +136.9%          +38.5%          +19.6%          +58.7%           +8.4%
test_parametrized[4]          22,799.9990       74,600.0001       64,200.0014       41,500.0000       22,699.9982       25,400.0001       25,299.9998       53,499.9999       19,700.0008         +227.2%         +181.6%          +82.0%           -0.4%          +11.4%          +11.0%         +134.6%          -13.6%
test_parametrized[3]          23,699.9986       74,300.0001       16,900.0014       37,400.0001       23,700.0022       42,199.9998       20,399.9989       21,000.0003       18,599.9997         +213.5%          -28.7%          +57.8%           +0.0%          +78.1%          -13.9%          -11.4%          -21.5%
test_parametrized[0]          29,499.9991       73,700.0000       19,899.9987       53,400.0001       35,799.9998       39,999.9999       22,001.0006       77,100.0000       23,999.9972         +149.8%          -32.5%          +81.0%          +21.4%          +35.6%          -25.4%         +161.4%          -18.6%
test_parametrized[1]          49,499.9986       43,099.9999       18,899.9984       47,999.9999       51,700.0008       32,599.9999       22,699.9982       53,499.9999       20,299.9981          -12.9%          -61.8%           -3.0%           +4.4%          -34.1%          -54.1%           +8.1%          -59.0%
test_slow                  1,060,704.9990    1,031,199.9999    1,061,405.0007    1,066,400.0001    1,049,705.0025    1,064,200.0002    1,039,404.0000    1,066,299.9998    1,036,099.9986           -2.8%           +0.1%           +0.5%           -1.0%           +0.3%           -2.0%           +0.5%           -2.3%
test_slower               10,039,549.0008   10,069,902.0004   10,072,847.9993   10,073,599.9999   10,072,043.9987   10,068,999.9999   10,072,840.0030   10,073,999.9998   10,067,498.9990           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

So the way I understand it, 0001_9aa5319 is the reference and everything else is diffed against it (correct me if I am wrong).

I think it would be better if the differences appeared right after the column they are computed from.

Currently it's: reference, run1, run2, run3, diff1, diff2, diff3. I think reference, run1, diff1, run2, diff2, run3, diff3 would be better, and you'd be able to make the column headers more compact (eg: just show "difference" or "diff" instead of repeating and trying to fit the run name from the previous columns).
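
The suggested layout can be sketched as below. This is a rough illustration rather than the PR's implementation: `interleaved_row` is a hypothetical helper, and the sample values are the first three Min cells of the test_xfast row above.

```python
def interleaved_row(values):
    """Return [reference, run1, diff1, run2, diff2, ...] for one benchmark.

    `values` holds one stat per run; the first entry is the reference and is
    assumed to be non-zero.
    """
    reference = values[0]
    row = [f"{reference:,.4f}"]
    for value in values[1:]:
        # Percentage change of this run relative to the reference run.
        change = (value - reference) / reference * 100
        row.append(f"{value:,.4f}")
        row.append(f"{change:+.1f}%")
    return row


row = interleaved_row([59.6046, 34.2931, 61.9888])
# row == ['59.6046', '34.2931', '-42.5%', '61.9888', '+4.0%']
```

Because each change sits directly after the run it describes, its header can shrink to something like "diff" without repeating the run name.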

Also, about columns, I would like to propose this idea: make --between be exclusive with --columns. Cause it doesn't make sense to use --between with all the columns unless you're comparing just 2 runs. Actually I am not sure but the defaults are producing too many columns. It should default somehow to either compare 1 stat or compare all the default stats on just 2 runs. Maybe go with these:

--between (no value) does some magic: if 2 runs are in the input, compare all the default columns; if there are more than 2 runs, compare only a default stat like min or mean (I guess we have to decide which is the most useful one to show by default - what do you usually use?)
--between=min,max,etc shows those columns (either overrides what --columns sets or errors out by being exclusive option)
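
The two rules proposed above could look roughly like this. It is a sketch of the proposal only, not of what was merged: `resolve_between_columns`, the `fallback` stat, and the short default column list are all placeholders chosen for illustration.

```python
import argparse


def resolve_between_columns(between, num_runs, default_columns, fallback="min"):
    """Pick the stats to compare under the proposed --between rules."""
    if between:
        # Explicit list such as "min,max" wins.
        return [name.strip() for name in between.split(",") if name.strip()]
    if num_runs == 2:
        # Two runs: every default column still fits on screen.
        return list(default_columns)
    # More than two runs: fall back to a single stat (placeholder choice).
    return [fallback]


parser = argparse.ArgumentParser()
# nargs='?' lets --between appear with or without a value.
parser.add_argument("--between", nargs="?", const="", default=None)

explicit = parser.parse_args(["--between=min,max"]).between
bare = parser.parse_args(["--between"]).between

many_runs = resolve_between_columns(bare, 9, ["min", "mean", "ops"])
two_runs = resolve_between_columns(bare, 2, ["min", "mean", "ops"])
chosen = resolve_between_columns(explicit, 9, ["min", "mean", "ops"])
```

Which stat to use as the fallback is exactly the open question raised in the first rule.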

ionelmc · 2026-03-17T16:31:44Z

So lemme know what you think. To sum it up, my ideas were to compact or remove redundant info from the column names (Chg:runname just repeats the run name from the previous column) and to give better defaults, so users don't end up with 100 columns of output by default.

akx · 2026-03-18T07:20:08Z

> So the way I understand it, 0001_9aa5319 is the reference and everything else is diffed against it (correct me if I am wrong).

Exactly. Maybe the 1st item's header should have a (*) or something and a post-table legend would explain that it's the reference item?

> I think it would be better if the differences appeared right after the column they are computed from.
>
> Currently it's: reference, run1, run2, run3, diff1, diff2, diff3. I think reference, run1, diff1, run2, diff2, run3, diff3 would be better, and you'd be able to make the column headers more compact (eg: just show "difference" or "diff" instead of repeating and trying to fit the run name from the previous columns).

Sure, good idea. I admit I just basically use --compare-between --columns=ops, so I didn't think about the UX enough when there are more columns.

> --between (no value) does some magic: if 2 runs are in the input, compare all the default columns; if there are more than 2 runs, compare only a default stat like min or mean (I guess we have to decide which is the most useful one to show by default - what do you usually use?)

I use OPS, but I'm not sure it's the best metric to default to. 🤔

akx · 2026-03-18T08:18:59Z

Reworked the column order, made the reference numbers blue, added the reference indicator.

I didn't touch the argument parsing yet though, not sure about the (implicit) default.

ionelmc · 2026-03-18T12:16:08Z

> Reworked the column order, made the reference numbers blue, added the reference indicator.

That delta sign is better but can you reorder the columns? Then you don't need the (*) part and everything is easier to follow if the delta always appears right after the referenced column.

ionelmc · 2026-03-18T12:17:30Z

> I use OPS, but I'm not sure it's the best metric to default to.

Maybe no defaults then, and make --between always require the columns?

akx · 2026-03-18T15:55:36Z

> That delta sign is better but can you reorder the columns? Then you don't need the (*) part and everything is easier to follow if the delta always appears right after the referenced column.

So

Okay! That'll work OK with the blue color for the reference.

> Maybe no defaults then, and make --between always require the columns?

uv run pytest-benchmark compare .benchmarks/Darwin-CPython-3.14-64bit/*.json --between=ops

? Sounds good to me 👍

akx · 2026-03-24T16:07:53Z

--betweenified and simplified:

ionelmc · 2026-03-24T19:23:20Z

Looks great. Two last things to do I think:

adding a screenshot in the readme
the column size is already trimmed to 12 characters - perhaps you could use --name instead to provide the actual rendering of the column name? It would be passed as name_format to the renderer class.

akx · 2026-03-24T21:26:42Z

Switched to using name_format and added a screenshot (meticulously squeezed to 32 colors and optimized 😁).

ionelmc reviewed Mar 17, 2026

akx force-pushed the compare-between branch from ee802f9 to b482bdc on March 18, 2026 08:16

akx changed the title from "Add --compare-between mode" to "Add compare-between mode" on Mar 24, 2026

akx force-pushed the compare-between branch from b482bdc to 72982c0 on March 24, 2026 16:07

akx force-pushed the compare-between branch from 72982c0 to 5ebe1c6 on March 24, 2026 16:09

akx force-pushed the compare-between branch from 5ebe1c6 to db52808 on March 24, 2026 20:01

akx added 3 commits March 24, 2026 22:01

Add self to AUTHORS.rst (bfd6e5a)

Refactor table.py for reuse (e8a4434)

Add compare-between mode (--between) (37e6ec2)

akx force-pushed the compare-between branch from db52808 to 37e6ec2 on March 24, 2026 21:16

akx requested a review from ionelmc March 24, 2026 21:26

Add missing space/wildcard. (25ff537)

ionelmc merged commit cf97f26 into ionelmc:master Mar 25, 2026
222 of 223 checks passed
