Add compare-between mode #302

Merged
ionelmc merged 4 commits into ionelmc:master from akx:compare-between
Mar 25, 2026

Conversation

@akx
Contributor

akx commented Feb 26, 2026

This PR adds a compare --between mode, effectively a pivot table between 2..N result files.

I needed this for django/asgiref#551 and cleaned it up for general use :)

Example output with color:

[Image: bench]
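
To illustrate what "a pivot table between 2..N result files" means here, the following is a minimal sketch and not the code from this PR. The function name `pivot_between`, the source names, and all numbers are made up for illustration; the real implementation lives in `src/pytest_benchmark/table.py` and may work quite differently.

```python
def pivot_between(sources):
    """Pivot {source: {benchmark: value}} into rows keyed by benchmark name.

    Each row holds one value per source, in source order. A benchmark that
    is missing from a source gets None in that position.
    """
    names = []
    for values in sources.values():
        for name in values:
            if name not in names:
                names.append(name)
    return {
        name: [values.get(name) for values in sources.values()]
        for name in names
    }


# Made-up ops numbers for two hypothetical result files.
sources = {
    "0001_before": {"test_fast": 120.0, "test_slow": 4.0},
    "0002_after": {"test_fast": 150.0, "test_slow": 5.0, "test_new": 9.0},
}
table = pivot_between(sources)
for name, row in table.items():
    print(name, row)
```

Each benchmark becomes one row and each result file one column, which is the shape the comparison output in this PR is built around.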

@codecov

codecov bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 93.93939% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.72%. Comparing base (86ae1ed) to head (25ff537).
⚠️ Report is 10 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| src/pytest_benchmark/table.py | 91.57% | 4 Missing and 4 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #302      +/-   ##
==========================================
+ Coverage   90.19%   90.72%   +0.53%     
==========================================
  Files          28       28              
  Lines        2936     3052     +116     
  Branches      319      336      +17     
==========================================
+ Hits         2648     2769     +121     
+ Misses        213      208       -5     
  Partials       75       75              
Flag Coverage Δ
py310-pytest84-nodist-cover (macos/arm64) 86.40% <93.93%> (+0.36%) ⬆️
py310-pytest84-nodist-cover (ubuntu/x64) 86.69% <93.93%> (+0.32%) ⬆️
py310-pytest84-nodist-cover (windows/x64) 86.17% <93.93%> (+0.37%) ⬆️
py310-pytest84-xdist-cover (macos/arm64) 86.73% <93.93%> (?)
py310-pytest84-xdist-cover (ubuntu/x64) 87.05% <93.93%> (?)
py310-pytest84-xdist-cover (windows/x64) 86.50% <93.93%> (?)
py310-pytest90-nodist-cover (macos/arm64) 86.36% <93.93%> (+0.36%) ⬆️
py310-pytest90-nodist-cover (ubuntu/x64) 86.86% <93.93%> (+0.55%) ⬆️
py310-pytest90-nodist-cover (windows/x64) 85.94% <93.93%> (+0.38%) ⬆️
py310-pytest90-xdist-cover (macos/arm64) 86.69% <93.93%> (?)
py310-pytest90-xdist-cover (ubuntu/x64) 87.18% <93.93%> (?)
py310-pytest90-xdist-cover (windows/x64) 86.46% <93.93%> (?)
py311-pytest84-nodist-cover (macos/arm64) 86.40% <93.93%> (+0.36%) ⬆️
py311-pytest84-nodist-cover (ubuntu/x64) 86.73% <93.93%> (+0.38%) ⬆️
py311-pytest84-nodist-cover (windows/x64) 86.17% <93.93%> (+0.58%) ⬆️
py311-pytest84-xdist-cover (macos/arm64) 86.73% <93.93%> (?)
py311-pytest84-xdist-cover (ubuntu/x64) 87.05% <93.93%> (?)
py311-pytest84-xdist-cover (windows/x64) 86.50% <93.93%> (?)
py311-pytest90-nodist-cover (macos/arm64) 86.36% <93.93%> (+0.36%) ⬆️
py311-pytest90-nodist-cover (ubuntu/x64) 86.69% <93.93%> (+0.15%) ⬆️
py311-pytest90-nodist-cover (windows/x64) 86.14% <93.93%> (+0.58%) ⬆️
py311-pytest90-xdist-cover (macos/arm64) 86.69% <93.93%> (?)
py311-pytest90-xdist-cover (ubuntu/x64) 86.99% <93.93%> (?)
py311-pytest90-xdist-cover (windows/x64) 86.46% <93.93%> (?)
py312-pytest84-nodist-cover (macos/arm64) 86.40% <93.93%> (+0.36%) ⬆️
py312-pytest84-nodist-cover (ubuntu/x64) 86.69% <93.93%> (+0.35%) ⬆️
py312-pytest84-nodist-cover (windows/x64) 86.17% <93.93%> (+0.37%) ⬆️
py312-pytest84-xdist-cover (macos/arm64) 86.53% <93.93%> (?)
py312-pytest84-xdist-cover (ubuntu/x64) 87.02% <93.93%> (?)
py312-pytest84-xdist-cover (windows/x64) 86.50% <93.93%> (?)
py312-pytest90-nodist-cover (macos/arm64) 86.36% <93.93%> (+0.36%) ⬆️
py312-pytest90-nodist-cover (ubuntu/x64) 86.86% <93.93%> (+0.51%) ⬆️
py312-pytest90-nodist-cover (windows/x64) 86.14% <93.93%> (+0.58%) ⬆️
py312-pytest90-xdist-cover (macos/arm64) 86.69% <93.93%> (?)
py312-pytest90-xdist-cover (ubuntu/x64) 86.99% <93.93%> (?)
py312-pytest90-xdist-cover (windows/x64) 86.27% <93.93%> (?)
py313-pytest84-nodist-cover (macos/arm64) 86.20% <93.93%> (+0.17%) ⬆️
py313-pytest84-nodist-cover (ubuntu/x64) 86.73% <93.93%> (+0.35%) ⬆️
py313-pytest84-nodist-cover (windows/x64) 86.17% <93.93%> (+0.37%) ⬆️
py313-pytest84-xdist-cover (macos/arm64) 86.73% <93.93%> (?)
py313-pytest84-xdist-cover (ubuntu/x64) 87.05% <93.93%> (?)
py313-pytest84-xdist-cover (windows/x64) 86.50% <93.93%> (?)
py313-pytest90-nodist-cover (macos/arm64) 86.36% <93.93%> (+0.36%) ⬆️
py313-pytest90-nodist-cover (ubuntu/x64) 86.69% <93.93%> (+0.38%) ⬆️
py313-pytest90-nodist-cover (windows/x64) 86.14% <93.93%> (+0.58%) ⬆️
py313-pytest90-xdist-cover (macos/arm64) 86.69% <93.93%> (?)
py313-pytest90-xdist-cover (ubuntu/x64) 86.99% <93.93%> (?)
py313-pytest90-xdist-cover (windows/x64) 86.27% <93.93%> (?)
py314-pytest84-nodist-cover (macos/arm64) 89.60% <93.93%> (+0.24%) ⬆️
py314-pytest84-nodist-cover (ubuntu/x64) 90.10% <93.93%> (+0.45%) ⬆️
py314-pytest84-nodist-cover (windows/x64) 89.34% <93.93%> (+0.25%) ⬆️
py314-pytest84-xdist-cover (macos/arm64) 89.93% <93.93%> (?)
py314-pytest84-xdist-cover (ubuntu/x64) 90.39% <93.93%> (?)
py314-pytest84-xdist-cover (windows/x64) 89.67% <93.93%> (?)
py314-pytest90-nodist-cover (macos/arm64) 89.60% <93.93%> (+0.24%) ⬆️
py314-pytest90-nodist-cover (ubuntu/x64) 90.06% <93.93%> (+0.42%) ⬆️
py314-pytest90-nodist-cover (windows/x64) 89.34% <93.93%> (+0.25%) ⬆️
py314-pytest90-xdist-cover (macos/arm64) 89.93% <93.93%> (?)
py314-pytest90-xdist-cover (ubuntu/x64) 90.19% <93.93%> (?)
py314-pytest90-xdist-cover (windows/x64) 89.67% <93.93%> (?)
pypy311-pytest84-nodist-cover (macos/arm64) 85.58% <93.93%> (+0.39%) ⬆️
pypy311-pytest84-nodist-cover (ubuntu/x64) 86.10% <93.93%> (+0.61%) ⬆️
pypy311-pytest84-nodist-cover (windows/x64) 85.35% <93.93%> (+0.61%) ⬆️
pypy311-pytest84-xdist-cover (macos/arm64) 85.91% <93.93%> (?)
pypy311-pytest84-xdist-cover (ubuntu/x64) 86.40% <93.93%> (?)
pypy311-pytest84-xdist-cover (windows/x64) 85.68% <93.93%> (?)
pypy311-pytest90-nodist-cover (macos/arm64) 85.58% <93.93%> (+0.39%) ⬆️
pypy311-pytest90-nodist-cover (ubuntu/x64) 85.91% <93.93%> (+0.42%) ⬆️
pypy311-pytest90-nodist-cover (windows/x64) 85.35% <93.93%> (+0.40%) ⬆️
pypy311-pytest90-xdist-cover (macos/arm64) 85.91% <93.93%> (?)
pypy311-pytest90-xdist-cover (ubuntu/x64) 86.20% <93.93%> (?)
pypy311-pytest90-xdist-cover (windows/x64) 85.68% <93.93%> (?)


@ionelmc
Owner

ionelmc commented Feb 26, 2026

Hmmm nice, looks like you did some refactorings, I'll try to find time this week to review.

@akx
Contributor Author

akx commented Feb 26, 2026

looks like you did some refactorings

Very tiny ones, separated into the first commit for ease of review. ef81697

add_display_options(compare_command.add_argument, prefix='')
add_histogram_options(compare_command.add_argument, prefix='')
compare_command.add_argument(
    '--compare-between',
Owner

This should use the prefix.

Contributor Author

The prefix seems to be forced to '' for all compare subcommand options, and as far as I can see there's no prefix available in this function?

Owner

Ah oops, I kinda forgot what this function was for. What I wanted is for --compare-between to be --between instead (because the command already has "compare", so it's pointless to repeat "compare" all over).

@ionelmc
Owner

ionelmc commented Mar 17, 2026

Sorry for the delays, I finally got to try this and I have an example to discuss. I have this stuff locally (bunch of crappy stats for 2 platforms):

> pytest-benchmark list

/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0001_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005508_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0001_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141615_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0002_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005552_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0002_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141718_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0003_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005844_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0003_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141813_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0004_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_010137_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0004_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_143038_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0005_4330f5597d413b9c0d0e54928bac300679822cac_20190107_010839_uncommited-changes.json

If I run pytest-benchmark compare --compare-between --columns=min I get this (without --columns= it's even worse):

--------------------------------------------------------------------------------------------------------------------------------------------- benchmark: 9 tests, 9 sources ---------------------------------------------------------------------------------------------------------------------------------------------
Name (time in ns)        0001_9aa5319 Min  0001_bf76dd3 Min  0002_9aa5319 Min  0002_bf76dd3 Min  0003_9aa5319 Min  0003_bf76dd3 Min  0004_9aa5319 Min  0004_bf76dd3 Min  0005_4330f55 Min  Chg:0001_b/Min  Chg:0002_9/Min  Chg:0002_b/Min  Chg:0003_9/Min  Chg:0003_b/Min  Chg:0004_9/Min  Chg:0004_b/Min  Chg:0005_4/Min
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_xfast                        59.6046           34.2931           61.9888           34.3178           61.9888           34.3048           61.9888           34.3178           59.6046          -42.5%           +4.0%          -42.4%           +4.0%          -42.4%           +4.0%          -42.4%           +0.0%
test_fast                     17,200.0000        6,499.9999       15,400.0008        9,600.0003       14,900.0007        9,400.0002       14,799.9999        9,299.9999       15,799.9966          -62.2%          -10.5%          -44.2%          -13.4%          -45.3%          -14.0%          -45.9%           -8.1%
test_parametrized[2]          17,900.0017       69,300.0002       17,700.0002       51,900.0000       42,400.0027       24,800.0001       21,399.9992       28,399.9998       19,399.9986         +287.2%           -1.1%         +189.9%         +136.9%          +38.5%          +19.6%          +58.7%           +8.4%
test_parametrized[4]          22,799.9990       74,600.0001       64,200.0014       41,500.0000       22,699.9982       25,400.0001       25,299.9998       53,499.9999       19,700.0008         +227.2%         +181.6%          +82.0%           -0.4%          +11.4%          +11.0%         +134.6%          -13.6%
test_parametrized[3]          23,699.9986       74,300.0001       16,900.0014       37,400.0001       23,700.0022       42,199.9998       20,399.9989       21,000.0003       18,599.9997         +213.5%          -28.7%          +57.8%           +0.0%          +78.1%          -13.9%          -11.4%          -21.5%
test_parametrized[0]          29,499.9991       73,700.0000       19,899.9987       53,400.0001       35,799.9998       39,999.9999       22,001.0006       77,100.0000       23,999.9972         +149.8%          -32.5%          +81.0%          +21.4%          +35.6%          -25.4%         +161.4%          -18.6%
test_parametrized[1]          49,499.9986       43,099.9999       18,899.9984       47,999.9999       51,700.0008       32,599.9999       22,699.9982       53,499.9999       20,299.9981          -12.9%          -61.8%           -3.0%           +4.4%          -34.1%          -54.1%           +8.1%          -59.0%
test_slow                  1,060,704.9990    1,031,199.9999    1,061,405.0007    1,066,400.0001    1,049,705.0025    1,064,200.0002    1,039,404.0000    1,066,299.9998    1,036,099.9986           -2.8%           +0.1%           +0.5%           -1.0%           +0.3%           -2.0%           +0.5%           -2.3%
test_slower               10,039,549.0008   10,069,902.0004   10,072,847.9993   10,073,599.9999   10,072,043.9987   10,068,999.9999   10,072,840.0030   10,073,999.9998   10,067,498.9990           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

So the way I understand it, 0001_9aa5319 is the reference and everything else is diffed against it (correct me if I am wrong).
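
As a quick check of that reading, here is a small sketch that recomputes one of the Chg values from the Min columns in the table above. The helper name `percent_change` is hypothetical and the formula is the usual relative change, which may not be exactly how the PR computes it, although it reproduces the numbers shown.

```python
def percent_change(reference, value):
    """Relative change of value against reference, in percent."""
    return (value - reference) / reference * 100.0


# Min values for test_xfast, copied from the table above (time in ns).
reference = 59.6046  # 0001_9aa5319, the first column
run = 34.2931        # 0001_bf76dd3

print(f"{percent_change(reference, run):+.1f}%")  # -42.5%
```

The result matches the Chg:0001_b/Min cell for test_xfast, which is consistent with the first column being the reference for every change column.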

I think it would be better if the differences appear right after the column that the diff is computed for.

Currently it's: reference, run1, run2, run3, diff1, diff2, diff3. I think reference, run1, diff1, run2, diff2, run3, diff3 would be better, and you'd be able to make the column headers more compact (eg: just show "difference" or "diff" instead of repeating and trying to fit the run name from the previous columns).

Also, about columns, I would like to propose this idea: make --between mutually exclusive with --columns, because it doesn't make sense to use --between with all the columns unless you're comparing just 2 runs. Actually I am not sure, but the defaults are producing too many columns. It should default somehow to either comparing 1 stat or comparing all the default stats on just 2 runs. Maybe go with these:

  • --between (no value) does some magic: if 2 runs are in the input, compare all the default columns; if more than 2 runs, compare only a default stat like min or mean (I guess we have to decide which is the most useful one to show by default - what do you usually use?)
  • --between=min,max,etc shows those columns (either overriding what --columns sets or erroring out because the two options are mutually exclusive)
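
The two bullets above could be sketched with plain argparse roughly as follows. This only illustrates the proposal and is not what was merged; `DEFAULT_COLUMNS`, `SINGLE_DEFAULT`, the `compare-sketch` parser, and `between_columns` are hypothetical names, and the column lists are placeholders rather than the plugin's real defaults.

```python
import argparse

# Hypothetical placeholders, not pytest-benchmark's actual defaults.
DEFAULT_COLUMNS = ["min", "max", "mean"]
SINGLE_DEFAULT = ["min"]

parser = argparse.ArgumentParser(prog="compare-sketch")
parser.add_argument("files", nargs="*")
# nargs="?" lets the flag appear bare (const is used) or as --between=min,max.
# A bare --between should come last, otherwise it would swallow the next file.
parser.add_argument("--between", nargs="?", const="", default=None)


def between_columns(args):
    """Return the columns to compare, or None when --between was not given."""
    if args.between is None:
        return None
    if args.between:
        # Explicit list such as --between=min,max
        return [c.strip() for c in args.between.split(",") if c.strip()]
    # Bare --between: all default columns for 2 runs, a single stat otherwise.
    return DEFAULT_COLUMNS if len(args.files) == 2 else SINGLE_DEFAULT


two_runs = parser.parse_args(["a.json", "b.json", "--between"])
many_runs = parser.parse_args(["--between=min,max", "a.json", "b.json", "c.json"])
print(between_columns(two_runs))
print(between_columns(many_runs))
```

The "magic" branch is the part that needs an agreed default stat, which is the open question raised in the first bullet.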

@ionelmc
Owner

ionelmc commented Mar 17, 2026

So lemme know what you think. To sum it up, my ideas were to compact/remove redundant info from the column names (like Chg:runname, which just repeats runname from the previous column) and to give better defaults so users don't output something with 100 columns by default.

@akx
Contributor Author

akx commented Mar 18, 2026

So the way I understand it, 0001_9aa5319 is the reference and everything else is diffed against it (correct me if I am wrong).

Exactly. Maybe the 1st item's header should have a (*) or something and a post-table legend would explain that it's the reference item?

I think it would be better if the differences appear right after the column that the diff is computed for.

Currently it's: reference, run1, run2, run3, diff1, diff2, diff3. I think reference, run1, diff1, run2, diff2, run3, diff3 would be better, and you'd be able to make the column headers more compact (eg: just show "difference" or "diff" instead of repeating and trying to fit the run name from the previous columns).

Sure, good idea. I admit I just basically use --compare-between --columns=ops, so I didn't think about the UX enough when there are more columns.

  • --between (no value) does some magic: if 2 runs are in the input, compare all the default columns; if more than 2 runs, compare only a default stat like min or mean (I guess we have to decide which is the most useful one to show by default - what do you usually use?)

I use OPS, but I'm not sure it's the best metric to default to. 🤔

akx force-pushed the compare-between branch from ee802f9 to b482bdc (March 18, 2026 08:16)
@akx
Contributor Author

akx commented Mar 18, 2026

Reworked the column order, made the reference numbers blue, added the reference indicator.

[Image: Screenshot 2026-03-18 at 10 17 41]

I didn't touch the argument parsing yet though, not sure about the (implicit) default.

@ionelmc
Owner

ionelmc commented Mar 18, 2026

Reworked the column order, made the reference numbers blue, added the reference indicator.

That delta sign is better but can you reorder the columns? Then you don't need the (*) part and everything is easier to follow if the delta always appears right after the referenced column.

@ionelmc
Owner

ionelmc commented Mar 18, 2026

I use OPS, but I'm not sure it's the best metric to default to.

Maybe no defaults then, and make --between always require the columns?

@akx
Contributor Author

akx commented Mar 18, 2026

That delta sign is better but can you reorder the columns? Then you don't need the (*) part and everything is easier to follow if the delta always appears right after the referenced column.

So

| mean@REF | mean@A | Δmean@A | mean@B | Δmean@B | min@REF | min@A | Δmin@A | min@B | Δmin@B |

Okay! That'll work OK with the blue color for the reference.
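
The ordering agreed on here can be sketched as a small header generator. This is an illustrative sketch only; `between_headers` and its parameters are hypothetical names, and the labels simply mirror the example row above rather than the final rendering in the PR.

```python
def between_headers(columns, runs, reference="REF"):
    """Build headers so each delta directly follows the run it describes."""
    headers = []
    for column in columns:
        # The reference value for this stat comes first.
        headers.append(f"{column}@{reference}")
        for run in runs:
            headers.append(f"{column}@{run}")
            headers.append(f"Δ{column}@{run}")
    return headers


print(" | ".join(between_headers(["mean", "min"], ["A", "B"])))
```

Grouping by stat first and then by run keeps every delta adjacent to the value it was computed from, which is the point of the reordering request.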

Maybe no defaults then, and make --between always require the columns?

uv run pytest-benchmark compare .benchmarks/Darwin-CPython-3.14-64bit/*.json --between=ops

? Sounds good to me 👍

akx changed the title from "Add --compare-between mode" to "Add compare-between mode" (Mar 24, 2026)
akx force-pushed the compare-between branch from b482bdc to 72982c0 (March 24, 2026 16:07)
@akx
Contributor Author

akx commented Mar 24, 2026

--betweenified and simplified:
[Image: Screenshot 2026-03-24 at 18 05 52]

akx force-pushed the compare-between branch from 72982c0 to 5ebe1c6 (March 24, 2026 16:09)
@ionelmc
Owner

ionelmc commented Mar 24, 2026

Looks great. Two last things to do I think:

  • adding a screenshot in the readme
  • the column size is already trimmed to 12 characters - perhaps you could use --name instead to provide the actual rendering of the column name? It would be passed as name_format to the renderer class.

akx force-pushed the compare-between branch from 5ebe1c6 to db52808 (March 24, 2026 20:01)
akx force-pushed the compare-between branch from db52808 to 37e6ec2 (March 24, 2026 21:16)
@akx
Contributor Author

akx commented Mar 24, 2026

Switched to using name_format and added a screenshot (meticulously squeezed to 32 colors and optimized 😁).

akx requested a review from ionelmc (March 24, 2026 21:26)
ionelmc merged commit cf97f26 into ionelmc:master (Mar 25, 2026)
222 of 223 checks passed