Add objdump parser benchmark with aggregated JSON output by partouf · Pull Request #51 · compiler-explorer/asm-parser

partouf · 2026-06-27T13:53:48Z

What

Adds a benchmark for the ObjDumpParser pipeline (fromStream + outputJson) plus tooling to run it repeatedly and aggregate run-to-run statistics into a JSON artifact, intended as the basis for a performance baseline.

How it works

- src/test/objdump_benchmark.cpp: a Catch2 BENCHMARK over four pre-captured objdump resource files of increasing size (example.asm, example_intel.asm, gcc12_sort_object_reloc.asm, gcc12_bin_fmt_O2_flto.asm). Each input is pre-loaded into memory so file I/O isn't measured, and is re-parsed with a fresh parser on every iteration. The test is tagged [!benchmark] so it stays out of the default ./asm-parser-test run; existing CI is unaffected.
- src/test/CMakeLists.txt: compiles the benchmark and enables CATCH_CONFIG_ENABLE_BENCHMARKING (Catch2 v2.13's built-in micro-benchmarking, so no new dependencies).
- scripts/run-benchmarks.sh: runs the benchmark N times (default 5), writing one Catch2 XML file per run.
- scripts/aggregate_bench.py: combines the per-run XML files into a summary JSON containing the mean of per-run means, run-to-run stddev, CV%, min/max, and the full per-run list, plus commit/ref/label/generated_utc metadata.
- .github/workflows/benchmark.yml: a manual (workflow_dispatch) workflow mirroring build.yml's Conan/gcc-10/Release setup. It builds, runs the benchmarks, prints the summary, and uploads bench-summary.json plus the raw run XML files as an artifact.
- .gitignore: ignores build-release/ and bench-results/.
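A rough idea of what the aggregation step in scripts/aggregate_bench.py computes, as a minimal Python sketch rather than the script itself. It assumes each run's XML contains Catch2 v2-style `BenchmarkResults` elements, each with a `mean` child whose `value` attribute is that run's mean time (nanoseconds, as we understand Catch2 v2's XML reporter); check this against a real run.xml. The function names and output keys other than `cv_percent` are hypothetical, and the commit/ref/label/generated_utc metadata is left out.

```python
import statistics
import xml.etree.ElementTree as ET


def per_run_means(root):
    """Map benchmark name -> mean time for one parsed Catch2 XML run.

    Assumes <BenchmarkResults name="..."> elements with a <mean value="..."/> child.
    """
    means = {}
    for bench in root.iter("BenchmarkResults"):
        mean = bench.find("mean")
        if mean is not None:
            means[bench.get("name")] = float(mean.get("value"))
    return means


def aggregate(runs):
    """Combine per-run means into run-to-run statistics for each benchmark."""
    names = sorted({name for run in runs for name in run})
    summary = {}
    for name in names:
        values = [run[name] for run in runs if name in run]
        mean = statistics.mean(values)  # mean of per-run means
        # Sample standard deviation needs at least two runs; report 0.0 otherwise.
        stddev = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[name] = {
            "mean": mean,
            "stddev": stddev,
            "cv_percent": 100.0 * stddev / mean if mean else 0.0,
            "min": min(values),
            "max": max(values),
            "runs": values,  # full per-run list
        }
    return summary
```

For real files, pass `ET.parse(path).getroot()` to `per_run_means` for each run XML and write the result with `json.dump`. The sketch uses the sample (n-1) standard deviation because a handful of runs is a small sample; the actual script may choose differently.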

Usage

cmake -B build-release -G Ninja -S . -DCMAKE_BUILD_TYPE=Release && cmake --build build-release
scripts/run-benchmarks.sh          # -> bench-results/bench-summary.json
# or: BENCH_RUNS=10 scripts/run-benchmarks.sh
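scripts/run-benchmarks.sh is a shell script; the loop it performs can be rendered in Python roughly as follows. This is an illustrative sketch, not the script itself: the binary path under build-release/, the run-N.xml file names, and the helper names are assumptions. Only the BENCH_RUNS variable, the default of 5 runs, and the Catch2 flags from the raw-XML example come from this PR.

```python
import os
import subprocess
from pathlib import Path

DEFAULT_RUNS = 5  # the script's documented default


def bench_runs(env):
    """Number of runs, taken from BENCH_RUNS if set, otherwise the default."""
    return int(env.get("BENCH_RUNS", DEFAULT_RUNS))


def run_commands(binary, out_dir, runs):
    """Build one Catch2 invocation per run, each writing its own XML file."""
    out_dir = Path(out_dir)
    return [
        [str(binary), "[objdump]", "-r", "xml", "-o", str(out_dir / f"run-{i}.xml")]
        for i in range(1, runs + 1)
    ]


def main():
    # Hypothetical locations: adjust to wherever the test binary is actually built.
    binary = Path("build-release") / "asm-parser-test"
    out_dir = Path("bench-results")
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in run_commands(binary, out_dir, bench_runs(os.environ)):
        subprocess.run(cmd, check=True)  # stop on the first failing run
```

Calling `main()` from the repository root after building runs the loop; keeping command construction in a pure function makes it easy to check without running the benchmark.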

Raw per-run data is also available directly as Catch2 XML:

./asm-parser-test "[objdump]" -r xml -o run.xml

Stability notes

Run-to-run variation on the large inputs is small (~2-3%), while example.asm (~150 µs) is noise-dominated (CV ~15%). The summary's cv_percent field reports this per benchmark; the large files are the trustworthy signal.
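The cv_percent figure is the run-to-run standard deviation divided by the mean, times 100. A minimal sketch of computing it and of picking out the benchmarks quiet enough to trust; the 5% cut-off and the function names are hypothetical, not something the PR defines.

```python
import statistics


def cv_percent(run_means):
    """Run-to-run coefficient of variation in percent (needs at least two runs)."""
    mean = statistics.mean(run_means)
    return 100.0 * statistics.stdev(run_means) / mean


def trustworthy(summary, max_cv=5.0):
    """Names of benchmarks whose cv_percent is at or below max_cv."""
    return sorted(name for name, stats in summary.items() if stats["cv_percent"] <= max_cv)
```

With CVs like those quoted above (about 15% for example.asm, 2-3% for the large files), a 5% cut-off keeps only the large inputs.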

Next steps (not in this PR)

Once this runs on GitHub, the bench-summary artifact can be promoted to a committed baseline (e.g. bench-results/baseline.json). A follow-up can add a compare_bench.py script and a PR gate that fails on regressions beyond a threshold. Note that GitHub-hosted runners are noisier than a dedicated machine, so any gate should use a generous threshold and probably only the large inputs.

🤖 Generated with Claude Code

Add a Catch2 micro-benchmark for the ObjDumpParser pipeline (fromStream + outputJson) over the pre-captured objdump resource files, plus tooling to run it repeatedly and aggregate run-to-run statistics.

- src/test/objdump_benchmark.cpp: BENCHMARK over four inputs of increasing size, tagged [!benchmark] so it stays out of the default test run.
- src/test/CMakeLists.txt: compile the benchmark and enable Catch2's CATCH_CONFIG_ENABLE_BENCHMARKING.
- scripts/run-benchmarks.sh: run N times, emit one Catch2 XML per run.
- scripts/aggregate_bench.py: combine the XML runs into a summary JSON (mean of per-run means, run-to-run stddev, CV, min/max, per-run means).
- .github/workflows/benchmark.yml: manual workflow that builds Release, runs the benchmarks, and uploads the summary + raw XML as an artifact.
- .gitignore: ignore build-release/ and bench-results/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Trigger the benchmark workflow on push to main and on pull requests (matching build.yml), instead of only on manual dispatch. Keep workflow_dispatch for ad-hoc reruns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop the branch filters so the benchmark runs on push to any branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


Bump all actions to their latest releases to clear the Node.js 20 deprecation warnings (runners now default to Node 24):

- actions/checkout@v4 -> @v7 (Node 24)
- actions/upload-artifact@v4 -> @v7 (Node 24)
- turtlebrowser/get-conan@main -> @v1.2 (pin to release)
- fnkr/github-action-ghr@v1 -> @v1.3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Capture a baseline benchmark summary (5 runs, GitHub runner) and compare each new run against it in the workflow.

- bench-results/baseline.json: committed baseline; .gitignore keeps it tracked while ignoring other generated benchmark output.
- scripts/compare_bench.py: compares a summary against the baseline, prints a per-benchmark delta table, writes a Markdown table to the GitHub step summary, and exits non-zero on a regression beyond the threshold (default 10%).
- benchmark.yml: run the comparison after generating the summary; upload the artifact even on failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
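The regression check described for scripts/compare_bench.py can be sketched as follows. This is a hedged illustration, not the script's code: it assumes both the summary and the baseline map each benchmark name to a dict with a `mean` entry (a hypothetical schema), treats a positive delta as a slowdown, and flags anything more than the threshold (default 10%) slower than the baseline. GITHUB_STEP_SUMMARY is the environment variable GitHub Actions sets to the file that collects step-summary Markdown.

```python
import os


def compare(summary, baseline, threshold_pct=10.0):
    """Return (rows, regressed); delta_pct > 0 means slower than the baseline."""
    rows = []
    regressed = False
    for name in sorted(baseline):
        if name not in summary:
            continue  # benchmark removed or renamed; nothing to compare
        old = baseline[name]["mean"]
        new = summary[name]["mean"]
        delta_pct = 100.0 * (new - old) / old
        is_regression = delta_pct > threshold_pct
        regressed = regressed or is_regression
        rows.append((name, old, new, delta_pct, is_regression))
    return rows, regressed


def markdown_table(rows):
    """Render the per-benchmark deltas as a Markdown table."""
    lines = ["| Benchmark | Baseline | Current | Delta |", "|---|---:|---:|---:|"]
    for name, old, new, delta, bad in rows:
        flag = " (regression)" if bad else ""
        lines.append(f"| {name} | {old:.0f} | {new:.0f} | {delta:+.1f}%{flag} |")
    return "\n".join(lines) + "\n"


def report(rows, regressed):
    """Print the table, append it to the step summary if available, return an exit code."""
    table = markdown_table(rows)
    print(table)
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a", encoding="utf-8") as fh:
            fh.write(table)
    return 1 if regressed else 0
```

In a script, `sys.exit(report(*compare(summary, baseline)))` turns a regression into a failing workflow step.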

partouf and others added 5 commits June 27, 2026 15:53

Run benchmark workflow on all branches

5f5c2c2


partouf merged commit 59b21fa into main Jun 27, 2026
3 checks passed

partouf deleted the benchmark-objdump-parser branch June 27, 2026 14:19

