Add objdump parser benchmark with aggregated JSON output#51

Merged
partouf merged 5 commits into main from benchmark-objdump-parser on Jun 27, 2026
@partouf partouf commented Jun 27, 2026

Member

What

Adds a benchmark for the ObjDumpParser pipeline (fromStream + outputJson) plus tooling to run it repeatedly and aggregate run-to-run statistics into a JSON artifact, intended as the basis for a performance baseline.

How it works

  • src/test/objdump_benchmark.cpp — a Catch2 BENCHMARK over four pre-captured objdump resource files of increasing size (example.asm, example_intel.asm, gcc12_sort_object_reloc.asm, gcc12_bin_fmt_O2_flto.asm). Each input is pre-loaded into memory so file I/O isn't measured, and re-parsed with a fresh parser per iteration. Tagged [!benchmark] so it stays out of the default ./asm-parser-test run — existing CI is unaffected.
  • src/test/CMakeLists.txt — compiles the benchmark and enables CATCH_CONFIG_ENABLE_BENCHMARKING (Catch2 v2.13's built-in micro-benchmarking; no new dependencies).
  • scripts/run-benchmarks.sh — runs the benchmark N times (default 5), writing one Catch2 XML per run.
  • scripts/aggregate_bench.py — combines the per-run XMLs into a summary JSON: mean of per-run means, run-to-run stddev, CV%, min/max, and the full per-run list, with commit/ref/label/generated_utc metadata.
  • .github/workflows/benchmark.yml — manual (workflow_dispatch) workflow mirroring build.yml's Conan/gcc-10/Release setup. Builds, runs the benchmarks, prints the summary, and uploads bench-summary.json + raw run XMLs as an artifact.
  • .gitignore — ignores build-release/ and bench-results/.
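The aggregation step described for scripts/aggregate_bench.py can be sketched roughly as follows. This is not the script from the PR: the function name `summarize_runs` and every key except `cv_percent` are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the description does not say which estimator is used.

```python
import statistics


def summarize_runs(per_run_means_ns):
    """Aggregate one benchmark's per-run mean timings (hypothetical helper).

    per_run_means_ns: one mean timing per benchmark run, in nanoseconds.
    """
    if not per_run_means_ns:
        raise ValueError("need at least one run")

    # Mean of the per-run means.
    mean = statistics.fmean(per_run_means_ns)

    # Run-to-run spread. Assumption: sample standard deviation (n-1).
    # It is undefined for a single run, so report 0.0 in that case.
    if len(per_run_means_ns) > 1:
        stddev = statistics.stdev(per_run_means_ns)
    else:
        stddev = 0.0

    return {
        "mean_ns": mean,
        "stddev_ns": stddev,
        # Coefficient of variation: spread relative to the mean, in percent.
        "cv_percent": 100.0 * stddev / mean if mean else 0.0,
        "min_ns": min(per_run_means_ns),
        "max_ns": max(per_run_means_ns),
        "runs_ns": list(per_run_means_ns),
    }
```

Calling `summarize_runs` once per benchmark name, with the means collected from the N run files, would yield the per-benchmark entries of a summary; the commit/ref/label/generated_utc metadata would be added around them.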

Usage

cmake -B build-release -G Ninja -S . -DCMAKE_BUILD_TYPE=Release && cmake --build build-release
scripts/run-benchmarks.sh          # -> bench-results/bench-summary.json
# or: BENCH_RUNS=10 scripts/run-benchmarks.sh

Raw per-run data is also available directly as Catch2 XML:

./asm-parser-test "[objdump]" -r xml -o run.xml
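For reading such a file by hand, a minimal sketch is given below. It assumes the layout emitted by Catch2 v2's XML reporter, where each benchmark appears as a `BenchmarkResults` element with a `name` attribute and a child `mean` element whose `value` is in nanoseconds; check this against a real run.xml before relying on it. The helper name `benchmark_means`, the test-case name, and the numbers in the sample document are illustrative, not measured results.

```python
import xml.etree.ElementTree as ET


def benchmark_means(xml_text):
    """Map benchmark name -> mean (ns) from one Catch2 XML report (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    means = {}
    # iter() walks the whole tree, so it does not matter how deeply the
    # results are nested under Group/TestCase/Section elements.
    for result in root.iter("BenchmarkResults"):
        mean = result.find("mean")
        if mean is not None:
            means[result.get("name")] = float(mean.get("value"))
    return means


# Illustrative input only: structure assumed, numbers made up.
SAMPLE = """<Catch name="asm-parser-test">
  <Group name="asm-parser-test">
    <TestCase name="hypothetical benchmark case">
      <BenchmarkResults name="example.asm" samples="100">
        <mean value="1234.5" lowerBound="1200.0" upperBound="1270.0" ci="0.95"/>
      </BenchmarkResults>
    </TestCase>
  </Group>
</Catch>"""

means = benchmark_means(SAMPLE)
```

Running this over each per-run XML file gives the per-run means that the aggregation step combines.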

Stability notes

Run-to-run variation on the large inputs is small (~2-3%), while example.asm (~150 µs) is noise-dominated (CV ~15%). The summary's cv_percent field surfaces this per benchmark, so the large files are the trustworthy signal.

Next steps (not in this PR)

Once this runs on GitHub, the bench-summary artifact can be promoted to a committed baseline (e.g. bench-results/baseline.json). A follow-up can add a compare_bench.py plus a PR gate that fails on regressions beyond a threshold. Note that GitHub-hosted runners are noisier than a dedicated box, so any gate should use a generous threshold and likely only the large inputs.
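A threshold gate of the kind described here could look roughly like the sketch below. The function names, the plain name-to-mean dictionaries, and the choice to skip benchmarks missing from either side are all assumptions for illustration; the 10% default only mirrors the figure given in the later commit message, and a noisier runner may need a larger value.

```python
def find_regressions(baseline, current, threshold_percent=10.0):
    """Return {name: delta_percent} for benchmarks slower than the baseline
    by more than threshold_percent (hypothetical helper).

    baseline, current: dicts mapping benchmark name -> mean time.
    """
    regressions = {}
    for name, base_mean in baseline.items():
        if name not in current or base_mean <= 0:
            # Missing benchmark or unusable baseline: skip rather than guess.
            continue
        delta = 100.0 * (current[name] - base_mean) / base_mean
        if delta > threshold_percent:
            regressions[name] = delta
    return regressions


def gate_exit_code(baseline, current, threshold_percent=10.0):
    """Return 1 if any benchmark regressed beyond the threshold, else 0."""
    return 1 if find_regressions(baseline, current, threshold_percent) else 0
```

A CI step would pass the result of `gate_exit_code` to `sys.exit`, so the job fails only when at least one benchmark exceeds the threshold. Restricting `baseline` to the large inputs before calling it is one way to keep the noise-dominated small input out of the gate.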

🤖 Generated with Claude Code

partouf and others added 5 commits June 27, 2026 15:53
Add a Catch2 micro-benchmark for the ObjDumpParser pipeline (fromStream +
outputJson) over the pre-captured objdump resource files, plus tooling to
run it repeatedly and aggregate run-to-run statistics.

- src/test/objdump_benchmark.cpp: BENCHMARK over four inputs of increasing
  size, tagged [!benchmark] so it stays out of the default test run.
- src/test/CMakeLists.txt: compile the benchmark and enable Catch2's
  CATCH_CONFIG_ENABLE_BENCHMARKING.
- scripts/run-benchmarks.sh: run N times, emit one Catch2 XML per run.
- scripts/aggregate_bench.py: combine the XML runs into a summary JSON
  (mean of per-run means, run-to-run stddev, CV, min/max, per-run means).
- .github/workflows/benchmark.yml: manual workflow that builds Release,
  runs the benchmarks, and uploads the summary + raw XML as an artifact.
- .gitignore: ignore build-release/ and bench-results/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Trigger the benchmark workflow on push to main and on pull requests
(matching build.yml), instead of only on manual dispatch. Keep
workflow_dispatch for ad-hoc reruns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop the branch filters so the benchmark runs on push to any branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Bump all actions to their latest releases to clear the Node.js 20
deprecation warnings (runners now default to Node 24):

- actions/checkout@v4        -> @v7  (Node 24)
- actions/upload-artifact@v4 -> @v7  (Node 24)
- turtlebrowser/get-conan@main -> @v1.2  (pin to release)
- fnkr/github-action-ghr@v1   -> @v1.3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Capture a baseline benchmark summary (5 runs, GitHub runner) and compare
each new run against it in the workflow.

- bench-results/baseline.json: committed baseline; .gitignore keeps it
  tracked while ignoring other generated benchmark output.
- scripts/compare_bench.py: compares a summary against the baseline,
  prints a per-benchmark delta table, writes a Markdown table to the
  GitHub step summary, and exits non-zero on a regression beyond the
  threshold (default 10%).
- benchmark.yml: run the comparison after generating the summary; upload
  the artifact even on failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@partouf partouf merged commit 59b21fa into main Jun 27, 2026
3 checks passed
@partouf partouf deleted the benchmark-objdump-parser branch June 27, 2026 14:19