Motivation
Code changes can shift client/server throughput and latency in ways that functional tests don't catch. Today we
have no automated signal when a change regresses performance, and reviewers lack a consistent number to reference when
evaluating a PR's impact.
Proposal
Add a CI job that:
- Runs the existing benchmark/ targets (or a curated subset) on the POSIX build.
- Records the results as a machine-readable artifact (CSV/JSON) attached to the workflow run.
- Either:
- (a) Baseline comparison: compares against a stored baseline (main branch numbers) and flags regressions beyond a threshold, or
- (b) Historical record: posts results as a PR comment or uploads them to a dashboard so humans can eyeball trends.
Starting with (b) is likely lower-effort and sufficient for surfacing regressions during review.
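For option (a), the comparison step could be a small script run in CI. The sketch below assumes a flat JSON format mapping benchmark names to throughput numbers (higher is better) and a 5% threshold; the file names, result schema, and threshold are all illustrative, not decided by this proposal.

```python
import json
import sys

# Hypothetical result format: {"benchmark_name": ops_per_sec, ...}.
# The threshold and "higher is better" assumption are illustrative.
REGRESSION_THRESHOLD = 0.05  # flag drops larger than 5%


def find_regressions(baseline: dict, current: dict,
                     threshold: float = REGRESSION_THRESHOLD) -> list:
    """Return (name, baseline, current, fractional_change) for regressed benchmarks."""
    regressions = []
    for name, base_value in baseline.items():
        cur_value = current.get(name)
        if cur_value is None:
            continue  # benchmark removed or renamed; skip rather than fail the job
        change = (cur_value - base_value) / base_value
        if change < -threshold:  # throughput dropped beyond the threshold
            regressions.append((name, base_value, cur_value, change))
    return regressions


if __name__ == "__main__":
    # baseline.json would come from a stored main-branch run;
    # current.json from this workflow's benchmark step.
    with open("baseline.json") as f:
        baseline = json.load(f)
    with open("current.json") as f:
        current = json.load(f)
    bad = find_regressions(baseline, current)
    for name, base, cur, change in bad:
        print(f"REGRESSION {name}: {base:.1f} -> {cur:.1f} ({change:+.1%})")
    sys.exit(1 if bad else 0)
```

Exiting nonzero lets the CI job fail (or warn, if marked non-blocking) when a regression crosses the threshold, while the same JSON artifacts remain usable for the option-(b) historical record.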