In the past there was a daily benchmarks pipeline that collected metrics from the e2e tests and analyzed them for regressions.
In practice it was ignored for a long time, likely because the load was not representative.
Challenge
Find a representative load:
- It is hard to simulate real load except on testnet/mainnet.
- What is a relevant load? Checkpoint sync, genesis sync, or a sync of 3 months of data?
- The relevant load should test more than just storage sync...
- An automatic genesis sync would be nice for spotting breakage, but it would take too much time.
- How performant are the current CI runners? We would probably need a custom bare-metal instance for this...
Proposed solution
Have a weekly or on-release pipeline that triggers a representative load and also implicitly checks for breaking changes.
E.g. sync the 3 most recent months of data, collect the actual metrics, and compare them between pipeline runs.
#6423 enables this, whilst #6422 may define an upper bound for the sync age.
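
As a rough illustration of the "compare the actual metrics between the pipeline runs" step, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the metric names (`sync_duration_s`, `peak_memory_mb`), the sample values (made up, not real measurements), the 10% tolerance, and the rule that every metric is "lower is better". It is not how the old benchmarks pipeline worked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regression:
    metric: str
    baseline: float
    current: float
    change: float  # relative change, e.g. 0.25 means 25% worse


def find_regressions(baseline, current, tolerance=0.10):
    """Return metrics that grew by more than `tolerance` relative to baseline.

    Assumes every metric is "lower is better" (durations, memory, disk usage).
    Metrics missing from the current run, or with a non-positive baseline,
    are skipped rather than reported.
    """
    regressions = []
    for name, base_value in sorted(baseline.items()):
        if name not in current or base_value <= 0:
            continue
        change = (current[name] - base_value) / base_value
        if change > tolerance:
            regressions.append(Regression(name, base_value, current[name], change))
    return regressions


# Hypothetical metrics from two consecutive pipeline runs (illustrative values).
previous_run = {"sync_duration_s": 3600.0, "peak_memory_mb": 2048.0}
latest_run = {"sync_duration_s": 4500.0, "peak_memory_mb": 2100.0}

for r in find_regressions(previous_run, latest_run):
    print(f"{r.metric}: {r.baseline} -> {r.current} (+{r.change:.0%})")
```

With these sample values only the sync duration is flagged, since it grew by 25% while peak memory stayed within the tolerance. A real pipeline would probably compare against several previous runs rather than a single one, to dampen the noise from shared CI runners.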