Add a callgrind measure #314
This commit adds a new `callgrind` measure. It must always be run inside a child process that is running under Valgrind's Callgrind tool. It uses the `valgrind-requests` crate to communicate with Valgrind and record data from the simulated caches and branch predictor.

Running under Callgrind is much slower than running natively, but it is also much less noisy. We therefore adjust the default number of processes and the default number of iterations per process accordingly.

Fixes bytecodealliance#312
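The adjustment of defaults described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Measure` enum, the `SamplingPlan` struct, both function names, and all of the numbers are hypothetical placeholders. The only idea taken from the description is that a slow but low-noise measure gets smaller defaults, and the sketch additionally assumes that explicit user values override them.

```rust
/// Hypothetical stand-in for the measures a benchmark run can select.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Measure {
    /// A measure taken while running natively: fast, but noisy.
    Native,
    /// Counts collected under Callgrind: slow, but much less noisy.
    Callgrind,
}

/// How many samples to collect for one benchmark.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SamplingPlan {
    pub processes: usize,
    pub iterations_per_process: usize,
}

/// Per-measure defaults. The numbers are illustrative placeholders only.
pub fn default_plan(measure: Measure) -> SamplingPlan {
    match measure {
        Measure::Native => SamplingPlan {
            processes: 10,
            iterations_per_process: 10,
        },
        Measure::Callgrind => SamplingPlan {
            processes: 2,
            iterations_per_process: 1,
        },
    }
}

/// Explicit user values win; anything left unset falls back to the
/// default for the selected measure.
pub fn resolve_plan(
    measure: Measure,
    processes: Option<usize>,
    iterations_per_process: Option<usize>,
) -> SamplingPlan {
    let defaults = default_plan(measure);
    SamplingPlan {
        processes: processes.unwrap_or(defaults.processes),
        iterations_per_process: iterations_per_process
            .unwrap_or(defaults.iterations_per_process),
    }
}
```

Modelling the two options as `Option<usize>` keeps "not specified" distinct from any concrete value, so the fallback can depend on which measure was chosen.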
    @@ -58,8 +238,8 @@ pub struct BenchmarkCommand {
        engine_flags: Option<String>,

        /// How many processes should we use for each Wasm benchmark?
Might be nice to document the default here so it shows up in help output again as a hint to users looking to modify things. Same applies for iterations-per-process.
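One way to act on this suggestion, sketched with the standard library only: build the help string from the same table that supplies the defaults, so the two cannot drift apart. The helper name `help_with_defaults` and the measure names and numbers in the usage note are hypothetical; nothing here is taken from the PR, and the real command-line parser may offer a more direct mechanism.

```rust
/// Append per-measure defaults to an option's help summary.
///
/// `defaults` pairs a measure name with the default used for that measure.
/// Hypothetical helper, for illustration only.
pub fn help_with_defaults(summary: &str, defaults: &[(&str, usize)]) -> String {
    // Render each entry as e.g. "2 for `callgrind`".
    let parts: Vec<String> = defaults
        .iter()
        .map(|(measure, value)| format!("{} for `{}`", value, measure))
        .collect();

    if parts.is_empty() {
        // Nothing to document: return the summary unchanged.
        summary.to_string()
    } else {
        format!("{} [default: {}]", summary, parts.join(", "))
    }
}
```

For example, calling it with the summary "How many processes should we use for each Wasm benchmark?" and the placeholder table `[("native", 10), ("callgrind", 2)]` yields that summary followed by "[default: 10 for `native`, 2 for `callgrind`]".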
posborne left a comment
Changes look good to me; I was able to do a couple runs with callgrind to confirm.
The runs are definitely astronomically slow; if the data is reliable with a smaller sample, we may want to see about coming up with some smaller inputs, or possibly a different default suite when targeting callgrind. That is, of course, secondary to getting results that can be trusted, but there is some balance point in there.
For sure, I am planning on doing a pass over the benchmarks to get them all running roughly the same number of instructions per execution iteration when I have a chance. It probably won't be every single one, but I will do the ones where that is easy enough. FWIW, I will also be making a PR for the PCA stuff soon. I have it working locally and just need to do some final tweaks.