Add virtual-cycles to callgrind measure by fitzgen · Pull Request #316 · bytecodealliance/sightglass

fitzgen · 2026-06-10T21:57:53Z

@cfallin mind taking a look at this? I'm interested in whether you have opinions on the cost factors for the various cache miss and branch misprediction events.

cfallin · 2026-06-10T22:36:34Z

+            + cost(100, "ll-dcache-read-misses")
+            + cost(100, "ll-dcache-write-misses")
+            + cost(10, "conditional-branch-misses")
+            + cost(10, "indirect-branch-misses")


These numbers are not astronomically wrong, and are even plausible for some generic middle-of-the-road CPU... I guess the most important thing is that we fix them once and then do comparisons based on them.

On the other hand, I do wonder if we could instead record the raw stats (that shouldn't be astronomically expensive -- 9 integers rather than 1, per run?) so that we could choose to present them differently, or give the user sliders ("perf on a big machine with out-of-order exec that hides latency" vs "perf on little embedded chip" or whatever)?

fitzgen · 2026-06-10

FWIW, sightglass will still report the underlying events; it will just report virtual cycles in addition to those events, so we could always go back and recompute virtual cycles based on old data if we wanted.

I didn't want to make virtual cycles live outside of sightglass tho so that if we say "commit BLAH regressed virtual cycles" we can also provide a simple sightglass CLI command to reproduce the regression.
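To make the "recompute from old data" point concrete, here is a minimal sketch of virtual cycles as a weighted sum over recorded event counts. Only the four cost factors shown in the diff above come from the PR; the `Cost` struct, the `virtual_cycles` function, the second cost table, and the sample counts are hypothetical, and the PR's remaining terms are not visible in this thread, so they are left out.

```rust
use std::collections::HashMap;

/// One weighted term of the sum: `factor` virtual cycles per occurrence of `event`.
/// (Hypothetical type for illustration; not sightglass's actual representation.)
struct Cost {
    factor: u64,
    event: &'static str,
}

/// The four cost factors visible in the reviewed diff.
const PR_COSTS: &[Cost] = &[
    Cost { factor: 100, event: "ll-dcache-read-misses" },
    Cost { factor: 100, event: "ll-dcache-write-misses" },
    Cost { factor: 10, event: "conditional-branch-misses" },
    Cost { factor: 10, event: "indirect-branch-misses" },
];

/// An assumed alternative table, NOT from the PR: weights LL misses closer to a
/// DRAM round trip, in the spirit of the "sliders" idea from the review.
const HYPOTHETICAL_DRAM_HEAVY_COSTS: &[Cost] = &[
    Cost { factor: 300, event: "ll-dcache-read-misses" },
    Cost { factor: 300, event: "ll-dcache-write-misses" },
    Cost { factor: 15, event: "conditional-branch-misses" },
    Cost { factor: 15, event: "indirect-branch-misses" },
];

/// Weighted sum of raw event counts. Events missing from `events` count as zero.
fn virtual_cycles(events: &HashMap<&str, u64>, costs: &[Cost]) -> u64 {
    costs
        .iter()
        .map(|c| c.factor * events.get(c.event).copied().unwrap_or(0))
        .sum()
}

fn main() {
    // Made-up raw counts standing in for one previously recorded run.
    let events: HashMap<&str, u64> = HashMap::from([
        ("ll-dcache-read-misses", 1_200),
        ("ll-dcache-write-misses", 300),
        ("conditional-branch-misses", 5_000),
        ("indirect-branch-misses", 250),
    ]);

    println!("PR factors:        {}", virtual_cycles(&events, PR_COSTS));
    println!("DRAM-heavy factors: {}", virtual_cycles(&events, HYPOTHETICAL_DRAM_HEAVY_COSTS));
}
```

With these sample counts the PR factors give 202,500 and the assumed DRAM-heavy table gives 528,750. Because the raw events are kept, swapping the table is all it takes to re-score old runs.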

cfallin · 2026-06-10

Ah, good point; as long as we're recording them and not losing them, I'm happy.

fitzgen · 2026-06-10

But yeah, any tweaks to the cost factors you think we should make before merging this? They aren't permanent, but it would be nice not to have to tweak them a bunch.

cfallin · 2026-06-10

Honestly, they seem fine-ish. In a modern CPU, an L1 miss goes to L2 at a cost of 10-15 cycles, an LLC miss goes to DRAM at a cost of 200-300 cycles, and branch mispredicts are typically discovered 10-15-ish pipeline stages in, so these numbers are pretty close to real. The biggest gaps will probably be the cache model itself (modern CPUs have L1 at 64KiB-ish, L2 at 256-512KiB-ish, and LLC at anywhere from 8-32MiB-ish; Callgrind appears to model only L1 and LLC, and I don't know what its default sizes are) and the fact that an out-of-order CPU will often hide at least the L1 misses completely. But this will give us something better than inst count.

fitzgen · 2026-06-11

We fix the size, associativity, and line size of callgrind's caches here:

sightglass/crates/cli/src/benchmark.rs

Lines 29 to 31 in 1d56857

const CACHE_MODEL_I1: &str = "32768,8,64";
const CACHE_MODEL_D1: &str = "32768,8,64";
const CACHE_MODEL_LL: &str = "8388608,16,64";

So 32KiB, 8-way associative, 64B line size L1 instruction and data caches.

8MiB, 16-way associative, 64B line size LL cache.

Let me know if you think we should change any of that.
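As a rough illustration of how those strings map to the geometry described above, the sketch below parses the `size,associativity,line size` triple (the form Valgrind's `--I1`, `--D1`, and `--LL` cache options accept) and derives the number of sets. The `CacheModel` type and its methods are hypothetical helpers for this example, not sightglass code.

```rust
/// Parsed form of a cache description string: "<size>,<associativity>,<line size>".
/// (Hypothetical helper type; sizes are in bytes.)
#[derive(Debug, PartialEq)]
struct CacheModel {
    size_bytes: u64,
    associativity: u64,
    line_bytes: u64,
}

impl CacheModel {
    /// Returns `None` unless `spec` is exactly three comma-separated integers
    /// with a non-zero associativity and line size.
    fn parse(spec: &str) -> Option<CacheModel> {
        let mut parts = spec.split(',').map(|p| p.trim().parse::<u64>());
        let size_bytes = parts.next()?.ok()?;
        let associativity = parts.next()?.ok()?;
        let line_bytes = parts.next()?.ok()?;
        if parts.next().is_some() || associativity == 0 || line_bytes == 0 {
            return None;
        }
        Some(CacheModel { size_bytes, associativity, line_bytes })
    }

    /// Number of sets: total size divided by the bytes held in one set.
    fn sets(&self) -> u64 {
        self.size_bytes / (self.associativity * self.line_bytes)
    }
}

fn main() {
    // The three strings quoted above.
    let specs = [
        ("I1", "32768,8,64"),
        ("D1", "32768,8,64"),
        ("LL", "8388608,16,64"),
    ];
    for (name, spec) in specs {
        match CacheModel::parse(spec) {
            Some(model) => println!(
                "{name}: {} KiB, {}-way, {} B lines, {} sets",
                model.size_bytes / 1024,
                model.associativity,
                model.line_bytes,
                model.sets()
            ),
            None => println!("{name}: could not parse {spec:?}"),
        }
    }
}
```

The 32KiB L1 caches work out to 64 sets each and the 8MiB LL cache to 8192 sets. Keeping associativity and line size fixed, a 64KiB L1 (`"65536,8,64"`) would have 128 sets.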

cfallin · 2026-06-11

Seems OK-ish, yeah; maybe double the L1 to 64KiB for a more typical modern CPU (several of the most recent Intel and AMD generations are like that, at least).

fitzgen force-pushed the callgrind-virtual-cycles branch from 4a16cf2 to 86ed210 · June 10, 2026 22:04

Add virtual-cycles to callgrind measure

55a8cf0

fitzgen force-pushed the callgrind-virtual-cycles branch from 86ed210 to 55a8cf0 · June 10, 2026 22:17

fitzgen requested a review from cfallin June 10, 2026 22:17

cfallin approved these changes Jun 10, 2026

fitzgen added 2 commits June 11, 2026 12:29

Adjust virtual cycle cost factors

4e00bb7

Double L1 instruction and data caches to 64KiB in callgrind

311101d

fitzgen merged commit 1027e61 into bytecodealliance:main Jun 11, 2026
16 checks passed

fitzgen deleted the callgrind-virtual-cycles branch June 11, 2026 21:26
