Conversation
5ca575f to
5355bf1
Compare
5355bf1 to
31ea764
Compare
|
A minor meta-comment: the last commit where the action happens is a bit too large for comfort, it affects about 700 lines of very-tricky code. As a reviewer I would prefer to have several smaller commits to pace myself and be able to think about changes in isolation. It may very well be that the specific change being done here had to be done in one go, or that this is just a side-effect of your working habits for good reasons. But my impression is that https://github.com/oxcaml/oxcaml/pull/5163/commits has a more separated commit history, and that they were somehow squashed together during the (possibly non-trivial) rebase process. I would prefer to keep separate commits when possible, to make review easier. |
damiendoligez
left a comment
There was a problem hiding this comment.
Looks good to me. Just a few remarks.
| * Promoted_hd (so that, e.g., major GC marking can work while the | ||
| * block is still on our oldify todo list). Have to do this here, |
There was a problem hiding this comment.
major GC marking can work while the block is still on our oldify todo list
That doesn't sound very likely because major GC marking cannot be allowed to see the block before you have initialized all the scannable fields.
@NickBarnes is there a sharable benchmark program I could use to reproduce your results? I haven't reviewed the code yet (it's on the list), however the explanation sounds compelling. |
|
@NickBarnes when you have time it would be nice to address the review comments of @damiendoligez so that we can merge this. |
This is a rewrite of the hot code in the minor GC, the function
oldify_one(and, to some extent,oldify_mopup). Basically the fast path througholdify_oneis now all inlined, does not touch global variables, andshared_heap.cmakes the major heap free lists available to an inlined function inshared_heap.h, so that in the common case the minor GC can allocate inline.There are five commits, of which the first four are other small PRs (#14492, #14568, #14569, #14570). The last commit is the main change here.
Upstreamed from OxCaml, which is somewhat different because OCaml has a different free-list implementation (#13616). I have measured the matching PR in OxCaml (oxcaml/oxcaml#5163) to make a 2-3% total runtime improvement in some real workloads. Basically any workload which does much minor GC promotion should see measurable speed gains from this PR.