Optimize IntRange::from_pat, then shrink ParamEnv#77257
Optimize IntRange::from_pat, then shrink ParamEnv#77257bors merged 2 commits intorust-lang:masterfrom
IntRange::from_pat, then shrink ParamEnv#77257Conversation
Previously, this method called the more general `pat_constructor` function, which can return other pattern variants besides `IntRange`. Then it throws away any non-`IntRange` variants. Specialize it so work is only done when it could result in an `IntRange`.
…uct." This reverts commit ab83d37.
|
@bors try |
|
Awaiting bors try build completion |
|
⌛ Trying commit c4d8089 with merge baa01e8429de2c041f0095c01d4e8ca99c0b11a7... |
|
☀️ Try build successful - checks-actions, checks-azure |
|
Queued baa01e8429de2c041f0095c01d4e8ca99c0b11a7 with parent 1ec980d, future comparison URL. |
|
Finished benchmarking try commit (baa01e8429de2c041f0095c01d4e8ca99c0b11a7): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
|
(moved to summary) |
jackh726
left a comment
There was a problem hiding this comment.
LGTM :) Super weird that this was the cause of the slowdown. (A large one at that!)
|
I'm not quite sure what you mean @vandenheuvel. I think I've gone as far as I care to. If you'd like to investigate further, feel free to keep #77058 open. Perhaps you can do even better. Here's the "slow" assembly ( Excerpt from 0.00 → callq rustc_mir_build::thir::pattern::_match::pat_constructor
0.00 mov 0xb0(%rsp),%cl
0.02 mov 0xc0(%rsp),%r9
0.20 mov 0xb8(%rsp),%rbp
0.01 mov 0xd0(%rsp),%r13
0.01 mov 0xc8(%rsp),%r8
0.01 mov 0xd8(%rsp),%al
0.01 mov 0xd9(%rsp),%edx
32.37 mov %edx,0x40(%rsp)
0.00 mov 0xdc(%rsp),%edx
0.00 mov %edx,0x43(%rsp)
cmp $0x7,%cl
↓ jne 5a1 Excerpt from (I think) 5a1: mov 0x40(%rsp),%edx
20.80 mov 0x43(%rsp),%esi
0.03 mov %esi,0x183(%rsp)
mov %edx,0x180(%rsp)
mov $0x1,%dl
cmp $0x3,%cl
↓ jne 5d9
mov 0x180(%rsp),%ecx
0.01 mov 0x183(%rsp),%edx
23.64 mov %edx,0x23(%rsp)
0.12 mov %ecx,0x20(%rsp)
cmp $0x2,%al
sete %dl
0.07 5d9: mov 0x20(%rsp),%eax
0.01 mov 0x23(%rsp),%ecx
18.20 mov %ecx,0xb3(%rsp)
0.01 mov %eax,0xb0(%rsp)
test %dl,%dl
0.00 ↓ jne 1546
mov %r14,0x18(%rsp)
0.00 mov 0x218(%rsp),%rax
0.00 mov (%rax),%rdi
0.01 mov 0x30(%rbx),%rax
0.01 mov 0x8(%rbx),%r14
0.14 mov 0x10(%rbx),%rsi
mov 0x20(%rbx),%rdx
mov 0x18(%rbx),%r10
movzbl (%rax),%ecx
cmp $0x2,%rcx
↓ je 629 We're repeatedly copying two words into local stack variables, then branching, then reading them again. This sequence has a lot of data hazards, which is clearly slowing down the pipeline significantly. I can't say why LLVM chose to generate code this way, or what causes the data hazards to disappear with my changes to |
|
I think the tag was meant for @vandenheuvel |
Sorry! |
|
r=me unless you're waiting on something further, not quite sure. |
|
@bors r=Mark-Simulacrum |
|
📌 Commit c4d8089 has been approved by |
|
☀️ Test successful - checks-actions, checks-azure |
|
@ecstatic-morse I don't have the expertise for further investigation either, unfortunately. And why I asked whether #77058 would remain open: it seems that the root cause has not yet been found.
So to document that, it seems useful to keep #77058 open, perhaps after renaming it and adding another comment. |
|
I don't think there's anything actionable left in #77058. If you disagree, you're free to reopen it (you should have the requisite permissions). |
|
Final perf results are in. Looks to be an improvement everywhere except |
Resolves #77058.
r? @Mark-Simulacrum
cc @vandenheuvel
Looking at the output of
perf reportfor #76244, the hot instructions seemed to be around the call topat_constructorinIntRange::from_pat. I carried out an obvious optimization, but it actually made the instruction count higher (see #77075). However, it seems to have mitigated whatever was causing the pipeline stalls, so when combined with #76244, it's a net win.As you can see below, the regression in #76244 seems to have originated from something measured by
stalled-cycles-backend. I'll try to collect some finer-grained stats to see if I can isolate it. I wish I had a better idea of what was going on here. I'd like to prevent the regression from reappearing in the future due to small changes in unrelated code.Current `master`:
Shrink `ParamEnv` without changing `IntRange::from_pat`:
Shrink `ParamEnv` and change `IntRange::from_pat`: