Manually implement PartialEq for Option<T> and specialize non-nullable types#103556
Manually implement PartialEq for Option<T> and specialize non-nullable types#103556bors merged 3 commits intorust-lang:masterfrom
Conversation
|
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
|
r? @m-ou-se (rust-highfive has picked a reviewer for you, use r? to override) |
|
perf run was requested @bors try @rust-timer queue |
|
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
|
⌛ Trying commit d0a58800564093f4b6db3ae0b5f76547c7564f4b with merge 05058ddce07865ec73eedb7c8d2cddd50d96d959... |
thomcc
left a comment
There was a problem hiding this comment.
This is a pretty significant codegen win, although I'm surprised we don't get this already. Needs a fix for some UB, tho (fixing the UB in godbolt still produces the codegen win).
This comment has been minimized.
This comment has been minimized.
|
@bors try @rust-timer queue |
|
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
|
⌛ Trying commit a8140e59b80fae0c3b27e3a1d6e0b176dd5fb757 with merge 8032e517bc4d1e2309051ba99ef9c8beaff83a82... |
This comment has been minimized.
This comment has been minimized.
|
Made the NonNull test support |
|
@bors try @rust-timer queue |
|
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
|
⌛ Trying commit 5ed28fed2a0b927a267da146d51ec99bce8bc92f with merge a041a05c3184bb8c38b8940422e8951e99b6d3f1... |
|
|
|
☀️ Try build successful - checks-actions |
|
Queued a041a05c3184bb8c38b8940422e8951e99b6d3f1 with parent 6365e5a, future comparison URL. |
|
I went to try to see if there's anything we could do to make LLVM understand this, and realized that right now we're shooting outselves in the foot: https://rust.godbolt.org/z/Ye5xr8P8x What's pub fn demo_std(x: &NonZeroU32, y: &NonZeroU32) -> bool {
x == y
}define noundef zeroext i1 @_ZN7example8demo_std17hcce6db1e74f1c1d4E(ptr noalias nocapture noundef readonly align 4 dereferenceable(4) %0, ptr noalias nocapture noundef readonly align 4 dereferenceable(4) %1) unnamed_addr #0 {
%_9 = load i32, ptr %0, align 4
%_10 = load i32, ptr %1, align 4
%2 = icmp eq i32 %_9, %_10
ret i1 %2
}Whereas if you write the obvious implementation yourself pub fn demo_obvious(x: &NonZeroU32, y: &NonZeroU32) -> bool {
x.get() == y.get()
}Then the loads get the efine noundef zeroext i1 @_ZN7example12demo_obvious17haee70b6eb73f133dE(ptr noalias nocapture noundef readonly align 4 dereferenceable(4) %x, ptr noalias nocapture noundef readonly align 4 dereferenceable(4) %y) unnamed_addr #0 {
%self = load i32, ptr %x, align 4, !range !2, !noundef !3
%self1 = load i32, ptr %y, align 4, !range !2, !noundef !3
%0 = icmp eq i32 %self, %self1
ret i1 %0
}
!2 = !{i32 1, i32 0}It's possible that LLVM still might not be able to optimize this even with that for other reasons (#49572 (comment)), but I think we should at least find out whether giving LLVM the obvious information would be enough to let it make this transform -- it would be great if we could solve this in the EDIT: Oh, if nikic already looked then there's probably no easy fix. |
|
cc #49892 |
b6b33c2 to
a1b650c
Compare
|
@rustbot ready |
|
☔ The latest upstream changes (presumably #103797) made this pull request unmergeable. Please resolve the merge conflicts. |
a1b650c to
20f2d8b
Compare
|
Thanks! It's great that this worked out without adding any @bors r+ |
|
Weird, let's try that again @bors r+ |
|
☀️ Test successful - checks-actions |
|
Finished benchmarking commit (8841bee): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
|
Perf changes are few, tiny, and not a concern. @rustbot label: +perf-regression-triaged |
…eq, r=scottmcm Manually implement PartialEq for Option<T> and specialize non-nullable types This PR manually implements `PartialEq` and `StructuralPartialEq` for `Option`, which seems to produce slightly better codegen than the automatically derived implementation. It also allows specializing on the `core::num::NonZero*` and `core::ptr::NonNull` types, taking advantage of the niche optimization by transmuting the `Option<T>` to `T` to be compared directly, which can be done in just two instructions. A comparison of the original, new and specialized code generation is available [here](https://godbolt.org/z/dE4jxdYsa).
This PR manually implements
PartialEqandStructuralPartialEqforOption, which seems to produce slightly better codegen than the automatically derived implementation.It also allows specializing on the
core::num::NonZero*andcore::ptr::NonNulltypes, taking advantage of the niche optimization by transmuting theOption<T>toTto be compared directly, which can be done in just two instructions.A comparison of the original, new and specialized code generation is available here.