Stop emitting UbChecks on every Vec→Slice #150265

rust-bors[bot] merged 2 commits into rust-lang:main
Conversation
@bors try @rust-timer queue
Finished benchmarking commit (d3405d7): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below.

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next steps: if you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never.

Instruction count: our most reliable metric, used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): results (primary 1.5%, secondary 1.9%). A less reliable metric; may be of interest, but not used to determine the overall result above.
Cycles: results (primary -1.0%, secondary -1.9%). A less reliable metric; may be of interest, but not used to determine the overall result above.
Binary size: results (primary -0.2%, secondary -0.5%). A less reliable metric; may be of interest, but not used to determine the overall result above.
Bootstrap: 481.34s -> 483.129s (0.37%)
Force-pushed from 9ca160d to fd8744f (Compare)
r? @ibraheemdev

rustbot has assigned @ibraheemdev.
Reconfirming after rebasing, but should be basically the same.
    fn not_equal(&self, other: &[B]) -> bool {
        !self.equal(other)
    }
Annotation: nothing actually overrode this anywhere, so it was removed in favour of the usual `PartialEq::ne`.
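For context, `PartialEq::ne` already has a default body of `!self.eq(other)`, so dropping a dedicated inequality method is behaviour-preserving unless something overrides it. A minimal sketch of this (`Wrapper` is a made-up type for illustration, not code from this PR):

```rust
struct Wrapper(Vec<u8>);

impl PartialEq for Wrapper {
    fn eq(&self, other: &Self) -> bool {
        self.0 == other.0
    }
    // No `ne` override: `!=` falls back to the trait's default
    // `!self.eq(other)`, matching what the removed method computed.
}

fn main() {
    assert!(Wrapper(vec![1, 2, 3]) != Wrapper(vec![1, 2, 4]));
    assert!(Wrapper(vec![1, 2, 3]) == Wrapper(vec![1, 2, 3]));
}
```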
    StorageLive(_38);
    _36 = copy _29 as &[u8] (Transmute);
    _38 = copy _28 as &[u8] (Transmute);
    _7 = <[u8] as PartialEq>::eq(move _36, move _38) -> [return: bb19, unwind unreachable];
Annotation: note that we're still inlining the whole &String → &str → &[u8] conversion (since it'll essentially disappear in LLVM), just stopping at `<[_]>::eq`, which is probably best shared at the MIR level.
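At the source level, the quoted MIR corresponds roughly to the following sketch (`strings_equal` is an illustrative name, not compiler output): the &String → &str → &[u8] reborrows melt away, leaving only the final slice `eq` as the out-of-line call.

```rust
// Hedged sketch of what the MIR above is doing: comparing two Strings
// reduces to comparing their byte slices, so the conversions inline
// away and only the <[u8]>::eq call remains as a shared function.
fn strings_equal(a: &String, b: &String) -> bool {
    let a_bytes: &[u8] = a.as_str().as_bytes();
    let b_bytes: &[u8] = b.as_str().as_bytes();
    <[u8] as PartialEq>::eq(a_bytes, b_bytes)
}

fn main() {
    assert!(strings_equal(&"abc".to_string(), &"abc".to_string()));
    assert!(!strings_equal(&"abc".to_string(), &"abd".to_string()));
}
```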
Finished benchmarking commit (fa07ba2): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below.

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next steps: if you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never.

Instruction count: our most reliable metric, used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): results (primary -0.1%, secondary 1.6%). A less reliable metric; may be of interest, but not used to determine the overall result above.
Cycles: results (primary -1.0%, secondary -1.4%). A less reliable metric; may be of interest, but not used to determine the overall result above.
Binary size: results (primary -0.2%, secondary -0.5%). A less reliable metric; may be of interest, but not used to determine the overall result above.
Bootstrap: 481.395s -> 479.879s (-0.31%)
Spotted this in #148766's test changes. It doesn't seem like this ubcheck would catch anything useful; let's see if skipping it helps perf.
Force-pushed from f9695bf to c48df5d (Compare)
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
@rustbot reroll
@bors r+ rollup=never
What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 3fda0e4 (parent) -> 85d0cdf (this PR)

Test differences: 268 test diffs were found, all of them doctest diffs. These are ignored, as they are noisy.

Test dashboard: run

    cargo run --manifest-path src/ci/citool/Cargo.toml -- \
        test-dashboard 85d0cdfe3489ff1a4b86daeddba6fcf82b47bd65 --output-dir test-dashboard

and then open the generated dashboard.

Job duration changes: job durations can vary a lot, based on the actual runner instance.
Finished benchmarking commit (85d0cdf): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below.

Our benchmarks found a performance regression caused by this PR.

@rustbot label: +perf-regression

Instruction count: our most reliable metric, used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): results (primary 1.5%, secondary 1.0%). A less reliable metric; may be of interest, but not used to determine the overall result above.
Cycles: results (primary -1.3%, secondary 1.0%). A less reliable metric; may be of interest, but not used to determine the overall result above.
Binary size: results (primary -0.3%, secondary -0.3%). A less reliable metric; may be of interest, but not used to determine the overall result above.
Bootstrap: 474.629s -> 473.885s (-0.16%)
Wins clearly outweigh regressions here: check, debug, and doc are all green; opt is mixed. The one notable red is syn-opt-full, which loses the gains it had in #148766, but I think this is still good overall. It might be interesting future work to see why that benchmark specifically got hit by this, though.

@rustbot label: +perf-regression-triaged
…acrum Tweak `SlicePartialEq` to allow MIR-inlining the `compare_bytes` call

#150265 disabled this because doing so was a net perf win, but let's see if we can tweak the structure of this to allow more inlining on this side while still not MIR-inlining the loop when it's not just `memcmp`, and thus hopefully preserving the perf win. This should also allow MIR-inlining the length check, which was previously blocked, and thus might allow some obvious non-matches to optimize away as well.
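One way that restructuring could look, as a hedged sketch: `slices_eq` and `compare_bytes_sketch` are illustrative stand-ins, not the actual core library items. The idea is to keep the length check in a small inlinable wrapper, so obvious non-matches can be rejected at the MIR level, while the memcmp-style body stays out of line.

```rust
// Sketch only: a thin #[inline] wrapper exposes the length check to
// MIR inlining, while the comparison loop stays in a separate,
// non-inlined function (standing in for the compare_bytes intrinsic).
#[inline]
fn slices_eq(a: &[u8], b: &[u8]) -> bool {
    // Length check is visible to the inliner, so slices of different
    // lengths can be rejected without calling the out-of-line body.
    a.len() == b.len() && compare_bytes_sketch(a, b)
}

#[inline(never)] // stands in for the out-of-line memcmp-style body
fn compare_bytes_sketch(a: &[u8], b: &[u8]) -> bool {
    a.iter().zip(b).all(|(x, y)| x == y)
}

fn main() {
    assert!(slices_eq(b"hello", b"hello"));
    assert!(!slices_eq(b"hello", b"world"));
    assert!(!slices_eq(b"hi", b"hello")); // length mismatch, fast reject
}
```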
Spotted this in #148766's test changes. It doesn't seem like this ubcheck would catch anything useful; let's see if skipping it helps perf. (After all, this is inside every `[]` on a vec, among other things.)
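To illustrate the point: the check being skipped guards the pointer/length invariants of the slice built from the Vec's raw parts, and Vec upholds those invariants by construction. A minimal sketch of the conversion and the invariants involved (not the actual standard library code):

```rust
fn main() {
    let v = vec![1u32, 2, 3];
    let (ptr, len) = (v.as_ptr(), v.len());

    // Invariants Vec guarantees by construction, which make a
    // per-deref UB check on this path redundant:
    assert!(!ptr.is_null());
    assert!(len * std::mem::size_of::<u32>() <= isize::MAX as usize);

    // Conceptually what the Vec -> slice deref does under the hood
    // (hedged sketch; the real implementation lives in the stdlib):
    let s: &[u32] = unsafe { std::slice::from_raw_parts(ptr, len) };
    assert_eq!(s, &[1, 2, 3]);
}
```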