New MIR Pass: SsaRangePropagation#150309
Conversation
|
@bors try @rust-timer queue (I want to test if try builds work and this PR seems like a good fit for a perf. run :) ) |
This comment has been minimized.
This comment has been minimized.
[EXPERIMENT] New MIR Pass: SsaRangePropagation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (d9c5f75): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.9%, secondary 1.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 1.8%, secondary -1.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.0%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 482.582s -> 484.115s (0.32%) |
|
@bors try parent=99ff3fbb86658b427f5dd7daaae8db5626a63c26 @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
[EXPERIMENT] New MIR Pass: SsaRangePropagation
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (edacb2e): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.2%, secondary 1.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 1.8%, secondary -1.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.0%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 482.582s -> 482.879s (0.06%) |
This comment has been minimized.
This comment has been minimized.
|
The pass is fast, and it reduced MIR body definitely. All regressions are from opt, and there are probably more inlining. |
|
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
|
r? @nnethercote rustbot has assigned @nnethercote. Use |
| if let Err(Some(place)) = self.simplify_operand(cond, location) { | ||
| let successor = Location { block: *target, statement_index: 0 }; | ||
| if location.block != successor.block | ||
| && self.unique_predecessors.contains(successor.block) |
There was a problem hiding this comment.
Why check location.block != successor.block? We know that location.block has at least one predecessor which is reachable from bb0, so which is not itself. So if successor.block has a unique predecessor, it must be different from location.block, mustn't it?
I'm tempted to turn the location.block != successor.block into an assertion, or do you have a test to exhibit this case?
There was a problem hiding this comment.
Yup, and I can change to location.dominates(successor, self.dominators) for assert.
| } | ||
| let successor = Location { block: target, statement_index: 0 }; | ||
| if location.block != successor.block | ||
| && self.unique_predecessors.contains(successor.block) |
There was a problem hiding this comment.
IIUC, the unique_predecessors check should be equivalent to location.dominates(successor, self.dominators), isn't it?
There was a problem hiding this comment.
Consider the test case on_match_2 and on_if_2. They are not equivalent for the SwitchInt terminator.
| } | ||
|
|
||
| let otherwise = Location { block: targets.otherwise(), statement_index: 0 }; | ||
| if place.ty(self.local_decls, self.tcx).ty.is_bool() |
There was a problem hiding this comment.
Do you mind leaving a FIXME to extend this to other types?
There was a problem hiding this comment.
I have left a FIXME. Those FIXMEs can be fixed when it shows beneficial.
| //! let b = a < 9; | ||
| //! if b { | ||
| //! let c = b; // c is true since b is within the range [1, 2) | ||
| //! let d = a < 8; // d is true since b within the range [0, 9) |
There was a problem hiding this comment.
I don't understand these comments. b is boolean, how can it be within a numeric range? And where does the [1, 2) come from?
There was a problem hiding this comment.
This is the integer representation in MIR, so the full range of boolean is [0, 2).
|
Requested reviewer is already assigned to this pull request. Please choose another assignee. |
| // FIXME: This should use the intersection of all valid ranges. | ||
| let (_, range) = | ||
| ranges.iter().find(|(range_loc, _)| range_loc.dominates(location, &self.dominators))?; | ||
| Some(*range) |
There was a problem hiding this comment.
If we don't have a better range, should we fallback to the type-based range? For discriminants in particular?
There was a problem hiding this comment.
Yes. I will reland #148443 as a subsequent PR.
| continue; | ||
| } | ||
| let successor = Location { block: target, statement_index: 0 }; | ||
| if self.unique_predecessors.contains(successor.block) { |
There was a problem hiding this comment.
Can we use location.dominates(successor, &self.dominators) here too?
There was a problem hiding this comment.
No. SwitchInt has multiple targets. For instance, bb0 dominates bb2, but the match value can be 1 or 2 in bb2. on_if_2 show this case.
graph TD;
bb0(["bb0: match(1:bb1, 2:bb2)"])-->|1|bb1;
bb0-->|2|bb2;
bb1-->bb2;
There is another case. The test case is on_match_2. bb0 dominates bb1, but the match value can be 1 or 2.
graph TD;
bb0(["bb0: match(1:bb1, 2:bb1)"])-->|1|bb1;
bb0-->|2|bb1;
There was a problem hiding this comment.
Ah too bad, I would have loved to get rid of unique_predecessors...
|
Thanks! |
This comment has been minimized.
This comment has been minimized.
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 9b37157 (parent) -> 3d087e6 (this PR) Test differencesShow 8 test diffsStage 1
Stage 2
Additionally, 6 doctest diffs were found. These are ignored, as they are noisy. Job group index
Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard 3d087e6044bddc65723bf42c76fe4cc33a0076b0 --output-dir test-dashboardAnd then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (3d087e6): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowOur benchmarks found a performance regression caused by this PR. Next Steps:
@rustbot label: +perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.7%, secondary 0.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResults (primary -0.1%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 474.437s -> 473.646s (-0.17%) |
|
perf triage: pre-merge results roughly match the final results. Based on the assesment from #150309 (comment), I assume some small regressions in @rustbot label: +perf-regression-triaged Note that I can't asses very well whether this is justified or not as I don't have that much context. The original motivation for mir-opts was improving compile time, which is what I'm triaging here. From that standpoint this is now negative and it's only acceptable if it later leads to more improvements. As far as I can tell, the motivation here is to improve runtime performance, for which we don't have much coverage in the benchmarks, so from that standpoint it's also not obvious to me whether this is worth the cost. Somebody with more knowledge than me should make that judgement. |
|
At least for now, the pass probably cannot improve runtime performance because all things should have been done in LLVM. IMO, smaller MIR may be improvements or regressions. For instance, inlining :( or some new opportunities that have more constants are found in LLVM. The regressions are smaller. I don't think it's too bad. |
As an alternative to #150192.
Introduces a new pass that propagates the known ranges of SSA locals.
We can know the ranges of SSA locals at some locations for the following code:
This PR only implements a trivial range: we know one value on switch, assert, and assume.