reuse RHS allocation for vec.extend(vec.into_iter()) when they do not fit into the LHS #77496
the8472 wants to merge 12 commits into rust-lang:master
Conversation
|
r? @KodrAus (rust_highfive has picked a reviewer for you, use r? to override) |
|
@bors try @rust-timer queue |
|
Awaiting bors try build completion |
|
⌛ Trying commit b0a249415002259095836413843f80f033c3d782 with merge 097a1aa29aaea51d74064470a76793b00bb9ec11... |
|
CC @pickfire since you wanted to review vec changes |
|
☀️ Try build successful - checks-actions, checks-azure |
|
Queued 097a1aa29aaea51d74064470a76793b00bb9ec11 with parent 738d4a7, future comparison URL. |
|
Finished benchmarking try commit (097a1aa29aaea51d74064470a76793b00bb9ec11): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
|
Nice, moderate improvements across the board :) |
|
Oh wait I was looking at bootstrap - the instruction counts seem to be about the same. |
b0a2494 to c5af975
Co-authored-by: Ivan Tham <pickfire@riseup.net>
b8af850 to 8a0a13a
|
/// * `offset == 0` is always valid
/// * `offset` must be positive
unsafe fn into_vec(self, offset: isize) -> Vec<T> {
    let dst = unsafe { self.buf.as_ptr().offset(offset) };
We could probably use add instead of offset for usize, but I don't know if the compiler will optimize out the offset if it is 0? Does it or should we have two functions, one with offset and one without?
0 is a constant in the other callsite, so it should be easy to optimize for llvm.
cc @lzutao to see if he's interested to check this out
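As an aside, here is a minimal standalone sketch of the two pointer-arithmetic forms being discussed (not the PR's actual `into_vec`; the buffer and offset below are made up purely for illustration):

```rust
// Standalone illustration of `offset` (isize) vs `add` (usize) on a raw pointer.
// As noted above, with a constant offset of 0 either form should be folded away
// by LLVM; this just shows the signatures side by side.
fn main() {
    let v = vec![1u32, 2, 3, 4];
    let base = v.as_ptr();
    let offset = 2usize;

    // `offset` takes an isize, so a usize index needs a cast…
    let via_offset = unsafe { *base.offset(offset as isize) };
    // …while `add` takes a usize directly.
    let via_add = unsafe { *base.add(offset) };

    assert_eq!(via_offset, via_add);
}
```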
&& self.capacity() - self.len() < iterator.len()
&& iterator.cap - iterator.len() >= self.len()
Can this be simplified as?
- && self.capacity() - self.len() < iterator.len()
- && iterator.cap - iterator.len() >= self.len()
+ && iterator.len() - self.len() < iterator.cap - self.capacity()
I wonder if this will always grow the vec if the iterator is larger.
The proposed change would underflow if self is larger than iterator
This is technically still an overflow, but it's true that iterator.len() - self.len() would panic or wrap if, say, self.capacity() == 20_000, self.len() == 19_950, and iterator.len() == 100.
I meant the case where self.capacity() > iterator.cap. The subtraction would underflow the usize result and thus lead to the inequality unexpectedly evaluating to true, which would then violate the safety constraints of into_vec_with_uninit_prefix.
Then how about?
- && self.capacity() - self.len() < iterator.len()
- && iterator.cap - iterator.len() >= self.len()
+ && iterator.len().saturating_sub(self.len()) < iterator.cap.saturating_sub(self.capacity())
For a self with len == 2 && cap == 2 and an iterator with len == 2 && cap == 3, that would evaluate to true and attempt to store 4 elements into an allocation of 3. 💣💥
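For reference, a small self-contained check of that counterexample, with plain variables standing in for the Vec/IntoIter fields and reading the suggestion as replacing the original two clauses with the single saturating one:

```rust
// Hypothetical reproduction of the counterexample: self has len == 2, cap == 2;
// the iterator has len == 2, cap == 3.
fn main() {
    let (self_len, self_cap) = (2usize, 2usize);
    let (iter_len, iter_cap) = (2usize, 3usize);

    // Suggested single clause with saturating_sub: it accepts the case…
    let suggested = iter_len.saturating_sub(self_len) < iter_cap.saturating_sub(self_cap);
    assert!(suggested);

    // …even though self.len() + iterator.len() == 4 elements would have to fit
    // into the iterator's allocation of capacity 3.
    assert!(self_len + iter_len > iter_cap);

    // The PR's original two clauses correctly reject it (the second is false).
    let original = self_cap - self_len < iter_len && iter_cap - iter_len >= self_len;
    assert!(!original);
}
```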
Significant parts were changed since then, yes. |
// ² memcpy
// ³ memmove
// ⁴ free
//
That line was intentionally left blank.
I think that, following the rest of the codebase's conventions, this blank line should not have the // comment marker and should just be empty.
// ³ into_vec     ____------BBBB____--------------   Vec(0x00, 0, 4)   Vec(0x0a, 4, 8)
// ⁴ *self = v    ----------BBBB____--------------   Vec(0x0a, 4, 8)
//
// ## empty self, pristine iterator
Do we even check for this case below?
By the way, nice diagram.
into_vec_with_uninit_prefix is pretty much just a struct conversion when there's nothing to move.
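To illustrate what "just a struct conversion" means here, a rough standalone sketch using the public raw-parts API rather than the PR's private IntoIter fields:

```rust
use std::mem::ManuallyDrop;

// When there is nothing to move, handing an allocation back as a Vec is only a
// matter of reassembling (ptr, len, cap); no memcpy/memmove is involved.
fn main() {
    let v = vec![1u32, 2, 3, 4];
    let mut v = ManuallyDrop::new(v); // keep the allocation alive while we take it apart
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());

    // Rebuild a Vec over the exact same buffer.
    let rebuilt = unsafe { Vec::from_raw_parts(ptr, len, cap) };
    assert_eq!(rebuilt, [1, 2, 3, 4]);
}
```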
library/alloc/src/vec.rs (outdated)
// ## insufficient capacity
//
// [initial]      AAAAA-----BBBBBB__--------------   Vec(0x00, 5, 5)    IntoIter(0x0a, 0x0a, 0x0f, 8)
// ¹² reserve(6)  ----------BBBBBB__--AAAAA______-   Vec(0x14, 5, 11)   IntoIter(0x0a, 0x0a, 0x0f, 8)
Wait, is this correct, or am I thinking about it wrongly? Shouldn't reserve need only one malloc (a realloc) to reallocate the memory when the alignment is the same, which it should be in this case? Why does it need a memmove?
Lines 169 to 183 in e055f87
In many cases realloc is just malloc+memcpy+free. Only in some cases it can extend the allocation in place.
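A rough sketch of that fallback path (new block, byte copy, free the old block), written against std::alloc purely for illustration; this is not how Vec::reserve or the global allocator is actually implemented:

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;

// Grow an allocation the "slow" way: malloc + memcpy + free.
unsafe fn grow_by_copy(old: *mut u8, old_layout: Layout, new_size: usize) -> *mut u8 {
    unsafe {
        let new_layout = Layout::from_size_align(new_size, old_layout.align()).unwrap();
        let new = alloc(new_layout); // "malloc"
        if new.is_null() {
            handle_alloc_error(new_layout);
        }
        ptr::copy_nonoverlapping(old, new, old_layout.size()); // "memcpy"
        dealloc(old, old_layout); // "free"
        new
    }
}

fn main() {
    let old_layout = Layout::array::<u32>(4).unwrap();
    let new_layout = Layout::array::<u32>(8).unwrap();
    unsafe {
        let p = alloc(old_layout) as *mut u32;
        assert!(!p.is_null());
        for i in 0..4 {
            p.add(i).write(i as u32);
        }
        let grown = grow_by_copy(p as *mut u8, old_layout, new_layout.size()) as *mut u32;
        assert_eq!(grown.add(3).read(), 3);
        dealloc(grown as *mut u8, new_layout);
    }
}
```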
    return;
}
iterator.ptr = iterator.end;
iterator.move_into(self);
It would be cooler if the code were linked to the cool diagram.
- iterator.move_into(self);
+ // Insufficient capacity
+ iterator.move_into(self);
Well, yes and no.
With the optimization present that is indeed all it covers. But it is the general codepath that also works without the optimization. So I don't want to give the impression that it can only handle that case.
pickfire left a comment:
Looks good to me but I think we should run the timer again.
|
@jyn514 poke for another perf run. |
|
@bors try @rust-timer queue
Thanks for the ping, I'd forgotten about this. |
|
Awaiting bors try build completion |
|
⌛ Trying commit 1885f38 with merge d99d46fbeabf815d807dea479ae158f7ae9041c2... |
|
☀️ Try build successful - checks-actions, checks-azure |
|
Queued d99d46fbeabf815d807dea479ae158f7ae9041c2 with parent b5c9e24, future comparison URL. |
|
Finished benchmarking try commit (d99d46fbeabf815d807dea479ae158f7ae9041c2): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
|
Slight improvements in the bootstrap timings, otherwise not much change. Not sure if it's worth it. |
|
Only a slight change, but the memory usage did improve: less memory used. |
|
I am pretty sure that the memory usage here is just noise. Noise in the 20% range is not unusual for max-rss. I think the improvements in the bootstrap timing are also likely noise, though that's less clear. |
|
@the8472 Hmm, was there a use case you had in mind originally when optimizing this case? |
|
@KodrAus just trying to reduce allocations and memcpys. But looks like it didn't work out. |
I tried a broader version of this optimization in #70793, but it regressed compile times because it emitted more IR for every single use of extend() and thus had to be removed from that PR. This attempt is narrower since it only applies to cases where we're extending/appending directly from another Vec. The aim is to improve runtime behavior, not necessarily compile time, but I still want a perf run to make sure it's not regressing significantly, unlike the original attempt.
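For context, this is the shape of code the specialization targets; a hypothetical toy example, where whether the RHS allocation is actually reused depends on the lengths and capacities involved, and observable behavior is the same either way:

```rust
fn main() {
    // LHS with little spare capacity.
    let mut dst: Vec<u32> = Vec::with_capacity(2);
    dst.extend([1, 2]);

    // RHS consumed by value, so extend() sees vec::IntoIter directly.
    let src: Vec<u32> = (3..100).collect();

    // dst cannot hold src's elements without growing; with this change the
    // specialization may move dst's two elements into src's buffer and adopt
    // that allocation instead of reallocating dst.
    dst.extend(src.into_iter());
    assert_eq!(dst.len(), 99);
}
```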