Create the previous dep graph index on a background thread #116375
Zoxc wants to merge 1 commit into rust-lang:main from
    fn prefetch(self: &Arc<Self>) {
        if !self.index.is_empty() {
            let this = self.clone();
            thread::spawn(move || {
Shouldn't we block this on a job server token being available for this extra computation work?
I'm not sure if that's a performance win, as it's more efficient if this completes in a timely manner. `setup_index` is less efficient than doing all the dep kinds at once.
> I'm not sure if that's a performance win, as it's more efficient if this completes in a timely manner.
I don't really follow. Jobserver token availability isn't about performance, strictly speaking, it's about making sure that we're not consuming more resources than the host has and/or the user is willing to give. We've definitely had complaints about -j1 (for example) not being respected before.
If we don't have a token available, that may mean that we should do the work in-band (i.e., not spawning the thread) even if that is slower. But, that's what's going to happen anyway on a system that's already CPU-saturated - just via kernel scheduling - which is the alternative here, right? So I don't really understand how the token would be a problem.
I guess we could add a try_acquire method to the jobserver and only spawn the thread if we get a token. It looks like that would be racy on POSIX though. I'm not sure if macOS or Linux offers a way to do non-blocking reads.
I've made the PR check for a free token now. It's a bit racy but probably works fine.
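The "only spawn the thread if we get a token" idea discussed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `TokenPool`, its `try_acquire`/`release` methods, and `run_with_token` are hypothetical names, and the pool is a plain in-process counter standing in for a real jobserver (which is where the raciness mentioned above would come from).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a jobserver: a counter of free tokens.
struct TokenPool {
    free: AtomicUsize,
}

impl TokenPool {
    fn new(tokens: usize) -> Self {
        TokenPool { free: AtomicUsize::new(tokens) }
    }

    /// Non-blocking acquire: takes a token if one is free, otherwise returns false.
    fn try_acquire(&self) -> bool {
        self.free
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }

    /// Returns a previously acquired token to the pool.
    fn release(&self) {
        self.free.fetch_add(1, Ordering::Release);
    }
}

/// Runs `work` on a background thread if a token is free, otherwise in-band
/// on the calling thread. Returns the join handle when a thread was spawned.
fn run_with_token<F>(pool: &Arc<TokenPool>, work: F) -> Option<thread::JoinHandle<()>>
where
    F: FnOnce() + Send + 'static,
{
    if pool.try_acquire() {
        let pool = Arc::clone(pool);
        Some(thread::spawn(move || {
            work();
            pool.release(); // hand the token back once the extra work is done
        }))
    } else {
        work(); // no spare token: do the work on the current thread
        None
    }
}
```

With a pool of zero tokens (roughly the `-j1` case raised above), `run_with_token` never spawns a thread and the work simply runs in-band.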
In a typical compilation, when does the first use of this reciprocal index happen?
It will end up calling |
☔ The latest upstream changes (presumably #118900) made this pull request unmergeable. Please resolve the merge conflicts.
edf6ef4 to 1b9f6d8
Encode dep graph edges directly from the previous graph when promoting

This encodes dep graph edges directly from the previous graph when promoting nodes from a previous session, avoiding allocations / copies. Based on rust-lang#122064 and rust-lang#116375.

| Benchmark | Time (before) | Time (after) | Change |
|---|---:|---:|---:|
| 🟣 **clap**:check:unchanged | 0.4177s | 0.4072s | 💚 -2.52% |
| 🟣 **hyper**:check:unchanged | 0.1430s | 0.1420s | -0.69% |
| 🟣 **regex**:check:unchanged | 0.3106s | 0.3038s | 💚 -2.19% |
| 🟣 **syn**:check:unchanged | 0.5823s | 0.5688s | 💚 -2.33% |
| 🟣 **syntex_syntax**:check:unchanged | 1.3992s | 1.3692s | 💚 -2.14% |
| Total | 2.8528s | 2.7910s | 💚 -2.17% |
| Summary | 1.0000s | 0.9803s | 💚 -1.97% |
Probably should do a perf run here, as the result in #122070 was more mixed than expected.
@bors try @rust-timer queue
Create the previous dep graph index on a background thread

This changes `SerializedDepGraph.index` to be computed on-demand per dep kind. This means we can immediately start using queries without waiting for the entire index to be constructed. Additionally a background thread is started which computes the entire index, effectively off-loading most of the index construction to the background thread.

| Benchmark | Time (before) | Time (after) | Change | Memory (before) | Memory (after) | Change |
|---|---:|---:|---:|---:|---:|---:|
| 🟣 **clap**:check:unchanged | 0.4259s | 0.4225s | -0.79% | 89.65 MiB | 90.08 MiB | 0.48% |
| 🟣 **hyper**:check:unchanged | 0.1425s | 0.1417s | -0.53% | 47.85 MiB | 47.91 MiB | 0.13% |
| 🟣 **regex**:check:unchanged | 0.3188s | 0.3157s | -0.97% | 71.09 MiB | 71.58 MiB | 0.69% |
| 🟣 **syn**:check:unchanged | 0.5895s | 0.5813s | 💚 -1.38% | 101.68 MiB | 102.15 MiB | 0.47% |
| 🟣 **syntex_syntax**:check:unchanged | 1.4392s | 1.4361s | -0.22% | 200.62 MiB | 201.68 MiB | 0.53% |
| Total | 2.9158s | 2.8974s | -0.63% | 510.89 MiB | 513.40 MiB | 0.49% |
| Summary | 1.0000s | 0.9922s | -0.78% | 1 byte | 1.00 bytes | 0.46% |

| Benchmark | Time (before) | Time (after) | Change | Memory (before) | Memory (after) | Change |
|---|---:|---:|---:|---:|---:|---:|
| 🟠 **clap**:debug:unchanged | 1.0753s | 1.0684s | -0.64% | 142.80 MiB | 142.72 MiB | -0.05% |
| 🟠 **hyper**:debug:unchanged | 0.2857s | 0.2847s | -0.35% | 63.06 MiB | 63.15 MiB | 0.13% |
| 🟠 **regex**:debug:unchanged | 0.7703s | 0.7633s | -0.90% | 108.76 MiB | 109.03 MiB | 0.25% |
| 🟠 **syn**:debug:unchanged | 1.0596s | 1.0531s | -0.62% | 142.08 MiB | 142.18 MiB | 0.07% |
| 🟠 **syntex_syntax**:debug:unchanged | 2.7530s | 2.7274s | -0.93% | 308.92 MiB | 308.63 MiB | -0.09% |
| Total | 5.9438s | 5.8969s | -0.79% | 765.62 MiB | 765.71 MiB | 0.01% |
| Summary | 1.0000s | 0.9931s | -0.69% | 1 byte | 1.00 bytes | 0.06% |

r? `@cjgillot`
☀️ Try build successful - checks-actions
Finished benchmarking commit (fa1beb3): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size
This benchmark run did not return any relevant results for this metric.

Bootstrap: 647.634s -> 647.342s (-0.05%)
That's quite an odd performance result. It seems to have large wall-time regressions which don't show up in the self-profiling results, nor can I reproduce them locally.
|
@Zoxc |
|
@Zoxc |
@bors try @rust-timer queue
|
☀️ Try build successful - checks-actions |
|
Finished benchmarking commit (6c5d278): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count
This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

Max RSS (memory usage)
Results (primary -0.1%, secondary -0.4%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles
Results (primary 1.4%, secondary -1.4%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size
This benchmark run did not return any relevant results for this metric.

Bootstrap: 774.166s -> 776.361s (0.28%)
|
☔ The latest upstream changes (presumably #139758) made this pull request unmergeable. Please resolve the merge conflicts. |
616e19f to bde2a86
|
☔ The latest upstream changes (presumably #148220) made this pull request unmergeable. Please resolve the merge conflicts. |
|
This no longer appears to be an improvement:
(The benchmark table that followed here was not recoverable.)
This changes `SerializedDepGraph.index` to be computed on-demand per dep kind. This means we can immediately start using queries without waiting for the entire index to be constructed. Additionally a background thread is started which computes the entire index, effectively off-loading most of the index construction to the background thread.

r? @cjgillot
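As a rough illustration of the approach described above, the following sketch builds a per-kind reverse index lazily and fills the remaining slots from a background thread. It is a simplified model under stated assumptions, not the compiler's implementation: `Node`, `LazyIndexedGraph`, `index_for`, and `lookup` are hypothetical names, dep kinds are plain `usize` values, and the background thread here walks the kinds one by one rather than building everything in a single pass.

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock};
use std::thread;

/// Hypothetical simplified node: a dep kind plus a fingerprint-like hash.
#[derive(Clone, Copy)]
struct Node {
    kind: usize,
    hash: u64,
}

/// Sketch of a graph whose reverse index (hash -> node position) is built
/// lazily, one dep kind at a time.
struct LazyIndexedGraph {
    nodes: Vec<Node>,
    index: Vec<OnceLock<HashMap<u64, usize>>>, // one slot per dep kind
}

impl LazyIndexedGraph {
    fn new(nodes: Vec<Node>, kinds: usize) -> Arc<Self> {
        let index = (0..kinds).map(|_| OnceLock::new()).collect();
        Arc::new(LazyIndexedGraph { nodes, index })
    }

    /// Builds the index for one kind on first use; later calls reuse it.
    /// If another thread is already building this slot, the caller waits for it.
    fn index_for(&self, kind: usize) -> &HashMap<u64, usize> {
        self.index[kind].get_or_init(|| {
            self.nodes
                .iter()
                .enumerate()
                .filter(|(_, n)| n.kind == kind)
                .map(|(i, n)| (n.hash, i))
                .collect()
        })
    }

    /// Looks up a node position without waiting for the whole index.
    fn lookup(&self, kind: usize, hash: u64) -> Option<usize> {
        self.index_for(kind).get(&hash).copied()
    }

    /// Fills every slot from a background thread so that most lookups
    /// find their kind already indexed.
    fn prefetch(self: &Arc<Self>) -> thread::JoinHandle<()> {
        let this = Arc::clone(self);
        thread::spawn(move || {
            for kind in 0..this.index.len() {
                this.index_for(kind);
            }
        })
    }
}
```

Because each slot is a `OnceLock`, a query that arrives before the background thread reaches its kind pays only for that one kind, which is the property the description relies on.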