Rollup of 9 pull requests#155796
Conversation
Workgroup memory is a memory region that is shared between all threads in a workgroup on GPUs. Workgroup memory can be allocated statically or after compilation, when launching a gpu-kernel. The intrinsic added here returns the pointer to the memory that is allocated at launch-time. # Interface With this change, workgroup memory can be accessed in Rust by calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T` intrinsic. It returns the pointer to workgroup memory guaranteeing that it is aligned to at least the alignment of `T`. The pointer is dereferencable for the size specified when launching the current gpu-kernel (which may be the size of `T` but can also be larger or smaller or zero). All calls to this intrinsic return a pointer to the same address. See the intrinsic documentation for more details. ## Alternative Interfaces It was also considered to expose dynamic workgroup memory as extern static variables in Rust, like they are represented in LLVM IR. However, due to the pointer not being guaranteed to be dereferencable (that depends on the allocated size at runtime), such a global must be zero-sized, which makes global variables a bad fit. # Implementation Details Workgroup memory in amdgpu and nvptx lives in address space 3. Workgroup memory from a launch is implemented by creating an external global variable in address space 3. The global is declared with size 0, as the actual size is only known at runtime. It is defined behavior in LLVM to access an external global outside the defined size. There is no similar way to get the allocated size of launch-sized workgroup memory on amdgpu an nvptx, so users have to pass this out-of-band or rely on target specific ways for now.
…hanges The test `tests/debuginfo/basic-stepping.rs` has a history of regressing for subtle reasons, and has non-obvious expectations. So I'd like to keep an extra eye on it.
…useZ4,Sa4dus,workingjubilee,RalfJung,nikic,kjetilkjeka,kulst Add intrinsic for launch-sized workgroup memory on GPUs Workgroup memory is a memory region that is shared between all threads in a workgroup on GPUs. Workgroup memory can be allocated statically or after compilation, when launching a gpu-kernel. The intrinsic added here returns the pointer to the memory that is allocated at launch-time. # Interface With this change, workgroup memory can be accessed in Rust by calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T` intrinsic. It returns the pointer to workgroup memory guaranteeing that it is aligned to at least the alignment of `T`. The pointer is dereferencable for the size specified when launching the current gpu-kernel (which may be the size of `T` but can also be larger or smaller or zero). All calls to this intrinsic return a pointer to the same address. See the intrinsic documentation for more details. ## Alternative Interfaces It was also considered to expose dynamic workgroup memory as extern static variables in Rust, like they are represented in LLVM IR. However, due to the pointer not being guaranteed to be dereferencable (that depends on the allocated size at runtime), such a global must be zero-sized, which makes global variables a bad fit. # Implementation Details Workgroup memory in amdgpu and nvptx lives in address space 3. Workgroup memory from a launch is implemented by creating an external global variable in address space 3. The global is declared with size 0, as the actual size is only known at runtime. It is defined behavior in LLVM to access an external global outside the defined size. There is no similar way to get the allocated size of launch-sized workgroup memory on amdgpu an nvptx, so users have to pass this out-of-band or rely on target specific ways for now. Tracking issue: rust-lang#135516
…ttr-span, r=JonathanBrouwer Fix ICE from cfg_attr_trace Fixes rust-lang#154801 Fixes rust-lang#143094 r? @JonathanBrouwer The root cause is we recovery from parsing attribute error here: https://github.com/rust-lang/rust/blob/ed6f9af7d47f5a5eda2a4a1925d1e250b51a37f2/compiler/rustc_attr_parsing/src/parser.rs#L550 while the later suggestion code from type checking try to inspect the attr span of the `expr` in the second error, keep the span seems reasonable.
…, r=JonathanBrouwer Error on invalid macho section specifier The macho section specifier used by `#[link_section = "..."]` is more strict than e.g. the one for elf. LLVM will error when you get it wrong, which is easy to do if you're used to elf. So, provide some guidance for the simplest mistakes, based on the LLVM validation. Currently compilation fails with an LLVM error, see https://godbolt.org/z/WoE8EdK1K. The LLVM validation logic is at https://github.com/llvm/llvm-project/blob/a0f0d6342e0cd75b7f41e0e6aae0944393b68a62/llvm/lib/MC/MCSectionMachO.cpp#L199-L203 LLVM validates the other components of the section specifier too, but it feels a bit fragile to duplicate those checks. If you get that far, hopefully the LLVM errors will be sufficient to get unstuck. --- sidequest from rust-lang#147811 r? JonathanBrouwer specifically, is this the right place for this sort of validation? `rustc_attr_parsing` also does some validation.
…laumeGomez Add an edge-case test for `--remap-path-prefix` for `rustc` & `rustdoc` Intended to resolve @lolbinarycat concern rust-lang#155307 (comment)
cleanup, restructure and merge `tests/ui/deriving` into `tests/ui/derives` As a followup to rust-lang#155615, this PR deletes some outdated tests from these directories, splits up `ui/derives` into smaller directories to roughly group tests by the derive macros they use and moves over all tests from `ui/deriving` into `ui/derives`. r? @Kivooeo
…uct, r=fee1-dead Reject implementing const Drop for types that are not const `Destruct` already fixes rust-lang#155618 While there is no soundness or otherwise issue currently, this PR ensures that people get what they expect. It seems wrong to allow implementing `const Drop`, but then the type still can't be dropped at compile-time. r? @fee1-dead
…thanBrouwer Add a higher-level API for parsing attributes
triagebot.toml: Ping Enselic when tests/debuginfo/basic-stepping.rs changes The test `tests/debuginfo/basic-stepping.rs` has a history of [regressing for subtle reasons](rust-lang#33013 (comment)) ([retroactively](rust-lang#144497)), and has [expected behavior](rust-lang#153941) that is [not obvious](rust-lang#155377). So I'd like to keep an extra eye one it.
…ggestions, r=JonathanBrouwer Do not suggest internal cfg trace attributes Fixes rust-lang#150566.
|
@bors r+ rollup=never p=5 |
This comment has been minimized.
This comment has been minimized.
|
📌 Perf builds for each rolled up PR:
previous master: 9838411cb7 In the case of a perf regression, run the following command for each PR you suspect might be the cause: |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 9838411 (parent) -> 68ffae4 (this PR) Test differencesShow 313 test diffsStage 1
Stage 2
(and 167 additional test diffs) Additionally, 46 doctest diffs were found. These are ignored, as they are noisy. Job group index
Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard 68ffae46b581747074413e4242ab0c6ecc4db4a9 --output-dir test-dashboardAnd then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
|
A job failed! Check out the build log: (web) (plain enhanced) (plain) Click to see the possible cause of the failure (guessed by this bot) |
|
Finished benchmarking commit (68ffae4): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (secondary 2.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 6.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 486.61s -> 488.457s (0.38%) |
Successful merges:
--remap-path-prefixforrustc&rustdoc#155485 (Add an edge-case test for--remap-path-prefixforrustc&rustdoc)tests/ui/derivingintotests/ui/derives#155659 (cleanup, restructure and mergetests/ui/derivingintotests/ui/derives)Destructalready #155676 ( Reject implementing const Drop for types that are not constDestructalready)r? @ghost
Create a similar rollup