Conversation
…ust nightly - also port from register_attr to register_tool (approach shamelessly taken from rust-gpu)
|
It seems that e.g.

```rust
extern crate alloc;
use cuda_std::prelude::*;

#[kernel]
#[allow(improper_ctypes_definitions, clippy::missing_safety_doc)]
pub unsafe fn add(a: &[f32], b: &[f32], c: *mut f32) {
    let idx = thread::index_1d() as usize;
    if idx < a.len() {
        let elem = &mut *c.add(idx);
        *elem = a[idx] + b[idx];
        if idx == 0 {
            cuda_std::println!("Elem 0: {}", *elem);
        }
    }
}
```

The resulting ptx will be invalid with … An offending ptx section looks like (top section, truncated):

```
.const .align 8 .u8 _ZN4core3fmt12USIZE_MARKER17h8e203fb7dfec90c9E[8] =
    {0xFF(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
     0xFF00(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
     0xFF0000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
     0xFF000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
     0xFF00000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
     0xFF0000000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
     0xFF000000000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE),
     0xFF00000000000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE)};

$L__BB6_5:
    mov.u64 %rd112, 0;
    ld.v2.u32 {%r5, %r6}, [%rd108+32];
    ld.u8 %rs3, [%rd108+40];
    st.local.u8 [%rd7+8], %rs3;
    st.local.v2.u32 [%rd7], {%r5, %r6};
    ld.u64 %rd109, [%rd108+24];
    ld.u16 %rs4, [%rd108+16];
    and.b16 %rs2, %rs4, 3;
    setp.eq.s16 %p6, %rs2, 2;
    mov.u64 %rd110, %rd112;
    @%p6 bra $L__BB6_10;
    setp.ne.s16 %p7, %rs2, 1;
    @%p7 bra $L__BB6_9;
    shl.b64 %rd63, %rd109, 4;
    add.s64 %rd64, %rd115, %rd63;
    add.s64 %rd18, %rd64, 8;
    ld.u64 %rd65, [_ZN4core3fmt12USIZE_MARKER17h8e203fb7dfec90c9E];
    ld.u64 %rd66, [%rd64+8];
    setp.ne.s64 %p8, %rd66, %rd65;
    mov.u64 %rd110, %rd112;
    @%p8 bra $L__BB6_10;
    ld.u64 %rd68, [%rd18+-8];
    ld.u64 %rd109, [%rd68];
    mov.u64 %rd110, 1;
    bra.uni $L__BB6_10;
```

Both of these references are to the … Even though this error is there, a way more complex example code (not using …
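For reference (this CPU sketch is mine, not from the thread), the kernel above is just an element-wise add with a bounds check; on the host it collapses to a plain loop, which is handy for validating GPU results:

```rust
// Hypothetical CPU reference for the `add` kernel: each GPU thread handles
// one index, so on the host this becomes one loop iteration per element.
// The loop bound mirrors the kernel's `idx < a.len()` check.
fn add_reference(a: &[f32], b: &[f32], c: &mut [f32]) {
    for idx in 0..a.len().min(b.len()).min(c.len()) {
        c[idx] = a[idx] + b[idx];
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [10.0f32, 20.0, 30.0];
    let mut c = [0.0f32; 3];
    add_reference(&a, &b, &mut c);
    println!("{:?}", c); // [11.0, 22.0, 33.0]
}
```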
|
@apriori hello there. Just wanted to check-in and see if you've been having any success on this branch. I have a few open PRs, some of which I am actively using, and I'm thinking about rebasing them onto this branch in order to gain the update rustc benefits. Think it is reasonable to rebase onto this branch? |
|
I am getting an … I moved back to master because I have a few parallel reduction algorithms that make heavy use of shared memory, and I don't want to take the time right now to debug the codegen issue. |
|
TBH, I really hope that @RDambrosio016 (hope all is well) comes back some day. Having to move over to a C++ wrapper pattern, building lots of shared libraries, multi-stage nvcc build pipelines and such ... not fun. This framework on the other hand already has a lot of work put into it, and keeping it up-to-date and moving forward is a huge boon to the community. I'm still holding out hope that it will be revitalized soon |
I would wish the same, but so far it seems @RDambrosio016 lost interest/has no time anymore. For me, non-trivial programs were working as long as … Anyway, some more work should happen on this, or this framework will lose its connection to rustc development entirely and will not gain acceptance either. |
|
Sorry, I've just been really busy with my degree and other things. I think being tied to a different codegen, and especially to libnvvm, is not the way to go for the future. I think adding the required linking logic for nvptx in rustc is much easier and better. I'm doing some experiments trying to do that. |
|
@RDambrosio016 nice! Hope all is going well with your studies. |
|
@RDambrosio016 so you want to prefer using the already existing nvptx codegen backend of rustc? |
|
BTW, something I've done to help mitigate the issue of having to use the older compiler version: …
There are a few ways to optimize this. It doesn't need to be an example; there are other ways. Keeping it out of the build.rs of the larger project helps ensure that the rust toolchain limitation doesn't spread.
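A minimal sketch of that isolation (the layout, crate name, and pinned nightly below are my assumptions, not taken from this thread): keep the GPU kernels in their own crate with their own `rust-toolchain.toml`, so only that crate is held back on the old nightly while the rest of the workspace tracks a current toolchain.

```toml
# gpu_kernels/rust-toolchain.toml (hypothetical path)
# Pins only the kernel crate to a nightly the NVVM codegen accepts;
# the surrounding workspace keeps its own, newer toolchain file.
[toolchain]
channel = "nightly-2021-12-04"
components = ["rust-src", "rustc-dev", "llvm-tools-preview"]
```

The host crate then only consumes the emitted PTX artifact, so its own build.rs never needs the pinned toolchain.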
|
@apriori Hello there! I would like to port … Despite what @RDambrosio016 said a few weeks ago about abandoning the NVVM codegen and moving to what's already implemented in …, I also heard that NVIDIA might be in the process of updating their tools to a much more recent LLVM version, as even for them it's too difficult to rely on something as old as v7.0.1. This would probably simplify some of the logic implemented in … |
|
@dssgabriel interesting, what do you mean by invalid PTX? I was not able to build anything since it requires a custom linker (my proposal in rustc would put the linking logic in rustc) that doesn't work on Windows. The LLVM PTX backend is mature enough that I would expect it to generate valid code unless rustc is emitting something very suspicious. |
Unfortunately no. rustc is a rapidly moving target. I once checked a just slightly more recent nightly after 2022/12/10 and compilation failed. There are two approaches to this I would consider "valid": a) fix the issues in this MR and continue from there … One can and should use rustc_codegen_llvm as a template, but here and there more detailed knowledge about CUDA PTX is required; some solutions I merely guessed, and I bet I was wrong with that. As far as I know though, libNVVM 2.0 is very different from prior versions. I think @RDambrosio016 can comment more on the feasibility of this. I would also prefer to have these efforts more "upstream", but we are kind of lost if upstream rustc is not moving and/or improving the PTX backend.
I cannot comment on this other than that I never really tried the official rustc ptx backend. Rust-cuda was simply the far more compelling and accessible solution. This is also due to @RDambrosio016's good documentation and immediately runnable examples, let alone all his hard work on building pretty much an ecosystem of libraries.
As the interfacing would still be via libNVVM I doubt that has any impact on general accessibility. Maybe developer install experience might improve a bit when not depending on ancient llvm versions, but that is pretty much about it.
So far my experience with rust-cuda has also been that single-source is a thing I would really love to see, but I imagine it's hard with the rust compilation model, especially with … |
|
Is there anything I can do to help? Is it just an issue of putting some lines of code in the right place? Can we write a new bindgen wrapper to get low-level access? |
|
Hello! We are rebooting this project. Sorry for your PR not getting merged! Is this still relevant? |
Hello @LegNeato, thank you for the update! |
|
@dssgabriel Thanks for the info and the link to your code! I was perusing these changes, and a lot of them are the same ones we had to do to forward-port rust-gpu, so they don't look too scary. We'll probably hit a brick wall at the same place rust-gpu did (significant breaking changes to Rust's internal allocation model and … |
|
@apriori are you still interested in working on this? I know it has been a long time :-) |
|
@dssgabriel So you did even update libNVVM? |
|
@apriori Yes, I did! It worked fine (with CUDA 12.1) for the time I spent playing with it and with the kernels I wrote. I haven't touched it since September 2023, though, so it might have broken with newer versions of CUDA 🤷♂️ |
|
@apriori ok, I can poke at it in the next couple of days if you don't get to it 🤙 |
|
I would suggest that we first split this repo in two (or more). All crates that aren't part of the codegen or cuda-std (i.e. cust and co) and that could be compiled and tested on a recent nightly (or stable) should be moved out. That way, work on these crates can happen faster, without the work on the codegen blocking it. |
`add_gpu` example working

breaking:
- `cuda_builder`: the user no longer has to define the respective `cfg_attr` section in gpu code. Leaving it in gpu code will result in a compile error from cargo.

to be further tested: