Port to more current rust-nightly by apriori · Pull Request #98 · Rust-GPU/rust-cuda

apriori · 2022-12-19T18:56:39Z

also port from register_attr to register_tool (approach shamelessly taken from rust-gpu)
add_gpu example working
Deactivate warp submodule as it does trigger some codegen issues

breaking:

Inject "no_std" and "register_tool" crate flags via cuda_builder. The user has no longer to define the respective cfg_attr section in gpu code. Leaving them still in gpu code will result in a compile error from cargo.

to be further tested:

more complex examples/projects

…ust nightly - also port from register_attr to register_tool (approach shamelessly taken from rust-gpu)

apriori · 2022-12-20T00:17:32Z

It seems that e.g. cuda_std::println attempting to format a number is broken. If one changes the add_gpu to this:

extern crate alloc;

use cuda_std::prelude::*;

#[kernel]
#[allow(improper_ctypes_definitions, clippy::missing_safety_doc)]
pub unsafe fn add(a: &[f32], b: &[f32], c: *mut f32) {
    let idx = thread::index_1d() as usize;
    if idx < a.len() {
        let elem = &mut *c.add(idx);
        *elem = a[idx] + b[idx];

        if idx == 0 {
            cuda_std::println!("Elem 0: {}", *elem);
        }
    }
}

The resulting ptx will be invalid with

ptxas examples/cuda/resources/add.ptx, line 1370; error   : State space mismatch between instruction and address in instruction 'ld'
ptxas examples/cuda/resources/add.ptx, line 1399; error   : State space mismatch between instruction and address in instruction 'ld'

A offending ptx section looks like:

Top section, truncated:

.const .align 8 .u8 _ZN4core3fmt12USIZE_MARKER17h8e203fb7dfec90c9E[8] = {0XFF(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE), 0xFF00(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE), 0xFF0000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE), 0xFF000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE), 0xFF00000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE), 0xFF0000000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE), 0xFF000000000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE), 0xFF00000000000000(_ZN4core3ops8function6FnOnce9call_once17h95dfe8b893b0399cE)};

$L__BB6_5:
	mov.u64 	%rd112, 0;
	ld.v2.u32 	{%r5, %r6}, [%rd108+32];
	ld.u8 	%rs3, [%rd108+40];
	st.local.u8 	[%rd7+8], %rs3;
	st.local.v2.u32 	[%rd7], {%r5, %r6};
	ld.u64 	%rd109, [%rd108+24];
	ld.u16 	%rs4, [%rd108+16];
	and.b16  	%rs2, %rs4, 3;
	setp.eq.s16 	%p6, %rs2, 2;
	mov.u64 	%rd110, %rd112;
	@%p6 bra 	$L__BB6_10;

	setp.ne.s16 	%p7, %rs2, 1;
	@%p7 bra 	$L__BB6_9;

	shl.b64 	%rd63, %rd109, 4;
	add.s64 	%rd64, %rd115, %rd63;
	add.s64 	%rd18, %rd64, 8;
	ld.u64 	%rd65, [_ZN4core3fmt12USIZE_MARKER17h8e203fb7dfec90c9E];
	ld.u64 	%rd66, [%rd64+8];
	setp.ne.s64 	%p8, %rd66, %rd65;
	mov.u64 	%rd110, %rd112;
	@%p8 bra 	$L__BB6_10;

	ld.u64 	%rd68, [%rd18+-8];
	ld.u64 	%rd109, [%rd68];
	mov.u64 	%rd110, 1;
	bra.uni 	$L__BB6_10;

Those both references are to the core::fmt::USIZE_MARKER constant. Not quite sure what is going on there.

Even though this error is there, a way more complex example code (not using cuda_std::println) is functional. So codegen is not "entirely" broken, but apparently something is odd with such cases of global, static constants.

thedodd · 2023-03-30T01:00:14Z

@apriori hello there. Just wanted to check-in and see if you've been having any success on this branch. I have a few open PRs, some of which I am actively using, and I'm thinking about rebasing them onto this branch in order to gain the update rustc benefits.

Think it is reasonable to rebase onto this branch?

thedodd · 2023-03-31T03:24:27Z

I am getting an an illegal memory access was encountered error where on master I am not getting the same error. I'll see if I can pin down the issue. This appears to be related to use of shared memory.

I moved back to master because I have a few parallel reduction algorithms that make heavy use of shared memory, and I don't want to take the time right now to debug the code gen issue :).

thedodd · 2023-03-31T14:52:44Z

TBH, I really hope that @RDambrosio016 (hope all is well) comes back some day. Having to move over to a C++ wrapper pattern, building lots of shared libraries, multi-stage nvcc build pipelines and such ... not fun.

This framework on the other hand already has a lot of work put into it, and keeping it up-to-date and moving forward is a huge boon to the community. I'm still holding out hope that it will be revitalized soon :).

apriori · 2023-04-01T11:12:19Z

TBH, I really hope that @RDambrosio016 (hope all is well) comes back some day. Having to move over to a C++ wrapper pattern, building lots of shared libraries, multi-stage nvcc build pipelines and such ... not fun.

This framework on the other hand already has a lot of work put into it, and keeping it up-to-date and moving forward is a huge boon to the community. I'm still holding out hope that it will be revitalized soon :).

I would wish the same, but so far it seems @RDambrosio016 lost interest/has no time anymore.
This port should be more considered a hack. I have little to no knowledge in the field and was merely attempting to port over (similar as @RDambrosio016 did it) by taking rustc_codegen_llvm as a template.

For me non-trivial programs were working as long as cudastd::println was not used. I do not recall whether they used shared memory. I think I did ... gotta recheck.

Anyway, some more work should happen on this, or this framework will loose connection to rustc development entirely - nor will it gain acceptance.
Unfortunately, using internal rustc libraries means a continous maintenance effort.

RDambrosio016 · 2023-04-04T01:12:58Z

Sorry, i've just been really busy with my degree and other things. I think being tied to a different codegen, and especially to libnvvm is not the way to go for the future. I think adding the required linking logic for nvptx in rustc is much easier and better. Im doing some experiments trying to do that.

thedodd · 2023-04-04T17:26:31Z

@RDambrosio016 nice! Hope all is going well with your studies.

apriori · 2023-04-08T10:46:17Z

@RDambrosio016 so you want to prefer using the already existing nvptx codegen backend of rustc?
I remember you mentioning it has inferior optimizations compared to libnvvm. Then the long-term approach would be to improve that upstream, right?

thedodd · 2023-04-13T15:33:21Z

BTW, something I've done to help mitigate the issue with having to use the older compiler version:

Obviously the PTX related code will stay in its own crate.
Only that crate uses the older toolchain.
Build the PTX via an example within that crate, instead of doing so as part of a build.rs within the larger project.

There are a few ways to optimize this. Doesn't need to be an example, there are other ways. Keeping it out of the build.rs of the larger project is a way to help ensure that the rust toolchain limitation doesn't spread.

dssgabriel · 2023-05-03T08:02:04Z

@apriori Hello there!

I would like to port Rust-CUDA to the latest libNVVM (version 2.0) that came out with CUDA 12.0 (see #100). Is this draft up to date with current nightly (1.71.0 as of writing)? I think it would be better to base ourselves on a more recent version of rustc if we aim at bringing the whole crate up to date.

Despite what @RDambrosio016 said a few weeks ago about abandoning the NVVM codegen and moving to what's already implemented in rustc, I think we are much better off with rustc-codegen-nvvm at the moment. I haven't been able to generate valid PTX using the nvptx64-cuda target implemented in rustc, even on very simple AXPY-like kernels. There doesn't seem to be any efforts for better support by the compiler either, despite nvptx being a Tier 2 target. Moreover, the better optimizations opportunities and the fact that NVIDIA will continue supporting libNVVM in the future make it much more appealing to stay on this codegen IMHO.
I think this is a great project and it would be a shame to throw all that hard work away.

I also heard that NVIDIA might be in the process of updating their tools to a much more recent LLVM version as even for them it's too difficult to rely on something as old as v7.0.1. This would probably simplify some of the logic implemented in rustc-codegen-nvvm but we shall see.
Finally, it seems that some of the guys working on the NVHPC toolkit at NVIDIA are also Rust enjoyers and they'd be willing to push things for Rust if NVIDIA gets enough demand for it. I would very much like to see Rust carve a bigger spot in the field of HPC and GPGPU computing and this project feels like the best place to do so!

RDambrosio016 · 2023-05-03T20:13:31Z

@dssgabriel interesting, what do you mean by invalid PTX? i was not able to build anything since it requires a custom linker (my proposal in rustc would put the linking logic in rustc) that doesnt work on windows. The LLVM PTX backend is mature enough that i would expect it to generate valid code unless rustc is emitting something very suspicious.

apriori · 2023-05-03T22:44:28Z

@dssgabriel

I would like to port Rust-CUDA to the latest libNVVM (version 2.0) that came out with CUDA 12.0 (see #100). Is this draft up to date with current nightly (1.71.0 as of writing)? I think it would be better to base ourselves on a more recent version of rustc if we aim at bringing the whole crate up to date.

Unfortunately no. rustc is a rapidly moving target. I once just checked a just slightly more recent nightly after 2022/12/10 and compilation failed. There is two approaches for this I would consider "valid":

a) Fix the issues in this MR and continue from there
b) Start over from current HEAD

One can and should use rustc_codegen_llvm as a template. But here and there more detailed knowledge about cuda PTX is required - some solutions I merely guessed and I bet I was wrong with that.

As far as I know though, libNVVM 2.0 is very different from prior versions. I think @RDambrosio016 can comment more on the feasability of this. I would also prefer to have these efforts more "upstream", but we are kind of lost if upstream rustc is not moving and/or improving with the PTX backend.

Despite what @RDambrosio016 said a few weeks ago about abandoning the NVVM codegen and moving to what's already implemented in rustc, I think we are much better off with rustc-codegen-nvvm at the moment. I haven't been able to generate valid PTX using the nvptx64-cuda target implemented in rustc, even on very simple AXPY-like kernels.

I cannot comment on this other than that I never really tried the official rustc ptx backend. Rust-cuda was simply the way more compelling and accessible solution. This is also due to @RDambrosio016 good documentation and immediately runnable examples, let alone all his hard work on building pretty much an ecosystem of libraries.

I think this is a great project and it would be a shame to throw all that hard work away.
It is amazing work by @RDambrosio016 indeed. Still, I would imagine you could rebase this ecosystem of libs, bindings and APIs on a different codegen.

I also heard that NVIDIA might be in the process of updating their tools to a much more recent LLVM version as even for them it's too difficult to rely on something as old as v7.0.1. This would probably simplify some of the logic implemented in rustc-codegen-nvvm but we shall see.

As the interfacing would still be via libNVVM I doubt that has any impact on general accessibility. Maybe developer install experience might improve a bit when not depending on ancient llvm versions, but that is pretty much about it.

Finally, it seems that some of the guys working on the NVHPC toolkit at NVIDIA are also Rust enjoyers and they'd be willing to push things for Rust if NVIDIA gets enough demand for it. I would very much like to see Rust carve a bigger spot in the field of HPC and GPGPU computing and this project feels like the best place to do so!

So far my experience also with rust-cuda was that single-source is a thing I would really love to see, but its hard with the rust compilation model I would imagine, especially with cfg_if. NVIDIA pools a lot of ressources into the CUDA ecosystem and they do an amazing job. I am not sure what their take would be on "rewriting quite an amount of it". See for example https://github.com/NVIDIA/cub, which is absolutely crucial if you do not want to reinvent the square wheel all the time when writing high performance custom kernels.
Still, even without it rust-cuda was for me a much better experience than plain Cuda and C++. Cuda is the best-in-class ecosystem when it comes to GPGPU, but still it feels decades behind general software tooling and language development. The match with C++ resulted in the worst possible compilation and tooling experience possible (compared to tooling in other languages) and there is absolutely nothing technological dictating this combination.
Also, when you look at e.g. the shuffle instructions, you see C-like imperative languages feel like a fundamental misfit to describe the underlying computation model.

David-OConnor · 2023-10-28T20:24:44Z

Is there anything I can do to help? Is it just an issue of putting some line of code in the right place? Can we write a new bindgen wrapper to get low-level accesss?

LegNeato · 2025-01-27T15:41:41Z

Hello! We are rebooting this project. Sorry for your PR not getting merged!

Is this still relevant?

dssgabriel · 2025-01-27T17:31:19Z

Hello! We are rebooting this project. Sorry for your PR not getting merged!

Is this still relevant?

Hello @LegNeato, thank you for the update!
Not sure if this particular PR is relevant anymore, lots of stuff has changed in rustc since then and it would probably easier to restart from scratch if you want to bring this project up-to-date.
I have a patch for Rust-CUDA to work with CUDA 12 (breaking changes w/ CUDA 11.8) but this project being dead for some time now, I never bothered opening a PR. You can have a look at it in case it helps here.

LegNeato · 2025-01-27T18:07:25Z

@dssgabriel Thanks for the info and the link to your code!

I was perusing these changes and a lot of changes are the same we had to do to forward port rust-gpu, so they don't look too scary. We'll probably hit a brick wall at the same place rust-gpu did (significant breaking changes to Rust's internal allocation model and
#[repr(simd)] were introduced)

LegNeato · 2025-01-27T18:08:04Z

@apriori are you still interested in working on this? I know it has been a long time :-)

apriori · 2025-01-29T11:01:09Z

@dssgabriel So you did even update libNVVM?
@LegNeato I guess starting over would be a more sane approach. Unfortunately I might not have the time for it.

dssgabriel · 2025-01-30T23:13:36Z

@apriori Yes, I did! It worked fine (with CUDA 12.1) for the time I spent playing with it and with the kernels I wrote. I haven't touched it since September 2023, though, so it might have broken with newer versions of CUDA 🤷‍♂️

LegNeato · 2025-01-31T00:49:14Z

@apriori ok, I can poke at it in the next couple of days if you don't get to it 🤙

juntyr · 2025-02-08T18:01:53Z

I would suggest that we first split this repo in two (or more). All crates that aren't part of the codegen or cuda-std (i.e. cust and co) and that could be compiled and tested on a recent nightly (or stable) should be moved. That way, work on these crates can happen faster without the work on the codegen blocking it

LegNeato · 2025-03-13T18:37:14Z

A version that used this as a reference landed here:

#155

Thanks @apriori ! I'm going to close this.

Superficial attempt to get rustc_codegen_nvvm working under current r…

9d821c6

…ust nightly - also port from register_attr to register_tool (approach shamelessly taken from rust-gpu)

apriori mentioned this pull request Dec 19, 2022

Requesting help for updating Rust-CUDA to newer rustc version. #95

Closed

apriori mentioned this pull request Feb 22, 2023

Transitive dependency not compileable with nightly-2021-12-04 #103

Closed

smallstepman mentioned this pull request Apr 23, 2023

Is this project abandoned? #106

Closed

zachs18 mentioned this pull request Dec 11, 2023

Forward std feature to some deps. dimforge/nalgebra#1321

Merged

This comment was marked as off-topic.

Sign in to view

LegNeato mentioned this pull request Jan 31, 2025

Rust-CUDA is being rebooted! #130

Open

jorge-ortega mentioned this pull request Mar 12, 2025

Update toolchain to nightly-03-02. #155

Merged

LegNeato closed this Mar 13, 2025

Conversation

apriori commented Dec 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

apriori commented Dec 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thedodd commented Mar 30, 2023

Uh oh!

thedodd commented Mar 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thedodd commented Mar 31, 2023

Uh oh!

apriori commented Apr 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RDambrosio016 commented Apr 4, 2023

Uh oh!

thedodd commented Apr 4, 2023

Uh oh!

apriori commented Apr 8, 2023

Uh oh!

thedodd commented Apr 13, 2023

Uh oh!

dssgabriel commented May 3, 2023

Uh oh!

RDambrosio016 commented May 3, 2023

Uh oh!

apriori commented May 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

David-OConnor commented Oct 28, 2023

Uh oh!

LegNeato commented Jan 27, 2025

Uh oh!

This comment was marked as off-topic.

dssgabriel commented Jan 27, 2025

Uh oh!

LegNeato commented Jan 27, 2025

Uh oh!

LegNeato commented Jan 27, 2025

Uh oh!

apriori commented Jan 29, 2025

Uh oh!

dssgabriel commented Jan 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LegNeato commented Jan 31, 2025

Uh oh!

juntyr commented Feb 8, 2025

Uh oh!

LegNeato commented Mar 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

apriori commented Dec 19, 2022 •

edited

Loading

apriori commented Dec 20, 2022 •

edited

Loading

thedodd commented Mar 31, 2023 •

edited

Loading

apriori commented Apr 1, 2023 •

edited

Loading

apriori commented May 3, 2023 •

edited

Loading

dssgabriel commented Jan 30, 2025 •

edited

Loading