@folkertdev
Contributor

@folkertdev folkertdev commented Jan 17, 2026

tracking issue: rust-lang/rust#135681

Because qemu does not support these instructions (yet), I haven't added any runtime tests.

@rustbot
Collaborator

rustbot commented Jan 17, 2026

r? @sayantn

rustbot has assigned @sayantn.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Contributor Author

@folkertdev folkertdev left a comment

// On processors implementing the IBM z16 architecture, only the value 0 is supported.
static_assert_uimm_bits!(B, 0);

vclfnls(a, B)
Contributor Author

Is this equivalent to

https://godbolt.org/z/dGaf4P7sa

Clearly that optimizes horribly at the moment. If the const value being 0 does the obvious thing, I believe all of these could be implemented in terms of simpler simd primitives.

Contributor

If only 0 is currently supported, we can just use SIMD primitives, since the assertion ensures no other value is passed.

Contributor Author

We could, though currently it seems unspecified what the conversion method is (I think there is only one implementation that actually makes sense, but then why is the IMM argument even there?).

Also, the SIMD primitives currently don't optimize into the instruction that this intrinsic should map to.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AI accelerator unit operates on its own private data types. In particular, it uses a 16-bit floating point type which is neither IEEE-16 nor bfloat16, but a proprietary format. In order to prepare input/output data to be used with the accelerator, applications need to convert standard (IEEE) data types to and from this private data type; for this purpose, the ISA provides these conversion instructions (mapped to compiler intrinsics).

In principle, the accelerator might support multiple different private data types, and the immediate operand of these intrinsics identifies which of those types the conversion should target. This is not specified by the ISA but may differ between processor generations. However, all current processors only support a single private data type, identified by the immediate value 0.

So in practice, the immediate will always be 0 today. I'm not convinced this ought to be enforced by the compiler - if a future processor adds a second type, it might be good if we could use the intrinsic without having to update the compiler.

Either way, whatever the immediate value is, there is no possibility to open-code the conversion with standard LLVM IR - the private floating-point format is unknown to LLVM! This absolutely has to map to the LLVM builtin (and thus the special instruction).

Contributor Author

Good to know. I've changed the code to accept the full 0..=15 range there.
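
A minimal sketch of what accepting the full 0..=15 range could look like, assuming the immediate is a 4-bit field that is checked at compile time. `Converted`, `convert_stub`, and `convert_with_format` are hypothetical names used only for illustration, not the stdarch code; the stub merely records the immediate where the real intrinsic would call the LLVM builtin. The inline `const` block needs Rust 1.79 or later.

```rust
/// Hypothetical result type: the untouched lanes plus the format selector
/// that would have been encoded in the instruction's immediate field.
#[derive(Debug, PartialEq)]
pub struct Converted {
    pub lanes: [i16; 8],
    pub format: u32,
}

/// Hypothetical stand-in for the LLVM builtin. It performs no conversion,
/// because the accelerator's private format cannot be expressed in plain Rust.
fn convert_stub(lanes: [i16; 8], format: u32) -> Converted {
    Converted { lanes, format }
}

/// Accepts any immediate in 0..=15 and rejects everything else at compile time.
pub fn convert_with_format<const B: u32>(lanes: [i16; 8]) -> Converted {
    // Evaluated when the function is instantiated, so
    // `convert_with_format::<16>` fails to build instead of failing at run time.
    const { assert!(B < 16, "format immediate must be in 0..=15") };
    convert_stub(lanes, B)
}
```

With this shape, a future processor that defines a second format could be targeted with `convert_with_format::<1>(..)` without changing the bounds check.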

@folkertdev
Contributor Author

@uweigand does this look good now?

@uweigand

> @uweigand does this look good now?

Looks good to me now. There is one question remaining in my mind: for vec_convert_to_fp16 and vec_convert_from_fp16, only one side is the proprietary type (represented as vector_signed_short), while the other side is actually a vector of standard IEEE 16-bit floats. This is also represented as vector_signed_short here, which follows the precedent set by GCC and clang.

That precedent was created at a time when we did not have any _Float16 support in those compilers, but now we do. So in theory we could be more precise and use a proper type here. But I guess this would mean that we'd have to introduce a new vector type as well (vector_float16? vector_half?). Given that current processors do not actually have any other instructions operating on that type, not even basic arithmetic, I'm not sure this makes sense.
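
To illustrate the representation question, here is a small sketch of how a single `i16` lane can be read as an IEEE 754 binary16 value. `half_bits_to_f32` and `lane_to_f32` are hypothetical helpers, not part of stdarch, and the sketch covers only the standard IEEE side; the accelerator's private format cannot be decoded this way.

```rust
/// Decode an IEEE 754 binary16 bit pattern (1 sign, 5 exponent, 10 fraction bits).
fn half_bits_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let frac = (bits & 0x03ff) as f32;
    match exp {
        // Zero and subnormals: frac * 2^-10 * 2^-14.
        0 => sign * frac * 2f32.powi(-24),
        // All-ones exponent: infinity when the fraction is zero, NaN otherwise.
        0x1f => {
            if frac == 0.0 {
                sign * f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal numbers: implicit leading 1, exponent bias 15.
        _ => sign * (1.0 + frac / 1024.0) * 2f32.powi(exp - 15),
    }
}

/// Reinterpret one signed 16-bit lane as binary16 bits and decode it.
fn lane_to_f32(lane: i16) -> f32 {
    half_bits_to_f32(lane as u16)
}
```

The `i16` here is only a container for the bits, which is why a dedicated half-precision vector type would describe the IEEE side more precisely.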

@folkertdev
Contributor Author

I suspect these functions will continue to be unstable for a while, so we could change this later if vector_half actually gets more serious support.

@uweigand

> I suspect these functions will continue to be unstable for a while, so we could change this later if vector_half actually gets more serious support.

Sounds good to me, thanks.
