Implement avx512 compressstore intrinsics#1273
Conversation
crates/core_arch/src/x86/avx512f.rs
Outdated
| fn vcompresspd128(a: f64x2, src: f64x2, mask: u8) -> f64x2; | ||
|
|
||
| #[link_name = "llvm.x86.avx512.mask.compress.store.d.512"] | ||
| fn vcompressd_mem(mem: *mut i8, data: i32x16, mask: u16); |
There was a problem hiding this comment.
Is there a better naming convention for these intrinsics? The asm mnemonic is the same for mem and reg operations.
There was a problem hiding this comment.
The recommended way to figure this out is to look at what IR clang generates: https://godbolt.org/z/nvaxM4MGh
In this case it is calling the llvm.masked.compressstore.v2f64 intrinsic which unfortunately can't be called directly from Rust because it uses a i1 vector which can't be represented with Rust types.
This is the reason why #1254 implemented some of the AVX512 intrinsics using inline assembly instead. I think this is the right approach in this case as well.
There was a problem hiding this comment.
These llvm intrinsic seem to work though, and I saw them used with plain integer masks in llvm testcases: https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll#L8938 (not sure if there is an official documentation for them though). I wanted to avoid asm unless absolutely necessary.
Test failures in CI seem unrelated to the changes in this PR.
crates/core_arch/src/x86/avx512f.rs
Outdated
| #[inline] | ||
| #[target_feature(enable = "avx512f")] | ||
| #[cfg_attr(test, assert_instr(vpcompressd))] | ||
| pub unsafe fn _mm512_mask_compressstoreu_epi32(base_addr: *mut i8, k: __mmask16, a: __m512i) { |
There was a problem hiding this comment.
Intel's intrinsic guide uses a void* for base_addr, the llvm intrinsics use an i8*. Using a ptr of the correct datatype would be more ergonomic, but I'm not sure whether that might prevent using the intrinsics for actually unaligned data.
There was a problem hiding this comment.
The convention here is to use *mut u8 where C uses void pointers. LLVM's i8 doesn't mean anything since LLVM IR types don't have signs: LLVM's i8 is used for both of Rust's u8 and i8.
There was a problem hiding this comment.
I missed that convention when implementing the masked load/store instructions, there are also several more intrinsics that did not already follow this convention. I can take a look at adjusting the stdarch-verify test to catch this and change any existing type differences. Since avx512 is still unstable that should be possible I guess.
Implement avx512f compressstore intrinsics.