Conversation
|
When you have something that builds, please let me and @WorksOnArm know - would love to provide test cycles and diverse hardware to check out performance on. |
|
Thank you so much :D that's awesome! |
|
Nooo! It seems like we're going to be blocked on missing intrinsics :( https://doc.rust-lang.org/core/arch/aarch64/index.html / rust-lang/stdarch#148 |
|
@Licenser Do you have an inventory yet of intrinsics that you need / intrinsics that are missing? Reading the linked issue, sounds like there's slow progress. |
|
Ah yes I made a list and then posted it to the wrong ticket ... silly me ... Those are the intrinsics I found in @lemire's arm64 implementation
(there is a full list of missing instructions on the rust ticket - those are the required ones for porting simdjson.rs) |
|
@Licenser alas! Would you be open to the possibility of a PR that uses assembly macros in the meantime? Maybe it won't be that far off from the intrinsic version... |
|
Absolutely, I also gave you contributor permission so no or required ;) I might take a look on the weekend to see what is required to get the intrinsics at least into nightly |
|
@Licenser that's awesome - I'll take a pass at defining some intrinsics in 'src/neon/intrinsics.rs', and we can compare notes as you work with nightly! |
|
I started working on a pull request: rust-lang/stdarch#792 |
|
We just published simdjson 0.2.0 with NEON support... |
|
Huzzah! |
* feat: neon support
* feat: temp stub replacements for neon intrinsics (pending rust-lang/stdarch#792)
* fix: drone CI rustup nightly
* feat: fix guards, use rust stdlib for bit count operations
* fix: remove double semicolon
* feat: fancy generic generator functions, thanks @Licenser
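For readers wondering what "use rust stdlib for bit count operations" can look like in practice, here is a minimal sketch. It assumes the commit refers to integer methods such as `count_ones` and `trailing_zeros`; the helper name `set_bit_positions` is hypothetical and not taken from the crate.

```rust
/// Hypothetical helper: list the positions of all set bits in a 64-bit mask,
/// lowest first, using only standard-library integer methods.
pub fn set_bit_positions(mut mask: u64) -> Vec<u32> {
    // `count_ones` is the stdlib population count; use it to size the output.
    let mut out = Vec::with_capacity(mask.count_ones() as usize);
    while mask != 0 {
        // `trailing_zeros` gives the index of the lowest set bit.
        out.push(mask.trailing_zeros());
        // Clear the lowest set bit; `mask` is non-zero here, so no underflow.
        mask &= mask - 1;
    }
    out
}
```

Both methods are part of every Rust integer type, so no architecture-specific intrinsic is needed for this step.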
|
OMG OMG OMG! this is great! :D |
|
@Licenser are you thinking we might be able to merge this today and then have a subsequent PR to delete the intrinsics once everything's available in nightly? Thank you again for all your help. PS the new UTF8 tests look great! |
|
I'd rather not. I could see that resulting in some headaches downstream if the intrinsics make it in, and that'd be very, very, very hacky for a crate. That said, brave people can already use it as a git dependency by pointing to the git branch. |
|
Ah, that makes sense. What is taking you so long?!! ;) |
|
Maybe I found something? Let me know what you think... https://godbolt.org/z/36hnUE

    #[cfg(target_arch = "aarch64")]
    #[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
    #[rustc_args_required_const(1)]
    pub unsafe fn vget_lane_u8(a: uint8x8_t, n: u32) -> u8 {
        // `n` is unsigned, so only the upper bound needs checking.
        if n > 7 {
            unreachable_unchecked();
        }
        match n {
            0 => a.0,
            1 => a.1,
            2 => a.2,
            3 => a.3,
            4 => a.4,
            5 => a.5,
            6 => a.6,
            7 => a.7,
            _ => unreachable_unchecked(),
        }
    }

(Also clang: https://clang.godbolt.org/z/TpqJIp) |
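For comparison, a minimal sketch of the same lane-read idea on stable Rust, where the lane index is a const generic and an out-of-range lane fails at compile time instead of reaching `unreachable_unchecked`. The `U8x8` type, the `LaneCheck` helper, and `get_lane_u8` are hypothetical stand-ins for illustration, not the stdarch API.

```rust
/// Hypothetical portable stand-in for NEON's `uint8x8_t`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U8x8(pub [u8; 8]);

/// Hypothetical helper that carries a compile-time bound check for `LANE`.
struct LaneCheck<const LANE: usize>;

impl<const LANE: usize> LaneCheck<LANE> {
    /// Evaluated when a concrete `LANE` is used; fails the build if out of range.
    const IN_RANGE: () = assert!(LANE < 8, "lane index out of range");
}

/// Read one lane. The index is part of the type, so it is always a constant.
pub fn get_lane_u8<const LANE: usize>(v: U8x8) -> u8 {
    // Force evaluation of the compile-time check for this `LANE`.
    let () = LaneCheck::<LANE>::IN_RANGE;
    v.0[LANE]
}
```

A call such as `get_lane_u8::<3>(v)` compiles, while `get_lane_u8::<8>(v)` is rejected when the function is instantiated.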
|
@Licenser I think your vld1q is all set, since the intrinsic turns into ldr anyway? |
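One way to read the observation above: a 128-bit load of contiguous bytes is, semantically, just an unaligned 16-byte read, which would explain why the compiler can lower it to a plain load instruction. The sketch below illustrates that idea on a portable stand-in type; `U8x16`, `load_u8x16`, and `load_prefix` are hypothetical names, not the stdarch definitions.

```rust
use std::ptr;

/// Hypothetical portable stand-in for NEON's `uint8x16_t`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U8x16(pub [u8; 16]);

/// Stand-in for a `vld1q_u8`-style load: read 16 bytes with no alignment requirement.
///
/// # Safety
/// `src` must be valid for reading 16 bytes.
pub unsafe fn load_u8x16(src: *const u8) -> U8x16 {
    // A plain unaligned read; there is nothing NEON-specific about it.
    U8x16(unsafe { ptr::read_unaligned(src.cast::<[u8; 16]>()) })
}

/// Safe wrapper: load the first 16 bytes of a slice, or `None` if it is too short.
pub fn load_prefix(bytes: &[u8]) -> Option<U8x16> {
    if bytes.len() < 16 {
        return None;
    }
    // SAFETY: the length check above guarantees 16 readable bytes.
    Some(unsafe { load_u8x16(bytes.as_ptr()) })
}
```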
|
Oh that's a very good catch! then the ld1 commands are indeed done :D for |
|
This leaves only those two functions:

    // uint64_t vget_lane_u64 (uint64x1_t v, const int lane)
    arm_vget_lane!(vget_lane_u64, uint64x1_t, u64, 0);

    #[simd_test(enable = "neon")]
    unsafe fn test_vget_lane_u64() {
        let v = i64x1::new(1);
        let lane = 0;
        let r = vget_lane_u64(transmute(v), lane);
        assert_eq!(r, 1);
    }

    // uint32_t vgetq_lane_u32 (uint32x4_t v, const int lane)
    arm_vget_lane!(vgetq_lane_u32, uint32x4_t, u32, 3);

    #[simd_test(enable = "neon")]
    unsafe fn test_vgetq_lane_u32() {
        let v = i32x4::new(1, 2, 3, 4);
        let lane = 1;
        let r = vgetq_lane_u32(transmute(v), lane);
        assert_eq!(r, 2);
    }

|
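To illustrate how a macro can stamp out one lane accessor per vector type, here is a minimal, portable sketch. It is not the `arm_vget_lane!` macro from stdarch: `def_get_lane!`, `U64x1`, and `U32x4` are hypothetical names, and this version checks the lane at run time rather than requiring a constant argument.

```rust
/// Hypothetical portable stand-ins for `uint64x1_t` and `uint32x4_t`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U64x1(pub [u64; 1]);
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U32x4(pub [u32; 4]);

/// Hypothetical generator: function name, vector type, element type, highest valid lane.
macro_rules! def_get_lane {
    ($name:ident, $vec:ty, $elem:ty, $max_lane:expr) => {
        pub fn $name(v: $vec, lane: usize) -> $elem {
            // Run-time bound check; panics on an invalid lane.
            assert!(lane <= $max_lane, "lane index out of range");
            v.0[lane]
        }
    };
}

def_get_lane!(get_lane_u64, U64x1, u64, 0);
def_get_lane!(getq_lane_u32, U32x4, u32, 3);
```

The two invocations mirror the argument shape of the calls quoted above (name, vector type, element type, maximum lane), so adding another accessor is a one-line change.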
|
@Licenser that's awesome work... very nice! I think some of the "ldr" confusion is because the operands are Does this look good? Let me know what you think! All the best, -Sunny |
* Use simd-lite
* Update badge
* Update badge
* Get rid of transmutes
* Use NeonInit trait
* vqsubq_u8 fix
* vqsubq_u8 fix pt. 2
* use reexprted values from simd-lite


