I am not sure how to prepare the inputs for an instruction such as __builtin_amdgcn_mfma_i32_16x16x16i8.
The operands a and b are expected to be int32. I packed four int8 values into one int32 as below:
for (int i = 0; i < 4; ++i) {
    const int r_idx = thread_x * K + i + thread_y * 4;
    a |= (int32_t(src[r_idx]) << 8 * (3 - i));
}
In the code above, src is an array of int8 values and a is an int32.
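One thing that may be worth ruling out in the packing loop above is sign extension: converting a negative int8 directly to int32 sets all the upper bits, so OR-ing it into a can overwrite bytes that were already packed. Below is a minimal host-side sketch that casts through uint8_t first. The helper names pack4_i8 and unpack_i8 are hypothetical, and placing element 0 in the lowest byte is an assumption, not something confirmed for this instruction.

```cpp
#include <cassert>
#include <cstdint>

// Pack four consecutive int8 values into one 32-bit word.
// Assumption: element 0 goes into the lowest byte; the byte order the
// instruction actually expects is not confirmed here.
inline int32_t pack4_i8(const int8_t* p) {
    uint32_t word = 0;
    for (int i = 0; i < 4; ++i) {
        // Cast through uint8_t so negative values are not sign-extended
        // into the bytes that belong to the other three elements.
        const uint32_t byte = static_cast<uint8_t>(p[i]);
        word |= byte << (8 * i);
    }
    return static_cast<int32_t>(word);
}

// Extract element idx (0..3) from a packed word, for checking the packing.
inline int8_t unpack_i8(int32_t word, int idx) {
    assert(idx >= 0 && idx < 4);
    const uint32_t bits = static_cast<uint32_t>(word) >> (8 * idx);
    return static_cast<int8_t>(bits & 0xFFu);
}
```

Round-tripping a few values that include negatives through pack4_i8 and unpack_i8 on the host is a quick way to confirm that the packing itself is not the source of the mismatch.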
Calling the instruction with these inputs does not seem to produce the expected results.
Could anyone please advise how to prepare the input data a and b for the above instruction?
Or, ideally, please add a test with int8 inputs and int32 output.
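For comparing against the instruction's output, a plain host-side reference may help. The sketch below multiplies int8 matrices with int32 accumulation. The function name ref_gemm_i8 and the row-major layout for all three matrices are assumptions made for illustration, not part of any existing test.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference D = A * B with int8 inputs and int32 accumulation.
// Assumed layout: A is M x K, B is K x N, D is M x N, all row-major.
inline std::vector<int32_t> ref_gemm_i8(const std::vector<int8_t>& A,
                                        const std::vector<int8_t>& B,
                                        int M, int N, int K) {
    std::vector<int32_t> D(static_cast<std::size_t>(M) * N, 0);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            int32_t acc = 0;
            for (int k = 0; k < K; ++k) {
                // Widen to int32 before multiplying so products do not wrap.
                acc += static_cast<int32_t>(A[m * K + k]) *
                       static_cast<int32_t>(B[k * N + n]);
            }
            D[static_cast<std::size_t>(m) * N + n] = acc;
        }
    }
    return D;
}
```

For the 16x16x16 case in question, this would be called with M = N = K = 16 and the result compared element by element with what the kernel writes back.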
Apologies if this is not the right place to ask. I was not allowed to post this in the Discussions section.
Thanks,