libm: constant time fpclassifyf #1784
|
addendum: isfinite(x) can just be INFINITY > fabs(x), and
bool isfinite(float x) {
    union {float f; uint32_t i;} u = {x};
    u.i &= 0x7fffffff;
    return INFINITY > u.f;
}
should also dis-require the use of |
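A minimal sketch of the sign-masking idea above, assuming a 32-bit IEEE 754 `float`. The name `isfinite_f32` is hypothetical (picked so it does not clash with the `isfinite` macro from `<math.h>`), and `memcpy` is swapped in for the union purely as a portable way to reinterpret the bits.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Hypothetical name; avoids clashing with the isfinite macro. */
bool isfinite_f32(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits); /* reinterpret the float as raw bits */
    bits &= 0x7fffffffu;            /* clear the sign bit, i.e. take fabsf(x) */
    memcpy(&x, &bits, sizeof x);
    /* False for +inf (not strictly less) and for NaN (unordered compare). */
    return INFINITY > x;
}
```

Both infinities and NaN come out as not finite here because the comparison is strict and any comparison against NaN is false. This relies on IEEE comparison semantics, so it would presumably not survive something like `-ffast-math`.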
|
I'm currently thinking about how to integrate this; a couple of preliminary thoughts:
|
|
the algorithm:
have c be composed of bits "mii" (1 m bit, 2 i bits)
if exponent = 0x00 -> ii = 1
insert the m bit by simply checking that mantissa != 0
shift [by nibbles] a magic constant where each nibble corresponds to log2() of the corresponding FP classify define macro
the issue is that building this with log2() automagically with macros is quite painful; the easiest I could see is
#define X(x) ((x)==16?4:(x)==8?3:(x)==4?2:(x)==2?1:(x)==1?0:0)
then just construct by
return 1 << ((0x22312240 >> (c * 4)) & 0xf);
#define X(x, s) ((((x)==16?4:(x)==8?3:(x)==4?2:(x)==2?1:(x)==1?0:0)) << (s*4))
return 1 << (((X(1,0) | X(16,1) | X(4,2) | X(4,3) | X(2,4) | X(8, 5) | X(4,6) | X(4,7)) >> (c * 4)) & 0xf);
#undef X
I considered this for double and long double, but I worried about whether ia64 80-bit long double was to be accounted for |
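For posterity, the steps above can be put together as a commented C sketch. Several things here are assumptions rather than anything confirmed in this thread: the `SK_FP_*` names are hypothetical, the class values 1, 2, 4, 8, 16 and their assignment to classes are inferred from the `X(...)` arguments, and the exponent remapping (0xff -> 0, 0x00 -> 1, everything else clamped to 2) follows the generic version posted further down.

```c
#include <assert.h>
#include <math.h>   /* INFINITY and NAN, handy when trying the function out */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit IEEE 754 float");

/* Assumed one-bit-per-class values, inferred from the X(...) arguments. */
enum {
    SK_FP_INFINITE  = 1,
    SK_FP_NAN       = 2,
    SK_FP_NORMAL    = 4,
    SK_FP_SUBNORMAL = 8,
    SK_FP_ZERO      = 16
};

int sketch_fpclassifyf(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);

    uint32_t exp = (u >> 23) & 0xffu;    /* 8-bit biased exponent */
    uint32_t ii = (exp + 1) & 0xffu;     /* 0xff -> 0, 0x00 -> 1, otherwise >= 2 */
    ii = ii > 2 ? 2 : ii;                /* every normal exponent collapses to 2 */
    uint32_t m = (u & 0x007fffffu) != 0; /* 1 if any mantissa bit is set */
    uint32_t c = (m << 2) | ii;          /* 3-bit index "mii" */

    /* Nibble c of the constant holds log2 of the class value.
     * Lowest nibble first: INF, ZERO, NORMAL, -, NAN, SUBNORMAL, NORMAL, -
     * (nibbles 3 and 7 are unreachable because ii never exceeds 2). */
    return 1 << ((0x22312240u >> (c * 4)) & 0xfu);
}
```

Under these assumptions `sketch_fpclassifyf(INFINITY)` yields 1 and `sketch_fpclassifyf(NAN)` yields 2; the sign bit never enters the index, so `-0.0f` classifies the same as `0.0f`.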
|
I get how it works, it's just that it would be best to explain it in-band with comments in the code for posterity. What I'm wondering is if this is worth the hassle to integrate - the improvement over the existing version is not that large (~20 % on my machine with appropriate |
|
I have typed out an untested generic version of this earlier today: https://gist.github.com/no92/42903883ff18fa8ec92b7c1d643192d4 |
|
there is an extra unneeded movzx due to type widening on that implementation
template <std::floating_point T>
constexpr int branchless_fpclassify(T x) {
    using traits = detail::fp_traits<T>;
    using U = typename traits::uint_type;
    U u = std::bit_cast<U>(x);
    uint8_t exp = (u >> traits::exp_shift) & traits::exp_mask;
    uint8_t c = (exp + 1) & traits::exp_mask;
    c = (c > 2) ? 2 : c;
    c |= ((u & traits::frac_mask) != 0) << 2;
    constexpr int32_t magic_table = detail::generate_fpclassify_table();
    return 1 << ((magic_table >> uint8_t(c * 4)) & 0xF);
}
would be a more faithful generic func
of course the question of "is this worth integrating for" is uh.... |
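To make the per-type parameters concrete, here is a rough C sketch of the same scheme for `double`, assuming a 64-bit IEEE 754 layout (52 fraction bits, 11 exponent bits). The function name is hypothetical and the class values 1, 2, 4, 8, 16 are the same assumption as for the `float` case. One detail worth keeping in mind: the 11-bit exponent field does not fit in a `uint8_t`, so this sketch keeps it in a 64-bit variable until it has been clamped.

```c
#include <assert.h>
#include <math.h>   /* INFINITY and NAN, handy when trying the function out */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "assumes a 64-bit IEEE 754 double");

/* Hypothetical name; returns the assumed values 1=inf, 2=nan, 4=normal,
 * 8=subnormal, 16=zero. */
int sketch_fpclassify_f64(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);

    uint64_t exp = (u >> 52) & 0x7ffu;             /* 11-bit biased exponent */
    uint64_t ii = (exp + 1) & 0x7ffu;              /* 0x7ff -> 0, 0 -> 1, otherwise >= 2 */
    ii = ii > 2 ? 2 : ii;                          /* clamp before narrowing */
    uint64_t m = (u & 0x000fffffffffffffull) != 0; /* any of the 52 fraction bits set */
    unsigned c = (unsigned)((m << 2) | ii);        /* 3-bit index "mii" */

    /* Same nibble table as the float case: it only depends on the class values. */
    return 1 << ((0x22312240u >> (c * 4)) & 0xfu);
}
```

A value such as `0x1p-767`, whose exponent field is 0x100, is a useful spot check here, since its low eight exponent bits are all zero.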
|
Of note: if the FP_CLASSIFY constants were just 0,1,2,3, the last part (which, mind you, takes a lot of instructions to pull off) would be far cheaper |
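One possible reading of this, sketched in C: with dense class values the nibble read from the table already is the result, so the trailing `1 <<` goes away. The numbering below (0 to 4, hypothetical `D_FP_*` names) is purely an assumption for illustration; it happens to equal log2 of the one-bit values 1, 2, 4, 8, 16 assumed elsewhere, so the same constant 0x22312240 can be reused. How much this saves in practice is not measured here.

```c
#include <assert.h>
#include <math.h>   /* INFINITY and NAN, handy when trying the function out */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit IEEE 754 float");

/* Hypothetical dense numbering, not the values any libc is known to use. */
enum {
    D_FP_INFINITE  = 0,
    D_FP_NAN       = 1,
    D_FP_NORMAL    = 2,
    D_FP_SUBNORMAL = 3,
    D_FP_ZERO      = 4
};

int sketch_fpclassifyf_dense(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);

    uint32_t ii = (((u >> 23) & 0xffu) + 1) & 0xffu; /* 0xff -> 0, 0x00 -> 1 */
    ii = ii > 2 ? 2 : ii;                            /* normal exponents -> 2 */
    uint32_t c = (((u & 0x007fffffu) != 0) << 2) | ii;

    /* The nibble is returned as-is; no final shift is needed. */
    return (int)((0x22312240u >> (c * 4)) & 0xfu);
}
```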
|
Mind you that |
Not tested with real world data.
tested over entire 2^32 range and it matches the original.
of course branchless != good
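For anyone wanting to repeat the exhaustive comparison, a harness along these lines should do. It is only a sketch: the host's standard `fpclassify` stands in for "the original" (which is not shown in this thread), the `SK_FP_*` values are assumed, and `count_mismatches` is a hypothetical helper whose `stride` parameter exists only so a quick partial run is possible.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit IEEE 754 float");

/* Assumed one-bit-per-class values. */
enum { SK_FP_INFINITE = 1, SK_FP_NAN = 2, SK_FP_NORMAL = 4,
       SK_FP_SUBNORMAL = 8, SK_FP_ZERO = 16 };

/* Branchless candidate, as described in the algorithm comment. */
int sketch_fpclassifyf(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    uint32_t ii = (((u >> 23) & 0xffu) + 1) & 0xffu;
    ii = ii > 2 ? 2 : ii;
    uint32_t c = (((u & 0x007fffffu) != 0) << 2) | ii;
    return 1 << ((0x22312240u >> (c * 4)) & 0xfu);
}

/* Reference: the standard macro, mapped onto the assumed values. */
int reference_fpclassifyf(float x) {
    switch (fpclassify(x)) {
    case FP_INFINITE:  return SK_FP_INFINITE;
    case FP_NAN:       return SK_FP_NAN;
    case FP_NORMAL:    return SK_FP_NORMAL;
    case FP_SUBNORMAL: return SK_FP_SUBNORMAL;
    default:           return SK_FP_ZERO;
    }
}

/* Counts disagreements over bit patterns 0, stride, 2*stride, ...
 * A stride of 1 visits all 2^32 patterns. */
uint64_t count_mismatches(uint32_t stride) {
    if (stride == 0) stride = 1; /* avoid an endless loop */
    uint64_t bad = 0;
    for (uint64_t bits = 0; bits <= UINT32_MAX; bits += stride) {
        uint32_t b = (uint32_t)bits;
        float x;
        memcpy(&x, &b, sizeof x);
        bad += sketch_fpclassifyf(x) != reference_fpclassifyf(x);
    }
    return bad;
}
```

Calling `count_mismatches(1)` walks the entire 2^32 range; a larger stride gives a fast smoke test. Note that this checks correctness only and says nothing about whether the branchless form is actually faster.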
gcc 16.1
clangarm