libm: constant time fpclassifyf by xinitrcn1 · Pull Request #1784 · managarm/mlibc

xinitrcn1 · 2026-05-13T07:57:34Z

Not tested with real world data.

tested over entire 2^32 range and it matches the original.

of course branchless != good

gcc 16.1

"__fpclassifyf_old":
        vmovd   edx, xmm0
        mov     eax, edx
        shr     eax, 23
        and     eax, 255
        je      .L8
        mov     ecx, 4
        cmp     eax, 255
        je      .L9
        mov     eax, ecx
        ret
.L9:
        sal     edx, 9
        mov     ecx, 1
        cmp     edx, 1
        sbb     ecx, -1
        mov     eax, ecx
        ret
.L8:
        add     edx, edx
        cmp     edx, 1
        sbb     ecx, ecx
        and     ecx, 8
        add     ecx, 8
        mov     eax, ecx
        ret

"__fpclassifyf_new":
        vmovd   ecx, xmm0
        mov     edx, 3
        mov     eax, ecx
        shr     eax, 23
        inc     eax
        cmp     al, dl
        cmovbe  edx, eax
        and     ecx, 8388607
        setne   al
        sal     eax, 2
        or      eax, edx
        mov     edx, 573645376
        movzx   eax, al
        sal     eax, 2
        sarx    edx, edx, eax
        mov     eax, 1
        and     edx, 15
        shlx    eax, eax, edx
        ret

clangarm

__fpclassifyf_old:
        fmov    w1, s0
        ubfx    x2, x1, 23, 8
        cbz     w2, .L8
        mov     w0, 4
        cmp     w2, 255
        beq     .L9
        ret
.L9:
        cmp     wzr, w1, lsl 9
        cset    w0, ne
        add     w0, w0, 1
        ret
.L8:
        cmp     wzr, w1, lsl 1
        mov     w0, 16
        mov     w1, 8
        csel    w0, w0, w1, eq
        ret

__fpclassifyf_new:
        fmov    w2, s0
        mov     w3, 1
        mov     w4, 3
        mov     w0, 8768
        movk    w0, 0x2231, lsl 16
        add     w1, w3, w2, lsr 23
        and     w1, w1, 255
        cmp     w1, 3
        csel    w1, w1, w4, ls
        tst     x2, 8388607
        cset    w2, ne
        orr     w1, w1, w2, lsl 2
        ubfiz   w1, w1, 2, 8
        asr     w0, w0, w1
        and     w0, w0, 15
        lsl     w0, w3, w0
        ret

xinitrcn1 · 2026-05-13T08:00:50Z

addendum:

isfinite(x)

can just be

INFINITY > fabs(x)

and

bool isfinite(float x) {
  union {float f; uint32_t i;} u = {x};
  u.i &= 0x7fffffff;
  return INFINITY > u.f;
}

should also dis-require the use of fpclassify() in the respective macro, unknown if x87 or m68k is cared about

no92 · 2026-05-13T11:13:50Z

I'm currently thinking about how to integrate this; a couple of preliminary thoughts:

this function, much like most others in our libm, are imported from musl - if we add our own implementation, it would be best to move it outside of the musl-generic-math subdirectory.
The algorithm is quite clever, but could probably use some explanation as to what is going on. A few comments would probably suffice.
This looks like it could be generalized to also cover the double and even long double case using a templated function, but that'd require selecting a trait based on the type + width (for long double). Something like template <std::floating_point T, int MantDigits = std::numeric_limits<T>::digits> (untested)
I benchmarked this on my machine using different -march settings, and the (small) performance benefit of this requires x86-64-v3. Maybe this should be selected using Function multiversioning using attributes like __attribute__((target("avx,bmi2")))?
The lookup table should preferably be generated using a consteval function instead of hardcoding it - not only is that less brittle, but also better documentation for how it works :^)

xinitrcn1 · 2026-05-13T16:33:30Z

the algorithm:

have c be composed of bits "mii" (1 m bit, 2 i bits)

if exponent = 0x00 -> ii = 1
if exponent = 0xff -> ii = 0
accomplished via ii + 1 = [0x00 + 1 = 0x01, 0xff + 1 = 0x00]
if neither 0x00 or 0xff -> ii = 2 (this usually translates into cmovbe)

insert m bit simply checking that mantissa != 0

shift [by nibbles] a magic constant where each of the nibbles correspond to log2() of the corresponding FP classify define macro

issue is making this with log2() automagically with macros is quite painful, the easiest I could see is it being

#define X(x) ((x)==16?4:(x)==8?3:(x)==4?2:(x)==2?1:(x)==1?0:0)

then just construct by

    return 1 << ((0x22312240 >> (c * 4)) & 0xf);

#define X(x, s) ((((x)==16?4:(x)==8?3:(x)==4?2:(x)==2?1:(x)==1?0:0)) << (s*4))
    return 1 << (((X(1,0) | X(16,1) | X(4,2) | X(4,3) | X(2,4) | X(8, 5) | X(4,6) | X(4,7)) >> (c * 4)) & 0xf);
#undef x

I considered for double and long double but I worried if ia64 80-bit long double was to be accounted for

no92 · 2026-05-13T16:45:18Z

I get how it works, it's just that it would be best to explain it in-band with comments in the code for posterity.

What I'm wondering is if this is worth the hassle to integrate - the improvement over the existing version is not that large (~20 % on my machine with appropriate -march), but on pre-x86_64-v3 it is more than 100 % slower. Switching between implementations would be possible with FMVs or ifuncs, but is the maintenance cost worth that?

no92 · 2026-05-13T16:49:51Z

I have typed out an untested generic version of this earlier today: https://gist.github.com/no92/42903883ff18fa8ec92b7c1d643192d4

xinitrcn1 · 2026-05-13T17:05:27Z

there is extra uneeded movzx due to type widening on that implementation
additionally the arithmethic shift at the end isnt used, in this case shouldn't be an issue
either way

template <std::floating_point T>
constexpr int branchless_fpclassify(T x) {
	using traits = detail::fp_traits<T>;
	using U = typename traits::uint_type;
	U u = std::bit_cast<U>(x);
	uint8_t exp = (u >> traits::exp_shift) & traits::exp_mask;
	uint8_t c = (exp + 1) & traits::exp_mask;
	c = (c > 2) ? 2 : c;
	c |= ((u & traits::frac_mask) != 0) << 2;
	constexpr int32_t magic_table = detail::generate_fpclassify_table();
	return 1 << ((magic_table >> uint8_t(c * 4)) & 0xF);
}

would be a more faithful generic func

of course the question of "is this worth integrating for" is uh....

xinitrcn1 · 2026-05-13T17:21:06Z

Be of note that if FP_CLASSIFY constants were just 0,1,2,3 it would be far cheaper to do the last part (which mind you takes a lot of instructions to pull off)

xinitrcn1 · 2026-05-13T17:24:59Z

old
https://godbolt.org/z/Kqfx37qjs

new
https://godbolt.org/z/eh4fT3o3T

no92 · 2026-05-13T17:28:38Z

Mind you that FP_ZERO does not fit in 4 bit. Changing FP_* macro values would be an ABI break.

libm: constant time fpclassifyf

3ca1a23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

libm: constant time fpclassifyf#1784

libm: constant time fpclassifyf#1784
xinitrcn1 wants to merge 1 commit into
managarm:masterfrom
xinitrcn1:patch-1

xinitrcn1 commented May 13, 2026 •

edited

Loading

Uh oh!

xinitrcn1 commented May 13, 2026 •

edited

Loading

Uh oh!

no92 commented May 13, 2026

Uh oh!

xinitrcn1 commented May 13, 2026 •

edited

Loading

Uh oh!

no92 commented May 13, 2026

Uh oh!

no92 commented May 13, 2026

Uh oh!

xinitrcn1 commented May 13, 2026

Uh oh!

xinitrcn1 commented May 13, 2026

Uh oh!

xinitrcn1 commented May 13, 2026

Uh oh!

no92 commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Uh oh!

Conversation

xinitrcn1 commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xinitrcn1 commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

no92 commented May 13, 2026

Uh oh!

xinitrcn1 commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

no92 commented May 13, 2026

Uh oh!

no92 commented May 13, 2026

Uh oh!

xinitrcn1 commented May 13, 2026

Uh oh!

xinitrcn1 commented May 13, 2026

Uh oh!

xinitrcn1 commented May 13, 2026

Uh oh!

no92 commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xinitrcn1 commented May 13, 2026 •

edited

Loading

xinitrcn1 commented May 13, 2026 •

edited

Loading

xinitrcn1 commented May 13, 2026 •

edited

Loading