
Use __builtin_ctz in bitmap_ffs #18

Merged: jserv merged 1 commit into master from bitmap-ffs on Feb 9, 2026


jserv (Collaborator) commented Feb 9, 2026

This replaces __builtin_ffs (1-based) minus 1 with __builtin_ctz (0-based), which maps directly to RBIT+CLZ on ARM Cortex-M4 without the extra SUB instruction that the ffs-then-subtract pattern requires.
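
A minimal sketch of the two forms, assuming a 32-bit bitmap word. The names `bitmap_ffs_sub`, `bitmap_ffs_ctz`, and `bitmap_ffs_forms_agree` are hypothetical, since the actual function body is not shown in this thread. Only the GCC/Clang builtins `__builtin_ffs` and `__builtin_ctz` are taken as given.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pre-change form: __builtin_ffs is 1-based (and returns 0
 * for a zero input), so the index needs a subtraction. */
static inline uint32_t bitmap_ffs_sub(uint32_t x)
{
    assert(x != 0);
    return (uint32_t) (__builtin_ffs((int) x) - 1);
}

/* Hypothetical post-change form: __builtin_ctz is already 0-based.
 * Its result is undefined for x == 0, hence the assert. */
static inline uint32_t bitmap_ffs_ctz(uint32_t x)
{
    assert(x != 0);
    return (uint32_t) __builtin_ctz(x);
}

/* Returns 1 if both forms agree for every single-bit word and for words
 * with all bits at or above a given position set. */
static int bitmap_ffs_forms_agree(void)
{
    for (uint32_t bit = 0; bit < 32; bit++) {
        uint32_t single = UINT32_C(1) << bit;
        uint32_t upper = UINT32_MAX << bit;
        if (bitmap_ffs_sub(single) != bit || bitmap_ffs_ctz(single) != bit)
            return 0;
        if (bitmap_ffs_sub(upper) != bit || bitmap_ffs_ctz(upper) != bit)
            return 0;
    }
    return 1;
}
```

This checks functional equivalence for nonzero inputs only. It says nothing about the generated instructions, which depend on the target and optimization level.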


Summary by cubic

Replace __builtin_ffs(x) - 1 with __builtin_ctz(x) in bitmap_ffs to return a 0-based index using fewer instructions on ARM Cortex-M4 (maps to RBIT+CLZ, avoids an extra SUB). Behavior is unchanged; we still assert when x == 0.

Written for commit 6df9064. Summary will update on new commits.


github-actions Bot commented Feb 9, 2026

WCET Results (x86-64)

TLSF WCET Analysis
==================
Timer:      cycles
Cache:      hot
Pool:       4194304 bytes (4.0 MB)
Iterations: 5000 (warmup: 500)
Sizes:      16 64 256 1024 4096 bytes

--- malloc_worst (small alloc from single huge block) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         73         98        122        123        147      28444      105.6      401.0
      64         73         98         98        123        147        269       94.3       11.8
     256         73         98         98         98        147      40645      104.0      573.5
    1024         73         98         98         98        122        245       94.6        9.1
    4096         73         98         98        123        171        245       97.7       10.6

--- malloc_best (exact bin hit, no split) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         49         73         74         98         98         98       72.8        8.0
      64         49         74         74         98         98         98       75.4        7.6
     256         49         73         74         98         98        147       69.0       13.1
    1024         49         73         74         98         98        147       69.5       11.3
    4096         49         73         74         98         98        147       69.4       11.6

--- free_worst (sandwiched between two free blocks) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         49         74         98         98        123        221       78.2       10.4
      64         49         74         98         98        147        245       77.7       11.0
     256         49         74         98         98        196        245       77.7       11.8
    1024         49         74         98         98        147        220       75.8       10.2
    4096         49         74         98         98        147        245       75.1       10.5

--- free_best (no merge (used neighbors)) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         49         49         74         74        122        245       61.2       12.8
      64         49         73         74         74        123        245       63.0       12.7
     256         49         73         74         74         74        123       63.4       12.1
    1024         49         73         74         74         74        123       63.3       12.1
    4096         49         49         74         74        123      19134       64.4      270.1

--- worst/best ratio (p99) ---
    size     malloc       free
      16      1.26x      1.32x
      64      1.66x      1.32x
     256      1.32x      1.32x
    1024      1.66x      1.32x
    4096      1.26x      1.32x


github-actions Bot commented Feb 9, 2026

WCET Results (arm64)

TLSF WCET Analysis
==================
Timer:      ticks
Cache:      hot
Pool:       4194304 bytes (4.0 MB)
Iterations: 5000 (warmup: 500)
Sizes:      16 64 256 1024 4096 bytes

--- malloc_worst (small alloc from single huge block) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         24         32         32         40         40         40       30.9        3.0
      64         24         32         32         32         40         48       30.8        3.1
     256         24         32         32         40         40        880       31.0       12.4
    1024         24         32         32         32         40         40       30.6        3.2
    4096         24         32         32         40         40       2752       31.5       38.6

--- malloc_best (exact bin hit, no split) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         16         24         32         32         32         32       25.1        2.8
      64         16         24         32         32         32         32       25.1        2.8
     256         16         24         32         32         32         40       25.0        2.6
    1024         16         24         32         32         32         32       24.9        2.6
    4096         16         24         32         32         32         32       24.8        2.5

--- free_worst (sandwiched between two free blocks) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         16         24         32         32         32         32       25.9        3.4
      64         24         24         32         32         32       1632       26.2       23.0
     256         16         24         32         32         32         32       26.1        3.5
    1024         16         24         32         32         32         32       26.0        3.5
    4096         16         24         32         32         32         32       26.0        3.5

--- free_best (no merge (used neighbors)) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         16         24         24         24         24         24       20.6        4.0
      64          8         24         24         24         24         32       20.5        4.0
     256         16         24         24         24         24       1728       21.0       24.5
    1024         16         24         24         24         24         32       20.6        4.0
    4096          8         24         24         24         24         32       20.5        4.0

--- worst/best ratio (p99) ---
    size     malloc       free
      16      1.25x      1.33x
      64      1.25x      1.33x
     256      1.25x      1.33x
    1024      1.00x      1.33x
    4096      1.25x      1.33x


cubic-dev-ai Bot left a comment


No issues found across 1 file

jserv merged commit 1cc2c49 into master on Feb 9, 2026
10 checks passed
jserv deleted the bitmap-ffs branch on February 9, 2026 01:17