
Reduce tlsf_check stack usage from 16 KB to O(1)#14

Merged: jserv merged 1 commit into master from stack-usage on Feb 8, 2026


@jserv jserv commented Feb 8, 2026

This replaces the 2048-entry hash table in check_no_duplicates (16 KB on 64-bit) with Floyd's tortoise-and-hare cycle detection inlined into the Phase 2 free-list walk. A duplicate block in a singly-linked free list necessarily creates a cycle, which Floyd's algorithm detects in O(n) time with O(1) space.
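As a rough illustration of the technique named above, here is a minimal, standalone sketch of Floyd's cycle detection on a singly-linked list. The `struct free_node` type and the function names are hypothetical and are not taken from the allocator's source; the merged change inlines the idea into the Phase 2 walk rather than using a helper like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical free-list node: only the forward link matters here. */
struct free_node {
    struct free_node *next_free;
};

/* Return true if the list starting at head loops back on itself.
 * The tortoise advances one node per step and the hare two; they can
 * only meet again if the hare is trapped in a cycle. */
static bool free_list_has_cycle(const struct free_node *head)
{
    const struct free_node *slow = head;
    const struct free_node *fast = head;

    while (fast && fast->next_free) {
        slow = slow->next_free;
        fast = fast->next_free->next_free;
        if (slow == fast)
            return true;
    }
    return false; /* hare reached the NULL terminator: list is acyclic */
}

/* Small self-check: a three-node list, first intact, then corrupted so
 * that the tail links back to the middle node. */
static void free_list_cycle_demo(void)
{
    struct free_node c = { NULL };
    struct free_node b = { &c };
    struct free_node a = { &b };

    assert(!free_list_has_cycle(&a));
    c.next_free = &b; /* simulate corruption: c points back to b */
    assert(free_list_has_cycle(&a));
}
```

The sketch keeps two cursors for readability. Only constant extra state is needed regardless of list length, which is the property the description relies on.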

Cross-bin duplicates cannot escape detection because Phase 2 already validates mapping(block_size(block)) == (fl, sl) for every entry; a block in the wrong bin fails that check before any cycle could form across bins.
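The interaction between the mapping check and the cycle check can be sketched as a single bin walk. Everything below is an assumption made for illustration: `struct blk`, `toy_mapping`, `check_bin`, and the `SL_SHIFT` value are hypothetical, and the toy mapping is a simplified two-level scheme, not the allocator's actual `mapping` function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SL_SHIFT 2 /* hypothetical: 4 second-level bins per first level */

/* Hypothetical free block: a size and a forward free-list link. */
struct blk {
    size_t size;
    struct blk *next_free;
};

/* Toy two-level mapping (an assumption, not the real one):
 * fl is the index of the highest set bit, sl the next SL_SHIFT bits. */
static void toy_mapping(size_t size, unsigned *fl, unsigned *sl)
{
    assert(size >= ((size_t) 1 << SL_SHIFT));
    unsigned f = 0;
    while ((size >> f) > 1)
        f++;
    *fl = f;
    *sl = (unsigned) ((size >> (f - SL_SHIFT)) & ((1u << SL_SHIFT) - 1));
}

/* Walk one bin. Every visited block must map to (fl, sl), and the walk
 * must terminate; the hare runs two links ahead per tortoise step. */
static bool check_bin(const struct blk *head, unsigned fl, unsigned sl)
{
    const struct blk *slow = head;
    const struct blk *fast = head;

    while (slow) {
        unsigned bfl, bsl;
        toy_mapping(slow->size, &bfl, &bsl);
        if (bfl != fl || bsl != sl)
            return false; /* block filed in the wrong bin */

        if (fast)
            fast = fast->next_free;
        if (fast)
            fast = fast->next_free;
        slow = slow->next_free;
        if (fast && slow == fast)
            return false; /* tortoise met hare: the list is cyclic */
    }
    return true;
}
```

In this sketch, a block that is linked into a second bin's list is rejected by the mapping comparison as soon as the walk of that second bin reaches it, and a duplicate within the same bin shows up as a cycle.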

It also removes the separate block_in_free_list lookup from Phase 1 (the physical block walk), eliminating an O(n²) worst-case path.

  • Stack: ~16 KB → 8 bytes (one pointer per bin walk)
  • Time: O(n²) worst-case → O(n) total across all phases
  • Coverage: no silent skip above 1433 free blocks (old 70% load cap)
  • Termination: Phase 2 can no longer hang on corrupted cyclic lists
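The figures in the list above can be reproduced with simple arithmetic. This sketch assumes the old table stored one pointer-sized entry per slot and that the 70% load cap was truncated to an integer; both are inferences from the numbers quoted here, not details confirmed by the source. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { OLD_TABLE_ENTRIES = 2048 }; /* table size quoted in the description */

/* Stack bytes used by the old table, assuming one pointer-sized entry per slot. */
static size_t old_table_bytes(size_t ptr_size)
{
    return (size_t) OLD_TABLE_ENTRIES * ptr_size;
}

/* Free-block count above which the old check was skipped, assuming a
 * 70% load cap computed with integer truncation. */
static size_t old_load_cap(void)
{
    return (size_t) OLD_TABLE_ENTRIES * 7 / 10;
}
```

With 8-byte pointers this gives 2048 × 8 = 16384 bytes (16 KB), and 70% of 2048 truncates to 1433, matching the numbers in the description.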

Summary by cubic

Replace the 16 KB hash-table duplicate check in tlsf_check with O(1) Floyd cycle detection during the free-list walk. This cuts stack usage to one pointer per bin (~8 bytes), removes an O(n²) path, and keeps duplicate detection reliable without skips or hangs.

  • Refactors

    • Inline tortoise-and-hare cycle detection in Phase 2 free-list walk.
    • Remove check_no_duplicates and Phase 1 block_in_free_list scan.
  • Bug Fixes

    • No silent skip above ~1433 free blocks; all duplicates are checked.
    • Prevents hangs on corrupted cyclic free lists; cross-bin duplicates still fail fl/sl mapping validation.

Written for commit be1e0af. Summary will update on new commits.


github-actions Bot commented Feb 8, 2026

WCET Results (x86-64)

TLSF WCET Analysis
==================
Timer:      cycles
Cache:      hot
Pool:       4194304 bytes (4.0 MB)
Iterations: 5000 (warmup: 500)
Sizes:      16 64 256 1024 4096 bytes

--- malloc_worst (small alloc from single huge block) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         73         98         98        123        172       6835       91.2       96.3
      64         73         98         98         98        123        196       89.4       11.9
     256         73         74         98         98        123        490       85.0       13.7
    1024         73         74         98         98        171        172       85.7       12.7
    4096         73         98         98         98         98      20629       91.9      290.7

--- malloc_best (exact bin hit, no split) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         49         73         74         98         98        196       73.4        8.4
      64         49         74         74         98         98         98       73.6        8.6
     256         49         73         74         74         74         74       70.9        7.6
    1024         49         73         74         74         74        147       71.0        7.6
    4096         49         74         74         98         98        147       74.9        8.5

--- free_worst (sandwiched between two free blocks) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         49         73         74         74         98      16954       75.3      238.8
      64         49         74         74         98         98         98       73.8        8.3
     256         49         74         98         98        123      20948       81.8      295.3
    1024         49         74         98         98         98      16489       80.7      232.3
    4096         49         74         98         98         98        122       82.2       12.7

--- free_best (no merge (used neighbors)) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         49         73         74         74        122        123       62.5       12.3
      64         49         49         74         74         74         74       61.0       12.3
     256         49         49         74         74         74        147       59.0       12.1
    1024         49         49         74         74         74        123       61.0       12.3
    4096         49         73         74         74         74         98       63.1       12.1

--- worst/best ratio (p99) ---
    size     malloc       free
      16      1.00x      1.32x
      64      1.00x      1.32x
     256      1.32x      1.32x
    1024      1.00x      1.32x
    4096      1.32x      1.32x


github-actions Bot commented Feb 8, 2026

WCET Results (arm64)

TLSF WCET Analysis
==================
Timer:      ticks
Cache:      hot
Pool:       4194304 bytes (4.0 MB)
Iterations: 5000 (warmup: 500)
Sizes:      16 64 256 1024 4096 bytes

--- malloc_worst (small alloc from single huge block) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         24         32         32         40         40      26496       36.7      374.3
      64         24         32         32         40         40         40       31.7        2.4
     256         16         32         32         40         40         40       31.5        2.6
    1024         24         32         32         32         40         40       30.9        2.9
    4096         24         32         32         40         40         40       31.6        2.5

--- malloc_best (exact bin hit, no split) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         24         24         32         32         32         32       25.4        3.0
      64          8         24         32         32         32         32       25.4        3.0
     256         16         24         32         32         32         32       25.4        3.1
    1024         16         24         32         32         32         32       25.3        3.0
    4096         16         24         32         32         32         32       25.3        3.0

--- free_worst (sandwiched between two free blocks) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         16         24         32         32         32         40       25.5        3.2
      64         16         24         32         32         32         40       26.1        3.6
     256         16         24         32         32         32      22568       29.9      318.8
    1024         16         24         32         32         32         32       25.6        3.2
    4096         16         24         32         32         32         32       25.9        3.4

--- free_best (no merge (used neighbors)) ---
    size        min        p50        p90        p99      p99.9        max       mean     stddev
      16         16         24         24         24         24       2128       20.9       30.1
      64         16         24         24         24         24         24       20.3        4.0
     256          8         24         24         24         24         24       20.4        4.0
    1024          8         24         24         24         24         24       20.5        4.0
    4096         16         24         24         24         24         32       20.6        4.0

--- worst/best ratio (p99) ---
    size     malloc       free
      16      1.25x      1.33x
      64      1.00x      1.33x
     256      1.25x      1.33x
    1024      1.25x      1.33x
    4096      1.25x      1.33x


@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 1 file

@jserv jserv merged commit efbd0be into master Feb 8, 2026
10 checks passed
@jserv jserv deleted the stack-usage branch February 8, 2026 12:38