Use hash func to boost file creation and lookup #79
RoyWFHuang wants to merge 3 commits into sysprog21:master
Conversation
How can you determine which hash function is the most suitable?
visitorckw
left a comment
I saw that your PR description includes some performance benchmarks, but the commit message lacks any performance numbers to support your improvements. Please improve the commit message.
I'm not sure if "fnv" is the most suitable, but since the index space in SimpleFS is relatively small, a more complex algorithm might not provide significant benefits. I think fnv is a reasonable balance between simplicity and performance.
visitorckw
left a comment
You ignored many of my comments without making any changes or providing any replies. You still retained many irrelevant changes, making the review difficult. Additionally, a single-line commit message saying only "optimize the file search process" is way too vague. Please improve the git commit message.
Added all hash results into the commit message.
visitorckw
left a comment
Quoted from patch 2:
Align the print function with the Simplefs print format for consistency.
Also adjust variable declarations to fix compiler warnings when building
under the C90 standard.
I'm unsure which Linux kernel versions simplefs currently intends to support, but AFAIK the Linux kernel currently uses GNU C11 as its standard.
Furthermore, the word "Also" is often a sign that the change should be in a separate patch. In my view, you are performing two distinct actions here:
a) Changing printk -> pr_err.
b) Fixing a compiler warning.
I also remain confused as to whether the printk to pr_err change is truly warranted, and what relevance it has to the PR's title, which is "Use hash func to boost file creation and lookup".
bh = sb_bread(sb, ci_dir->ei_block);
if (!bh)
    return ERR_PTR(-EIO);
[...]
eblock = (struct simplefs_file_ei_block *) bh->b_data;
bh2 = sb_bread(sb, eblock->extents[ei].ee_start + bi);
if (!bh2)
    return ERR_PTR(-EIO);
Buffer head leak in simplefs_lookup(): when the bh2 read fails, bh is never released. Call brelse(bh) before returning the error.
hash_code = simplefs_hash(dentry) %
            (SIMPLEFS_MAX_EXTENTS * SIMPLEFS_MAX_BLOCKS_PER_EXTENT);
full_name_hash() is designed for VFS dentry caching, not for on-disk indexing. It uses a per-boot salt that changes on reboot. Consider: what happens to hash placement after reboot?
If full_name_hash() returns different values after reboot (it does), lookups will start from wrong positions. The linear probe fallback saves correctness, but destroys the performance benefit.
Use a deterministic hash like FNV-1a or djb2.
This comment was marked as resolved.
Worst case is still an O(n) linear search when hash collisions cluster, and there is no evidence the hash quality was tested. What's the distribution?
hash_code = simplefs_hash(dentry) % (SIMPLEFS_MAX_EXTENTS * SIMPLEFS_MAX_BLOCKS_PER_EXTENT);
ei = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
bi = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
The mapping above is sound: a deterministic FNV-1a hash with a proper modulo distribution. But it computes a 64-bit hash and returns only 32 bits (truncation), with no analysis of collision rates.
rm_new:
    if (dest_inserted) {
        bh_ext = sb_bread(
            sb, eblk_dest->extents[dest_ei].ee_start + dest_inserted_bi);
        if (bh_ext) {
            dblock = (struct simplefs_dir_block *) bh_ext->b_data;
            if (simplefs_try_remove_entry(dblock, eblk_dest, dest_ei,
                                          src_in->i_ino,
                                          dest_dentry->d_name.name)) {
                mark_buffer_dirty(bh_ext);
                mark_buffer_dirty(bh_fei_blk_dest);
            }
            brelse(bh_ext);
        }
If simplefs_try_remove_entry() fails (e.g. on an I/O error), do we leak the destination entry? The code doesn't check the return value.
Use the dest_bh_ext cache in the insert section to quickly find deletion targets.
strncpy(dblock->files[fi].filename, dest_dentry->d_name.name,
        SIMPLEFS_FILENAME_LEN);
File becomes unfindable after rename because:
- Old entry stays at hash(old_name) location
- Future lookups calculate hash(new_name) and search wrong bucket
- File effectively vanishes from namespace
The hash-based directory indexing idea is sound and the perf numbers are real. But the implementation has several correctness bugs that can cause data corruption or make files unreachable. Three blocking issues found independently by all reviewers.
Minor: orphan comment (line 133), double blank lines (161, 1170), stale TODOs (1195, 1312), printk should be pr_info (1166, 1283), typo "founded" -> "found" (911), commit message typo "fiile".
Needs a v2 to fix the data-corruption and unreachable-file bugs before merge.
static const struct inode_operations simplefs_inode_ops;
static const struct inode_operations symlink_inode_ops;
[...]
#define CHECK_AND_SET_RING_INDEX(idx, len) \
CHECK_AND_SET_RING_INDEX is not a proper ring wrap. Single subtraction, not modulo. If idx starts at 7 (from hash % SIMPLEFS_MAX_BLOCKS_PER_EXTENT) but the current extent has ee_len = 2, the macro yields 7 - 2 = 5 -- still out of bounds. sb_bread then reads a block outside the extent.
Use idx %= len, or at minimum clamp the initial value to the actual extent length before entering the inner loop.
In the current simplefs implementation, the hash_code is restricted to be within SIMPLEFS_MAX_EXTENTS * SIMPLEFS_MAX_BLOCKS_PER_EXTENT. Following this logic:
ei = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT ensures ei stays within 0 to SIMPLEFS_MAX_EXTENTS - 1.
bi = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT ensures bi stays within 0 to SIMPLEFS_MAX_BLOCKS_PER_EXTENT - 1.
Therefore, if we apply the mapping correctly using:
CHECK_AND_SET_RING_INDEX(ei, SIMPLEFS_MAX_EXTENTS);
CHECK_AND_SET_RING_INDEX(_bi, SIMPLEFS_MAX_BLOCKS_PER_EXTENT);
We should not encounter out-of-bounds errors under normal conditions.
dblock = (struct simplefs_dir_block *) bh2->b_data;
/* Search file in ei_block */
for (_fi = 0; _fi < dblock->nr_files;) {
Unchecked nr_blk from disk -- if on-disk nr_blk is 0, _fi += dblock->files[_fi].nr_blk loops forever. If nr_blk is oversized, _fi jumps past SIMPLEFS_FILES_PER_BLOCK and indexes out of bounds on the next iteration.
Validate nr_blk >= 1 && _fi + nr_blk <= SIMPLEFS_FILES_PER_BLOCK before advancing; return -EIO on corrupt metadata.
I believe this situation should not occur under normal conditions, as nr_blk (or nr_files) should always be at least 1. If we encounter a 0, it likely indicates a bug in the block merging or splitting logic.
if (chk < 0)
    return ERR_PTR(chk); /* I/O error */
[...]
bh = sb_bread(sb, ci_dir->ei_block);
Redundant I/O -- __file_lookup already reads the ei_block and dir_block to find the file, then releases them. Here simplefs_lookup reads them again just to extract the inode number. This doubles the I/O on every successful lookup and partially defeats the hash optimization.
Have __file_lookup return the inode number directly instead of the (ei, bi, fi) triple.
Regarding the redundant I/O:
The current implementation of __file_lookup is designed only to locate the (ei, bi, fi) indices and does not maintain the bh cache. Returning the inode number directly or keeping the cache active would require refactoring all related functions and their call sites.
I agree this is a valid performance concern. I plan to create a dedicated issue to track this enhancement and implement a more efficient caching mechanism in a future update.
Additionally, regarding file search performance, we can leverage the "nr_files" information to further boost search speed. I will include this optimization in the same roadmap to improve overall directory operation efficiency.
    !strcmp(dblock->files[fi].filename, name)) {
    dblock->files[fi].inode = 0;
    /* Merge the empty data */
    for (i = fi - 1; i >= 0; i--) {
Backward merge fails when fi == 0. The loop for (i = fi - 1; i >= 0; i--) starts at i = -1 and immediately exits. The freed space at slot 0 is never coalesced with adjacent free space. Over time this leaks directory entry slots.
Handle the fi == 0 case explicitly -- the freed slot becomes the head of the free chain.
I believe this case is already handled. We should separate this issue into two scenarios based on the state of the directory:
1. If fi = 0 is a file and fi = 1 is empty: in this case, the nr_blk for fi = 0 should be >= 2 (as nr_blk is used to record contiguous empty blocks). When the file at fi = 0 is removed, we only need to update the metadata to reflect that the slot is now part of the free space.
2. If fi = 0 is a file and fi = 1 also contains a file: the nr_blk would be 1. When the file at fi = 0 is removed, it starts recording contiguous empty blocks from that position. Since the following slot (fi = 1) is occupied, the logic remains consistent and doesn't require an explicit change for the fi = 0 case.
To clarify, nr_blk has two different meanings in our implementation:
- If a block is empty: It indicates the number of contiguous empty blocks.
- If the leading block contains a file: It indicates the total span, including the used block and any trailing empty space.
}
dblock = (struct simplefs_dir_block *) bh_ext->b_data;
[...]
strncpy(dblock->files[fi].filename, dest_dentry->d_name.name,
In-place rename in full directory breaks the hash invariant. When src_dir == dest_dir and the directory is full, this just overwrites the filename in the old slot. But the entry stays at the hash position of the old name. Future lookups hash the new name to a different bucket and will not find the file.
This makes renamed files unreachable by normal lookup/delete. Remove the in-place shortcut; do remove-then-reinsert using the new name's hash, or return an error if the target bucket is full.
I think an in-place rename is sufficient in this case. If src_dir == dest_dir and the directory is full, even if we were to remove the old entry and re-insert it, the new entry would likely end up in the same slot anyway.
Even if the new filename's hash doesn't match the current slot, the file remains reachable. Our lookup mechanism performs a linear search starting from the hash index and continues through the ring until it finds the entry or returns to the start. Therefore, modifying the filename in-place avoids the overhead of a full remove-and-reinsert operation without sacrificing correctness.
uint32_t simplefs_hash(struct dentry *dentry)
{
    const char *str = dentry->d_name.name;
Unkeyed FNV-1a is vulnerable to HashDoS. FNV-1a is deterministic and trivially reversible. Attackers can craft filenames that all collide, forcing O(n) linear scans on every directory operation.
Consider using the kernel's full_name_hash() which is SipHash-based with a per-boot random key, or at minimum incorporate a per-superblock random salt.
Based on our previous discussion:
full_name_hash() is designed for VFS dentry caching, not for on-disk indexing. It uses a per-boot salt that changes on reboot. Consider: what happens to hash placement after reboot?
If full_name_hash() returns different values after reboot (it does), lookups will start from wrong positions. The linear probe fallback saves correctness, but destroys the performance benefit.
Use a deterministic hash like FNV-1a or djb2.
I believe we should stick with this approach to ensure consistency across reboots.
Introduce a hash-based mechanism to speed up file creation and lookup operations. The hash function enables faster access to the extent index and logical block index, improving overall filesystem performance.
hash_code = file_hash(file_name);
extent index = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
block index = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
Use perf to measure:
1. File creation (random)
Legacy: 259.842753513 seconds time elapsed; 23.000247000 seconds user; 150.380145000 seconds sys
full_name_hash: 222.274028604 seconds time elapsed; 20.794966000 seconds user; 151.941876000 seconds sys
2. File listing (random)
Legacy: min time: 0.00171 s; max time: 0.03799 s; avg time: 0.00423332 s; tot time: 129.539510 s
full_name_hash: min time: 0.00171 s; max time: 0.03588 s; avg time: 0.00305601 s; tot time: 93.514040 s
3. File removal (random)
Legacy: 106.921706288 seconds time elapsed; 16.987883000 seconds user; 91.268661000 seconds sys
full_name_hash: 86.132655220 seconds time elapsed; 19.180209000 seconds user; 68.476075000 seconds sys









Previously, SimpleFS used a sequential insertion method to create files, which worked efficiently when the filesystem contained only a small number of files.
However, in real-world use cases, filesystems often manage a large number of files, making sequential search and insertion inefficient.
Inspired by Ext4’s hash-based directory indexing, this change adopts a hash function to accelerate file indexing and improve scalability.
Changes:
- Implemented hash-based file index lookup
- Improved scalability for large directory structures
hash_code = file_hash(file_name);
extent index = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
block index = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
Performance test
(Benchmark charts comparing legacy vs. full_name_hash for file creation and removal.)
Use perf stat ls -la to measure the query time for each file and sum up all elapsed times to calculate the total lookup cost.
Legacy:
min time: 0.00171 s
max time: 0.03799 s
avg time: 0.00423332 s
tot time: 129.539510 s
full_name_hash:
min time: 0.00171 s
max time: 0.03588 s
avg time: 0.00305601 s
tot time: 93.514040 s
Summary by cubic
Switched SimpleFS to hash-based directory indexing using a deterministic FNV-1a simplefs_hash to map filenames to extent/block slots for faster create, lookup, and delete. On 30.6k files: create ~15% faster, delete ~19% faster, lookup ~28% faster.
New Features
- __file_lookup used by lookup/rename.
Refactors
- Added hash.c (FNV-1a) and CHECK_AND_SET_RING_INDEX; renamed helpers (e.g., simplefs_get_new_ext) and variables for clarity; updated Makefile to include hash.o.
Written for commit 7323d62. Summary will update on new commits.