Use hash func to boost file creation and lookup #79
RoyWFHuang wants to merge 3 commits into sysprog21:master
Conversation
How can you determine which hash function is the most suitable?
visitorckw
left a comment
I saw that your PR description includes some performance benchmarks, but the commit message lacks any performance numbers to support your improvements. Please improve the commit message.
I'm not sure if "fnv" is the most suitable, but since the index space in SimpleFS is relatively small, a more complex algorithm might not provide significant benefits. I think fnv is a reasonable balance between simplicity and performance.
visitorckw
left a comment
You ignored many of my comments without making any changes or providing any replies. You still retained many irrelevant changes, making the review difficult. Additionally, a single-line commit message saying only "optimize the file search process" is way too vague. Please improve the git commit message.
Added all hash results into the commit message.
visitorckw
left a comment
Quoted from patch 2:
Align the print function with the Simplefs print format for consistency.
Also adjust variable declarations to fix compiler warnings when building
under the C90 standard.
I'm unsure which Linux kernel versions simplefs currently intends to support, but AFAIK the Linux kernel currently uses GNU C11 as its standard.
Furthermore, the word "Also" is often a sign that the change should be in a separate patch. In my view, you are performing two distinct actions here:
a) Changing printk -> pr_err.
b) Fixing a compiler warning.
I also remain confused as to whether the printk to pr_err change is truly warranted, and what relevance it has to the PR's title, which is "Use hash func to boost file creation and lookup".
bh = sb_bread(sb, ci_dir->ei_block);
if (!bh)
    return ERR_PTR(-EIO);
[...]
eblock = (struct simplefs_file_ei_block *) bh->b_data;
bh2 = sb_bread(sb, eblock->extents[ei].ee_start + bi);
if (!bh2)
    return ERR_PTR(-EIO);
Buffer head leak in simplefs_lookup(): when the bh2 read fails, bh is never released. Call brelse(bh) before returning the error.
hash_code = simplefs_hash(dentry) %
            (SIMPLEFS_MAX_EXTENTS * SIMPLEFS_MAX_BLOCKS_PER_EXTENT);
full_name_hash() is designed for VFS dentry caching, not for on-disk indexing. It uses a per-boot salt that changes on reboot. Consider: what happens to hash placement after reboot?
If full_name_hash() returns different values after reboot (it does), lookups will start from wrong positions. The linear probe fallback saves correctness, but destroys the performance benefit.
Use a deterministic hash like FNV-1a or djb2.
This comment was marked as resolved.
Worst case is still an O(n) linear search when hash collisions cluster, and there is no evidence the hash quality was tested. What's the distribution?
hash_code = simplefs_hash(dentry) % (SIMPLEFS_MAX_EXTENTS * SIMPLEFS_MAX_BLOCKS_PER_EXTENT);
ei = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
bi = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
The mapping above is sound: a deterministic FNV-1a hash with a proper modulo distribution. But it computes a 64-bit hash and returns only 32 bits (truncation), with no analysis of collision rates.
rm_new:
    if (dest_inserted) {
        bh_ext = sb_bread(
            sb, eblk_dest->extents[dest_ei].ee_start + dest_inserted_bi);
        if (bh_ext) {
            dblock = (struct simplefs_dir_block *) bh_ext->b_data;
            if (simplefs_try_remove_entry(dblock, eblk_dest, dest_ei,
                                          src_in->i_ino,
                                          dest_dentry->d_name.name)) {
                mark_buffer_dirty(bh_ext);
                mark_buffer_dirty(bh_fei_blk_dest);
            }
            brelse(bh_ext);
        }
If simplefs_try_remove_entry() fails (e.g. on an I/O error), do we leak the destination entry? The code doesn't check the return value.
Use the dest_bh_ext cache in the insert section to quickly find deletion targets.
strncpy(dblock->files[fi].filename, dest_dentry->d_name.name,
        SIMPLEFS_FILENAME_LEN);
File becomes unfindable after rename because:
- Old entry stays at hash(old_name) location
- Future lookups calculate hash(new_name) and search wrong bucket
- File effectively vanishes from namespace
The hash-based directory indexing idea is sound and the perf numbers are real. But the implementation has several correctness bugs that can cause data corruption or make files unreachable. Three blocking issues found independently by all reviewers.
Minor: orphan comment (line 133), double blank lines (161, 1170), stale TODOs (1195, 1312), printk should be pr_info (1166, 1283), typo "founded" -> "found" (911), commit message typo "fiile".
Needs a v2 to fix the data-corruption and unreachable-file bugs before merge.
static const struct inode_operations simplefs_inode_ops;
static const struct inode_operations symlink_inode_ops;
[...]
#define CHECK_AND_SET_RING_INDEX(idx, len) \
CHECK_AND_SET_RING_INDEX is not a proper ring wrap. Single subtraction, not modulo. If idx starts at 7 (from hash % SIMPLEFS_MAX_BLOCKS_PER_EXTENT) but the current extent has ee_len = 2, the macro yields 7 - 2 = 5 -- still out of bounds. sb_bread then reads a block outside the extent.
Use idx %= len, or at minimum clamp the initial value to the actual extent length before entering the inner loop.
In the current simplefs implementation, the hash_code is restricted to be within SIMPLEFS_MAX_EXTENTS * SIMPLEFS_MAX_BLOCKS_PER_EXTENT. Following this logic:
ei = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT ensures ei stays within 0 to SIMPLEFS_MAX_EXTENTS - 1.
bi = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT ensures bi stays within 0 to SIMPLEFS_MAX_BLOCKS_PER_EXTENT - 1.
Therefore, if we apply the mapping correctly using:
CHECK_AND_SET_RING_INDEX(ei, SIMPLEFS_MAX_EXTENTS);
CHECK_AND_SET_RING_INDEX(_bi, SIMPLEFS_MAX_BLOCKS_PER_EXTENT);
We should not encounter out-of-bounds errors under normal conditions.
dblock = (struct simplefs_dir_block *) bh2->b_data;
/* Search file in ei_block */
for (_fi = 0; _fi < dblock->nr_files;) {
Unchecked nr_blk from disk -- if on-disk nr_blk is 0, _fi += dblock->files[_fi].nr_blk loops forever. If nr_blk is oversized, _fi jumps past SIMPLEFS_FILES_PER_BLOCK and indexes out of bounds on the next iteration.
Validate nr_blk >= 1 && _fi + nr_blk <= SIMPLEFS_FILES_PER_BLOCK before advancing; return -EIO on corrupt metadata.
I believe this situation should not occur under normal conditions, as nr_blk (or nr_files) should always be at least 1. If we encounter a 0, it likely indicates a bug in the block merging or splitting logic.
if (chk < 0)
    return ERR_PTR(chk); /* I/O error */
[...]
bh = sb_bread(sb, ci_dir->ei_block);
Redundant I/O -- __file_lookup already reads the ei_block and dir_block to find the file, then releases them. Here simplefs_lookup reads them again just to extract the inode number. This doubles the I/O on every successful lookup and partially defeats the hash optimization.
Have __file_lookup return the inode number directly instead of the (ei, bi, fi) triple.
Regarding the redundant I/O:
The current implementation of __file_lookup is designed only to locate the (ei, bi, fi) indices and does not maintain the bh cache. Returning the inode number directly or keeping the cache active would require refactoring all related functions and their call sites.
I agree this is a valid performance concern. I plan to create a dedicated issue to track this enhancement and implement a more efficient caching mechanism in a future update.
Additionally, regarding file search performance, we can leverage the "nr_files" information to further boost search speed. I will include this optimization in the same roadmap to improve overall directory operation efficiency.
    !strcmp(dblock->files[fi].filename, name)) {
    dblock->files[fi].inode = 0;
    /* Merge the empty data */
    for (i = fi - 1; i >= 0; i--) {
Backward merge fails when fi == 0. The loop for (i = fi - 1; i >= 0; i--) starts at i = -1 and immediately exits. The freed space at slot 0 is never coalesced with adjacent free space. Over time this leaks directory entry slots.
Handle the fi == 0 case explicitly -- the freed slot becomes the head of the free chain.
I believe this case is already handled. We should separate this issue into two scenarios based on the state of the directory:
1. If fi = 0 is a file and fi = 1 is empty: in this case, the nr_blk for fi = 0 should be >= 2 (as nr_blk is used to record contiguous empty blocks). When the file at fi = 0 is removed, we only need to update the metadata to reflect that the slot is now part of the free space.
2. If fi = 0 is a file and fi = 1 also contains a file: the nr_blk would be 1. When the file at fi = 0 is removed, it starts recording contiguous empty blocks from that position. Since the following slot (fi = 1) is occupied, the logic remains consistent and doesn't require an explicit change for the fi = 0 case.
To clarify, nr_blk has two different meanings in our implementation:
- If a block is empty: It indicates the number of contiguous empty blocks.
- If the leading block contains a file: It indicates the total span, including the used block and any trailing empty space.
}
dblock = (struct simplefs_dir_block *) bh_ext->b_data;
[...]
strncpy(dblock->files[fi].filename, dest_dentry->d_name.name,
In-place rename in full directory breaks the hash invariant. When src_dir == dest_dir and the directory is full, this just overwrites the filename in the old slot. But the entry stays at the hash position of the old name. Future lookups hash the new name to a different bucket and will not find the file.
This makes renamed files unreachable by normal lookup/delete. Remove the in-place shortcut; do remove-then-reinsert using the new name's hash, or return an error if the target bucket is full.
I think an in-place rename is sufficient in this case. If src_dir == dest_dir and the directory is full, even if we were to remove the old entry and re-insert it, the new entry would likely end up in the same slot anyway.
Even if the new filename's hash doesn't match the current slot, the file remains reachable. Our lookup mechanism performs a linear search starting from the hash index and continues through the ring until it finds the entry or returns to the start. Therefore, modifying the filename in-place avoids the overhead of a full remove-and-reinsert operation without sacrificing correctness.
uint32_t simplefs_hash(struct dentry *dentry)
{
    const char *str = dentry->d_name.name;
Unkeyed FNV-1a is vulnerable to HashDoS. FNV-1a is deterministic and trivially reversible. Attackers can craft filenames that all collide, forcing O(n) linear scans on every directory operation.
Consider using the kernel's full_name_hash() which is SipHash-based with a per-boot random key, or at minimum incorporate a per-superblock random salt.
Based on our previous discussion:
full_name_hash() is designed for VFS dentry caching, not for on-disk indexing. It uses a per-boot salt that changes on reboot. Consider: what happens to hash placement after reboot?
If full_name_hash() returns different values after reboot (it does), lookups will start from wrong positions. The linear probe fallback saves correctness, but destroys the performance benefit.
Use a deterministic hash like FNV-1a or djb2.
I believe we should stick with this approach to ensure consistency across reboots.
Introduce a hash-based mechanism to speed up file creation and lookup operations. The hash function enables faster access to the extent index and logical block index, improving overall filesystem performance.
hash_code = file_hash(file_name);
extent index = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
block index = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
Use perf to measure:
1. File creation (random)
Legacy: 259.842753513 seconds time elapsed; 23.000247000 seconds user; 150.380145000 seconds sys
full_name_hash: 222.274028604 seconds time elapsed; 20.794966000 seconds user; 151.941876000 seconds sys
2. File listing (random)
Legacy: min time: 0.00171 s; max time: 0.03799 s; avg time: 0.00423332 s; tot time: 129.539510 s
full_name_hash: min time: 0.00171 s; max time: 0.03588 s; avg time: 0.00305601 s; tot time: 93.514040 s
3. File removal (random)
Legacy: 106.921706288 seconds time elapsed; 16.987883000 seconds user; 91.268661000 seconds sys
full_name_hash: 86.132655220 seconds time elapsed; 19.180209000 seconds user; 68.476075000 seconds sys









Previously, SimpleFS used a sequential insertion method to create files, which worked efficiently when the filesystem contained only a small number of files.
However, in real-world use cases, filesystems often manage a large number of files, making sequential search and insertion inefficient.
Inspired by Ext4’s hash-based directory indexing, this change adopts a hash function to accelerate file indexing and improve scalability.
Changes:
- Implemented hash-based file index lookup
- Improved scalability for large directory structures
hash_code = file_hash(file_name);
extent index = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
block index = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
Performance test
(Benchmark charts comparing legacy vs. full_name_hash for file creation and removal.)
Use perf stat ls -la to measure the query time for each file and sum up all elapsed times to calculate the total lookup cost.
Legacy:
min time: 0.00171 s
max time: 0.03799 s
avg time: 0.00423332 s
tot time: 129.539510 s
full_name_hash:
min time: 0.00171 s
max time: 0.03588 s
avg time: 0.00305601 s
tot time: 93.514040 s
Summary by cubic
Switched SimpleFS to hash-based directory indexing using a deterministic FNV-1a simplefs_hash to map filenames to extent/block slots for faster create, lookup, and delete. On 30.6k files: create ~15% faster, delete ~19% faster, lookup ~28% faster.
New Features
- __file_lookup used by lookup/rename.
Refactors
- Added hash.c (FNV-1a) and CHECK_AND_SET_RING_INDEX; renamed helpers (e.g., simplefs_get_new_ext) and variables for clarity; updated Makefile to include hash.o.
Written for commit 7323d62. Summary will update on new commits.