fix(glob): discard partial dircache entry after prefix-filtered _find #2058
bruno-hays wants to merge 2 commits
e3d4779 to 749f2df
I had not anticipated such a solution! Does this miss the case that the dircache was already populated before calling find(), so we end up discarding legitimate listings?
Seems to be causing explicit failures
When `_glob` passes `prefix=` to `_find`, backends like s3fs, gcsfs and
adlfs perform a server-side filtered listing and store the result in
`dircache` under the parent directory key. That entry is a *partial*
listing (only files matching the prefix), but it gets treated as a
complete directory listing by every subsequent operation on the same
path.
Consequence: after `glob("dir/train-*")`, a call to `glob("dir/test-*")`
or `fs.exists("dir/test-file")` hits the cached train-only listing and
returns an empty result / False, even though the test files exist on the
remote storage. The regression was introduced in 2026.4.0 by the
prefix= optimisation (PR fsspec#1996).
Fix: after `_find` returns, remove the `dircache` entry for `root` when
a prefix was used. The next lookup for the same directory will perform
a fresh full listing and cache it correctly.
Adds a regression test using a mock backend that faithfully simulates
the partial-caching behaviour (cache-hit path returns only the
prefix-filtered subset, triggering the exact failure mode).
Co-authored-by: Cursor <cursoragent@cursor.com>
749f2df to 65cbeb8
I fixed the test, sorry for the oversight.
Here is my proposed fix for s3fs: fsspec/s3fs#1034
Problem
`_glob` passes a `prefix=` hint to `_find` so backends (s3fs, gcsfs, adlfs) can filter server-side up to the first wildcard. Those backends store the filtered result in `dircache` under the parent directory key, but the entry contains only prefix-matching files, making it a partial (misleading) directory listing.

Consequence: after `glob("dir/train-*")`, a call to `glob("dir/test-*")` or `fs.exists("dir/test-file")` hits the stale partial entry and silently returns nothing / `False`, even though the files exist.

This regression was introduced in 2026.4.0 by #1996.
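
The failure mode can be reproduced without any real backend. The sketch below is a toy model only: `ToyFS`, its `remote` list and its simplified `_find`, `ls` and `exists` methods are hypothetical stand-ins written for illustration, not fsspec, s3fs, gcsfs or adlfs code. It assumes the backend caches whatever a prefix-filtered listing returns under the parent directory key.

```python
class ToyFS:
    """Hypothetical stand-in for a caching backend; not real fsspec code."""

    def __init__(self, remote_files):
        self.remote = sorted(remote_files)  # what actually exists on storage
        self.dircache = {}                  # directory -> cached listing

    def _find(self, root, prefix=""):
        # Simulated server-side filtering: only names under root/prefix.
        found = [p for p in self.remote if p.startswith(f"{root}/{prefix}")]
        # The problematic step: the filtered result is cached under the
        # parent key as if it were the complete directory listing.
        self.dircache[root] = found
        return found

    def ls(self, root):
        # Cache hit returns whatever was stored, complete or not.
        if root not in self.dircache:
            self.dircache[root] = [
                p for p in self.remote if p.startswith(root + "/")
            ]
        return self.dircache[root]

    def exists(self, path):
        parent = path.rsplit("/", 1)[0]
        return path in self.ls(parent)


fs = ToyFS(["dir/train-0", "dir/train-1", "dir/test-0"])
train = fs._find("dir", prefix="train-")  # caches a train-only listing
stale_exists = fs.exists("dir/test-0")    # answered from the partial entry
```

Here `stale_exists` ends up `False` although `dir/test-0` is present in `fs.remote`, which mirrors the symptom described above.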
Fix
After `_find` returns with a prefixed query, discard the `dircache` entry for `root`. The result is already captured in `allpaths`, so nothing is lost; the next lookup for the same directory fetches a fresh full listing.
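
A minimal sketch of that idea, again on a toy model rather than the actual patch: `PatchedToyFS` and `find_for_glob` are hypothetical names, and the real change lives inside fsspec's glob code path. The sketch drops the cache entry unconditionally whenever a prefix was used.

```python
class PatchedToyFS:
    """Hypothetical toy backend illustrating the discard-after-find idea."""

    def __init__(self, remote_files):
        self.remote = sorted(remote_files)
        self.dircache = {}

    def _find(self, root, prefix=""):
        found = [p for p in self.remote if p.startswith(f"{root}/{prefix}")]
        self.dircache[root] = found  # backend still caches the filtered result
        return found

    def find_for_glob(self, root, prefix=""):
        # Stand-in for the caller of _find inside glob.
        allpaths = self._find(root, prefix=prefix)
        if prefix:
            # Discard the partial entry; allpaths already holds the result.
            self.dircache.pop(root, None)
        return allpaths

    def ls(self, root):
        if root not in self.dircache:
            self.dircache[root] = [
                p for p in self.remote if p.startswith(root + "/")
            ]
        return self.dircache[root]

    def exists(self, path):
        parent = path.rsplit("/", 1)[0]
        return path in self.ls(parent)


patched = PatchedToyFS(["dir/train-0", "dir/train-1", "dir/test-0"])
train_paths = patched.find_for_glob("dir", prefix="train-")
cache_cleared = "dir" not in patched.dircache
test_exists = patched.exists("dir/test-0")  # triggers a fresh full listing
```

In this toy, the later `exists` call repopulates `dircache["dir"]` with all three files, so subsequent lookups see a complete listing.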