Skip to content

restrict cache storage directory permissions to owner#2048

Merged
martindurant merged 2 commits into
fsspec:masterfrom
sahvx655-wq:cache-dir-permissions
Jun 26, 2026
Merged

restrict cache storage directory permissions to owner#2048
martindurant merged 2 commits into
fsspec:masterfrom
sahvx655-wq:cache-dir-permissions

Conversation

@sahvx655-wq

Copy link
Copy Markdown
Contributor

While looking at how filecache/blockcache/simplecache persist downloaded data, I noticed an inconsistency in the permissions they leave behind. The cache metadata file ends up 0o600 because it goes through atomic_write, which creates its temporary via mkstemp, but the cache directory and the cached data files inside it do not. CachingFileSystem.__init__, _mkcache and SimpleCacheFileSystem.__init__ each create the storage directory with a bare os.makedirs(..., exist_ok=True), so under a normal 0o022 umask the directory comes out 0o755 and the cached files 0o644. When cache_storage points at a persistent location (the documented way to keep a cache between runs), any other local user can traverse the directory and read another user's cached content, which for the http/s3/gcs backends is data that was fetched with that user's credentials.

The change sets mode=0o700 on the three os.makedirs calls that create the cache storage directory, so the directory fsspec owns becomes owner-only and the data files within inherit that protection whichever backend wrote them. Keeping it at the directory-creation point rather than chmod-ing each cached file individually puts the guard in one place and covers all three cache implementations and every write path through them; directories fsspec did not create are left as they are. I have added a parametrised regression test over filecache/simplecache/blockcache that fails on the current tree (the directory is 0o755) and passes once the mode is applied.

@sahvx655-wq

Copy link
Copy Markdown
Contributor Author

gentle ping

@martindurant

Copy link
Copy Markdown
Member

Hmm, I am not sure what the default should be here. The user, after all, has complete control over where the data is written, so if they want to share privileged data with other users, we shouldn't hinder that.

I suggest that this can be made into an argument for the caching filesystems, but the default should remain what it is now (i.e., None, system default).

Default of None keeps the system default (umask), preserving current
behaviour; passing an octal mode such as 0o700 restricts the cache
storage directory to the owner.
@sahvx655-wq

Copy link
Copy Markdown
Contributor Author

Fair point - if someone deliberately points the cache at a shared location we shouldn't override that. I've reworked it the way you suggested: there's now a cache_storage_mode argument on the caching filesystems, defaulting to None so the directory is created exactly as before (system default under the umask). Passing an octal mode like 0o700 is opt-in and keeps the cache directory, and so the data files inside it, owner-only.

It's threaded through the three places that create the storage dir (CachingFileSystem.__init__, _mkcache and SimpleCacheFileSystem.__init__) so it covers filecache/blockcache/simplecache, and the regression test now exercises the argument rather than asserting a restricted default.

@martindurant

Copy link
Copy Markdown
Member

(gcsfs tests fixed in #2061 )

@martindurant martindurant merged commit d256591 into fsspec:master Jun 26, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants