fix: close stale S3BucketRegionCache on session refresh; guard _sessions=None; serialize set_session() under asyncio.Lock#1028
Conversation
When set_session(refresh=True) replaces _s3creator, the old cache is now closed via __aexit__ before being discarded. Without this, the aiobotocore ClientSessions it holds are garbage-collected without an explicit close(), producing aiohttp "Unclosed client session" warnings at process exit. Also track the weakref finalizer as self._finalizer so that an explicit _close() / close() call can detach it, preventing a double-close when the S3FileSystem is subsequently garbage-collected. Closes #NNNN. See also: aio-libs/aiobotocore#866 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…essions=None The _close() / close = sync_wrapper(_close) addition was reverted. zarr's FsspecStore.close() calls _close_fs(fs) which uses fs.set_session() + session.close() — it never calls fs.close() directly, so exposing a sync close() wrapper introduced new behaviour (potential premature shutdown) with no caller. Also guard the stale-session check at set_session() against the newer aiobotocore behaviour where AIOHTTPSession.__aexit__() sets _sessions=None (older versions left it as an empty dict). Without this guard, _sessions.values() raises AttributeError after an explicit session.close(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… on empty _sessions
Two related bugs caused concurrent callers (e.g. zarr's asyncio.gather in
AsyncGroup.open) to tear down each other's in-flight S3 sessions:
1. The stale-session check treated an empty _sessions dict ({} — no
requests sent yet, not genuinely stale) the same as all-closed sessions
because all([]) is vacuously True. Added `sessions and` guard so an
empty dict never triggers a refresh.
2. When multiple coroutines each saw self._s3 is None simultaneously they
each created their own creator, and Fix 1 (close stale creator on
refresh) then closed a creator whose client was still in use by a peer,
setting _sessions=None mid-request. Serialise the entire creator-setup
path under a per-instance asyncio.Lock (created lazily; the check +
assignment between two non-await lines is atomic in asyncio). Waiting
tasks re-check self._s3 under the lock and return early if a peer
already did the setup.
Also guard against _sessions=None (newer aiobotocore sets it to None after
__aexit__ rather than leaving an empty dict) in the stale-session check to
prevent AttributeError on .values().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
martindurant
left a comment
There was a problem hiding this comment.
OK, I think I have wrapped my head around it now. It's a pity that indenting such a big block messes us the look of the diff so badly; it's only the bit with awaits that needs to go inside the lock block correct? Maybe the kwargs can be worked out beforehand with the same indentation as before.
| if not hasattr(self, "_setup_lock"): | ||
| self._setup_lock = asyncio.Lock() | ||
| async with self._setup_lock: | ||
| # Re-check under the lock: a concurrent task may have set up the |
There was a problem hiding this comment.
Since there was no await between creation of the lock and acquiring it, no other coroutine could have run. The only way the lock might have been acquired is in another thread - but asyncio threads are not thread-safe anyway (and all async code should be running in the one event loop on the one thread)
OK, so the model is, that one coroutine might have run to the awaits below (get_client, __aenter__) while this coroutine is waiting for the lock?
There was a problem hiding this comment.
Yes, that's exactly right. A coroutine can yield control at the await get_client(...) and await __aenter__() calls below the lock, allowing a sibling coroutine (e.g. launched via asyncio.gather) to also enter set_session() while the first is suspended. Without the lock they'd each create a separate _s3creator, and when the first resumes and replaces self._s3creator, it calls __aexit__ on the stale one — tearing down a session the sibling is still actively using.
| if not hasattr(self, "_setup_lock"): | ||
| self._setup_lock = asyncio.Lock() |
There was a problem hiding this comment.
Prefer to use self._setup_lock = None in __init__.
Note that we don't remove the lock - should it be made in __init__ instead?
There was a problem hiding this comment.
Good point on declaring it in __init__ — happy to add self._setup_lock = None there for clarity. The reason the actual asyncio.Lock() creation stays lazy rather than moving entirely to __init__ is that AbstractFileSystem uses instance caching via __new__, so an S3FileSystem instance may be constructed outside any async context and potentially reused across different event loops. Creating the Lock lazily inside set_session() (which is always called from within an async context) ensures it's bound to the right loop. On Python 3.10+ this is less of a concern since locks no longer bind at construction time, but keeping it lazy is the safer cross-version choice.
There was a problem hiding this comment.
On Python 3.10+ this is less of a concern
We don't support 3.9 any more
- Move asyncio.Lock() creation from lazy hasattr check into __init__, valid now that Python 3.9 is no longer supported and locks no longer bind to the event loop at construction time. - Extend the stale-session check to treat sessions=None as stale. aiobotocore 3.x sets _sessions=None after __aexit__, whereas 2.x leaves a dict of closed sessions; without this fix test_session_close fails on aiobotocore 3.x because the second asyncio.run reuses a closed creator that is never refreshed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Only the awaits, session creation, and state mutations need to be inside the lock block. The kwargs computation is pure synchronous reads that can sit at the original indentation level, keeping the diff readable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Can you check with #1020 - does that also offer the right guard now? |
Problem
Three related issues in
S3FileSystem.set_session()that together causeaiohttpUnclosed client session/Unclosed connectorwarnings when the server closes connections mid-session:Stale
_s3creatoris silently abandoned on refresh. Whenset_session(refresh=True)replacesself._s3creatorwith a newS3BucketRegionCache, the old cache is never closed. It holds aiobotocore-createdaiohttp.ClientSessionobjects (1 general + up to N regional). GC'd without an explicitclose()→ aiohttp emits warnings.Vacuous-truth refresh on empty
_sessions.all(_.closed for _ in [])is alwaysTrue, so a freshly-enteredAIOHTTPSessionwith_sessions = {}(no requests sent yet) incorrectly triggersrefresh=True. This causes the next issue to fire spuriously.Concurrent callers (e.g.
asyncio.gather) each create their own creator. Whenself._s3 is None, multiple concurrent coroutines all fall through to the setup path. Without serialization, Task B's Fix 1 (close stale creator) tears down Task A's creator while Task A is mid-request — setting_sessions = Noneon an activeAIOHTTPSession, producing:at
aiobotocore/httpsession.py.The stale-check also needs to guard against newer aiobotocore setting
_sessions = None(rather than{}) after__aexit__.Changes
s3fs/core.pyFix 1 — Close stale
_s3creatoron refresh: Beforeself._s3creator = s3creator, callawait old_creator.__aexit__(None, None, None)so all regional aiobotocore clients and their aiohttp sessions are closed cleanly.Fix 2 — Guard stale-session check: Change the stale-session refresh condition to:
sessions is not None— newer aiobotocore sets_sessions = Noneafter__aexit__; treatNoneas "not stale, just closed"sessions(non-empty) — an empty dict means no requests have been sent yet; not staleFix 3 — Serialize creator setup under
asyncio.Lock: Wrap the entire creator-setup path inasync with self._setup_lock. Concurrent tasks wait at the lock, then re-checkself._s3 is not Noneand reuse the already-created session rather than creating their own. This makes Fix 1 safe: the old creator is only closed once, by the task that wins the lock, before any new sessions are open.s3fs/tests/test_s3fs.pytest_stale_creator_closed_on_refresh— patches__aexit__on the old creator and asserts it was called duringset_session(refresh=True)docs/source/changelog.rst— Unreleased section added.Motivation
Discovered while integrating zarr v3's
FsspecStore.close()(zarr-developers/zarr-python#4003). zarr'sasyncio.gatherfires 4 concurrentFsspecStore.get()calls on the sameS3FileSysteminstance, triggering issue 3. Issues 1 and 2 produce residual unclosed-session warnings once the render completes.Related: aiobotocore/aiobotocore#866
Test
All existing tests pass. New test confirms
__aexit__is called on the old creator during refresh.