Skip to content

Don't rebuild HTTP session with no connections#1020

Merged
martindurant merged 8 commits into
fsspec:mainfrom
martindurant:no_rebuild
Jun 4, 2026
Merged

Don't rebuild HTTP session with no connections#1020
martindurant merged 8 commits into
fsspec:mainfrom
martindurant:no_rebuild

Conversation

@martindurant

@martindurant martindurant commented Apr 20, 2026

Copy link
Copy Markdown
Member

Fixes #1019 Fixes #1025

Closes #1028

@martindurant

Copy link
Copy Markdown
Member Author

@Koncopd , latest aiobotocore doesn't like that; but I'm not sure why we are getting an unstarted session the second time around.

@martindurant

Copy link
Copy Markdown
Member Author

Given that there isn't much cost to "rebuilding" the http_sessions without any actual live sessions, and that it only happens once, I think it's not worth trying to prob aiobotocore in the right direction here. What do you think, @Koncopd ?

@Koncopd

Koncopd commented Apr 20, 2026

Copy link
Copy Markdown

As I said, my problem is that rebuilding aiobotocore session (not http sessions) erases the credentials I pass. And if I provide a session, it should not happen that I just lose it on set_session.

@Koncopd

Koncopd commented Apr 20, 2026

Copy link
Copy Markdown

Here I mean

self.session = aiobotocore.session.AioSession(**self.kwargs)

@Koncopd

Koncopd commented Apr 20, 2026

Copy link
Copy Markdown

Btw the failing test (test_session_close) looks strange to me, it runs run_program(run) with True and False, but run parameter doesn’t seem to be passed anywhere inside run_program. I assume it just runs the same code, but fails on the second execution only, so the problem is probably due to caching.

@martindurant

Copy link
Copy Markdown
Member Author

it just runs the same code, but fails on the second execution only

Yes, that is exactly the idea. If I remember, the variable is there to allow a breakpoint in the function, where we know which invocation we are on.

@Koncopd

Koncopd commented Apr 22, 2026

Copy link
Copy Markdown

Thank you, i will try to check the problem later.

@lukedyer-uipath

Copy link
Copy Markdown

Hi — came at this from a perf regression we hit downstream (one of our test suites went from 49s to 173s on s3fs 2026.4.0). I put together a small repro that walks through the states the cached-session check can end up in, in case it's useful: https://github.com/lukedyer-uipath/s3fs-pr1002-repro

I worked through the diagnosis and built the linked repro with help from Claude Code, so any of the claims below are worth a sanity check before acting on — flagging that up front.

I might be misreading some of this, so please correct me, but it looks like there are a couple of states beyond the empty-dict case from #1019 that this PR's hsess._sessions and all(...) might not catch — sharing in case they shape the fix here:

_sessions is None. On aiobotocore 3.x, AIOHTTPSession.__aexit__ clears the dict and sets self._sessions = None, so after await client.close() the getattr(...).values() chain raises AttributeError. The repro from #1001 seems to land here on 3.x. With the hsess._sessions and all(...) guard the check would short-circuit cleanly, but I think the cached client (with its now-closed underlying session) would still be returned on the next call, so the failure might just shift downstream — happy to be wrong about that.

Closed inner sessions with _sessions intact. This is the one we're hitting: when client.close() runs on aiobotocore 2.x, or test fixtures (aiomoto in our case) close their managed clients on teardown, the dict is left populated with aiohttp.ClientSession objects whose .closed = True. The all(...) then fires True on every set_session(), so we get a fresh _create_client + botocore.load_service_model (~25ms disk read) per S3 op. cProfile on one of our tests:

                       s3fs 2026.1.0   s3fs 2026.4.0
set_session            50              164
_create_client         8 (84% hit)     164 (0% hit)
load_service_model     9               101 (~4.3s)

I don't have a confident view on the right shape of the fix — .closed is a reasonable proxy when the user has explicitly closed the client, but it's also true in cases where the cached client is still serviceable (the empty-dict case here, and the closed-but-loop-still-fine case above). A stricter signal might help (something that distinguishes "this client can't be used anymore" from "this client has done some work that's been cleaned up"), but I appreciate that's hard to extract without poking at aiohttp internals further than feels comfortable. Both feel like calls only you'd be in a position to make.

For what it's worth, we've sidestepped this downstream by replacing aiomoto with ThreadedMotoServer over real HTTP — no fixture closes inner sessions, so the check stays quiet. We're unblocked either way; just wanted to share what we found in case it's useful.

@martindurant

Copy link
Copy Markdown
Member Author

Iterating with the bot has produced this suggested code (rather verbose, of course), for discussion. The shape of it feels ok and, more importantly, the tests poke some of the conditions. This also folds in the anti-race asyncio lock.
@lukedyer-uipath @mark-boer you have thoughts?

@lukedyer-uipath

Copy link
Copy Markdown

This fixes the performance regression issue I was having. Thanks for taking a look into it!

A suggestion of an extra test to cover the niche performance issue I found 👇
def test_set_session_closed_sessions_rebuilds_once(s3):
    """Regression test for the #1019 perf regression: a populated-but-all-closed
    _sessions dict must rebuild the client exactly once, then reuse it. The
    rebuilt client has an empty _sessions dict, which must not count as
    "all closed" (vacuous all([]) == True) and force a refresh on every call.
    """
    import aiobotocore.session as aio_session
    from unittest import mock

    s3.pipe(f"{test_bucket_name}/dir/afile", b"small")

    create_client_calls = 0
    original_create_client = aio_session.AioSession._create_client

    async def counting_create_client(self, *args, **kwargs):
        nonlocal create_client_calls
        create_client_calls += 1
        return await original_create_client(self, *args, **kwargs)

    async def run():
        fs = S3FileSystem(
            anon=False,
            asynchronous=True,
            client_kwargs={"endpoint_url": endpoint_uri},
            skip_instance_cache=True,
        )
        await fs._ls(f"{test_bucket_name}/dir")  # populates _sessions
        sessions = fs._s3._endpoint.http_session._sessions
        assert sessions, "expected a populated _sessions dict after an op"

        for sess in sessions.values():
            await sess.close()

        baseline = create_client_calls  # ignore the warmup build above
        iterations = 10
        for _ in range(iterations):
            await fs.set_session()

        rebuilds = create_client_calls - baseline
        assert rebuilds == 1, (
            f"set_session rebuilt the client {rebuilds} times across {iterations} "
            f"calls; expected 1. >1 means the empty-dict case forces a refresh on "
            f"every call (#1019 regression)."
        )
        await fs.set_session(refresh=True)  # clean up

    with mock.patch.object(
        aio_session.AioSession, "_create_client", counting_create_client
    ):
        asyncio.run(run())

@martindurant martindurant merged commit 162f23e into fsspec:main Jun 4, 2026
16 checks passed
@martindurant martindurant deleted the no_rebuild branch June 4, 2026 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race condition in s3fs s3fs>=2026.2.0 always rebuilds the session

3 participants