Fix Octopus Direct: refresh JWT on HTTP 401/403 (intelligent go slots) #3833
springfall2008 merged 2 commits into main
When the Octopus API returns a bare HTTP 401 or 403 (e.g. while polling intelligent go dispatch slots), the existing token-refresh logic was bypassed because it only checked the *response body* for Kraken GraphQL error codes. A bare HTTP auth error produced a None body, so the stale token was never refreshed and the error looped until an add-on restart.

Fix: check response.status for 401/403 before reading the body in async_graphql_query, and force a token refresh + single retry on the first attempt (_retry_count == 0), mirroring the existing GraphQL-body error-code path.

Add Tests 10 and 10b to test_octopus_rate_limit.py covering the new HTTP-status-level auth-error path and a failed-refresh edge case. Move the test summary outside the for/else block so it covers all tests.

Agent-Logs-Url: https://github.com/springfall2008/batpred/sessions/b41b0847-5974-414f-bf29-b2623e36cf30
Co-authored-by: springfall2008 <48591903+springfall2008@users.noreply.github.com>
Pull request overview
Fixes Octopus Direct GraphQL polling getting stuck after transport-level auth failures by refreshing the JWT on bare HTTP 401/403 responses and retrying once.
Changes:
- Update `OctopusAPI.async_graphql_query` to detect HTTP 401/403 before reading/parsing the response body, then force token refresh + single retry.
- Add regression tests covering the new HTTP-status auth failure path (401 and 403) and the “refresh fails” case; adjust test summary placement to always run.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| apps/predbat/octopus.py | Adds HTTP-status-level auth failure handling (401/403) with forced refresh + single retry in async_graphql_query. |
| apps/predbat/tests/test_octopus_rate_limit.py | Adds new tests (10/10b) for 401/403 refresh+retry behavior and ensures the summary executes for all outcomes. |
```python
async with client.post(url, json=payload, headers=headers) as response:
    # Check for HTTP-level 401/403 (transport-level auth failure) and retry once.
    # This handles cases where the JWT has been revoked server-side and the server
    # returns a bare 401/403 status rather than a GraphQL error body, which would
    # otherwise loop forever without ever refreshing the token.
    if response.status in [401, 403] and _retry_count == 0:
        self.log(f"OctopusAPI: HTTP {response.status} for graphql query {request_context}, forcing token refresh and retry")
        record_api_call("octopus", False, "auth_error")
        self.graphql_token = None
        retry_token = await self.async_refresh_token()
        if retry_token is None:
            self.failures_total += 1
            self.log(f"Warn: OctopusAPI: Failed to refresh token for retry of graphql query {request_context}")
            return None
        return await self.async_graphql_query(query, request_context, returns_data=returns_data, ignore_errors=ignore_errors, _retry_count=1, use_backend=use_backend)
```
The new 401/403 retry path awaits the recursive retry call while still inside the async with client.post(...) as response: block. That means the original response isn't released back to the aiohttp connection pool until after the retry finishes, which can temporarily tie up a connection slot (and can become problematic if connector limits are low). Consider releasing/closing the response (or deferring the retry until after the async with exits) before awaiting the retry request.
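A minimal sketch of the "defer the retry until after the `async with` exits" option, assuming nothing about the real `OctopusAPI` beyond what the diff shows. `FakeResponse`, `FakeClient`, `refresh_token`, `graphql_query` and `run_demo` are hypothetical stand-ins (no aiohttp involved), used only to show that at most one response is held open at a time when the retry happens outside the block:

```python
import asyncio


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; counts responses held open."""

    open_count = 0
    max_open = 0

    def __init__(self, status):
        self.status = status

    async def __aenter__(self):
        FakeResponse.open_count += 1
        FakeResponse.max_open = max(FakeResponse.max_open, FakeResponse.open_count)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Leaving the block models releasing the connection back to the pool.
        FakeResponse.open_count -= 1
        return False

    async def json(self):
        return {"data": {"ok": True}}


class FakeClient:
    """Hypothetical client that replays a scripted list of HTTP statuses."""

    def __init__(self, statuses):
        self.statuses = list(statuses)

    def post(self, url, json=None, headers=None):
        return FakeResponse(self.statuses.pop(0))


async def refresh_token():
    # Assumption: a successful refresh returns a new token string.
    return "new-token"


async def graphql_query(client, token, _retry_count=0):
    async with client.post("https://example.invalid/graphql", json={}, headers={"Authorization": token}) as response:
        auth_failed = response.status in (401, 403)
        if not auth_failed:
            return (await response.json()) if response.status == 200 else None
    # The first response has been released by this point.
    if _retry_count > 0:
        return None  # already retried once; give up
    new_token = await refresh_token()
    if new_token is None:
        return None
    return await graphql_query(client, new_token, _retry_count=1)


def run_demo():
    client = FakeClient([401, 200])
    result = asyncio.run(graphql_query(client, "stale-token"))
    return result, FakeResponse.max_open
```

With the retry awaited inside the `async with` block instead, the same scripted run would hold two responses open at once.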
When the Octopus API returns a bare HTTP 401/403 (no GraphQL body), `async_graphql_query` would silently return `None` without refreshing the JWT. The existing refresh logic only triggered on Kraken GraphQL error codes in the response body. A transport-level auth failure bypassed it entirely, causing every subsequent poll (e.g. intelligent go dispatch slots) to fail identically until an add-on restart.

Changes
- `octopus.py`, `async_graphql_query`: Check `response.status` for 401/403 before reading the body. On the first attempt, clear the token, force a refresh, and retry once, mirroring the existing GraphQL-body error-code path.
- `test_octopus_rate_limit.py`: Added Tests 10 and 10b covering the HTTP-status-level auth-error path (both 401 and 403 trigger refresh+retry; failed refresh returns `None` and increments `failures_total`). Also moved the test summary block outside the `for`/`else` so it covers all tests.
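The two behaviours these tests target can be illustrated with a small self-contained sketch. This is not the code from `test_octopus_rate_limit.py`: `MiniOctopusAPI` and both `check_*` functions are hypothetical, and the class models only the 401/403 branch with scripted statuses in place of real HTTP calls.

```python
import asyncio


class MiniOctopusAPI:
    """Hypothetical stand-in for OctopusAPI; models only the 401/403 path."""

    def __init__(self, statuses, refresh_result):
        self.statuses = list(statuses)        # scripted HTTP status per request
        self.refresh_result = refresh_result  # value the fake refresh returns
        self.graphql_token = "stale"
        self.failures_total = 0
        self.requests = 0
        self.refreshes = 0

    async def async_refresh_token(self):
        self.refreshes += 1
        self.graphql_token = self.refresh_result
        return self.graphql_token

    async def async_graphql_query(self, _retry_count=0):
        self.requests += 1
        status = self.statuses.pop(0)
        if status in (401, 403) and _retry_count == 0:
            # Clear the stale token, force a refresh, retry exactly once.
            self.graphql_token = None
            if await self.async_refresh_token() is None:
                self.failures_total += 1
                return None
            return await self.async_graphql_query(_retry_count=1)
        return {"data": {}} if status == 200 else None


def check_refresh_and_retry():
    # Both 401 and 403 should trigger one refresh and one retry.
    for status in (401, 403):
        api = MiniOctopusAPI([status, 200], refresh_result="fresh")
        result = asyncio.run(api.async_graphql_query())
        assert result == {"data": {}}
        assert (api.requests, api.refreshes, api.failures_total) == (2, 1, 0)
    return True


def check_failed_refresh():
    # A failed refresh should return None and count one failure, with no retry.
    api = MiniOctopusAPI([401], refresh_result=None)
    result = asyncio.run(api.async_graphql_query())
    assert result is None
    assert (api.requests, api.refreshes, api.failures_total) == (1, 1, 1)
    return True
```

Scripting the statuses keeps the check deterministic: the request count distinguishes "retried once" from "gave up after a failed refresh".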