Summary
We are seeing long-running HTTP MCP backend tool calls fail at around 120 seconds even when the gateway and caller are configured to allow much longer execution.
This appears to be caused by a hardcoded overall HTTP client timeout in the HTTP backend transport path, which is distinct from the documented and configurable gateway toolTimeout and per-server connect_timeout.
In practice, this means:
- `toolTimeout` can be set higher, but backend HTTP tool calls still fail around 120s.
- `connect_timeout` is not sufficient because it only covers transport setup / connect behavior.
- Callers see transport errors like `Client.Timeout exceeded while awaiting headers` even though the backend may still be processing the request.
Observed behavior
From a real workflow using an HTTP MCP backend:
- the backend tool call ran long enough to exceed the transport timeout
- the caller failed with an HTTP client timeout
- the backend then treated the request as disconnected and canceled the work
Representative error:

```
MCP error -32005: Post "http://host.docker.internal:8000/mcp": context deadline exceeded
(Client.Timeout exceeded while awaiting headers)
```
This was initially confusing because the surrounding system had larger time budgets configured, but the failure still occurred near the transport layer.
Why this looks like an mcpg issue
There are three different timeout concepts involved:
1. Gateway `toolTimeout`
   - This is documented and configurable.
   - Default appears to be 60s.
2. Per-server HTTP `connect_timeout`
   - This is documented and configurable.
   - It applies to HTTP transport establishment / fallback attempts.
3. Overall HTTP request timeout for HTTP backends
   - This appears to be hardcoded to 120s in the HTTP client used by `NewHTTPConnection`.
   - This is the timeout that seems to terminate long-running backend tool calls.

The key problem is that item 3 is currently the real ceiling for long-running HTTP tool calls, regardless of larger configured `toolTimeout` values.
Relevant implementation details
In `NewHTTPConnection`, the HTTP client is created with a fixed overall timeout:

```go
httpClient := &http.Client{
    Timeout: 120 * time.Second, // Overall request timeout
    Transport: &http.Transport{
        DialContext: (&net.Dialer{
            Timeout: connectTimeout,
        }).DialContext,
        // ...
        ResponseHeaderTimeout: connectTimeout,
    },
}
```
There is already explicit config support for:
- gateway `toolTimeout`
- server `connect_timeout`

And the code/comments already distinguish `connect_timeout` from the HTTP client's overall timeout.
There are also tests that explicitly acknowledge the 120s client timeout, for example an integration timeout test comment that says it is skipped because "the HTTP client has a 120s timeout".
That makes this look like current transport behavior rather than caller-side misconfiguration.
Why `connect_timeout` is not the fix
The docs and code indicate that `connect_timeout` is for transport setup and fallback attempts:
- streamable HTTP connect
- SSE connect
- plain JSON fallback connection behavior

It is not an end-to-end execution timeout for a long-running `tools/call` request after the transport is already established.
So increasing `connect_timeout` does not address the main issue for long-running HTTP tool execution.
Impact
This affects any HTTP MCP backend that can legitimately take more than about 120 seconds to return a result, including:
- large-repo semantic/query backends
- indexing/query services
- long-running analysis tools
- backends that stream late or do significant preprocessing before sending headers
The net effect is that:
- callers cannot rely on the configured `toolTimeout` for HTTP backends
- long-running tools fail with transport-level timeout errors
- backend services may continue work until they notice the disconnect
- diagnosing the problem is difficult because the failure looks like a backend/tool issue, but the hard limit is actually in the gateway transport
Example failure
Example workflow run:
This run is useful because it shows that:
- the repo-mind HTTP backend started successfully
- the MCP gateway connected successfully to the backend
- the later failure happened during the actual backend tool call, not during startup
Relevant evidence from the run:

```
✓ repo-mind: connected
⏱️ TIMING: Server check for repo-mind took 60ms
✗ search (MCP: repo-mind) · 401 unauthorized pull request public repository signed out unauthentica…
└ MCP server 'repo-mind': McpError: MCP error -32005: calling "tools/call": sending "tools/call":
  rejected by transport: Post "http://host.docker.internal:8000/mcp": context deadline exceeded
  (Client.Timeout exceeded while awaiting headers)
```
And from the MCP gateway summary for the same run:
- 🔍 rpc **repo-mind**→`tools/call` `search`
- 🔍 rpc **repo-mind**←`resp` ⚠️`calling "tools/call": sending "tools/call": rejected by transport: Post "http://host.docker.internal:8000/mcp": context deadline exceeded (Client.Timeout exceeded while awaiting headers)`
This is the important part: the backend was already connected, tools were already registered, and the failure still occurred during `tools/call`. That points at the execution/request timeout path, not the connection timeout path.
Expected behavior
One of the following should be true:

1. The HTTP backend request timeout should be derived from the gateway `toolTimeout`.
   - If `toolTimeout` is 600s, the underlying HTTP request should be allowed to run that long.
2. The HTTP backend request timeout should be separately configurable.
   - For example, a per-server `request_timeout` or a gateway-level `httpRequestTimeout`.
3. At minimum, the effective timeout behavior should be documented clearly.
   - If a hardcoded 120s limit is intended, that should be explicit in the docs, because it overrides the practical usefulness of larger `toolTimeout` values for HTTP backends.
Actual behavior
HTTP backends appear to be capped by a hardcoded 120s client timeout even when:
- the caller has a larger tool budget
- the gateway `toolTimeout` is larger
- the backend is still actively processing the request
Suggested fix
Preferred:
- Make the HTTP client request timeout configurable instead of hardcoded.

Reasonable options:
- Derive the HTTP client timeout from the gateway `toolTimeout`.
- Add a new explicit timeout field for HTTP backend request execution.
- If both exist, use a clear precedence rule.
- Consider relying on request context deadlines consistently instead of a fixed `http.Client.Timeout` when possible.

The cleanest model seems to be:
- `connect_timeout`: connection/setup/fallback
- `toolTimeout`: gateway/tool execution budget
- HTTP request timeout: either derived from `toolTimeout` or configurable explicitly, but not silently hardcoded to 120s
Potential acceptance criteria
- HTTP MCP backend tools can run longer than 120s when gateway/tool configuration allows it.
- A configured larger `toolTimeout` is actually honored for HTTP backends.
- There is test coverage showing a long-running HTTP backend request is not cut off at 120s when configuration permits it.
- Documentation clearly distinguishes:
  - startup timeout
  - connect timeout
  - tool execution timeout
  - HTTP backend request timeout
Minimal reproduction idea
1. Start `gh-aw-mcpg` with an HTTP backend server.
2. Expose a tool that intentionally sleeps for more than 120 seconds before returning.
3. Configure a gateway `toolTimeout` larger than 120 seconds.
4. Invoke the tool through the gateway.
5. Observe that the request still fails around 120 seconds with an HTTP client timeout.
Additional context
This came up while debugging a real GitHub Actions workflow that already had larger tool budgets configured. The workflow-side timeouts were not the limiting factor. The failure was due to the transport ceiling in the HTTP backend path.
If useful, I can also provide:
- the full caller and backend log excerpts from the workflow run
- a concrete repro server
- a proposed patch direction for threading the timeout through configuration and transport creation