[BUG] MCP Streamable HTTP Server: CLOSE-WAIT socket leak exhausts Tomcat thread pool when clients disconnect #6384

## Description

When using `spring-ai-starter-mcp-server-webmvc` with the Streamable HTTP transport, the server does not release sockets and Tomcat threads after an MCP client disconnects. Connections accumulate in the TCP `CLOSE-WAIT` state indefinitely, eventually exhausting the Tomcat thread pool and making the server completely unresponsive, including to health checks.

## Environment

- **Spring AI**: 1.1.4
- **Spring Boot**: 3.4.1
- **Java**: JDK 25
- **Server**: Tomcat 10.1.34 (WAR deployment)
- **OS**: Linux (Kubernetes, 2 CPU / 4 GB)

## Configuration

```yaml
spring:
  ai:
    mcp:
      server:
        type: SYNC
        protocol: STREAMABLE
        streamable-http:
          mcp-endpoint: /mcp
          keep-alive-interval: 0s
```

## Steps to Reproduce

1. Deploy an MCP Server using `spring-ai-starter-mcp-server-webmvc` with Streamable HTTP transport
2. Have an external MCP client send `POST /mcp` requests (initialize → tools/call)
3. Client receives the response and closes the TCP connection (sends FIN)
4. Repeat at moderate frequency (e.g., 2-3 requests/second from a load balancer)
5. After ~60 seconds, observe socket states on the server
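
A minimal client sketch for steps 2 to 4, using only the JDK. The class name `McpReproClient`, the `clientInfo` values, the `2025-03-26` protocol version, and the `localhost:8080` target are assumptions for illustration and may need adjusting for your deployment. It sends one `initialize` request per connection, reads at most the first chunk of the response, and closes the socket without waiting for the server. It does not perform the follow-up `tools/call`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class McpReproClient {

    // Assumed initialize payload; the clientInfo values are placeholders.
    static final String INITIALIZE_BODY =
        "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"initialize\",\"params\":{"
        + "\"protocolVersion\":\"2025-03-26\",\"capabilities\":{},"
        + "\"clientInfo\":{\"name\":\"repro-client\",\"version\":\"0.0.1\"}}}";

    /** Builds a raw HTTP/1.1 POST request carrying a JSON body. */
    static String buildPost(String host, int port, String path, String jsonBody) {
        int length = jsonBody.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
            + "Host: " + host + ":" + port + "\r\n"
            + "Content-Type: application/json\r\n"
            + "Accept: application/json, text/event-stream\r\n"
            + "Content-Length: " + length + "\r\n"
            + "\r\n"
            + jsonBody;
    }

    /** Sends the request, reads at most one chunk of the response, then closes the socket. */
    static String sendAndClose(String host, int port, String request) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(5_000); // do not hang forever if the server is already stuck
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.UTF_8));
            out.flush();
            byte[] buf = new byte[8192];
            int n = socket.getInputStream().read(buf);
            return n > 0 ? new String(buf, 0, n, StandardCharsets.UTF_8) : "";
        } // leaving the try block closes the client side of the connection
    }

    public static void main(String[] args) throws Exception {
        String request = buildPost("localhost", 8080, "/mcp", INITIALIZE_BODY);
        for (int i = 0; i < 180; i++) { // roughly 60 seconds at 3 requests/second
            try {
                sendAndClose("localhost", 8080, request);
            } catch (IOException e) {
                System.err.println("request " + i + " failed: " + e.getMessage());
            }
            Thread.sleep(333);
        }
    }
}
```

While this runs, `ss -tnp | grep 8080` on the server should show whether `CLOSE-WAIT` entries accumulate.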

## Observed Behavior

```bash
$ ss -tlnp | grep 8080
LISTEN 151    150    *:8080    *:*     # backlog full

$ ss -tnp | grep 8080 | head -5
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:47140
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:42756
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:47446
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:50138
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:43160

$ ss -tnp | grep 8080 | wc -l
150   # all threads consumed

$ curl --max-time 5 http://localhost:8080/health
curl: (28) Failed to connect to localhost port 8080: Connection timed out
```

- All 150 connections are in `CLOSE-WAIT`: the client sent FIN, but the server never called `close()`
- All connections come from the same upstream load balancer IP
- The Tomcat thread pool is fully exhausted, so no new requests can be processed
- The application itself started successfully (`Started in 12.6 seconds`)
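
To track this condition over time, the `ss` output can be summarized programmatically. The following is a rough sketch, not part of Spring AI or Tomcat: `CloseWaitCounter` and its methods are hypothetical names, and it assumes the connection state is the first column of each line, as in the output above.

```java
import java.util.List;

final class CloseWaitCounter {

    /** Counts lines whose first column is CLOSE-WAIT. */
    static long countCloseWait(List<String> ssLines) {
        return ssLines.stream()
            .map(String::strip)
            .filter(line -> line.startsWith("CLOSE-WAIT"))
            .count();
    }

    /** True when stuck connections reach the given fraction of the capacity. */
    static boolean isNearExhaustion(long closeWait, int capacity, double threshold) {
        return closeWait >= capacity * threshold;
    }

    public static void main(String[] args) {
        // Sample lines copied from the ss output in this report.
        List<String> sample = List.of(
            "CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:47140",
            "CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:42756",
            "LISTEN 151    150    *:8080    *:*");
        long stuck = countCloseWait(sample);
        System.out.println("CLOSE-WAIT connections: " + stuck);
        System.out.println("near exhaustion: " + isNearExhaustion(stuck, 150, 0.8));
    }
}
```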

## Expected Behavior

When a client closes the TCP connection, the MCP server transport should:

1. Detect the peer shutdown (IOException on write or read returning -1)
2. Complete or cancel the Servlet async context
3. Close the server-side socket
4. Release the Tomcat thread back to the pool
5. Remove the session from the internal session map
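
Steps 1 and 3 can be illustrated at the plain socket level. This is a sketch using only `java.net` on the loopback interface, not how Tomcat or the Spring AI transport is implemented; `PeerShutdownDemo` and `drainAndClose` are hypothetical names. The server side reads until `read` returns -1, which signals that the peer has shut down its side, and then closes its own socket so the connection does not linger in `CLOSE-WAIT`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class PeerShutdownDemo {

    /** Reads until the peer's end-of-stream (-1), then closes the local socket. */
    static long drainAndClose(Socket accepted) throws IOException {
        long total = 0;
        try (Socket socket = accepted) {
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n; // data sent before the peer closed
            }
        } // try-with-resources closes the server-side socket here
        return total;
    }

    /** Runs a client and server on loopback; true if the server saw EOF and closed. */
    static boolean run() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            Socket accepted = server.accept();
            client.getOutputStream().write("ping".getBytes(StandardCharsets.US_ASCII));
            client.close(); // client disconnects after sending 4 bytes
            long bytes = drainAndClose(accepted);
            return bytes == 4 && accepted.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("server closed after peer shutdown: " + run());
    }
}
```

In a servlet container the equivalent signal usually surfaces as an `IOException` on write or as an async error callback rather than a direct `read` call.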

## Root Cause Analysis

The Streamable HTTP transport appears to hold open an SSE-style async response for each MCP session. When the client disconnects, the following seems to happen:

- The `AsyncContext` is never completed or timed out
- The server-side `OutputStream`/`Sink` never checks whether the client is still connected
- The Servlet container keeps the thread allocated to the async response
- The OS socket enters `CLOSE-WAIT` because the server process never calls `close()`

## Impact

- **Severity: Critical** in production with active MCP clients
- Server becomes completely unresponsive within about a minute under moderate load (2-3 requests/second)
- Kubernetes rolling deployments fail because new pods are immediately flooded by retrying clients
- No automatic recovery; the server must be restarted and upstream traffic stopped
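
As a rough consistency check on the timing, the sketch below estimates how long the server survives if every request leaks exactly one connection and none is ever reclaimed. That is an assumption, as is treating the 150 stuck connections seen in the `ss` output as the capacity. `ExhaustionEstimate` is a hypothetical helper name.

```java
final class ExhaustionEstimate {

    /** Seconds until `capacity` connections are leaked at a constant leak rate. */
    static double secondsToExhaustion(int capacity, double leakedPerSecond) {
        if (leakedPerSecond <= 0) {
            throw new IllegalArgumentException("leakedPerSecond must be positive");
        }
        return capacity / leakedPerSecond;
    }

    public static void main(String[] args) {
        // 150 connections at 2 and 3 requests/second, the rates from the reproduction steps.
        System.out.println(secondsToExhaustion(150, 2.0)); // 75.0
        System.out.println(secondsToExhaustion(150, 3.0)); // 50.0
    }
}
```

The resulting 50 to 75 seconds brackets the roughly 60 seconds reported in the reproduction steps.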

## Workaround

Configure Tomcat to force-close idle connections:

```yaml
server:
  tomcat:
    connection-timeout: 30000
    keep-alive-timeout: 30000
    max-connections: 200
    threads:
      max: 200
```

This lets Tomcat reclaim stuck connections after 30 seconds, but it only mitigates the symptom and is not a proper fix.

## Suggested Fix

Register an `AsyncListener` on the servlet async context when opening the SSE stream:

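```java
import jakarta.servlet.AsyncEvent;
import jakarta.servlet.AsyncListener;

// sessionId and cleanupSession(...) stand in for the transport's own session handling.
asyncContext.addListener(new AsyncListener() {
    @Override
    public void onComplete(AsyncEvent event) { cleanupSession(sessionId); }

    // Fired when the async timeout elapses without the response being completed.
    @Override
    public void onTimeout(AsyncEvent event) { cleanupSession(sessionId); }

    // Fired when an async operation fails, for example a write to a disconnected client.
    @Override
    public void onError(AsyncEvent event) { cleanupSession(sessionId); }

    // No cleanup is needed when a new async cycle starts.
    @Override
    public void onStartAsync(AsyncEvent event) { }
});
```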

Additionally, set a reasonable async timeout on the context (e.g., 5 minutes via `AsyncContext.setTimeout(...)`, which takes milliseconds) so that abandoned sessions are eventually reclaimed even without explicit client disconnect detection.
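
Independently of the container, the transport could also sweep its own session map. The sketch below is a hypothetical `SessionRegistry`, not the Spring AI or MCP SDK implementation: it records the last activity per session, evicts sessions idle longer than a configured duration, and keeps `cleanupSession` idempotent, since listener callbacks can fire more than once for the same session (for example `onError` followed by `onComplete`).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SessionRegistry {

    private final Map<String, Instant> lastActivity = new ConcurrentHashMap<>();
    private final Duration maxIdle;

    SessionRegistry(Duration maxIdle) {
        this.maxIdle = maxIdle;
    }

    /** Records activity for a session, creating it if needed. */
    void touch(String sessionId, Instant now) {
        lastActivity.put(sessionId, now);
    }

    /** Idempotent: returns true only for the call that actually removed the session. */
    boolean cleanupSession(String sessionId) {
        return lastActivity.remove(sessionId) != null;
    }

    /** Removes sessions idle for longer than maxIdle and returns how many were evicted. */
    int evictIdle(Instant now) {
        int evicted = 0;
        for (Map.Entry<String, Instant> entry : lastActivity.entrySet()) {
            boolean idle = Duration.between(entry.getValue(), now).compareTo(maxIdle) > 0;
            // remove(key, value) skips sessions that were touched concurrently
            if (idle && lastActivity.remove(entry.getKey(), entry.getValue())) {
                evicted++;
            }
        }
        return evicted;
    }

    int size() {
        return lastActivity.size();
    }
}
```

A real implementation would also close the per-session resources (response stream, async context) at the point where the entry is removed, and would call `evictIdle` from a scheduled task.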

## Related

This bug is related to (but distinct from) a `KeepAliveScheduler` issue in the MCP Java SDK where dead sessions are never evicted after ping failures. I have filed that separately at modelcontextprotocol/java-sdk.

