[BUG] MCP Streamable HTTP Server: CLOSE-WAIT socket leak exhausts Tomcat thread pool when clients disconnect #6384

@lxq19991111

Description

When using spring-ai-starter-mcp-server-webmvc with Streamable HTTP transport, the server does not properly release sockets and Tomcat threads after an MCP client disconnects. Connections accumulate in TCP CLOSE-WAIT state indefinitely, eventually exhausting the Tomcat thread pool and making the server completely unresponsive — including to health checks.

Environment

  • Spring AI: 1.1.4
  • Spring Boot: 3.4.1
  • Java: JDK 25
  • Server: Tomcat 10.1.34 (WAR deployment)
  • OS: Linux (Kubernetes, 2 CPU / 4 GB)

Configuration

spring:
  ai:
    mcp:
      server:
        type: SYNC
        protocol: STREAMABLE
        streamable-http:
          mcp-endpoint: /mcp
          keep-alive-interval: 0s

Steps to Reproduce

  1. Deploy an MCP Server using spring-ai-starter-mcp-server-webmvc with Streamable HTTP transport
  2. Have an external MCP client send POST /mcp requests (initialize → tools/call)
  3. Client receives the response and closes the TCP connection (sends FIN)
  4. Repeat at moderate frequency (e.g., 2-3 requests/second from a load balancer)
  5. After ~60 seconds, observe socket states on the server

Observed Behavior

$ ss -tlnp | grep 8080
LISTEN 151    150    *:8080    *:*     # backlog full

$ ss -tnp | grep 8080 | head -5
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:47140
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:42756
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:47446
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:50138
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:43160

$ ss -tnp | grep 8080 | wc -l
150   # all threads consumed

$ curl --max-time 5 http://localhost:8080/health
curl: (28) Failed to connect to localhost port 8080: Connection timed out

  • All 150 connections are in CLOSE-WAIT (client sent FIN, server never called close)
  • All connections come from the same upstream load balancer IP
  • Tomcat thread pool is fully exhausted — no new requests can be processed
  • The application itself started successfully (Started in 12.6 seconds)
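
To track how fast the leak grows, the per-state socket counts can be tallied from `ss -tn` output. The following is a minimal sketch, not part of Spring AI or Tomcat: the class name `SocketStateCounter` is hypothetical, the two CLOSE-WAIT rows are copied from the output above, and the ESTAB row is made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SocketStateCounter {

    /** Counts sockets per TCP state from `ss -tn` output lines (state is the first column). */
    public static Map<String, Integer> countByState(List<String> ssLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : ssLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("State")) {
                continue; // skip blank lines and the header row
            }
            String state = trimmed.split("\\s+")[0];
            counts.merge(state, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "State      Recv-Q Send-Q Local Address:Port Peer Address:Port",
            "CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:47140",
            "CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:42756",
            // Illustrative row only, not taken from the observed output.
            "ESTAB      0    0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:50000");
        System.out.println(countByState(sample));
    }
}
```

A CLOSE-WAIT count that only ever rises between samples is the signature of the leak described here; on a healthy server the count should drop back as sockets are closed.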

Expected Behavior

When a client closes the TCP connection, the MCP server transport should:

  1. Detect the peer shutdown (IOException on write or read returning -1)
  2. Complete or cancel the Servlet async context
  3. Close the server-side socket
  4. Release the Tomcat thread back to the pool
  5. Remove the session from the internal session map
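
Steps 1 and 3 above can be illustrated with plain `java.net` sockets, independent of Spring AI or the Servlet API. This is a minimal sketch under that assumption: the class `PeerShutdownDemo` and its method are hypothetical names, and the real transport would detect the disconnect through the Servlet container rather than a raw socket.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PeerShutdownDemo {

    /**
     * Accepts one loopback connection, lets the client close it, and reports
     * whether the server side observed end-of-stream (read() returning -1).
     */
    public static boolean serverSeesPeerClose() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // The client connects and closes without sending data, which sends a FIN.
            Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort());
            Socket accepted = listener.accept();
            client.close();

            try (Socket server = accepted) {
                server.setSoTimeout(5_000); // do not block forever if the FIN never arrives
                InputStream in = server.getInputStream();
                // After the FIN arrives and until close() runs, this socket is in CLOSE-WAIT.
                return in.read() == -1;
            } // try-with-resources calls close(), which takes the socket out of CLOSE-WAIT
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("peer close detected: " + serverSeesPeerClose());
    }
}
```

The point of the sketch is that the kernel only records the peer's FIN. The socket stays in CLOSE-WAIT until the owning process calls `close()`, so a server that never reacts to the end-of-stream signal leaks the socket.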

Root Cause Analysis

The Streamable HTTP transport appears to hold open an SSE-style async response for each MCP session. When the client disconnects:

  • The AsyncContext is never completed or timed out
  • The server-side OutputStream/Sink has no subscriber check
  • The Servlet container keeps the thread allocated to the async response
  • The OS socket enters CLOSE-WAIT because the server process never calls close()

Impact

  • Severity: Critical in production with active MCP clients
  • Server becomes completely unresponsive in about a minute under moderate load (2-3 requests/second, as in the reproduction above)
  • K8s rolling deployments fail because new pods are immediately flooded by retrying clients
  • No automatic recovery without restart + stopping upstream traffic

Workaround

Configure Tomcat to force-close idle connections:

server:
  tomcat:
    connection-timeout: 30000
    keep-alive-timeout: 30000
    max-connections: 200
    threads:
      max: 200

This allows Tomcat to reclaim stuck connections after 30 seconds, but is not a proper fix.

Suggested Fix

Register an AsyncListener on the servlet async context when opening the SSE stream:

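import jakarta.servlet.AsyncEvent;
import jakarta.servlet.AsyncListener;

asyncContext.addListener(new AsyncListener() {
    // Normal completion: release the session state.
    @Override
    public void onComplete(AsyncEvent event) { cleanupSession(sessionId); }

    // The async timeout elapsed: treat the session as abandoned.
    @Override
    public void onTimeout(AsyncEvent event) { cleanupSession(sessionId); }

    // I/O failure, typically a write to a client that has already disconnected.
    @Override
    public void onError(AsyncEvent event) { cleanupSession(sessionId); }

    // Nothing to clean up when a new async cycle starts.
    @Override
    public void onStartAsync(AsyncEvent event) { }
});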

Additionally, set a reasonable timeout on the context with AsyncContext.setTimeout (e.g., 5 minutes, expressed in milliseconds) so that abandoned sessions are eventually reclaimed even without explicit client disconnect detection.
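
As a rough illustration of reclaiming abandoned sessions, here is a sketch of an idle-session registry that evicts entries after a timeout. It is not the Spring AI or MCP Java SDK implementation: `IdleSessionRegistry` and all of its methods are hypothetical, and timestamps are passed in explicitly so the eviction logic can be exercised without waiting.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdleSessionRegistry {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;

    public IdleSessionRegistry(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Records activity for a session; call on every request or successful write. */
    public void touch(String sessionId, long nowMillis) {
        lastSeenMillis.put(sessionId, nowMillis);
    }

    /** Removes a session explicitly, for example from an AsyncListener callback. */
    public void remove(String sessionId) {
        lastSeenMillis.remove(sessionId);
    }

    /** Evicts sessions idle for longer than the timeout and returns how many were removed. */
    public int evictIdle(long nowMillis) {
        int evicted = 0;
        Iterator<Map.Entry<String, Long>> it = lastSeenMillis.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > idleTimeoutMillis) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    public int size() {
        return lastSeenMillis.size();
    }

    public static void main(String[] args) {
        IdleSessionRegistry registry = new IdleSessionRegistry(5 * 60 * 1000L); // 5 minutes
        registry.touch("session-a", 0L);
        registry.touch("session-b", 200_000L);
        System.out.println("evicted: " + registry.evictIdle(300_001L));
    }
}
```

In a real server `evictIdle` would run from a scheduled task, and eviction would also need to complete the associated async context so the container can close the socket. Removing the map entry alone does not release the connection.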

Related

This bug is related to (but distinct from) a KeepAliveScheduler issue in the MCP Java SDK where dead sessions are never evicted after ping failures. I have filed that separately at modelcontextprotocol/java-sdk.
