[Windows Only] Agent v5+ APPCRASH (Heap Corruption 0xc0000374) and TCP Socket Hang with Upstream Swarm #11798

1. Environment

OS: Windows Server (the issue has been observed only on Windows)

Fluent Bit Agent: v5.x (Tested on 5.0.3+)

Fluent Bit Server (Upstream): v5.0.5 running behind a Docker Swarm VIP

Use Case: Heavy multiline application logs buffering to filesystem and forwarding to a central aggregator.

2. Description
There is a critical Windows-specific stability issue in the v5+ branch. Under moderate-to-heavy load, the agent exhibits two distinct failure modes:

Issue A: Heap Corruption (APPCRASH)
The fluent-bit.exe service crashes silently. The Windows Event Viewer records an APPCRASH event with exception code 0xc0000374 (Heap Corruption) in ntdll.dll. The crash is typically triggered when the tail plugin processes large multiline log blocks in combination with storage.type filesystem. When it occurs, chunks that are still held in memory are lost.
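To make Issue A easier to trigger on demand, a small generator for large multiline records may help. The following is only a sketch: the record layout (a timestamped first line followed by indented continuation lines) is an assumption, since the custom_multiline parser is not shown in this report, and the names make_record and write_log are hypothetical helpers, not part of Fluent Bit.

```python
import datetime


def make_record(index, continuation_lines):
    """Build one multiline record: a timestamped head line plus indented frames.

    The layout is an assumption; adjust it to match the real multiline parser.
    """
    ts = datetime.datetime(2024, 1, 1) + datetime.timedelta(seconds=index)
    head = f"{ts.isoformat()} ERROR request {index} failed"
    body = [f"    at frame_{i} (module.py:{i})" for i in range(continuation_lines)]
    return "\n".join([head] + body) + "\n"


def write_log(path, records, continuation_lines):
    """Append `records` multiline records to `path`; return characters written."""
    written = 0
    # newline="\n" stops Python from translating line endings on Windows.
    with open(path, "a", encoding="utf-8", newline="\n") as fh:
        for i in range(records):
            written += fh.write(make_record(i, continuation_lines))
    return written
```

Pointing write_log at a file under the tailed path with a high continuation_lines value produces records of a controllable size, which may help narrow down the block size at which the crash starts.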

Issue B: TCP Socket Hang (Deadlock)
When the upstream aggregator container (Docker Swarm) is restarted, the Windows agent stops transmitting logs entirely. The forward output plugin fails to detect the dropped connection (likely because no TCP RST arrives from Swarm), keeps the "dead" socket open, and waits indefinitely for an ACK. The agent ignores both net.io_timeout and net.keepalive_idle_timeout. Log flow resumes only after the Windows service is restarted manually.
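The condition described in Issue B can be approximated outside Fluent Bit. The sketch below is not the forward plugin's code; it is a minimal Python illustration of a peer that accepts a connection and then stays silent, and of a client whose read is bounded by an I/O timeout. The names start_silent_server and wait_for_ack are hypothetical, and the localhost server only stands in for an aggregator that disappears without sending a RST.

```python
import socket
import threading


def start_silent_server():
    """Listen on an ephemeral localhost port, accept one client, never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    held = []

    def accept_and_hold():
        conn, _ = srv.accept()
        held.append(conn)  # keep a reference so the socket stays open and silent

    threading.Thread(target=accept_and_hold, daemon=True).start()
    return srv, held


def wait_for_ack(host, port, io_timeout):
    """Send a payload and wait for a reply, giving up after io_timeout seconds."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.settimeout(io_timeout)  # bounds every subsequent send/recv
        sock.sendall(b"chunk")
        try:
            data = sock.recv(1024)
        except socket.timeout:
            return "timeout"  # peer is silent: the caller can close and retry
        return "ack" if data else "closed"
```

Calling wait_for_ack against the port of start_silent_server returns "timeout" once io_timeout elapses instead of blocking forever, which is the behavior this report expects net.io_timeout to provide.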

3. Agent Configuration to Reproduce

```yaml
service:
  flush: 5
  storage.type: filesystem
  storage.path: "C:/fluent-bit/buffer/"
  storage.sync: normal
  storage.backlog.mem_limit: 50M
  storage.max_chunks_up: 256
  
pipeline:
  inputs:
    - name: tail
      path: "C:/AppLogs/*/*/*/*.log"
      tag: app.logs
      multiline.parser: custom_multiline
      read_from_head: false
      db: "C:/fluent-bit/buffer/tail.db"
      buffer_chunk_size: 5M
      buffer_max_size: 10M

  outputs:
    - name: forward
      match: "*"
      host: 10.0.0.100
      port: 24224
      require_ack_response: true
      retry_limit: no_retries
      net.connect_timeout: 5s
      net.io_timeout: 15s
      net.keepalive: on
      net.keepalive_idle_timeout: 10s
```

4. Expected Behavior

The tail plugin on Windows must handle large multiline logs without corrupting the heap (exception code 0xc0000374).

The forward plugin must strictly enforce net.io_timeout, forcibly close dead sockets when the upstream server goes away silently, and re-establish the connection on its own, without requiring a manual service restart.
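The recovery sequence requested here (time out, close the dead socket, reconnect) can be sketched in a few lines. This is an illustration of the expected behavior only, not how Fluent Bit implements it; send_with_reconnect and start_flaky_server are hypothetical names, and the flaky server merely simulates an aggregator that is silent for its first few connections.

```python
import socket
import threading


def start_flaky_server(silent_connections):
    """Stay silent on the first N connections, then answer each one with b'ack'."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)
    held = []

    def serve():
        count = 0
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listening socket was closed
            count += 1
            if count <= silent_connections:
                held.append(conn)  # simulate a dead peer: keep open, never reply
            else:
                conn.recv(1024)
                conn.sendall(b"ack")
                conn.close()

    threading.Thread(target=serve, daemon=True).start()
    return srv


def send_with_reconnect(address, payload, io_timeout, max_attempts=3):
    """Send payload and wait for a reply; on timeout, close and reconnect.

    Returns the attempt number that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        sock = None
        try:
            sock = socket.create_connection(address, timeout=io_timeout)
            sock.settimeout(io_timeout)
            sock.sendall(payload)
            if sock.recv(1024):
                return attempt
        except OSError:
            pass  # covers timeouts and resets; fall through to the next attempt
        finally:
            if sock is not None:
                sock.close()  # never keep a socket that failed to answer
    raise ConnectionError(f"no reply after {max_attempts} attempts")
```

The point of the design is that a socket which fails to answer within the timeout is always closed before the next attempt, so a single silent peer cannot stall delivery indefinitely.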
