- Environment
OS: Windows Server (This issue is exclusively observed on Windows)
Fluent Bit Agent: v5.x (Tested on 5.0.3+)
Fluent Bit Server (Upstream): v5.0.5 running behind a Docker Swarm VIP
Use Case: Heavy multiline application logs buffering to filesystem and forwarding to a central aggregator.
- Description
There is a critical Windows-specific stability issue in the v5+ branch. The agent exhibits two distinct failure modes under moderate-to-heavy loads:
Issue A: Heap Corruption (APPCRASH)
The fluent-bit.exe service crashes silently. The Windows Event Viewer registers an APPCRASH event with exception code 0xc0000374 (Heap Corruption) in ntdll.dll. This is typically triggered when processing large multiline log blocks using the tail plugin combined with storage.type filesystem. When the crash occurs, in-flight memory chunks are lost.
Issue B: TCP Socket Hang (Deadlock)
When the upstream aggregator container (Docker Swarm) is restarted, the Windows agent stops transmitting logs entirely. The forward output plugin fails to detect the dropped connection (likely because no TCP RST arrives from the Swarm VIP), keeps the "dead" socket open, and waits indefinitely for an ACK. Neither net.io_timeout nor net.keepalive_idle_timeout appears to take effect: the socket stays open well past both configured limits. Log flow only resumes after manually restarting the Windows service.
- Agent Configuration to Reproduce
YAML
service:
  flush: 5
  storage.type: filesystem
  storage.path: "C:/fluent-bit/buffer/"
  storage.sync: normal
  storage.backlog.mem_limit: 50M
  storage.max_chunks_up: 256
pipeline:
  inputs:
    - name: tail
      path: "C:/AppLogs/*/*/*/*.log"
      tag: app.logs
      multiline.parser: custom_multiline
      read_from_head: false
      db: "C:/fluent-bit/buffer/tail.db"
      buffer_chunk_size: 5M
      buffer_max_size: 10M
  outputs:
    - name: forward
      match: "*"
      host: 10.0.0.100
      port: 24224
      require_ack_response: true
      retry_limit: no_retries
      net.connect_timeout: 5s
      net.io_timeout: 15s
      net.keepalive: on
      net.keepalive_idle_timeout: 10s
- Expected Behavior
The tail plugin on Windows must handle large multiline logs without corrupting the heap (exception code 0xc0000374).
The forward plugin must strictly enforce net.io_timeout, forcefully close dead sockets when the upstream server goes away silently, and autonomously re-establish the connection without requiring a manual service restart.
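To make the expected forward behavior concrete, here is a minimal Python sketch of the pattern described above. It is not Fluent Bit code and does not reflect how the forward plugin is implemented. The names `start_silent_server` and `send_with_ack_timeout` are hypothetical, and the silent server only approximates an upstream that disappears without sending a TCP RST. The point is that a bounded wait for the ACK lets the sender close the dead socket and reconnect on its own.

```python
import socket
import threading
import time


def start_silent_server():
    """Accept connections on a loopback port but never reply.

    Assumption: this stands in for an upstream that went away without
    sending a TCP RST, so the client sees an open but silent socket.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)
    held = []  # keep accepted sockets referenced so they stay open

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1], held


def send_with_ack_timeout(host, port, payload, io_timeout, max_attempts):
    """Send payload and wait at most io_timeout seconds for an ACK.

    On timeout the socket is closed and a fresh connection is tried.
    Returns (acked, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        sock = socket.create_connection((host, port), timeout=io_timeout)
        try:
            sock.settimeout(io_timeout)  # bounds both send and recv
            sock.sendall(payload)
            ack = sock.recv(1024)
            if ack:
                return True, attempt
        except socket.timeout:
            pass  # no ACK in time: treat the connection as dead
        finally:
            sock.close()  # never keep a dead socket open
    return False, max_attempts


if __name__ == "__main__":
    srv, port, _held = start_silent_server()
    started = time.monotonic()
    acked, attempts = send_with_ack_timeout("127.0.0.1", port, b"chunk", 0.2, 3)
    print(acked, attempts, round(time.monotonic() - started, 1))
    srv.close()
```

With the timeout in place the call returns after a bounded time instead of blocking forever, which is the behavior expected from net.io_timeout in the agent.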