GUACAMOLE-2265: Fix worker process leak after client disconnect.#662
Open
escra wants to merge 1 commit intoapache:staging/1.6.1from
Open
GUACAMOLE-2265: Fix worker process leak after client disconnect.#662escra wants to merge 1 commit intoapache:staging/1.6.1from
escra wants to merge 1 commit intoapache:staging/1.6.1from
Conversation
Worker processes may remain alive indefinitely after all users have disconnected, particularly under resource pressure (e.g. cgroup pids-limit). The root cause is threefold: 1. guacd_proc_add_user() ignores the pthread_create() return value. When thread creation fails (EAGAIN), the user thread never runs, so guacd_proc_stop() is never called and the worker blocks in recvmsg() forever. 2. The worker main loop blocks indefinitely in guacd_recv_fd() with no timeout or health check. If guacd_proc_stop() is not triggered for any reason, the worker hangs at 0% CPU consuming a PID slot. 3. The RDP fail path in guac_rdp_handle_connection() does not clean up the display, FreeRDP instance, keyboard, or SVC list, leaking resources on every failed connection attempt. Fix: - Handle pthread_create() failure by closing the FD, freeing params, and calling guacd_proc_stop(). - Replace the bare recvmsg() loop with a poll()-based loop that checks client state every 5 seconds. Workers exit when the client has stopped or all users have disconnected. An absolute safety timeout (30s) handles edge cases where cleanup is blocked. - Add resource cleanup to the fail label in guac_rdp_handle_connection().
Contributor
|
@escra I believe the issue you're describing here may be a duplicate of GUACAMOLE-2118, and your fix may address that issue. Please check on that issue and see if it matches, and re-tag your issue as GUACAMOLE-2118. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Worker processes may remain alive indefinitely after all users have
disconnected, particularly under resource pressure (e.g. cgroup pids-limit).
Over days/weeks this leads to steady PID accumulation until no new connections
can be created.
Root causes
1.
pthread_create()failure ignored inguacd_proc_add_user()When thread creation fails with EAGAIN (common under cgroup pids-limit), the
user thread never runs, so
guacd_proc_stop()is never called and the workerblocks in
recvmsg()forever.2. Worker main loop blocks indefinitely in
recvmsg()The loop in
guacd_exec_proc()callsguacd_recv_fd()with no timeout orhealth check. If
guacd_proc_stop()is not triggered for any reason, theworker hangs at 0% CPU consuming a PID slot indefinitely.
3. RDP fail path does not clean up resources
guac_rdp_handle_connection()does not free the display, FreeRDP instance,keyboard, or SVC list on the fail path, leaking resources on every failed
connection attempt.
Fix
pthread_create()failure by closing the FD, freeing params, andcalling
guacd_proc_stop()recvmsg()loop with apoll()-based loop that checksclient state every 5 seconds (
GUACD_PROC_IDLE_TIMEOUT_MS). Workers exitwhen the client has stopped or all users have disconnected. An absolute
safety timeout (
GUACD_PROC_MAX_IDLE_MS= 30s) handles edge cases wherethe normal exit path is blocked
faillabel inguac_rdp_handle_connection()JIRA
GUACAMOLE-2265
Related
Sporadic hanging connections after upgrade
Improve process management to prevent zombie process accumulation
Test plan
pids.max=50, create 60 concurrentconnections, verify workers exit after disconnect
leak via
/proc/[pid]/fdcountreturns to baseline