Skip to content

fix: prevent double-free of controlChannelRequest on connection timeout#342

Open
bdasnevesKP wants to merge 1 commit intoYlianst:masterfrom
bdasnevesKP:fix/controlchannel-double-free
Open

fix: prevent double-free of controlChannelRequest on connection timeout#342
bdasnevesKP wants to merge 1 commit intoYlianst:masterfrom
bdasnevesKP:fix/controlchannel-double-free

Conversation

@bdasnevesKP
Copy link
Copy Markdown

@bdasnevesKP bdasnevesKP commented Apr 1, 2026

Background

I'm not a C developer. I came across issue Ylianst/MeshCentral#7407 and the related issues #110, #151, and #281 about the agent leaking memory during connection retries. I used GitHub Copilot to help me trace through the code and identify the root cause.

Root Cause

In \meshcore/agentcore.c, \MeshServer_ConnectEx_NetworkError\ frees the \j\ pointer (which is the same as \agent->controlChannelRequest) but does not set \agent->controlChannelRequest = NULL\ afterward.

The issue is that \ILibWebClient_CancelRequest\ — called immediately after — executes synchronously on the chain thread (see the \ILibChain_RunOnMicrostackThread\ macro in \ILibParsers.h):

#define ILibChain_RunOnMicrostackThread(chain, handler, user) \
    if(ILibIsRunningOnChainThread(chain)==0){ \
        ILibChain_RunOnMicrostackThreadEx(chain, handler, user); \
    } else { \
        handler(chain,user);  // <-- synchronous when already on chain thread \
    }

This synchronous cancel internally calls \MeshServer_OnResponse\ with \ReceiveStatus_Complete. That function checks \agent->controlChannelRequest != NULL\ and calls \ILibMemory_Free(agent->controlChannelRequest)\ — on a pointer that was already freed moments earlier. This is a double-free, which corrupts the heap allocator's internal state and causes memory usage to grow continuously with every retry cycle (~every 20 seconds when the server is unreachable).

The Fix

One line added in \MeshServer_ConnectEx_NetworkError, right after \ILibMemory_Free(j):

agent->controlChannelRequest = NULL;

This ensures \MeshServer_OnResponse\ safely skips the cleanup block (it checks for != NULL\ first), preventing the double-free.

Caveats

  • I have not been able to compile or test this change — I don't have the build environment set up.
  • The logic analysis was done with AI assistance (GitHub Copilot). The reasoning is sound to the best of my understanding, but I'd strongly appreciate a review from someone familiar with the codebase before merging.
  • I'm happy to update or close this PR if the analysis is wrong.

When MeshServer_ConnectEx_NetworkError fires (20s connection timeout),
it frees j (==agent->controlChannelRequest) but does not set the pointer
to NULL. ILibWebClient_CancelRequest, called immediately after, executes
synchronously on the chain thread (ILibChain_RunOnMicrostackThread macro
calls the handler inline when already on the chain thread). This triggers
MeshServer_OnResponse(ReceiveStatus_Complete) which sees a non-NULL
controlChannelRequest and calls ILibMemory_Free on the already-freed
pointer, causing a double-free and heap corruption on every retry cycle.

Fix: set agent->controlChannelRequest = NULL right after ILibMemory_Free(j)
so MeshServer_OnResponse safely skips the cleanup block.

Relates to: Ylianst#110, Ylianst#151, Ylianst#281
Relates to: Ylianst/MeshCentral#7407

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@si458
Copy link
Copy Markdown
Collaborator

si458 commented Apr 28, 2026

this doesnt appear to fix the issue 😢
i build a test agent with this patch,
set 0.0.0.0 my.domain.com in the /etc/hosts file so i knew the machine would never connect
left it for 24 hours, the memory usage showed 300MB and was still increasing :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants