
Conversation

@Aman-Cool

Prevent IPC hangs during container startup

This PR fixes a long-standing reliability issue in urunc’s IPC handshake between create and start.

Previously, the IPC helper AwaitMessage() would block indefinitely while waiting for a Unix socket connection and message. If the peer process never connected — for example because containerd restarted, the urunc start process was OOM-killed, or the node was under heavy load — the waiting process would never exit. This resulted in orphaned urunc --reexec processes, containers stuck in ContainerCreating, and gradual resource leaks on the node, with no clear error reported.

The fix adds a bounded timeout to the IPC accept and read steps. When the expected message is not received in time, the process now exits with a clear error instead of hanging forever. This makes failed container startups deterministic and observable, while leaving the normal, successful startup path unchanged.

In short: container creation now either succeeds, fails, or times out — but it no longer gets stuck silently.
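
The mechanics of the change can be sketched in a few lines of Go. The snippet below is illustrative only: the awaitMessage signature, the listener plumbing, and the newline-delimited message format are assumptions, not urunc's actual code; only the two timeout values come from this PR.

```go
package ipc

import (
	"bufio"
	"fmt"
	"net"
	"time"
)

const (
	ipcAcceptTimeout = 60 * time.Second // bound on waiting for the peer to connect
	ipcReadTimeout   = 10 * time.Second // bound on waiting for the message itself
)

// awaitMessage waits for a peer to connect on l and send the expected
// message, but never blocks past the configured deadlines.
func awaitMessage(l *net.UnixListener, expected string) error {
	// Bound the accept: if the counterpart never connects (crashed,
	// OOM-killed, containerd restarted), fail instead of hanging forever.
	if err := l.SetDeadline(time.Now().Add(ipcAcceptTimeout)); err != nil {
		return err
	}
	conn, err := l.Accept()
	if err != nil {
		return fmt.Errorf("awaiting IPC connection: %w", err)
	}
	defer conn.Close()

	// Bound the read as well, so a peer that connects but never writes
	// cannot stall the process either.
	if err := conn.SetReadDeadline(time.Now().Add(ipcReadTimeout)); err != nil {
		return err
	}
	msg, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return fmt.Errorf("awaiting IPC message: %w", err)
	}
	if msg != expected+"\n" {
		return fmt.Errorf("unexpected IPC message %q", msg)
	}
	return nil
}
```

On a deadline expiry, both Accept and the read return a timeout error, so the caller can report a clear failure and exit rather than wait forever.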

@netlify

netlify bot commented Jan 25, 2026

Deploy Preview for urunc canceled.

🔨 Latest commit: 247d9be
🔍 Latest deploy log: https://app.netlify.com/projects/urunc/deploys/6975e57460b1a80008a2665f

- Add IPCAcceptTimeout (60s) and IPCReadTimeout (10s) to prevent
  orphaned processes when counterpart never connects
- Fix closure bug in executeHooksConcurrently using wrong loop variable (see the sketch below)
- Fix isRunning() using annotType instead of annotHypervisor
- Add tests for timeout and wrong message handling

Signed-off-by: Aman-Cool <aman017102007@gmail.com>
@Aman-Cool force-pushed the fix/ipc-timeout-prevent-hanging branch from d84f485 to 247d9be on January 25, 2026 at 09:42
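
For context on the second bullet in the commit message above: the loop-variable issue it names is the classic Go closure-capture bug. The sketch below uses an invented signature, not urunc's actual executeHooksConcurrently.

```go
package hooks

import "sync"

// executeHooksConcurrently is a generic stand-in that shows the
// loop-variable capture fix; urunc's real function is not reproduced here.
func executeHooksConcurrently(hooks []func() error) []error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, hook := range hooks {
		wg.Add(1)
		// Buggy form (pre-Go 1.22): `go func() { _ = hook() }()`.
		// Every goroutine would share the single loop variable `hook`,
		// so several of them could end up running the same (last) hook.
		// Passing it as an argument gives each goroutine its own copy.
		go func(h func() error) {
			defer wg.Done()
			if err := h(); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(hook)
	}
	wg.Wait()
	return errs
}
```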
@Aman-Cool
Author

This adds reasonable IPC timeouts so urunc doesn’t hang indefinitely during create/start, making failures safer and easier to recover from.

@cmainas
Contributor

cmainas commented Jan 26, 2026

Hello @Aman-Cool ,

thank you for this contribution. Please create an issue before opening a PR. Have you actually encountered the issue you describe? Are there any steps to reproduce it?

Having the reexec process wait is a container-runtime design choice. I am not opposed to adding a timeout, but I think we need to look a bit more at how other container runtimes handle such cases and what a reasonable timeout would be.

@Aman-Cool
Author

Thanks @cmainas for the feedback.
I agree it’s worth looking at how other runtimes approach IPC handshakes, but I want to clarify my perspective on the timeout itself. The intent here isn’t to tune a performance parameter, but to avoid an unbounded wait in a failure path. In the scenarios I’ve observed (e.g. peer process never connecting due to restart or termination), an infinite block results in leaked processes and stuck container state, whereas a conservative timeout makes the failure explicit and recoverable.
I’ll open an issue to document the problem and the conditions under which this occurs, and we can use that as a place to discuss whether the timeout should be configurable or adjusted further.
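
If configurability ends up being the preferred direction, one minimal shape for it could look like the sketch below. Both the URUNC_IPC_TIMEOUT variable and the helper are hypothetical, meant only to illustrate the option, not existing urunc knobs.

```go
package ipc

import (
	"os"
	"time"
)

// ipcTimeout returns the operator-configured timeout, falling back to the
// compiled-in default when the variable is unset or unparsable.
func ipcTimeout(fallback time.Duration) time.Duration {
	if v := os.Getenv("URUNC_IPC_TIMEOUT"); v != "" {
		if d, err := time.ParseDuration(v); err == nil && d > 0 {
			return d
		}
	}
	return fallback
}
```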

