Skip to content

Use overlapped pipes in WSLC session terminate tests#40264

Draft
benhillis wants to merge 10 commits intofeature/wsl-for-appsfrom
fix/wslc-overlapped-pipe
Draft

Use overlapped pipes in WSLC session terminate tests#40264
benhillis wants to merge 10 commits intofeature/wsl-for-appsfrom
fix/wslc-overlapped-pipe

Conversation

@benhillis
Copy link
Copy Markdown
Member

@benhillis benhillis commented Apr 21, 2026

Problem

The LoadImage and ImportImage session-terminate sub-tests hang in the release pipeline, causing a 2-hour timeout. The tests pass in PR CI but fail consistently in the internal release pipeline on older Windows versions.

Root Cause

CreatePipe creates synchronous pipe handles. The relay's ReadHandle::Schedule() calls ReadFile with an OVERLAPPED structure, but on a synchronous handle this blocks the thread instead of returning ERROR_IO_PENDING. This prevents WaitForMultipleObjects in Run() from ever checking the session terminating event. When Terminate() signals the event and calls CancelIoEx, it fails because CancelIoEx is unreliable on synchronous handles across Windows versions. Terminate() then deadlocks trying to acquire an exclusive lock held by the blocked LoadImage call.

Fix

Replace CreatePipe with OpenAnonymousPipe(2, true, false) to create an overlapped read pipe, so ReadFile returns ERROR_IO_PENDING and the relay's event loop can detect the termination signal.

The "pipe close" sub-tests are left using CreatePipe since they work correctly — closing the write end causes ERROR_BROKEN_PIPE which the relay handles inline without needing WaitForMultipleObjects.

Copilot AI review requested due to automatic review settings April 21, 2026 18:13
@benhillis benhillis requested a review from a team as a code owner April 21, 2026 18:13
The LoadImage and ImportImage 'session terminate' sub-tests used CreatePipe
which creates synchronous pipe handles. When the relay calls ReadFile with an
OVERLAPPED structure on a synchronous handle, the call blocks the thread
instead of returning ERROR_IO_PENDING. This prevents WaitForMultipleObjects
from ever checking the session terminating event, causing a deadlock when
Terminate() is called.

Replace CreatePipe with CreateNamedPipeW (FILE_FLAG_OVERLAPPED) + CreateFileW
so ReadFile returns ERROR_IO_PENDING and the relay's event loop can detect the
termination signal.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@benhillis benhillis force-pushed the fix/wslc-overlapped-pipe branch from 0134487 to a234ca0 Compare April 21, 2026 18:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates WSLC session termination tests to use overlapped-capable pipe handles so the relay’s event-loop cancellation path works reliably across Windows versions, avoiding hangs in release pipelines.

Changes:

  • Replace CreatePipe with CreateNamedPipeW(FILE_FLAG_OVERLAPPED) + CreateFileW in the LoadImage “session terminate” sub-test.
  • Do the same replacement in the ImportImage “session terminate” sub-test.
  • Add inline rationale describing why overlapped pipes are required for correct cancellation behavior.
Comments suppressed due to low confidence (2)

test/windows/WSLCTests.cpp:1049

  • After creating the named pipe server handle, the server end is never explicitly connected (ConnectNamedPipe/Helpers::ConnectPipe). On some Windows versions this can leave the pipe in a listening/disconnected state (ReadFile can fail with ERROR_PIPE_LISTENING, or the client CreateFile/WriteFile can block), reintroducing hangs/flakiness. Consider calling ConnectNamedPipe on the server handle after CreateFileW succeeds (treat ERROR_PIPE_CONNECTED as success), or use wsl::windows::common::helpers::ConnectPipe for the connection step before passing the handle to LoadImage.
            VERIFY_WIN32_BOOL_SUCCEEDED(WriteFile(pipeWrite.get(), "data", 4, &bytesWritten, nullptr));

            testCompleted.SetEvent();

test/windows/WSLCTests.cpp:1165

  • Same pattern as the LoadImage terminate test: the named pipe server handle isn’t explicitly connected (no ConnectNamedPipe/ConnectPipe). To avoid ERROR_PIPE_LISTENING / blocking behavior differences across Windows versions, connect the server end after CreateFileW returns (accept ERROR_PIPE_CONNECTED as success) before starting ImportImage.
    WSLC_TEST_METHOD(DeleteImage)
    {
        // Prepare alpine image to delete.

@benhillis benhillis assigned yao-msft and unassigned yao-msft Apr 21, 2026
Ben Hillis and others added 2 commits April 21, 2026 21:10
Move ResetTestSession() after get_future().get() to ensure the
operation thread's COM call has completed before destroying the proxy.

When Terminate() returns, the service has completed cleanup but the RPC
response for LoadImage/ImportImage may not have been delivered to the
client thread yet. Calling ResetTestSession() (which resets the COM
pointer) before the future is resolved can destroy the proxy while the
operation thread is still waiting for its RPC response, causing a
use-after-free on older Windows builds where timing differs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 22, 2026 04:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@OneBlue OneBlue marked this pull request as draft April 23, 2026 20:22
Copilot AI review requested due to automatic review settings April 23, 2026 21:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.

Comment thread test/windows/WSLCTests.cpp Outdated
Comment on lines 292 to 296
std::thread operationThread([&]() {
terminateResult.set_value(m_defaultSession->LoadImage(ToCOMInputHandle(pipeRead.get()), nullptr, 1024 * 1024));
WI_ASSERT(testCompleted.is_signaled());
});

Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The worker thread captures and dereferences m_defaultSession by reference; later in this block the main thread terminates/resets the session. To avoid any potential race/lifetime issues with the com_ptr while a COM call is in flight, capture a local com_ptr copy (e.g., auto session = m_defaultSession) and use that in the thread, and only reset m_defaultSession after the thread has joined.

Copilot uses AI. Check for mistakes.
Comment thread test/windows/CMakeLists.txt Outdated
Comment on lines +1 to +3
set(SOURCES
SimpleTests.cpp
UnitTests.cpp
MountTests.cpp
NetworkTests.cpp
Plan9Tests.cpp
DrvFsTests.cpp
Common.cpp
PluginTests.cpp
PolicyTests.cpp
InstallerTests.cpp
WSLCTests.cpp
WslcSdkTests.cpp)
WSLCTests.cpp)
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test/windows/CMakeLists.txt now only builds Common.cpp and WSLCTests.cpp, dropping all other Windows test sources (UnitTests, NetworkTests, WslcSdkTests, etc.). This is a large behavior change and doesn’t match the PR description; if unintentional, please restore the full SOURCES list (or explain/move tests to a new target explicitly).

Copilot uses AI. Check for mistakes.
Comment on lines 33 to 35
add_dependencies(wsltests wslserviceidl wslclib wslc wslcsdk)
add_subdirectory(testplugin)
add_subdirectory(wslc)

Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wslc test subdirectory is no longer included (add_subdirectory(wslc) was removed), so the CLI/unit tests under test/windows/wslc won’t be built or run. If the goal is only to adjust overlapped pipes in specific subtests, this should be reverted; otherwise the PR description should be updated and CI impact assessed.

Copilot uses AI. Check for mistakes.
Comment thread test/windows/Common.cpp Outdated
//

#define LXSS_WATCHDOG_TIMEOUT (3 * 60 * 60 * 1000)
#define LXSS_WATCHDOG_TIMEOUT (5 * 60 * 1000)
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says the watchdog timeout is 3 hours, but the value was changed to 5 minutes. Either update the comment to match and justify the new global timeout (it may terminate legitimately long-running tests), or revert/choose a value consistent with the intended suite runtime.

Suggested change
#define LXSS_WATCHDOG_TIMEOUT (5 * 60 * 1000)
#define LXSS_WATCHDOG_TIMEOUT (3 * 60 * 60 * 1000)

Copilot uses AI. Check for mistakes.
Comment thread test/windows/WSLCTests.cpp Outdated
Comment on lines +224 to +228
WSLC_TEST_METHOD(LoadImage)
{
auto [hr, deletedImages] = DeleteImageNoThrow(Image, Flags);
VERIFY_SUCCEEDED(hr);

return std::move(deletedImages);
}
std::filesystem::path imageTar = GetTestImagePath("hello-world:latest");
wil::unique_handle imageTarFileHandle{
CreateFileW(imageTar.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr)};
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description calls out both LoadImage and ImportImage session-terminate hangs, but this file now only contains a LoadImage test (ImportImage coverage appears removed from this test class). Please ensure the ImportImage terminate scenario is still covered (either reintroduce it here or move it elsewhere) so the original regression is actually prevented by tests.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings April 24, 2026 01:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread test/windows/Common.cpp
Comment on lines 46 to 50
// The watchdog timeout is set to 3 hours.
//

#define LXSS_WATCHDOG_TIMEOUT (3 * 60 * 60 * 1000)
#define LXSS_WATCHDOG_TIMEOUT (60 * 60 * 1000)
#define LXSS_WATCHDOG_TIMEOUT_WINDOW 1000
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says the watchdog timeout is set to 3 hours, but LXSS_WATCHDOG_TIMEOUT is now 1 hour. Please update the comment (and any related documentation) or adjust the constant so they match, since this value controls when the test watchdog will fast-fail the process.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants