Use overlapped pipes in WSLC session terminate tests#40264
Use overlapped pipes in WSLC session terminate tests#40264benhillis wants to merge 10 commits intofeature/wsl-for-appsfrom
Conversation
The LoadImage and ImportImage 'session terminate' sub-tests used CreatePipe which creates synchronous pipe handles. When the relay calls ReadFile with an OVERLAPPED structure on a synchronous handle, the call blocks the thread instead of returning ERROR_IO_PENDING. This prevents WaitForMultipleObjects from ever checking the session terminating event, causing a deadlock when Terminate() is called. Replace CreatePipe with CreateNamedPipeW (FILE_FLAG_OVERLAPPED) + CreateFileW so ReadFile returns ERROR_IO_PENDING and the relay's event loop can detect the termination signal. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0134487 to
a234ca0
Compare
There was a problem hiding this comment.
Pull request overview
This PR updates WSLC session termination tests to use overlapped-capable pipe handles so the relay’s event-loop cancellation path works reliably across Windows versions, avoiding hangs in release pipelines.
Changes:
- Replace
CreatePipewithCreateNamedPipeW(FILE_FLAG_OVERLAPPED)+CreateFileWin theLoadImage“session terminate” sub-test. - Do the same replacement in the
ImportImage“session terminate” sub-test. - Add inline rationale describing why overlapped pipes are required for correct cancellation behavior.
Comments suppressed due to low confidence (2)
test/windows/WSLCTests.cpp:1049
- After creating the named pipe server handle, the server end is never explicitly connected (ConnectNamedPipe/Helpers::ConnectPipe). On some Windows versions this can leave the pipe in a listening/disconnected state (ReadFile can fail with ERROR_PIPE_LISTENING, or the client CreateFile/WriteFile can block), reintroducing hangs/flakiness. Consider calling ConnectNamedPipe on the server handle after CreateFileW succeeds (treat ERROR_PIPE_CONNECTED as success), or use wsl::windows::common::helpers::ConnectPipe for the connection step before passing the handle to LoadImage.
VERIFY_WIN32_BOOL_SUCCEEDED(WriteFile(pipeWrite.get(), "data", 4, &bytesWritten, nullptr));
testCompleted.SetEvent();
test/windows/WSLCTests.cpp:1165
- Same pattern as the LoadImage terminate test: the named pipe server handle isn’t explicitly connected (no ConnectNamedPipe/ConnectPipe). To avoid ERROR_PIPE_LISTENING / blocking behavior differences across Windows versions, connect the server end after CreateFileW returns (accept ERROR_PIPE_CONNECTED as success) before starting ImportImage.
WSLC_TEST_METHOD(DeleteImage)
{
// Prepare alpine image to delete.
Move ResetTestSession() after get_future().get() to ensure the operation thread's COM call has completed before destroying the proxy. When Terminate() returns, the service has completed cleanup but the RPC response for LoadImage/ImportImage may not have been delivered to the client thread yet. Calling ResetTestSession() (which resets the COM pointer) before the future is resolved can destroy the proxy while the operation thread is still waiting for its RPC response, causing a use-after-free on older Windows builds where timing differs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This reverts commit db9ff5a.
| std::thread operationThread([&]() { | ||
| terminateResult.set_value(m_defaultSession->LoadImage(ToCOMInputHandle(pipeRead.get()), nullptr, 1024 * 1024)); | ||
| WI_ASSERT(testCompleted.is_signaled()); | ||
| }); | ||
|
|
There was a problem hiding this comment.
The worker thread captures and dereferences m_defaultSession by reference; later in this block the main thread terminates/resets the session. To avoid any potential race/lifetime issues with the com_ptr while a COM call is in flight, capture a local com_ptr copy (e.g., auto session = m_defaultSession) and use that in the thread, and only reset m_defaultSession after the thread has joined.
| set(SOURCES | ||
| SimpleTests.cpp | ||
| UnitTests.cpp | ||
| MountTests.cpp | ||
| NetworkTests.cpp | ||
| Plan9Tests.cpp | ||
| DrvFsTests.cpp | ||
| Common.cpp | ||
| PluginTests.cpp | ||
| PolicyTests.cpp | ||
| InstallerTests.cpp | ||
| WSLCTests.cpp | ||
| WslcSdkTests.cpp) | ||
| WSLCTests.cpp) |
There was a problem hiding this comment.
test/windows/CMakeLists.txt now only builds Common.cpp and WSLCTests.cpp, dropping all other Windows test sources (UnitTests, NetworkTests, WslcSdkTests, etc.). This is a large behavior change and doesn’t match the PR description; if unintentional, please restore the full SOURCES list (or explain/move tests to a new target explicitly).
| add_dependencies(wsltests wslserviceidl wslclib wslc wslcsdk) | ||
| add_subdirectory(testplugin) | ||
| add_subdirectory(wslc) | ||
|
|
There was a problem hiding this comment.
The wslc test subdirectory is no longer included (add_subdirectory(wslc) was removed), so the CLI/unit tests under test/windows/wslc won’t be built or run. If the goal is only to adjust overlapped pipes in specific subtests, this should be reverted; otherwise the PR description should be updated and CI impact assessed.
| // | ||
|
|
||
| #define LXSS_WATCHDOG_TIMEOUT (3 * 60 * 60 * 1000) | ||
| #define LXSS_WATCHDOG_TIMEOUT (5 * 60 * 1000) |
There was a problem hiding this comment.
The comment says the watchdog timeout is 3 hours, but the value was changed to 5 minutes. Either update the comment to match and justify the new global timeout (it may terminate legitimately long-running tests), or revert/choose a value consistent with the intended suite runtime.
| #define LXSS_WATCHDOG_TIMEOUT (5 * 60 * 1000) | |
| #define LXSS_WATCHDOG_TIMEOUT (3 * 60 * 60 * 1000) |
| WSLC_TEST_METHOD(LoadImage) | ||
| { | ||
| auto [hr, deletedImages] = DeleteImageNoThrow(Image, Flags); | ||
| VERIFY_SUCCEEDED(hr); | ||
|
|
||
| return std::move(deletedImages); | ||
| } | ||
| std::filesystem::path imageTar = GetTestImagePath("hello-world:latest"); | ||
| wil::unique_handle imageTarFileHandle{ | ||
| CreateFileW(imageTar.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr)}; |
There was a problem hiding this comment.
PR description calls out both LoadImage and ImportImage session-terminate hangs, but this file now only contains a LoadImage test (ImportImage coverage appears removed from this test class). Please ensure the ImportImage terminate scenario is still covered (either reintroduce it here or move it elsewhere) so the original regression is actually prevented by tests.
| // The watchdog timeout is set to 3 hours. | ||
| // | ||
|
|
||
| #define LXSS_WATCHDOG_TIMEOUT (3 * 60 * 60 * 1000) | ||
| #define LXSS_WATCHDOG_TIMEOUT (60 * 60 * 1000) | ||
| #define LXSS_WATCHDOG_TIMEOUT_WINDOW 1000 |
There was a problem hiding this comment.
The comment says the watchdog timeout is set to 3 hours, but LXSS_WATCHDOG_TIMEOUT is now 1 hour. Please update the comment (and any related documentation) or adjust the constant so they match, since this value controls when the test watchdog will fast-fail the process.
Problem
The
LoadImageandImportImagesession-terminate sub-tests hang in the release pipeline, causing a 2-hour timeout. The tests pass in PR CI but fail consistently in the internal release pipeline on older Windows versions.Root Cause
CreatePipecreates synchronous pipe handles. The relay'sReadHandle::Schedule()callsReadFilewith anOVERLAPPEDstructure, but on a synchronous handle this blocks the thread instead of returningERROR_IO_PENDING. This preventsWaitForMultipleObjectsinRun()from ever checking the session terminating event. WhenTerminate()signals the event and callsCancelIoEx, it fails becauseCancelIoExis unreliable on synchronous handles across Windows versions.Terminate()then deadlocks trying to acquire an exclusive lock held by the blockedLoadImagecall.Fix
Replace
CreatePipewithOpenAnonymousPipe(2, true, false)to create an overlapped read pipe, soReadFilereturnsERROR_IO_PENDINGand the relay's event loop can detect the termination signal.The "pipe close" sub-tests are left using
CreatePipesince they work correctly — closing the write end causesERROR_BROKEN_PIPEwhich the relay handles inline without needingWaitForMultipleObjects.