fix: prevent deadlock during Wayland client destruction #819
Dami-star wants to merge 1 commit into linuxdeepin:master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Dami-star. Needs approval from an approver in each of these files.
Reviewer's Guide

Guards Wayland event processing against re-entrancy during client destruction by introducing an isProcessingEvents flag, adding safety checks around wl_display access, and tightening shutdown/disconnect behavior in WServerPrivate::stop().

Sequence diagram for guarded Wayland event processing with re-entrancy prevention:

```mermaid
sequenceDiagram
    actor QtThread
    participant QtEventDispatcher
    participant WServerPrivate
    participant WaylandEventLoop
    participant WaylandDisplay
    QtThread->>QtEventDispatcher: aboutToBlock
    QtEventDispatcher->>WServerPrivate: processWaylandEvents()
    alt isProcessingEvents is true
        WServerPrivate-->>QtEventDispatcher: return (skip processing)
    else isProcessingEvents is false
        WServerPrivate->>WServerPrivate: isProcessingEvents = true
        WServerPrivate->>WServerPrivate: ScopeGuard created (will reset isProcessingEvents)
        WServerPrivate->>WServerPrivate: check display and handle
        alt display or handle is null
            WServerPrivate-->>QtEventDispatcher: return
        else display and handle valid
            WServerPrivate->>WaylandEventLoop: wl_event_loop_dispatch(loop, 0)
            WaylandEventLoop-->>WServerPrivate: ret
            alt ret != 0
                WServerPrivate->>WServerPrivate: log dispatch error
            end
            WServerPrivate->>WServerPrivate: recheck display and handle
            alt display or handle is null
                WServerPrivate-->>QtEventDispatcher: return
            else display and handle still valid
                WServerPrivate->>WaylandDisplay: wl_display_flush_clients(handle)
                WaylandDisplay-->>WServerPrivate: clients flushed
            end
        end
        WServerPrivate->>WServerPrivate: ScopeGuard destructor resets isProcessingEvents
    end
    rect rgb(230,230,250)
        note over QtEventDispatcher,WServerPrivate: During client destruction, aboutToBlock may fire again
        QtEventDispatcher->>WServerPrivate: processWaylandEvents()
        WServerPrivate->>WServerPrivate: isProcessingEvents already true, return immediately
    end
```
Updated class diagram for WServerPrivate with isProcessingEvents flag:

```mermaid
classDiagram
    class WServerPrivate {
        +GlobalFilterFunc globalFilterFunc
        +void* globalFilterFuncData
        +bool isProcessingEvents
        +void init()
        +void stop()
        +void initSocket(WSocket* socketServer)
    }
```
Hey - I've left some high level feedback:
- If `processWaylandEvents` can be invoked from a different thread than `stop()`, consider making `inClientDestroy` atomic or otherwise synchronizing access to avoid data races when it is read in the lambda and written in `stop()`.
- To make the client-destruction guard harder to misuse in future changes, consider wrapping the `inClientDestroy` toggling around `wl_display_destroy_clients` in a small RAII helper or at least a dedicated function, instead of manually setting/resetting the flag in `stop()`.
Stack trace at the time of the hang: #659 (comment)
Pull request overview
This PR aims to prevent re-entrancy/deadlock during Wayland client destruction by blocking Wayland event processing while clients are being torn down (Waylib server-side, Qt event loop integration).
Changes:
- Add an `inClientDestroy` flag to gate `processWaylandEvents` and avoid flushing clients while destruction is in progress.
- Stop Qt/Wayland event processing (socket notifier + event dispatcher connections) before calling `wl_display_destroy_clients()` during shutdown.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
| --- | --- |
| waylib/src/server/kernel/wserver.cpp | Adds guards around Wayland event processing and reorders shutdown steps to avoid re-entrancy during client destruction. |
| waylib/src/server/kernel/private/wserver_p.h | Introduces `inClientDestroy` state flag in `WServerPrivate`. |
```cpp
auto processWaylandEvents = [this] {
    // Prevent re-entry during client destruction
    if (inClientDestroy)
        return;

    if (!display || !display->handle())
        return;

    int ret = wl_event_loop_dispatch(loop, 0);
    if (ret)
        fprintf(stderr, "wl_event_loop_dispatch error: %d\n", ret);

    if (inClientDestroy || !display || !display->handle())
        return;
    wl_display_flush_clients(display->handle());
```
processWaylandEvents now returns early when inClientDestroy is true, but inClientDestroy is only toggled inside WServerPrivate::stop(). This means client destruction that happens during normal runtime (e.g. via wl_client_destroy triggered by disconnect/write failure) will still run with inClientDestroy == false, so the new guard won’t prevent re-entrancy in those cases. Consider either (1) turning this into a true re-entrancy guard for processWaylandEvents (set/reset a dedicated flag inside the lambda, preferably via RAII), or (2) toggling the flag in the actual client-destruction paths (where wl_client_destroy is invoked / destroy listeners run).
```cpp
// Stop event processing before destroying clients
sockNot.reset();
QThread::currentThread()->eventDispatcher()->disconnect(q);
```
QThread::currentThread()->eventDispatcher() can be nullptr for threads without an installed event dispatcher; dereferencing it here will crash. Please guard the pointer (or store the dispatcher used in init() and disconnect from that instance), e.g. check for null before calling disconnect(q).
Force-pushed 57d87b7 to 5a4c01e (compare)
Hey - I've found 1 issue, and left some high level feedback:
- If `processWaylandEvents` can ever be invoked from multiple threads, `isProcessingEvents` should be made thread-safe (e.g., `std::atomic<bool>` or confined to a single known thread) to avoid data races when guarding re-entrancy.
- The locally defined `ScopeGuard` inside the lambda is a bit ad hoc; consider extracting a reusable RAII helper or using an existing scope guard utility to keep this pattern consistent and easier to maintain across the codebase.
## Individual Comments
### Comment 1
<location path="waylib/src/server/kernel/wserver.cpp" line_range="107" />
<code_context>
int fd = wl_event_loop_get_fd(loop);
auto processWaylandEvents = [this] {
+ // Prevent re-entry during event processing
+ if (isProcessingEvents)
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the reentrancy flag handling and display/handle checks into small helper types/functions so the lambda focuses only on the event-processing logic.
You can keep the new safety behavior while simplifying the lambda by factoring out the ad‑hoc RAII and display checks into small, named helpers.
For example, define a tiny reentrancy guard and display helper once in the function scope instead of inside the lambda body:
```cpp
struct ReentrancyGuard {
bool &flag;
bool active;
explicit ReentrancyGuard(bool &f) : flag(f), active(!f) {
if (active)
flag = true;
}
~ReentrancyGuard() {
if (active)
flag = false;
}
explicit operator bool() const { return active; }
};
auto currentDisplayHandle = [this]() -> wl_display * {
return display ? display->handle() : nullptr;
};
```
Then the lambda becomes focused on the actual event processing:
```cpp
auto processWaylandEvents = [this, &loop, currentDisplayHandle]() {
    ReentrancyGuard guard(isProcessingEvents);
    if (!guard)
        return;
    if (!currentDisplayHandle())
        return;
    int ret = wl_event_loop_dispatch(loop, 0);
    if (ret)
        fprintf(stderr, "wl_event_loop_dispatch error: %d\n", ret);
    // Re-check: dispatch may have torn the display down.
    if (auto handle = currentDisplayHandle())
        wl_display_flush_clients(handle);
};
```
This preserves:
- Reentrancy protection (`isProcessingEvents` guard)
- Double validation of `display`/`handle()` (before dispatch, before flush)
while removing the nested `struct ScopeGuard` definition and duplicative `if (!display || !display->handle())` noise inside the lambda.
</issue_to_address>
Force-pushed dc371af to fd0fe18 (compare)
Force-pushed 42f14b9 to e510387 (compare)
wl_display_flush_clients uses wl_list_for_each_safe to iterate the client list. The _safe variant pre-saves the next pointer so that removing the *current* node is safe. However, when wl_client_destroy is called on the current client, its destroy_signal may cascade and destroy the pre-saved *next* client as well. The destroyed next client undergoes wl_list_remove + wl_list_init, making its link self-referencing (prev = next = self). On the next loop iteration, client == next and the termination condition (&client->link != head) is always true, causing an infinite loop with 100% CPU.

Fix:
- Replace wl_display_flush_clients with safeFlushClients() that detects self-linked nodes (node->next == node) and stops traversal. Unlike wl_display_flush_clients, safeFlushClients uses wl_client_flush (pure sendmsg I/O) instead of wl_connection_flush + wl_client_destroy, so it cannot trigger cascading client destruction during traversal.
- Separate dispatch and flush into different callbacks: activated does only wl_event_loop_dispatch, aboutToBlock does only safeFlushClients. This matches the upstream wl_display_run order (flush before dispatch).
- Add isProcessingEvents reentrant guard to prevent aboutToBlock from re-entering during dispatch.
- In stop(), disconnect event handlers before destroying clients to prevent callbacks from firing during teardown.
Summary by Sourcery
Prevent re-entrant Wayland event processing during client destruction to avoid deadlocks or crashes.