Skip to content

fix: prevent deadlock during Wayland client destruction#819

Draft
Dami-star wants to merge 1 commit intolinuxdeepin:masterfrom
Dami-star:fix-crash
Draft

fix: prevent deadlock during Wayland client destruction#819
Dami-star wants to merge 1 commit intolinuxdeepin:masterfrom
Dami-star:fix-crash

Conversation

@Dami-star
Copy link
Copy Markdown
Collaborator

@Dami-star Dami-star commented Apr 9, 2026

wl_display_flush_clients uses wl_list_for_each_safe to iterate the
client list. The _safe variant pre-saves the next pointer so that
removing the current node is safe. However, when wl_client_destroy
is called on the current client, its destroy_signal may cascade and
destroy the pre-saved next client as well. The destroyed next client
undergoes wl_list_remove + wl_list_init, making its link self-referencing
(prev = next = self). On the next loop iteration, client == next and
the termination condition (&client->link != head) is always true,
causing an infinite loop with 100% CPU.

Fix:

  • Replace wl_display_flush_clients with safeFlushClients() that detects
    self-linked nodes (node->next == node) and stops traversal. Unlike
    wl_display_flush_clients, safeFlushClients uses wl_client_flush (pure
    sendmsg I/O) instead of wl_connection_flush + wl_client_destroy, so
    it cannot trigger cascading client destruction during traversal.
  • Separate dispatch and flush into different callbacks: activated does
    only wl_event_loop_dispatch, aboutToBlock does only safeFlushClients.
    This matches the upstream wl_display_run order (flush before dispatch).
  • Add isProcessingEvents reentrant guard to prevent aboutToBlock from
    re-entering during dispatch.
  • In stop(), disconnect event handlers before destroying clients to
    prevent callbacks from firing during teardown.

Summary by Sourcery

Prevent re-entrant Wayland event processing during client destruction to avoid deadlocks or crashes.

Bug Fixes:

  • Guard Wayland event processing with a re-entrancy flag and early returns to avoid corrupted client state during wl_client destruction.
  • Add checks for display and its handle before and after dispatching Wayland events to prevent access to invalid display handles.
  • Ensure event dispatcher disconnection during server stop only occurs when a dispatcher exists, avoiding potential null access or invalid connections.

Enhancements:

  • Refine server shutdown cleanup logic for Wayland clients and event dispatcher connections.

@deepin-ci-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Dami-star

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai bot commented Apr 9, 2026

Reviewer's Guide

Guards Wayland event processing against re-entrancy during client destruction by introducing an isProcessingEvents flag, adding safety checks around wl_display access, and tightening shutdown/disconnect behavior in WServerPrivate::stop().

Sequence diagram for guarded Wayland event processing with re-entrancy prevention

sequenceDiagram
    actor QtThread
    participant QtEventDispatcher
    participant WServerPrivate
    participant WaylandEventLoop
    participant WaylandDisplay

    QtThread->>QtEventDispatcher: aboutToBlock
    QtEventDispatcher->>WServerPrivate: processWaylandEvents()
    alt isProcessingEvents is true
        WServerPrivate-->>QtEventDispatcher: return (skip processing)
    else isProcessingEvents is false
        WServerPrivate->>WServerPrivate: isProcessingEvents = true
        WServerPrivate->>WServerPrivate: ScopeGuard created (will reset isProcessingEvents)
        WServerPrivate->>WServerPrivate: check display and handle
        alt display or handle is null
            WServerPrivate-->>QtEventDispatcher: return
        else display and handle valid
            WServerPrivate->>WaylandEventLoop: wl_event_loop_dispatch(loop, 0)
            WaylandEventLoop-->>WServerPrivate: ret
            alt ret != 0
                WServerPrivate->>WServerPrivate: log dispatch error
            end
            WServerPrivate->>WServerPrivate: recheck display and handle
            alt display or handle is null
                WServerPrivate-->>QtEventDispatcher: return
            else display and handle still valid
                WServerPrivate->>WaylandDisplay: wl_display_flush_clients(handle)
                WaylandDisplay-->>WServerPrivate: clients flushed
            end
        end
        WServerPrivate->>WServerPrivate: ScopeGuard destructor resets isProcessingEvents
    end

    rect rgb(230,230,250)
    note over QtEventDispatcher,WServerPrivate: During client destruction, aboutToBlock may fire again
    QtEventDispatcher->>WServerPrivate: processWaylandEvents()
    WServerPrivate->>WServerPrivate: isProcessingEvents already true, return immediately
    end
Loading

Updated class diagram for WServerPrivate with isProcessingEvents flag

classDiagram
    class WServerPrivate {
        +GlobalFilterFunc globalFilterFunc
        +void* globalFilterFuncData
        +bool isProcessingEvents
        +void init()
        +void stop()
        +void initSocket(WSocket* socketServer)
    }
Loading

File-Level Changes

Change Details Files
Guard Wayland event dispatch/flush against re-entrancy and use-after-destroy during Qt-driven callbacks.
  • Wrap processWaylandEvents body with an isProcessingEvents flag and early return on re-entry
  • Add RAII scope guard to reliably reset the isProcessingEvents flag on all returns
  • Insert null checks for display and its handle before and after wl_event_loop_dispatch to avoid accessing destroyed display
  • Call wl_display_flush_clients only when display and its handle are still valid
waylib/src/server/kernel/wserver.cpp
waylib/src/server/kernel/private/wserver_p.h
Tighten server shutdown to safely destroy Wayland clients and disconnect the event dispatcher without dereferencing null pointers.
  • Remove stray blank line after wl_display_destroy_clients for clarity
  • Wrap eventDispatcher() use in a null-check before disconnecting
  • Use QObject::disconnect with explicit sender/receiver arguments to safely disconnect the dispatcher from the server object
waylib/src/server/kernel/wserver.cpp

Possibly linked issues

  • chore: suppress compile warnings #29: PR adds isProcessingEvents guard and safer shutdown to fix the wl_client_destroy crash shown in issue traceback.
  • #30629: PR adds re-entrancy guard and safety checks in WServer::init around wl_event_loop_dispatch, matching the reported crash stack.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • If processWaylandEvents can be invoked from a different thread than stop(), consider making inClientDestroy atomic or otherwise synchronizing access to avoid data races when it is read in the lambda and written in stop().
  • To make the client-destruction guard harder to misuse in future changes, consider wrapping the inClientDestroy toggling around wl_display_destroy_clients in a small RAII helper or at least a dedicated function, instead of manually setting/resetting the flag in stop().
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- If `processWaylandEvents` can be invoked from a different thread than `stop()`, consider making `inClientDestroy` atomic or otherwise synchronizing access to avoid data races when it is read in the lambda and written in `stop()`.
- To make the client-destruction guard harder to misuse in future changes, consider wrapping the `inClientDestroy` toggling around `wl_display_destroy_clients` in a small RAII helper or at least a dedicated function, instead of manually setting/resetting the flag in `stop()`.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@Dami-star
Copy link
Copy Markdown
Collaborator Author

卡死时堆栈:#659 (comment)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent re-entrancy/deadlock during Wayland client destruction by blocking Wayland event processing while clients are being torn down (Waylib server-side, Qt event loop integration).

Changes:

  • Add an inClientDestroy flag to gate processWaylandEvents and avoid flushing clients while destruction is in progress.
  • Stop Qt/Wayland event processing (socket notifier + event dispatcher connections) before calling wl_display_destroy_clients() during shutdown.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
waylib/src/server/kernel/wserver.cpp Adds guards around Wayland event processing and reorders shutdown steps to avoid re-entrancy during client destruction.
waylib/src/server/kernel/private/wserver_p.h Introduces inClientDestroy state flag in WServerPrivate.

Comment thread waylib/src/server/kernel/wserver.cpp Outdated
Comment on lines 107 to 121
auto processWaylandEvents = [this] {
// Prevent re-entry during client destruction
if (inClientDestroy)
return;

if (!display || !display->handle())
return;

int ret = wl_event_loop_dispatch(loop, 0);
if (ret)
fprintf(stderr, "wl_event_loop_dispatch error: %d\n", ret);

if (inClientDestroy || !display || !display->handle())
return;
wl_display_flush_clients(display->handle());
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processWaylandEvents now returns early when inClientDestroy is true, but inClientDestroy is only toggled inside WServerPrivate::stop(). This means client destruction that happens during normal runtime (e.g. via wl_client_destroy triggered by disconnect/write failure) will still run with inClientDestroy == false, so the new guard won’t prevent re-entrancy in those cases. Consider either (1) turning this into a true re-entrancy guard for processWaylandEvents (set/reset a dedicated flag inside the lambda, preferably via RAII), or (2) toggling the flag in the actual client-destruction paths (where wl_client_destroy is invoked / destroy listeners run).

processWaylandEvents 现在只在 inClientDestroy 为 true 时提前返回,但 inClientDestroy 仅在 WServerPrivate::stop() 里被切换。这意味着在正常运行时发生的客户端销毁(例如断开连接/写失败触发的 wl_client_destroy)仍会在 inClientDestroy == false 的情况下执行,新加的保护不会阻止这类场景下的重入。建议要么:(1) 把它改成真正的 processWaylandEvents 重入保护(在 lambda 内部设置/复位专用标志,最好用 RAII 保证复位),要么:(2) 在实际的客户端销毁路径(调用 wl_client_destroy 或 destroy listener 触发处)切换该标志。

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

合理,已修改压测验证

Comment thread waylib/src/server/kernel/wserver.cpp Outdated
Comment on lines +140 to +143
// Stop event processing before destroying clients
sockNot.reset();
QThread::currentThread()->eventDispatcher()->disconnect(q);

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QThread::currentThread()->eventDispatcher() can be nullptr for threads without an installed event dispatcher; dereferencing it here will crash. Please guard the pointer (or store the dispatcher used in init() and disconnect from that instance), e.g. check for null before calling disconnect(q).

QThread::currentThread()->eventDispatcher() 在没有安装事件分发器的线程里可能为 nullptr;这里直接解引用会导致崩溃。建议对该指针做空值保护(或保存 init() 里使用的 dispatcher 并从同一个实例断开连接),例如在调用 disconnect(q) 前先判空。

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Dami-star Dami-star force-pushed the fix-crash branch 2 times, most recently from 57d87b7 to 5a4c01e Compare April 10, 2026 02:23
@Dami-star Dami-star marked this pull request as draft April 10, 2026 02:33
@Dami-star Dami-star marked this pull request as ready for review April 10, 2026 02:35
Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • If processWaylandEvents can ever be invoked from multiple threads, isProcessingEvents should be made thread-safe (e.g., std::atomic<bool> or confined to a single known thread) to avoid data races when guarding re-entrancy.
  • The locally defined ScopeGuard inside the lambda is a bit ad hoc; consider extracting a reusable RAII helper or using an existing scope guard utility to keep this pattern consistent and easier to maintain across the codebase.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- If `processWaylandEvents` can ever be invoked from multiple threads, `isProcessingEvents` should be made thread-safe (e.g., `std::atomic<bool>` or confined to a single known thread) to avoid data races when guarding re-entrancy.
- The locally defined `ScopeGuard` inside the lambda is a bit ad hoc; consider extracting a reusable RAII helper or using an existing scope guard utility to keep this pattern consistent and easier to maintain across the codebase.

## Individual Comments

### Comment 1
<location path="waylib/src/server/kernel/wserver.cpp" line_range="107" />
<code_context>
     int fd = wl_event_loop_get_fd(loop);

     auto processWaylandEvents = [this] {
+        // Prevent re-entry during event processing
+        if (isProcessingEvents)
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the reentrancy flag handling and display/handle checks into small helper types/functions so the lambda focuses only on the event-processing logic.

You can keep the new safety behavior while simplifying the lambda by factoring out the ad‑hoc RAII and display checks into small, named helpers.

For example, define a tiny reentrancy guard and display helper once in the function scope instead of inside the lambda body:

```cpp
struct ReentrancyGuard {
    bool &flag;
    bool active;

    explicit ReentrancyGuard(bool &f) : flag(f), active(!f) {
        if (active)
            flag = true;
    }
    ~ReentrancyGuard() {
        if (active)
            flag = false;
    }
    explicit operator bool() const { return active; }
};

auto currentDisplayHandle = [this]() -> wl_display * {
    return display ? display->handle() : nullptr;
};
```

Then the lambda becomes focused on the actual event processing:

```cpp
auto processWaylandEvents = [this, &loop]() {
    ReentrancyGuard guard(isProcessingEvents);
    if (!guard)
        return;

    if (auto handle = currentDisplayHandle(); !handle)
        return;

    int ret = wl_event_loop_dispatch(loop, 0);
    if (ret)
        fprintf(stderr, "wl_event_loop_dispatch error: %d\n", ret);

    if (auto handle = currentDisplayHandle(); !handle)
        return;

    wl_display_flush_clients(handle);
};
```

This preserves:

- Reentrancy protection (`isProcessingEvents` guard)
- Double validation of `display`/`handle()` (before dispatch, before flush)

while removing the nested `struct ScopeGuard` definition and duplicative `if (!display || !display->handle())` noise inside the lambda.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread waylib/src/server/kernel/wserver.cpp Outdated
@@ -105,9 +105,28 @@ void WServerPrivate::init()
int fd = wl_event_loop_get_fd(loop);

auto processWaylandEvents = [this] {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider extracting the reentrancy flag handling and display/handle checks into small helper types/functions so the lambda focuses only on the event-processing logic.

You can keep the new safety behavior while simplifying the lambda by factoring out the ad‑hoc RAII and display checks into small, named helpers.

For example, define a tiny reentrancy guard and display helper once in the function scope instead of inside the lambda body:

struct ReentrancyGuard {
    bool &flag;
    bool active;

    explicit ReentrancyGuard(bool &f) : flag(f), active(!f) {
        if (active)
            flag = true;
    }
    ~ReentrancyGuard() {
        if (active)
            flag = false;
    }
    explicit operator bool() const { return active; }
};

auto currentDisplayHandle = [this]() -> wl_display * {
    return display ? display->handle() : nullptr;
};

Then the lambda becomes focused on the actual event processing:

auto processWaylandEvents = [this, &loop]() {
    ReentrancyGuard guard(isProcessingEvents);
    if (!guard)
        return;

    if (auto handle = currentDisplayHandle(); !handle)
        return;

    int ret = wl_event_loop_dispatch(loop, 0);
    if (ret)
        fprintf(stderr, "wl_event_loop_dispatch error: %d\n", ret);

    if (auto handle = currentDisplayHandle(); !handle)
        return;

    wl_display_flush_clients(handle);
};

This preserves:

  • Reentrancy protection (isProcessingEvents guard)
  • Double validation of display/handle() (before dispatch, before flush)

while removing the nested struct ScopeGuard definition and duplicative if (!display || !display->handle()) noise inside the lambda.

Comment thread waylib/src/server/kernel/wserver.cpp Outdated
if (inClientDestroy)
return;

if (!display || !display->handle())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这时候还能到 processWaylandEvents 里面来?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

防御修改,已移除,压测中

Comment thread waylib/src/server/kernel/wserver.cpp Outdated

// RAII guard to ensure the flag is reset when the scope exits,
// regardless of how the function returns (normal return, early return, etc.)
struct ScopeGuard {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以直接用 QScopedValueRollback

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改压测中

@Dami-star Dami-star marked this pull request as draft April 10, 2026 08:52
@Dami-star Dami-star force-pushed the fix-crash branch 3 times, most recently from dc371af to fd0fe18 Compare April 13, 2026 03:31
@Dami-star Dami-star marked this pull request as ready for review April 13, 2026 05:03
Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @Dami-star, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@Dami-star Dami-star requested a review from zccrs April 13, 2026 05:49
@Dami-star Dami-star marked this pull request as draft April 13, 2026 06:08
@Dami-star Dami-star force-pushed the fix-crash branch 4 times, most recently from 42f14b9 to e510387 Compare April 13, 2026 08:31
@Dami-star Dami-star marked this pull request as ready for review April 13, 2026 09:08
Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @Dami-star, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@Dami-star Dami-star marked this pull request as draft April 14, 2026 03:02
wl_display_flush_clients uses wl_list_for_each_safe to iterate the
client list. The _safe variant pre-saves the next pointer so that
removing the *current* node is safe. However, when wl_client_destroy
is called on the current client, its destroy_signal may cascade and
destroy the pre-saved *next* client as well. The destroyed next client
undergoes wl_list_remove + wl_list_init, making its link self-referencing
(prev = next = self). On the next loop iteration, client == next and
the termination condition (&client->link != head) is always true,
causing an infinite loop with 100% CPU.

Fix:
- Replace wl_display_flush_clients with safeFlushClients() that detects
  self-linked nodes (node->next == node) and stops traversal. Unlike
  wl_display_flush_clients, safeFlushClients uses wl_client_flush (pure
  sendmsg I/O) instead of wl_connection_flush + wl_client_destroy, so
  it cannot trigger cascading client destruction during traversal.
- Separate dispatch and flush into different callbacks: activated does
  only wl_event_loop_dispatch, aboutToBlock does only safeFlushClients.
  This matches the upstream wl_display_run order (flush before dispatch).
- Add isProcessingEvents reentrant guard to prevent aboutToBlock from
  re-entering during dispatch.
- In stop(), disconnect event handlers before destroying clients to
  prevent callbacks from firing during teardown.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants