fix: Use atomic for degree_list and use memory fence to ensure thread safety by zhanglei1949 · Pull Request #386 · alibaba/neug

zhanglei1949 · 2026-05-21T08:32:49Z

This pull request refactors the mutable CSR (Compressed Sparse Row) graph storage implementation to use std::atomic<int> for vertex degree storage, ensuring thread safety and preventing torn reads/writes in concurrent environments. The changes update all relevant code paths to use atomic operations, with appropriate memory orderings, and adjust buffer management accordingly.

The most important changes are:

Thread Safety and Atomic Degree Management:

Replaced all raw int degree arrays with std::atomic<int> in both the in-memory representation and all access/manipulation code, ensuring atomicity and visibility guarantees for concurrent readers and writers. All reads and writes to degrees now use explicit memory orderings (acquire for reads, release for writes, and relaxed where appropriate).

Reader/Writer Synchronization:

Updated all code that reads or writes the degree array (e.g., edge insertions, deletions, compaction, batch operations, and neighbor iteration) to use atomic loads and stores, ensuring that buffer pointer updates and degree increments are properly sequenced for safe concurrent access.

Buffer Management Adjustments:

Modified logic for buffer resizing, copying, and pointer updates to ensure that atomic degree increments/releases are used as synchronization anchors, so readers always see a consistent view of the adjacency buffers.

Test and Include Updates:

Added necessary includes for <atomic>, <thread>, and <vector> in test files to support atomic operations and potential multithreaded tests.

Overall, these changes make the mutable CSR implementation safe for concurrent modifications and queries, which is crucial for high-performance, multi-threaded graph processing.
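The idea of using the degree as a synchronization anchor during buffer growth can be sketched as follows. This is a minimal illustration, not the NeuG code: `AdjList`, `push_back`, and `sum` are hypothetical names, a single writer per vertex is assumed, the buffer pointer is made atomic here for clarity, and retired buffers are simply kept alive instead of being reclaimed.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical single-writer adjacency list (illustration only).
struct AdjList {
  std::atomic<int*> buf{nullptr};
  std::atomic<int> degree{0};
  int capacity = 0;                           // touched by the writer only
  std::vector<std::unique_ptr<int[]>> owned;  // keeps retired buffers alive

  // Writer: append one neighbor, growing the buffer when it is full.
  void push_back(int nbr) {
    const int n = degree.load(std::memory_order_relaxed);
    int* cur = buf.load(std::memory_order_relaxed);
    if (n == capacity) {
      const int new_cap = capacity == 0 ? 4 : capacity * 2;
      owned.push_back(std::make_unique<int[]>(new_cap));
      int* grown = owned.back().get();
      for (int i = 0; i < n; ++i) grown[i] = cur[i];  // copy before publishing
      buf.store(grown, std::memory_order_release);    // publish the new buffer
      cur = grown;
      capacity = new_cap;
    }
    cur[n] = nbr;                                    // fill the slot first
    degree.store(n + 1, std::memory_order_release);  // then publish the degree
  }

  // Reader: snapshot the degree, then the pointer. Any buffer observed after
  // the degree snapshot holds at least that many initialized entries.
  long sum() const {
    const int n = degree.load(std::memory_order_acquire);
    const int* p = buf.load(std::memory_order_acquire);
    assert(n == 0 || p != nullptr);
    long s = 0;
    for (int i = 0; i < n; ++i) s += p[i];
    return s;
  }
};
```

Because the reader loads the degree before the pointer, it may pair an older degree with a newer buffer, which is safe in this sketch since every new buffer starts as a full copy of the old one. Reclaiming old buffers safely is a separate problem that this sketch deliberately sidesteps.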


Copilot

Pull request overview

This PR aims to fix the reported concurrent mutable-CSR reallocation race (Issue #381) by making per-vertex degree reads/writes atomic and using acquire/release ordering so lock-free readers can safely traverse adjacency buffers while writers insert edges and trigger reallocations.

Changes:

Refactors mutable CSR degree storage access to use atomic operations in writer paths and reader views.
Updates GenericView/TypedView neighbor iteration to snapshot degree with acquire loads.
Adds concurrent read/write regression tests for get_edges() and foreach_nbr_lt().

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `tests/storage/test_mutable_csr.cc` | Adds multithreaded regression tests intended to catch torn degree/buffer snapshot races. |
| `src/storages/csr/mutable_csr.cc` | Converts degree array accesses to atomic loads/stores in multiple mutation/maintenance code paths. |
| `include/neug/storages/csr/mutable_csr.h` | Updates `put_edge` and related helpers to use atomic degree operations and adjusts vertex capacity computation. |
| `include/neug/storages/csr/generic_view.h` | Uses atomic degree loads in `get_edges()` and typed neighbor iteration to avoid torn reads during concurrent writes. |

Comments suppressed due to low confidence (1)

include/neug/storages/csr/mutable_csr.h:146

The degree is incremented via fetch_add(memory_order_release) before the new neighbor entry is fully written. A concurrent reader that bounds iteration by degree (acquire load) can observe the increased degree and read an uninitialized/partially-written slot. Reserve an index, write neighbor/data/timestamp, then publish the new degree with a release store (or otherwise ensure the release operation occurs after the entry is initialized).

    int32_t prev_size = sizes[src].fetch_add(1, std::memory_order_release);
    auto& nbr = buffers[src][prev_size];
    nbr.neighbor = dst;
    nbr.data = data;
    nbr.timestamp.store(ts);
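One way to follow the reviewer's suggestion is to fill the slot first and publish the degree last. The sketch below assumes a single writer per vertex and a buffer with spare capacity; `Nbr` and `put_edge_publish_last` are hypothetical names, not the types or functions used in the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical neighbor record, loosely modeled on the snippet above.
struct Nbr {
  uint32_t neighbor;
  double data;
  std::atomic<uint32_t> timestamp;
};

// Assumes one writer per vertex and that buffer[degree] is within capacity.
inline void put_edge_publish_last(Nbr* buffer, std::atomic<int>& degree,
                                  uint32_t dst, double data, uint32_t ts) {
  // With a single writer the next free slot is simply the current degree.
  const int idx = degree.load(std::memory_order_relaxed);
  Nbr& nbr = buffer[idx];
  nbr.neighbor = dst;
  nbr.data = data;
  nbr.timestamp.store(ts, std::memory_order_relaxed);
  // Release store: a reader that acquires degree == idx + 1 also sees the
  // three writes above, so it never iterates over a half-written slot.
  degree.store(idx + 1, std::memory_order_release);
}
```

With several concurrent writers on the same vertex, a plain load followed by a store is no longer enough to reserve a slot, so this ordering would need a separate reservation counter or a lock; that case is outside the scope of this sketch.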


      return 0;
    }
-    return degree_list_->GetDataSize() / sizeof(int);
+    return degree_list_->GetDataSize() / sizeof(std::atomic<int>);


    adj_list_buffer_->Resize(vnum * sizeof(nbr_t*));
    degree_list_->Resize(vnum * sizeof(int));
    cap_list_->Resize(vnum * sizeof(int));
    auto** buf_arr = reinterpret_cast<nbr_t**>(adj_list_buffer_->GetData());
-    auto* sz_arr = reinterpret_cast<int*>(degree_list_->GetData());
+    auto* sz_arr = reinterpret_cast<std::atomic<int>*>(degree_list_->GetData());
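Viewing a raw byte buffer as an array of `std::atomic<int>` relies on layout assumptions that can be checked at compile time. The following is a hedged sketch of such checks; `init_degrees` and `vertex_capacity` are hypothetical helpers and do not correspond to functions in the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// Layout assumptions behind viewing raw storage as atomic counters.
static_assert(sizeof(std::atomic<int>) == sizeof(int),
              "atomic<int> must not add padding");
static_assert(alignof(std::atomic<int>) == alignof(int),
              "atomic<int> must share int's alignment");
static_assert(std::atomic<int>::is_always_lock_free,
              "degree updates should not fall back to a lock");

// Hypothetical helper: construct vnum zeroed atomics inside raw storage.
// The caller must provide suitably aligned storage of sufficient size.
inline std::atomic<int>* init_degrees(void* raw, std::size_t vnum) {
  auto* bytes = static_cast<unsigned char*>(raw);
  std::atomic<int>* first = nullptr;
  for (std::size_t i = 0; i < vnum; ++i) {
    auto* p = new (bytes + i * sizeof(std::atomic<int>)) std::atomic<int>(0);
    if (i == 0) first = p;
  }
  return first;
}

// Hypothetical helper mirroring the capacity computation in the diff.
inline std::size_t vertex_capacity(std::size_t data_size_bytes) {
  return data_size_bytes / sizeof(std::atomic<int>);
}
```

When the two sizes match, dividing by `sizeof(int)` or `sizeof(std::atomic<int>)` gives the same capacity, so the static assertions mainly document the assumption and catch platforms where it does not hold.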


+    auto& nbr = buf_arr[src][sz_arr[src].fetch_add(1, std::memory_order_relaxed)];
    nbr.neighbor = dst;
    nbr.data = data;
    nbr.timestamp.store(ts);


+    // Atomic loads for torn-read safety: snapshot degree and buffer pointer
+    const int deg = reinterpret_cast<const std::atomic<int>*>(degrees)[v].load(
+        std::memory_order_acquire);
+    const MutableNbr<T>* base = adjlists[v];
+    const MutableNbr<T>* ptr = base + deg - 1;
+    const MutableNbr<T>* end = base - 1;
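The reverse traversal in this hunk can also be written with an index loop over a single degree snapshot, which avoids forming a pointer before the start of the buffer (`base - 1`). This is only a sketch with a hypothetical name, `for_each_reverse`, and plain `int` entries in place of `MutableNbr<T>`.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Visit entries from newest to oldest using one acquire snapshot of the
// degree, so the loop bound cannot change while iterating.
template <typename F>
void for_each_reverse(const int* base, const std::atomic<int>& degree,
                      F&& fn) {
  const int deg = degree.load(std::memory_order_acquire);
  for (int i = deg - 1; i >= 0; --i) {
    fn(base[i]);
  }
}

// Small helper used to demonstrate the traversal order.
inline std::vector<int> collect_reverse(const int* base,
                                        const std::atomic<int>& degree) {
  std::vector<int> out;
  for_each_reverse(base, degree, [&out](int v) { out.push_back(v); });
  assert(out.size() <= static_cast<std::size_t>(degree.load()));
  return out;
}
```

Entries appended after the snapshot are simply not visited by this pass, which matches the snapshot semantics described in the diff comment.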


+    // Atomic load for torn-read safety: snapshot degree
+    const int deg = reinterpret_cast<const std::atomic<int>*>(degrees)[v].load(
+        std::memory_order_acquire);
+    const MutableNbr<T>* base = adjlists[v];


      const char* start_ptr = reinterpret_cast<const char*>(
          reinterpret_cast<const int64_t*>(adjlists_)[v]);
      ret.start_ptr = start_ptr;
-      ret.end_ptr = start_ptr + degrees_[v] * cfg_.stride;
+      ret.end_ptr = start_ptr + deg * cfg_.stride;
    }


+// This is a regression test for the torn-read bug fixed by using
+// std::atomic_ref in put_edge / GenericView / TypedView.
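A regression test of this kind typically runs one writer against several readers and checks that every entry below the observed degree is fully initialized. The sketch below shows the general shape only; `run_torn_read_check` is a hypothetical function, it uses a fixed-capacity buffer of plain `int` values, and it is not the test added in this PR.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Returns the number of entries that a reader saw below the published degree
// without their expected value. Zero means no torn or premature reads.
inline int run_torn_read_check(int num_edges, int num_readers) {
  assert(num_edges > 0 && num_readers > 0);
  std::vector<int> buffer(num_edges, -1);  // -1 marks an unwritten slot
  std::atomic<int> degree{0};
  std::atomic<int> violations{0};

  std::thread writer([&] {
    for (int i = 0; i < num_edges; ++i) {
      buffer[i] = i;                                   // fill the slot
      degree.store(i + 1, std::memory_order_release);  // then publish it
    }
  });

  std::vector<std::thread> readers;
  for (int r = 0; r < num_readers; ++r) {
    readers.emplace_back([&] {
      int deg = 0;
      do {
        deg = degree.load(std::memory_order_acquire);  // snapshot once
        for (int i = 0; i < deg; ++i) {
          if (buffer[i] != i) {
            violations.fetch_add(1, std::memory_order_relaxed);
          }
        }
      } while (deg < num_edges);
    });
  }

  writer.join();
  for (auto& t : readers) t.join();
  return violations.load();
}
```

A passing run does not prove the absence of races, so a test like this is usually most useful when combined with a tool such as ThreadSanitizer.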


zhanglei1949 added 3 commits May 21, 2026 16:30

use atomic for degree_list and use memory fence to avoid memory-inconsistency in generic view

7631d0a

rename var

a048dea

minor

7cec2e3

zhanglei1949 requested a review from liulx20 May 21, 2026 08:42

format

120e176

zhanglei1949 requested a review from Copilot May 22, 2026 02:20

Copilot started reviewing on behalf of zhanglei1949 May 22, 2026 02:20

Copilot AI reviewed May 22, 2026

zhanglei1949 wants to merge 4 commits into alibaba:main from zhanglei1949:zl/fix-mcsr-put-edge
