Skip to content

fix: Use atomic for degree_list and use memory fence to ensure thread safety#386

Open
zhanglei1949 wants to merge 4 commits into
alibaba:mainfrom
zhanglei1949:zl/fix-mcsr-put-edge
Open

fix: Use atomic for degree_list and use memory fence to ensure thread safety#386
zhanglei1949 wants to merge 4 commits into
alibaba:mainfrom
zhanglei1949:zl/fix-mcsr-put-edge

Conversation

@zhanglei1949
Copy link
Copy Markdown
Member

@zhanglei1949 zhanglei1949 commented May 21, 2026

FIx #381

This pull request refactors the mutable CSR (Compressed Sparse Row) graph storage implementation to use std::atomic<int> for vertex degree storage, ensuring thread safety and preventing torn reads/writes in concurrent environments. The changes update all relevant code paths to use atomic operations, with appropriate memory orderings, and adjust buffer management accordingly.

The most important changes are:

Thread Safety and Atomic Degree Management:

  • Replaced all raw int degree arrays with std::atomic<int> in both the in-memory representation and all access/manipulation code, ensuring atomicity and visibility guarantees for concurrent readers and writers. All reads and writes to degrees now use explicit memory orderings (acquire for reads, release for writes, and relaxed where appropriate). [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18]

Reader/Writer Synchronization:

  • Updated all code that reads or writes the degree array (e.g., edge insertions, deletions, compaction, batch operations, and neighbor iteration) to use atomic loads and stores, ensuring that buffer pointer updates and degree increments are properly sequenced for safe concurrent access. [1] [2] [3] [4]

Buffer Management Adjustments:

  • Modified logic for buffer resizing, copying, and pointer updates to ensure that atomic degree increments/releases are used as synchronization anchors, so readers always see a consistent view of the adjacency buffers. [1] [2] [3] [4]

Test and Include Updates:

  • Added necessary includes for <atomic>, <thread>, and <vector> in test files to support atomic operations and potential multithreaded tests.

Overall, these changes make the mutable CSR implementation safe for concurrent modifications and queries, which is crucial for high-performance, multi-threaded graph processing.

@zhanglei1949 zhanglei1949 requested a review from liulx20 May 21, 2026 08:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix the reported concurrent mutable-CSR reallocation race (Issue #381) by making per-vertex degree reads/writes atomic and using acquire/release ordering so lock-free readers can safely traverse adjacency buffers while writers insert edges and trigger reallocations.

Changes:

  • Refactors mutable CSR degree storage access to use atomic operations in writer paths and reader views.
  • Updates GenericView/TypedView neighbor iteration to snapshot degree with acquire loads.
  • Adds concurrent read/write regression tests for get_edges() and foreach_nbr_lt().

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File Description
tests/storage/test_mutable_csr.cc Adds multithreaded regression tests intended to catch torn degree/buffer snapshot races.
src/storages/csr/mutable_csr.cc Converts degree array accesses to atomic loads/stores in multiple mutation/maintenance code paths.
include/neug/storages/csr/mutable_csr.h Updates put_edge and related helpers to use atomic degree operations and adjusts vertex capacity computation.
include/neug/storages/csr/generic_view.h Uses atomic degree loads in get_edges() and typed neighbor iteration to avoid torn reads during concurrent writes.
Comments suppressed due to low confidence (1)

include/neug/storages/csr/mutable_csr.h:146

  • The degree is incremented via fetch_add(memory_order_release) before the new neighbor entry is fully written. A concurrent reader that bounds iteration by degree (acquire load) can observe the increased degree and read an uninitialized/partially-written slot. Reserve an index, write neighbor/data/timestamp, then publish the new degree with a release store (or otherwise ensure the release operation occurs after the entry is initialized).
    int32_t prev_size = sizes[src].fetch_add(1, std::memory_order_release);
    auto& nbr = buffers[src][prev_size];
    nbr.neighbor = dst;
    nbr.data = data;
    nbr.timestamp.store(ts);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

return 0;
}
return degree_list_->GetDataSize() / sizeof(int);
return degree_list_->GetDataSize() / sizeof(std::atomic<int>);
Comment on lines 220 to +224
adj_list_buffer_->Resize(vnum * sizeof(nbr_t*));
degree_list_->Resize(vnum * sizeof(int));
cap_list_->Resize(vnum * sizeof(int));
auto** buf_arr = reinterpret_cast<nbr_t**>(adj_list_buffer_->GetData());
auto* sz_arr = reinterpret_cast<int*>(degree_list_->GetData());
auto* sz_arr = reinterpret_cast<std::atomic<int>*>(degree_list_->GetData());
Comment on lines +515 to 518
auto& nbr = buf_arr[src][sz_arr[src].fetch_add(1, std::memory_order_relaxed)];
nbr.neighbor = dst;
nbr.data = data;
nbr.timestamp.store(ts);
Comment on lines +409 to +414
// Atomic loads for torn-read safety: snapshot degree and buffer pointer
const int deg = reinterpret_cast<const std::atomic<int>*>(degrees)[v].load(
std::memory_order_acquire);
const MutableNbr<T>* base = adjlists[v];
const MutableNbr<T>* ptr = base + deg - 1;
const MutableNbr<T>* end = base - 1;
// Atomic load for torn-read safety: snapshot degree
const int deg = reinterpret_cast<const std::atomic<int>*>(degrees)[v].load(
std::memory_order_acquire);
const MutableNbr<T>* base = adjlists[v];
Comment on lines 616 to 620
const char* start_ptr = reinterpret_cast<const char*>(
reinterpret_cast<const int64_t*>(adjlists_)[v]);
ret.start_ptr = start_ptr;
ret.end_ptr = start_ptr + degrees_[v] * cfg_.stride;
ret.end_ptr = start_ptr + deg * cfg_.stride;
}
Comment on lines +694 to +695
// This is a regression test for the torn-read bug fixed by using
// std::atomic_ref in put_edge / GenericView / TypedView.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Segfault in TypedView::foreach_nbr_lt — likely concurrent CSR buffer reallocation race

2 participants