feat: Introduce cow-based Update Transaction #370

Open
zhanglei1949 wants to merge 3 commits into alibaba:main from zhanglei1949:zl/update-txn-cow

zhanglei1949 commented May 18, 2026

Fix #330

This PR contains two parts:

  • Workspace & checkpoint implementation
  • Copy-on-write (COW) implementation for UpdateTransaction

The Workspace & checkpoint implementation is based on #148

virtual void resize(vid_t vnum) = 0;

virtual size_t capacity() const = 0;

Review comment (Member, Author):

close() is now inherited from Module.

}

void dump(const std::string& filename) override {
bool is_data_unmodified() const {
Review comment (Member, Author):

TODO: Could we avoid casting data_buffer to MMapContainer*?

zhanglei1949 requested a review from Copilot May 20, 2026 03:33

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

zhanglei1949 force-pushed the zl/update-txn-cow branch 2 times, most recently from 6cbf42f to 754e8e3, May 22, 2026 02:43
zhanglei1949 force-pushed the zl/update-txn-cow branch 4 times, most recently from 1afcc6e to a422a14, May 25, 2026 06:25
zhanglei1949 force-pushed the zl/update-txn-cow branch 3 times, most recently from a257c67 to 89cdad4, May 27, 2026 08:46
zhanglei1949 added a commit that referenced this pull request Jun 1, 2026
…dary (#422)

## What do these changes do?

Harden `PropertyGraph`'s label-id and edge-triplet validation so that
bad caller-supplied IDs no longer become undefined behavior in Release
builds, and unify the existing private check helpers so the string and
`label_t` overloads behave consistently.

- Add `label_t` overloads of the private helpers and make all four
overloads `const`. Both `edge_triplet_check` overloads now delegate
per-vertex validation to `vertex_label_check`, so they share logic and
produce equally granular errors:
  ```cpp
  Status vertex_label_check(const std::string&) const;
  Status vertex_label_check(label_t) const;
  Status edge_triplet_check(const std::string&, const std::string&,
                            const std::string&) const;
  Status edge_triplet_check(label_t, label_t, label_t) const;
  ```
- **Mutating APIs** (`BatchAddVertices`, `BatchDeleteVertices`,
`DeleteVertex` ×2, `DeleteEdge`, `BatchDeleteEdges` ×2, `AddVertex`,
`UpdateVertexProperty`, `UpdateEdgeProperty`) now
`RETURN_IF_NOT_OK(vertex_label_check(...) / edge_triplet_check(...))`
instead of `assert(...)` or relying on the callee to catch the bad ID.
- **Non-Status accessors** (`get_vertex_table` ×2,
`GetVertexPropertyColumn` ×2, `GetVertexSet`, `LidNum`, `VertexNum`,
`IsValidLid`, `get_lid`, `GetOid`) now call
`schema_.ensure_vertex_label_valid(...)` (throws
`InvalidArgumentException`) — the only viable signal channel for
non-`Status` returns.
- `GetVertexPropertyColumn(label, col_id)` additionally bounds-checks
`col_id` against the label's property count.
- Unify error wording across overloads: `"Vertex label '<name>' is not
valid"` / `"Vertex label id <N> is not valid"`; same `"Edge triplet
<...> is not valid"` shape. Drop the `LOG(ERROR)` calls from the
existing string-based helpers (caller decides).

No behavior change for valid inputs. Verified no tests assert on the old
error wording.
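
For readers who have not seen the diff, the two signalling channels described above can be sketched roughly as follows. This is a minimal illustration, not the code in the commit: `Status`, `RETURN_IF_NOT_OK`, and `vertex_label_check` are named in the description but their definitions are not shown here, so the versions below are stand-ins. `MiniGraph`, the `label_t` width, and the use of `std::invalid_argument` in place of `InvalidArgumentException` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using label_t = std::uint8_t;  // assumption: the real width is not stated here

// Stand-in for the project's Status type (hypothetical shape).
class Status {
 public:
  static Status OK() { return Status(true, ""); }
  static Status InvalidArgument(std::string msg) {
    return Status(false, std::move(msg));
  }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

// Stand-in for the project's macro: propagate the first failing Status.
#define RETURN_IF_NOT_OK(expr)                 \
  do {                                         \
    Status status_tmp = (expr);                \
    if (!status_tmp.ok()) return status_tmp;   \
  } while (0)

// Hypothetical toy graph that only counts vertices per label.
class MiniGraph {
 public:
  explicit MiniGraph(std::size_t vertex_label_num)
      : counts_(vertex_label_num, 0) {}

  Status vertex_label_check(label_t label) const {
    if (label >= counts_.size()) {
      return Status::InvalidArgument("Vertex label id " +
                                     std::to_string(static_cast<int>(label)) +
                                     " is not valid");
    }
    return Status::OK();
  }

  // Mutating API: the check runs in Release builds too, unlike assert().
  Status AddVertex(label_t label) {
    RETURN_IF_NOT_OK(vertex_label_check(label));
    assert(label < counts_.size());  // now only a debug-time invariant
    ++counts_[label];
    return Status::OK();
  }

  // Non-Status accessor: throwing is the only way to report a bad label.
  std::size_t VertexNum(label_t label) const {
    Status st = vertex_label_check(label);
    if (!st.ok()) throw std::invalid_argument(st.message());
    return counts_[label];
  }

 private:
  std::vector<std::size_t> counts_;
};
```

With this shape, `MiniGraph g(2); g.AddVertex(7)` returns a non-OK `Status` carrying the unified message instead of indexing out of bounds, and `g.VertexNum(7)` throws.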

## Background

These changes were originally bundled inside the `zl/update-txn-cow`
branch (PR #370 split work) but have no semantic dependency on the
workspace/checkpoint or COW refactor. Splitting them out keeps those
downstream PRs focused and lets this hardening land independently
against `main`.

## Related issue number

Fixes #421

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

refine module_store and module_desc

refactor snapshot_meta/checkpoint, remove unnecessary changes

remove module store

minor ref

ref wal apply

refine

refine checkpoint manage

fix ci

minor

refine

fix

minor refine

refine module register macro

fix format

rm file_names.h

minor refine

minor refine

supporting fast dump

remove some comments

TBD: add dirty flag for MutableCSR, to avoid redumping the data

minor fix

remove method generate_uuid

fix test

revert is_dirty_ flag and compute md5 for mutable_csr's nbr_list before dump

fix test

check whether data_buffer is modified at column level

refine

remove unused field last_descriptor

cherry-pick c964e58

fix issues reported by aone copilot

two fold check

remove some comment

revert changes to indexer_ in VertexTable

minor

stash changes to cow

fix tests

fix format

Committed-by: xiaolei.zl from Dev container

minor changes

fix test

fixing

fix version manager and tests

fix test

refactor: consolidate Schema vertex/edge label check API

Unify the naming convention of label/property existence and validity
checks on Schema (and the schema-check helpers it surfaces through
PropertyGraph).

Renames (snake_case `is_*` family):
- contains_vertex_label / vertex_label_valid  -> is_vertex_label_valid
- contains_edge_label   / edge_label_valid    -> is_edge_label_valid
- exist(...)                                  -> is_edge_triplet_valid(...)
- has_edge_label(...)                         -> has_edge_triplet(...)
- IsVertexLabelSoftDeleted                    -> is_vertex_label_soft_deleted (then removed)
- IsEdgeLabelSoftDeleted                      -> is_edge_label_soft_deleted (then removed)
- IsVertexPropertySoftDeleted                 -> is_vertex_property_soft_deleted (then removed)
- IsEdgePropertySoftDeleted                   -> is_edge_property_soft_deleted (then removed)

Removed (zero production callers):
- Schema::is_*_soft_deleted (8 overloads) + corresponding test cases
- Schema::edge_triplet_valid (weak tomb-only check; merged into the
  stronger is_edge_triplet_valid)
- PropertyGraph::edge_triplet_exist (dead code, no callers)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix format

some renaming

fixing

append changes

rm cache file

refactor: extract read-side view layer over PropertyGraph storage

Add VertexTableView / EdgeTableView / GraphView / TableView wrapping the
existing storage components without back-pointers to the owning storage.
View objects capture raw pointers to the heap-stable parts of each storage
component (shared_ptr-managed indexer / v_ts / Csr / Table) at construction;
no mutation flows through the underlying storage handle they were built from.

Required two small additions on the underlying side:

- RefColumnBase::set(idx, value), implemented on TypedRefColumn<T> and
  TypedRefColumn<string_view>, so TableView can write through a column ref.
- VertexTable holds a stable shared_ptr<RefColumnBase> wrapping
  indexer_->get_keys(), so VertexTableView can borrow a raw pointer for
  primary-key-by-name lookups without owning it.

Friend declarations added on EdgeTable / VertexTable / PropertyGraph so the
sub-view ctors can capture private state without exposing it publicly.

This PR introduces the view types only; no caller is migrated. ReadTransaction
/ InsertTransaction / StorageReadInterface migration is left for a follow-up,
where the view becomes the stable observation point that the COW
UpdateTransaction work in alibaba#86 needs.

Fixes alibaba#400
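
The borrowing pattern this commit describes (a view that captures raw pointers to heap-stable, shared_ptr-managed parts and keeps no back-pointer to the owning storage) might look roughly like the sketch below. The names `ToyVertexTable` and `ToyVertexTableView` are hypothetical, and a plain `std::vector` stands in for the indexer keys; the real VertexTableView is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

class ToyVertexTableView;

// Hypothetical storage component: the key column lives on the heap behind a
// shared_ptr, so its address stays stable even if the table object is moved.
class ToyVertexTable {
 public:
  ToyVertexTable() : keys_(std::make_shared<std::vector<std::int64_t>>()) {}
  void Add(std::int64_t oid) { keys_->push_back(oid); }

 private:
  friend class ToyVertexTableView;  // lets the view ctor read private state
  std::shared_ptr<std::vector<std::int64_t>> keys_;
};

// Hypothetical read-side view: borrows a raw pointer at construction and
// never refers back to the ToyVertexTable it was built from.
class ToyVertexTableView {
 public:
  explicit ToyVertexTableView(const ToyVertexTable& table)
      : keys_(table.keys_.get()) {
    assert(keys_ != nullptr);
  }
  std::size_t VertexNum() const { return keys_->size(); }
  std::int64_t GetOid(std::size_t lid) const { return keys_->at(lid); }

 private:
  const std::vector<std::int64_t>* keys_;  // borrowed, not owned
};
```

In this sketch the view stays usable after the table object itself is moved, because it points at the heap-allocated column rather than at the table. The trade-off is that the view does not extend the column's lifetime, so some owner of the shared_ptr has to outlive every view built from it.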

fix test

ref

introduce views

Remove all insert api

minor

format

Committed-by: xiaolei.zl from Dev container

format

minor

docs: update extensions index.md to reflect JSON built-in status and use PARQUET examples (alibaba#406)

fix alibaba#405

Updates `doc/source/extensions/index.md` so that the extensions overview
is consistent with `doc/source/extensions/load_json.md`, where JSON has
been documented as a built-in feature since v0.1.2.

1. In the "Available Extensions" table, annotate the JSON row with
`(built-in since v0.1.2)` so readers know JSON no longer requires
`INSTALL` / `LOAD`.

fix

remove unneeded files

remove unneeded views

update tests

remove unneeded fields for csrview

refine interface

explicitly call PrepareForXXX
zhanglei1949 and others added 2 commits June 5, 2026 11:02
…ship tracking

Introduce CowRef<T> wrapper that tracks exclusive ownership via a boolean
flag, eliminating all shared_ptr::use_count() calls (fragile: the count is
only approximate in concurrent contexts). Replace pointer-arithmetic aliasing detection in
MutableCsr with a per-vertex adjlist_owned_ bitset. Merge PrepareFor*
COW-prep into PropertyGraph DML methods so callers cannot forget the
protocol. Add ForkAsShared helpers to CsrBase, ColumnBase, and
VertexTimestamp. Make EdgeTable/VertexTable PrepareFor* methods private.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
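
To make the ownership-flag idea concrete, here is a minimal sketch of what a `CowRef<T>` along these lines could look like. Only the name `CowRef<T>`, the boolean flag, and the `ForkAsShared` helper name come from the commit message; the member names (`get`, `mutable_get`, `owned`) and the exact semantics are assumptions for illustration and may differ from the PR.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Sketch only: a copy-on-write handle that decides whether to clone based on
// an explicit flag instead of shared_ptr::use_count().
template <typename T>
class CowRef {
 public:
  explicit CowRef(T value)
      : ptr_(std::make_shared<T>(std::move(value))), owned_(true) {}

  // Hand out a second handle to the same payload. After forking, neither
  // handle is exclusive, so both clone before their next write.
  CowRef ForkAsShared() {
    owned_ = false;
    return CowRef(ptr_, false);
  }

  const T& get() const { return *ptr_; }

  // Clone the payload on the first write after sharing, then write in place.
  T& mutable_get() {
    if (!owned_) {
      ptr_ = std::make_shared<T>(*ptr_);
      owned_ = true;
    }
    return *ptr_;
  }

  bool owned() const { return owned_; }

 private:
  CowRef(std::shared_ptr<T> ptr, bool owned)
      : ptr_(std::move(ptr)), owned_(owned) {}

  std::shared_ptr<T> ptr_;
  bool owned_;  // true only while this handle may mutate the payload in place
};
```

Under these assumptions, a fork followed by a write leaves the original payload untouched: after `auto snap = base.ForkAsShared(); snap.mutable_get().push_back(4);`, `base.get()` still has its old contents. The flag is deliberately conservative, since a handle that was forked once clones on its next write even if the other handle has already been destroyed; that costs an occasional extra copy but avoids consulting a reference count that other threads may be changing.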
Development

Successfully merging this pull request may close these issues.

Implementation COW for UpdateTransaction

2 participants