refactor: switch containers to mimalloc-aware aliases and use sel_t for row indices #470

Open
liulx20 wants to merge 2 commits into alibaba:main from liulx20:refactor/execution-mimalloc-sel-t

liulx20 (Collaborator) commented Jun 3, 2026

Container changes (mostly cherry-picked from 562bfcc, plus follow-ups):

  • Add include/neug/utils/mi_allocator.h with neug_allocator (mi_stl_allocator when WITH_MIMALLOC, std::allocator otherwise) and aliases vector_t, sel_vec_t, flat_hash_map_t, flat_hash_set_t, string_t.
  • Sweep the execution layer: std::vector -> vector_t, std::map/std::unordered_map -> flat_hash_map_t, std::set/std::unordered_set -> flat_hash_set_t. All hash containers now route allocations through neug_allocator.
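
To illustrate the allocator switch and aliases described above, here is a minimal sketch. It is not the contents of include/neug/utils/mi_allocator.h. The PR text does not say which hash-table implementation backs flat_hash_map_t and flat_hash_set_t, so std::unordered_map and std::unordered_set are used as stand-ins, and the `sketch` namespace is hypothetical. Only `mi_stl_allocator` (from mimalloc.h) and the alias names come from the description.

```cpp
// Sketch only: allocator selection plus container aliases.
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

#ifdef WITH_MIMALLOC
#include <mimalloc.h>  // provides mi_stl_allocator<T>
#endif

namespace sketch {  // hypothetical namespace, not the project's

#ifdef WITH_MIMALLOC
template <typename T>
using neug_allocator = mi_stl_allocator<T>;
#else
template <typename T>
using neug_allocator = std::allocator<T>;
#endif

using sel_t = uint32_t;  // row index type named in the PR

template <typename T>
using vector_t = std::vector<T, neug_allocator<T>>;

using sel_vec_t = vector_t<sel_t>;

// Assumption: standard unordered containers stand in for the real
// flat hash containers, whose implementation the PR text does not name.
template <typename K, typename V, typename H = std::hash<K>,
          typename E = std::equal_to<K>>
using flat_hash_map_t =
    std::unordered_map<K, V, H, E, neug_allocator<std::pair<const K, V>>>;

template <typename K, typename H = std::hash<K>,
          typename E = std::equal_to<K>>
using flat_hash_set_t = std::unordered_set<K, H, E, neug_allocator<K>>;

using string_t =
    std::basic_string<char, std::char_traits<char>, neug_allocator<char>>;

}  // namespace sketch
```

One consequence of the sweep worth checking during review: std::map and std::set iterate in key order, while hash containers do not, so any call site that relied on sorted iteration would need an explicit sort after this change.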

Row-index width:

  • sel_t (uint32_t) replaces size_t for row indices on the hot path:
    • Column accessors: get_elem/has_value/get_value/get_vertex/get_edge/get_path now take sel_t.
    • Context::row_num() returns sel_t; expression eval_record(ctx, sel_t) and predicate operator()(ctx, sel_t) follow suit.
    • Row-iteration loop counters (against row_num / ctx.row_num() / col->size() / data_.size() / left_size / right_size / num_rows / nrows) become sel_t.
    • merge/insert ops take Context& ctx, sel_t row.
    • sel_vec_t replaces std::vector<size_t> for shuffle/dedup offsets and selection indices throughout the execution layer.
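
The row-index changes listed above can be sketched as follows. This is an illustration under stated assumptions, not code from the PR: `IntColumn`, `select_rows`, and the checked-narrowing helper `to_sel` are hypothetical names, and sel_vec_t is simplified to use the default allocator. Only `sel_t` as `uint32_t`, `sel_vec_t`, and the accessor name `get_value` come from the description.

```cpp
// Sketch only: 32-bit row indices in accessors, loops, and selection vectors.
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

using sel_t = uint32_t;                // row index width named in the PR
using sel_vec_t = std::vector<sel_t>;  // simplified: default allocator

// Hypothetical helper: narrowing from size_t is the one place where a
// 32-bit index can silently wrap, so this sketch checks it explicitly.
inline sel_t to_sel(std::size_t n) {
  if (n > std::numeric_limits<sel_t>::max()) {
    throw std::overflow_error("row count exceeds sel_t range");
  }
  return static_cast<sel_t>(n);
}

// Hypothetical column type with a sel_t-based accessor.
struct IntColumn {
  std::vector<int64_t> data_;
  sel_t size() const { return to_sel(data_.size()); }
  int64_t get_value(sel_t row) const { return data_[row]; }
};

// Collects the indices of rows whose value satisfies pred.
template <typename Pred>
sel_vec_t select_rows(const IntColumn& col, Pred pred) {
  sel_vec_t sel;
  const sel_t row_num = col.size();
  for (sel_t i = 0; i < row_num; ++i) {  // loop counter is sel_t, not size_t
    if (pred(col.get_value(i))) {
      sel.push_back(i);
    }
  }
  return sel;
}
```

A selection vector of `uint32_t` uses half the memory of `std::vector<size_t>` on 64-bit targets, which is the usual motivation for this kind of change. Whether the PR guards the size_t to sel_t narrowing is not stated in the description.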

liulx20 changed the title from "refactor(execution): switch containers to mimalloc-aware aliases and use sel_t for row indices" to "refactor: switch containers to mimalloc-aware aliases and use sel_t for row indices" Jun 3, 2026
liulx20 force-pushed the refactor/execution-mimalloc-sel-t branch 7 times, most recently from 77440d8 to 163d49c, June 3, 2026 14:00
…use sel_t for row indices

Container changes (mostly cherry-picked from 562bfcc, plus follow-ups):
- Add include/neug/utils/mi_allocator.h with neug_allocator (mi_stl_allocator
  when WITH_MIMALLOC, std::allocator otherwise) and aliases vector_t,
  sel_vec_t, flat_hash_map_t, flat_hash_set_t, string_t.
- Sweep the execution layer: std::vector -> vector_t,
  std::map/std::unordered_map -> flat_hash_map_t,
  std::set/std::unordered_set -> flat_hash_set_t.
  All hash containers now route allocations through neug_allocator.

Row-index width:
- sel_t (uint32_t) replaces size_t for row indices on the hot path:
  - Column accessors: get_elem/has_value/get_value/get_vertex/get_edge/
    get_path now take sel_t.
  - Context::row_num() returns sel_t; expression eval_record(ctx, sel_t)
    and predicate operator()(ctx, sel_t) follow suit.
  - Row-iteration loop counters (against row_num / ctx.row_num() /
    col->size() / data_.size() / left_size / right_size / num_rows /
    nrows) become sel_t.
  - merge/insert ops take Context& ctx, sel_t row.
  - sel_vec_t replaces std::vector<size_t> for shuffle/dedup offsets and
    selection indices throughout the execution layer.

Other:
- Add std::hash<neug::Interval> so flat_hash_set_t<Interval> works in the
  generic dedup path of ValueColumn<T>::generate_dedup_offset.
- Adapt the GenericView -> CsrView and Schema::exist ->
  is_edge_triplet_valid / vertex_label_valid -> is_vertex_label_valid
  rename to main's current naming (the cherry-picked commit predated
  those upstream renames).
- Update execution tests (test_value_column.cc, test_runtime_column.cc)
  for the new sel_vec_t / flat_hash_set_t / sel_t signatures.

Build verified: cmake -DBUILD_EXECUTABLES=ON -DBUILD_HTTP_SERVER=ON
-DBUILD_TYPE=RELEASE -DBUILD_TEST=ON -DWITH_MIMALLOC=ON && make -j8
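
The `std::hash<neug::Interval>` specialization mentioned under "Other" might look roughly like the sketch below. The real layout of neug::Interval is not shown on this page, so the `sketch::Interval` struct and its fields (`months`, `days`, `micros`) are assumptions made for illustration, as is the hash-combine formula. std::unordered_set stands in for flat_hash_set_t.

```cpp
// Sketch only: making a user-defined type usable as a hash-set key.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

namespace sketch {
// Hypothetical stand-in for neug::Interval; field names are assumptions.
struct Interval {
  int32_t months;
  int32_t days;
  int64_t micros;
  bool operator==(const Interval& o) const {
    return months == o.months && days == o.days && micros == o.micros;
  }
};
}  // namespace sketch

namespace std {
// Specializing std::hash for a program-defined type is permitted.
template <>
struct hash<sketch::Interval> {
  size_t operator()(const sketch::Interval& v) const noexcept {
    size_t seed = std::hash<int32_t>{}(v.months);
    combine(seed, std::hash<int32_t>{}(v.days));
    combine(seed, std::hash<int64_t>{}(v.micros));
    return seed;
  }

 private:
  // Common hash-combine mixing step; an illustrative choice, not the PR's.
  static void combine(size_t& seed, size_t h) noexcept {
    seed ^= h + 0x9e3779b9u + (seed << 6) + (seed >> 2);
  }
};
}  // namespace std
```

Both equality and the hash must consider the same fields. Otherwise two values that compare equal could land in different buckets and a dedup pass would keep duplicates.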
Development

Successfully merging this pull request may close these issues.

[BUG] munmap_chunk(): invalid pointer crash when WITH_MIMALLOC=ON and Arrow uses glibc allocator
