
Deduplicate compute_number_of_edges CUB/Thrust kernels via explicit instantiation#5489

Open
bdice wants to merge 12 commits into rapidsai:main from bdice:deduplicate-compute-number-of-edges

Conversation

@bdice
Contributor

@bdice bdice commented Apr 12, 2026

Summary

  • Moves edge_partition_device_view_t::compute_number_of_edges* and compute_local_degrees* method bodies out of the header edge_partition_device_view.cuh into a separate _impl.cuh file, with explicit template instantiations in 4 dedicated .cu translation units (SG/MG × v32/v64).
  • Replaces two lambda-based cuda::transform_iterator types in prims headers with named functors (bitmap_offset_to_vertex_op_t, sparse_hypersparse_major_op_t) so those iterator types can also be explicitly instantiated without leaking the heavy _impl.cuh back into broadly-included headers.
  • Unifies sync/async method pairs: sync compute_number_of_edges and compute_number_of_edges_with_mask are now thin inline wrappers that delegate to their _async counterparts via rmm::device_scalar, eliminating separate Thrust transform_reduce kernel instantiations.
  • Unifies mask/no-mask variants: merges local_degree_op_t and local_degree_with_mask_op_t into a single functor with a nullable uint32_t const* mask parameter. No-mask methods become inline wrappers that pass nullptr, reusing the same with-mask instantiations.
  • Deduplicates vertex_t* / vertex_t const* instantiations: SFINAE overloads const-cast non-const pointer arguments to const, so only vertex_t const* instantiations are needed in the .cu files.
  • Reduces combined binary size by ~83 MB (7.9%): libcugraph.so 475→456 MB, libcugraph_mg.so 576→512 MB.

Background

The 8 compute_number_of_edges* / compute_local_degrees* template methods on edge_partition_device_view_t were defined inline in edge_partition_device_view.cuh. Since this header is transitively included by ~44 .cu translation units (via prims headers), each TU compiled its own copies of the CUB DeviceReduce::Sum and Thrust transform_reduce kernels for every iterator type used. This was the single largest source of CUDA device code duplication in the library.

Approach

Following the existing codebase pattern (e.g., graph_view_impl.cuh + graph_view_sg_v32_e32.cu):

  1. compute_number_of_edges_impl.cuh: Contains the out-of-line method definitions for all methods on both MG and SG partial specializations.
  2. 4 .cu files: Each includes _impl.cuh and provides explicit instantiations for common MajorIterator types (vertex_t const*, counting_iterator<vertex_t>).
  3. Named functors: Two prims headers had rare cuda::transform_iterator<lambda> types that required the method definitions to be visible. Replacing the lambdas with named functors (compute_number_of_edges_functors.cuh) makes the iterator types nameable for explicit instantiation in the MG .cu files.
  4. od_shortest_distances_impl.cuh: Retains the _impl.cuh include for its own rare iterator type, but this file is only compiled by 2 SG TUs so the impact is negligible.
  5. Sync/async unification: All sync methods (compute_number_of_edges, compute_number_of_edges_with_mask) are inline wrappers that allocate an rmm::device_scalar<size_t>, call the _async variant, and return scalar.value(stream). This eliminates separate Thrust transform_reduce kernel codepaths — only CUB DeviceReduce::Sum kernels remain.
  6. Mask/no-mask unification: local_degree_op_t and local_degree_with_mask_op_t are merged into a single local_degree_op_t with a MaskIterator template parameter (defaulting to uint32_t const*). A compute_degree helper uses a runtime null check on the mask pointer: if (mask_first) calls count_set_bits, otherwise returns the raw degree. No-mask methods pass nullptr, reusing the exact same template instantiations as with-mask calls. The branch is uniform across all GPU threads (same pointer for all elements), so there is no warp divergence and the branch cost is negligible.
  7. Pointer const-cast deduplication: SFINAE-constrained overloads intercept non-const pointer MajorIterator arguments (T* where !std::is_const_v<T>) and forward to the T const* variant. This halves the pointer-type instantiations — only vertex_t const* is explicitly instantiated in the .cu files.

Size reduction breakdown

| Stage | libcugraph.so | libcugraph_mg.so | Combined |
| --- | --- | --- | --- |
| Baseline | 475 MB | 576 MB | 1051 MB |
| Explicit instantiation | 459 MB (-16) | 520 MB (-56) | 979 MB (-72) |
| Sync/async unification | 456 MB (-3) | 513 MB (-7) | 969 MB (-10) |
| Mask/no-mask unification | 456 MB (-0) | 512 MB (-1) | 968 MB (-1) |
| Pointer const-cast dedup | 456 MB (-0) | 512 MB (-0) | 968 MB (-0) |
| Latest Changes | TBD | TBD | TBD |
| Total | 456 MB (-19, -4.0%) | 512 MB (-64, -11.1%) | 968 MB (-83, -7.9%) |

Instantiation count: 80 → 24 across the 4 dedicated .cu files.

Latest edits to software, tested on CUDA 12:

| Stage | libcugraph.so | libcugraph_mg.so | Combined |
| --- | --- | --- | --- |
| Baseline | 974 MB | 1316 MB | 2290 MB |
| Latest Changes | 914 MB (-60) | 1050 MB (-266) | 1964 MB (-326) |
| Total | 914 MB (-60, -6.2%) | 1050 MB (-266, -20.2%) | 1964 MB (-326, -14.2%) |

Latest edits to software, tested on CUDA 13:

| Stage | libcugraph.so | libcugraph_mg.so | Combined |
| --- | --- | --- | --- |
| Baseline | 494 MB | 620 MB | 1114 MB |
| Latest Changes | 473 MB (-21) | 549 MB (-70) | 1022 MB (-91) |
| Total | 473 MB (-21, -4.3%) | 549 MB (-70, -11.3%) | 1022 MB (-91, -8.2%) |

Testing

All 92 C++ tests pass (test-cugraph-cpp -j10).

…nstantiation

Move edge_partition_device_view_t::compute_number_of_edges* and
compute_local_degrees* method bodies from inline definitions in
edge_partition_device_view.cuh to a separate _impl.cuh file, with
explicit template instantiations in dedicated .cu translation units.

This prevents 40+ TUs from each compiling their own copy of the
CUB DeviceReduce and Thrust transform_reduce kernels for these methods.

Two prims headers that required rare cuda::transform_iterator types
(using lambdas) had those lambdas replaced with named functors so
the iterator types can be explicitly instantiated without leaking
the heavy _impl.cuh back into broadly-included headers.

Binary size reduction vs baseline:
  libcugraph.so:    475 MB -> 459 MB (-16 MB, -3.4%)
  libcugraph_mg.so: 576 MB -> 520 MB (-56 MB, -9.7%)
  Combined:         1051 MB -> 979 MB (-72 MB, -6.9%)
@bdice bdice requested review from a team as code owners April 12, 2026 17:10
@bdice bdice added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 12, 2026
Contributor

@seunghwak seunghwak left a comment

Thanks for the contribution, this looks like the right approach and will be very helpful in cutting cuGraph binary size. I have some suggestions to better follow cuGraph coding conventions.

{
return range_first + static_cast<vertex_t>(v_offset);
}
};
Contributor

Collaborator

Dropped this file.

return range_first + static_cast<vertex_t>(v_offset);
}));
detail::bitmap_offset_to_vertex_op_t<vertex_t>{
local_frontier_range_firsts[partition_idx]});
Contributor

This should work better than re-defining a new functor.

#include <cugraph/utilities/device_functors.cuh>
...
detail::shift_right_t<vertex_t>{local_frontier_range_firsts[partition_idx]}

Collaborator

Modified to use shift_right_t

if (i < major_sparse_range_size) { return major_range_first + i; }
return *(dcs_nzd_vertices + (i - major_sparse_range_size));
}
};
Contributor

This function is pretty much the opposite of

https://github.com/rapidsai/cugraph/blob/main/cpp/include/cugraph/edge_partition_device_view.cuh#L36

(but the function above is a function, this is a functor).

We may better co-locate these two functions in a single file.

Contributor

One suggestion is to create

major_from_major_hypersparse_idx_nocheck_impl and create a functor using this function.

sh_iter_32_t,
sh_iter_32_t,
rmm::cuda_stream_view) const;

Contributor

So, here we are explicitly specializing for

MajorIterator = vertex_t const*
or
MajorIterator = thrust::counting_iterator<vertex_t>

and

MaskIterator = uint32_t const*

To make this more explicit,

what about creating

compute_(number_of_edges|local_degrees)(_with_mask)(_async) functions that take

raft::device_span<vertex_t const> majors,
raft::device_span<uint32_t const> masks

or

raft::device_span<uint32_t const> masks,
vertex_t major_first,
vertex_t major_last

and implement these function using a detail space function taking MaskIterator and MajorIterator?

This will make using public standard functions more explicit (and better resembles cugraph conventions) while still allowing use of the detail space function that works with any valid thrust fancy iterators.

Contributor

Maybe one complication this introduces is that

inside a primitive we often don't know whether a passed iterator is a pointer or a fancy iterator. So, we may need to do something like

if constexpr (std::is_pointer_v</* iterator type */>) {
...
}
else {
...
}

but the simplification in the edge_partition_device_view_t class's public interface might be worth more than the increase in the code complexity in primitive internals.

Collaborator

I modified the interface to do raft::device_span for the masks. I left the iterator approach in place for MajorIterator. extract_transform_if_v_frontier_e passes a transform iterator that has optimizations for a bit mask compacted frontier. I didn't see a clean way to make an API that would accommodate that without making a very specific interface. It seemed like leaving MajorIterator as is was cleaner to me.

handle.get_stream());
auto edge_partition_e_mask_span =
raft::device_span<uint32_t const>(edge_partition_e_mask->value_first(),
static_cast<size_t>(edge_partition.number_of_edges()));
Contributor

static_cast<size_t>(edge_partition.number_of_edges())

=>

packed_bool_size(static_cast<size_t>(edge_partition.number_of_edges()))

Contributor

I think we are providing the wrong size in multiple places in this PR, we need to fix all of them.

Collaborator

Fixed

edge_partition.compute_number_of_edges_with_mask_async(
auto edge_partition_e_mask_span = raft::device_span<uint32_t const>(
edge_partition_e_mask->value_first(),
static_cast<size_t>(edge_partition.number_of_edges()));
Contributor

Fix the size.

Collaborator

Fixed

edge_partition.compute_number_of_edges_with_mask_async(
auto edge_partition_e_mask_span = raft::device_span<uint32_t const>(
edge_partition_e_mask->value_first(),
static_cast<size_t>(edge_partition.number_of_edges()));
Contributor

Fix the size.

Collaborator

Fixed

(*edge_partition_e_mask).value_first(),
raft::device_span<uint32_t const>(
(*edge_partition_e_mask).value_first(),
static_cast<size_t>(edge_partition.number_of_edges())),
Contributor

Fix the size.

Collaborator

Fixed

if (edge_partition_e_mask) {
edge_partition_mask_span =
raft::device_span<uint32_t const>((*edge_partition_e_mask).value_first(),
static_cast<size_t>(edge_partition.number_of_edges()));
Contributor

Fix the size.

Collaborator

Fixed

(*edge_partition_e_mask).value_first(), handle.get_stream());
auto edge_partition_mask_span =
raft::device_span<uint32_t const>((*edge_partition_e_mask).value_first(),
static_cast<size_t>(edge_partition.number_of_edges()));
Contributor

Fix the size.

Collaborator

Fixed

if (edge_partition_e_mask) {
auto edge_partition_mask_span =
raft::device_span<uint32_t const>((*edge_partition_e_mask).value_first(),
static_cast<size_t>(edge_partition.number_of_edges()));
Contributor

Fix the size.

Collaborator

Fixed

Comment thread cpp/include/cugraph/edge_partition_device_view.cuh
template void view_t::compute_number_of_edges_with_mask_async<int32_t const*>(
raft::device_span<uint32_t const>,
int32_t const*,
int32_t const*,
Contributor

Taking both raft::device_span and raw pointers look a bit odd here. Should we better take both edge mask and vertex list using device_span?

Collaborator

Cleaned up in latest push

template void view_t::compute_number_of_edges_with_mask_async<thrust::counting_iterator<int32_t>>(
raft::device_span<uint32_t const>,
thrust::counting_iterator<int32_t>,
thrust::counting_iterator<int32_t>,
Contributor

Can we take local_vertex_partition_range_first and local_vertex_partition_range_last instead (and create counting iterators inside)? It will be more consistent with the rest of cugraph.

Collaborator

Made it a tuple of range_first/range_last, compiler was having trouble disambiguating.

thrust_tuple_get_or_identity<decltype(frontier_key_last), 0>(frontier_key_last);
auto frontier_majors = raft::device_span<vertex_t const>{
thrust_tuple_get_or_identity<decltype(frontier_key_first), 0>(frontier_key_first),
static_cast<size_t>(cuda::std::distance(frontier_key_first, frontier_key_last))};
Contributor

We now have repeated_vertex_bucket_view_t; this generates keys on the fly using transform_iterator.
https://github.com/rapidsai/cugraph/blob/main/cpp/include/cugraph/prims/vertex_frontier.cuh#L714

If repeated_vertex_bucket_view_t is used, this code won't work.

I think we should support this case as well.

if constexpr (std::is_pointer_v<decltype(thrust_tuple_get_or_identity<decltype(frontier_key_first), 0>(frontier_key_first))>) {
  // call the common path function
} else {
  // call the detail path function
}

Collaborator

Fixed in next push.

packed_bool_size(static_cast<size_t>(edge_partition.number_of_edges())));
edge_partition.compute_number_of_edges_with_mask_async(
edge_partition_e_mask_span,
raft::device_span<vertex_t const>{major_first, num_keys},
Contributor

Something very minor but should we mix (ptr, size) and {ptr, size}? Better stick to the first approach for consistency?

Collaborator

Fixed

edge_partition.compute_number_of_edges_async(
major_first,
major_first + num_keys,
raft::device_span<vertex_t const>{major_first, num_keys},
Contributor

Something very minor but should we mix (ptr, size) and {ptr, size}? Better stick to the first approach for consistency?

Collaborator

Fixed

packed_bool_size(static_cast<size_t>(edge_partition.number_of_edges())));
ret += edge_partition.compute_number_of_edges_with_mask(
edge_partition_mask_span,
raft::device_span<vertex_t const>{local_frontier_vertex_first, frontier.size()},
Contributor

Something very minor but should we mix (ptr, size) and {ptr, size}? Better stick to the first approach for consistency?

Collaborator

Fixed

local_frontier_vertex_first + frontier.size(),
handle.get_stream());
ret += edge_partition.compute_number_of_edges(
raft::device_span<vertex_t const>{local_frontier_vertex_first, frontier.size()},
Contributor

Something very minor but should we mix (ptr, size) and {ptr, size}? Better stick to the first approach for consistency?

Collaborator

Fixed

* SPDX-FileCopyrightText: Copyright (c) 2023-2026, NVIDIA CORPORATION.
* SPDX-License-Identifier: Apache-2.0
*/
#include "structure/edge_partition_device_view_impl.cuh"
Contributor

Can't we delete this?

Collaborator

Deleted

* SPDX-FileCopyrightText: Copyright (c) 2023-2026, NVIDIA CORPORATION.
* SPDX-License-Identifier: Apache-2.0
*/
#include "structure/edge_partition_device_view_impl.cuh"
Contributor

Can't we delete this?

Collaborator

Deleted

Comment on lines +30 to +38
using view_t = edge_partition_device_view_t<int64_t, int64_t, false>;
using od_extract_iter_t =
cuda::transform_iterator<extract_v_t<int64_t, uint32_t, uint64_t>, uint64_t const*>;
template void view_t::compute_number_of_edges_with_mask_async<od_extract_iter_t>(
raft::device_span<uint32_t const>,
od_extract_iter_t,
od_extract_iter_t,
raft::device_span<size_t>,
rmm::cuda_stream_view) const;
Contributor

Can't we delete this?

Collaborator

Deleted


__device__ return_type_t compute_degree(edge_t offset, edge_t degree) const
{
if (mask_first) {
Contributor

I am debating whether we should be more explicit about the existence of mask here.

Say MaskIterator is not a pointer.

if (mask_first) will still be false with a default constructed fancy iterators? Maybe... yes, maybe... not.

Should we better be more explicit and use cuda::std::optional?

Collaborator

This is mimicking your original code, and we don't currently have any examples where it's not a raw pointer.

However, I'll make that change, it's simple enough.


template __host__ rmm::device_uvector<edge_t> compute_local_degrees_with_mask_sg(
raft::device_span<uint32_t const> edge_mask,
std::tuple<vertex_t, vertex_t> vertex_partition_range,
Contributor

I am debating between using "std::tuple" here versus using two separate vertex_t input parameters.

The former sounds better if we are freshly re-designing the API, but the latter looks a bit more consistent with the rest of the code base.

It seems like we are using two variables for input parameters while using the std::tuple for return values. What do you think?

(base) seunghwak@SK:~/RAPIDS/cugraph/cpp/include$ grep -rHn "range_last," .
./cugraph/edge_partition_view.hpp:51:                        vertex_t major_range_last,
./cugraph/edge_partition_view.hpp:53:                        vertex_t minor_range_last,
./cugraph/graph_view.hpp:644:      major_range_last,
./cugraph/graph_view.hpp:646:      minor_range_last,
./cugraph/detail/decompress_edge_partition.cuh:37:  vertex_t major_range_last,
./cugraph/detail/decompress_edge_partition.cuh:67:  vertex_t major_range_last,
./cugraph/prims/transform_reduce_e_by_src_dst_key.cuh:184:  typename GraphViewType::vertex_type major_range_last,
./cugraph/prims/transform_reduce_e_by_src_dst_key.cuh:270:  typename GraphViewType::vertex_type major_range_last,
./cugraph/prims/transform_reduce_e_by_src_dst_key.cuh:369:  typename GraphViewType::vertex_type major_range_last,
./cugraph/prims/vertex_frontier.cuh:123:  typename thrust::iterator_traits<VertexIterator>::value_type vertex_range_last,
./cugraph/prims/vertex_frontier.cuh:170:  typename thrust::iterator_traits<InputVertexIterator>::value_type vertex_range_last,
./cugraph/prims/vertex_frontier.cuh:212:  typename thrust::iterator_traits<OutputVertexIterator>::value_type vertex_range_last,
./cugraph/prims/detail/per_v_transform_reduce_e.cuh:2056:                  range_last = std::min(range_last, *(edge_partition.major_hypersparse_first()));
./cugraph/prims/detail/per_v_transform_reduce_e.cuh:2080:                    range_last,
./cugraph/prims/detail/extract_transform_if_v_frontier_e.cuh:1338:                range_last,
./cugraph/prims/transform_reduce_e.cuh:148:  typename GraphViewType::vertex_type major_range_last,
./cugraph/prims/transform_reduce_e.cuh:237:  typename GraphViewType::vertex_type major_range_last,
./cugraph/prims/transform_reduce_e.cuh:315:  typename GraphViewType::vertex_type major_range_last,
(base) seunghwak@SK:~/RAPIDS/cugraph/cpp/include$ grep -rHn "tuple<vertex_t, vertex_t>" .
./cugraph/src_dst_lookup_container.hpp:26:          typename value_t = cuda::std::tuple<vertex_t, vertex_t>>
./cugraph/graph_view.hpp:116:  std::tuple<vertex_t, vertex_t> local_vertex_partition_range() const
./cugraph/graph_view.hpp:137:  std::tuple<vertex_t, vertex_t> vertex_partition_range(size_t partition_idx) const
./cugraph/graph_view.hpp:167:  std::tuple<vertex_t, vertex_t> local_edge_partition_major_range(size_t partition_idx) const
./cugraph/graph_view.hpp:196:  std::tuple<vertex_t, vertex_t> local_edge_partition_minor_range() const
./cugraph/graph_view.hpp:398:  std::tuple<vertex_t, vertex_t> local_vertex_partition_range() const
./cugraph/graph_view.hpp:863:  std::tuple<vertex_t, vertex_t> local_vertex_partition_range() const
./cugraph/prims/transform_reduce_dst_nbr_intersection_of_e_endpoints_by_v.cuh:50:  __device__ int operator()(cuda::std::tuple<vertex_t, vertex_t> tup) const
./cugraph/prims/per_v_transform_reduce_dst_key_aggregated_outgoing_e.cuh:88:  operator()(cuda::std::tuple<vertex_t, vertex_t> val /* major, minor key */) const
./cugraph/prims/per_v_transform_reduce_dst_key_aggregated_outgoing_e.cuh:97:  __device__ bool operator()(cuda::std::tuple<vertex_t, vertex_t> pair) const
./cugraph/prims/per_v_transform_reduce_dst_key_aggregated_outgoing_e.cuh:137:  operator()(cuda::std::tuple<vertex_t, vertex_t> val /* major, minor key */) const
./cugraph/prims/detail/nbr_intersection.cuh:274:  __device__ edge_t operator()(cuda::std::tuple<vertex_t, vertex_t> pair) const
./cugraph/prims/detail/nbr_intersection.cuh:702:                               cuda::std::tuple<vertex_t, vertex_t>>);
./cugraph/prims/detail/nbr_intersection.cuh:760:          vertex_pair_first, thrust_tuple_get<cuda::std::tuple<vertex_t, vertex_t>, size_t{1}>{});
./cugraph/prims/detail/nbr_intersection.cuh:1122:        vertex_pair_first, thrust_tuple_get<cuda::std::tuple<vertex_t, vertex_t>, size_t{0}>{});
./cugraph/prims/detail/nbr_intersection.cuh:1181:        auto vertex_pair_buffer = allocate_dataframe_buffer<cuda::std::tuple<vertex_t, vertex_t>>(
./cugraph/utilities/graph_partition_utils.cuh:117:    cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/graph_partition_utils.cuh:158:  __device__ int operator()(cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/graph_partition_utils.cuh:193:    cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/graph_partition_utils.cuh:225:  __device__ int operator()(cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/graph_partition_utils.cuh:259:    cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/graph_partition_utils.cuh:282:  __device__ int operator()(cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/error_check_utils.cuh:38:  __device__ bool operator()(cuda::std::tuple<vertex_t, vertex_t> pair) const
./cugraph/graph_generators.hpp:206:  std::vector<std::tuple<vertex_t, vertex_t>> const& component_parameters_v);
./cugraph/graph_generators.hpp:284:  std::vector<std::tuple<vertex_t, vertex_t>> const& component_parameters_v);

Collaborator

I like the tuple because it makes it obvious that the two parameters are related. I agree it's an API change. We could either add a FIXME to update the rest of the API or I suppose we could ripple through the rest of the API in this PR also.

I can switch it back if you think that's better and we can grapple with it later. Switching it back will require using SFINAE syntax to disambiguate passing vertex_partition_range_first/vertex_partition_range_last from passing MajorIterator pairs, or combining those into one call and disambiguating the call to the detail implementation via constexpr syntax.

Contributor

@seunghwak seunghwak left a comment

LGTM
