Deduplicate compute_number_of_edges CUB/Thrust kernels via explicit instantiation#5489
bdice wants to merge 12 commits into rapidsai:main
Conversation
…nstantiation

Move `edge_partition_device_view_t::compute_number_of_edges*` and `compute_local_degrees*` method bodies from inline definitions in `edge_partition_device_view.cuh` to a separate `_impl.cuh` file, with explicit template instantiations in dedicated `.cu` translation units. This prevents 40+ TUs from each compiling their own copy of the CUB `DeviceReduce` and Thrust `transform_reduce` kernels for these methods.

Two prims headers that required rare `cuda::transform_iterator` types (built from lambdas) had those lambdas replaced with named functors so the iterator types can be explicitly instantiated without leaking the heavy `_impl.cuh` back into broadly-included headers.

Binary size reduction vs. baseline:
- libcugraph.so: 475 MB -> 459 MB (-16 MB, -3.4%)
- libcugraph_mg.so: 576 MB -> 520 MB (-56 MB, -9.7%)
- Combined: 1051 MB -> 979 MB (-72 MB, -6.9%)
…via inline wrappers
…op_t and local_degree_with_mask_op_t into single functor
…st-cast overloads
seunghwak
left a comment
Thanks for the contribution; this looks like the right approach and will be very helpful in cutting the cuGraph binary size. I have some suggestions to better follow cuGraph coding conventions.
{
  return range_first + static_cast<vertex_t>(v_offset);
}
};
We already have shift_right_t, no need to re-define here.
See
https://github.com/rapidsai/cugraph/blob/main/cpp/include/cugraph/utilities/device_functors.cuh#L157
  return range_first + static_cast<vertex_t>(v_offset);
  }));
detail::bitmap_offset_to_vertex_op_t<vertex_t>{
  local_frontier_range_firsts[partition_idx]});
This should work better than re-defining a new functor.
#include <cugraph/utilities/device_functors.cuh>
...
detail::shift_right_t<vertex_t>{local_frontier_range_firsts[partition_idx]}
Modified to use shift_right_t
if (i < major_sparse_range_size) { return major_range_first + i; }
return *(dcs_nzd_vertices + (i - major_sparse_range_size));
}
};
This functor is pretty much the opposite of
https://github.com/rapidsai/cugraph/blob/main/cpp/include/cugraph/edge_partition_device_view.cuh#L36
(but the one above is a function, while this is a functor).
We may be better off co-locating these two in a single file.
One suggestion is to create
major_from_major_hypersparse_idx_nocheck_impl and create a functor using this function.
sh_iter_32_t,
sh_iter_32_t,
rmm::cuda_stream_view) const;
So, here we are explicitly instantiating for
MajorIterator = vertex_t const*
or
MajorIterator = thrust::counting_iterator<vertex_t>
and
MaskIterator = uint32_t const*
To make this more explicit,
what about creating
compute_(number_of_edges|local_degrees)(_with_mask)(_async) functions that take
raft::device_span<vertex_t const> majors,
raft::device_span<uint32_t const> masks
or
raft::device_span<uint32_t const> masks,
vertex_t major_first,
vertex_t major_last
and implement these functions using a detail-space function taking MaskIterator and MajorIterator?
This will make using the public standard functions more explicit (and better resemble cugraph conventions) while still allowing use of the detail-space function, which works with any valid thrust fancy iterator.
Maybe one complication this introduces is that
inside a primitive we often don't know whether a passed iterator is a pointer or a fancy iterator. So, we may need to do something like
if constexpr (std::is_pointer_v<MajorIterator>) {
...
}
else {
...
}
but the simplification in the edge_partition_device_view_t class's public interface might be worth more than the increase in the code complexity in primitive internals.
I modified the interface to use raft::device_span for the masks. I left the iterator approach in place for MajorIterator: extract_transform_if_v_frontier_e passes a transform iterator that has optimizations for a bitmask-compacted frontier. I didn't see a clean way to make an API that would accommodate that without making a very specific interface, so leaving MajorIterator as is seemed cleaner to me.
…ore consistent with other naming
… od_shortest_path
handle.get_stream());
auto edge_partition_e_mask_span =
  raft::device_span<uint32_t const>(edge_partition_e_mask->value_first(),
                                    static_cast<size_t>(edge_partition.number_of_edges()));
static_cast<size_t>(edge_partition.number_of_edges())
=>
packed_bool_size(static_cast<size_t>(edge_partition.number_of_edges()))
I think we are providing the wrong size in multiple places in this PR, we need to fix all of them.
edge_partition.compute_number_of_edges_with_mask_async(
auto edge_partition_e_mask_span = raft::device_span<uint32_t const>(
  edge_partition_e_mask->value_first(),
  static_cast<size_t>(edge_partition.number_of_edges()));

edge_partition.compute_number_of_edges_with_mask_async(
auto edge_partition_e_mask_span = raft::device_span<uint32_t const>(
  edge_partition_e_mask->value_first(),
  static_cast<size_t>(edge_partition.number_of_edges()));

(*edge_partition_e_mask).value_first(),
raft::device_span<uint32_t const>(
  (*edge_partition_e_mask).value_first(),
  static_cast<size_t>(edge_partition.number_of_edges())),

if (edge_partition_e_mask) {
  edge_partition_mask_span =
    raft::device_span<uint32_t const>((*edge_partition_e_mask).value_first(),
                                      static_cast<size_t>(edge_partition.number_of_edges()));

(*edge_partition_e_mask).value_first(), handle.get_stream());
auto edge_partition_mask_span =
  raft::device_span<uint32_t const>((*edge_partition_e_mask).value_first(),
                                    static_cast<size_t>(edge_partition.number_of_edges()));

if (edge_partition_e_mask) {
  auto edge_partition_mask_span =
    raft::device_span<uint32_t const>((*edge_partition_e_mask).value_first(),
                                      static_cast<size_t>(edge_partition.number_of_edges()));
template void view_t::compute_number_of_edges_with_mask_async<int32_t const*>(
  raft::device_span<uint32_t const>,
  int32_t const*,
  int32_t const*,
Taking both raft::device_span and raw pointers looks a bit odd here. Should we instead take both the edge mask and the vertex list as device_span?
Cleaned up in latest push
template void view_t::compute_number_of_edges_with_mask_async<thrust::counting_iterator<int32_t>>(
  raft::device_span<uint32_t const>,
  thrust::counting_iterator<int32_t>,
  thrust::counting_iterator<int32_t>,
Can we take local_vertex_partition_range_first and local_vertex_partition_range_last instead (and create counting iterators inside)? It will be more consistent with the rest of cugraph.
Made it a tuple of range_first/range_last; the compiler was having trouble disambiguating otherwise.
thrust_tuple_get_or_identity<decltype(frontier_key_last), 0>(frontier_key_last);
auto frontier_majors = raft::device_span<vertex_t const>{
  thrust_tuple_get_or_identity<decltype(frontier_key_first), 0>(frontier_key_first),
  static_cast<size_t>(cuda::std::distance(frontier_key_first, frontier_key_last))};
We now have repeated_vertex_bucket_view_t; this generates keys on the fly using transform_iterator.
https://github.com/rapidsai/cugraph/blob/main/cpp/include/cugraph/prims/vertex_frontier.cuh#L714
If repeated_vertex_bucket_view_t is used, this code won't work.
I think we should support this case as well.
if constexpr (std::is_pointer_v<decltype(thrust_tuple_get_or_identity<decltype(frontier_key_first), 0>(frontier_key_first))>) {
// call the common path function
}
else {
// call the detail path function
}
Fixed in next push.
packed_bool_size(static_cast<size_t>(edge_partition.number_of_edges())));
edge_partition.compute_number_of_edges_with_mask_async(
  edge_partition_e_mask_span,
  raft::device_span<vertex_t const>{major_first, num_keys},
Something very minor but should we mix (ptr, size) and {ptr, size}? Better stick to the first approach for consistency?
edge_partition.compute_number_of_edges_async(
  major_first,
  major_first + num_keys,
  raft::device_span<vertex_t const>{major_first, num_keys},
Something very minor but should we mix (ptr, size) and {ptr, size}? Better stick to the first approach for consistency?
packed_bool_size(static_cast<size_t>(edge_partition.number_of_edges())));
ret += edge_partition.compute_number_of_edges_with_mask(
  edge_partition_mask_span,
  raft::device_span<vertex_t const>{local_frontier_vertex_first, frontier.size()},
Something very minor but should we mix (ptr, size) and {ptr, size}? Better stick to the first approach for consistency?
local_frontier_vertex_first + frontier.size(),
handle.get_stream());
ret += edge_partition.compute_number_of_edges(
  raft::device_span<vertex_t const>{local_frontier_vertex_first, frontier.size()},
Something very minor but should we mix (ptr, size) and {ptr, size}? Better stick to the first approach for consistency?
* SPDX-FileCopyrightText: Copyright (c) 2023-2026, NVIDIA CORPORATION.
* SPDX-License-Identifier: Apache-2.0
*/
#include "structure/edge_partition_device_view_impl.cuh"

* SPDX-FileCopyrightText: Copyright (c) 2023-2026, NVIDIA CORPORATION.
* SPDX-License-Identifier: Apache-2.0
*/
#include "structure/edge_partition_device_view_impl.cuh"
using view_t = edge_partition_device_view_t<int64_t, int64_t, false>;
using od_extract_iter_t =
  cuda::transform_iterator<extract_v_t<int64_t, uint32_t, uint64_t>, uint64_t const*>;
template void view_t::compute_number_of_edges_with_mask_async<od_extract_iter_t>(
  raft::device_span<uint32_t const>,
  od_extract_iter_t,
  od_extract_iter_t,
  raft::device_span<size_t>,
  rmm::cuda_stream_view) const;
__device__ return_type_t compute_degree(edge_t offset, edge_t degree) const
{
  if (mask_first) {
I am debating whether we should be more explicit about the existence of the mask here.
Say MaskIterator is not a pointer: will if (mask_first) still be false for a default-constructed fancy iterator? Maybe yes, maybe not.
Should we be more explicit and use cuda::std::optional?
This mimics your original code, and we don't currently have any examples where it's not a raw pointer.
However, I'll make that change; it's simple enough.
template __host__ rmm::device_uvector<edge_t> compute_local_degrees_with_mask_sg(
  raft::device_span<uint32_t const> edge_mask,
  std::tuple<vertex_t, vertex_t> vertex_partition_range,
I am debating between using std::tuple here versus two separate vertex_t input parameters.
The former sounds better if we are freshly re-designing the API, but the latter looks a bit more consistent with the rest of the code base.
It seems like we use two variables for input parameters while using std::tuple for return values. What do you think?
(base) seunghwak@SK:~/RAPIDS/cugraph/cpp/include$ grep -rHn "range_last," .
./cugraph/edge_partition_view.hpp:51: vertex_t major_range_last,
./cugraph/edge_partition_view.hpp:53: vertex_t minor_range_last,
./cugraph/graph_view.hpp:644: major_range_last,
./cugraph/graph_view.hpp:646: minor_range_last,
./cugraph/detail/decompress_edge_partition.cuh:37: vertex_t major_range_last,
./cugraph/detail/decompress_edge_partition.cuh:67: vertex_t major_range_last,
./cugraph/prims/transform_reduce_e_by_src_dst_key.cuh:184: typename GraphViewType::vertex_type major_range_last,
./cugraph/prims/transform_reduce_e_by_src_dst_key.cuh:270: typename GraphViewType::vertex_type major_range_last,
./cugraph/prims/transform_reduce_e_by_src_dst_key.cuh:369: typename GraphViewType::vertex_type major_range_last,
./cugraph/prims/vertex_frontier.cuh:123: typename thrust::iterator_traits<VertexIterator>::value_type vertex_range_last,
./cugraph/prims/vertex_frontier.cuh:170: typename thrust::iterator_traits<InputVertexIterator>::value_type vertex_range_last,
./cugraph/prims/vertex_frontier.cuh:212: typename thrust::iterator_traits<OutputVertexIterator>::value_type vertex_range_last,
./cugraph/prims/detail/per_v_transform_reduce_e.cuh:2056: range_last = std::min(range_last, *(edge_partition.major_hypersparse_first()));
./cugraph/prims/detail/per_v_transform_reduce_e.cuh:2080: range_last,
./cugraph/prims/detail/extract_transform_if_v_frontier_e.cuh:1338: range_last,
./cugraph/prims/transform_reduce_e.cuh:148: typename GraphViewType::vertex_type major_range_last,
./cugraph/prims/transform_reduce_e.cuh:237: typename GraphViewType::vertex_type major_range_last,
./cugraph/prims/transform_reduce_e.cuh:315: typename GraphViewType::vertex_type major_range_last,
(base) seunghwak@SK:~/RAPIDS/cugraph/cpp/include$ grep -rHn "tuple<vertex_t, vertex_t>" .
./cugraph/src_dst_lookup_container.hpp:26: typename value_t = cuda::std::tuple<vertex_t, vertex_t>>
./cugraph/graph_view.hpp:116: std::tuple<vertex_t, vertex_t> local_vertex_partition_range() const
./cugraph/graph_view.hpp:137: std::tuple<vertex_t, vertex_t> vertex_partition_range(size_t partition_idx) const
./cugraph/graph_view.hpp:167: std::tuple<vertex_t, vertex_t> local_edge_partition_major_range(size_t partition_idx) const
./cugraph/graph_view.hpp:196: std::tuple<vertex_t, vertex_t> local_edge_partition_minor_range() const
./cugraph/graph_view.hpp:398: std::tuple<vertex_t, vertex_t> local_vertex_partition_range() const
./cugraph/graph_view.hpp:863: std::tuple<vertex_t, vertex_t> local_vertex_partition_range() const
./cugraph/prims/transform_reduce_dst_nbr_intersection_of_e_endpoints_by_v.cuh:50: __device__ int operator()(cuda::std::tuple<vertex_t, vertex_t> tup) const
./cugraph/prims/per_v_transform_reduce_dst_key_aggregated_outgoing_e.cuh:88: operator()(cuda::std::tuple<vertex_t, vertex_t> val /* major, minor key */) const
./cugraph/prims/per_v_transform_reduce_dst_key_aggregated_outgoing_e.cuh:97: __device__ bool operator()(cuda::std::tuple<vertex_t, vertex_t> pair) const
./cugraph/prims/per_v_transform_reduce_dst_key_aggregated_outgoing_e.cuh:137: operator()(cuda::std::tuple<vertex_t, vertex_t> val /* major, minor key */) const
./cugraph/prims/detail/nbr_intersection.cuh:274: __device__ edge_t operator()(cuda::std::tuple<vertex_t, vertex_t> pair) const
./cugraph/prims/detail/nbr_intersection.cuh:702: cuda::std::tuple<vertex_t, vertex_t>>);
./cugraph/prims/detail/nbr_intersection.cuh:760: vertex_pair_first, thrust_tuple_get<cuda::std::tuple<vertex_t, vertex_t>, size_t{1}>{});
./cugraph/prims/detail/nbr_intersection.cuh:1122: vertex_pair_first, thrust_tuple_get<cuda::std::tuple<vertex_t, vertex_t>, size_t{0}>{});
./cugraph/prims/detail/nbr_intersection.cuh:1181: auto vertex_pair_buffer = allocate_dataframe_buffer<cuda::std::tuple<vertex_t, vertex_t>>(
./cugraph/utilities/graph_partition_utils.cuh:117: cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/graph_partition_utils.cuh:158: __device__ int operator()(cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/graph_partition_utils.cuh:193: cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/graph_partition_utils.cuh:225: __device__ int operator()(cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/graph_partition_utils.cuh:259: cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/graph_partition_utils.cuh:282: __device__ int operator()(cuda::std::tuple<vertex_t, vertex_t> pair /* major, minor */) const
./cugraph/utilities/error_check_utils.cuh:38: __device__ bool operator()(cuda::std::tuple<vertex_t, vertex_t> pair) const
./cugraph/graph_generators.hpp:206: std::vector<std::tuple<vertex_t, vertex_t>> const& component_parameters_v);
./cugraph/graph_generators.hpp:284: std::vector<std::tuple<vertex_t, vertex_t>> const& component_parameters_v);
I like the tuple because it makes it obvious that the two parameters are related. I agree it's an API change; we could either add a FIXME to update the rest of the API, or ripple the change through the rest of the API in this PR.
I can switch it back if you think that's better and we can grapple with it later. Switching it back will require SFINAE to disambiguate passing vertex_partition_range_first/vertex_partition_range_last from passing MajorIterator pairs, or combining those into one call and dispatching to the detail implementation via if constexpr.
Summary
- Moved `edge_partition_device_view_t::compute_number_of_edges*` and `compute_local_degrees*` method bodies out of the header `edge_partition_device_view.cuh` into a separate `_impl.cuh` file, with explicit template instantiations in 4 dedicated `.cu` translation units (SG/MG × v32/v64).
- Replaced `cuda::transform_iterator` types in prims headers with named functors (`bitmap_offset_to_vertex_op_t`, `sparse_hypersparse_major_op_t`) so those iterator types can also be explicitly instantiated without leaking the heavy `_impl.cuh` back into broadly-included headers.
- `compute_number_of_edges` and `compute_number_of_edges_with_mask` are now thin inline wrappers that delegate to their `_async` counterparts via `rmm::device_scalar`, eliminating separate Thrust `transform_reduce` kernel instantiations.
- Merged `local_degree_op_t` and `local_degree_with_mask_op_t` into a single functor with a nullable `uint32_t const*` mask parameter. No-mask methods become inline wrappers that pass `nullptr`, reusing the same with-mask instantiations.
- Halved `vertex_t*`/`vertex_t const*` instantiations: SFINAE overloads const-cast non-const pointer arguments to const, so only `vertex_t const*` instantiations are needed in the `.cu` files.
- Binary size: `libcugraph.so` 475 → 456 MB, `libcugraph_mg.so` 576 → 512 MB.

Background
The 8 `compute_number_of_edges*`/`compute_local_degrees*` template methods on `edge_partition_device_view_t` were defined inline in `edge_partition_device_view.cuh`. Since this header is transitively included by ~44 `.cu` translation units (via prims headers), each TU compiled its own copies of the CUB `DeviceReduce::Sum` and Thrust `transform_reduce` kernels for every iterator type used. This was the single largest source of CUDA device code duplication in the library.

Approach
Following the existing codebase pattern (e.g., `graph_view_impl.cuh` + `graph_view_sg_v32_e32.cu`):

- `compute_number_of_edges_impl.cuh`: contains the out-of-line method definitions for all methods on both MG and SG partial specializations.
- 4 `.cu` files: each includes `_impl.cuh` and provides explicit instantiations for common `MajorIterator` types (`vertex_t const*`, `counting_iterator<vertex_t>`).
- Two prims headers used `cuda::transform_iterator<lambda>` types that required the method definitions to be visible. Replacing the lambdas with named functors (`compute_number_of_edges_functors.cuh`) makes the iterator types nameable for explicit instantiation in the MG `.cu` files.
- `od_shortest_distances_impl.cuh`: retains the `_impl.cuh` include for its own rare iterator type, but this file is only compiled by 2 SG TUs, so the impact is negligible.
- Sync variants (`compute_number_of_edges`, `compute_number_of_edges_with_mask`) are inline wrappers that allocate an `rmm::device_scalar<size_t>`, call the `_async` variant, and return `scalar.value(stream)`. This eliminates the separate Thrust `transform_reduce` kernel codepaths; only CUB `DeviceReduce::Sum` kernels remain.
- `local_degree_op_t` and `local_degree_with_mask_op_t` are merged into a single `local_degree_op_t` with a `MaskIterator` template parameter (defaulting to `uint32_t const*`). A `compute_degree` helper uses a runtime null check on the mask pointer: `if (mask_first)` calls `count_set_bits`, otherwise it returns the raw degree. No-mask methods pass `nullptr`, reusing the exact same template instantiations as with-mask calls. The branch is uniform across all GPU threads (same pointer for all elements), so the prediction cost is negligible.
- SFINAE overloads accept non-const `MajorIterator` arguments (`T*` where `!std::is_const_v<T>`) and forward to the `T const*` variant. This halves the pointer-type instantiations; only `vertex_t const*` is explicitly instantiated in the `.cu` files.

Size reduction breakdown
[table: per-library size breakdown for libcugraph.so and libcugraph_mg.so; values not preserved in this copy]

Instantiation count: 80 → 24 across the 4 dedicated `.cu` files.

Latest edits, tested on CUDA 12: [table: libcugraph.so / libcugraph_mg.so sizes not preserved]

Latest edits, tested on CUDA 13: [table: libcugraph.so / libcugraph_mg.so sizes not preserved]

Testing
All 92 C++ tests pass (`test-cugraph-cpp -j10`).