Skip to content

perf(graph-analytics): make C++ faster than Python by swapping slow maps for contiguous vectors#42

Open
Phantom0299 wants to merge 3 commits into
iiitl:mainfrom
Phantom0299:perf/cpp-graph-analytics
Open

perf(graph-analytics): make C++ faster than Python by swapping slow maps for contiguous vectors#42
Phantom0299 wants to merge 3 commits into
iiitl:mainfrom
Phantom0299:perf/cpp-graph-analytics

Conversation

@Phantom0299
Copy link
Copy Markdown

@Phantom0299 Phantom0299 commented Apr 12, 2026

Title: perf: Rewrite C++ backend to finally beat the Python/NumPy baseline (#38)

It was a bit frustrating to see that the C++ backend was actually losing to the Python version in the benchmarks. After looking at the binding.cpp code, it was clear why the core pagerank loop was bogged down by std::map lookups and string comparisons for every single edge. Basically, the C++ code was doing way more work than it needed to, while NumPy's C-kernels were running highly optimized contiguous memory operations.

How I made it faster than Python-
I upgraded the backend to use a data oriented approach, focusing on how the cpu actually likes to read data:

  1. Integer Mapping: I stopped using strings in the hot loop. I mapped all node names to integer IDs once at the start so the cpu can just work with numbers.
  2. Straight-Line Memory: I swapped the slow std::map for a standard std::vector. Instead of making the CPU jump all over the place to find the next rank, this packs all the data tightly together in one continuous line. The CPU can just zoom straight through it without pausing.
  3. Simpler Connections: I flattened the graph structure into basic lists of integer IDs. By getting rid of the bulky data structures, we cut out all the hidden background work that was dragging the C++ speed down.
    The Results
    I ran the benchmarks against both the original C++ code and the Python baseline. The C++ version is now over 3x faster than it was and has successfully overtaken the Python/NumPy implementation.

Original C++: 0.027044s (Slower than Python)
Python (NumPy): 0.008787s (Baseline)
Optimized C++ (My Fix): 0.008621s (Faster than Python)
Screenshot 2026-04-12 113058

Total Speedup: ~3.14x compared to the previous C++ version.

Fixes #38

Summary by CodeRabbit

  • Refactor
    • Internal graph processing and PageRank implementation rewritten for significantly lower memory use and faster computation on large graphs; numerical results, outputs, and the public API remain unchanged.
    • Performance improvements should be transparent to users—same behavior and results, with reduced resource usage and quicker processing for large inputs.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 12, 2026

Walkthrough

Replaces string-keyed PageRank state with index-based structures: name_to_idx (std::unordered_map), adjacency lists adj (std::vector<std::vector<int>>), and rank vectors; converts neighbor names to indices, updates iterative rank computation (including explicit dangling mass redistribution), and updates top-node selection and checksum; public signatures unchanged.

Changes

Cohort / File(s) Summary
PageRank C++ implementation
chuck/tasks/graph_analytics/native_cpp/binding.cpp
Replaced map<string,...> state with index-based containers (name_to_idx, adj, rank, new_rank); converted neighbor names to integer indices; dangling nodes are represented by empty adj[u] and their mass is computed as dangling_share and redistributed each iteration; iterative update now zeroes new_rank via std::fill(base), accumulates contributions with new_rank[v] += (damping*rank[u])/adj[u].size(), and adds redistributed dangling/base mass; top-node selection and checksum now operate on indices (top_idx, checksum sums (i+1)*rank[i]); exported function signatures (py::dict solve(py::object payload_obj), module init) unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: replacing slow string-keyed maps with index-based contiguous vectors for performance optimization in the C++ graph analytics backend.
Linked Issues check ✅ Passed The PR successfully addresses issue #38's core objective by replacing inefficient map<string, vector> with index-based structures and contiguous vectors, achieving reported speedup from 0.027s to 0.008621s, exceeding the Python baseline of 0.008787s.
Out of Scope Changes check ✅ Passed All changes are scoped to the C++ PageRank implementation in binding.cpp, directly addressing the performance optimization objective without unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Phantom0299 Phantom0299 force-pushed the perf/cpp-graph-analytics branch from 59ba1b8 to 46d849e Compare April 12, 2026 06:16
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
chuck/tasks/graph_analytics/native_cpp/binding.cpp (2)

73-73: Consider std::swap instead of copy assignment for better performance.

rank = new_rank performs a full vector copy (O(n)). Since this is a performance-focused PR, consider using std::swap(rank, new_rank) for O(1) exchange. The values in new_rank are overwritten by std::fill at the start of each iteration anyway.

Suggested optimization
-        rank = new_rank; 
+        std::swap(rank, new_rank);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` at line 73, The
assignment "rank = new_rank" causes an O(n) vector copy; replace it with an O(1
exchange using std::swap(rank, new_rank) in the same scope (where rank and
new_rank are used, and new_rank is later overwritten by std::fill). Ensure
<utility> (or <algorithm>) is included if not already so std::swap is available;
keep the rest of the iteration logic unchanged.

33-33: Narrowing conversion from size_t to int.

nodes.size() returns std::size_t. Assigning to int is a narrowing conversion that could truncate on graphs with more than ~2 billion nodes. While unlikely in practice, consider using std::size_t or int64_t for correctness, or add an explicit cast to silence compiler warnings.

Suggested fix
-    int n = nodes.size();
+    int n = static_cast<int>(nodes.size());

Or for large graph safety:

-    int n = nodes.size();
+    std::size_t n = nodes.size();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` at line 33, The
assignment int n = nodes.size() narrows std::size_t into int; change the type of
n to std::size_t (or int64_t if you prefer signed) or add an explicit cast if
you intentionally want truncation so it no longer performs an implicit narrowing
conversion; update the declaration of n in binding.cpp (the variable initialized
from nodes.size()) accordingly and ensure any downstream uses handle the new
type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Around line 14-16: Remove the trailing whitespace characters present in the
file around the graph deserialization line where graph is assigned
(std::map<std::string, std::vector<std::string>> graph =
payload_obj.cast<std::map<std::string, std::vector<std::string>>>()), and also
strip trailing spaces from the nearby lines referenced in the review (other
surrounding lines in the same function/block). Ensure no lines end with spaces
so the compiler/CI whitespace checks pass.
- Around line 50-53: The loop that builds adjacency uses name_to_idx[target]
which silently inserts a default entry for unknown targets; change it to use
name_to_idx.at(target) so a std::out_of_range is thrown for malformed input (in
the block referencing adj, found, target, and name_to_idx) and thus fail fast on
edges to non-existent nodes—update the push_back call to use .at(target) and
handle or let the exception propagate, or add validation that all targets exist
before populating adj.

---

Nitpick comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Line 73: The assignment "rank = new_rank" causes an O(n) vector copy; replace
it with an O(1 exchange using std::swap(rank, new_rank) in the same scope (where
rank and new_rank are used, and new_rank is later overwritten by std::fill).
Ensure <utility> (or <algorithm>) is included if not already so std::swap is
available; keep the rest of the iteration logic unchanged.
- Line 33: The assignment int n = nodes.size() narrows std::size_t into int;
change the type of n to std::size_t (or int64_t if you prefer signed) or add an
explicit cast if you intentionally want truncation so it no longer performs an
implicit narrowing conversion; update the declaration of n in binding.cpp (the
variable initialized from nodes.size()) accordingly and ensure any downstream
uses handle the new type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 58445ec1-1a26-4ef8-a99a-33bb65e865c2

📥 Commits

Reviewing files that changed from the base of the PR and between a302e5f and 41a01f5.

📒 Files selected for processing (1)
  • chuck/tasks/graph_analytics/native_cpp/binding.cpp

Comment thread chuck/tasks/graph_analytics/native_cpp/binding.cpp Outdated
Comment thread chuck/tasks/graph_analytics/native_cpp/binding.cpp
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
chuck/tasks/graph_analytics/native_cpp/binding.cpp (1)

47-48: ⚠️ Potential issue | 🟠 Major

Fail fast on unknown edge targets.

Line 48 uses operator[], so a missing target is inserted with index 0 and the solver returns a corrupted result instead of throwing. That also bypasses the fallback path in chuck/native_bindings.py, which only retries Python when the native solver raises. Use .at(target) or validate the payload before building adj.

Suggested fix
-            for (const auto& target : found->second) {
-                adj[u].push_back(name_to_idx[target]);
-            }
+            for (const auto& target : found->second) {
+                adj[u].push_back(name_to_idx.at(target));
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` around lines 47 - 48, The
loop that builds adj uses name_to_idx[target] which inserts a default 0 when
target is missing; instead validate and fail fast so the native solver throws
and triggers the Python fallback: in the loop over found->second (symbols:
found, target, adj, name_to_idx) either replace the operator[] access with
name_to_idx.at(target) or explicitly check name_to_idx.find(target) and if not
found throw a std::runtime_error (with a clear message including target) before
calling adj[u].push_back(...).
🧹 Nitpick comments (1)
chuck/tasks/graph_analytics/native_cpp/binding.cpp (1)

57-69: Swap the rank buffers instead of copying them.

rank = new_rank; copies n doubles every iteration. swap keeps the behavior the same here because new_rank is fully reset with std::fill at the top of the next pass.

Suggested fix
-        rank = new_rank;
+        rank.swap(new_rank);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` around lines 57 - 69, The
loop currently does an O(n) copy each iteration via "rank = new_rank;"; replace
that assignment with a constant-time buffer swap (e.g., call rank.swap(new_rank)
or std::swap(rank, new_rank)) at the end of the iteration so you reuse the
preallocated vectors and keep the same behavior given you reset new_rank with
std::fill at the top of the next pass; update the assignment in the loop body
where variables rank and new_rank are used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Around line 41-49: The code currently materializes a full adjacency list by
calling adj[u].resize(n) and filling adj[u][v]=v when graph.find(nodes[u]) is
missing or empty; this wastes O(n²) memory for many dangling nodes. Instead,
when found == graph.end() || found->second.empty() leave adj[u] empty (do not
resize or push all v), and update the PageRank iteration logic to account for
dangling nodes by computing a single dangling_mass (sum of ranks of nodes with
empty adj) each iteration and distributing dangling_mass/n to all nodes during
rank update; apply this change at both places handling dangling branches (the
block with adj[u].resize(n) / adj[u][v]=v and the analogous block around lines
63-66) and keep use of name_to_idx and other mappings unchanged.

---

Duplicate comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Around line 47-48: The loop that builds adj uses name_to_idx[target] which
inserts a default 0 when target is missing; instead validate and fail fast so
the native solver throws and triggers the Python fallback: in the loop over
found->second (symbols: found, target, adj, name_to_idx) either replace the
operator[] access with name_to_idx.at(target) or explicitly check
name_to_idx.find(target) and if not found throw a std::runtime_error (with a
clear message including target) before calling adj[u].push_back(...).

---

Nitpick comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Around line 57-69: The loop currently does an O(n) copy each iteration via
"rank = new_rank;"; replace that assignment with a constant-time buffer swap
(e.g., call rank.swap(new_rank) or std::swap(rank, new_rank)) at the end of the
iteration so you reuse the preallocated vectors and keep the same behavior given
you reset new_rank with std::fill at the top of the next pass; update the
assignment in the loop body where variables rank and new_rank are used.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f2382e96-56dc-48d9-90dd-6bb46d6e05c3

📥 Commits

Reviewing files that changed from the base of the PR and between 41a01f5 and 46d849e.

📒 Files selected for processing (1)
  • chuck/tasks/graph_analytics/native_cpp/binding.cpp

Comment thread chuck/tasks/graph_analytics/native_cpp/binding.cpp
#include <map>
#include <string>
#include <vector>
#include <unordered_map>
Copy link
Copy Markdown
Member

@Aaryan-Dadu Aaryan-Dadu Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are using unordered_map, how will you avoid hash collisions?

std::unordered_map<std::string, int> name_to_idx;
for (int i = 0; i < n; ++i) {
name_to_idx[nodes[i]] = i;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please share the A/B comparison workflow for the changes

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Around line 84-86: The tie-break in the top-node selection loop uses an
epsilon check allowing near-equal scores to fall back to node ordering; update
the condition in the loop that iterates over rank[] (the for loop using rank,
top_score, nodes, top_idx) to only use exact equality for the tie case (i.e.,
replace the fabs(rank[i] - top_score) < 1e-15 check with a direct equality
comparison rank[i] == top_score) so that node-name ordering is applied only when
scores are exactly equal, matching the Python implementation.
- Around line 34-36: The comment about unordered_map::reserve on the name_to_idx
local is misleading; change it to state that reserve(n) pre-allocates buckets to
reduce rehashing and keep the average load factor lower (improving performance)
but does not prevent hash collisions, which are determined by the hash function;
update the comment near name_to_idx and the reserve call to reflect this
accurate behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ff7e5089-da79-4d38-bcdc-6e770019837a

📥 Commits

Reviewing files that changed from the base of the PR and between 46d849e and 48ba0e7.

📒 Files selected for processing (1)
  • chuck/tasks/graph_analytics/native_cpp/binding.cpp

Comment on lines +34 to +36
// Fix: .reserve(n) ensures the load factor stays low, preventing hash collisions
std::unordered_map<std::string, int> name_to_idx;
name_to_idx.reserve(n);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In C++, does std::unordered_map::reserve prevent hash collisions, or does it only reduce rehashes and average bucket occupancy?

💡 Result:

std::unordered_map::reserve does not prevent hash collisions. It only reduces rehashes and average bucket occupancy (i.e., lowers the load factor). Hash collisions occur when distinct keys have the same hash value, determined solely by the hash function. Collisions are handled by chaining (linked lists per bucket) regardless of the number of buckets. reserve cannot change hash values or prevent collisions; it only allocates more buckets. reserve(n) sets bucket_count to at least ceil(n / max_load_factor), ensuring at least n elements can be inserted without exceeding max_load_factor and triggering automatic rehash. This pre-allocates buckets to avoid incremental rehashes during insertion and keeps average elements per bucket low, improving performance by reducing chain lengths for collided elements. No source indicates reserve affects collision occurrence; it manages bucket count for load factor control.

Citations:


Correct the misleading comment on unordered_map::reserve.

.reserve(n) reduces rehashing and keeps average bucket occupancy low, but does not prevent hash collisions. Collisions are determined by the hash function and cannot be prevented by adjusting bucket count. Update the comment to reflect what the method actually guarantees.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` around lines 34 - 36, The
comment about unordered_map::reserve on the name_to_idx local is misleading;
change it to state that reserve(n) pre-allocates buckets to reduce rehashing and
keep the average load factor lower (improving performance) but does not prevent
hash collisions, which are determined by the hash function; update the comment
near name_to_idx and the reserve call to reflect this accurate behavior.

Comment on lines +84 to +86
for (int i = 1; i < n; ++i) {
if (rank[i] > top_score || (std::abs(rank[i] - top_score) < 1e-15 && nodes[i] > nodes[top_idx])) {
top_score = rank[i];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove the epsilon from the top-node tie-break.

The Python implementation only falls back to node-name ordering when the scores are exactly equal. With std::abs(rank[i] - top_score) < 1e-15, a slightly smaller score can still win here, so top_node can diverge from chuck/tasks/graph_analytics/task.py.

💡 Suggested fix
-        if (rank[i] > top_score || (std::abs(rank[i] - top_score) < 1e-15 && nodes[i] > nodes[top_idx])) {
+        if (rank[i] > top_score || (rank[i] == top_score && nodes[i] > nodes[top_idx])) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (int i = 1; i < n; ++i) {
if (rank[i] > top_score || (std::abs(rank[i] - top_score) < 1e-15 && nodes[i] > nodes[top_idx])) {
top_score = rank[i];
for (int i = 1; i < n; ++i) {
if (rank[i] > top_score || (rank[i] == top_score && nodes[i] > nodes[top_idx])) {
top_score = rank[i];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` around lines 84 - 86, The
tie-break in the top-node selection loop uses an epsilon check allowing
near-equal scores to fall back to node ordering; update the condition in the
loop that iterates over rank[] (the for loop using rank, top_score, nodes,
top_idx) to only use exact equality for the tie case (i.e., replace the
fabs(rank[i] - top_score) < 1e-15 check with a direct equality comparison
rank[i] == top_score) so that node-name ordering is applied only when scores are
exactly equal, matching the Python implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

graph-analytics C++ backend is slower than Python backend

2 participants