perf(graph-analytics): make C++ faster than Python by swapping slow maps for contiguous vectors#42
perf(graph-analytics): make C++ faster than Python by swapping slow maps for contiguous vectors#42Phantom0299 wants to merge 3 commits into
Conversation
…aps for contiguous vectors
WalkthroughReplaces string-keyed PageRank state with index-based structures: Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
59ba1b8 to
46d849e
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
chuck/tasks/graph_analytics/native_cpp/binding.cpp (2)
73-73: Considerstd::swapinstead of copy assignment for better performance.
rank = new_rankperforms a full vector copy (O(n)). Since this is a performance-focused PR, consider usingstd::swap(rank, new_rank)for O(1) exchange. The values innew_rankare overwritten bystd::fillat the start of each iteration anyway.Suggested optimization
- rank = new_rank; + std::swap(rank, new_rank);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` at line 73, The assignment "rank = new_rank" causes an O(n) vector copy; replace it with an O(1 exchange using std::swap(rank, new_rank) in the same scope (where rank and new_rank are used, and new_rank is later overwritten by std::fill). Ensure <utility> (or <algorithm>) is included if not already so std::swap is available; keep the rest of the iteration logic unchanged.
33-33: Narrowing conversion fromsize_ttoint.
nodes.size()returnsstd::size_t. Assigning tointis a narrowing conversion that could truncate on graphs with more than ~2 billion nodes. While unlikely in practice, consider usingstd::size_torint64_tfor correctness, or add an explicit cast to silence compiler warnings.Suggested fix
- int n = nodes.size(); + int n = static_cast<int>(nodes.size());Or for large graph safety:
- int n = nodes.size(); + std::size_t n = nodes.size();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` at line 33, The assignment int n = nodes.size() narrows std::size_t into int; change the type of n to std::size_t (or int64_t if you prefer signed) or add an explicit cast if you intentionally want truncation so it no longer performs an implicit narrowing conversion; update the declaration of n in binding.cpp (the variable initialized from nodes.size()) accordingly and ensure any downstream uses handle the new type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Around line 14-16: Remove the trailing whitespace characters present in the
file around the graph deserialization line where graph is assigned
(std::map<std::string, std::vector<std::string>> graph =
payload_obj.cast<std::map<std::string, std::vector<std::string>>>()), and also
strip trailing spaces from the nearby lines referenced in the review (other
surrounding lines in the same function/block). Ensure no lines end with spaces
so the compiler/CI whitespace checks pass.
- Around line 50-53: The loop that builds adjacency uses name_to_idx[target]
which silently inserts a default entry for unknown targets; change it to use
name_to_idx.at(target) so a std::out_of_range is thrown for malformed input (in
the block referencing adj, found, target, and name_to_idx) and thus fail fast on
edges to non-existent nodes—update the push_back call to use .at(target) and
handle or let the exception propagate, or add validation that all targets exist
before populating adj.
---
Nitpick comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Line 73: The assignment "rank = new_rank" causes an O(n) vector copy; replace
it with an O(1 exchange using std::swap(rank, new_rank) in the same scope (where
rank and new_rank are used, and new_rank is later overwritten by std::fill).
Ensure <utility> (or <algorithm>) is included if not already so std::swap is
available; keep the rest of the iteration logic unchanged.
- Line 33: The assignment int n = nodes.size() narrows std::size_t into int;
change the type of n to std::size_t (or int64_t if you prefer signed) or add an
explicit cast if you intentionally want truncation so it no longer performs an
implicit narrowing conversion; update the declaration of n in binding.cpp (the
variable initialized from nodes.size()) accordingly and ensure any downstream
uses handle the new type.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 58445ec1-1a26-4ef8-a99a-33bb65e865c2
📒 Files selected for processing (1)
chuck/tasks/graph_analytics/native_cpp/binding.cpp
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
chuck/tasks/graph_analytics/native_cpp/binding.cpp (1)
47-48:⚠️ Potential issue | 🟠 MajorFail fast on unknown edge targets.
Line 48 uses
operator[], so a missingtargetis inserted with index0and the solver returns a corrupted result instead of throwing. That also bypasses the fallback path inchuck/native_bindings.py, which only retries Python when the native solver raises. Use.at(target)or validate the payload before buildingadj.Suggested fix
- for (const auto& target : found->second) { - adj[u].push_back(name_to_idx[target]); - } + for (const auto& target : found->second) { + adj[u].push_back(name_to_idx.at(target)); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` around lines 47 - 48, The loop that builds adj uses name_to_idx[target] which inserts a default 0 when target is missing; instead validate and fail fast so the native solver throws and triggers the Python fallback: in the loop over found->second (symbols: found, target, adj, name_to_idx) either replace the operator[] access with name_to_idx.at(target) or explicitly check name_to_idx.find(target) and if not found throw a std::runtime_error (with a clear message including target) before calling adj[u].push_back(...).
🧹 Nitpick comments (1)
chuck/tasks/graph_analytics/native_cpp/binding.cpp (1)
57-69: Swap the rank buffers instead of copying them.
rank = new_rank;copiesndoubles every iteration.swapkeeps the behavior the same here becausenew_rankis fully reset withstd::fillat the top of the next pass.Suggested fix
- rank = new_rank; + rank.swap(new_rank);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` around lines 57 - 69, The loop currently does an O(n) copy each iteration via "rank = new_rank;"; replace that assignment with a constant-time buffer swap (e.g., call rank.swap(new_rank) or std::swap(rank, new_rank)) at the end of the iteration so you reuse the preallocated vectors and keep the same behavior given you reset new_rank with std::fill at the top of the next pass; update the assignment in the loop body where variables rank and new_rank are used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Around line 41-49: The code currently materializes a full adjacency list by
calling adj[u].resize(n) and filling adj[u][v]=v when graph.find(nodes[u]) is
missing or empty; this wastes O(n²) memory for many dangling nodes. Instead,
when found == graph.end() || found->second.empty() leave adj[u] empty (do not
resize or push all v), and update the PageRank iteration logic to account for
dangling nodes by computing a single dangling_mass (sum of ranks of nodes with
empty adj) each iteration and distributing dangling_mass/n to all nodes during
rank update; apply this change at both places handling dangling branches (the
block with adj[u].resize(n) / adj[u][v]=v and the analogous block around lines
63-66) and keep use of name_to_idx and other mappings unchanged.
---
Duplicate comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Around line 47-48: The loop that builds adj uses name_to_idx[target] which
inserts a default 0 when target is missing; instead validate and fail fast so
the native solver throws and triggers the Python fallback: in the loop over
found->second (symbols: found, target, adj, name_to_idx) either replace the
operator[] access with name_to_idx.at(target) or explicitly check
name_to_idx.find(target) and if not found throw a std::runtime_error (with a
clear message including target) before calling adj[u].push_back(...).
---
Nitpick comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Around line 57-69: The loop currently does an O(n) copy each iteration via
"rank = new_rank;"; replace that assignment with a constant-time buffer swap
(e.g., call rank.swap(new_rank) or std::swap(rank, new_rank)) at the end of the
iteration so you reuse the preallocated vectors and keep the same behavior given
you reset new_rank with std::fill at the top of the next pass; update the
assignment in the loop body where variables rank and new_rank are used.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f2382e96-56dc-48d9-90dd-6bb46d6e05c3
📒 Files selected for processing (1)
chuck/tasks/graph_analytics/native_cpp/binding.cpp
| #include <map> | ||
| #include <string> | ||
| #include <vector> | ||
| #include <unordered_map> |
There was a problem hiding this comment.
You are using unordered_map, how will you avoid hash collisions?
| std::unordered_map<std::string, int> name_to_idx; | ||
| for (int i = 0; i < n; ++i) { | ||
| name_to_idx[nodes[i]] = i; | ||
| } |
There was a problem hiding this comment.
Please share the A/B comparison workflow for the changes
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp`:
- Around line 84-86: The tie-break in the top-node selection loop uses an
epsilon check allowing near-equal scores to fall back to node ordering; update
the condition in the loop that iterates over rank[] (the for loop using rank,
top_score, nodes, top_idx) to only use exact equality for the tie case (i.e.,
replace the fabs(rank[i] - top_score) < 1e-15 check with a direct equality
comparison rank[i] == top_score) so that node-name ordering is applied only when
scores are exactly equal, matching the Python implementation.
- Around line 34-36: The comment about unordered_map::reserve on the name_to_idx
local is misleading; change it to state that reserve(n) pre-allocates buckets to
reduce rehashing and keep the average load factor lower (improving performance)
but does not prevent hash collisions, which are determined by the hash function;
update the comment near name_to_idx and the reserve call to reflect this
accurate behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ff7e5089-da79-4d38-bcdc-6e770019837a
📒 Files selected for processing (1)
chuck/tasks/graph_analytics/native_cpp/binding.cpp
| // Fix: .reserve(n) ensures the load factor stays low, preventing hash collisions | ||
| std::unordered_map<std::string, int> name_to_idx; | ||
| name_to_idx.reserve(n); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In C++, does std::unordered_map::reserve prevent hash collisions, or does it only reduce rehashes and average bucket occupancy?
💡 Result:
std::unordered_map::reserve does not prevent hash collisions. It only reduces rehashes and average bucket occupancy (i.e., lowers the load factor). Hash collisions occur when distinct keys have the same hash value, determined solely by the hash function. Collisions are handled by chaining (linked lists per bucket) regardless of the number of buckets. reserve cannot change hash values or prevent collisions; it only allocates more buckets. reserve(n) sets bucket_count to at least ceil(n / max_load_factor), ensuring at least n elements can be inserted without exceeding max_load_factor and triggering automatic rehash. This pre-allocates buckets to avoid incremental rehashes during insertion and keeps average elements per bucket low, improving performance by reducing chain lengths for collided elements. No source indicates reserve affects collision occurrence; it manages bucket count for load factor control.
Citations:
- 1: https://en.cppreference.com/w/cpp/container/unordered_map/reserve.html
- 2: https://stackoverflow.com/questions/42373232/why-does-stdunordered-map-have-a-reserve-method
- 3: https://stackoverflow.com/questions/31098123/c-unordered-map-collision-handling-resize-and-rehash
- 4: https://en.cppreference.com/w/cpp/container/unordered_map.html
Correct the misleading comment on unordered_map::reserve.
.reserve(n) reduces rehashing and keeps average bucket occupancy low, but does not prevent hash collisions. Collisions are determined by the hash function and cannot be prevented by adjusting bucket count. Update the comment to reflect what the method actually guarantees.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` around lines 34 - 36, The
comment about unordered_map::reserve on the name_to_idx local is misleading;
change it to state that reserve(n) pre-allocates buckets to reduce rehashing and
keep the average load factor lower (improving performance) but does not prevent
hash collisions, which are determined by the hash function; update the comment
near name_to_idx and the reserve call to reflect this accurate behavior.
| for (int i = 1; i < n; ++i) { | ||
| if (rank[i] > top_score || (std::abs(rank[i] - top_score) < 1e-15 && nodes[i] > nodes[top_idx])) { | ||
| top_score = rank[i]; |
There was a problem hiding this comment.
Remove the epsilon from the top-node tie-break.
The Python implementation only falls back to node-name ordering when the scores are exactly equal. With std::abs(rank[i] - top_score) < 1e-15, a slightly smaller score can still win here, so top_node can diverge from chuck/tasks/graph_analytics/task.py.
💡 Suggested fix
- if (rank[i] > top_score || (std::abs(rank[i] - top_score) < 1e-15 && nodes[i] > nodes[top_idx])) {
+ if (rank[i] > top_score || (rank[i] == top_score && nodes[i] > nodes[top_idx])) {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for (int i = 1; i < n; ++i) { | |
| if (rank[i] > top_score || (std::abs(rank[i] - top_score) < 1e-15 && nodes[i] > nodes[top_idx])) { | |
| top_score = rank[i]; | |
| for (int i = 1; i < n; ++i) { | |
| if (rank[i] > top_score || (rank[i] == top_score && nodes[i] > nodes[top_idx])) { | |
| top_score = rank[i]; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@chuck/tasks/graph_analytics/native_cpp/binding.cpp` around lines 84 - 86, The
tie-break in the top-node selection loop uses an epsilon check allowing
near-equal scores to fall back to node ordering; update the condition in the
loop that iterates over rank[] (the for loop using rank, top_score, nodes,
top_idx) to only use exact equality for the tie case (i.e., replace the
fabs(rank[i] - top_score) < 1e-15 check with a direct equality comparison
rank[i] == top_score) so that node-name ordering is applied only when scores are
exactly equal, matching the Python implementation.
Title: perf: Rewrite C++ backend to finally beat the Python/NumPy baseline (#38)
It was a bit frustrating to see that the C++ backend was actually losing to the Python version in the benchmarks. After looking at the binding.cpp code, it was clear why the core pagerank loop was bogged down by std::map lookups and string comparisons for every single edge. Basically, the C++ code was doing way more work than it needed to, while NumPy's C-kernels were running highly optimized contiguous memory operations.
How I made it faster than Python-
I upgraded the backend to use a data oriented approach, focusing on how the cpu actually likes to read data:
The Results
I ran the benchmarks against both the original C++ code and the Python baseline. The C++ version is now over 3x faster than it was and has successfully overtaken the Python/NumPy implementation.
Original C++: 0.027044s (Slower than Python)

Python (NumPy): 0.008787s (Baseline)
Optimized C++ (My Fix): 0.008621s (Faster than Python)
Total Speedup: ~3.14x compared to the previous C++ version.
Fixes #38
Summary by CodeRabbit