feat: add LatLngToCellBatch for lower cgo overhead #119
… Adds a batched LatLng -> Cell API in a new C file (h3_latLngBatch.{c,h}) that supplements the cloned H3 core.
Coverage Report for CI Build 26683177239: coverage remained the same at 100.0%. No uncovered changes found. No coverage regressions found. (Coveralls)
Actually had a thought on pure Go implementations (as we can see in #120): if it's a pure Go implementation, wouldn't we also get rid of the cgo overhead and no longer need a batched version? Before (CGo): After (pure Go): benchstat (n=10):
As a user and h3 & Go enthusiast, I'm all for pure Go optimizations.
Resolves #113
Adds `LatLngToCellBatch` so a slice of `LatLng` -> `[]Cell` costs one cgo transaction for the whole batch instead of one per row. The motivating use case is pipelines transforming millions of coordinate rows per cycle, where cgo overhead currently dominates actual H3 work.

Per @jogly's suggestion in the issue, the implementation lives in a new C extension (`h3_latLngBatch.{c,h}`) that supplements the cloned H3 core. If H3 core later exposes an equivalent `latLngToCellBatch`, swapping the Go wrapper to call it directly is a one-line change.

Changes
- `h3_latLngBatch.h` (new): `latLngToCellBatch`
- `h3_latLngBatch.c` (new): `latLngToCell`
- `h3.go`: `#include <h3_latLngBatch.h>` in cgo preamble; new `LatLngToCellBatch` wrapper
- `h3_test.go`
- `bench_test.go`: `BenchmarkLatLngToCellBatch` plus a `BenchmarkLatLngToCellBaseline` for comparing results

Benchmark
At small `n` (n < 32) the per-call path wins. The crossover is around `n=16-32`; from there the batched version grows roughly linearly while the loop pays the cgo cost N times. At `n=16384` the batched version is ~1.21x faster.

Conventions
`LatLng.toC()`, so the C code sees radians-`LatLng` identical to the single-call path.
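
Since the benchmark note says the per-call path wins for small inputs, a caller might want to pick a path based on slice length. The following is only a rough sketch of that idea in pure Go. The `LatLng` and `Cell` types, the `SingleFunc` and `BatchFunc` signatures, the `convert` helper, and the `crossover` value of 32 are all assumptions made for illustration; they are not the signatures introduced by this PR, and the threshold should be measured on the target hardware.

```go
package main

import "fmt"

// LatLng and Cell are hypothetical stand-ins for the library's types.
type LatLng struct{ Lat, Lng float64 }
type Cell uint64

// SingleFunc converts one point; BatchFunc converts a whole slice in one call.
// Both signatures are assumptions, not the PR's actual API.
type SingleFunc func(p LatLng, res int) (Cell, error)
type BatchFunc func(ps []LatLng, res int) ([]Cell, error)

// crossover is an assumed threshold based on the "n < 32" observation.
const crossover = 32

// convert uses the per-call path for short slices and the batched path otherwise.
func convert(points []LatLng, res int, single SingleFunc, batch BatchFunc) ([]Cell, error) {
	if len(points) >= crossover {
		return batch(points, res)
	}
	cells := make([]Cell, len(points))
	for i, p := range points {
		c, err := single(p, res)
		if err != nil {
			return nil, fmt.Errorf("point %d: %w", i, err)
		}
		cells[i] = c
	}
	return cells, nil
}

func main() {
	// Fake converters that only count how often each path is entered.
	var singleCalls, batchCalls int
	single := func(p LatLng, res int) (Cell, error) {
		singleCalls++
		return Cell(res), nil
	}
	batch := func(ps []LatLng, res int) ([]Cell, error) {
		batchCalls++
		out := make([]Cell, len(ps))
		for i := range out {
			out[i] = Cell(res)
		}
		return out, nil
	}

	if _, err := convert(make([]LatLng, 8), 9, single, batch); err != nil {
		panic(err)
	}
	if _, err := convert(make([]LatLng, 1000), 9, single, batch); err != nil {
		panic(err)
	}
	fmt.Println(singleCalls, batchCalls) // prints: 8 1
}
```

With real converters plugged in, the 8-point slice would cross the cgo boundary eight times and the 1000-point slice once, which is the trade-off the benchmark describes.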