GeoBrix: H3 cell rasterizer + gbx.viz module + example notebooks & diagrams #42
Merged
added 12 commits
June 23, 2026 08:45
Design for a tier-agnostic gbx.viz module ([viz] extra: matplotlib + geopandas + folium + mapclassify) promoting the EO-series notebook helpers -- plot_raster / plot_file (decimation + 2-98% percentile stretch + nodata masking) and as_gdf / cells_as_gdf (Spark DF -> GeoDataFrame for .explore()) -- plus two Python-only pyrx escape-hatches (tile_to_numpy, generalized rst_apply). Drops generate_cells; keeps set_conf_safe + band-table ETL notebook-local. Pending user review. Co-authored-by: Isaac
Co-authored-by: Isaac
Adds gbx.viz package (importable skeleton for later tasks), assert_viz_available() guard (raises ImportError with install hint if matplotlib/geopandas missing), and the [viz] optional-dependency extra in pyproject.toml. Pins matplotlib==3.10.9, geopandas==1.1.3, folium==0.20.0, mapclassify==2.10.0 in requirements-pyrx-ci.in (latest on corp proxy) and regenerates the hash-pinned lock (83 packages). 2 tests pass. Co-authored-by: Isaac
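A minimal sketch of the kind of guard this commit describes: check for the optional modules up front and raise ImportError with an install hint. The function name, its parameters, and the hint wording are assumptions for illustration, not the package's actual `assert_viz_available()`.

```python
import importlib.util


def assert_modules_available(modules=("matplotlib", "geopandas"), extra="viz"):
    """Raise ImportError with an install hint if any top-level module is missing.

    Hypothetical helper: names and hint text are illustrative only.
    """
    # find_spec returns None for a top-level module that cannot be located,
    # without importing the (potentially heavy) module itself.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    if missing:
        raise ImportError(
            f"Missing optional dependencies: {', '.join(missing)}. "
            f"Install them with: pip install 'geobrix[{extra}]'"
        )
```

Because only the module spec is looked up, the guard stays cheap and the heavy imports can remain lazy inside each plotter.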
Add _raster.py with _decimated_read, _needs_percentile_stretch, and _percentile_stretch helpers; test_raster.py with 3 TDD tests (3 passed).
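For orientation, the two ideas behind these helpers (decimating to a pixel budget and a 2-98% percentile stretch with nodata masking) can be sketched in plain Python. This is an illustrative stand-in on flat lists, not the code in `_raster.py`; the function names, the `max_pixels` default, and the use of `None` for masked pixels are assumptions.

```python
import math


def decimation_step(height, width, max_pixels=1_000_000):
    """Integer read stride that keeps a strided read near the pixel budget."""
    return max(1, math.ceil(math.sqrt(height * width / max_pixels)))


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in 0..100) of a sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def percentile_stretch(values, nodata=None, low=2.0, high=98.0):
    """Scale values to 0..1 between the low/high percentiles; nodata becomes None."""
    valid = sorted(v for v in values if v != nodata)
    if not valid:
        return [None] * len(values)
    p_lo, p_hi = percentile(valid, low), percentile(valid, high)
    span = p_hi - p_lo
    out = []
    for v in values:
        if v == nodata:
            out.append(None)  # masked pixel, excluded from the stretch
        elif span == 0:
            out.append(0.0)   # constant band: nothing to stretch
        else:
            out.append(min(1.0, max(0.0, (v - p_lo) / span)))
    return out
```

Clipping to the 2nd and 98th percentiles keeps a handful of outlier pixels from flattening the contrast of the rest of the band.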
Append _render, plot_raster, plot_file to viz/_raster.py; export from viz/__init__.py. Matplotlib/rasterio are lazy-imported inside each plotter; assert_viz_available() guards the public API. Agg backend forced when headless. Also fixes unused pytest import (F401) left from Task 2 and adds missing # noqa: E402 annotations so flake8 is fully clean for all viz files. Co-authored-by: Isaac
Replace unreliable get_current_fig_manager() probe with a correct headless guard: select Agg before pyplot import only when pyplot has not yet been imported (no prior use() lock-in), MPLBACKEND is unset, and no display is present (DISPLAY/WAYLAND_DISPLAY absent). Databricks notebooks pre-import pyplot with their own backend, so they are never overridden. Co-authored-by: Isaac
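The three conditions in this commit message can be written as a small predicate. The sketch below is an assumption about how such a guard might look; `should_force_agg` and `import_pyplot` are hypothetical names, and only `matplotlib.use("Agg")` is a real matplotlib call.

```python
import os
import sys


def should_force_agg(modules=None, environ=None):
    """True only when nothing has chosen a backend and no display is present."""
    modules = sys.modules if modules is None else modules
    environ = os.environ if environ is None else environ
    pyplot_loaded = "matplotlib.pyplot" in modules       # a backend is already locked in
    backend_pinned = bool(environ.get("MPLBACKEND"))     # the user chose one explicitly
    has_display = bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    return not pyplot_loaded and not backend_pinned and not has_display


def import_pyplot():
    """Lazily import pyplot, selecting Agg first when running headless."""
    if should_force_agg():
        import matplotlib

        matplotlib.use("Agg")  # must happen before pyplot is imported
    import matplotlib.pyplot as plt

    return plt
```

Taking `modules` and `environ` as parameters keeps the predicate testable without touching the real process state; a notebook that has already imported pyplot is left alone because the first check fails.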
Adds viz._vector with as_gdf (WKT column → GeoDataFrame, EPSG:4326, max_rows guard with truncation warning) and cells_as_gdf (H3 bigint cell ids → boundary polygons via h3 v4 int_to_str + cell_to_boundary). Exports both from viz.__init__. 3/3 tests green. Co-authored-by: Isaac
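Spark stores H3 cell ids as signed 64-bit bigints, while the h3 v4 string API works with lowercase hex strings. The conversion itself is small enough to sketch by hand; the helper below is a hypothetical illustration (h3 v4 provides `int_to_str` for this), and the masking step is a defensive assumption for ids that arrive as negative signed longs.

```python
_MASK64 = (1 << 64) - 1  # reinterpret a signed 64-bit value as unsigned


def h3_int_to_hex(cell_id):
    """Format a (possibly signed) 64-bit H3 id as the lowercase hex string form."""
    return format(cell_id & _MASK64, "x")


# A well-known res-9 H3 index, written as an integer literal.
cell_hex = h3_int_to_hex(0x8928308280FFFFF)
```

The resulting string is the form that string-based calls such as `cell_to_boundary` in h3 v4 accept.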
Add two Python-only escape-hatches in pyrx/core/escape.py:
- tile_to_numpy(tile_or_bytes): drops a collected tile or raw bytes to a numpy ndarray (all bands) for host-side exploration.
- rst_apply(tile_col, fn, returnType): applies an arbitrary rasterio callable per-row via a dynamic @udf; null tile -> null.
Both are re-exported on pyrx.functions (noqa: F401) so callers use `from databricks.labs.gbx.pyrx.functions import tile_to_numpy, rst_apply`. Neither is SQL-registered; binding-parity count is unchanged (154). Co-authored-by: Isaac
Add "viz" to _LIGHT_TEST_DIRS in test/conftest.py (heavy CI phase collect_ignore exclusion) and to the pytest dir list in the light CI action. Clean-venv verification against requirements-pyrx-ci.txt confirmed 829 passed, 2 skipped, RC=0 — all viz deps already in lock. Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
…; drop library.py
The gbx.viz module ([viz] extra) and the pyrx escape-hatches (rst_apply,
tile_to_numpy) now provide, as first-class package APIs, the helpers the
eo-series previously carried in a local library.py + config_nb defs.
- config_nb.ipynb: install geobrix[light,stac,viz] (folium/mapclassify/geopandas
now come via [viz], not a manual %pip); import plot_raster/plot_file/as_gdf/
cells_as_gdf from databricks.labs.gbx.viz and rst_apply/tile_to_numpy from
pyrx.functions; drop the local as_gdf/cells_as_gdf defs and the `import library`;
keep the one surviving constant (FILENAME_TIMESTAMP_FORMAT) inline.
- library.py: DELETED. Everything in it is now superseded — plot_raster/plot_file
by gbx.viz, to_numpy_arr/rasterio_lambda by tile_to_numpy/rst_apply,
generate_cells was dead heavy-only, _set_conf_safe duplicated config_nb's
set_conf_safe, FILE_SIZE_THRESHOLD was unused.
- 01-04: call the package functions directly (plot_raster/plot_file, the bare
FILENAME_TIMESTAMP_FORMAT). nb03's raster->timeseries projection moves from
library.rasterio_lambda("tile.raster", fn) to rst_apply("tile", fn) — rst_apply
takes the tile struct and opens tile["raster"] itself (not the raw bytes column).
- README: drop the library.py row; note viz helpers come from gbx.viz; [light,stac,viz].
Source cells migrated; notebooks need re-execution on a cluster to refresh outputs
(they read /Volumes and run on Serverless/classic, not locally).
Co-authored-by: Isaac
Re-run on Serverless against the [viz]-enabled wheel. config_nb drops the now-dead imports (geopandas/matplotlib.pyplot/rasterio.MemoryFile/io.BytesIO — superseded by gbx.viz) and the library.py autoreload lines; installs geobrix[light,stac,viz]. Notebooks 01-04 carry refreshed outputs from the package-based viz/escape-hatch APIs. Co-authored-by: Isaac
…ation
Bring docs/docs/notebooks/eo-series.mdx and the series README in line with the gbx.viz migration: drop all library.py references (file deleted), install geobrix[light,stac,viz] (+ the viz extras matplotlib/geopandas/folium/mapclassify), note visualization helpers come from databricks.labs.gbx.viz, and replace the raster->timeseries `rasterio_lambda` mention with the `rst_apply` escape-hatch. Option-2 (heavyweight) now flips only config_nb.ipynb. Co-authored-by: Isaac
added 3 commits
June 23, 2026 19:24
…c) design
Design for a DGGS-cell rasterizer: rasterize a set of H3 cell ids (+ optional value) into a raster tile via pixel-centroid burn (the inverse of rst_h3_rastertogrid). Grouped aggregator rst_h3_rasterize_agg (heavy UDAF + light grouped pandas_udf, light SQL returns BINARY per the light-agg convention; default value = 1/NoData presence mask; default 4326 with optional projected srid + auto extent/pixel-size, full overrides). rst_h3_gridspec defines the complete SHARED grid/canvas (snapped origin + pixel size + dims + srid) once over the union of all thresholds, so every band rasterizes to a byte-identical transform and stacks cleanly via rst_frombands_agg (no half-pixel drift). Implemented as a scalar per-cell bbox + native min/max + snap (both tiers; avoids the grouped-pandas_udf struct-return limit). Quadbin/BNG variants are follow-ons. Validation: CI round-trip vs rastertogrid + partition property + a committed FCC fixed-wireless subset; DEM elevation-isoband notebook for the full polygons->polyfill->rasterize->stack demo. Pending user review. Co-authored-by: Isaac
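A pixel-centroid burn can be illustrated without any geospatial dependency: for each pixel, take the centre coordinate, ask which cell contains it, and write that cell's value or NoData. The sketch below is a toy under stated assumptions: `burn_cells` is a hypothetical name, and `toy_cell_of` uses 1x1 square cells as a stand-in for a real point-to-H3 lookup.

```python
import math


def burn_cells(cell_values, cell_of, xmin, ymax, pixel_size, width, height, nodata=-9999.0):
    """Burn {cell: value} onto a north-up grid by testing each pixel centroid."""
    grid = []
    for row in range(height):
        y = ymax - (row + 0.5) * pixel_size      # row 0 is the top of the raster
        line = []
        for col in range(width):
            x = xmin + (col + 0.5) * pixel_size  # pixel centre, not its corner
            line.append(cell_values.get(cell_of(x, y), nodata))
        grid.append(line)
    return grid


def toy_cell_of(x, y):
    """Stand-in for a DGGS lookup: unit squares keyed by their lower-left corner."""
    return (math.floor(x), math.floor(y))


# Two occupied cells on a 4x4 canvas covering [0, 2] x [0, 2] at 0.5 resolution.
grid = burn_cells({(0, 0): 1.0, (1, 1): 5.0}, toy_cell_of,
                  xmin=0.0, ymax=2.0, pixel_size=0.5, width=4, height=4)
```

Since every band is burned against the same origin, pixel size, and dimensions, the outputs line up pixel for pixel, which is what makes a shared grid spec useful before stacking.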
Co-authored-by: Isaac
Adds cellraster.py with pure-function rasterization primitives: _h3_str (signed-Long normalization), _resolution (uniform-res guard), cell_bbox, snap_bounds (lattice-aligned snapping, DRY helper for Task 2), compute_gridspec (kring-padded, snapped 8-tuple), and cells_to_raster (pixel-centroid burn to float64 GTiff bytes). 5/5 tests green. Note: brief's half-pixel centroid expansion pre-snap caused the single-cell kring_pad=0 case to straddle two lattice slots (→2x2); removed expansion — snap_bounds correctly gives 1x1 for a single point.
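The single-point observation in this commit can be reproduced with a small lattice-snapping sketch. This is a hypothetical reimplementation for illustration (`snap_bounds_sketch` is not the function in cellraster.py); it assumes a lattice anchored at the origin and a minimum canvas of one pixel.

```python
import math


def snap_bounds_sketch(xmin, ymin, xmax, ymax, pixel_size):
    """Snap a bbox outward to a pixel lattice; returns bounds plus (cols, rows)."""
    c0 = math.floor(xmin / pixel_size)
    r0 = math.floor(ymin / pixel_size)
    # Guarantee at least one pixel, even for a degenerate (single-point) bbox.
    c1 = max(math.ceil(xmax / pixel_size), c0 + 1)
    r1 = max(math.ceil(ymax / pixel_size), r0 + 1)
    return (c0 * pixel_size, r0 * pixel_size,
            c1 * pixel_size, r1 * pixel_size,
            c1 - c0, r1 - r0)


px = 0.5
point = snap_bounds_sketch(0.3, 0.3, 0.3, 0.3, px)
# Expanding the same point by half a pixel before snapping straddles two slots.
expanded = snap_bounds_sketch(0.3 - px / 2, 0.3 - px / 2, 0.3 + px / 2, 0.3 + px / 2, px)
```

With these inputs the bare point snaps to a single lattice slot, while the pre-expanded box spans two slots per axis, matching the 2x2 symptom described above.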
added 2 commits
June 24, 2026 13:01
rasterio.plot.show() renders a constant-valued single band (an H3 presence mask, all 1.0) as a blank plot and ignores the explicit vmin/vmax, so every per-band inspection looked empty. Render the single-band branch with ax.imshow + an explicit plotting_extent instead -- it honors the clim and the masked array (NoData -> transparent over the facecolor). Replace the figure-exists check with a real regression test asserting the footprint is actually drawn (non-degenerate clim + coloured pixels in the rasterized buffer); the old check passed while blank. Verified locally across full/mid/sparse coverage and a continuous DEM-like raster. Co-authored-by: Isaac
Remove unused imports (math/numpy in _h3_cell_bbox_udf, math in rst_h3_gridspec) and the dead _mode_val capture (mode is already applied via the bbox UDF), plus an unused numpy import in test_core_cellraster. Reformat test_vector_raster_bridge with in-container black (a prior host-black pass diverged from CI's black). Clears the CI Python-lint gate; no behaviour change. Co-authored-by: Isaac
Add plot_mask_layers(layers, colors=, ...): overlay several single-band presence-mask tiles on one axes, each a solid colour with a legend (NoData transparent over a grey facecolor). Tiles must share a grid; draw order is largest-first so nested coverage stays visible. Notebook cell 8 now overlays the two mid-coverage bands on a single plot instead of two separate viridis figures. Regression test asserts two overlaid AxesImages, a 2-entry legend, and both requested colours present in the rasterized buffer. Co-authored-by: Isaac
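The largest-first rule can be sketched as a sort on coverage. How the package measures "largest" is not stated here, so the version below assumes it means the count of valid (non-NoData) pixels; `draw_order` and the `None`-as-NoData mask layout are hypothetical.

```python
def valid_pixel_count(mask):
    """Number of pixels carrying a value (None marks NoData in this sketch)."""
    return sum(1 for row in mask for v in row if v is not None)


def draw_order(layers):
    """Return layer labels sorted so the widest footprint is drawn first."""
    return sorted(layers, key=lambda label: valid_pixel_count(layers[label]), reverse=True)


layers = {
    "inner": [[1.0, None], [None, None]],
    "outer": [[1.0, 1.0], [1.0, None]],
}
order = draw_order(layers)
```

Drawing the widest footprint first means smaller, nested masks are painted on top of it rather than hidden underneath.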
…summary
Cell 16 described dissolve_by as active while the code renders per-cell (kept deliberately: per-cell tooltips read better at this size). Reword to the per-cell render and add a Note pointing at dissolve_by="band_level" for larger sets. Update the summary table's Visualize rows to the actual calls (plot_mask_layers, plot_raster(composite="depth")). Co-authored-by: Isaac
- New docs/docs/notebooks/h3-rasterize.mdx (registered in sidebars.js) documenting
the SF Bay Area H3 rasterize -> band-stack example.
- viz.mdx: document plot_mask_layers, grid_as_gdf, composite="depth", and the
cells_as_gdf dissolve_by option; update the import list.
- beta-release-notes: add gbx.viz module bullet + H3 rasterize example bullet.
- raster-functions: add explicit {#h3-grid} anchor so release-notes link resolves.
- README: retarget the example to the San Francisco DEM, the session temp table,
and the new viz helpers.
Validated with a full Docusaurus build (no broken links).
Co-authored-by: Isaac
added 9 commits
June 24, 2026 14:13
…sterize_agg
Add a "Worked example" tip in the rst_h3_rasterize_agg section pointing at the new H3 Rasterize notebook, cross-linking rst_h3_gridspec, rst_frombands_agg, and gbx.viz. Co-authored-by: Isaac
New "h3_aggregate" input kind: a fixed 331-cell res-9 H3 set on a hardcoded explicit grid, mirrored byte-for-byte in spec.py (_H3RAGG_*) and BenchDispatch.scala (h3Ragg*) so the light and heavy legs burn the identical cells onto the identical canvas. Python: FnSpec (dggs, spark-path, fingerprint) + _h3_aggregate_df group builder + runner wiring. Scala: DGGS category, h3Aggregate set/inputKind, heavy aggregate case (UDAF -> gbx_rst_fromcontent tile struct), HeavyRunner group branch, BenchDispatchTest count 106->107. Cross-tier masks verified byte-identical (0 px) via the local JAR; Scala bench suite green (20 tests). Co-authored-by: Isaac
…in fromcontent
The heavy UDAF's dataType is tileDataType(BinaryType) -- a tile STRUCT, like rst_rasterize_agg -- so the cluster heavy leg must call gbx_rst_h3_rasterize_agg directly (the consistency collect reads `raster` off the struct). Wrapping it in gbx_rst_fromcontent (a BINARY->struct helper, only needed for the lightweight SQL form) fed a struct where bytes were expected and aborted the distributed agg with "[INTERNAL_ERROR] Couldn't find method eval". Match the rst_rasterize_agg pattern. Co-authored-by: Isaac
… not NaN
A null in a TYPED (Double) value column arrives in the pandas_udf as np.nan, and `np.nan is not None` is True, so the presence guard burned float(np.nan)=NaN instead of 1.0. Guard with pd.isna. The cluster benchmark caught this as a heavy(1.0)-vs-light(NaN) cross-tier divergence (the value-omitted path was already correct, which is why the JAR-gated parity test passed). Regression test added with an explicit nullable DoubleType value column. Co-authored-by: Isaac
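The failure mode is easy to reproduce in isolation. The sketch below uses `math.isnan` as a standard-library stand-in for `pd.isna` (the fix named in this commit); both function names are hypothetical and exist only to contrast the two guards.

```python
import math


def burn_value_buggy(value):
    """Original-style guard: NaN is not None, so it slips through as a value."""
    return float(value) if value is not None else 1.0


def burn_value(value):
    """Treat both None and NaN as 'no value supplied' and burn presence 1.0."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 1.0
    return float(value)


nan = float("nan")
```

Here `burn_value_buggy(nan)` returns NaN while `burn_value(nan)` returns the presence value, which mirrors the heavy-vs-light divergence the benchmark exposed.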
…5x, exact)
Cluster bench (1000 groups, fixed 20 workers): heavy 1.50 ms/tile vs light 2.26 ms/tile (heavy ~1.5x faster), cross-tier parity exact. Footnoted because the H3 rasterizer's workload (331-cell groups -> 39x24 output) differs from the 1024² rows, so the cross-tier ratio + exact parity are the comparable takeaways. Co-authored-by: Isaac
…mily
Bring the lightweight implementation-techniques page current: add rst_h3_rasterize_agg to the grouped-aggregate UDF table, and a "GridX (pygx) -- custom grid" section in the Arrow scalar tab mirroring the quadbin/BNG families (scalar cell-ops as pandas_udf, array polyfill/kring as plain @udf). Regular scalar UDFs (metadata/accessors, h3_cell_bbox) remain intentionally unlisted. Co-authored-by: Isaac
The landscape SVG already carried rst_h3_rasterize_agg + the 108 count, but its screenshot PNG was last rendered pre-H3 (107). Re-screenshot from the current SVG so the slide asset matches; the portrait PNG (used on the docs page) was already current. Co-authored-by: Isaac
eo-series 01-04: embed the existing eo-series-0N.png banner after the intro cell. xView + h3-rasterize: new resources/images/example-diagrams.py (reuses the eo-series.py diagram framework via importlib) renders xview-clipping.png and h3-rasterize.png, embedded after each notebook's intro cell and in their READMEs. Includes the executed-notebook + banner edits supplied for the example set. Co-authored-by: Isaac
Update the four eo-series notebook diagrams' chips + captions to match the migrated notebooks: shapefile_ogr→shapefile_gbx, pystac_client→StacClient (01); the deleted download_band/update_assets flow → StacClient.download / StacClient.repair (02); gdal reader→gtiff_gbx, rasterio_lambda→rst_apply (03); gdal writer→gtiff_gbx (04). Tier-agnostic built-ins (h3_tessellateaswkb, st_*) and still-used functions kept. Re-rendered all four PNGs. Also includes the h3 notebook's first-cell edit. Co-authored-by: Isaac
Summary

The 0.4.0 beta example + visualization layer, branched from `beta/0.4.0` (the 0.4.0 base: STAC, `register(only=)`, lightweight `gbx_rst_fromfile`, CI hardening; merged in #41). On top of that base this PR adds:

- `gbx_rst_h3_rasterize_agg`, `gbx_h3_cell_bbox`, and the light-tier `rst_h3_gridspec` helper, with a worked SF-DEM demo notebook + a published light-vs-heavy benchmark.
- `gbx.viz`: a tier-agnostic visualization module behind a new `[viz]` extra (raster plotting, coverage-depth + mask-layer composites, Spark→GeoPandas adapters), plus two Python-only `pyrx` escape-hatches.
- … `library.py`).

Binding-parity: 156 registered functions (two new: `gbx_rst_h3_rasterize_agg`, `gbx_h3_cell_bbox`).

H3 cell rasterizer (RasterX, both tiers)

The inverse of the `gbx_rst_h3_rastertogrid*` family: synthesize a raster from H3-indexed values instead of reducing a raster to per-cell stats.

- `gbx_rst_h3_rasterize_agg`: grouped aggregator (heavy Scala `TypedImperativeAggregate` + light `pyrx` `pandas_udf`) that burns a set of H3 cells (one row per cell, optional value; null → presence mask `1.0`) into one GTiff tile per group via pixel-centroid assignment. Extent/grid supplied explicitly or auto-derived from the cell set (+ `kring_pad`). Heavy returns a tile `STRUCT`; the lightweight SQL form returns `BINARY` (a PySpark grouped-`pandas_udf` can't return a struct), and the Python wrapper recomposes the struct.
- `gbx_h3_cell_bbox`: scalar `STRUCT<xmin,ymin,xmax,ymax>` for one H3 cell in a target EPSG, optionally k-ring-padded.
- `rst_h3_gridspec` (light/Python DataFrame helper): derive the canonical shared canvas (extent + pixel grid) from a cell set before aggregating, so per-band tiles are pixel-aligned for stacking.
- `notebooks/examples/h3-rasterize/h3_rasterize_isobands.ipynb`: San Francisco DEM → 100 m elevation isobands → H3 polyfill → shared `rst_h3_gridspec` canvas → per-band `rst_h3_rasterize_agg` (materialized once to a session temp table) → `rst_frombands_agg` multi-band stack → `gbx.viz` rendering. Doc page + README + a `#h3-grid` cross-reference from the function docs.

`databricks.labs.gbx.viz` (new `[viz]` extra)

Tier-agnostic: works with `pyrx` or `rasterx` tiles. Heavy deps (matplotlib + geopandas) are lazy-imported behind a `viz/_env.py` guard; `folium`/`mapclassify` ship in the extra for `GeoDataFrame.explore()`.

- `plot_raster(raster_bytes)` / `plot_file(path)`: decimation to a pixel budget + per-band 2–98% percentile stretch (UInt16 EO) + nodata masking + viridis (1-band) / RGB (multi); headless-safe backend selection. `composite="depth"` renders a multi-band presence stack as a per-pixel coverage-depth gradient (vs a mostly-black RGB), and single-band constant-value presence masks now draw as a solid footprint over a light background (previously rendered blank: `rasterio.plot.show` discards the clim, so the single-band path uses `imshow`).
- `plot_mask_layers(layers)`: overlay several single-band mask tiles on one axes, each a solid colour with a legend (the multi-threshold coverage view).
- `as_gdf` / `cells_as_gdf` (optional `dissolve_by`) / `grid_as_gdf`: Spark DataFrame / H3 cells / a `rst_h3_gridspec` grid struct → GeoPandas (EPSG:4326) for `.plot()` / `.explore()`; driver-side `max_rows` guard.

pyrx escape-hatches (Python-only)

On `databricks.labs.gbx.pyrx.functions` (not SQL-registered, so binding-parity is unaffected):

- `tile_to_numpy(tile_or_bytes)`: read a tile's raster into a NumPy array.
- `rst_apply(tile_col, fn, returnType=DoubleType())`: apply your own function to each tile's open rasterio dataset, one scalar per row.

eo-series migration

- `config_nb.ipynb` installs `geobrix[light,stac,viz]` and imports the helpers from the package; its local `as_gdf`/`cells_as_gdf` defs and `import library` are gone.
- `library.py` deleted: every helper is now superseded by `gbx.viz` / the escape-hatches.
- Notebooks 01–04 call the package functions directly (nb03's raster→timeseries projection moves to `rst_apply("tile", fn)`). READMEs updated.

Example notebooks & hero diagrams

A hero pipeline diagram at the top of every example notebook, from a shared SVG→PNG generator (`resources/images/*.py`):

- … `shapefile_gbx`, `gtiff_gbx` (reader + writer), `StacClient.download` / `StacClient.repair`, `rst_apply`, matching the migrated notebooks (tier-agnostic built-ins like `h3_tessellateaswkb` / `st_*` kept).
- … `example-diagrams.py` (reuses the eo-series framework).

Benchmark

`gbx_rst_h3_rasterize_agg` added to the cluster benchmark (both tiers, fixed 20-worker cluster, 1000-group spark-path): heavy 1.50 ms/tile vs light 2.26 ms/tile (heavy ~1.5×), cross-tier consistency `exact`. Recorded in `benchmarking.mdx`; classified in `performance.mdx` (which also gained the `gbx_custom_*` custom-grid family). The bench surfaced, and we fixed, a real light-tier defect: a null typed-`Double` value column burned `NaN` instead of presence `1.0` (`np.nan is not None` slipped the guard).

Testing

- `BenchDispatchTest` + the Scala bench suite green; light `test/viz` + `test/pyrx` green.
- `test/viz` wired into the lightweight CI tier (`_LIGHT_TEST_DIRS` + `pyrx_build`); `[viz]` deps hash-pinned in `requirements-pyrx-ci.txt`.
- New regression tests: `composite="depth"`, single-band presence render (asserts drawn pixels, not just a figure object), `plot_mask_layers` overlay, null-value-column presence.
- … `override def name` / Python `functions.py` / `function-info.json`); QC doc gates green (diagram-coverage 108 `rst_*`, release-notes-functions, doc-coverage D2–D5).
- `build main` green on the feature tip (heavy + light).

This pull request and its description were written by Isaac.