Merged
17 commits
4 changes: 2 additions & 2 deletions .github/actions/pyrx_build/action.yml
@@ -64,5 +64,5 @@ runs:
# dirs via test/conftest.py collect_ignore). Every light test dir must be
# listed: pyrx, ds, pyvx, pygx (light GridX), pmtiles_light (light
# pmtiles_agg), stac (light STAC client, [stac] extra),
-# viz (gbx.viz, [viz] extra). See test/conftest.py for the maintained condition.
-pytest test/pyrx test/ds test/pyvx test/pygx test/pmtiles_light test/stac test/viz -m "not integration" -v
+# vizx (gbx.vizx, [vizx] extra). See test/conftest.py for the maintained condition.
+pytest test/pyrx test/ds test/pyvx test/pygx test/pmtiles_light test/stac test/vizx -m "not integration" -v
28 changes: 21 additions & 7 deletions docs/docs/api/overview.mdx
@@ -9,6 +9,7 @@ import packagesExamples from '!!raw-loader!../../tests/python/packages/examples.
import RasterXIcon from '../../../resources/images/RasterX.png';
import GridXIcon from '../../../resources/images/GridX.png';
import VectorXIcon from '../../../resources/images/VectorX.png';
+import VizXIcon from '../../../resources/images/VizX.png';

# Functions Overview

@@ -63,6 +64,17 @@ Augments Databricks built-in `ST_*` functions with vector-tile encoding, TIN sur

---

+### <img src={VizXIcon} alt="VizX" style={{height: '88px', width: 'auto', verticalAlign: 'middle'}} /> {#vizx}
+
+Tier-agnostic visualization helpers for inspecting GeoBrix outputs in a notebook — Python-only, no SQL functions.
+
+- Raster plotting (`plot_raster` / `plot_file`) with auto-decimation and percentile stretch, including coverage-depth and mask-layer composites for multi-band presence stacks
+- Spark DataFrame → GeoPandas adapters (`as_gdf` / `cells_as_gdf` / `grid_as_gdf`) for `.plot()` / `.explore()` maps
+
+[VizX Reference →](./vizx)
+
+---
+
### PMTiles

Container format for serving raster (PNG / JPEG / WebP) or vector (MVT) tile pyramids from a single static file via HTTP range requests. Native Scala v3 encoder — no GDAL/OGR dependency.
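Illustrative sketch, not part of this change: the VizX overview credits `plot_raster` with auto-decimation and a percentile stretch. The pure-Python version below shows what those two steps typically look like on a 2-D band. The striding rule, the 2nd/98th percentile cut-offs, and the helper names `decimation_step`, `percentile`, and `stretch` are assumptions for illustration, not GeoBrix code.

```python
import math
from typing import List, Sequence


def decimation_step(height: int, width: int, max_pixels: int = 2000) -> int:
    """Smallest integer stride that keeps both output sides at or under max_pixels."""
    return max(1, math.ceil(max(height, width) / max_pixels))


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between ranks; q is in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def stretch(band: List[List[float]], lo_q: float = 2.0, hi_q: float = 98.0,
            max_pixels: int = 2000) -> List[List[float]]:
    """Decimate a 2-D band by striding, then map [p_lo, p_hi] linearly onto [0, 1]."""
    step = decimation_step(len(band), len(band[0]), max_pixels)
    small = [row[::step] for row in band[::step]]  # keep every step-th row and column
    flat = [v for row in small for v in row]
    p_lo, p_hi = percentile(flat, lo_q), percentile(flat, hi_q)
    span = (p_hi - p_lo) or 1.0  # constant band: avoid dividing by zero
    # Values outside the percentile window are clipped to 0 or 1.
    return [[min(1.0, max(0.0, (v - p_lo) / span)) for v in row] for row in small]
```

Striding is the cheapest form of decimation; a renderer may instead resample with averaging, which this sketch does not attempt. Clipping at percentiles rather than at the min/max keeps a few outlier pixels from washing out the rest of the image.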
@@ -76,13 +88,13 @@ Container format for serving raster (PNG / JPEG / WebP) or vector (MVT) tile pyr

## Package Comparison

-| Feature | RasterX | GridX | VectorX | PMTiles |
-|---------|---------|-------|---------|---------|
-| **Primary Use** | Raster processing | Discrete global grids | Vector encoding + legacy | Tile pyramid packaging |
-| **Product Gap** | Full gap-filling | Specialized grids (BNG, quadbin) | Vector-tile encoding, legacy migration | Net-new |
-| **GDAL Required** | Yes | No | Yes (readers + MVT) | No |
-| **Output Format** | Tile (struct) + arrays | Cell IDs (Long / String) + WKB | BINARY (MVT bytes), WKB | BINARY (PMTile blob) or file |
-| **Spark Surface** | 65+ SQL functions | 30+ SQL functions | 6+ SQL functions + DataSources | 1 UDAF + 1 DataSource |
+| Feature | RasterX | GridX | VectorX | PMTiles | VizX |
+|---------|---------|-------|---------|---------|------|
+| **Primary Use** | Raster processing | Discrete global grids | Vector encoding + legacy | Tile pyramid packaging | Notebook visualization |
+| **Product Gap** | Full gap-filling | Specialized grids (BNG, quadbin) | Vector-tile encoding, legacy migration | Net-new | Supporting layer |
+| **GDAL Required** | Yes | No | Yes (readers + MVT) | No | No |
+| **Output Format** | Tile (struct) + arrays | Cell IDs (Long / String) + WKB | BINARY (MVT bytes), WKB | BINARY (PMTile blob) or file | Matplotlib figure / GeoDataFrame |
+| **Spark Surface** | 65+ SQL functions | 30+ SQL functions | 6+ SQL functions + DataSources | 1 UDAF + 1 DataSource | Python only (no SQL) |

## Choosing the Right Package

@@ -94,6 +106,8 @@ Container format for serving raster (PNG / JPEG / WebP) or vector (MVT) tile pyr

**Use PMTiles when:** publishing a tile pyramid (raster or vector) as a single static file; serving from S3/ABFS/GCS without a tile server; aggregating `(z, x, y, bytes)` rows into a deployable map.

+**Use VizX when:** inspecting GeoBrix raster tiles or files in a notebook (single-band, RGB, coverage-depth, or mask-layer composites); turning Spark geometry / H3-cell / gridspec DataFrames into GeoPandas for `.plot()` / `.explore()` maps. VizX is a Python-only supporting layer for visualization, not a spatial-processing package.
+
## Function Naming Convention

All GeoBrix SQL functions use the `gbx_` prefix:
4 changes: 2 additions & 2 deletions docs/docs/api/raster-functions.mdx
@@ -785,7 +785,7 @@ GROUP BY region_id
Streaming aggregator that burns H3 cell centroid pixels (or spatial-envelope pixels) into one raster tile per group. This is the **inverse** of [`rst_h3_rastertogrid*`](#rst_h3_rastertogridavg): where those functions reduce raster pixels to per-cell statistics, `rst_h3_rasterize_agg` reconstructs a raster from per-cell values. Use [`rst_frombands_agg`](#rst_frombands_agg) to stack per-threshold rasters (each produced by one `rst_h3_rasterize_agg` call) into a single multi-band output.

:::tip Worked example
-The [H3 Rasterize notebook](../notebooks/h3-rasterize) walks this through end to end on a San Francisco Bay Area DEM: elevation isobands → H3 polyfill → a shared canvas from [`rst_h3_gridspec`](#h3-grid) → per-band `rst_h3_rasterize_agg` → multi-band stack via [`rst_frombands_agg`](#rst_frombands_agg), visualized with the [`gbx.viz`](./viz) helpers. The same pattern maps directly to a telco multi-threshold signal-coverage stack.
+The [H3 Rasterize notebook](../notebooks/h3-rasterize) walks this through end to end on a San Francisco Bay Area DEM: elevation isobands → H3 polyfill → a shared canvas from [`rst_h3_gridspec`](#h3-grid) → per-band `rst_h3_rasterize_agg` → multi-band stack via [`rst_frombands_agg`](#rst_frombands_agg), visualized with the [`gbx.vizx`](./vizx) helpers. The same pattern maps directly to a telco multi-threshold signal-coverage stack.
:::

**Signature:** `rst_h3_rasterize_agg(cellid: Column, value: Column, srid: Column, pixel_size: Column, xmin: Column, ymin: Column, xmax: Column, ymax: Column, width: Column, height: Column, mode: Column, kring_pad: Column): Column`
@@ -2038,7 +2038,7 @@ df.select(

Apply your own function to each tile's open rasterio dataset, returning one scalar per row. `fn` receives a rasterio `DatasetReader`; the return value must match `returnType` (default `DoubleType()`; any Spark `DataType`). A null/empty tile yields null. This is the "GeoBrix doesn't have function X — run my own rasterio per tile" path; it returns a scalar (raster→raster transforms are the domain of `rst_mapalgebra` / `rst_derivedband`).

-For rendering tiles and building maps from results, see the [Visualization (`gbx.viz`)](./viz) page.
+For rendering tiles and building maps from results, see the [Visualization (`gbx.vizx`)](./vizx) page.

---

91 changes: 75 additions & 16 deletions docs/docs/api/viz.mdx → docs/docs/api/vizx.mdx
@@ -1,34 +1,36 @@
---
sidebar_position: 11
-title: Visualization (gbx.viz)
+title: VizX Function Reference
---

-# Visualization (`gbx.viz`)
+import VizXIcon from '../../../resources/images/VizX.png';

-`databricks.labs.gbx.viz` renders rasters and turns Spark DataFrames into GeoDataFrames for interactive maps. It is **tier-agnostic** — the raster plotters take tile bytes (use them with `pyrx` *or* `rasterx` tiles), and the vector adapters take any Spark DataFrame.
+# <img src={VizXIcon} alt="VizX" style={{height: '3em', width: 'auto', verticalAlign: 'middle', marginRight: '0.5rem'}} /> Function Reference
+
+`databricks.labs.gbx.vizx` renders rasters and turns Spark DataFrames into GeoDataFrames for interactive maps. It is **tier-agnostic** — the raster plotters take tile bytes (use them with `pyrx` *or* `rasterx` tiles), and the vector adapters take any Spark DataFrame.

These are driver-side, single-node helpers for inspecting results in a notebook — not distributed operations. They collect to the driver, so the vector adapters guard the collect with `max_rows`.
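Illustrative sketch, not part of this change: a common way to guard a driver-side collect with a `max_rows` limit is to pull one row more than the limit, so overflow is detected without counting the whole input. The sketch works on a plain Python iterable; `guarded_collect` is a hypothetical helper, and whether the GeoBrix adapters raise or truncate past the limit is not stated here (this version raises).

```python
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def guarded_collect(rows: Iterable[T], max_rows: int = 10_000) -> List[T]:
    """Materialize at most max_rows items, raising if the input holds more.

    Taking max_rows + 1 items is enough to detect overflow without reading
    (or counting) the rest of the input, so it also works on endless iterators.
    """
    taken = list(islice(rows, max_rows + 1))
    if len(taken) > max_rows:
        raise ValueError(
            f"more than {max_rows} rows; filter or sample first, or raise max_rows"
        )
    return taken
```

On a Spark DataFrame the same idea is `df.limit(max_rows + 1).collect()`, which bounds the transfer to the driver at `max_rows + 1` rows.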

:::note Opt-in extra
-`gbx.viz` requires `geobrix[viz]`, which pulls in `matplotlib`, `geopandas`, `folium`, and `mapclassify`. The package itself imports only `matplotlib` (raster rendering) and `geopandas` (the GeoDataFrame adapters); `folium` + `mapclassify` are used by `GeoDataFrame.explore()` for interactive maps.
+`gbx.vizx` requires `geobrix[vizx]`, which pulls in `matplotlib`, `geopandas`, `folium`, and `mapclassify`. The package itself imports only `matplotlib` (raster rendering) and `geopandas` (the GeoDataFrame adapters); `folium` + `mapclassify` are used by `GeoDataFrame.explore()` for interactive maps.
:::

## Installation

```bash
-pip install "geobrix[viz]"
+pip install "geobrix[vizx]"
```

From a Databricks notebook (commonly alongside the lightweight tier):

```python
-%pip install --quiet "geobrix[light,viz] @ file:///Volumes/<catalog>/<schema>/<volume>/geobrix-0.4.0-py3-none-any.whl"
+%pip install --quiet "geobrix[light,vizx] @ file:///Volumes/<catalog>/<schema>/<volume>/geobrix-0.4.0-py3-none-any.whl"
```

## Import

```python
-from databricks.labs.gbx.viz import (
+from databricks.labs.gbx.vizx import (
plot_raster,
plot_file,
plot_mask_layers,
@@ -60,7 +62,7 @@ plot_raster(raster_bytes, *, fig_w=10, fig_h=10, max_pixels=2000, composite="aut
Render a raster from its in-memory bytes — e.g. a tile's `raster` field collected from a GeoBrix DataFrame:

```python
-from databricks.labs.gbx.viz import plot_raster
+from databricks.labs.gbx.vizx import plot_raster

row = df.select("tile").first()
plot_raster(row["tile"]["raster"])
@@ -75,10 +77,10 @@ plot_raster(row["tile"]["raster"], composite="depth")
plot_file(path, *, fig_w=10, fig_h=10, max_pixels=2000, composite="auto")
```

-Render a raster from disk (GeoTIFF, VRT, …) with the same decimation + stretch pipeline:
+Render a raster from disk (GeoTIFF, VRT, …) with the same decimation + stretch pipeline. A leading `dbfs:` or `file:` scheme is stripped automatically, so scheme-qualified Databricks paths (`dbfs:/Volumes/…`, `file:///Volumes/…`) work as well as the bare FUSE path:
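Illustrative sketch, not part of this change: one way to implement the scheme stripping that the new `plot_file` text describes. `strip_scheme` is a hypothetical helper, and collapsing the extra slashes of `file:///…` into a single leading `/` is an assumption about the intended result.

```python
def strip_scheme(path: str) -> str:
    """Turn a scheme-qualified Databricks path into a bare FUSE path.

    Hypothetical helper: removes one leading "dbfs:" or "file:" scheme and
    collapses the slashes that follow it, so "file:///Volumes/x" and
    "dbfs:/Volumes/x" both become "/Volumes/x". Bare paths pass through.
    """
    for scheme in ("dbfs:", "file:"):
        if path.startswith(scheme):
            rest = path[len(scheme):]
            return "/" + rest.lstrip("/")
    return path  # already a bare path
```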

```python
-from databricks.labs.gbx.viz import plot_file
+from databricks.labs.gbx.vizx import plot_file

plot_file("/Volumes/main/geobrix_samples/geobrix-examples/nyc/dem.tif")

@@ -108,7 +110,7 @@ Overlay several single-band presence-mask tiles on one axes, each drawn as a sol
All tiles must share the same grid and extent — for example, produced on a shared canvas via [`rst_h3_gridspec`](./raster-functions#h3-grid). Layers are drawn in order: pass the largest footprint first and the smallest last so nested coverage remains visible.
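Illustrative sketch, not part of this change: because `plot_mask_layers` draws layers in the order given, sorting the masks by footprint beforehand yields the recommended largest-first order. This assumes each mask is available as a 2-D grid of 0/1 values (for example after converting a tile's band); `footprint` and `order_by_footprint` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D grid of 0/1 presence values


def footprint(mask: Mask) -> int:
    """Number of pixels flagged present (non-zero) in a mask."""
    return sum(1 for row in mask for value in row if value)


def order_by_footprint(layers: Sequence[Tuple[str, Mask]]) -> List[Tuple[str, Mask]]:
    """Return (label, mask) pairs with the largest footprint first and the smallest last."""
    return sorted(layers, key=lambda layer: footprint(layer[1]), reverse=True)
```

Python's `sorted` stays stable with `reverse=True`, so masks with equal footprints keep their original relative order.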

```python
-from databricks.labs.gbx.viz import plot_mask_layers
+from databricks.labs.gbx.vizx import plot_mask_layers

# Each raster is a single-band H3 presence mask for a different threshold.
# Tiles share the same canvas (same gridspec extent).
@@ -137,10 +139,10 @@ as_gdf(df, wkt_col="wkt", *, max_rows=10_000)
Convert a Spark DataFrame with a WKT geometry column into a `geopandas.GeoDataFrame`. Non-geometry columns are preserved; the WKT column is replaced by the geometry:

```python
-from databricks.labs.gbx.viz import as_gdf
+from databricks.labs.gbx.vizx import as_gdf

gdf = as_gdf(df_with_wkt, wkt_col="wkt")
-gdf.explore() # interactive folium map (needs the [viz] extra)
+gdf.explore() # interactive folium map (needs the [vizx] extra)
```

### `cells_as_gdf`
@@ -164,11 +166,11 @@ Convert a DataFrame of H3 cell ids into a `GeoDataFrame` of cell-boundary polygo
Without `dissolve_by`, each row represents one cell (useful for per-cell tooltips in `.explore()`). With `dissolve_by`, rows are merged per group into a single union footprint — far fewer geometries for large cell sets:

```python
-from databricks.labs.gbx.viz import cells_as_gdf
+from databricks.labs.gbx.vizx import cells_as_gdf

# Per-cell choropleth:
gdf = cells_as_gdf(df_cells, cell_col="cellid", extra_cols=["count"])
-gdf.explore(column="count") # choropleth (needs mapclassify, bundled in [viz])
+gdf.explore(column="count") # choropleth (needs mapclassify, bundled in [vizx])

# Dissolve to one footprint polygon per category:
gdf_dissolved = cells_as_gdf(
@@ -198,7 +200,7 @@ Convert a grid spec — as returned by [`rst_h3_gridspec`](./raster-functions#h3
Optional metadata fields `pixel_size`, `width`, and `height` are carried through if present on the input.

```python
-from databricks.labs.gbx.viz import cells_as_gdf, grid_as_gdf
+from databricks.labs.gbx.vizx import cells_as_gdf, grid_as_gdf

# grid_row is the 'grid' field from an rst_h3_gridspec result
grid_gdf = grid_as_gdf(grid_row)
@@ -211,6 +213,63 @@ grid_gdf.explore(m=m, color="black", style_kwds={"fill": False})

See the [H3 rasterize notebook](../notebooks/h3-rasterize) for a worked example pairing `grid_as_gdf` with `cells_as_gdf` to build a shared-canvas overlay.

+## Static maps
+
+`plot_static` renders Spark- or GeoPandas-derived geometries (or H3 cells) over
+a basemap as a **static** matplotlib figure — the GitHub-renderable counterpart
+to `GeoDataFrame.explore()` (whose Leaflet/folium output renders a blank
+*"Make this Notebook Trusted"* placeholder on GitHub and the docs site).
+
+The basemap is fetched from a web tile server (via `contextily`) **at execution
+time** and rasterized into the figure, so it bakes into the committed notebook
+output PNG — GitHub then displays it with no network. If the executing
+environment has no egress, the map renders without a basemap and a warning is
+emitted (never a hard error).
+
+### `plot_static`
+
+```python
+plot_static(
+data, *, geom_col=None, grid_system=None, column=None, cmap="viridis",
+legend=True, basemap=True, basemap_source=None, alpha=0.8, edgecolor="face",
+fill=True, markersize=None, title=None, fig_w=10, fig_h=10, max_rows=10_000,
+srid=None, ax=None,
+)
+```
+
+`data` is a Spark DataFrame **or** a `geopandas.GeoDataFrame`. Returns the
+matplotlib `Axes`; pass it back via `ax=` to overlay layers on one map. Every
+layer is reprojected to Web Mercator (EPSG:3857), so a `basemap=False` overlay
+lines up with a basemap layer on the same axes. `plot_static` does **not** call
+`pyplot.show()` — in a notebook the figure auto-displays at cell end with all
+overlaid layers; a script can call `plt.show()` itself. Pass `fill=False` to
+draw geometries as outlines only (no face), so a boundary doesn't cover the
+layers beneath it.
+
+**Geometry columns** accept the same encodings as every other `gbx_st_*`
+function — WKT, EWKT, WKB, EWKB, and native `GEOMETRY` / `GEOGRAPHY` (coerced
+in-Spark via `st_asbinary`). Set **`grid_system`** to treat the column as DGGS
+cell ids instead:
+
+| `grid_system` | Behaviour |
+|---|---|
+| `None` (default) | Column is a geometry encoding (WKT/EWKT/WKB/EWKB/`GEOMETRY`/`GEOGRAPHY`). |
+| `'h3'` | Column holds H3 cell ids (string index **or** bigint); rendered as cell-boundary polygons. |
+| `'quadbin'`, `'bng'`, `'custom'` | Planned — currently raise `NotImplementedError`. |
+
+```python
+from databricks.labs.gbx.vizx import plot_static
+
+# H3 choropleth over a basemap, then overlay the shared-canvas boundary as a
+# red outline (fill=False so it doesn't cover the cells; basemap=False so it
+# doesn't re-fetch tiles). Both layers reproject to 3857, so they align.
+ax = plot_static(cells_df, grid_system="h3", column="count", title="Coverage")
+plot_static(grid_gdf, ax=ax, fill=False, edgecolor="red", basemap=False)
+```
+
+`basemap_source` overrides the default `contextily.providers.CartoDB.Positron`;
+`basemap=False` skips tiles entirely (deterministic, no network).
+
## Escape hatches

When you need raster math that isn't in the `rst_*` surface, drop down to NumPy / rasterio per tile with the [pyrx escape hatches](./raster-functions#escape-hatches) (`tile_to_numpy`, `rst_apply`).
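Illustrative sketch, not part of this change: `plot_static` reprojects every layer to Web Mercator (EPSG:3857) so overlays align. For reference, the spherical forward projection from WGS84 longitude/latitude is x = Rλ and y = R·ln tan(π/4 + φ/2), with λ and φ in radians and R = 6,378,137 m. The stdlib function below evaluates that formula for one point; `to_web_mercator` is a hypothetical helper. In practice geopandas does this with `GeoDataFrame.to_crs(epsg=3857)`, though which call `plot_static` uses internally is not stated here.

```python
import math
from typing import Tuple

# EPSG:3857 uses a sphere whose radius is the WGS84 semi-major axis, in metres.
EARTH_RADIUS_M = 6_378_137.0


def to_web_mercator(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Project one WGS84 lon/lat point (degrees) to Web Mercator x/y (metres)."""
    if not -90.0 < lat_deg < 90.0:
        # The poles map to infinity; Web Mercator tile schemes stop near +/-85.0511 degrees.
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lam
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y
```

The latitude where y reaches R·π (the same extent as x at 180 degrees longitude) is atan(sinh π), about 85.0511 degrees, which is why square Web Mercator tile pyramids are clipped there.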
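Illustrative sketch, not part of this change: the "warn, don't fail" basemap behaviour described for `plot_static` is usually a try/except around the tile fetch. Here `try_basemap` is a hypothetical helper and `fetch` is a stand-in callable; in a real plot it might wrap something like `contextily.add_basemap(ax, source=...)`, but that wiring is an assumption.

```python
import warnings
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def try_basemap(fetch: Callable[[], T]) -> Optional[T]:
    """Run fetch(); if it fails for any reason, warn and return None instead of raising."""
    try:
        return fetch()
    except Exception as exc:  # no egress, timeouts, provider errors, ...
        warnings.warn(
            f"basemap unavailable, rendering without it: {exc}",
            RuntimeWarning,
            stacklevel=2,
        )
        return None
```

Catching the broad `Exception` is deliberate in this sketch: the stated contract is that a missing basemap is never a hard error, whatever the failure mode.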
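Illustrative sketch, not part of this change: the `grid_system` table for `plot_static` maps onto a small dispatch. `column_kind` and `normalize_h3` are hypothetical helpers. Rejecting values outside the table with `ValueError` is an assumption, since the table lists no other options. H3 string indexes are the hexadecimal form of the 64-bit integer id, so the two H3 encodings the table mentions reduce to one integer.

```python
from typing import Optional, Union

# Values the table lists as planned but not yet implemented.
PLANNED_GRID_SYSTEMS = {"quadbin", "bng", "custom"}


def column_kind(grid_system: Optional[str]) -> str:
    """Classify how a plot_static-style helper would read its geometry column."""
    if grid_system is None:
        return "geometry"  # WKT / EWKT / WKB / EWKB / GEOMETRY / GEOGRAPHY
    if grid_system == "h3":
        return "h3_cells"  # rendered as cell-boundary polygons
    if grid_system in PLANNED_GRID_SYSTEMS:
        raise NotImplementedError(f"grid_system={grid_system!r} is planned, not yet supported")
    # Assumption: anything outside the table is a caller error.
    raise ValueError(f"unknown grid_system: {grid_system!r}")


def normalize_h3(cell: Union[str, int]) -> int:
    """Return the 64-bit integer form of an H3 id given as a hex string or an int."""
    return int(cell, 16) if isinstance(cell, str) else int(cell)
```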