Merged
4 changes: 2 additions & 2 deletions .github/actions/pyrx_build/action.yml
@@ -64,5 +64,5 @@ runs:
# dirs via test/conftest.py collect_ignore). Every light test dir must be
# listed: pyrx, ds, pyvx, pygx (light GridX), pmtiles_light (light
# pmtiles_agg), stac (light STAC client, [stac] extra),
# viz (gbx.viz, [viz] extra). See test/conftest.py for the maintained condition.
pytest test/pyrx test/ds test/pyvx test/pygx test/pmtiles_light test/stac test/viz -m "not integration" -v
# vizx (gbx.vizx, [vizx] extra). See test/conftest.py for the maintained condition.
pytest test/pyrx test/ds test/pyvx test/pygx test/pmtiles_light test/stac test/vizx -m "not integration" -v
28 changes: 21 additions & 7 deletions docs/docs/api/overview.mdx
@@ -9,6 +9,7 @@ import packagesExamples from '!!raw-loader!../../tests/python/packages/examples.
import RasterXIcon from '../../../resources/images/RasterX.png';
import GridXIcon from '../../../resources/images/GridX.png';
import VectorXIcon from '../../../resources/images/VectorX.png';
import VizXIcon from '../../../resources/images/VizX.png';

# Functions Overview

@@ -63,6 +64,17 @@ Augments Databricks built-in `ST_*` functions with vector-tile encoding, TIN sur

---

### <img src={VizXIcon} alt="VizX" style={{height: '88px', width: 'auto', verticalAlign: 'middle'}} /> {#vizx}

Tier-agnostic visualization helpers for inspecting GeoBrix outputs in a notebook — Python-only, no SQL functions.

- Raster plotting (`plot_raster` / `plot_file`) with auto-decimation and percentile stretch, including coverage-depth and mask-layer composites for multi-band presence stacks
- Spark DataFrame → GeoPandas adapters (`as_gdf` / `cells_as_gdf` / `grid_as_gdf`) for `.plot()` / `.explore()` maps
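The auto-decimation and percentile stretch named in the first bullet can be sketched in plain Python. This is an illustration only, not the `plot_raster` source: it assumes `max_pixels` caps the longer side by striding over rows and columns, and that the stretch uses 2nd/98th nearest-rank percentiles. `decimate`, `percentile`, and `stretch` are hypothetical names.

```python
import math


def decimate(grid, max_pixels=2000):
    """Keep every k-th row and column so the longer side is <= max_pixels (assumed semantics)."""
    height, width = len(grid), len(grid[0])
    stride = max(1, math.ceil(max(height, width) / max_pixels))
    return [row[::stride] for row in grid[::stride]]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (integer-friendly rank arithmetic)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def stretch(grid, low_pct=2, high_pct=98):
    """Map values to [0, 1] between two percentiles, clipping both tails."""
    flat = [v for row in grid for v in row]
    lo, hi = percentile(flat, low_pct), percentile(flat, high_pct)
    span = (hi - lo) or 1  # avoid division by zero on constant rasters
    return [[min(1.0, max(0.0, (v - lo) / span)) for v in row] for row in grid]
```

A real band would come from a decoded tile rather than a nested list; the stride keeps the preview cheap at the cost of some aliasing.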

[VizX Reference →](./vizx)

---

### PMTiles

Container format for serving raster (PNG / JPEG / WebP) or vector (MVT) tile pyramids from a single static file via HTTP range requests. Native Scala v3 encoder — no GDAL/OGR dependency.
@@ -76,13 +88,13 @@ Container format for serving raster (PNG / JPEG / WebP) or vector (MVT) tile pyr

## Package Comparison

| Feature | RasterX | GridX | VectorX | PMTiles |
|---------|---------|-------|---------|---------|
| **Primary Use** | Raster processing | Discrete global grids | Vector encoding + legacy | Tile pyramid packaging |
| **Product Gap** | Full gap-filling | Specialized grids (BNG, quadbin) | Vector-tile encoding, legacy migration | Net-new |
| **GDAL Required** | Yes | No | Yes (readers + MVT) | No |
| **Output Format** | Tile (struct) + arrays | Cell IDs (Long / String) + WKB | BINARY (MVT bytes), WKB | BINARY (PMTile blob) or file |
| **Spark Surface** | 65+ SQL functions | 30+ SQL functions | 6+ SQL functions + DataSources | 1 UDAF + 1 DataSource |
| Feature | RasterX | GridX | VectorX | PMTiles | VizX |
|---------|---------|-------|---------|---------|------|
| **Primary Use** | Raster processing | Discrete global grids | Vector encoding + legacy | Tile pyramid packaging | Notebook visualization |
| **Product Gap** | Full gap-filling | Specialized grids (BNG, quadbin) | Vector-tile encoding, legacy migration | Net-new | Supporting layer |
| **GDAL Required** | Yes | No | Yes (readers + MVT) | No | No |
| **Output Format** | Tile (struct) + arrays | Cell IDs (Long / String) + WKB | BINARY (MVT bytes), WKB | BINARY (PMTile blob) or file | Matplotlib figure / GeoDataFrame |
| **Spark Surface** | 65+ SQL functions | 30+ SQL functions | 6+ SQL functions + DataSources | 1 UDAF + 1 DataSource | Python only (no SQL) |

## Choosing the Right Package

@@ -94,6 +106,8 @@ Container format for serving raster (PNG / JPEG / WebP) or vector (MVT) tile pyr

**Use PMTiles when:** publishing a tile pyramid (raster or vector) as a single static file; serving from S3/ABFS/GCS without a tile server; aggregating `(z, x, y, bytes)` rows into a deployable map.

**Use VizX when:** inspecting GeoBrix raster tiles or files in a notebook (single-band, RGB, coverage-depth, or mask-layer composites); turning Spark geometry / H3-cell / gridspec DataFrames into GeoPandas for `.plot()` / `.explore()` maps. VizX is a Python-only supporting layer for visualization, not a spatial-processing package.

## Function Naming Convention

All GeoBrix SQL functions use the `gbx_` prefix:
4 changes: 2 additions & 2 deletions docs/docs/api/raster-functions.mdx
@@ -785,7 +785,7 @@ GROUP BY region_id
Streaming aggregator that burns H3 cell centroid pixels (or spatial-envelope pixels) into one raster tile per group. This is the **inverse** of [`rst_h3_rastertogrid*`](#rst_h3_rastertogridavg): where those functions reduce raster pixels to per-cell statistics, `rst_h3_rasterize_agg` reconstructs a raster from per-cell values. Use [`rst_frombands_agg`](#rst_frombands_agg) to stack per-threshold rasters (each produced by one `rst_h3_rasterize_agg` call) into a single multi-band output.

:::tip Worked example
The [H3 Rasterize notebook](../notebooks/h3-rasterize) walks this through end to end on a San Francisco Bay Area DEM: elevation isobands → H3 polyfill → a shared canvas from [`rst_h3_gridspec`](#h3-grid) → per-band `rst_h3_rasterize_agg` → multi-band stack via [`rst_frombands_agg`](#rst_frombands_agg), visualized with the [`gbx.viz`](./viz) helpers. The same pattern maps directly to a telco multi-threshold signal-coverage stack.
The [H3 Rasterize notebook](../notebooks/h3-rasterize) walks this through end to end on a San Francisco Bay Area DEM: elevation isobands → H3 polyfill → a shared canvas from [`rst_h3_gridspec`](#h3-grid) → per-band `rst_h3_rasterize_agg` → multi-band stack via [`rst_frombands_agg`](#rst_frombands_agg), visualized with the [`gbx.vizx`](./vizx) helpers. The same pattern maps directly to a telco multi-threshold signal-coverage stack.
:::

**Signature:** `rst_h3_rasterize_agg(cellid: Column, value: Column, srid: Column, pixel_size: Column, xmin: Column, ymin: Column, xmax: Column, ymax: Column, width: Column, height: Column, mode: Column, kring_pad: Column): Column`
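The canvas arguments in this signature (`xmin`, `ymin`, `xmax`, `ymax`, `pixel_size`, `width`, `height`) describe a regular grid. As a sketch of how a cell centroid could land on a pixel, assuming a north-up canvas whose row 0 is the `ymax` edge and centroids already in the canvas SRID, the usual geotransform arithmetic is below. `centroid_to_pixel` is a hypothetical helper, not part of GeoBrix, and it ignores `mode` and `kring_pad`.

```python
import math


def centroid_to_pixel(x, y, xmin, ymax, pixel_size, width, height):
    """Return (col, row) for a point on a north-up canvas, or None if it falls outside."""
    col = math.floor((x - xmin) / pixel_size)  # columns grow eastward from xmin
    row = math.floor((ymax - y) / pixel_size)  # rows grow southward from ymax
    if 0 <= col < width and 0 <= row < height:
        return col, row
    return None
```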
@@ -2038,7 +2038,7 @@ df.select(

Apply your own function to each tile's open rasterio dataset, returning one scalar per row. `fn` receives a rasterio `DatasetReader`; the return value must match `returnType` (default `DoubleType()`; any Spark `DataType`). A null/empty tile yields null. This is the "GeoBrix doesn't have function X — run my own rasterio per tile" path; it returns a scalar (raster→raster transforms are the domain of `rst_mapalgebra` / `rst_derivedband`).

For rendering tiles and building maps from results, see the [Visualization (`gbx.viz`)](./viz) page.
For rendering tiles and building maps from results, see the [Visualization (`gbx.vizx`)](./vizx) page.

---

91 changes: 75 additions & 16 deletions docs/docs/api/viz.mdx → docs/docs/api/vizx.mdx
@@ -1,34 +1,36 @@
---
sidebar_position: 11
title: Visualization (gbx.viz)
title: VizX Function Reference
---

# Visualization (`gbx.viz`)
import VizXIcon from '../../../resources/images/VizX.png';

`databricks.labs.gbx.viz` renders rasters and turns Spark DataFrames into GeoDataFrames for interactive maps. It is **tier-agnostic** — the raster plotters take tile bytes (use them with `pyrx` *or* `rasterx` tiles), and the vector adapters take any Spark DataFrame.
# <img src={VizXIcon} alt="VizX" style={{height: '3em', width: 'auto', verticalAlign: 'middle', marginRight: '0.5rem'}} /> Function Reference

`databricks.labs.gbx.vizx` renders rasters and turns Spark DataFrames into GeoDataFrames for interactive maps. It is **tier-agnostic** — the raster plotters take tile bytes (use them with `pyrx` *or* `rasterx` tiles), and the vector adapters take any Spark DataFrame.

These are driver-side, single-node helpers for inspecting results in a notebook — not distributed operations. They collect to the driver, so the vector adapters guard the collect with `max_rows`.
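A collect guarded by `max_rows` can be sketched in plain Python. The page doesn't say whether the adapters raise or truncate when the limit is exceeded; this sketch assumes they raise, and `guarded_collect` is a hypothetical name. On a Spark DataFrame the same idea is `df.limit(max_rows + 1).collect()`, which brings at most one extra row to the driver.

```python
from itertools import islice


def guarded_collect(rows, max_rows=10_000):
    """Materialize at most max_rows items; raise instead of silently truncating (assumed behavior)."""
    taken = list(islice(rows, max_rows + 1))  # one extra row is enough to detect overflow
    if len(taken) > max_rows:
        raise ValueError(f"more than {max_rows} rows; filter the DataFrame or raise max_rows")
    return taken
```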

:::note Opt-in extra
`gbx.viz` requires `geobrix[viz]`, which pulls in `matplotlib`, `geopandas`, `folium`, and `mapclassify`. The package itself imports only `matplotlib` (raster rendering) and `geopandas` (the GeoDataFrame adapters); `folium` + `mapclassify` are used by `GeoDataFrame.explore()` for interactive maps.
`gbx.vizx` requires `geobrix[vizx]`, which pulls in `matplotlib`, `geopandas`, `folium`, and `mapclassify`. The package itself imports only `matplotlib` (raster rendering) and `geopandas` (the GeoDataFrame adapters); `folium` + `mapclassify` are used by `GeoDataFrame.explore()` for interactive maps.
:::

## Installation

```bash
pip install "geobrix[viz]"
pip install "geobrix[vizx]"
```

From a Databricks notebook (commonly alongside the lightweight tier):

```python
%pip install --quiet "geobrix[light,viz] @ file:///Volumes/<catalog>/<schema>/<volume>/geobrix-0.4.0-py3-none-any.whl"
%pip install --quiet "geobrix[light,vizx] @ file:///Volumes/<catalog>/<schema>/<volume>/geobrix-0.4.0-py3-none-any.whl"
```

## Import

```python
from databricks.labs.gbx.viz import (
from databricks.labs.gbx.vizx import (
plot_raster,
plot_file,
plot_mask_layers,
@@ -60,7 +62,7 @@ plot_raster(raster_bytes, *, fig_w=10, fig_h=10, max_pixels=2000, composite="aut
Render a raster from its in-memory bytes — e.g. a tile's `raster` field collected from a GeoBrix DataFrame:

```python
from databricks.labs.gbx.viz import plot_raster
from databricks.labs.gbx.vizx import plot_raster

row = df.select("tile").first()
plot_raster(row["tile"]["raster"])
@@ -75,10 +77,10 @@ plot_raster(row["tile"]["raster"], composite="depth")
plot_file(path, *, fig_w=10, fig_h=10, max_pixels=2000, composite="auto")
```

Render a raster from disk (GeoTIFF, VRT, …) with the same decimation + stretch pipeline:
Render a raster from disk (GeoTIFF, VRT, …) with the same decimation + stretch pipeline. A leading `dbfs:` or `file:` scheme is stripped automatically, so scheme-qualified Databricks paths (`dbfs:/Volumes/…`, `file:///Volumes/…`) work as well as the bare FUSE path:

```python
from databricks.labs.gbx.viz import plot_file
from databricks.labs.gbx.vizx import plot_file

plot_file("/Volumes/main/geobrix_samples/geobrix-examples/nyc/dem.tif")

@@ -108,7 +110,7 @@ Overlay several single-band presence-mask tiles on one axes, each drawn as a sol
All tiles must share the same grid and extent — for example, produced on a shared canvas via [`rst_h3_gridspec`](./raster-functions#h3-grid). Layers are drawn in order: pass the largest footprint first and the smallest last so nested coverage remains visible.
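If the layers arrive in arbitrary order, they can be sorted by footprint before plotting. A minimal sketch, assuming each mask has already been decoded into a 2-D grid of 0/1 values; `by_footprint` is a hypothetical helper, not a `gbx.vizx` function.

```python
def by_footprint(layers):
    """Sort (label, mask) pairs largest-footprint first; mask is a 2-D grid of 0/1 values."""
    def footprint(item):
        _, mask = item
        return sum(1 for row in mask for v in row if v)  # count "present" pixels

    return sorted(layers, key=footprint, reverse=True)
```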

```python
from databricks.labs.gbx.viz import plot_mask_layers
from databricks.labs.gbx.vizx import plot_mask_layers

# Each raster is a single-band H3 presence mask for a different threshold.
# Tiles share the same canvas (same gridspec extent).
@@ -137,10 +139,10 @@ as_gdf(df, wkt_col="wkt", *, max_rows=10_000)
Convert a Spark DataFrame with a WKT geometry column into a `geopandas.GeoDataFrame`. Non-geometry columns are preserved; the WKT column is replaced by the geometry:

```python
from databricks.labs.gbx.viz import as_gdf
from databricks.labs.gbx.vizx import as_gdf

gdf = as_gdf(df_with_wkt, wkt_col="wkt")
gdf.explore() # interactive folium map (needs the [viz] extra)
gdf.explore() # interactive folium map (needs the [vizx] extra)
```

### `cells_as_gdf`
@@ -164,11 +166,11 @@ Convert a DataFrame of H3 cell ids into a `GeoDataFrame` of cell-boundary polygo
Without `dissolve_by`, each row represents one cell (useful for per-cell tooltips in `.explore()`). With `dissolve_by`, rows are merged per group into a single union footprint — far fewer geometries for large cell sets:

```python
from databricks.labs.gbx.viz import cells_as_gdf
from databricks.labs.gbx.vizx import cells_as_gdf

# Per-cell choropleth:
gdf = cells_as_gdf(df_cells, cell_col="cellid", extra_cols=["count"])
gdf.explore(column="count") # choropleth (needs mapclassify, bundled in [viz])
gdf.explore(column="count") # choropleth (needs mapclassify, bundled in [vizx])

# Dissolve to one footprint polygon per category:
gdf_dissolved = cells_as_gdf(
@@ -198,7 +200,7 @@ Convert a grid spec — as returned by [`rst_h3_gridspec`](./raster-functions#h3
Optional metadata fields `pixel_size`, `width`, and `height` are carried through if present on the input.
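The conversion amounts to turning an extent into a closed rectangle and copying any optional fields. A sketch, assuming the spec exposes `xmin`, `ymin`, `xmax`, and `ymax` (the same names `rst_h3_rasterize_agg` takes); `gridspec_to_record` is a hypothetical helper that produces a WKT rectangle rather than a GeoDataFrame.

```python
def gridspec_to_record(spec):
    """Turn an extent mapping into a WKT rectangle plus any optional metadata present."""
    xmin, ymin, xmax, ymax = (spec[k] for k in ("xmin", "ymin", "xmax", "ymax"))
    ring = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]  # closed ring
    record = {"wkt": "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"}
    for key in ("pixel_size", "width", "height"):
        if key in spec:  # carried through only when present
            record[key] = spec[key]
    return record
```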

```python
from databricks.labs.gbx.viz import cells_as_gdf, grid_as_gdf
from databricks.labs.gbx.vizx import cells_as_gdf, grid_as_gdf

# grid_row is the 'grid' field from an rst_h3_gridspec result
grid_gdf = grid_as_gdf(grid_row)
@@ -211,6 +213,63 @@ grid_gdf.explore(m=m, color="black", style_kwds={"fill": False})

See the [H3 rasterize notebook](../notebooks/h3-rasterize) for a worked example pairing `grid_as_gdf` with `cells_as_gdf` to build a shared-canvas overlay.

## Static maps

`plot_static` renders Spark- or GeoPandas-derived geometries (or H3 cells) over
a basemap as a **static** matplotlib figure — the GitHub-renderable counterpart
to `GeoDataFrame.explore()` (whose Leaflet/folium output renders a blank
*"Make this Notebook Trusted"* placeholder on GitHub and the docs site).

The basemap is fetched from a web tile server (via `contextily`) **at execution
time** and rasterized into the figure, so it bakes into the committed notebook
output PNG — GitHub then displays it with no network. If the executing
environment has no egress, the map renders without a basemap and a warning is
emitted (never a hard error).
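The degrade-to-warning behaviour can be sketched as a wrapper around whatever fetches the tiles. This illustrates the pattern only, not `plot_static`'s code: `fetch_tiles` stands in for the real `contextily` call, and `add_basemap_or_warn` is a hypothetical name.

```python
import warnings


def add_basemap_or_warn(fetch_tiles):
    """Call a basemap fetcher; on any failure, warn and continue without a basemap."""
    try:
        return fetch_tiles()
    except Exception as exc:  # e.g. no network egress from the cluster
        warnings.warn(f"basemap unavailable, rendering without it: {exc}", RuntimeWarning)
        return None
```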

### `plot_static`

```python
plot_static(
data, *, geom_col=None, grid_system=None, column=None, cmap="viridis",
legend=True, basemap=True, basemap_source=None, alpha=0.8, edgecolor="face",
fill=True, markersize=None, title=None, fig_w=10, fig_h=10, max_rows=10_000,
srid=None, ax=None,
)
```

`data` is a Spark DataFrame **or** a `geopandas.GeoDataFrame`. Returns the
matplotlib `Axes`; pass it back via `ax=` to overlay layers on one map. Every
layer is reprojected to Web Mercator (EPSG:3857), so a `basemap=False` overlay
lines up with a basemap layer on the same axes. `plot_static` does **not** call
`pyplot.show()` — in a notebook the figure auto-displays at cell end with all
overlaid layers; a script can call `plt.show()` itself. Pass `fill=False` to
draw geometries as outlines only (no face), so a boundary doesn't cover the
layers beneath it.
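Reprojecting every layer to one CRS is what makes the overlays line up. In practice a library call such as GeoPandas' `GeoDataFrame.to_crs(epsg=3857)` does this; the forward spherical Mercator formula below is shown only to make the target CRS concrete, and `lonlat_to_web_mercator` is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS 84 semi-major axis, used as the sphere radius by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Forward EPSG:3857 projection of a WGS 84 lon/lat given in degrees."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```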

**Geometry columns** accept the same encodings as every other `gbx_st_*`
function — WKT, EWKT, WKB, EWKB, and native `GEOMETRY` / `GEOGRAPHY` (coerced
in-Spark via `st_asbinary`). Set **`grid_system`** to treat the column as DGGS
cell ids instead:

| `grid_system` | Behaviour |
|---|---|
| `None` (default) | Column is a geometry encoding (WKT/EWKT/WKB/EWKB/`GEOMETRY`/`GEOGRAPHY`). |
| `'h3'` | Column holds H3 cell ids (string index **or** bigint); rendered as cell-boundary polygons. |
| `'quadbin'`, `'bng'`, `'custom'` | Planned — currently raise `NotImplementedError`. |
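To illustrate the distinctions the table relies on, the sketch below classifies a single column value. It assumes the standard encodings: EWKT carries an `SRID=4326;`-style prefix, WKB starts with a byte-order byte (0 big-endian, 1 little-endian) followed by a 32-bit geometry type, and EWKB sets the PostGIS SRID flag `0x20000000` in that type. H3 ids are normalized by treating the string form as the hexadecimal spelling of the 64-bit integer. Native `GEOMETRY` / `GEOGRAPHY` columns are not covered, since those are coerced in Spark first. `classify_geometry` and `h3_to_string` are hypothetical helpers, not `gbx.vizx` APIs.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # PostGIS extension bit: an SRID follows the type word


def classify_geometry(value):
    """Guess which encoding a WKT/EWKT string or WKB/EWKB byte value uses."""
    if isinstance(value, str):
        return "EWKT" if value.upper().startswith("SRID=") else "WKT"
    data = bytes(value)
    byte_order = "<" if data[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack(byte_order + "I", data[1:5])
    return "EWKB" if geom_type & EWKB_SRID_FLAG else "WKB"


def h3_to_string(cell):
    """Normalize an H3 id given as a bigint or a hex string to lowercase hex."""
    if isinstance(cell, int):
        return format(cell, "x")
    return format(int(cell, 16), "x")  # round-tripping also normalizes letter case
```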

```python
from databricks.labs.gbx.vizx import plot_static

# H3 choropleth over a basemap, then overlay the shared-canvas boundary as a
# red outline (fill=False so it doesn't cover the cells; basemap=False so it
# doesn't re-fetch tiles). Both layers reproject to 3857, so they align.
ax = plot_static(cells_df, grid_system="h3", column="count", title="Coverage")
plot_static(grid_gdf, ax=ax, fill=False, edgecolor="red", basemap=False)
```

`basemap_source` overrides the default `contextily.providers.CartoDB.Positron`;
`basemap=False` skips tiles entirely (deterministic, no network).

## Escape hatches

When you need raster math that isn't in the `rst_*` surface, drop down to NumPy / rasterio per tile with the [pyrx escape hatches](./raster-functions#escape-hatches) (`tile_to_numpy`, `rst_apply`).