NodeDB's array engine stores multi-dimensional data with bitemporal support — system time (when data was entered) and valid time (what the data represents). Use it for scientific computing, geospatial grids, medical imaging, and time-evolving spatial data. Cells are compressed, indexed by Z-order curves, and queryable via SQL table-valued functions.
Typical use cases:
- Scientific simulations and climate models
- Medical imaging and volumetric analysis
- GIS raster data (elevation maps, satellite imagery)
- Time-evolving geospatial grids
- Sparse multi-dimensional datasets
- Hypervolume analysis with bitemporal audit trails
Key features:
- Multi-dimensional storage — Define arrays with arbitrary dimensions (e.g., 3D space + time)
- Tile-based compression — Cells grouped into tiles, each tile independently compressed (ALP, FastLanes, Gorilla, LZ4)
- Z-order indexing — Hilbert/Z-order curve linearization for spatial locality and fast range queries
- Bitemporal support — Both system time (audit trail) and valid time (temporal semantics) tracked per tile
- Row-major or column-major layout — Choose CELL_ORDER to match your access patterns
- Cross-engine identity — Cells linked via surrogate bitmaps to vector, graph, document, and columnar queries
- Distributed execution — Sharded by tile, queries scatter-gather across cores/nodes
- Tile-level retention — Purge old versions by system time for compliance (GDPR, data minimization)
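To make the Z-order indexing feature concrete, here is a minimal Python sketch of Morton (Z-order) encoding, which interleaves the bits of each coordinate into a single sortable key. It illustrates the general technique only; the function name `morton_encode` and the `bits` parameter are hypothetical and are not part of NodeDB.

```python
def morton_encode(coords, bits=10):
    """Interleave the low `bits` bits of each coordinate into one integer key.

    Hypothetical helper for illustration; not a NodeDB API.
    """
    key = 0
    ndim = len(coords)
    for bit in range(bits):
        for axis, value in enumerate(coords):
            # Bit `bit` of axis `axis` lands at position bit * ndim + axis.
            key |= ((value >> bit) & 1) << (bit * ndim + axis)
    return key


# Cells that are close in space tend to receive close keys,
# which is what gives range scans their locality.
keys = {(x, y): morton_encode((x, y), bits=2) for x in range(4) for y in range(4)}
```

Sorting cells by such a key keeps spatial neighbours near each other on disk more often than a plain row-major ordering does.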
Array schemas are defined by dimensions (axes), attributes (stored values), and tile extents (chunk size):
CREATE ARRAY spatial_grid
DIMS (
x INT64 DOMAIN [0, 1000),
y INT64 DOMAIN [0, 1000),
z INT64 DOMAIN [0, 1000)
)
ATTRS (
temperature FLOAT32,
pressure FLOAT32,
humidity FLOAT32
)
TILE_EXTENTS (64, 64, 64)
WITH (
cell_order = 'Z-ORDER',
audit_retain_ms = 86400000
);

| Parameter | Required | Default | Description |
|---|---|---|---|
| DIMS | Yes | — | List of dimensions. Each has a name, type (INT64, INT32, FLOAT64), and domain bounds [lo, hi). |
| ATTRS | Yes | — | List of attributes (cell values). Each has a name and type (FLOAT32, FLOAT64, INT32, INT64, STRING). |
| TILE_EXTENTS | Yes | — | Tuple of tile extents, one per dimension. All > 0. Determines cell locality and compression block granularity. |
| cell_order | No | 'Z-ORDER' | 'Z-ORDER' (Hilbert curve) or 'ROW-MAJOR'. Affects spatial cache locality. |
| audit_retain_ms | No | NULL | Milliseconds. Tiles older than now - audit_retain_ms (by system time) are eligible for purge. NULL = keep all. |

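As a rough illustration of how TILE_EXTENTS partitions a domain, the following Python sketch computes the tile grid size and the tile that a given cell falls into. It assumes tiles are aligned to each dimension's lower bound, which the text above does not state; `tiles_per_dim` and `tile_of` are hypothetical helper names.

```python
import math


def tiles_per_dim(domains, extents):
    """Number of tiles along each dimension for half-open domains [lo, hi)."""
    return tuple(math.ceil((hi - lo) / e) for (lo, hi), e in zip(domains, extents))


def tile_of(cell, domains, extents):
    """Tile coordinate containing `cell` (assumes tiles start at each domain's lo)."""
    return tuple((c - lo) // e for c, (lo, _hi), e in zip(cell, domains, extents))


# The spatial_grid schema: three dimensions over [0, 1000), 64-cell tiles.
domains = [(0, 1000), (0, 1000), (0, 1000)]
extents = (64, 64, 64)

grid = tiles_per_dim(domains, extents)            # (16, 16, 16)
tile = tile_of((130, 5, 999), domains, extents)   # (2, 0, 15)
```

Under this assumption the 1000 x 1000 x 1000 grid splits into 16 x 16 x 16 = 4096 tiles, with the last tile along each axis only partly filled.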
-- Create a 2D elevation map with tiles of 256x256 cells
CREATE ARRAY elevation_map
DIMS (
lon FLOAT64 DOMAIN [-180, 180),
lat FLOAT64 DOMAIN [-90, 90)
)
ATTRS (
height FLOAT32
)
TILE_EXTENTS (256, 256);
-- Insert cells (or rows of cells)
-- Each row lists the dimension coordinates (lon, lat) followed by the attribute value (height)
INSERT INTO elevation_map (lon, lat, height) VALUES
(-73.5, 40.7, 10.5),
(-73.6, 40.8, 12.3),
(-73.7, 40.6, 8.9);
-- Flush memtable to persistent storage
SELECT NDARRAY_FLUSH('elevation_map');

-- Create a 3D climate model with temporal tracking
CREATE ARRAY climate_forecast
DIMS (
lon INT32 DOMAIN [-180, 180),
lat INT32 DOMAIN [-90, 90),
altitude_m INT32 DOMAIN [0, 50000)
)
ATTRS (
temp_c FLOAT32,
humidity FLOAT32
)
TILE_EXTENTS (32, 32, 20)
WITH (audit_retain_ms = 7776000000); -- 90 days
-- Insert forecast data at a specific valid time
INSERT INTO climate_forecast (lon, lat, altitude_m, temp_c, humidity) VALUES
(10, 20, 5000, -10.5, 65.0),
(10, 20, 10000, -20.3, 40.0);
-- Query at a specific moment in time (system time)
SELECT lon, lat, altitude_m, temp_c
FROM NDARRAY_SLICE('climate_forecast', {lon: [10, 15), lat: [20, 25), altitude_m: [5000, 15000)}, ['temp_c'])
AS OF SYSTEM TIME 1700000000000;

All array queries use table-valued functions in the FROM clause. System time and valid time can be specified via AS OF clauses.
NDARRAY_SLICE returns cells within a multi-dimensional range:
SELECT * FROM NDARRAY_SLICE(
'elevation_map',
{lon: [-74.0, -73.0), lat: [40.0, 41.0)},
['height'], -- projecting only 'height' (optional)
1000 -- limit to 1000 cells (optional)
);

| Parameter | Required | Type | Description |
|---|---|---|---|
| array | Yes | STRING | Array name |
| bounds | Yes | OBJECT | Dict of dim name → [lo, hi) bounds. Omitted dims = full range |
| attrs | No | ARRAY[STRING] | Attributes to project. NULL = all attributes |
| limit | No | INT64 | Max cells returned. NULL = no limit |

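The parameter semantics in the table (half-open bounds, omitted dimensions meaning full range, optional projection and limit) can be sketched in plain Python. This is an in-memory model for illustration, not how the engine evaluates a slice; `slice_cells` is a hypothetical name, and the sample cells are the three rows inserted into elevation_map earlier.

```python
def slice_cells(cells, dims, bounds, attrs=None, limit=None):
    """Filter cells (dicts) by half-open [lo, hi) bounds per dimension.

    Hypothetical model of the slice semantics; not a NodeDB API.
    """
    out = []
    for cell in cells:
        # A dimension missing from `bounds` is unconstrained (full range).
        if not all(lo <= cell[d] < hi for d, (lo, hi) in bounds.items()):
            continue
        # Dimensions are always kept; attrs=None keeps every attribute.
        out.append({k: v for k, v in cell.items()
                    if k in dims or attrs is None or k in attrs})
        if limit is not None and len(out) >= limit:
            break
    return out


cells = [
    {"lon": -73.5, "lat": 40.7, "height": 10.5},
    {"lon": -73.6, "lat": 40.8, "height": 12.3},
    {"lon": -73.7, "lat": 40.6, "height": 8.9},
]
dims = ["lon", "lat"]

north = slice_cells(cells, dims, {"lat": (40.7, 41.0)})          # lon omitted: full range
south = slice_cells(cells, dims, {"lat": (40.0, 40.7)}, ["height"])
first = slice_cells(cells, dims, {}, limit=1)
```

Note that lat = 40.7 belongs to `north` but not to `south`, because the lower bound is inclusive and the upper bound is exclusive.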
NDARRAY_PROJECT returns all cells, optionally filtered to specific attributes:
SELECT * FROM NDARRAY_PROJECT(
'spatial_grid',
['temperature', 'pressure'] -- only these attributes
);

NDARRAY_AGG aggregates an attribute over a dimension, reducing dimensionality:
-- Sum temperature over the x dimension, keeping y and z
SELECT * FROM NDARRAY_AGG(
'spatial_grid',
'temperature',
'SUM',
'x' -- aggregate over x; result has one less dimension
);

Supported reducers: 'SUM', 'AVG', 'MIN', 'MAX', 'COUNT'.
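To show what reducing over one dimension means, here is a small Python model that groups cells by the remaining dimensions and applies a reducer. It assumes that only materialized cells take part in the aggregate (empty cells are skipped rather than counted as zeros); `reduce_over` and `REDUCERS` are hypothetical names.

```python
from collections import defaultdict

# The five reducers named above, modelled as plain Python callables.
REDUCERS = {
    "SUM": sum,
    "AVG": lambda values: sum(values) / len(values),
    "MIN": min,
    "MAX": max,
    "COUNT": len,
}


def reduce_over(cells, dims, attr, reducer, over):
    """Collapse dimension `over`; keys of the result are the remaining dims."""
    kept = [d for d in dims if d != over]
    groups = defaultdict(list)
    for cell in cells:
        groups[tuple(cell[d] for d in kept)].append(cell[attr])
    fn = REDUCERS[reducer]
    return {key: fn(values) for key, values in groups.items()}


cells = [
    {"x": 0, "y": 0, "z": 0, "temperature": 1.0},
    {"x": 1, "y": 0, "z": 0, "temperature": 2.0},
    {"x": 0, "y": 1, "z": 0, "temperature": 4.0},
]
# Sum over x, keeping (y, z) as the result coordinates.
summed = reduce_over(cells, ["x", "y", "z"], "temperature", "SUM", "x")
```

Here `summed` maps (y, z) = (0, 0) to 3.0 and (1, 0) to 4.0: a 3D input becomes a 2D result.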
NDARRAY_ELEMENTWISE applies an operation between two arrays (or an array and a scalar) with the same shape:
-- Subtract a baseline from all cells
SELECT * FROM NDARRAY_ELEMENTWISE(
'current_grid',
'baseline_grid',
'SUBTRACT',
'temperature' -- the attribute to operate on
);

NDARRAY_FLUSH forces in-memory cells to durable storage:
SELECT NDARRAY_FLUSH('spatial_grid') AS result;
-- Returns: {result: true}

Returns a single row {result: BOOL}. The value is always true on success; a failure is fatal and raises an error instead of returning false.
NDARRAY_COMPACT merges tile versions and reclaims space:
SELECT NDARRAY_COMPACT('spatial_grid') AS result;
-- Returns: {result: true}

Compaction runs automatically in the background, but can also be triggered manually.
Arrays support dual timestamping:
- System Time — When the cell value was written (audit trail, compliance)
- Valid Time — The real-world time the cell value applies to (temporal semantics, forecasts, corrections)
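The interplay of the two timestamps can be modelled with a short Python sketch. It assumes a common bitemporal convention that this page does not spell out: each version carries one system timestamp and a half-open valid interval, and a read returns the most recently written version that passes both filters. The `Version` class and `read_as_of` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Version:
    system_ms: int      # when this version was written
    valid_from_ms: int  # start of the period the value describes
    valid_to_ms: int    # end of that period (exclusive)
    value: float


def read_as_of(versions, system_ms=None, valid_ms=None):
    """Return the latest-written version visible under the given filters."""
    candidates = [
        v for v in versions
        if (system_ms is None or v.system_ms <= system_ms)
        and (valid_ms is None or v.valid_from_ms <= valid_ms < v.valid_to_ms)
    ]
    return max(candidates, key=lambda v: v.system_ms, default=None)


# One cell, written at t=100 and corrected at t=200 for the same valid period.
history = [
    Version(system_ms=100, valid_from_ms=0, valid_to_ms=1000, value=20.0),
    Version(system_ms=200, valid_from_ms=0, valid_to_ms=1000, value=21.5),
]

before_fix = read_as_of(history, system_ms=150)
current = read_as_of(history, system_ms=250, valid_ms=500)
```

In this model `before_fix.value` is 20.0 and `current.value` is 21.5: the correction never overwrites the earlier version, it only becomes the preferred answer for later system times.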
Query as of either or both:
-- Read cells as they existed at a point in the past
SELECT * FROM NDARRAY_SLICE(
'data',
{x: [0, 100), y: [0, 100)},
['value']
)
AS OF SYSTEM TIME 1700000000000;
-- Read cells that were valid at a specific time
SELECT * FROM NDARRAY_SLICE(
'forecast',
{x: [0, 100), y: [0, 100)},
['temp']
)
AS OF VALID TIME 1700000000000;
-- Read cells that were valid AND existed at a specific time
SELECT * FROM NDARRAY_SLICE(
'forecast',
{x: [0, 100), y: [0, 100)},
['temp']
)
AS OF SYSTEM TIME 1700000000000 AS OF VALID TIME 1700000001000;

Array cells are addressable via surrogate identity alongside other engines. Combine array queries with vector search, graph traversal, and full-text search:
-- Find cells near a vector embedding, return array slice
SELECT *
FROM NDARRAY_SLICE('spatial_data', slice_bounds, ['attr1', 'attr2'])
WHERE id IN (
SEARCH vectors USING VECTOR(embedding, query_vec, 100)
);

See Architecture — Cross-engine identity for details.
- Tile-level parallelism — Each tile is read/processed in parallel on separate cores
- Compression — Typical 5-20x compression depending on data homogeneity
- Range queries — Z-order indexing provides cache-friendly access; skip irrelevant tiles via block statistics
- Sparse data — Only materialized cells stored; implicit zeros and empty regions not persisted
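The tile-skipping idea behind range queries can be illustrated geometrically: only tiles whose coordinate range overlaps the query box need to be read. The Python sketch below covers integer dimensions and assumes tiles are aligned to each dimension's origin; it does not model the block statistics mentioned above. `tiles_for_range` is a hypothetical name.

```python
from itertools import product


def tiles_for_range(bounds, origins, extents):
    """Tile coordinates overlapping half-open integer bounds [lo, hi) per dimension."""
    per_dim = []
    for (lo, hi), origin, extent in zip(bounds, origins, extents):
        first = (lo - origin) // extent
        last = (hi - 1 - origin) // extent  # hi is exclusive
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


# A 1000 x 1000 grid with 64 x 64 tiles has 16 * 16 = 256 tiles in total.
touched = tiles_for_range(
    bounds=[(0, 100), (0, 100)],
    origins=(0, 0),
    extents=(64, 64),
)
```

For the query x: [0, 100), y: [0, 100) only 4 of the 256 tiles overlap the box, so the remaining 252 never need to be decompressed.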
System time–based retention enables GDPR and data minimization compliance:
ALTER NDARRAY spatial_grid SET (audit_retain_ms = 86400000); -- keep 1 day
-- Tiles older than now - 1 day are candidates for purge
-- Purge is automatic during compaction

Purged tiles are irreversibly removed; historical queries beyond the retention window will see gaps.
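The retention rule (tiles older than now - audit_retain_ms by system time are eligible for purge, NULL keeps everything) amounts to a simple cutoff comparison. The Python sketch below models it; treating a tile exactly at the cutoff as retained is an assumption, and `purge_candidates` is a hypothetical name.

```python
def purge_candidates(tile_system_times_ms, now_ms, audit_retain_ms):
    """System timestamps of tiles that are strictly older than the cutoff."""
    if audit_retain_ms is None:  # NULL = keep all
        return []
    cutoff = now_ms - audit_retain_ms
    return [t for t in tile_system_times_ms if t < cutoff]


ONE_DAY_MS = 24 * 60 * 60 * 1000  # 86400000, the value used in the ALTER example

now = 1_700_000_000_000
tiles = [now - 2 * ONE_DAY_MS, now - ONE_DAY_MS, now - 1000]

eligible = purge_candidates(tiles, now, ONE_DAY_MS)
kept_forever = purge_candidates(tiles, now, None)
```

With a one-day window only the two-day-old tile is eligible. The 90-day setting used for climate_forecast is the same arithmetic: 90 * 86400000 = 7776000000 ms.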
- Bitemporal — Cross-engine bitemporal architecture
- Architecture — Cross-engine identity — Surrogate bitmap linking
- Columnar — Related structured analytics engine