The foundation format of K-Series. GeoParquet + H3 spatial sorting + K-Series metadata, in one open file format that travels between big-data pipelines and the analyst's laptop.
K1 is an open geo-data file format for storing location
intelligence — device pings, GPS tracks, POI events, anything
with a latitude and a longitude. Files have the extension .k1
and are valid Parquet, so every tool that reads Parquet reads
K1 already.
What K1 adds on top of plain Parquet:
- H3 spatial index baked into every row.
- Global sort by H3 cell, plus `sorting_columns` Parquet metadata so readers can skip whole row groups on spatial filters.
- K-Series metadata footer carrying version, region, author, source schema, and creation time.
- GeoParquet 1.0 compatibility — geopandas / DuckDB spatial / Sedona read the file as a GeoDataFrame natively.
- Streaming writer with external merge sort — works on inputs too large to fit in RAM. Single 20 GB / 200 GB Parquet dumps are processed chunk by chunk; many small files are processed in parallel.
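
The external merge sort behind the streaming writer can be pictured with a standard-library sketch. This is not K1's implementation (the project describes running the sort through DuckDB); it only illustrates the technique: sort bounded chunks, spill each one to a temporary file as a sorted run, then k-way merge the runs. The function name `external_sort` and the `chunk_size` parameter are hypothetical.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(values, chunk_size=1000):
    """Yield strings from `values` in ascending order using bounded memory.

    Illustrative only: a hypothetical helper, not part of the k1 package.
    Assumes values are plain hex-like strings without newlines (as H3 cell IDs are).
    """
    run_paths = []
    it = iter(values)
    try:
        # Pass 1: sort one chunk at a time and spill it to disk as a sorted run.
        while True:
            chunk = sorted(islice(it, chunk_size))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.writelines(v + "\n" for v in chunk)
            run_paths.append(path)

        # Pass 2: k-way merge of the runs; only one line per run is held in memory.
        files = [open(p, encoding="utf-8") for p in run_paths]
        try:
            for line in heapq.merge(*files):
                yield line.rstrip("\n")
        finally:
            for f in files:
                f.close()
    finally:
        # Always remove the temporary run files.
        for p in run_paths:
            os.remove(p)


# Usage: four placeholder cell IDs sorted with room for only two values at a time.
cells = ["8928308280fffff", "8928308280bffff", "89283082873ffff", "8928308283bffff"]
print(list(external_sort(cells, chunk_size=2)))
```

Peak memory is bounded by `chunk_size` during the first pass and by the number of runs during the merge, which is why the approach works for inputs larger than RAM.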
K1 is the file format — not a database, not a processing engine, not a visualisation tool. Like PDF, the goal is that every tool reads and writes it.
- Quick start
- Why K1?
- Features at a glance
- Documentation
- The CLI in one page
- JavaScript / Node
- The K-Series family
- Project status
- Benchmarks
- Acknowledgements
- License
Install:
pip install k1

Write a K1 file from a pandas DataFrame:
import pandas as pd
import k1
df = pd.DataFrame([
{"device_id": "d_8f3a2b91", "lat": 40.7128, "lon": -74.0060,
"timestamp": "2026-01-15T09:23:11Z", "event_type": "ping",
"accuracy_m": 12.5, "speed_kmh": 0.0},
{"device_id": "d_c7e5d104", "lat": 51.5074, "lon": -0.1278,
"timestamp": "2026-01-15T14:11:03Z", "event_type": "start",
"accuracy_m": 15.0, "speed_kmh": 0.0},
{"device_id": "d_e1a9c8b3", "lat": 35.6762, "lon": 139.6503,
"timestamp": "2026-01-15T18:55:21Z", "event_type": "ping",
"accuracy_m": 22.0, "speed_kmh": 11.4},
{"device_id": "d_47b22f01", "lat": 52.5200, "lon": 13.4050,
"timestamp": "2026-01-15T20:01:09Z", "event_type": "end",
"accuracy_m": 9.8, "speed_kmh": 0.0},
{"device_id": "d_fa3e0d6c", "lat": -33.8688, "lon": 151.2093,
"timestamp": "2026-01-16T03:45:32Z", "event_type": "ping",
"accuracy_m": 18.0, "speed_kmh": 33.5},
])
k1.write_k1(df, "mobility.k1", h3_resolution=8, author="example")

Read it back:
data = k1.load("mobility.k1")
print(data.info())
# {'path': '.../mobility.k1', 'k1_version': '1.0.0', 'h3_resolution': 8, ...}
df = data.to_dataframe()
gdf = data.to_geodataframe()  # geopandas, geometry decoded from WKB

Or from the command line:
k1 write mobility.csv mobility.k1 --region GLOBAL --author me
k1 info mobility.k1
k1 convert mobility.k1 mobility.k2 # produces a DuckDB file

For inputs that don't fit in RAM (many files, single huge file):
k1.write_k1_streaming(
sources=["/data/export/part_0.parquet", "/data/export/part_1.parquet", ...],
output="combined.k1",
)

| Format | Query engine | H3 native | Scales | K-Series metadata | Single file |
|---|---|---|---|---|---|
| GeoJSON | no | no | breaks past a few hundred MB | no | yes |
| Shapefile | no | no | 1990s tech, 4-file bundle | no | no |
| GeoParquet | no | no | billions | no | yes |
| GeoPackage | yes (SQLite) | no | millions | no | yes |
| FlatGeobuf | no | no | fast reads | no | yes |
| PMTiles | no | no | tiles only | no | yes |
| K1 | yes (Parquet) | yes | billions | yes | yes |
| K2 | yes (DuckDB) | yes | any | yes | yes |
The point of K1 is intelligence baked in at every layer. Other formats are bytes on disk; K1 carries enough context to be useful the moment it lands on someone's machine.
- `.k1` extension. Internally a Parquet 1.0 file.
- Required columns: `h3_index` (string), `h3_resolution` (int32), `geometry` (binary WKB), `k1_source_id` (string), `k1_imported_at` (timestamp UTC).
- Rows globally sorted ascending by `h3_index`.
- `sorting_columns` declared in the Parquet footer so range filters can prune row groups.
- GeoParquet 1.0 compatible (`geo` schema metadata key).
- K-Series metadata footer (`k1_version`, `k1_h3_resolution`, `k1_region`, `k1_author`, etc.).
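
Because rows are sorted by `h3_index`, each row group covers a narrow, contiguous range of cells, so a reader can compare a filter range against per-group min/max statistics and skip groups that cannot match. The sketch below models that with in-memory stand-ins. `RowGroup`, `build_row_groups`, and `prune_row_groups` are hypothetical names, and the cell strings are placeholders rather than real H3 indexes. Actual Parquet readers take these statistics from the file footer.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Stand-in for one row group's h3_index statistics (hypothetical)."""
    index: int
    min_h3: str
    max_h3: str


def build_row_groups(sorted_h3, rows_per_group):
    """Split an already-sorted h3_index column into row groups with min/max stats."""
    groups = []
    for start in range(0, len(sorted_h3), rows_per_group):
        block = sorted_h3[start:start + rows_per_group]
        groups.append(RowGroup(len(groups), block[0], block[-1]))
    return groups


def prune_row_groups(groups, lo, hi):
    """Keep only the row groups whose [min, max] range overlaps [lo, hi]."""
    return [g for g in groups if g.max_h3 >= lo and g.min_h3 <= hi]


# Toy column: 1000 placeholder 15-character cell strings, globally sorted.
h3_column = sorted(f"88{n:04x}fffffffff" for n in range(1000))
groups = build_row_groups(h3_column, rows_per_group=250)

# A narrow range filter touches one of the four row groups; the rest are skipped.
selected = prune_row_groups(groups, h3_column[300], h3_column[310])
print(f"scanning {len(selected)} of {len(groups)} row groups")
```

On an unsorted column every row group's min/max range would span most of the key space, and the same filter would prune almost nothing.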
- `k1.write_k1(source, output, ...)`: in-memory writer for inputs up to a few GB.
- `k1.write_k1_streaming(sources, output, ..., workers=None)`: multi-source, memory-bounded writer with external merge sort via DuckDB and multi-core shard-pass parallelism.
- `k1.load(path)`: read a `.k1` file. Returns a `K1` object with `info()`, `to_dataframe()`, `to_geodataframe()`, `to_k2()`, `metadata`, `columns`.
- Auto-detection of lat/lng columns across many common name variants (`lat`, `latitude`, `y`, `lon`, `lng`, `long`, `longitude`, `x`).
- Auto-snake-case for all column names.
- Auto-reprojection to WGS84.
- Single-file row-group chunking: feed in a 20 GB / 200 GB Parquet dump and it processes chunk by chunk.
- CSV chunked reads via `pd.read_csv(chunksize=…)` when the source is a CSV.
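
Column auto-detection and snake-casing could work roughly as follows. The name variants are the ones listed above; everything else (the function names, case-insensitive matching, and the precedence order when several candidates are present) is an assumption made for illustration, not the SDK's documented behaviour.

```python
import re

# Name variants listed above; the precedence order is an assumption.
LAT_NAMES = ("lat", "latitude", "y")
LNG_NAMES = ("lon", "lng", "long", "longitude", "x")


def to_snake_case(name):
    """Convert names such as 'DeviceID' or 'Event Type' to snake_case."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())      # separators -> "_"
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)     # camelCase boundary
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)  # acronym boundary
    return name.strip("_").lower()


def detect_lat_lng(columns):
    """Return the original (lat_col, lng_col) names, or raise ValueError."""
    normalized = {to_snake_case(c): c for c in columns}
    lat = next((normalized[n] for n in LAT_NAMES if n in normalized), None)
    lng = next((normalized[n] for n in LNG_NAMES if n in normalized), None)
    if lat is None or lng is None:
        raise ValueError(f"could not detect lat/lng columns in {list(columns)}")
    return lat, lng


# Usage: mixed-case source headers.
headers = ["DeviceID", "Latitude", "Longitude", "Event Type"]
print([to_snake_case(h) for h in headers])
print(detect_lat_lng(headers))
```

When detection is ambiguous or fails, the CLI's `--lat-col` and `--lng-col` flags let the caller name the columns explicitly.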
- `k1 write SOURCE OUTPUT.k1`: convert any source to K1.
- `k1 info FILE.k1`: print metadata + summary (pretty or `--json`).
- `k1 convert FILE.k1 FILE.k2`: produce a K2 (DuckDB) file with bundled SQL engine.
- `npm install k1-js`.
- Read-only in v0.1.
- `load(path)`, `K1.info()`, `K1.rows()`, `K1.toGeoJSON()`.
- Pure JS: Node 18+, modern browsers, no WASM.
- Default: zstd compression, typically ~50% smaller than snappy on K1 data.
- `BYTE_STREAM_SPLIT` encoding on every float column, for an additional ~30–50% reduction on sequential coordinates.
- Dictionary encoding on string columns: `h3_index` repeats heavily within a row group, so this is near-free.
- 1M-row Parquet row groups by default: sensible predicate-pushdown granularity, no pyarrow page-header pitfalls.
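
To see why `BYTE_STREAM_SPLIT` helps on coordinate columns, the sketch below applies the transformation by hand: byte k of every float64 value goes into stream k, and the eight streams are concatenated. Nearby coordinates share their high-order bytes, so those streams become long runs that a general-purpose compressor handles well. The helper names are hypothetical, and `zlib` is used only as a standard-library stand-in for zstd; the sizes it prints say nothing about K1's actual ratios.

```python
import struct
import zlib


def byte_stream_split(values):
    """Encode float64 values: stream k holds byte k of every value."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return b"".join(raw[k::8] for k in range(8))


def byte_stream_join(encoded):
    """Inverse of byte_stream_split: re-interleave the eight byte streams."""
    n = len(encoded) // 8
    raw = bytes(encoded[k * n + i] for i in range(n) for k in range(8))
    return list(struct.unpack(f"<{n}d", raw))


# Usage: slowly varying latitudes, as in a spatially sorted file.
latitudes = [40.7 + i * 1e-5 for i in range(10_000)]
plain = struct.pack(f"<{len(latitudes)}d", *latitudes)
split = byte_stream_split(latitudes)

assert byte_stream_join(split) == latitudes  # the transform is lossless
print("plain bytes, compressed:", len(zlib.compress(plain)))
print("split bytes, compressed:", len(zlib.compress(split)))
```

The encoding itself does not shrink the data; it only reorders bytes so that the compressor applied afterwards finds more redundancy.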
Four documentation files cover the project from beginner to expert:
| Doc | Audience | What it covers |
|---|---|---|
| `USER_GUIDE.md` | Anyone using K1 | Install, write your first file, every CLI command, big-data streaming workflows, single-file chunking, K1→K2 conversion, JavaScript usage, common workflows, troubleshooting/FAQ, glossary. |
| `TECHNICAL_DOC.md` | Engineers and integrators | Architecture, file format spec, SDK layout, in-memory and streaming pipelines, memory model, performance model, every design decision and rationale, failure modes, extension points, limitations. |
| `CHANGELOG.md` | Anyone tracking versions | Per-release notes (currently v0.1.0), known issues, semantic-versioning policy. |
| `CONTRIBUTING.md` | Contributors | Project structure, dev setup, code style, docs discipline, commit/PR conventions, end-to-end walkthroughs for adding source formats / CLI commands / metadata keys / K1 methods, JS SDK work, performance-work guidelines, format-evolution rules. |
The K1 file format specification, the K-Series vision document, and all sibling K-format specs (K2, K3, ...) live in the K-Series standards repo: github.com/Kenzy-Zero/k-series.
- `VISION.md`: the K-Series north star.
- `K1.md`: the K1 format spec (the contract this repo implements).
- `K2.md`: the K2 format spec (K1 already produces conformant `.k2` via `K1.to_k2()`; the standalone K2 SDK is in active development).
k1 write SOURCE OUTPUT.k1
[-r RESOLUTION] # H3 resolution 0-15, default 8
[--region REGION]
[--author AUTHOR]
[--source-id ID]
[--lat-col COL] # override auto-detection
[--lng-col COL]
[--compression CODEC] # zstd (default), snappy, gzip, lz4, brotli, none
k1 info FILE.k1
[--json] # emit JSON instead of pretty layout
k1 convert FILE.k1 FILE.k2
k1 --version
k1 --help
k1 <subcommand> --help

Example session:
$ k1 write mobility.csv mobility.k1 -r 9 --region GLOBAL
wrote /.../mobility.k1
rows: 5
h3_resolution: 9
size: 8.91 KB
$ k1 info mobility.k1
path /.../mobility.k1
k1_version 1.0.0
k1_standard geo
h3_resolution 9
crs EPSG:4326
region GLOBAL
created_at 2026-01-15T20:01:09+00:00
row_count 5
columns (12)
- device_id
- lat
- lon
- timestamp
- event_type
- accuracy_m
- speed_kmh
- h3_index
- h3_resolution
- k1_source_id
- k1_imported_at
- geometry
$ k1 convert mobility.k1 mobility.k2
converted /.../mobility.k1 -> /.../mobility.k2
rows: 5
size: 256.0 KB

Full reference: USER_GUIDE.md §6.
npm install k1-js

import { load } from 'k1-js';
const data = await load('mobility.k1');
console.log(await data.info());
const sample = await data.rows({
columns: ['device_id', 'lat', 'lon'],
limit: 3,
});
console.log(sample);
// [
// { device_id: 'd_8f3a2b91', lat: 40.7128, lon: -74.006 },
// { device_id: 'd_c7e5d104', lat: 51.5074, lon: -0.1278 },
// { device_id: 'd_e1a9c8b3', lat: 35.6762, lon: 139.6503 }
// ]
await data.toGeoJSON({ output: 'mobility.geojson' });

Write support is on the roadmap. v0.1 is read-only.
Full reference: USER_GUIDE.md §10.
K1 is part of K-Series, a family of open geo-data standards. Today only K1 ships; the others are on the roadmap.
| Format | Role | Status |
|---|---|---|
| K1 | Foundation. Parquet-based. Big-data pipeline output. | v0.1 — active |
| K2 | Intelligence. DuckDB-based. Analyst/dev layer. | Active development |
| K3 | Mobility & trajectory (OD matrices, flows). | 2027 |
| K4 | Audience & identity (segments, tiers). | 2027 |
| K5 | Urban & real estate (buildings, parcels). | 2028 |
| K6 | Retail & POI intelligence. | 2028 |
K1 today can already produce K2 files via K1.to_k2() or
k1 convert. The full K2 SDK (its own writer, querier, etc.) is
in active development.
Version: 0.1.0 — first public release. Format spec at
k1_version: 1.0.0.
Stability:
- File format is stable. Files written today will be readable by future versions.
- Python SDK public API (`write_k1`, `write_k1_streaming`, `load`, `K1.*`) is stable for the 0.x line. Internal helpers (`_*`) may change without notice.
- CLI subcommand surface is stable; flags may be added (never removed in a minor release).
- JavaScript SDK public surface is stable for the 0.x line.
Versioning: Semantic Versioning. Format-breaking changes require a MAJOR bump; additions are MINOR; fixes are PATCH.
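
Under this policy a reader can decide from the `k1_version` metadata alone whether a file is safe to open. The sketch below shows one way to apply it. Both function names are hypothetical, and the rule that any file sharing the reader's MAJOR version is readable is an assumption about how the policy would be enforced, not documented SDK behaviour.

```python
def parse_version(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_bump(old, new):
    """Name the highest-order component that differs between two versions."""
    old_v, new_v = parse_version(old), parse_version(new)
    for label, a, b in zip(("MAJOR", "MINOR", "PATCH"), old_v, new_v):
        if a != b:
            return label
    return "NONE"


def can_read(file_version, reader_version):
    """Assumed rule: a reader handles any file that shares its MAJOR version."""
    return parse_version(file_version)[0] == parse_version(reader_version)[0]


# Usage: a reader built for format 1.2.0 meets files of different versions.
for file_version in ("1.0.0", "1.3.1", "2.0.0"):
    print(file_version, classify_bump("1.2.0", file_version), can_read(file_version, "1.2.0"))
```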
Benchmarks run on real-world mobility data. Results may vary based on hardware and data characteristics.
K1 stands on the shoulders of giants:
- Apache Parquet — binary storage foundation.
- Apache Arrow — in-memory columnar format.
- GeoParquet — the geometry spec K1 is compatible with.
- DuckDB — the external sort and the K2 file format.
- Uber H3 — hexagonal spatial indexing.
- hyparquet — pure-JS Parquet reader powering `k1-js`.
MIT. See LICENSE.
Built by Kenzy-Zero with K-Series
as an open community standard. Contributions welcome — see
CONTRIBUTING.md.