K1

The foundation format of K-Series. GeoParquet + H3 spatial sorting + K-Series metadata, in one open file format that travels between big-data pipelines and the analyst's laptop.

K1 is an open geo-data file format for storing location intelligence — device pings, GPS tracks, POI events, anything with a latitude and a longitude. Files have the extension .k1 and are valid Parquet, so every tool that reads Parquet reads K1 already.

What K1 adds on top of plain Parquet:

H3 spatial index baked into every row.
Global sort by H3 cell, plus sorting_columns Parquet metadata so readers can skip whole row groups on spatial filters.
K-Series metadata footer carrying version, region, author, source schema, and creation time.
GeoParquet 1.0 compatibility — geopandas / DuckDB spatial / Sedona read the file as a GeoDataFrame natively.
Streaming writer with external merge sort — works on inputs too large to fit in RAM. Single 20 GB / 200 GB Parquet dumps are processed chunk by chunk; many small files are processed in parallel.
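The external merge sort idea can be pictured with a stdlib-only sketch. This is not the SDK's implementation (the Python SDK feature list says the real sort is delegated to DuckDB); the function name external_sort, the chunk size, and the placeholder keys are illustrative assumptions. Each chunk is sorted in memory and spilled to a temporary file, then the sorted runs are k-way merged with heapq.merge so that only one line per run is held in memory at a time.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(keys, chunk_size=3):
    """Yield keys in ascending order, sorting at most chunk_size keys in memory."""
    run_paths = []
    iterator = iter(keys)
    try:
        # Pass 1: sort each chunk in memory and spill it to its own run file.
        while True:
            chunk = sorted(itertools.islice(iterator, chunk_size))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as handle:
                handle.writelines(key + "\n" for key in chunk)
            run_paths.append(path)

        # Pass 2: k-way merge of the sorted runs, streamed line by line.
        handles = [open(path) for path in run_paths]
        try:
            runs = [(line.rstrip("\n") for line in handle) for handle in handles]
            yield from heapq.merge(*runs)
        finally:
            for handle in handles:
                handle.close()
    finally:
        for path in run_paths:
            os.remove(path)


# Placeholder keys standing in for h3_index values.
keys = ["cell_d", "cell_a", "cell_c", "cell_b", "cell_e"]
print(list(external_sort(keys, chunk_size=2)))
# ['cell_a', 'cell_b', 'cell_c', 'cell_d', 'cell_e']
```

Peak memory is bounded by the chunk size plus one buffered line per run, which is why the approach scales past RAM.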

K1 is the file format — not a database, not a processing engine, not a visualisation tool. Like PDF, the goal is that every tool reads and writes it.

Quick start

Install:

pip install k1

Write a K1 file from a pandas DataFrame:

import pandas as pd
import k1

df = pd.DataFrame([
    {"device_id": "d_8f3a2b91", "lat": 40.7128, "lon": -74.0060,
     "timestamp": "2026-01-15T09:23:11Z", "event_type": "ping",
     "accuracy_m": 12.5, "speed_kmh": 0.0},
    {"device_id": "d_c7e5d104", "lat": 51.5074, "lon": -0.1278,
     "timestamp": "2026-01-15T14:11:03Z", "event_type": "start",
     "accuracy_m": 15.0, "speed_kmh": 0.0},
    {"device_id": "d_e1a9c8b3", "lat": 35.6762, "lon": 139.6503,
     "timestamp": "2026-01-15T18:55:21Z", "event_type": "ping",
     "accuracy_m": 22.0, "speed_kmh": 11.4},
    {"device_id": "d_47b22f01", "lat": 52.5200, "lon": 13.4050,
     "timestamp": "2026-01-15T20:01:09Z", "event_type": "end",
     "accuracy_m": 9.8, "speed_kmh": 0.0},
    {"device_id": "d_fa3e0d6c", "lat": -33.8688, "lon": 151.2093,
     "timestamp": "2026-01-16T03:45:32Z", "event_type": "ping",
     "accuracy_m": 18.0, "speed_kmh": 33.5},
])

k1.write_k1(df, "mobility.k1", h3_resolution=8, author="example")

Read it back:

data = k1.load("mobility.k1")
print(data.info())
# {'path': '.../mobility.k1', 'k1_version': '1.0.0', 'h3_resolution': 8, ...}

df = data.to_dataframe()
gdf = data.to_geodataframe()    # geopandas, geometry decoded from WKB
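For readers curious what "geometry decoded from WKB" involves: assuming point geometries, a point in Well-Known Binary is 21 bytes, made of a byte-order flag, a 32-bit geometry type (1 for Point), then x (longitude) and y (latitude) as 64-bit floats. The stdlib sketch below encodes and decodes one. The helper names are illustrative and not part of the k1 SDK; since to_geodataframe() hands back a geopandas object, you would not normally do this by hand.

```python
import struct

WKB_POINT = 1  # geometry type code for Point in the WKB standard


def encode_point_wkb(lon, lat):
    """Encode a lon/lat pair as a little-endian WKB Point."""
    return struct.pack("<BIdd", 1, WKB_POINT, lon, lat)


def decode_point_wkb(blob):
    """Decode a WKB Point into (lon, lat), honouring the byte-order flag."""
    order = "<" if blob[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", blob, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a Point, got geometry type {geom_type}")
    return struct.unpack_from(order + "dd", blob, 5)


blob = encode_point_wkb(-74.0060, 40.7128)
print(len(blob), decode_point_wkb(blob))
# 21 (-74.006, 40.7128)
```

Note the axis order: WKB stores x before y, so longitude comes first.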

Or from the command line:

k1 write mobility.csv mobility.k1 --region GLOBAL --author me
k1 info mobility.k1
k1 convert mobility.k1 mobility.k2     # produces a DuckDB file

For inputs that don't fit in RAM (many files, single huge file):

k1.write_k1_streaming(
    sources=["/data/export/part_0.parquet", "/data/export/part_1.parquet", ...],
    output="combined.k1",
)

Why K1?

Format	Query engine	H3 native	Scales	K-Series metadata	Single file
GeoJSON	no	no	breaks past a few hundred MB	no	yes
Shapefile	no	no	1990s tech, 4-file bundle	no	no
GeoParquet	no	no	billions	no	yes
GeoPackage	yes (SQLite)	no	millions	no	yes
FlatGeobuf	no	no	fast reads	no	yes
PMTiles	no	no	tiles only	no	yes
K1	yes (Parquet)	yes	billions	yes	yes
K2	yes (DuckDB)	yes	any	yes	yes

The point of K1 is intelligence baked in at every layer. Other formats are bytes on disk; K1 carries enough context to be useful the moment it lands on someone's machine.

Features at a glance

File format

.k1 extension. Internally a Parquet 1.0 file.
Required columns: h3_index (string), h3_resolution (int32), geometry (binary WKB), k1_source_id (string), k1_imported_at (timestamp UTC).
Rows globally sorted ascending by h3_index.
sorting_columns declared in the Parquet footer so range filters can prune row groups.
GeoParquet 1.0 compatible (geo schema metadata key).
K-Series metadata footer (k1_version, k1_h3_resolution, k1_region, k1_author, etc.).
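To make the row-group pruning concrete: because rows are sorted by h3_index, each row group covers a contiguous, non-overlapping key range, and a reader can compare a filter range against each group's min/max statistics without touching the data pages. The sketch below models that in plain Python. RowGroupStats and groups_to_read are hypothetical names and the keys are placeholders; real readers such as pyarrow or DuckDB do this against the statistics stored in the Parquet footer.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    index: int
    min_key: str
    max_key: str


def groups_to_read(stats, low, high):
    """Return indices of row groups whose [min, max] range overlaps [low, high]."""
    return [g.index for g in stats if g.max_key >= low and g.min_key <= high]


# Four row groups over a globally sorted key column (placeholder keys).
stats = [
    RowGroupStats(0, "cell_000", "cell_249"),
    RowGroupStats(1, "cell_250", "cell_499"),
    RowGroupStats(2, "cell_500", "cell_749"),
    RowGroupStats(3, "cell_750", "cell_999"),
]

print(groups_to_read(stats, "cell_300", "cell_520"))
# [1, 2]
```

On unsorted data every group's min/max would span most of the key space and nothing could be skipped, which is the reason for the global sort.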

Python SDK

k1.write_k1(source, output, ...) — in-memory writer for inputs up to a few GB.
k1.write_k1_streaming(sources, output, ..., workers=None) — multi-source, memory-bounded writer with external merge sort via DuckDB and multi-core shard-pass parallelism.
k1.load(path) — read a .k1 file. Returns a K1 object with info(), to_dataframe(), to_geodataframe(), to_k2(), metadata, columns.
Auto-detection of lat/lng columns across many common name variants (lat, latitude, y, lon, lng, long, longitude, x).
Auto-snake-case for all column names.
Auto-reprojection to WGS84.
Single-file row-group chunking — feed in a 20 GB / 200 GB Parquet dump and it processes chunk by chunk.
CSV chunked reads via pd.read_csv(chunksize=…) when the source is a CSV.
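The column auto-detection and snake-casing steps can be approximated as follows. This is a minimal sketch, not the SDK's code: to_snake_case and detect_lat_lon are hypothetical helpers, the candidate names are the variants listed above, and the first match in column order wins, which is an assumption about tie-breaking.

```python
import re

LAT_NAMES = ("lat", "latitude", "y")
LON_NAMES = ("lon", "lng", "long", "longitude", "x")


def to_snake_case(name):
    """Convert names like 'DeviceID' or 'Event Type' to snake_case."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    return name.strip("_").lower()


def detect_lat_lon(columns):
    """Return the snake-cased (lat, lon) column names, or raise if either is missing."""
    normalised = [to_snake_case(column) for column in columns]
    lat = next((c for c in normalised if c in LAT_NAMES), None)
    lon = next((c for c in normalised if c in LON_NAMES), None)
    if lat is None or lon is None:
        raise ValueError(f"could not detect lat/lon columns in {normalised}")
    return lat, lon


print(to_snake_case("DeviceID"), to_snake_case("Event Type"))
# device_id event_type
print(detect_lat_lon(["DeviceID", "Latitude", "LNG", "speedKmh"]))
# ('latitude', 'lng')
```

When detection guesses wrong, the --lat-col and --lng-col flags of the CLI override it.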

CLI

k1 write SOURCE OUTPUT.k1 — convert any source to K1.
k1 info FILE.k1 — print metadata + summary (pretty or --json).
k1 convert FILE.k1 FILE.k2 — produce a K2 (DuckDB) file with bundled SQL engine.

JavaScript / Node SDK

npm install k1-js.
Read-only in v0.1. load(path), K1.info(), K1.rows(), K1.toGeoJSON().
Pure JS — Node 18+, modern browsers, no WASM.

Output compression and encoding

Default: zstd compression — typically ~50% smaller than snappy on K1 data.
BYTE_STREAM_SPLIT encoding on every float column — additional ~30–50% reduction for sequential coordinates.
Dictionary encoding on string columns — h3_index repeats heavily within a row group so this is near-free.
1M-row Parquet row groups by default: sensible predicate-pushdown granularity, no pyarrow page-header pitfalls.
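BYTE_STREAM_SPLIT is easier to reason about in miniature. The encoding compresses nothing by itself; it regroups the bytes of each float so that byte 0 of every value sits together, then byte 1, and so on, which hands the compressor long runs of similar bytes. The stdlib sketch below does this for 64-bit floats. The function names are illustrative; Parquet writers apply the real encoding per page internally.

```python
import struct

WIDTH = 8  # bytes per float64 value


def byte_stream_split(values):
    """Transpose little-endian float64 bytes into WIDTH concatenated streams."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return b"".join(raw[i::WIDTH] for i in range(WIDTH))


def byte_stream_join(encoded):
    """Invert byte_stream_split and return the original floats."""
    count = len(encoded) // WIDTH
    raw = bytes(
        encoded[stream * count + i]
        for i in range(count)
        for stream in range(WIDTH)
    )
    return list(struct.unpack(f"<{count}d", raw))


# Nearby latitudes share their sign/exponent byte, which ends up in the last stream.
lats = [40.7128, 40.7131, 40.7135]
encoded = byte_stream_split(lats)
print(encoded[-3:].hex(), byte_stream_join(encoded) == lats)
# 404040 True
```

The more slowly the values change from row to row, the more of the high-order streams collapse into repeated bytes, which is why spatially sorted coordinates benefit.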

Documentation

Four documentation files cover the project from beginner to expert:

Doc	Audience	What it covers
`USER_GUIDE.md`	Anyone using K1	Install, write your first file, every CLI command, big-data streaming workflows, single-file chunking, K1→K2 conversion, JavaScript usage, common workflows, troubleshooting/FAQ, glossary.
`TECHNICAL_DOC.md`	Engineers and integrators	Architecture, file format spec, SDK layout, in-memory and streaming pipelines, memory model, performance model, every design decision and rationale, failure modes, extension points, limitations.
`CHANGELOG.md`	Anyone tracking versions	Per-release notes (currently v0.1.0), known issues, semantic-versioning policy.
`CONTRIBUTING.md`	Contributors	Project structure, dev setup, code style, docs discipline, commit/PR conventions, end-to-end walkthroughs for adding source formats / CLI commands / metadata keys / `K1` methods, JS SDK work, performance-work guidelines, format-evolution rules.

The K1 file format specification, the K-Series vision document, and all sibling K-format specs (K2, K3, ...) live in the K-Series standards repo: github.com/Kenzy-Zero/k-series.

VISION.md — the K-Series north star.
K1.md — the K1 format spec (the contract this repo implements).
K2.md — the K2 format spec (K1 already produces conformant .k2 via K1.to_k2(); the standalone K2 SDK is in active development).

The CLI in one page

k1 write SOURCE OUTPUT.k1
    [-r RESOLUTION]            # H3 resolution 0-15, default 8
    [--region REGION]
    [--author AUTHOR]
    [--source-id ID]
    [--lat-col COL]            # override auto-detection
    [--lng-col COL]
    [--compression CODEC]      # zstd (default), snappy, gzip, lz4, brotli, none

k1 info FILE.k1
    [--json]                   # emit JSON instead of pretty layout

k1 convert FILE.k1 FILE.k2

k1 --version
k1 --help
k1 <subcommand> --help

Example session:

$ k1 write mobility.csv mobility.k1 -r 9 --region GLOBAL
wrote /.../mobility.k1
  rows: 5
  h3_resolution: 9
  size: 8.91 KB

$ k1 info mobility.k1
path             /.../mobility.k1
k1_version       1.0.0
k1_standard      geo
h3_resolution    9
crs              EPSG:4326
region           GLOBAL
created_at       2026-01-15T20:01:09+00:00
row_count        5
columns          (12)
  - device_id
  - lat
  - lon
  - timestamp
  - event_type
  - accuracy_m
  - speed_kmh
  - h3_index
  - h3_resolution
  - k1_source_id
  - k1_imported_at
  - geometry

$ k1 convert mobility.k1 mobility.k2
converted /.../mobility.k1 -> /.../mobility.k2
  rows: 5
  size: 256.0 KB

Full reference: USER_GUIDE.md §6.
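When driving the CLI from a Python pipeline, it can help to assemble the argument list in one place and validate it before shelling out. The helper below is a hypothetical convenience, not part of the k1 package; it uses only flags from the listing above and returns a list suitable for subprocess.run.

```python
CODECS = {"zstd", "snappy", "gzip", "lz4", "brotli", "none"}


def build_write_command(source, output, resolution=8, region=None,
                        author=None, compression="zstd"):
    """Build the argv list for `k1 write` from the documented flags."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolution must be between 0 and 15")
    if compression not in CODECS:
        raise ValueError(f"unsupported codec: {compression}")
    command = ["k1", "write", source, output,
               "-r", str(resolution), "--compression", compression]
    if region is not None:
        command += ["--region", region]
    if author is not None:
        command += ["--author", author]
    return command


command = build_write_command("mobility.csv", "mobility.k1",
                              resolution=9, region="GLOBAL")
print(" ".join(command))
# k1 write mobility.csv mobility.k1 -r 9 --compression zstd --region GLOBAL

# To execute it (requires the k1 CLI on PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.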

JavaScript / Node

npm install k1-js

import { load } from 'k1-js';

const data = await load('mobility.k1');
console.log(await data.info());

const sample = await data.rows({
  columns: ['device_id', 'lat', 'lon'],
  limit: 3,
});
console.log(sample);
// [
//   { device_id: 'd_8f3a2b91', lat: 40.7128, lon: -74.006 },
//   { device_id: 'd_c7e5d104', lat: 51.5074, lon: -0.1278 },
//   { device_id: 'd_e1a9c8b3', lat: 35.6762, lon: 139.6503 }
// ]

await data.toGeoJSON({ output: 'mobility.geojson' });

Write support is on the roadmap. v0.1 is read-only.

Full reference: USER_GUIDE.md §10.

The K-Series family

K1 is part of K-Series, a family of open geo-data standards. Today only K1 ships; the others are on the roadmap.

Format	Role	Status
K1	Foundation. Parquet-based. Big-data pipeline output.	v0.1 — active
K2	Intelligence. DuckDB-based. Analyst/dev layer.	Active development
K3	Mobility & trajectory (OD matrices, flows).	2027
K4	Audience & identity (segments, tiers).	2027
K5	Urban & real estate (buildings, parcels).	2028
K6	Retail & POI intelligence.	2028

K1 today can already produce K2 files via K1.to_k2() or k1 convert. The full K2 SDK (its own writer, querier, etc.) is in active development.

Project status

Version: 0.1.0 — first public release. Format spec at k1_version: 1.0.0.

Stability:

File format is stable. Files written today will be readable by future versions.
Python SDK public API (write_k1, write_k1_streaming, load, K1.*) is stable for the 0.x line. Internal helpers (_*) may change without notice.
CLI subcommand surface is stable; flags may be added (never removed in a minor release).
JavaScript SDK public surface is stable for the 0.x line.

Versioning: Semantic Versioning. Format-breaking changes require a MAJOR bump; additions are MINOR; fixes are PATCH.
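One way a reader could act on this policy is to compare the MAJOR component of a file's k1_version with the major version the reader was built for. The check below is a sketch under that assumption; parse_version, can_read and SUPPORTED_MAJOR are hypothetical names, not SDK API.

```python
SUPPORTED_MAJOR = 1  # assumption: a reader built against format 1.x


def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def can_read(file_version, supported_major=SUPPORTED_MAJOR):
    """Treat files with the same MAJOR as readable; MINOR and PATCH are non-breaking."""
    return parse_version(file_version)[0] == supported_major


for version in ("1.0.0", "1.4.2", "2.0.0"):
    print(version, can_read(version))
# 1.0.0 True
# 1.4.2 True
# 2.0.0 False
```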

Benchmarks

Benchmarks run on real-world mobility data. Results may vary based on hardware and data characteristics.

Acknowledgements

K1 stands on the shoulders of giants:

Apache Parquet — binary storage foundation.
Apache Arrow — in-memory columnar format.
GeoParquet — the geometry spec K1 is compatible with.
DuckDB — the external sort and the K2 file format.
Uber H3 — hexagonal spatial indexing.
hyparquet — pure-JS Parquet reader powering k1-js.

License

MIT. See LICENSE.

Built by Kenzy-Zero with K-Series as an open community standard. Contributions welcome — see CONTRIBUTING.md.

