22 changes: 11 additions & 11 deletions docs/src/format/index.md
@@ -1,44 +1,44 @@
# Lance Lakehouse Format Specification
# Lance Lakehouse Format Specifications

Lance is a lakehouse format designed as a stack of interoperating specifications instead of a single file or metadata layout. The storage-facing layers are the file format, table format, index formats, and catalog specifications, with a unified namespace interface sitting above them.
Lance is a lakehouse format defined as a stack of interoperating specifications, rather than as a single file format or metadata layout. The storage-facing layers cover files, tables, indices, and catalogs. A unified namespace interface sits above those layers and gives engines a consistent way to work with Lance tables across catalog implementations.

## Architecture Overview

Modern lakehouses are built from cooperating layers. Lance keeps those layers intentionally decoupled so that the file format, table metadata, indices, and catalogs can evolve independently without forcing lock-in across the stack.
Modern lakehouses are built from complementary layers. Lance keeps those layers intentionally decoupled so that the file format, table metadata, indices, and catalogs can evolve independently without forcing lock-in across the stack.

![Lakehouse Stack](../images/lakehouse_stack.png)

At a high level:

- The **file format** stores column data in large random-access-friendly pages and avoids row groups.
- The **file format** stores column data in large pages optimized for random access and avoids row groups.
- The **table format** manages fragments, manifests, deletions, schema evolution, and ACID commits.
- The **index formats** define redundant search structures such as scalar, vector, full-text, and system indices.
- The **catalog specs** define how tables are discovered, registered, and coordinated across engines and services.
- The **namespace client spec** provides a unified interface for engines to interact with any catalog implementation.
- The **namespace client spec** provides a unified interface for engines to interact with any catalog implementation.

The layers are designed so that only table readers, table writers, and index readers or writers need to know the on-disk Lance file layout.
The layers are designed so that only table readers, table writers, and index readers or writers need to understand the on-disk Lance file layout.

## Design Themes

### File Format

The Lance file format is optimized for cloud object storage and highly selective reads. It avoids Parquet-style row groups, uses structural encodings that support efficient random access, and keeps statistics and search structures out of the file format so those concerns can evolve as independent indices.
The Lance file format is optimized for cloud object storage and highly selective reads. It avoids Parquet-style row groups, uses structural encodings for efficient random access, and keeps statistics and search structures out of the file format so those concerns can evolve independently as indices.

### Table Format

The Lance table format stores data in two dimensions: rows are grouped into fragments, and each fragment can contain multiple data files that each contribute a subset of columns. This makes column additions and backfills metadata-heavy instead of rewrite-heavy, which is especially useful for feature engineering and embedding workflows.
The Lance table format organizes data in two dimensions: rows are grouped into fragments, and each fragment can contain multiple data files, each contributing a subset of columns. This makes column additions and backfills primarily metadata operations instead of data rewrites, which is especially useful for feature engineering and embedding workflows.
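
The two-dimensional layout described above can be sketched as a small data model. This is an illustration only, not the Lance manifest schema: the `DataFile` and `Fragment` classes, the `add_column` helper, and the file paths are all hypothetical names chosen for the example. It shows why adding a column can be a metadata operation: each fragment gains one new data file, and the existing files are referenced unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical record for one data file: a path and the columns it holds."""
    path: str
    columns: list[str]


@dataclass
class Fragment:
    """Hypothetical record for one fragment: a group of rows backed by several files."""
    fragment_id: int
    num_rows: int
    data_files: list[DataFile] = field(default_factory=list)

    def columns(self) -> list[str]:
        # A fragment's columns are the union of what its data files contribute.
        return [c for f in self.data_files for c in f.columns]


def add_column(fragments, column, path_for):
    """Return new fragment metadata with one extra data file per fragment.

    Existing DataFile entries are reused as-is; nothing is rewritten.
    """
    return [
        Fragment(
            f.fragment_id,
            f.num_rows,
            f.data_files + [DataFile(path_for(f), [column])],
        )
        for f in fragments
    ]


# Two fragments, each initially backed by a single file holding two columns.
table = [
    Fragment(0, 1000, [DataFile("data/frag0-base.lance", ["id", "text"])]),
    Fragment(1, 800, [DataFile("data/frag1-base.lance", ["id", "text"])]),
]

# Backfill an "embedding" column by attaching one new file per fragment.
updated = add_column(
    table,
    "embedding",
    lambda f: f"data/frag{f.fragment_id}-embedding.lance",
)
```

In this sketch the original `table` list is left untouched and `updated` shares the original `DataFile` objects, which mirrors the idea that a backfill only appends references rather than rewriting existing data.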

### Index Formats

Indices are first-class table objects. Lance tables define how indices are discovered, versioned, and coordinated transactionally, while the index formats themselves remain decoupled from both the file encoding and the table manifest structure.
Indices are first-class table objects. Lance tables define how indices are discovered, versioned, and coordinated transactionally. The index formats themselves remain decoupled from both the file encoding and the table manifest structure.

### Catalog Specs

Lance provides storage-native and service-oriented catalog options. The [Directory Catalog](catalog/dir/index.md) supports zero-infrastructure deployments directly on object stores, while the [REST Catalog](catalog/rest/index.md) standardizes enterprise-facing APIs and can act as an external manifest store.
Lance provides both storage-native and service-oriented catalog options. The [Directory Catalog](catalog/dir/index.md) supports zero-infrastructure deployments directly on object stores, while the [REST Catalog](catalog/rest/index.md) standardizes enterprise-facing APIs and can act as an external manifest store.

### Namespace Client Spec

The [Namespace Client Spec](namespace/index.md) provides a unified interface for engines to interact with any catalog implementation, across both Lance native catalog specs and third-party catalog systems, in any programming language. This abstraction allows applications to switch between directory-based, REST-based, or third-party catalogs without changing their code.
The [Namespace Client Spec](namespace/index.md) provides a language-agnostic interface for engines to interact with any catalog implementation, including Lance-native catalogs and third-party catalog systems. This abstraction allows applications to switch between directory-based, REST-based, and third-party catalogs without changing their code.
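
The "switch catalogs without changing code" idea above can be sketched with a structural interface. Everything here is an assumption made for illustration: the `Namespace` protocol, its two methods, the `DirectoryNamespace` and `RestNamespace` classes, and the `<name>.lance` path layout are hypothetical stand-ins, not the operations or names defined by the Namespace Client Spec.

```python
from typing import Protocol


class Namespace(Protocol):
    """Hypothetical minimal interface an engine might program against."""

    def list_tables(self) -> list[str]: ...

    def table_location(self, name: str) -> str: ...


class DirectoryNamespace:
    """Hypothetical directory-style catalog: tables live under a root path."""

    def __init__(self, root: str, tables: list[str]):
        self.root = root.rstrip("/")
        self.tables = list(tables)

    def list_tables(self) -> list[str]:
        return sorted(self.tables)

    def table_location(self, name: str) -> str:
        if name not in self.tables:
            raise KeyError(name)
        # Assumed layout for the example only.
        return f"{self.root}/{name}.lance"


class RestNamespace:
    """Hypothetical REST-style catalog; a dict stands in for HTTP calls."""

    def __init__(self, registry: dict[str, str]):
        self.registry = dict(registry)

    def list_tables(self) -> list[str]:
        return sorted(self.registry)

    def table_location(self, name: str) -> str:
        return self.registry[name]


def describe(ns: Namespace) -> dict[str, str]:
    """Engine-side code: depends only on the interface, not the catalog type."""
    return {name: ns.table_location(name) for name in ns.list_tables()}


dir_ns = DirectoryNamespace("s3://bucket/root/", ["events", "users"])
rest_ns = RestNamespace({"users": "s3://other/users.lance"})

dir_view = describe(dir_ns)
rest_view = describe(rest_ns)
```

The `describe` function never inspects which catalog it was given, so swapping `dir_ns` for `rest_ns` requires no change on the engine side.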

## Specifications

Binary file modified docs/src/images/lakehouse_stack.png

praise: wow, that is way better 😍
