diff --git a/docs/src/format/index.md b/docs/src/format/index.md
index c6956453b05..055c9129c5f 100644
--- a/docs/src/format/index.md
+++ b/docs/src/format/index.md
@@ -1,44 +1,44 @@
-# Lance Lakehouse Format Specification
+# Lance Lakehouse Format Specifications
 
-Lance is a lakehouse format designed as a stack of interoperating specifications instead of a single file or metadata layout. The storage-facing layers are the file format, table format, index formats, and catalog specifications, with a unified namespace interface sitting above them.
+Lance is a lakehouse format defined as a stack of interoperating specifications, rather than as a single file format or metadata layout. The storage-facing layers cover files, tables, indices, and catalogs. A unified namespace interface sits above those layers and gives engines a consistent way to work with Lance tables across catalog implementations.
 
 ## Architecture Overview
 
-Modern lakehouses are built from cooperating layers. Lance keeps those layers intentionally decoupled so that the file format, table metadata, indices, and catalogs can evolve independently without forcing lock-in across the stack.
+Modern lakehouses are built from complementary layers. Lance keeps those layers intentionally decoupled so that the file format, table metadata, indices, and catalogs can evolve independently without forcing lock-in across the stack.
 
 ![Lakehouse Stack](../images/lakehouse_stack.png)
 
 At a high level:
 
-- The **file format** stores column data in large random-access-friendly pages and avoids row groups.
+- The **file format** stores column data in large pages optimized for random access and avoids row groups.
 - The **table format** manages fragments, manifests, deletions, schema evolution, and ACID commits.
 - The **index formats** define redundant search structures such as scalar, vector, full-text, and system indices.
 - The **catalog specs** define how tables are discovered, registered, and coordinated across engines and services.
-- The **namespace client spec** provides a unified interface for engines to interact with any catalog implementation.
+- The **namespace client spec** provides a unified interface for engines to interact with any catalog implementations.
 
-The layers are designed so that only table readers, table writers, and index readers or writers need to know the on-disk Lance file layout.
+The layers are designed so that only table readers, table writers, and index readers or writers need to understand the on-disk Lance file layout.
 
 ## Design Themes
 
 ### File Format
 
-The Lance file format is optimized for cloud object storage and highly selective reads. It avoids Parquet-style row groups, uses structural encodings that support efficient random access, and keeps statistics and search structures out of the file format so those concerns can evolve as independent indices.
+The Lance file format is optimized for cloud object storage and highly selective reads. It avoids Parquet-style row groups, uses structural encodings for efficient random access, and keeps statistics and search structures out of the file format so those concerns can evolve independently as indices.
 
 ### Table Format
 
-The Lance table format stores data in two dimensions: rows are grouped into fragments, and each fragment can contain multiple data files that each contribute a subset of columns. This makes column additions and backfills metadata-heavy instead of rewrite-heavy, which is especially useful for feature engineering and embedding workflows.
+The Lance table format organizes data in two dimensions: rows are grouped into fragments, and each fragment can contain multiple data files, each contributing a subset of columns. This makes column additions and backfills primarily metadata operations instead of data rewrites, which is especially useful for feature engineering and embedding workflows.
 
 ### Index Formats
 
-Indices are first-class table objects. Lance tables define how indices are discovered, versioned, and coordinated transactionally, while the index formats themselves remain decoupled from both the file encoding and the table manifest structure.
+Indices are first-class table objects. Lance tables define how indices are discovered, versioned, and coordinated transactionally. The index formats themselves remain decoupled from both the file encoding and the table manifest structure.
 
 ### Catalog Specs
 
-Lance provides storage-native and service-oriented catalog options. The [Directory Catalog](catalog/dir/index.md) supports zero-infrastructure deployments directly on object stores, while the [REST Catalog](catalog/rest/index.md) standardizes enterprise-facing APIs and can act as an external manifest store.
+Lance provides both storage-native and service-oriented catalog options. The [Directory Catalog](catalog/dir/index.md) supports zero-infrastructure deployments directly on object stores, while the [REST Catalog](catalog/rest/index.md) standardizes enterprise-facing APIs and can act as an external manifest store.
 
 ### Namespace Client Spec
 
-The [Namespace Client Spec](namespace/index.md) provides a unified interface for engines to interact with any catalog implementation, across both Lance native catalog specs and third-party catalog systems, in any programming language. This abstraction allows applications to switch between directory-based, REST-based, or third-party catalogs without changing their code.
+The [Namespace Client Spec](namespace/index.md) provides a language-agnostic interface for engines to interact with any catalog implementation, including Lance-native catalogs and third-party catalog systems. This abstraction allows applications to switch between directory-based, REST-based, and third-party catalogs without changing their code.
 
 ## Specifications
 
diff --git a/docs/src/images/lakehouse_stack.png b/docs/src/images/lakehouse_stack.png
index e34c999efd8..b3e4db709d6 100644
Binary files a/docs/src/images/lakehouse_stack.png and b/docs/src/images/lakehouse_stack.png differ
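
Reviewer note: the "Table Format" paragraph in this diff describes a two-dimensional layout (rows grouped into fragments, each fragment holding several data files that each cover a subset of columns). The sketch below is only a toy model of that idea, not the Lance manifest schema or API. The class names `DataFile` and `Fragment`, the `add_column` helper, and the file paths are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    # Hypothetical record: one physical file holding a subset of columns.
    path: str
    columns: list[str]


@dataclass
class Fragment:
    # Hypothetical record: a group of rows backed by one or more data files.
    fragment_id: int
    num_rows: int
    files: list[DataFile] = field(default_factory=list)

    def columns(self) -> list[str]:
        """All columns visible in this fragment, across its data files."""
        return [col for f in self.files for col in f.columns]


def add_column(fragments: list[Fragment], column: str) -> list[str]:
    """Attach a new single-column data file to every fragment.

    Existing data files are left untouched; only the in-memory metadata
    (the list of files per fragment) changes. Returns the new file paths.
    """
    new_paths = []
    for frag in fragments:
        path = f"frag-{frag.fragment_id}/{column}.lance"  # made-up path scheme
        frag.files.append(DataFile(path, [column]))
        new_paths.append(path)
    return new_paths


fragments = [
    Fragment(0, 1000, [DataFile("frag-0/base.lance", ["id", "text"])]),
    Fragment(1, 500, [DataFile("frag-1/base.lance", ["id", "text"])]),
]
existing_paths = [f.path for frag in fragments for f in frag.files]
new_paths = add_column(fragments, "embedding")
```

In this toy model, adding the `embedding` column records one new file per fragment and leaves the paths in `existing_paths` in place, which is the sense in which the paragraph calls column additions "primarily metadata operations instead of data rewrites".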