Merged
15 changes: 10 additions & 5 deletions README.md
@@ -19,12 +19,17 @@

# Parquet [![Build Status](https://github.com/apache/parquet-format/actions/workflows/test.yml/badge.svg)](https://github.com/apache/parquet-format/actions)

Parquet is a columnar storage format that supports nested data.
This repository contains the specification for [Apache Parquet] and
[Apache Thrift] definitions to read and write Parquet metadata.

Parquet metadata is encoded using Apache Thrift.
Apache Parquet is an open source, column-oriented data file format

This is the same wording from apache/parquet-site#59

designed for efficient data storage and retrieval. It provides
high-performance compression and encoding schemes to handle complex data in
bulk and is supported in many programming languages and analytics
tools.

The `Parquet-format` project contains all Thrift definitions that are necessary to create readers
and writers for Parquet files.
[Apache Parquet]: https://parquet.apache.org
[Apache Thrift]: https://thrift.apache.org

## Motivation

@@ -176,7 +181,7 @@ following rules:
* If the min is +0, the row group may contain -0 values as well.
* If the max is -0, the row group may contain +0 values as well.
* When looking for NaN values, min and max should be ignored.

* BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY - Lexicographic unsigned byte-wise
comparison.
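
The byte-array rule above can be illustrated with a minimal sketch. It is not taken from any Parquet implementation; the names `compare_unsigned_lexicographic` and `to_signed` are hypothetical, and the tie-break for a value that is a prefix of another (shorter sorts first) follows the usual lexicographic convention rather than wording quoted from this hunk.

```python
def compare_unsigned_lexicographic(a: bytes, b: bytes) -> int:
    """Return -1, 0, or 1 when a sorts before, equal to, or after b.

    Hypothetical helper: bytes are compared position by position as
    unsigned values in the range 0..255.
    """
    for x, y in zip(a, b):
        # Iterating over Python bytes yields ints in 0..255 (unsigned).
        if x != y:
            return -1 if x < y else 1
    # All shared positions are equal, so the shorter value sorts first
    # (assumed standard lexicographic convention).
    if len(a) == len(b):
        return 0
    return -1 if len(a) < len(b) else 1


def to_signed(byte: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a signed byte (-128..127)."""
    return byte - 256 if byte >= 128 else byte


# Unsigned comparison places 0x7f before 0x80.
unsigned_result = compare_unsigned_lexicographic(b"\x7f", b"\x80")

# A signed interpretation would reverse that order, because 0x80 reads as -128.
signed_order_differs = to_signed(0x80) < to_signed(0x7F)
```

The contrast with `to_signed` is the reason the word "unsigned" matters: in languages whose byte type is signed, a naive comparison would order values with the high bit set before all others, and min/max statistics computed that way would disagree with the rule stated here.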
