diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index fe259d61..040ffc15 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -1363,16 +1363,38 @@ union EncryptionAlgorithm {
  * Description for file metadata
  */
 struct FileMetaData {
-  /** Version of this file
+  /** Major Parquet Format Version
    *
-   * As of December 2025, there is no agreed upon consensus of what constitutes
-   * version 2 of the file. For maximum compatibility with readers, writers should
-   * always populate "1" for version. For maximum compatibility with writers,
-   * readers should accept "1" and "2" interchangeably. All other versions are
-   * reserved for potential future use-cases.
+   * This corresponds to the highest major version of the parquet-format whose
+   * features the file uses. For example, if a file contains features from parquet-format
+   * version 2.4, then this field should be set to "2".
+   *
+   * Prior to 2026, some readers supported features added in version 2.0 and
+   * greater, but would reject files with the version set to 2. It was common
+   * practice for writers to populate "1" for version even if they used version
+   * 2.0.
+   *
+   * For maximum compatibility with writers, readers should accept "1" and "2"
+   * interchangeably. All other versions are reserved for potential future
+   * use-cases.
    */
   1: required i32 version
 
+  /** Minor Parquet Format Version
+   *
+   * This corresponds to the highest minor version of the parquet-format whose
+   * features the file uses. For example, if a file contains features from
+   * parquet-format version 2.4, then this field should be set to "4".
+   *
+   * Note that Parquet does not follow semantic versioning and new
+   * forward-incompatible features, such as new encodings, can be added in
+   * minor versions. See the documentation[1] for more details on the versioning
+   * scheme and the features added in each version.
+   *
+   * [1]: http://parquet.apache.org/docs/file-format/versions
+   */
+  10: optional i32 minor_version
+
   /** Parquet schema for this file. This schema contains metadata for all the columns.
    * The schema is represented as a tree with a single root. The nodes of the tree
    * are flattened to a list by doing a depth-first traversal.