Fix typos, grammar, and consistency in Encryption, Contributing, and BinaryProtocolExtensions docs #578

Merged
wgtmac merged 2 commits into apache:master from iemejia:fix/encryption-meta-docs on Jun 8, 2026
@iemejia iemejia commented Jun 2, 2026

Member

Summary

Fix typos, grammar, and formatting across ancillary specification documents: Encryption, Contributing guide, and Binary Protocol Extensions.

Changes

Encryption.md

  • Fix double-negative; align GCM invocation limit to NIST
  • "Data PageHeader" -> "Data Page Header" (spacing consistency)
  • Replace "allows to" with idiomatic English
  • Fix smart quotes to ASCII for magic-bytes literal
  • Remove double spaces; fix "the the FileMetaData"
  • "explictly" -> "explicitly"
  • Hyphenate compound adjectives ("2 byte short" -> "2-byte short")
  • Fix section heading numbering ("## 5 File Format" -> "## 5. File Format")
  • Fix mass noun article ("from a secret data" -> "from secret data")

CONTRIBUTING.md

  • Fix 7 typos: docuemnt, demostrate, interopability, libaries, highlighed, compatiblity, an prototype
  • Fix possessive: "features desirability" -> "a feature's desirability"
  • Fix agreement: "an external dependencies" -> "an external dependency"
  • Add commas after introductory clauses
  • Fix comma splice -> semicolon

BinaryProtocolExtensions.md

  • Fix "FileMetadata" -> "FileMetaData" (4 occurrences; match thrift struct)
  • Fix "Flatbuffers"/"flatbuffer" -> "FlatBuffers" (5 occurrences; official capitalization)
  • Fix "implementers which" -> "implementers who" (people)
  • Fix missing copula: "extension shared" -> "extension is shared"

Validation

No semantic/behavioral changes to the format specification. All fixes are documentation-only.

Split from #572 for easier review.

@etseidl etseidl left a comment

Contributor

Just a few nits, thanks!

Comment thread BinaryProtocolExtensions.md Outdated
  * Extensions can be appended to existing Thrift serialized structs [without requiring Thrift libraries](#appending-extensions-to-thrift) for manipulation (or changes to the thrift IDL).

- Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers which MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers.
+ Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers who MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers.

Contributor

Gads this could use some line breaks 😅

Member Author

Done, broke the paragraph into shorter lines.
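
The disambiguation approach described in the paragraph under review (a UUID at the start or end of the extension bytes) can be sketched roughly as follows. This is a minimal illustration, not part of the specification: `EXTENSION_ID`, `wrap_extension`, and `unwrap_extension` are hypothetical names, the UUID value is arbitrary, and placing the UUID at the start rather than the end is just one of the two options the text mentions.

```python
import uuid
from typing import Optional

# Hypothetical identifier for one implementer's extension; any fixed UUID works.
EXTENSION_ID = uuid.UUID("12345678-1234-5678-1234-567812345678")


def wrap_extension(payload: bytes) -> bytes:
    """Prefix the payload with the 16-byte UUID so readers can recognize it."""
    return EXTENSION_ID.bytes + payload


def unwrap_extension(ext_bytes: bytes) -> Optional[bytes]:
    """Return the payload if the bytes start with our UUID, otherwise None."""
    marker = EXTENSION_ID.bytes
    if ext_bytes[: len(marker)] == marker:
        return ext_bytes[len(marker):]
    return None  # unknown extension: the reader should skip it, not decode it


wrapped = wrap_extension(b"my-extension-data")
print(unwrap_extension(wrapped))           # our own extension is recognized
print(unwrap_extension(b"\x00" * 20))      # foreign bytes are rejected
```

A reader that does not recognize the leading 16 bytes simply ignores the extension, which is the safety property the paragraph asks implementers to provide.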

Comment thread CONTRIBUTING.md Outdated

  2. New encodings should be fully specified in this repository and not
- rely on an external dependencies for implementation (i.e. `parquet-format` is
+ rely on an external dependency for implementation (i.e. `parquet-format` is

Contributor

Suggested change
- rely on an external dependency for implementation (i.e. `parquet-format` is
+ rely on external dependencies for implementation (i.e. `parquet-format` is

Member Author

Applied, thanks!

Comment thread Encryption.md
- key shall not not be used for more than 2^31 (~2 billion) pages. In Parquet files encrypted with
- multiple keys (footer and column keys), the constraint on the number of invocations is applied
- to each key separately.
+ key shall not be used for more than 2^32 total module encryptions, as per the NIST specification.

Contributor

Perhaps we should still point out that 2^32 modules means in practice 2^31 pages.

Member Author

Good point. Added a sentence clarifying that since each data page requires two module encryptions (header + data), 2^32 modules means in practice no more than 2^31 pages per key.
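
The arithmetic behind this clarification can be written out as a small sketch. The constant and function names are illustrative assumptions, not identifiers from the spec or any Parquet library, and the calculation counts only data pages (two module encryptions each, header plus data), ignoring any other encrypted modules a file may contain.

```python
# Assumption: names below are hypothetical and exist only for this illustration.
GCM_MAX_INVOCATIONS_PER_KEY = 2**32  # limit cited in the thread as the NIST bound
MODULES_PER_DATA_PAGE = 2            # one encryption for the page header, one for the data


def max_data_pages_per_key() -> int:
    """Upper bound on data pages per key if nothing else is encrypted with it."""
    return GCM_MAX_INVOCATIONS_PER_KEY // MODULES_PER_DATA_PAGE


def key_budget_exhausted(pages_written: int) -> bool:
    """True once the pages written have used up the per-key invocation budget."""
    return pages_written * MODULES_PER_DATA_PAGE >= GCM_MAX_INVOCATIONS_PER_KEY


print(max_data_pages_per_key() == 2**31)
print(key_budget_exhausted(2**31 - 1), key_budget_exhausted(2**31))
```

Because the constraint applies to each key separately, a file with distinct footer and column keys would track one such budget per key.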

@etseidl etseidl left a comment

Contributor

Looks good 👍

  * Extensions can be appended to existing Thrift serialized structs [without requiring Thrift libraries](#appending-extensions-to-thrift) for manipulation (or changes to the thrift IDL).

- Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers which MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers.
+ Because only one field-id is reserved the extension bytes themselves require

Member

Should we revert this change? Other lines do not seem to wrap lines like this.

Contributor

I think we should strive to reduce lines this long. A single-word change in such a long, unwrapped section of text makes for a difficult-to-see delta.

Member

Sounds good!
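
The argument for shorter lines can be illustrated with Python's standard `difflib`. The sketch below is an illustration only: `changed_chars` is a hypothetical helper, the 80-column width is an assumption, and the text is an excerpt of the paragraph discussed in this thread. It compares how many characters a line-based diff flags for the same one-word fix when the paragraph is a single long line versus wrapped.

```python
import difflib
import textwrap

old = (
    "Because only one field-id is reserved the extension bytes themselves require "
    "disambiguation; otherwise readers will not be able to decode extensions safely. "
    "This is left to implementers which MUST put enough unique state in their "
    "extension bytes for disambiguation."
)
new = old.replace("implementers which", "implementers who")


def changed_chars(before_lines, after_lines):
    """Count characters on lines a line-based diff reports as removed or added."""
    diff = difflib.unified_diff(before_lines, after_lines, lineterm="", n=0)
    return sum(
        len(line) - 1  # drop the leading '+' or '-' marker
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# One long line: the whole paragraph shows up as removed and re-added.
unwrapped = changed_chars([old], [new])

# Wrapped at 80 columns (assumed width): lines before the edit are untouched.
wrapped = changed_chars(textwrap.wrap(old, 80), textwrap.wrap(new, 80))

print(unwrapped, wrapped)
```

With the wrapped version, a reviewer only has to scan the lines from the edit onward instead of the whole paragraph.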

@wgtmac wgtmac merged commit 6be6b91 into apache:master Jun 8, 2026
4 checks passed
@iemejia iemejia deleted the fix/encryption-meta-docs branch June 8, 2026 17:21

3 participants