Fix typos, grammar, and consistency in Encryption, Contributing, and BinaryProtocolExtensions docs#578
Conversation
…BinaryProtocolExtensions docs
Encryption.md:
- Fix double-negative; align GCM invocation limit to NIST
- "Data PageHeader" -> "Data Page Header" (spacing consistency)
- Replace "allows to" with idiomatic English
- Fix smart quotes to ASCII for magic-bytes literal
- Remove double spaces; fix "the the FileMetaData"
- "explictly" -> "explicitly"
- Hyphenate compound adjectives ("2 byte short" -> "2-byte short")
- Fix section heading numbering ("## 5 File Format" -> "## 5. File Format")
- Fix mass noun article ("from a secret data" -> "from secret data")
CONTRIBUTING.md:
- Fix 7 typos: docuemnt, demostrate, interopability, libaries,
highlighed, compatiblity, an prototype
- Fix possessive: "features desirability" -> "a feature's desirability"
- Fix agreement: "an external dependencies" -> "an external dependency"
- Add commas after introductory clauses
- Fix comma splice -> semicolon
BinaryProtocolExtensions.md:
- Fix "FileMetadata" -> "FileMetaData" (4 occurrences; match thrift struct)
- Fix "Flatbuffers"/"flatbuffer" -> "FlatBuffers" (5 occurrences; official capitalization)
- Fix "implementers which" -> "implementers who" (people)
- Fix missing copula: "extension shared" -> "extension is shared"
etseidl
left a comment
There was a problem hiding this comment.
Just a few nits, thanks!
| * Extensions can be appended to existing Thrift serialized structs [without requiring Thrift libraries](#appending-extensions-to-thrift) for manipulation (or changes to the thrift IDL). | ||
|
|
||
| Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers which MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally\_unique\_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers. | ||
| Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers who MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally\_unique\_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers. |
There was a problem hiding this comment.
Gads this could use some line breaks 😅
There was a problem hiding this comment.
Done, broke the paragraph into shorter lines.
|
|
||
| 2. New encodings should be fully specified in this repository and not | ||
| rely on an external dependencies for implementation (i.e. `parquet-format` is | ||
| rely on an external dependency for implementation (i.e. `parquet-format` is |
There was a problem hiding this comment.
| rely on an external dependency for implementation (i.e. `parquet-format` is | |
| rely on external dependencies for implementation (i.e. `parquet-format` is |
| key shall not not be used for more than 2^31 (~2 billion) pages. In Parquet files encrypted with | ||
| multiple keys (footer and column keys), the constraint on the number of invocations is applied | ||
| to each key separately. | ||
| key shall not be used for more than 2^32 total module encryptions, as per the NIST specification. |
There was a problem hiding this comment.
Perhaps we should still point out that 2^32 modules means in practice 2^31 pages.
There was a problem hiding this comment.
Good point. Added a sentence clarifying that since each data page requires two module encryptions (header + data), 2^32 modules means in practice no more than 2^31 pages per key.
| * Extensions can be appended to existing Thrift serialized structs [without requiring Thrift libraries](#appending-extensions-to-thrift) for manipulation (or changes to the thrift IDL). | ||
|
|
||
| Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers which MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally\_unique\_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers. | ||
| Because only one field-id is reserved the extension bytes themselves require |
There was a problem hiding this comment.
Should we revert this change? Other lines do not seem to wrap lines like this.
There was a problem hiding this comment.
I think we should strive to reduce lines this long. A single word change in such a wrapped section of text makes for a difficult to see delta.
Summary
Fix typos, grammar, and formatting across ancillary specification documents: Encryption, Contributing guide, and Binary Protocol Extensions.
Changes
Encryption.md
CONTRIBUTING.md
BinaryProtocolExtensions.md
Validation
No semantic/behavioral changes to the format specification. All fixes are documentation-only.
Split from #572 for easier review.