Code Review
This pull request restructures the documentation and adds comprehensive historical, engineering, and data science perspectives on serialization, while updating serializer references for C# and Python. The review feedback identifies several issues in the new content, such as accessibility concerns from using images for text, factual errors, and broken citations. Suggestions were also made to fix code block formatting, resolve a missing import in a Python example, and consolidate the new perspective documents to reduce redundancy and maintain consistent spelling and citation styles.
| The following serializers are partially tested in the benchmark via the `Supports()` method. | ||
| That means they are not tested with all data types, but some of them are tested with specific data types. |
The explanation for 'Partially Tested Serializers' could be more precise. Stating that some are tested with specific data types is good, but it would be even clearer to mention that they might fail or be explicitly excluded from tests with certain data types (like those with circular references or requiring specific schema attributes), as detailed in the table below.
| - **Tim Bray and others (XML, 1996+):** As W3C editors, they defined XML, which became a foundation for data interchange before JSON’s rise (Citing directly: [26] notes Crockford’s JSON emerging as an alternative). | ||
| - **Brian Behlendorf, etc. (YAML):** The YAML editors listed on the spec include Evans, Ben-Kiki, Ingerson. |
This section contains a couple of issues:
- The citation `[26]` on line 120 is broken, as there is no corresponding entry in the references section.
- On line 122, Brian Behlendorf is incorrectly listed as a YAML contributor. He is primarily known for his work on the Apache Web Server. The previous line correctly identifies the main YAML contributors.
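A check for this kind of broken citation could be automated. The following is a minimal sketch, assuming citations appear in the body as `[n]` and that each reference entry starts a line with `[n]`; the function name `find_broken_citations` and that reference layout are illustrative assumptions, not something taken from the PR.

```python
import re


def find_broken_citations(body: str, references: str) -> list[int]:
    """Return citation numbers used in `body` that have no entry in `references`.

    Assumption: reference entries begin a line with "[n]".
    """
    # Every bracketed number used anywhere in the body text.
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    # Every bracketed number that opens a line in the references section.
    defined = {
        int(n)
        for n in re.findall(r"^\s*\[(\d+)\]", references, flags=re.MULTILINE)
    }
    return sorted(cited - defined)


if __name__ == "__main__":
    body_text = "JSON emerged as an alternative [26], building on earlier work [1]."
    reference_text = "[1] Example entry one\n[2] Example entry two"
    print(find_broken_citations(body_text, reference_text))
```

Running a check like this before merging would flag `[26]` in the excerpt above, provided the document's references actually follow the assumed layout.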
| @@ -0,0 +1,227 @@ | |||
| # **The Architecture of Information: A Comprehensive History and Evolution of Data Serialization** | |||
| The fundamental challenge of distributed computing has always been the translation of abstract, multi-dimensional in-memory data structures into a linear, one-dimensional stream of bytes suitable for transmission across physical media or storage on non-volatile devices.1 This process, known as data serialization or marshalling, serves as the primary bridge between disparate hardware architectures, operating systems, and addressing mechanisms that characterize the global computing landscape.1 To understand the trajectory of serialization is to understand the history of software engineering itself—a persistent struggle to balance human readability with machine efficiency, flexibility with performance, and the organic growth of complex systems with the rigid requirements of network protocols.4 | |||
The citation style in this document is inconsistent with the other new documents and is hard to read. Citations are appended directly to words (e.g., devices.1), which can be confusing.
For better readability and consistency, please:
- Use the `[1]` format for citations, as seen in `data_science_perspective.md`.
- Add a space before the citation (e.g., `devices [1]`).
This formatting issue appears throughout the document.
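Since the issue recurs throughout the document, a scripted pass may be easier than editing by hand. The following is a minimal sketch, assuming each citation is a short run of digits glued directly after sentence punctuation that follows a letter or closing parenthesis (as in `devices.1`). The helper name `bracket_citations` is hypothetical, and moving the punctuation after the bracket is a choice made here, not something the review specifies.

```python
import re

# Assumed pattern: punctuation right after a letter or ")" and followed by
# 1-3 digits, with whitespace or end of text after the digits.
# Requiring a preceding letter avoids rewriting decimals such as "3.14".
_TRAILING_CITATION = re.compile(r"(?<=[A-Za-z)])([.,;:])(\d{1,3})(?=\s|$)")


def bracket_citations(text: str) -> str:
    """Rewrite 'devices.1' as 'devices [1].' (hypothetical helper)."""
    # \2 is the citation number, \1 the punctuation that preceded it.
    return _TRAILING_CITATION.sub(r" [\2]\1", text)


if __name__ == "__main__":
    sample = "storage on non-volatile devices.1 This process is known as marshalling."
    print(bracket_citations(sample))
```

The pattern can still produce false positives (for example, a word followed by a period and a list number), so the result should be reviewed manually before committing.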
| - Historical Perspective: theory/historical_perspective.md | ||
| - Data Science Perspective: theory/data_science_perspective.md | ||
| - Engineer Perspective: theory/engineer_perspective.md |
There appears to be significant content overlap between the three new 'perspective' documents (historical_perspective.md, data_science_perspective.md, engineer_perspective.md). The quality also varies, with data_science_perspective.md being the most comprehensive and polished, while the others contain formatting issues and errors.
Consider consolidating these into a single, high-quality document to avoid redundancy and provide a more focused reading experience for users. If different perspectives are desired, it might be better to structure them as sections within one file.
PR Description:
Documentation Restructure & Theory Content Expansion
Summary
Restructures documentation site architecture and adds comprehensive theory perspectives on serialization.
Changes
Documentation Architecture:
- `docs/theory/` directory with new index
- `test_data_design.md` → `test_data_configuration.md` for clarity
- `languages-overview.md` → `serialization_categories.md`
New Theory Content:
- `data_science_perspective.md` - Statistical analysis approach to serializer evaluation
- `historical_perspective.md` - Evolution of serialization formats and key contributors
Updates:
- `python_tested_serializers.md` with additional serializer details
Testing