refactor/docs 2 #25

Merged
leo-gan merged 5 commits into master from refactor/docs-2
May 1, 2026

Conversation

@leo-gan leo-gan commented May 1, 2026

PR Description:

Documentation Restructure & Theory Content Expansion

Summary

Restructures documentation site architecture and adds comprehensive theory perspectives on serialization.

Changes

Documentation Architecture:

  • Moved theory content to dedicated docs/theory/ directory with new index
  • Renamed test_data_design.md → test_data_configuration.md for clarity
  • Renamed languages-overview.md → serialization_categories.md
  • Updated all internal cross-references

New Theory Content:

  • data_science_perspective.md - Statistical analysis approach to serializer evaluation
  • engineer_perspective.md - Text vs binary vs schema-driven format trade-offs with historical context
  • historical_perspective.md - Evolution of serialization formats and key contributors

Updates:

  • Regenerated benchmark logs with latest run data
  • Updated python_tested_serializers.md with additional serializer details
  • Revised MkDocs navigation to reflect new structure

Testing

  • Full benchmark suite executed (100 reps, all serializers) - results verified
  • Documentation builds successfully with MkDocs
  • All internal links validated

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request restructures the documentation and adds comprehensive historical, engineering, and data science perspectives on serialization, while updating serializer references for C# and Python. The review feedback identifies several issues in the new content, such as accessibility concerns from using images for text, factual errors, and broken citations. Suggestions were also made to fix code block formatting, resolve a missing import in a Python example, and consolidate the new perspective documents to reduce redundancy and maintain consistent spelling and citation styles.

Comment thread docs/theory/historical_perspective.md Outdated
Comment thread README.md Outdated
Comment on lines +255 to +256
The following serializers are partially tested in the benchmark via the `Supports()` method.
That means they are not tested with all data types, but some of them are tested with specific data types.

medium

The explanation for 'Partially Tested Serializers' could be more precise. Stating that some are tested with specific data types is good, but it would be even clearer to mention that they might fail or be explicitly excluded from tests with certain data types (like those with circular references or requiring specific schema attributes), as detailed in the table below.

Comment thread docs/theory/data_science_perspective.md Outdated
Comment thread docs/theory/engineer_perspective.md
Comment on lines +120 to +122
- **Tim Bray and others (XML, 1996+):** As W3C editors, they defined XML, which became a foundation for data interchange before JSON’s rise (Citing directly: [26] notes Crockford’s JSON emerging as an alternative).

- **Brian Behlendorf, etc. (YAML):** The YAML editors listed on the spec include Evans, Ben-Kiki, Ingerson.

medium

This section contains a couple of issues:

  • The citation [26] on line 120 is broken, as there is no corresponding entry in the references section.
  • On line 122, Brian Behlendorf is incorrectly listed as a YAML contributor. He is primarily known for his work on the Apache HTTP Server. The previous line correctly identifies the main YAML contributors.
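
A dangling citation such as [26] can be caught mechanically before review. The following is a minimal sketch, not part of this PR: the helper name `undefined_citations` is hypothetical, and it assumes citations are written as bracketed integers and that the set of reference numbers defined in the document is collected separately.

```python
import re


def undefined_citations(body: str, defined: set[int]) -> list[int]:
    """Return citation numbers used in `body` that have no reference entry.

    Assumption: citations look like [26]. Any other bracketed integer
    (for example a Markdown link whose text is a number) is also counted.
    """
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    return sorted(cited - defined)


# Example input modelled on the line under review.
sample = "XML became a foundation for data interchange ([26] notes JSON as an alternative) [1]."
missing = undefined_citations(sample, defined={1, 2, 3})
print(missing)
```

How the `defined` set is built depends on the layout of the references section, which this sketch does not assume.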

@@ -0,0 +1,227 @@
# **The Architecture of Information: A Comprehensive History and Evolution of Data Serialization**

The fundamental challenge of distributed computing has always been the translation of abstract, multi-dimensional in-memory data structures into a linear, one-dimensional stream of bytes suitable for transmission across physical media or storage on non-volatile devices.1 This process, known as data serialization or marshalling, serves as the primary bridge between disparate hardware architectures, operating systems, and addressing mechanisms that characterize the global computing landscape.1 To understand the trajectory of serialization is to understand the history of software engineering itself—a persistent struggle to balance human readability with machine efficiency, flexibility with performance, and the organic growth of complex systems with the rigid requirements of network protocols.4

medium

The citation style in this document is inconsistent with the other new documents and is hard to read. Citations are appended directly to words (e.g., devices.1), which can be confusing.

For better readability and consistency, please:

  1. Use the [1] format for citations, as seen in data_science_perspective.md.
  2. Add a space before the citation (e.g., devices [1]).

This formatting issue appears throughout the document.
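
Since the issue repeats throughout the document, the conversion could be scripted rather than done by hand. The sketch below is one possible approach under stated assumptions: the helper name `bracket_citations` is hypothetical, a fused citation is taken to be a 1 to 3 digit number directly after sentence punctuation that follows a lowercase letter, and the bracketed citation is placed before the punctuation (the suggestion above does not specify placement).

```python
import re

# Assumed shape of a fused citation: lowercase letter, punctuation mark,
# 1-3 digits, then whitespace or end of text (e.g. "devices.1 This").
_FUSED_CITATION = re.compile(r"(?<=[a-z])([.,;:])(\d{1,3})(?=\s|$)")


def bracket_citations(text: str) -> str:
    """Rewrite 'devices.1' as 'devices [1].' wherever the pattern matches."""
    return _FUSED_CITATION.sub(r" [\2]\1", text)


sample = "storage on non-volatile devices.1 This process is known as marshalling."
print(bracket_citations(sample))
```

Decimal numbers such as 3.14 are left alone because the punctuation must follow a lowercase letter, but the pattern does not handle several citations fused together (for example `devices.1,2`) and the output should still be reviewed by hand.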

Comment thread docs/theory/historical_perspective.md Outdated
Comment thread mkdocs.yml
Comment on lines +41 to +43
- Historical Perspective: theory/historical_perspective.md
- Data Science Perspective: theory/data_science_perspective.md
- Engineer Perspective: theory/engineer_perspective.md

medium

There appears to be significant content overlap between the three new 'perspective' documents (historical_perspective.md, data_science_perspective.md, engineer_perspective.md). The quality also varies, with data_science_perspective.md being the most comprehensive and polished, while the others contain formatting issues and errors.

Consider consolidating these into a single, high-quality document to avoid redundancy and provide a more focused reading experience for users. If different perspectives are desired, it might be better to structure them as sections within one file.

@leo-gan leo-gan merged commit 1ef5f82 into master May 1, 2026
4 checks passed
@leo-gan leo-gan deleted the refactor/docs-2 branch May 1, 2026 06:03