Explore wpt.fyi-style cross-implementation conformance comparison (OOXML conformance subset only) #283

@stevenobiajulu

Proposal (scoping, not implementation)

Today our traceability tests verify safe-docx's own behavior. They cannot tell us whether another OOXML library (python-docx, docx4j, Aspose.Words, etc.) implements the same OOXML behavior the same way. For the subset of our tests that are anchored to a spec (ECMA-376 / OOXML), we could run them against multiple implementations and surface a comparison matrix — analogous to wpt.fyi for the web platform.

This issue scopes the idea. It does NOT implement a runner — that would be at least one follow-up issue per language adapter.

What's in scope

A cross-implementation comparison harness is only meaningful for tests whose assertion is derivable from the OOXML spec, not from our internal algorithm. Examples:

  • In scope (spec-anchored): accept-insertions-by-unwrapping-w-ins-wrappers — the spec says <w:ins> wraps inserted content and "accepting" means unwrapping. Any compliant OOXML lib should produce the same post-accept XML given the same input XML.
  • In scope (spec-anchored): most text-matching / find-replace scenarios where the assertion is "the literal substring exists at offset N in the post-mutation document".
  • In scope (spec-anchored): schema-validation scenarios where the assertion is "the emitted document validates against wml.xsd".
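To make the first in-scope example concrete, here is a minimal sketch of what "accepting by unwrapping" could look like as an input/output XML pair. It uses only Python's standard library, not safe-docx or python-docx; the function name `accept_insertions` and the sample paragraph are illustrative assumptions, and the sketch ignores details such as `w:del` handling and whitespace tails.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by <w:ins>, <w:p>, <w:r>, <w:t>.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def _unwrap(parent: ET.Element) -> None:
    """Replace every <w:ins> child with its own children, depth first."""
    new_children = []
    for child in list(parent):
        _unwrap(child)  # handle nested wrappers before splicing
        if child.tag == f"{{{W}}}ins":
            # Note: the wrapper's tail text is dropped; fine for this sketch.
            new_children.extend(list(child))
        else:
            new_children.append(child)
    parent[:] = new_children


def accept_insertions(xml_text: str) -> str:
    """Hypothetical helper: unwrap <w:ins> and return canonical XML."""
    root = ET.fromstring(xml_text)
    _unwrap(root)
    return ET.canonicalize(xml_data=ET.tostring(root, encoding="unicode"))


# Illustrative input: one paragraph whose only run is a tracked insertion.
INPUT_XML = (
    f'<w:p xmlns:w="{W}">'
    '<w:ins w:id="1" w:author="a"><w:r><w:t>new</w:t></w:r></w:ins>'
    "</w:p>"
)

if __name__ == "__main__":
    print(accept_insertions(INPUT_XML))
```

The point of the pair is that the expected output mentions no safe-docx concept at all, so any implementation's post-accept XML can be compared against it after canonicalization.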

What's out of scope

  • Out of scope (algorithm-anchored): identifiers-stable-across-reopens — our _bk_* naming is safe-docx-specific; python-docx has no insertParagraphBookmarks equivalent and running it would require porting our algorithm first.
  • Out of scope: any test that depends on safe-docx-specific primitives (extractRevisions, insertParagraphStyleSource, our determinism guarantees).
  • Out of scope: internal-only behavior characterization tests (e.g., the upcoming collision/salt test tracked in #282, "Add unit test for duplicate-paragraph collision/salt-loop in insertParagraphBookmarks").

Open questions to resolve before implementation

  1. Scenario expression: how do we express each in-scope scenario in a language-neutral form? Candidates:
    • Pure XML input/output pairs with a small DSL for assertions (xpath: queries, xml-canonical-equals: comparisons).
    • JSON Schema-style assertion language.
    • Markdown-based with embedded XML snippets and predicates.
  2. Per-implementation adapter shape: each library would need an adapter binary that consumes the scenario's input XML, applies the equivalent operation, and emits the post-mutation XML (or raw bytes). Where do these adapter binaries live? In this repo (adapters/<lang>/) or in a separate repo?
  3. Spec anchor: each scenario must cite its ECMA-376 section so the assertion can be defended against the spec text, not against our implementation. This ties into the structured spec traceability work in #223 ("Establish structured OOXML/ECMA-376 spec traceability and compliance coverage").
  4. Results storage + UI: wpt.fyi serves a matrix view; for us, the tests-renderer already renders our scenarios. Augment the per-scenario page with an "Other implementations" row, or build a separate matrix view?
  5. Maintenance overhead: each adapter is a non-trivial dependency. What's the policy on breakage when an upstream lib changes?
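As one way to picture the first candidate in question 1, the sketch below expresses a scenario as JSON-serializable data with two assertion kinds, `xpath` and `xml-canonical-equals`, and a small checker for them. Everything here is an assumption for discussion: the field names (`id`, `spec_anchor`, `input_xml`, `assertions`, `count`), the `check` function, and the choice of Python's standard-library `xml.etree.ElementTree`, whose XPath support is a limited subset.

```python
import json
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical scenario shape; plain dicts/strings so it round-trips as JSON.
SCENARIO = {
    "id": "accept-insertions-by-unwrapping-w-ins-wrappers",
    "spec_anchor": "TBD",  # the ECMA-376 section is left unresolved here
    "input_xml": (
        f'<w:p xmlns:w="{W_NS}">'
        '<w:ins w:id="1" w:author="a"><w:r><w:t>new</w:t></w:r></w:ins>'
        "</w:p>"
    ),
    "assertions": [
        {"xpath": ".//w:ins", "count": 0},
        {"xml-canonical-equals":
            f'<w:p xmlns:w="{W_NS}"><w:r><w:t>new</w:t></w:r></w:p>'},
    ],
}


def canonical(xml_text: str) -> str:
    # rewrite_prefixes makes the comparison independent of each library's
    # choice of namespace prefix (w:, ns0:, ...).
    return ET.canonicalize(xml_data=xml_text, rewrite_prefixes=True)


def check(scenario: dict, output_xml: str) -> list:
    """Return failure messages for one implementation's output XML."""
    failures = []
    root = ET.fromstring(output_xml)
    for assertion in scenario["assertions"]:
        if "xpath" in assertion:
            found = len(root.findall(assertion["xpath"], NS))
            if found != assertion["count"]:
                failures.append(
                    f'{assertion["xpath"]}: expected {assertion["count"]}, '
                    f"found {found}"
                )
        elif "xml-canonical-equals" in assertion:
            expected = canonical(assertion["xml-canonical-equals"])
            if canonical(output_xml) != expected:
                failures.append("canonical XML differs from expected")
    return failures


if __name__ == "__main__":
    print(json.dumps(SCENARIO, indent=2))
```

Keeping the scenario as pure data means an adapter in any language only has to read `input_xml` and return XML; the assertions can be evaluated once, centrally, rather than re-implemented per adapter.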

Initial milestones (rough)

  • M0 (this issue): agree on scope + scenario expression DSL. No code yet.
  • M1: pick 3 in-scope scenarios; express them in the chosen DSL.
  • M2: build the safe-docx adapter (read DSL → run scenario → emit canonical XML). Use it as a self-check (safe-docx vs safe-docx should always agree).
  • M3: build one second-language adapter (suggested: python-docx, easiest to install/CI-host).
  • M4: render a comparison row in tests-renderer for the 3 scenarios.

Each milestone is a separate issue: M0 produces a design.md or OpenSpec proposal; M1-M4 are implementation issues.
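The adapter and comparison-row steps in M2 through M4 could be sketched roughly as follows. This assumes an adapter contract of "input XML on stdin, post-mutation XML on stdout", which is not decided anywhere in this issue; `subprocess_adapter`, `comparison_row`, and the pass/fail/error labels are hypothetical names, and canonicalization again uses Python's standard library.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# An adapter maps a scenario's input XML to the implementation's output XML.
Adapter = Callable[[str], str]


def subprocess_adapter(cmd: List[str]) -> Adapter:
    """Wrap an external adapter binary (assumed stdin -> stdout contract)."""
    def run(input_xml: str) -> str:
        proc = subprocess.run(
            cmd, input=input_xml, capture_output=True,
            text=True, check=True, timeout=60,
        )
        return proc.stdout
    return run


def canonical(xml_text: str) -> str:
    # Normalize prefixes so libraries that serialize differently still match.
    return ET.canonicalize(xml_data=xml_text, rewrite_prefixes=True)


def comparison_row(
    input_xml: str, expected_xml: str, adapters: Dict[str, Adapter]
) -> Dict[str, str]:
    """Run one scenario against every adapter and label each result."""
    expected = canonical(expected_xml)
    row = {}
    for name, adapter in adapters.items():
        try:
            actual = canonical(adapter(input_xml))
            row[name] = "pass" if actual == expected else "fail"
        except Exception as exc:  # crash, timeout, or malformed output
            row[name] = f"error: {type(exc).__name__}"
    return row
```

Under this shape, the M2 self-check amounts to registering the safe-docx adapter and confirming its cell is always "pass"; an "error" cell distinguishes an adapter that broke (for example after an upstream release) from one that ran and disagreed.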

Related

Why now / why not now

  • Why this is worth tracking: positions safe-docx as defensible-by-spec, not just defensible-by-tests. The same comparison matrix that helps us also helps others evaluate OOXML libs.
  • Why not implement yet: needs a scenario DSL decision (M0) and a clear OOXML-only scope to avoid spending budget on tests that can't be cross-validated.

Note: This is a scoping/discovery issue (analogous to an epic). Not eligible for direct /codex-implement; subdivide into M0..M4 issues first.
