Explore wpt.fyi-style cross-implementation conformance comparison (OOXML conformance subset only) #283

## Proposal (scoping, not implementation)

Today our traceability tests verify safe-docx's own behavior. They cannot tell us whether another OOXML library (python-docx, docx4j, Aspose.Words, etc.) implements the same OOXML behavior the same way. For the subset of our tests that are anchored to a spec (ECMA-376 / OOXML), we could run them against multiple implementations and surface a comparison matrix — analogous to [wpt.fyi](https://wpt.fyi/results/) for the web platform.

This issue scopes the idea. It does NOT implement a runner — that would be at least one follow-up issue per language adapter.

### What's in scope

A cross-implementation comparison harness is only meaningful for tests whose **assertion is derivable from the OOXML spec**, not from our internal algorithm. Examples:

- ✅ **In scope (spec-anchored)**: `accept-insertions-by-unwrapping-w-ins-wrappers` — the spec says `<w:ins>` wraps inserted content and "accepting" means unwrapping. Any compliant OOXML lib should produce the same post-accept XML given the same input XML.
- ✅ **In scope (spec-anchored)**: most text-matching / find-replace scenarios where the assertion is "the literal substring exists at offset N in the post-mutation document".
- ✅ **In scope (spec-anchored)**: schema-validation scenarios where the assertion is "the emitted document validates against `wml.xsd`".
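To make the first in-scope example concrete, here is a minimal sketch of what "accepting by unwrapping" looks like as a pure XML-in/XML-out transformation. It uses only the Python standard library and is not safe-docx's implementation; the function names `accept_insertions` and `canonical` are hypothetical, and the sketch handles only `<w:ins>` wrappers (no `<w:del>`, move markup, or property changes) and ignores whitespace between elements.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
INS_TAG = f"{{{W_NS}}}ins"


def canonical(xml_text: str) -> str:
    """Canonical XML (C14N 2.0) with prefixes rewritten, so prefix choice is ignored."""
    return ET.canonicalize(xml_text, rewrite_prefixes=True)


def accept_insertions(xml_text: str) -> str:
    """Unwrap every <w:ins>, keeping its children at the wrapper's position."""
    root = ET.fromstring(xml_text)

    def unwrap(parent: ET.Element) -> None:
        kept = []
        for child in list(parent):
            unwrap(child)  # handle nested wrappers first
            if child.tag == INS_TAG:
                kept.extend(list(child))  # promote the inserted content
            else:
                kept.append(child)
        parent[:] = kept  # replace the child list in one step

    unwrap(root)
    return ET.tostring(root, encoding="unicode")


# Illustrative input/expected pair (assumed, not taken from the test suite).
INPUT_XML = (
    f'<w:p xmlns:w="{W_NS}">'
    "<w:r><w:t>Hello </w:t></w:r>"
    '<w:ins w:id="1" w:author="A"><w:r><w:t>world</w:t></w:r></w:ins>'
    "</w:p>"
)
EXPECTED_XML = (
    f'<w:p xmlns:w="{W_NS}">'
    "<w:r><w:t>Hello </w:t></w:r>"
    "<w:r><w:t>world</w:t></w:r>"
    "</w:p>"
)

if __name__ == "__main__":
    same = canonical(accept_insertions(INPUT_XML)) == canonical(EXPECTED_XML)
    print("matches expected:", same)
```

Comparing canonical forms rather than raw strings keeps the assertion independent of each library's serializer (prefix names, attribute order), which is what a cross-implementation check would need.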

### What's out of scope

- ❌ **Out of scope (algorithm-anchored)**: `identifiers-stable-across-reopens` — our `_bk_*` naming is safe-docx-specific; python-docx has no `insertParagraphBookmarks` equivalent and running it would require porting our algorithm first.
- ❌ Any test that depends on safe-docx-specific primitives (`extractRevisions`, `insertParagraphStyleSource`, our determinism guarantees).
- ❌ Internal-only behavior characterization tests (e.g., the upcoming #282 collision/salt test).

### Open questions to resolve before implementation

1. **Scenario expression**: how do we express each in-scope scenario in a language-neutral form? Candidates:
   - Pure XML input/output pairs with a small DSL for assertions (`xpath:` queries, `xml-canonical-equals:` comparisons).
   - JSON Schema-style assertion language.
   - Markdown-based with embedded XML snippets and predicates.
2. **Per-implementation adapter shape**: each library would need an adapter binary that consumes the scenario's input XML, applies the equivalent operation, and emits the post-mutation XML (or raw bytes). Where do these adapter binaries live? In this repo (`adapters/<lang>/`) or in a separate repo?
3. **Spec anchor**: each scenario must cite its ECMA-376 section so the assertion can be defended against the spec text, not against our implementation. Ties into the work in #223 (structured spec traceability).
4. **Results storage + UI**: wpt.fyi serves a matrix view; for us, the tests-renderer already renders our scenarios. Augment the per-scenario page with an "Other implementations" row, or build a separate matrix view?
5. **Maintenance overhead**: each adapter is a non-trivial dependency. What's the policy on breakage when an upstream lib changes?
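As a discussion aid for question 1, the first candidate (XML input/output pairs plus a small assertion DSL) could look roughly like the sketch below. Everything here is an assumption for illustration: the scenario field names, the `kind` values mirroring `xpath:` and `xml-canonical-equals:`, and the `check` / `evaluate` helpers are hypothetical, and the spec anchor is left as a placeholder rather than a real section number.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}
W_DECL = f'xmlns:w="{NS["w"]}"'

# Hypothetical language-neutral scenario; it could equally be stored as JSON or YAML.
SCENARIO = {
    "id": "accept-insertions-by-unwrapping-w-ins-wrappers",
    "spec_anchor": "ECMA-376 section to be cited (see #223)",  # placeholder
    "operation": "accept-insertions",
    "input": f'<w:p {W_DECL}><w:ins w:id="1"><w:r><w:t>new</w:t></w:r></w:ins></w:p>',
    "assertions": [
        {"kind": "xpath", "query": ".//w:ins", "count": 0},
        {"kind": "xpath", "query": ".//w:r", "count": 1},
        {
            "kind": "xml-canonical-equals",
            "expected": f"<w:p {W_DECL}><w:r><w:t>new</w:t></w:r></w:p>",
        },
    ],
}


def canonical(xml_text):
    """Canonical XML (C14N 2.0) with prefixes rewritten, so prefix choice is ignored."""
    return ET.canonicalize(xml_text, rewrite_prefixes=True)


def check(assertion, output_xml):
    """Return None if the assertion holds, else a short failure message."""
    kind = assertion["kind"]
    if kind == "xpath":
        # ElementTree supports only a limited XPath subset; enough for this sketch.
        found = len(ET.fromstring(output_xml).findall(assertion["query"], NS))
        if found != assertion["count"]:
            return (
                f"xpath {assertion['query']!r}: "
                f"expected {assertion['count']} matches, found {found}"
            )
        return None
    if kind == "xml-canonical-equals":
        if canonical(output_xml) != canonical(assertion["expected"]):
            return "xml-canonical-equals: output differs from expected XML"
        return None
    return f"unknown assertion kind: {kind!r}"


def evaluate(scenario, output_xml):
    """Run every assertion against an adapter's output; return the failures."""
    results = [check(a, output_xml) for a in scenario["assertions"]]
    return [message for message in results if message is not None]
```

An adapter's only job under this shape is to turn `scenario["input"]` into output XML; `evaluate` then judges the output without knowing which library produced it.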

### Initial milestones (rough)

- M0 (this issue): agree on scope + scenario expression DSL. No code yet.
- M1: pick 3 in-scope scenarios; express them in the chosen DSL.
- M2: build the safe-docx adapter (read DSL → run scenario → emit canonical XML). Use it as a self-check (safe-docx vs safe-docx should always agree).
- M3: build one second-language adapter (suggested: python-docx, easiest to install/CI-host).
- M4: render a comparison row in tests-renderer for the 3 scenarios.
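The M2 to M4 flow (run each scenario through each adapter, compare canonical XML, collect a matrix) can be sketched as follows. This is a rough illustration under stated assumptions, not a proposed design: adapters are modeled as plain Python callables, whereas real ones would more likely wrap a subprocess per library, and the adapter names `adapter-a`, `adapter-b`, and `adapter-c` are placeholders that do not represent any real library's results.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
Adapter = Callable[[str], str]  # input XML -> post-mutation XML


def canonical(xml_text: str) -> str:
    """Canonical XML (C14N 2.0) with prefixes rewritten, so prefix choice is ignored."""
    return ET.canonicalize(xml_text, rewrite_prefixes=True)


def unwrap_ins_adapter(input_xml: str) -> str:
    """Stand-in adapter that unwraps <w:ins> elements."""
    root = ET.fromstring(input_xml)

    def unwrap(parent: ET.Element) -> None:
        kept = []
        for child in list(parent):
            unwrap(child)
            kept.extend(list(child) if child.tag == f"{{{W_NS}}}ins" else [child])
        parent[:] = kept

    unwrap(root)
    return ET.tostring(root, encoding="unicode")


def noop_adapter(input_xml: str) -> str:
    """Stand-in adapter that leaves the document unchanged."""
    return input_xml


def crashing_adapter(input_xml: str) -> str:
    """Stand-in adapter for a library that lacks the operation."""
    raise NotImplementedError("operation not supported")


def run_matrix(
    scenarios: List[dict], adapters: Dict[str, Adapter]
) -> Dict[str, Dict[str, str]]:
    """Return {scenario id: {adapter name: 'pass' | 'fail' | 'error'}}."""
    matrix: Dict[str, Dict[str, str]] = {}
    for scenario in scenarios:
        expected = canonical(scenario["expected"])
        row: Dict[str, str] = {}
        for name, adapter in adapters.items():
            try:
                actual = canonical(adapter(scenario["input"]))
                row[name] = "pass" if actual == expected else "fail"
            except Exception:
                row[name] = "error"  # adapter crashed or emitted invalid XML
        matrix[scenario["id"]] = row
    return matrix


SCENARIOS = [
    {
        "id": "accept-insertions-by-unwrapping-w-ins-wrappers",
        "input": f'<w:p xmlns:w="{W_NS}"><w:ins w:id="1"><w:r/></w:ins></w:p>',
        "expected": f'<w:p xmlns:w="{W_NS}"><w:r/></w:p>',
    }
]
ADAPTERS = {
    "adapter-a": unwrap_ins_adapter,
    "adapter-b": noop_adapter,
    "adapter-c": crashing_adapter,
}

if __name__ == "__main__":
    for scenario_id, row in run_matrix(SCENARIOS, ADAPTERS).items():
        print(scenario_id, row)
```

Keeping "error" distinct from "fail" lets the matrix separate "this library cannot perform the operation" from "this library performs it differently", which matters when reading a comparison row.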

Each milestone is a separate issue: M0 produces a `design.md` or OpenSpec proposal, and M1 through M4 are implementation issues.

### Related

- #223: Structured OOXML/ECMA-376 spec traceability. Cross-implementation comparison only makes sense for scenarios that have a clear spec anchor, so #223's work is a prerequisite for a defensible comparison.
- #214: ECMA-376 schema validation as a CI gate. The schema-validation flavor of cross-implementation comparison overlaps with #214.

### Why now / why not now

- **Why this is worth tracking**: positions safe-docx as defensible-by-spec, not just defensible-by-tests. The same comparison matrix that helps us also helps others evaluate OOXML libs.
- **Why not implement yet**: needs a scenario DSL decision (M0) and a clear OOXML-only scope to avoid spending budget on tests that can't be cross-validated.

> Note: This is a scoping/discovery issue (analogous to an `epic`). Not eligible for direct `/codex-implement`; subdivide into M0..M4 issues first.
