Proposal (scoping, not implementation)
Today our traceability tests verify safe-docx's own behavior. They cannot tell us whether another OOXML library (python-docx, docx4j, Aspose.Words, etc.) implements the same behavior the same way. For the subset of our tests that are anchored to a spec (ECMA-376 / OOXML), we could run them against multiple implementations and surface a comparison matrix, analogous to wpt.fyi for the web platform.
This issue scopes the idea. It does NOT implement a runner — that would be at least one follow-up issue per language adapter.
What's in scope
A cross-implementation comparison harness is only meaningful for tests whose assertion is derivable from the OOXML spec, not from our internal algorithm. Examples:
✅ In scope (spec-anchored): accept-insertions-by-unwrapping-w-ins-wrappers — the spec says <w:ins> wraps inserted content and "accepting" means unwrapping. Any compliant OOXML lib should produce the same post-accept XML given the same input XML.
✅ In scope (spec-anchored): most text-matching / find-replace scenarios where the assertion is "the literal substring exists at offset N in the post-mutation document".
✅ In scope (spec-anchored): schema-validation scenarios where the assertion is "the emitted document validates against wml.xsd".
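To make the first in-scope example concrete, here is a minimal sketch of "accepting" insertions by unwrapping `<w:ins>`, using only the Python standard library. The helper name `accept_insertions` and the sample paragraph are hypothetical. It is not how safe-docx or any other library implements the operation, and it ignores details a real implementation would have to handle (for example, inter-element whitespace and `<w:ins>` markers inside run properties).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the conventional "w:" prefix on output


def accept_insertions(xml_text: str) -> str:
    """Hypothetical helper: replace every <w:ins> wrapper with its children."""
    root = ET.fromstring(xml_text)
    ins_tag = f"{{{W_NS}}}ins"
    while True:
        # ElementTree has no parent pointers, so rebuild a child -> parent map.
        parent_of = {child: parent for parent in root.iter() for child in parent}
        wrapper = next((el for el in root.iter(ins_tag) if el in parent_of), None)
        if wrapper is None:
            break
        parent = parent_of[wrapper]
        position = list(parent).index(wrapper)
        parent.remove(wrapper)
        # Re-insert the wrapped content where the wrapper used to be.
        for offset, child in enumerate(list(wrapper)):
            parent.insert(position + offset, child)
    return ET.tostring(root, encoding="unicode")


# Illustrative input: one plain run followed by one inserted run.
SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t>a</w:t></w:r>'
    '<w:ins w:id="1" w:author="x"><w:r><w:t>b</w:t></w:r></w:ins>'
    '</w:p>'
)
print(accept_insertions(SAMPLE))
```

Because the expected result depends only on the input XML and the meaning of `<w:ins>`, the same input/output pair could be handed to any library's adapter.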
What's out of scope
❌ Out of scope (algorithm-anchored): identifiers-stable-across-reopens — our _bk_* naming is safe-docx-specific; python-docx has no insertParagraphBookmarks equivalent and running it would require porting our algorithm first.
❌ Any test that depends on safe-docx-specific primitives (extractRevisions, insertParagraphStyleSource, our determinism guarantees).
Open questions to resolve before implementation
Scenario expression: how do we express each in-scope scenario in a language-neutral form? Candidates:
Pure XML input/output pairs with a small DSL for assertions (xpath: queries, xml-canonical-equals: comparisons).
JSON Schema-style assertion language.
Markdown-based with embedded XML snippets and predicates.
Per-implementation adapter shape: each library would need an adapter binary that consumes the scenario's input XML, applies the equivalent operation, and emits the post-mutation XML (or raw bytes). Where do these adapter binaries live? In this repo (adapters/<lang>/) or in a separate repo?
Results storage + UI: wpt.fyi serves a matrix view; for us, the tests-renderer already renders our scenarios. Augment the per-scenario page with an "Other implementations" row, or build a separate matrix view?
Maintenance overhead: each adapter is a non-trivial dependency. What's the policy on breakage when an upstream lib changes?
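As one way to picture the first candidate (XML input/output pairs plus a small assertion DSL), the sketch below stores a scenario as plain data and evaluates two assertion kinds. The field names (`id`, `operation`, `input`, `assertions`) and the kind names `xpath-count` and `xml-canonical-equals` are assumptions made for illustration; the actual DSL is what M0 would decide. It relies on `xml.etree.ElementTree.canonicalize` (Python 3.8+) and ElementTree's limited XPath subset.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# Hypothetical scenario shape; it could equally be serialized as JSON or YAML.
SCENARIO = {
    "id": "accept-insertions-by-unwrapping-w-ins-wrappers",
    "operation": "accept-insertions",
    "input": (
        f'<w:p xmlns:w="{NS["w"]}">'
        '<w:ins w:id="1"><w:r><w:t>b</w:t></w:r></w:ins></w:p>'
    ),
    "assertions": [
        {"kind": "xpath-count", "path": ".//w:ins", "count": 0},
        {
            "kind": "xml-canonical-equals",
            "expected": f'<w:p xmlns:w="{NS["w"]}"><w:r><w:t>b</w:t></w:r></w:p>',
        },
    ],
}


def canonical(xml_text: str) -> str:
    # C14N 2.0 with prefixes rewritten, so prefix choice and attribute order
    # do not matter. strip_text also drops surrounding whitespace, which is
    # too lossy for xml:space="preserve" content; acceptable for a sketch.
    return ET.canonicalize(xml_text, strip_text=True, rewrite_prefixes=True)


def check(assertion: dict, actual_xml: str) -> bool:
    kind = assertion["kind"]
    if kind == "xpath-count":
        matches = ET.fromstring(actual_xml).findall(assertion["path"], NS)
        return len(matches) == assertion["count"]
    if kind == "xml-canonical-equals":
        return canonical(actual_xml) == canonical(assertion["expected"])
    raise ValueError(f"unknown assertion kind: {kind}")


def evaluate(scenario: dict, actual_xml: str) -> list:
    """Return one boolean per assertion, in declaration order."""
    return [check(a, actual_xml) for a in scenario["assertions"]]
```

Comparing canonical forms rather than raw strings keeps the harness from flagging differences that carry no meaning, such as a library choosing a different namespace prefix.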
Initial milestones (rough)
M0 (this issue): agree on scope + scenario expression DSL. No code yet.
M1: pick 3 in-scope scenarios; express them in the chosen DSL.
M2: build the safe-docx adapter (read DSL → run scenario → emit canonical XML). Use it as a self-check (safe-docx vs safe-docx should always agree).
M3: build one second-language adapter (suggested: python-docx, easiest to install/CI-host).
M4: render a comparison row in tests-renderer for the 3 scenarios.
Each milestone is a separate issue. M0 → produces a design.md or OpenSpec proposal; M1-M4 → implementation issues.
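The adapter shape and the comparison row from M2 to M4 could be sketched roughly as follows, assuming each adapter is an executable that reads the scenario's input XML on stdin and writes the post-mutation XML to stdout. That stdin/stdout protocol, the function names, and the three stand-in adapter commands are all hypothetical; no real library adapter is shown.

```python
import subprocess
import sys


def run_adapter(command: list, input_xml: str) -> str:
    """Feed input XML to an adapter process and return what it prints."""
    completed = subprocess.run(
        command, input=input_xml, capture_output=True,
        text=True, check=True, timeout=30,
    )
    return completed.stdout


def build_matrix(adapters: dict, scenarios: dict) -> dict:
    """Return {scenario_id: {adapter_name: 'pass' | 'fail' | 'error'}}."""
    matrix = {}
    for scenario_id, scenario in scenarios.items():
        row = {}
        for name, command in adapters.items():
            try:
                output = run_adapter(command, scenario["input"])
                row[name] = "pass" if scenario["check"](output) else "fail"
            except (subprocess.SubprocessError, OSError):
                # Crash, non-zero exit, timeout, or missing binary.
                row[name] = "error"
        matrix[scenario_id] = row
    return matrix


# Stand-in adapters for illustration only; real ones would wrap a library.
ADAPTERS = {
    "echo": [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read())"],
    "unwrap": [sys.executable, "-c",
               "import re, sys; "
               "sys.stdout.write(re.sub(r'</?w:ins[^>]*>', '', sys.stdin.read()))"],
    "broken": [sys.executable, "-c", "import sys; sys.exit(1)"],
}

SCENARIOS = {
    "accept-insertions-by-unwrapping-w-ins-wrappers": {
        "input": '<w:p><w:ins w:id="1"><w:r/></w:ins></w:p>',
        "check": lambda output: "<w:ins" not in output,
    },
}

if __name__ == "__main__":
    for scenario_id, row in build_matrix(ADAPTERS, SCENARIOS).items():
        print(scenario_id, row)
```

Recording "error" separately from "fail" keeps an adapter that crashed or is not installed distinguishable from a library that ran and produced different XML.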
Why now / why not now
Why this is worth tracking: it positions safe-docx as defensible-by-spec, not just defensible-by-tests. The same comparison matrix that helps us also helps others evaluate OOXML libs.
Why not implement yet: it needs a scenario DSL decision (M0) and a clear OOXML-only scope, so that we avoid spending budget on tests that can't be cross-validated.
Note: This is a scoping/discovery issue (analogous to an epic). Not eligible for direct /codex-implement; subdivide into M0..M4 issues first.