Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -12,10 +12,10 @@ A TypeScript property test (`packages/docx-core/src/integration/lean-differentia
- render each generated `Doc` both to the Lean JSON encoding and, via a **`Doc`→`document.xml` adapter**, to a real OOXML `document.xml` string parseable by the engine's `@xmldom/xmldom` path;
- run the TS helpers in-process per case and spawn the Lean executable **once per memory-bounded chunk** of the batch;
- compare `accept`/`reject` outputs on a **canonical token projection** that both the Lean output `Doc` and the TS output XML reduce to deterministically (paragraph/run/wrapper/atom tokens in document order), and compare `validate` as a boolean, asserting strict per-case equality;
- assert the known out-of-subset model cases explicitly as fixed cases rather than hiding them: `fldChar` inside `del` (G1) and `delInstrText` outside `del` (G2) — now **closed** to agreement by `add-lean-deleted-field-code-constraint` — reject of an `ins`-only untracked-mark paragraph (G4) — now **closed** to agreement by the engine fidelity fix `make-reject-paragraph-collapse-mark-based` (mark-based reject) — and accept of an `ins`-wrappered collapsing paragraph (G3) — now **closed** to agreement by the Lean fidelity fix `broaden-lean-accept-keep-empty-paragraphs` (Lean `accept` broadened to keep empties) — plus the one remaining characterized gap, accept of a `del`-only untracked-mark paragraph (G5), a symmetric ENGINE accept-side over-deletion surfaced by broadening Lean `accept`;
- assert the known out-of-subset model cases explicitly as fixed cases rather than hiding them: `fldChar` inside `del` (G1) and `delInstrText` outside `del` (G2) — now **closed** to agreement by `add-lean-deleted-field-code-constraint` — reject of an `ins`-only untracked-mark paragraph (G4) — now **closed** to agreement by the engine fidelity fix `make-reject-paragraph-collapse-mark-based` (mark-based reject) — and accept of an `ins`-wrappered collapsing paragraph (G3) — now **closed** to agreement by the Lean fidelity fix `broaden-lean-accept-keep-empty-paragraphs` (Lean `accept` broadened to keep empties) — and accept of a `del`-only untracked-mark paragraph (G5) — now **closed** to agreement by the engine fidelity fix `make-accept-paragraph-collapse-mark-based` (mark-based accept, the symmetric mirror of the G4 reject fix);
- **skip** with a clear message when the Lean executable is absent (so a developer without the Lean toolchain still gets a green `npm test`), while CI builds the executable so the comparison runs there.

The harness SHALL assert **strict** agreement on the faithful subset by default; any in-subset divergence is a genuine finding, NOT a reason to weaken the assertion. The out-of-subset cases SHALL be asserted explicitly: G1/G2 as agreement (the DeletedFieldCode locality constraint is modeled), G4 as agreement (reject is now mark-based, an engine fidelity fix), G3 as agreement (Lean `accept` is now broadened to keep empties, a Lean fidelity fix), and the remaining gap G5 as a documented divergence (a characterization case — the symmetric accept-side engine over-deletion), forming the worklist for the next engine-fidelity increment. This requirement strengthens extensional-equivalence evidence between the existing Lean and TS helpers only; it introduces no production-engine change and modifies no proved Lean module.
The harness SHALL assert **strict** agreement on the faithful subset by default; any in-subset divergence is a genuine finding, NOT a reason to weaken the assertion. The out-of-subset cases SHALL be asserted explicitly: G1/G2 as agreement (the DeletedFieldCode locality constraint is modeled), G4 as agreement (reject is now mark-based, an engine fidelity fix), G3 as agreement (Lean `accept` is now broadened to keep empties, a Lean fidelity fix), and G5 as agreement (accept is now mark-based, an engine fidelity fix — the symmetric mirror of G4). With G5 closed, every characterized G-case (G1–G5) agrees. This requirement strengthens extensional-equivalence evidence between the existing Lean and TS helpers only; it introduces no production-engine change and modifies no proved Lean module.

#### Scenario: [LEAN-HELP-01] Compiled Lean accept/reject/validate match the TS engine on generated docs in the faithful subset

Expand Down Expand Up @@ -49,10 +49,10 @@ The harness SHALL assert **strict** agreement on the faithful subset by default;
- **WHEN** the harness runs the fixed [G4] `Doc` with an `ins`-only paragraph whose paragraph mark is untracked (no surviving content)
- **THEN** the Lean `reject` and the TS `rejectAllChanges` both keep the collapsed paragraph as an empty `<w:p>`, asserted via the token projection as agreement — the TS engine's reject is now purely mark-based (`make-reject-paragraph-collapse-mark-based`), matching the already-faithful Lean `reject`

#### Scenario: [LEAN-HELP-08] G5 — accept of a del-only paragraph is a characterized divergence
#### Scenario: [LEAN-HELP-08] G5 — accept of a del-only paragraph agrees (engine accept made mark-based)

- **WHEN** the harness runs the fixed [G5] `Doc` with a paragraph whose only content is a `w:del` wrapping deleted content and whose paragraph mark is untracked
- **THEN** the Lean `accept` keeps the collapsed paragraph as an empty `<w:p>` (faithful to LibreOffice/Word) while the TS `acceptAllChanges` drops it via a content-based heuristic, asserted via the token projection as a documented divergence — the symmetric accept-side analog of the pre-`make-reject-paragraph-collapse-mark-based` G4 engine over-deletion, surfaced by broadening Lean `accept`, with the TS accept-side mark-based fix as the deferred successor increment
- **THEN** the Lean `accept` and the TS `acceptAllChanges` both keep the collapsed paragraph as an empty `<w:p>`, asserted via the token projection as agreement — the TS engine's accept is now purely mark-based (`make-accept-paragraph-collapse-mark-based`), matching the already-faithful Lean `accept` (the symmetric accept-side analog of the G4 engine over-deletion fixed in `make-reject-paragraph-collapse-mark-based`)

#### Scenario: [LEAN-HELP-07] A real divergence is caught, not masked

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,9 @@ and Lean `accept` SHALL now agree on a `w:ins`-wrappered collapsing paragraph. B
surface and the harness SHALL pin a new characterized divergence `G5`: a `del`-only untracked-mark paragraph
accepts to an empty `<w:p>` in Lean (faithful), but the TS `acceptAllChanges` over-deletes it via a
content-based heuristic — the symmetric accept-side analog of the reject over-deletion fixed in
`make-reject-paragraph-collapse-mark-based`, whose TS accept-side mark-based fix is the deferred successor
increment. This requirement modifies the proved Lean model only; it introduces no production-engine change.
`make-reject-paragraph-collapse-mark-based`. (That TS accept-side mark-based fix is subsequently delivered by
the successor change `make-accept-paragraph-collapse-mark-based`, closing G5 to agreement.) This requirement
modifies the proved Lean model only; it introduces no production-engine change.

#### Scenario: [ACCEPT-KEEP-01] Ins-wrappered collapsing paragraph is kept as an empty paragraph on accept

Expand All @@ -46,11 +47,11 @@ increment. This requirement modifies the proved Lean model only; it introduces n
- **WHEN** the Tier 2 modules are built
- **THEN** `field_structure_preserved_doc` and `inv_field_001` compile with no statement change, the spike is zero-`sorry`, and `#print axioms field_structure_preserved_doc` reports no new axioms

#### Scenario: [ACCEPT-KEEP-04] Broadening surfaces and pins the symmetric engine accept-side gap (G5)
#### Scenario: [ACCEPT-KEEP-04] Broadening surfaces the symmetric engine accept-side gap (G5), since closed

- **GIVEN** a `del`-only paragraph whose paragraph mark is untracked, followed by a surviving paragraph
- **WHEN** both engines accept it
- **THEN** the Lean `accept` keeps an empty `<w:p>` (faithful to LibreOffice/Word) while the TS `acceptAllChanges` drops it, asserted via the harness token projection as the characterized divergence `[LEAN-HELP-08]` (G5), the deferred engine accept-side fidelity fix
- **THEN** the Lean `accept` keeps an empty `<w:p>` (faithful to LibreOffice/Word); broadening `accept` surfaces and the harness pins this as `[LEAN-HELP-08]` (G5)the engine accept-side over-deletion that the successor change `make-accept-paragraph-collapse-mark-based` subsequently closes to agreement (TS `acceptAllChanges` made mark-based)

## MODIFIED Requirements

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
# Change: Make Accept All paragraph removal purely mark-based (closes G5, oracle-validated)

## Why

The Lean↔TS helper differential (`add-lean-ts-helper-differential-harness`) pinned **G5**: accepting a
paragraph whose only content is inside `w:del` (with an **untracked** paragraph mark) — the Lean `accept`
keeps an empty `<w:p>`, the TS `acceptAllChanges` drops the whole paragraph. G5 is the exact accept-side
mirror of **G4** (closed by `make-reject-paragraph-collapse-mark-based`): an **engine over-deletion**, not a
Lean-model gap.

- **LibreOffice / Word / OOXML semantics:** `<w:del>` inside `<w:pPr><w:rPr>` (PPR-DEL) marks the paragraph
**mark** as deleted. A run-level deletion under an *untracked* mark means text was deleted from a
**pre-existing** paragraph, so accepting the deletion restores that (empty) paragraph. Only when the mark
itself is `PPR-DEL` should accept remove the whole paragraph. LibreOffice (driven headless) keeps the empty
paragraph for `del`-only and `moveFrom`-only untracked-mark cases, and drops the `PPR-DEL`-marked control —
the same shape already confirmed for the reject side in `make-reject-paragraph-collapse-mark-based`.
- The Lean `accept`, broadened by `broaden-lean-accept-keep-empty-paragraphs` (G3, #340) to never drop, was
already faithful — it keeps the empty paragraph. G5 is the TS engine catching up to it.

The engine drops G5 via a content-based heuristic ("all content is inside `w:del`/`w:moveFrom`") on **both**
accept paths. `wrapParagraphAsDeleted` already always emits `PPR-DEL`, so making accept purely **mark-based**
(drop iff the paragraph mark is `PPR-DEL`) is Word/LibreOffice-faithful, keeps safe-docx's own delete→accept
round-trip safe, and closes G5 by fixing the engine — exactly symmetric to the #337 reject fix.

## What Changes

- `acceptAllChanges` (`packages/docx-core/src/baselines/atomizer/trackChangesAcceptorAst.ts`) and the
primitive `acceptChanges` (`packages/docx-core/src/primitives/accept_changes.ts`,
`paragraphHasOnlyRemovedContent`) SHALL remove a paragraph on accept **iff its paragraph mark is `PPR-DEL`**
— the content-based all-`w:del`/`w:moveFrom` drop heuristic is removed from **both** accept paths in
lockstep (they are used by different public APIs and SHALL agree).
- Flip the G5 case in `lean-differential-helpers.test.ts` (`[LEAN-HELP-08]`) from *characterized divergence*
to *agreement*; update the file header (G5 closed, no KNOWN gap remains). Add a targeted regression test
exercising **both** accept entry points on the four shape classes (PPR-DEL drop, del-only keep, moveFrom-only
keep, pPrChange-snapshot ignore), asserting the two paths agree.

## Impact

- Affected specs: `docx-comparison` (ADDED: one requirement; the pending helper-differential change's G5
scenario and the pending Lean-broaden change's G5 scenario are revised in place to agreement since neither
is archived).
- Affected code: `packages/docx-core/src/baselines/atomizer/trackChangesAcceptorAst.ts`,
`packages/docx-core/src/primitives/accept_changes.ts`, and tests
(`trackChangesAcceptorAst.test.ts`, `integration/lean-differential-helpers.test.ts`).
- **Behavior change** (accept of foreign / mark-omitting documents): a paragraph whose runs are deleted
under an untracked mark now survives accept as an empty paragraph instead of being dropped — matching
Word/LibreOffice. safe-docx's own deleted paragraphs always carry `PPR-DEL`, so their delete→accept
round-trip is unchanged (validated: full docx-core suite 1347 passed / 3 skipped, no regression).
- With G5 closed, every characterized G-case (G1–G5) of the Lean↔TS helper differential agrees; no KNOWN gap
remains in that harness.
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
## ADDED Requirements

### Requirement: Accept All removes paragraphs by the paragraph-mark deletion marker, not by content

The production Accept All SHALL remove a paragraph during accept **if and only if** its paragraph mark is a
tracked deletion (`<w:pPr><w:rPr><w:del/>`, "PPR-DEL"). This applies to both accept entry points —
`acceptAllChanges` (`packages/docx-core/src/baselines/atomizer/trackChangesAcceptorAst.ts`) and the primitive
`acceptChanges` (`packages/docx-core/src/primitives/accept_changes.ts`) — which SHALL behave identically.
Accept SHALL NOT remove a paragraph based on a content heuristic (e.g. "all runs live inside
`w:del`/`w:moveFrom`"): a run-level deletion under an **untracked** paragraph mark denotes text deleted from a
pre-existing paragraph, which Microsoft Word and LibreOffice both keep as an empty paragraph on accept; a
content-based drop over-deletes it. The mark check SHALL be a strict direct-child traversal
(`w:p > w:pPr > w:rPr > w:del`), so a `w:del` nested inside a `w:pPrChange` snapshot is NOT mistaken for a
live paragraph mark.

`wrapParagraphAsDeleted` already always emits the `PPR-DEL` marker, so safe-docx's own genuinely-deleted
paragraphs remain removable by this mark-based rule without the content heuristic. This is the accept-side
mirror of `make-reject-paragraph-collapse-mark-based` (which made Reject All mark-based, closing G4). It
closes the characterized accept-side divergence `G5` recorded by the Lean↔TS helper differential as an
**engine fidelity fix** (the Lean `accept`, broadened by `broaden-lean-accept-keep-empty-paragraphs` to keep
empties, was already faithful): the TS and Lean `accept` SHALL now agree on a `del`-only untracked-mark
paragraph. With G5 closed, every characterized G-case (G1–G5) agrees between the genuine Lean helpers and the
production engine. This requirement changes production accept behavior only for foreign / mark-omitting input;
safe-docx's own deleted paragraphs always carry `PPR-DEL`, so their delete→accept round-trip is preserved.

#### Scenario: [ACCEPT-MARK-01] Del-only paragraph with an untracked mark survives accept as an empty paragraph

- **GIVEN** a paragraph whose only content is a `w:del`-wrapped run and whose paragraph mark is NOT `PPR-DEL` (text deleted from a pre-existing paragraph)
- **WHEN** it is run through `acceptAllChanges` (and through the primitive `acceptChanges`)
- **THEN** the deleted run is removed and the now-empty paragraph is kept, matching Word/LibreOffice and the Lean `accept` (the former `G5` divergence is now agreement)

#### Scenario: [ACCEPT-MARK-02] PPR-DEL-marked paragraph is removed by accept

- **GIVEN** a paragraph whose paragraph mark is a tracked deletion (`<w:pPr><w:rPr><w:del/>`)
- **WHEN** it is run through `acceptAllChanges` (and through the primitive `acceptChanges`)
- **THEN** the whole paragraph is removed, including its mark, matching Word/LibreOffice

#### Scenario: [ACCEPT-MARK-03] MoveFrom-only paragraph with an untracked mark survives accept as an empty paragraph

- **GIVEN** a paragraph whose only content is a `w:moveFrom`-wrapped run (the move source) and whose paragraph mark is untracked
- **WHEN** it is run through `acceptAllChanges` (and through the primitive `acceptChanges`)
- **THEN** the moved-away content is removed and the now-empty paragraph is kept (the symmetric `moveFrom` content heuristic is removed in lockstep with the `w:del` one)

#### Scenario: [ACCEPT-MARK-04] A pPrChange snapshot's nested w:del is not a live paragraph mark

- **GIVEN** a surviving paragraph whose `w:pPrChange` snapshot nests a `w:del` inside its stored `w:rPr`
- **WHEN** it is run through `acceptAllChanges` (and through the primitive `acceptChanges`)
- **THEN** the paragraph is kept (the strict direct-child mark check ignores the snapshot del), so both accept paths agree
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
## 1. Engine: mark-based accept (both paths, in lockstep)

- [x] 1.1 `acceptAllChanges` (`trackChangesAcceptorAst.ts`): remove the content-based all-`w:del`/`w:moveFrom`
paragraph-drop loops; keep only the `paragraphHasParaMarker(p, 'w:del')` mark-based drop. Replace the
stale lead comment with the full mark-based rationale (mirrors `rejectAllChanges`).
- [x] 1.2 `acceptChanges` (`primitives/accept_changes.ts`): remove `paragraphHasOnlyRemovedContent` and its
drop branch; keep only the `paragraphHasParaMarker(p, 'del')` mark-based drop.
- [x] 1.3 Remove the now-dead helpers/constants left behind (`containsRun`, `isWElement`,
`REMOVED_LOCALS`/`KEPT_LOCALS`/`RANGE_MARKER_LOCALS` in `accept_changes.ts`), grep-confirmed unreferenced.
`tsc --noEmit` clean.

## 2. Tests

- [x] 2.1 `lean-differential-helpers.test.ts`: flip `[LEAN-HELP-08]` (G5) from documented divergence to strict
agreement (both keep an empty `P[ ]`); update the file header (G5 closed, no KNOWN gap remains).
- [x] 2.2 `trackChangesAcceptorAst.test.ts`: add a targeted regression test running BOTH accept entry points
(`acceptAllChanges` + primitive `acceptChanges`) over four shapes — PPR-DEL drop, del-only keep,
moveFrom-only keep, pPrChange-snapshot ignore — asserting the two paths agree. Count `<w:p>` opens with
a regex that matches self-closing empties (`<w:p/>`).
- [x] 2.3 Full `@usejunior/docx-core` suite green (1347 passed / 3 skipped); helper differential green 8/8
with the exe.

## 3. Oracle validation (evidence)

- [x] 3.1 The accept-side keep-empty behavior is the LibreOffice/Word-faithful one, established by the same
headless-LibreOffice oracle run that grounded the #337 reject fix (an untracked paragraph mark is a
pre-existing paragraph; `del`-only and `moveFrom`-only collapse to an empty `<w:p>`, while the
`PPR-DEL`-marked control drops). This change ships as a self-validated engine fix via the differential
flip (`[LEAN-HELP-08]`) and the both-paths regression, exactly as #337 shipped its reject mirror without
a committed oracle voter; wiring the LibreOffice voter into the harness remains the deferred PR-B.

## 4. Specs / docs

- [x] 4.1 Add the `docx-comparison` ADDED requirement (this change) + scenarios `[ACCEPT-MARK-01..04]`.
- [x] 4.2 Revise the pending `add-lean-ts-helper-differential-harness` G5 scenario (`[LEAN-HELP-08]`) and the
pending `broaden-lean-accept-keep-empty-paragraphs` G5 scenario (`[ACCEPT-KEEP-04]`) from documented
divergence to agreement (closed by this change).
- [x] 4.3 Docs: `verification/ROADMAP.md` records G5 closed as an engine accept-side fidelity fix (the
accept-side mirror of G4/#337); all G1–G5 now agree. (`verification/lean/Tier2/README.md` needs no
change: it documents the Lean proofs, whose `accept`-keeps-every-paragraph note already reflects the
G3 broaden; G5 is a TS-engine fix, not a Lean-model change.)
- [ ] 4.4 Ship: peer-review (codex + agy), open PR, `/automerge-smoke`.
Loading
Loading