Skip to content

feat(docx-mcp): ODF comments — add_comment + get_comments for .odt (Phase 2b-1)#341

Merged
stevenobiajulu merged 2 commits into
mainfrom
odf-comments-20260606
Jun 7, 2026
Merged

feat(docx-mcp): ODF comments — add_comment + get_comments for .odt (Phase 2b-1)#341
stevenobiajulu merged 2 commits into
mainfrom
odf-comments-20260606

Conversation

@stevenobiajulu
Copy link
Copy Markdown
Member

Summary

Extends the provider-aware ODF (.odt) lane with comments — the lighter half of Phase 2b — backed by inline office:annotation markup in content.xml (no separate comments part, rels, or commentsExtended.xml). Mirrors the DOCX add_comment/get_comments param + response shapes so agents get the same contract across providers. compare_documents (the tracked-changes atomizer) stays deferred to a follow-up PR.

What changed

odf-core

  • Add the DC namespace; extract Segment/buildSegments into a shared shared/odf/text_segments.ts that skips office:annotation / office:annotation-end subtrees so a comment body never leaks into the host paragraph's visible text nor registers as a phantom block (collectBlocks skips them too).
  • New comments.ts: whole-paragraph anchoring via structural insertion (annotation as first inline child, end after the last — independent of text segmentation) and ranged (anchor_text) anchoring via a single text-node split; cross-node ranges return MATCH_SPANS_MULTIPLE_NODES. office:name ids are allocated by scanning all existing annotation names.
  • OdfDocument.addComment / getComments delegating to comments.ts; export OdfComment.

docx-mcp

  • tools/odf/add_comment.ts + get_comments.ts mirroring the DOCX shapes. Replies (parent_comment_id) on a .odt return UNSUPPORTED_FOR_ODF (ODF has no first-class reply graph — deferred).
  • Add both tools to ODF_SUPPORTED_TOOLS, update the guard + resolver hint strings, register handlers + isOdfRequest dispatch branches, refresh tool_catalog.ts + the generated tool docs.
  • Re-point the sibling add-odf-grep-insert OPLR-08 guard tests and spec wording at compare_documents, now that add_comment is supported.

OpenSpec — new add-odf-comments change (mcp-server OPCM-01..05, odf-core OANN-01..05).

Design review

Reviewed before building (Codex, read-only dynamic). Two BLOCKERs were caught, independently reproduced, and fixed here: (B1) the annotation body leaking into the paragraph stream, and (B2) whole-paragraph anchoring needing a structural path separate from the single-text-node contract. SHOULDs adopted: author stays required; office:name allocation scans all existing names; spec wording amended (not just tests).

Verification

  • Local gates green: build, lint:workspaces (0 errors), check:cycles, check:release-isolation, allure label/filename/quality (0 errors), openspec validate --strict, check:spec-coverage --strict, check:tool-docs.
  • Tests: odf-core 37/37, docx-mcp 827/827. Coverage ratchet improved (docx-mcp +1.24% lines / +0.93% branches).
  • Document-shaped smoke on a real NVCA .odt (source.docx.odt via soffice): whole-paragraph + ranged comments via dispatchToolCall, get_comments returns them with author/date/anchor, persisted through save→reopen, and LibreOffice round-trips the annotation to a native comment (author "Jane Doe", body preserved). Guards intact: replies and compare_documents on .odt still return UNSUPPORTED_FOR_ODF.

Out of scope

compare_documents (Phase 2b-2), comment replies/threads, delete_comment for ODF, .ods/.odp, durable injected xml:id anchors.

…hase 2b-1)

Extend the provider-aware ODF (.odt) lane with comments, backed by inline
office:annotation markup in content.xml (no separate comments part).

odf-core:
- Add the DC namespace; extract Segment/buildSegments into a shared
  text_segments module that skips office:annotation / office:annotation-end
  subtrees so a comment body never leaks into the host paragraph's visible
  text nor registers as a phantom block (collectBlocks skips them too).
- New comments module: whole-paragraph anchoring via structural insertion
  (independent of text segmentation) and ranged anchoring via a single
  text-node split (cross-node ranges return MATCH_SPANS_MULTIPLE_NODES);
  office:name ids allocated by scanning all existing annotation names.
- OdfDocument.addComment / getComments delegating to the comments module.

docx-mcp:
- tools/odf/add_comment + get_comments mirroring the DOCX param/response
  shapes; replies (parent_comment_id) on .odt return UNSUPPORTED_FOR_ODF.
- add the tools to ODF_SUPPORTED_TOOLS, update the guard + resolver hints,
  register handlers + isOdfRequest dispatch branches, and refresh the tool
  catalog + generated docs.
- Re-point the add-odf-grep-insert OPLR-08 guard tests (and spec wording)
  at compare_documents, now that add_comment is supported.

New OpenSpec change add-odf-comments (mcp-server OPCM-01..05,
odf-core OANN-01..05). Verified against a real NVCA .odt: comments persist
through save/reopen and LibreOffice reads them as native comments.
@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
site Ready Ready Preview, Comment Jun 7, 2026 9:07am

Request Review

@github-actions github-actions Bot added the feat label Jun 7, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 7, 2026

LLM-Based Quality Gate

Overall: ✅ PASS (5 pass · 0 warn · 9 skipped · 14 total)

Check Verdict
⏭️ SKIPPED read_file response metadata parity paths not touched by this PR
⏭️ SKIPPED Live DOM namespace-safe OOXML writes paths not touched by this PR
Deleted field markup keeps w:fldChar outside w:del The PR does not touch field atomization, validateFieldStructure, hasFldCharInsideDel, or any related field markup or logic, as it is entirely focused on adding comment support for ODF (.odt) documents.
⏭️ SKIPPED Field validation per story, not global paths not touched by this PR
⏭️ SKIPPED Revision IDs seeded from all revision-bearing side parts paths not touched by this PR
Accept/reject sweep side parts and caches The PR does not touch DocxDocument.acceptChanges, DocxDocument.rejectChanges, REVISION_STORY_PART_PATHS, accept_changes, reject_changes, or side-part revision markup, as it focuses entirely on adding ODF (.odt) comment support.
⏭️ SKIPPED DocumentViewNode.heading stays canonical paths not touched by this PR
AI-author parity across entry points The PR touches packages/docx-mcp/src/server.ts to add ODF handlers but adds no new SessionManager construction. All existing entry points (packages/docx-mcp/src/server.ts:249-252 and packages/docx-mcp/src/cli/tool_runner.ts:17-27) correctly resolve SAFE_DOCX_AI_AUTHOR with the identical three-way semantics.
⏭️ SKIPPED Property-change wrapper discipline paths not touched by this PR
⏭️ SKIPPED SUPPORT.md Table A drift vs. implementation paths not touched by this PR
⏭️ SKIPPED Table A / Table B boundary on side-part revisions paths not touched by this PR
Canonical-emission surface completeness The PR only introduces ODF (.odt) comment tools under packages/docx-mcp/src/tools/odf/ which do not use the OOXML tracked-changes revision-emission surface documented in Table A.
⏭️ SKIPPED Lean predicate drift against engine semantics (asymmetric) paths not touched by this PR
Unit-test quality (avoid tautological / change-detector tests) The tests added in packages/docx-mcp/src/tools/odf/odf_comments.test.ts:54 and packages/odf-core/src/comments.test.ts:18 verify comments behavior with explicit expectations from first principles, utilizing filesystem sandboxes for boundaries without any SUT mocking.
Full checklist questions
  1. read_file response metadata parity: If this PR touches packages/docx-mcp/src/tools/read_file.ts, budgeted pagination returns, or additive response metadata like warnings / comment_load_error, do every successful return path (default budgeted early return, non-budget fallthrough, explicit limit/node_ids) preserve the same additive diagnostic fields? read_file has multiple success exits; diagnostics have already disappeared on one path before. Reference: fix(docx-core): declare xmlns:w14/w15 on comments root before writing prefixed attributes (#154) #180 surfaced comment_load_error, fix(docx-mcp): warn when read_file budget is exceeded by a single node (closes #184) #186 added an early budget return + warnings, fix(docx-mcp): surface comment_load_error on the default budgeted read path (closes #189) #191 fixed the missing comment_load_error on the default budgeted path.

  2. Live DOM namespace-safe OOXML writes: If this PR touches packages/docx-core/src/primitives/comments.ts or writes prefixed OOXML attributes/elements (w14:*, w15:*, xmlns:*, comments.xml, commentsExtended.xml, people.xml), are prefixed OOXML names written with namespace-aware APIs — root aliases bound with setAttributeNS(XMLNS_NS, ...), prefixed attributes with setAttributeNS(W14_NS/W15_NS, ...), and is there a test that proves the live DOM works before serialization/reparse? String-prefixed attributes can serialize plausibly while the live DOM still throws namespace errors. Reference: fix(docx-core): declare xmlns:w14/w15 on comments root before writing prefixed attributes (#154) #180 (xmlns:w14/w15 declared on comments root before writing prefixed attrs).

  3. Deleted field markup keeps w:fldChar outside w:del: If this PR touches field atomization, validateFieldStructure, hasFldCharInsideDel, w:fldChar, w:instrText, w:delInstrText, or collapsed field comparison logic, does deleted field output stay ECMA-376-conformant — w:fldChar sibling-level (never inside w:del), deleted instructions use w:delInstrText only inside valid delete wrappers, accept/reject safety checks still reject malformed combined output? Word treats deleted field-state markup in the wrong container as document-corrupting. References: fix(docx-core): validate w:delInstrText placement and reject w:fldChar inside <w:del> #211, fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225, fix(docx-core): fragment w:fldChar outside w:del per ECMA-376 Part 4 #228.

  4. Field validation per story, not global: If this PR touches packages/docx-core/src/baselines/atomizer/pipeline.ts, splitStories, validateFieldStructure, side-part merge logic, or footnote/endnote field handling, is field validation run independently per ECMA story (document.xml, each footnote, each endnote), with sidecars from both original and revised archives considered, and global counter balance not treated as sufficient? A document can be globally balanced but have an invalid field sequence inside one story. References: fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225, fix(docx-core): fragment w:fldChar outside w:del per ECMA-376 Part 4 #228, feat(docx-core): sweep side-part revisions on accept/reject #218.

  5. Revision IDs seeded from all revision-bearing side parts: If this PR touches packages/docx-mcp/src/session/manager.ts (especially getRevisionContextForSession or FIXED_REVISION_ID_SEED_PARTS), createRevisionContext, revision-ID allocation, or MCP tools that create tracked changes/comments/footnotes, does revision-ID allocation scan all relevant package parts before issuing new IDs — comments, footnotes, endnotes, glossary, headers, footers — ignore non-revision w:id values (comment IDs, bookmarks), and handle malformed optional parts gracefully? Revision IDs are package-wide; document-only seeding collides with existing side-part revisions. Reference: fix(docx-mcp): seed revision ids from side parts #216 (seed revision ids from side parts).

  6. Accept/reject sweep side parts and caches: If this PR touches DocxDocument.acceptChanges, DocxDocument.rejectChanges, REVISION_STORY_PART_PATHS, accept_changes, reject_changes, or side-part revision markup, does accept/reject process every revision-bearing story — updating document.xml + footnotes.xml + endnotes.xml + comments.xml, writing back only changed side parts while refreshing cached XML, and pruning orphan footnotes without deleting reserved separator entries? Accepting only in the main document leaves stale revisions and dangling references in the package. References: feat(docx-core): sweep side-part revisions on accept/reject #218, fix(docx-mcp): seed revision ids from side parts #216, fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225.

  7. DocumentViewNode.heading stays canonical: If this PR touches packages/docx-core/src/primitives/document_view.ts, HeadingValue, heading heuristics, ListMetadata.header_style, or Google Docs document-view heading normalization, does node.heading remain a structural heading signal — exact Word styles Heading1Heading6 win, heuristic sources suppressed inside table cells while real Word heading styles still pass, ordinary body paragraphs omit the heading key? Consumers use node.heading != null as a structural test; heuristic false positives break downstream navigation. References: fix(docx-core): harden heading detection (#157 Phase 1) #178, fix(docx-core): suppress non-sectional false-positive headings (closes #187) #188, feat(docx-core): add derived heading object to DocumentViewNode (closes #179) #190.

  8. AI-author parity across entry points: If this PR touches packages/docx-mcp/src/server.ts, packages/docx-mcp/src/cli/tool_runner.ts, packages/docx-mcp/src/cli/commands/**, or adds any new new SessionManager(...) call site in docx-mcp, does every entry point that constructs a SessionManager resolve SAFE_DOCX_AI_AUTHOR with the same three-way semantics (set → use it; empty string → opt out to untracked; unset → defaultAiAuthor), or has a new entry path silently bypassed tracked emission? Each entry path looks locally correct while diverging from another; tracked emission has gone dark in one path before anyone noticed. References: feat(docx-mcp): wire configurable AI author through MCP layer (#142) #172 (production MCP wiring would have kept tracked emission dark), fix(docx-mcp): honor SAFE_DOCX_AI_AUTHOR in CLI entry points (#181) #182 (CLI runners constructing bare SessionManager() silently produced untracked edits).

  9. Property-change wrapper discipline: If this PR touches packages/docx-core/src/primitives/layout.ts, packages/docx-core/src/primitives/text.ts, packages/docx-mcp/src/tools/clear_formatting.ts, or packages/docx-core/src/primitives/track-changes-emitter.ts, do tracked formatting/property edits emit exactly one correct *PrChange wrapper (pPrChange / rPrChange / trPrChange / tcPrChange) carrying a snapshot of the prior live properties — not stacking stale wrappers, not stripping valid historical children (cellIns/cellDel/cellMerge), and not omitting the snapshot when the operation is formatting-aware? Emitted OOXML is visually plausible but subtle snapshot mistakes only surface during later accept/reject or in Word's tracked-changes UI. References: feat(docx-core): emit pPrChange/trPrChange/tcPrChange from layout setters (#140) #167 (duplicate pPrChange/trPrChange/tcPrChange stacking + over-broad tcPr exclusion), feat(docx-mcp): emit rPrChange from clear_formatting MCP tool (#141) #170 (clear_formatting failing to strip stale rPrChange), feat(docx-core): emit rPrChange for formatted paragraph replacements #215 (rPrChange for formatted paragraph replacements + filtering nested stale records).

  10. SUPPORT.md Table A drift vs. implementation: If this PR modifies OOXML revision emission behavior (w:ins, w:del, w:rPrChange, etc.) in packages/docx-core/src/primitives/**, or touches packages/docx-core/SUPPORT.md, does the PR symmetrically update Table A in SUPPORT.md when the supported revision-emission surface in primitives changed — added, removed, or weakened — or is the documented contract now lying about what's supported? Reviewers focus on TS AST correctness and golden tests; Markdown contract tables get treated as an afterthought, so the documented surface drifts from the actual surface. Reference: [120.8] Regression suite for canonical revision emission across the surface #143 review caught replaceParagraphTextRange should emit w:rPrChange when run formatting changes #173 (formatting mismatch in Table A) and addCommentReply should emit body revision markup OR SUPPORT.md should be softened #174 (comment body revision omission forcing a Table A softening) late in peer review.

  11. Table A / Table B boundary on side-part revisions: If this PR touches packages/docx-core/src/primitives/comments.ts, packages/docx-core/src/primitives/footnotes.ts, or other side-part primitives, and adds/changes revision markup (w:ins, w:del), does tracked-change revision logic stay scoped to Table A (document-body content inside the side part) without leaking revision markup into Table B (the side-part package bootstrap — comments.xml/footnotes.xml element registration itself)? Body runs and side-part package elements share nearly identical XML namespace schemas; revisions emitted in the wrong table corrupt the package contract while looking plausible. References: [120.3] Emit w:ins/w:del for comment body anchors #138 (comment-body straddle constraints), [120.4] Emit w:ins/w:del for footnote reference and text #139 (footnote-reference straddle constraints).

  12. Canonical-emission surface completeness: If this PR adds or changes a tracked-edit surface in packages/docx-core/src/primitives/** or packages/docx-mcp/src/tools/**, are the paired artifacts updated together — packages/docx-core/src/integration/canonical-emission-regression.test.ts, packages/docx-mcp/src/integration/canonical-emission-mcp.test.ts, and the documented emitter surface (Table A) — or is the rollout only partially wired? The primitive change looks done before the MCP path, regression matrix, and documented contract are wired through; partial rollouts ship undocumented surface that drifts. References: feat(docx-mcp): wire configurable AI author through MCP layer (#142) #172 (RevisionContext threaded through every Table A MCP tool), test(docx-core,docx-mcp): final regression suite for canonical emission (#143) #175 (24-test regression suite + verified write-time emitter rows), feat(docx-core): emit rPrChange for formatted paragraph replacements #215 (re-enabled rPrChange regression + updated support surface for replaceParagraphTextRange).

  13. Lean predicate drift against engine semantics (asymmetric): If this PR changes field-wrapper semantics, the proof boundary, or atomizer behavior — packages/docx-core/src/baselines/atomizer/**, verification/lean/LeanSpike/Spec.lean, verification/lean/Tier2/**, or packages/docx-core/src/integration/lean-spec-bridge.test.ts — and if the TS engine semantics shifted, did the PR also update the Lean residual predicate and bridge tests, or is the proof now pinned to a stale stronger/weaker assumption? Asymmetric: a TS change without a corresponding Lean update is WARN; a Lean-only change without a TS update should not fire. The Lean side can still compile while the abstraction boundary is subtly wrong for the next engine refactor. References: feat(verification): close inv_field_001 with Tier 2 OoxmlDoc subset #208 (closed inv_field_001 using stronger recursivelyWellformed), refactor(verification): weaken inv_field_001 axiom to document-level preservationFriendly (rebased follow-up to #208) #220 (weakened the axiom to document-level preservationFriendly to avoid breakage when field fragmentation lands).

  14. Unit-test quality (avoid tautological / change-detector tests): If this PR adds or modifies any **/*.test.ts (or other test files), are the test assertions independent of the system under test — expected values constructed from first principles rather than re-derived from the function under test, mocks limited to external boundaries (filesystem, network, clocks) rather than mocking the SUT itself, assertions making concrete semantic claims rather than just snapshotting current behavior or asserting non-null, and any test added alongside a bug fix actually exercising the bug? Tests that re-implement the production code as the "expected" value, or mock out the system under test, pass green while providing no regression protection.

Estimated cost (this run): $0.0173 — 55,045 input + 294 output tokens (≈4 chars/token) on gemini-3.5-flash. Char-count estimate, not provider telemetry.

@stevenobiajulu stevenobiajulu enabled auto-merge (squash) June 7, 2026 02:50
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 7, 2026

Codecov Report

❌ Patch coverage is 81.81818% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
packages/docx-mcp/src/tools/odf/add_comment.ts 87.50% 2 Missing and 1 partial ⚠️
packages/docx-mcp/src/tools/odf/get_comments.ts 66.66% 1 Missing and 1 partial ⚠️
packages/docx-mcp/src/server.ts 66.66% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

…llocation

Peer review (Codex + agy) on the final diff surfaced three issues in the new
office:annotation path:

- addAnnotation accepted malformed ranges (reversed, one-sided, out-of-bounds),
  which duplicated visible text on a reversed range. Validate the range and fail
  closed with INVALID_RANGE before mutating the DOM. (Not reachable via the MCP
  handler, whose ranges come from findUniqueSubstringMatch, but the odf-core API
  is public.)
- Multi-line comment bodies were lossy: makeAnnotation wrote a literal \n into one
  text node (LibreOffice renders it as a space) and annotationBodyText read via
  textContent (drops text:line-break / text:s). Write blank-line-split text:p with
  text:line-break (parity with insertParagraph) and read bodies via buildSegments
  so multi-line comments round-trip.
- allocateName ignored the synthetic ids readAnnotations assigns to custom-named
  (non-__Annot__N) annotations, so a returned commentId could coincide with another
  annotation's id. Reserve maxParsed + (count of non-matching annotations) + 1.

Adds odf-core tests for each: INVALID_RANGE rejection (no text mutation), multi-line
round-trip, and id allocation past custom-named annotations.
@stevenobiajulu stevenobiajulu added the llm-gate/override Bypass LLM gate WARN findings on this PR label Jun 7, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 7, 2026

LLM-Based Quality Gate

⚠️ Override activellm-gate/override label is set; this gate's WARN findings are non-blocking on this PR.

Overall: ✅ PASS (5 pass · 0 warn · 9 skipped · 14 total)

Check Verdict
⏭️ SKIPPED read_file response metadata parity paths not touched by this PR
⏭️ SKIPPED Live DOM namespace-safe OOXML writes paths not touched by this PR
Deleted field markup keeps w:fldChar outside w:del The PR does not touch field atomization, validateFieldStructure, hasFldCharInsideDel, or related OOXML field markup, as it only introduces ODF comment support in packages/odf-core and packages/docx-mcp.
⏭️ SKIPPED Field validation per story, not global paths not touched by this PR
⏭️ SKIPPED Revision IDs seeded from all revision-bearing side parts paths not touched by this PR
Accept/reject sweep side parts and caches The PR does not touch DocxDocument.acceptChanges, DocxDocument.rejectChanges, REVISION_STORY_PART_PATHS, accept_changes, reject_changes, or side-part revision markup, as it focuses on ODF comment support.
⏭️ SKIPPED DocumentViewNode.heading stays canonical paths not touched by this PR
AI-author parity across entry points Every entry point that constructs a SessionManager resolves SAFE_DOCX_AI_AUTHOR with matching three-way semantics: packages/docx-mcp/src/server.ts:250-252 resolves it inline, and the CLI paths in packages/docx-mcp/src/cli/tool_runner.ts:27, edit.ts:132, and grep.ts:140 construct it with resolveCliAiAuthor() from tool_runner.ts:17-20.
⏭️ SKIPPED Property-change wrapper discipline paths not touched by this PR
⏭️ SKIPPED SUPPORT.md Table A drift vs. implementation paths not touched by this PR
⏭️ SKIPPED Table A / Table B boundary on side-part revisions paths not touched by this PR
Canonical-emission surface completeness The PR only introduces comment editing support for ODF (.odt) files in packages/docx-mcp/src/tools/odf/ and does not add or change any OOXML or other tracked-edit surfaces under Table A.
⏭️ SKIPPED Lean predicate drift against engine semantics (asymmetric) paths not touched by this PR
Unit-test quality (avoid tautological / change-detector tests) The new test files packages/docx-mcp/src/tools/odf/odf_comments.test.ts and packages/odf-core/src/comments.test.ts use concrete, manually-constructed expected XML/JSON values from first principles, avoid mocking the SUT, and verify all semantic aspects of ODF commenting (e.g., round-trips, boundary conditions, and ID allocation).
Full checklist questions
  1. read_file response metadata parity: If this PR touches packages/docx-mcp/src/tools/read_file.ts, budgeted pagination returns, or additive response metadata like warnings / comment_load_error, do every successful return path (default budgeted early return, non-budget fallthrough, explicit limit/node_ids) preserve the same additive diagnostic fields? read_file has multiple success exits; diagnostics have already disappeared on one path before. Reference: fix(docx-core): declare xmlns:w14/w15 on comments root before writing prefixed attributes (#154) #180 surfaced comment_load_error, fix(docx-mcp): warn when read_file budget is exceeded by a single node (closes #184) #186 added an early budget return + warnings, fix(docx-mcp): surface comment_load_error on the default budgeted read path (closes #189) #191 fixed the missing comment_load_error on the default budgeted path.

  2. Live DOM namespace-safe OOXML writes: If this PR touches packages/docx-core/src/primitives/comments.ts or writes prefixed OOXML attributes/elements (w14:*, w15:*, xmlns:*, comments.xml, commentsExtended.xml, people.xml), are prefixed OOXML names written with namespace-aware APIs — root aliases bound with setAttributeNS(XMLNS_NS, ...), prefixed attributes with setAttributeNS(W14_NS/W15_NS, ...), and is there a test that proves the live DOM works before serialization/reparse? String-prefixed attributes can serialize plausibly while the live DOM still throws namespace errors. Reference: fix(docx-core): declare xmlns:w14/w15 on comments root before writing prefixed attributes (#154) #180 (xmlns:w14/w15 declared on comments root before writing prefixed attrs).

  3. Deleted field markup keeps w:fldChar outside w:del: If this PR touches field atomization, validateFieldStructure, hasFldCharInsideDel, w:fldChar, w:instrText, w:delInstrText, or collapsed field comparison logic, does deleted field output stay ECMA-376-conformant — w:fldChar sibling-level (never inside w:del), deleted instructions use w:delInstrText only inside valid delete wrappers, accept/reject safety checks still reject malformed combined output? Word treats deleted field-state markup in the wrong container as document-corrupting. References: fix(docx-core): validate w:delInstrText placement and reject w:fldChar inside <w:del> #211, fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225, fix(docx-core): fragment w:fldChar outside w:del per ECMA-376 Part 4 #228.

  4. Field validation per story, not global: If this PR touches packages/docx-core/src/baselines/atomizer/pipeline.ts, splitStories, validateFieldStructure, side-part merge logic, or footnote/endnote field handling, is field validation run independently per ECMA story (document.xml, each footnote, each endnote), with sidecars from both original and revised archives considered, and global counter balance not treated as sufficient? A document can be globally balanced but have an invalid field sequence inside one story. References: fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225, fix(docx-core): fragment w:fldChar outside w:del per ECMA-376 Part 4 #228, feat(docx-core): sweep side-part revisions on accept/reject #218.

  5. Revision IDs seeded from all revision-bearing side parts: If this PR touches packages/docx-mcp/src/session/manager.ts (especially getRevisionContextForSession or FIXED_REVISION_ID_SEED_PARTS), createRevisionContext, revision-ID allocation, or MCP tools that create tracked changes/comments/footnotes, does revision-ID allocation scan all relevant package parts before issuing new IDs — comments, footnotes, endnotes, glossary, headers, footers — ignore non-revision w:id values (comment IDs, bookmarks), and handle malformed optional parts gracefully? Revision IDs are package-wide; document-only seeding collides with existing side-part revisions. Reference: fix(docx-mcp): seed revision ids from side parts #216 (seed revision ids from side parts).

  6. Accept/reject sweep side parts and caches: If this PR touches DocxDocument.acceptChanges, DocxDocument.rejectChanges, REVISION_STORY_PART_PATHS, accept_changes, reject_changes, or side-part revision markup, does accept/reject process every revision-bearing story — updating document.xml + footnotes.xml + endnotes.xml + comments.xml, writing back only changed side parts while refreshing cached XML, and pruning orphan footnotes without deleting reserved separator entries? Accepting only in the main document leaves stale revisions and dangling references in the package. References: feat(docx-core): sweep side-part revisions on accept/reject #218, fix(docx-mcp): seed revision ids from side parts #216, fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225.

  7. DocumentViewNode.heading stays canonical: If this PR touches packages/docx-core/src/primitives/document_view.ts, HeadingValue, heading heuristics, ListMetadata.header_style, or Google Docs document-view heading normalization, does node.heading remain a structural heading signal — exact Word styles Heading1Heading6 win, heuristic sources suppressed inside table cells while real Word heading styles still pass, ordinary body paragraphs omit the heading key? Consumers use node.heading != null as a structural test; heuristic false positives break downstream navigation. References: fix(docx-core): harden heading detection (#157 Phase 1) #178, fix(docx-core): suppress non-sectional false-positive headings (closes #187) #188, feat(docx-core): add derived heading object to DocumentViewNode (closes #179) #190.

  8. AI-author parity across entry points: If this PR touches packages/docx-mcp/src/server.ts, packages/docx-mcp/src/cli/tool_runner.ts, packages/docx-mcp/src/cli/commands/**, or adds any new new SessionManager(...) call site in docx-mcp, does every entry point that constructs a SessionManager resolve SAFE_DOCX_AI_AUTHOR with the same three-way semantics (set → use it; empty string → opt out to untracked; unset → defaultAiAuthor), or has a new entry path silently bypassed tracked emission? Each entry path looks locally correct while diverging from another; tracked emission has gone dark in one path before anyone noticed. References: feat(docx-mcp): wire configurable AI author through MCP layer (#142) #172 (production MCP wiring would have kept tracked emission dark), fix(docx-mcp): honor SAFE_DOCX_AI_AUTHOR in CLI entry points (#181) #182 (CLI runners constructing bare SessionManager() silently produced untracked edits).

  9. Property-change wrapper discipline: If this PR touches packages/docx-core/src/primitives/layout.ts, packages/docx-core/src/primitives/text.ts, packages/docx-mcp/src/tools/clear_formatting.ts, or packages/docx-core/src/primitives/track-changes-emitter.ts, do tracked formatting/property edits emit exactly one correct *PrChange wrapper (pPrChange / rPrChange / trPrChange / tcPrChange) carrying a snapshot of the prior live properties — not stacking stale wrappers, not stripping valid historical children (cellIns/cellDel/cellMerge), and not omitting the snapshot when the operation is formatting-aware? Emitted OOXML is visually plausible but subtle snapshot mistakes only surface during later accept/reject or in Word's tracked-changes UI. References: feat(docx-core): emit pPrChange/trPrChange/tcPrChange from layout setters (#140) #167 (duplicate pPrChange/trPrChange/tcPrChange stacking + over-broad tcPr exclusion), feat(docx-mcp): emit rPrChange from clear_formatting MCP tool (#141) #170 (clear_formatting failing to strip stale rPrChange), feat(docx-core): emit rPrChange for formatted paragraph replacements #215 (rPrChange for formatted paragraph replacements + filtering nested stale records).

  10. SUPPORT.md Table A drift vs. implementation: If this PR modifies OOXML revision emission behavior (w:ins, w:del, w:rPrChange, etc.) in packages/docx-core/src/primitives/**, or touches packages/docx-core/SUPPORT.md, does the PR symmetrically update Table A in SUPPORT.md when the supported revision-emission surface in primitives changed — added, removed, or weakened — or is the documented contract now lying about what's supported? Reviewers focus on TS AST correctness and golden tests; Markdown contract tables get treated as an afterthought, so the documented surface drifts from the actual surface. Reference: [120.8] Regression suite for canonical revision emission across the surface #143 review caught replaceParagraphTextRange should emit w:rPrChange when run formatting changes #173 (formatting mismatch in Table A) and addCommentReply should emit body revision markup OR SUPPORT.md should be softened #174 (comment body revision omission forcing a Table A softening) late in peer review.

  11. Table A / Table B boundary on side-part revisions: If this PR touches packages/docx-core/src/primitives/comments.ts, packages/docx-core/src/primitives/footnotes.ts, or other side-part primitives, and adds/changes revision markup (w:ins, w:del), does tracked-change revision logic stay scoped to Table A (document-body content inside the side part) without leaking revision markup into Table B (the side-part package bootstrap — comments.xml/footnotes.xml element registration itself)? Body runs and side-part package elements share nearly identical XML namespace schemas; revisions emitted in the wrong table corrupt the package contract while looking plausible. References: [120.3] Emit w:ins/w:del for comment body anchors #138 (comment-body straddle constraints), [120.4] Emit w:ins/w:del for footnote reference and text #139 (footnote-reference straddle constraints).

  12. Canonical-emission surface completeness: If this PR adds or changes a tracked-edit surface in packages/docx-core/src/primitives/** or packages/docx-mcp/src/tools/**, are the paired artifacts updated together — packages/docx-core/src/integration/canonical-emission-regression.test.ts, packages/docx-mcp/src/integration/canonical-emission-mcp.test.ts, and the documented emitter surface (Table A) — or is the rollout only partially wired? The primitive change looks done before the MCP path, regression matrix, and documented contract are wired through; partial rollouts ship undocumented surface that drifts. References: feat(docx-mcp): wire configurable AI author through MCP layer (#142) #172 (RevisionContext threaded through every Table A MCP tool), test(docx-core,docx-mcp): final regression suite for canonical emission (#143) #175 (24-test regression suite + verified write-time emitter rows), feat(docx-core): emit rPrChange for formatted paragraph replacements #215 (re-enabled rPrChange regression + updated support surface for replaceParagraphTextRange).

  13. Lean predicate drift against engine semantics (asymmetric): If this PR changes field-wrapper semantics, the proof boundary, or atomizer behavior — packages/docx-core/src/baselines/atomizer/**, verification/lean/LeanSpike/Spec.lean, verification/lean/Tier2/**, or packages/docx-core/src/integration/lean-spec-bridge.test.ts — and if the TS engine semantics shifted, did the PR also update the Lean residual predicate and bridge tests, or is the proof now pinned to a stale stronger/weaker assumption? Asymmetric: a TS change without a corresponding Lean update is WARN; a Lean-only change without a TS update should not fire. The Lean side can still compile while the abstraction boundary is subtly wrong for the next engine refactor. References: feat(verification): close inv_field_001 with Tier 2 OoxmlDoc subset #208 (closed inv_field_001 using stronger recursivelyWellformed), refactor(verification): weaken inv_field_001 axiom to document-level preservationFriendly (rebased follow-up to #208) #220 (weakened the axiom to document-level preservationFriendly to avoid breakage when field fragmentation lands).

  14. Unit-test quality (avoid tautological / change-detector tests): If this PR adds or modifies any **/*.test.ts (or other test files), are the test assertions independent of the system under test — expected values constructed from first principles rather than re-derived from the function under test, mocks limited to external boundaries (filesystem, network, clocks) rather than mocking the SUT itself, assertions making concrete semantic claims rather than just snapshotting current behavior or asserting non-null, and any test added alongside a bug fix actually exercising the bug? Tests that re-implement the production code as the "expected" value, or mock out the system under test, pass green while providing no regression protection.

Estimated cost (this run): $0.0183 — 58,208 input + 314 output tokens (≈4 chars/token) on gemini-3.5-flash. Char-count estimate, not provider telemetry.

@stevenobiajulu stevenobiajulu merged commit b4aadf1 into main Jun 7, 2026
25 checks passed
@stevenobiajulu stevenobiajulu deleted the odf-comments-20260606 branch June 7, 2026 09:35
@stevenobiajulu
Copy link
Copy Markdown
Member Author

✅ Post-merge smoke passed

Merged: b4aadf1 (squash, #341)
Built from: main @ b4aadf1
Smoke: clean build (odf-core → docx-mcp) + full suites + real-NVCA .odt feature path

Steps

  • ✅ build (clean dist from main HEAD)
  • ✅ tests (867 passed — odf-core 40, docx-mcp 827)
  • add_comment (whole-paragraph + ranged anchor_text + multi-line) → get_commentssave → reopen

Real-world fixtures

  • nvca-coi-regression/source.docx.odt via soffice --convert-to odt (real document, not a synthetic fixture)
    • Whole-paragraph comment by Jane Doe on p0; ranged comment by Acme Manufacturing anchored to "AMENDED"; multi-line comment ("First line…\nSecond line…") on p1.
    • get_comments returned all 3 with unique ids; the multi-line body round-tripped exactly (line break preserved, not collapsed) — exercising the post-review fix.
    • Saved (109 KB), reopened in a fresh session → 3 comments, authors intact.
    • LibreOffice converted the commented .odt.docx cleanly, preserving both comment authors and the multi-line break (w:br) — i.e. LibreOffice reads the office:annotations as native comments.

Cleanup

  • ✅ remote branch odf-comments-20260606 deleted (auto-delete on merge)
  • ✅ removed worktree ../safe-docx-odf-comments
  • ✅ deleted local branch + pruned origin

Log: $LOG (local).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat llm-gate/override Bypass LLM gate WARN findings on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant