fix(doc): strip duplicate title H1 on the JSONML doc-create path #464

Open
PeterGuy326 wants to merge 1 commit into main from fix/doc-create-jsonml-dup-title


@PeterGuy326

Collaborator

Problem

dws doc create renders --name as the page title. When the body also opens with an H1 of the same text, the document shows the title twice (the "two headings" effect).

PR #448 fixed this only on the markdown path (stripLeadingDuplicateTitleHeading). The JSONML path (--content-format jsonml) was never covered: rich docs — tables, callouts, styled blocks — go through create_document + update_document(jsonml=...), which writes the body verbatim. So a leading h1 whose text equals the document name survives and renders twice.

This was hit in practice by a richly-formatted doc (tables + styled 🧪 callout) created via the JSONML path — the leading H1 duplicating the page title was not stripped.

Fix

Add stripLeadingDuplicateTitleJSONML, the JSONML counterpart of the markdown guard, wired into the format == "jsonml" branch of doc create right after prepareDocJSONMLBody and before update_document:

  • Parse the marshaled JSONML body.
  • Skip an optional ["root", {}, ...] wrapper.
  • Drop the first node only when it's an h1 whose concatenated leaf text equals --name (trimmed, case-insensitive).
  • Any parse failure or non-match leaves the body untouched — a valid write is never blocked.
  • Emits the same stderr note as the markdown path.
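The steps above can be sketched roughly as follows. This is an illustrative reconstruction from the description, not the code in this PR: the function signature, the `leafText` helper, the boolean return value (standing in for the stderr note), and the assumption that a "bare" body is a JSON array of nodes are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// leafText concatenates every string leaf under a decoded JSONML node.
// Index 0 (the tag name) is skipped; attribute objects contribute nothing.
func leafText(node any) string {
	switch v := node.(type) {
	case string:
		return v
	case []any:
		var sb strings.Builder
		for i, child := range v {
			if i == 0 {
				continue // tag name, not content
			}
			sb.WriteString(leafText(child))
		}
		return sb.String()
	}
	return "" // maps (attributes), numbers, null
}

// stripLeadingDuplicateTitleJSONML (hypothetical signature) returns the body
// without its first node when that node is an h1 whose text equals name.
// On any parse failure or non-match the original body is returned unchanged.
func stripLeadingDuplicateTitleJSONML(body, name string) (string, bool) {
	var nodes []json.RawMessage
	if err := json.Unmarshal([]byte(body), &nodes); err != nil || len(nodes) == 0 {
		return body, false
	}

	// Skip an optional ["root", {}, ...] wrapper: tag first, then attributes.
	start := 0
	var rootTag string
	if json.Unmarshal(nodes[0], &rootTag) == nil {
		if rootTag != "root" {
			return body, false
		}
		start = 1
		var attrs map[string]any
		if len(nodes) > 1 && json.Unmarshal(nodes[1], &attrs) == nil {
			start = 2
		}
	}
	if start >= len(nodes) {
		return body, false
	}

	// Inspect only the first content node.
	var first []any
	if json.Unmarshal(nodes[start], &first) != nil || len(first) == 0 {
		return body, false
	}
	tag, _ := first[0].(string)
	sameText := strings.EqualFold(strings.TrimSpace(leafText(first)), strings.TrimSpace(name))
	if tag != "h1" || !sameText {
		return body, false
	}

	// Drop the duplicate heading; remaining nodes are kept as raw JSON.
	nodes = append(nodes[:start], nodes[start+1:]...)
	out, err := json.Marshal(nodes)
	if err != nil {
		return body, false
	}
	return string(out), true
}

func main() {
	body := `["root",{},["h1",{},"Title"],["p",{},"body"]]`
	out, stripped := stripLeadingDuplicateTitleJSONML(body, "title")
	fmt.Println(out, stripped)
}
```

Keeping the untouched nodes as `json.RawMessage` means only the candidate heading is decoded, so the rest of the body is re-emitted without being reinterpreted.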

Scope matches #448 exactly: only a leading h1 that exactly equals the name is removed; h2+, distinct headings, and names that merely share a prefix are kept.
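To make the scope rule concrete, here is a minimal sketch of the match decision in isolation. The helper name `isDuplicateTitle` and the sample titles are hypothetical, and using `strings.EqualFold` for the case-insensitive comparison is an assumption about how the comparison is done.

```go
package main

import (
	"fmt"
	"strings"
)

// isDuplicateTitle (hypothetical helper) reports whether a leading heading
// counts as a duplicate of the document name: it must be an h1, and its text
// must equal the name after trimming, ignoring case.
func isDuplicateTitle(tag, headingText, name string) bool {
	if tag != "h1" {
		return false // h2 and deeper headings are always kept
	}
	return strings.EqualFold(strings.TrimSpace(headingText), strings.TrimSpace(name))
}

func main() {
	name := "Release Notes"
	cases := []struct{ tag, text string }{
		{"h1", "Release Notes"},        // exact match
		{"h1", "  release notes "},     // differs only by case and whitespace
		{"h1", "Release Notes Q3"},     // shares a prefix only
		{"h1", "Changelog"},            // distinct heading
		{"h2", "Release Notes"},        // same text, lower heading level
	}
	for _, c := range cases {
		fmt.Printf("%s %q -> strip=%v\n", c.tag, c.text, isDuplicateTitle(c.tag, c.text, name))
	}
}
```

Under these assumptions only the first two cases are stripped; the prefix match, the distinct heading, and the h2 are all kept.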

Tests

  • TestStripLeadingDuplicateTitleJSONML — unit table: root-wrapped, nested leaf text, bare body (no wrapper), case-insensitive, distinct heading kept, non-h1 kept, invalid JSON untouched.
  • TestDocCreateStripsDuplicateTitleJSONML — end-to-end through the real doc create --content-format jsonml command, asserting the forwarded update_document jsonml has the duplicate h1 removed and the body content preserved.
ok  github.com/DingTalk-Real-AI/dingtalk-workspace-cli/internal/helpers

PR #448 stripped a leading H1 matching --name on the markdown path of
`dws doc create`, but the JSONML path (--content-format jsonml) was never
covered. Rich documents — tables, callouts, styled blocks — go through
create_document + update_document(jsonml=...), which writes the body
verbatim, so a leading h1 whose text equals the document name renders the
title twice (the "two headings" effect the doc platform shows because it
already renders --name as the page title).

Add stripLeadingDuplicateTitleJSONML: parse the marshaled JSONML body,
skip an optional ["root", {}, ...] wrapper, and drop the first node when
it is an h1 whose concatenated leaf text equals --name (trimmed,
case-insensitive). Any parse failure or non-match leaves the body
untouched, so a valid write is never blocked. A stderr note mirrors the
markdown path so agents learn the convention.