feat(doc): strip server-rejected unsafe chars on markdown write path #465
Merged
PeterGuy326 merged 1 commit on Jun 16, 2026
The doc write boundary only stripped a fixed dangerous-Unicode set, and only on the JSONML path. C0 control characters (except tab/newline), DEL (0x7F), and a few zero-width / line-separator codepoints still reached the server, where RejectControlChars rejects them, so doc create/update failed on content containing such characters (common with LLM-generated or copy-pasted text).
- Rename stripDocDangerousUnicode -> stripDocInputUnsafe and extend it to drop C0 controls (except \t and \n) and DEL, matching apiclient.rejectDangerousChars.
- Add U+200D, U+2028, U+2029 to the dangerous-Unicode set so it covers the full server-rejected range.
- Apply the strip on the markdown write path (doc create/update) and the JSONML node path, not just the JSONML body.
- Add unit tests for stripDocInputUnsafe.
Ported from dws-wukong (feat: add input safe-character filtering).
audanye-sudo
approved these changes
Jun 15, 2026
What
Harden the `dws doc create` / `dws doc update` write boundary so that content containing control characters or dangerous Unicode is stripped before being sent, instead of being rejected by the server.
Why
Today the doc write path only strips a fixed dangerous-Unicode set, and only on the JSONML branch. The Markdown branch sends raw content straight through. When content carries:
- C0 control characters (< 0x20, other than `\t` / `\n`) or DEL (0x7F), or
- zero-width / line-separator codepoints (U+200D, U+2028, U+2029)
…the server-side `RejectControlChars` validator rejects the request and the command fails. This is common with LLM-generated or copy-pasted text (e.g. a stray `\x00`, a zero-width joiner, or a Windows `\r`).
Changes
- Rename `stripDocDangerousUnicode` → `stripDocInputUnsafe` and extend it to also drop C0 controls (except `\t` / `\n`) and DEL, matching the existing authoritative `apiclient.rejectDangerousChars` definition (so the CLI strips exactly what the API layer would reject).
- Add U+200D, U+2028, U+2029 to the dangerous-Unicode set to cover the full server-rejected range.
- Apply the strip on the Markdown write path (`doc create` / `doc update`) and the JSONML node path; previously only the JSONML body path was covered.
- Add unit tests (`TestStripDocInputUnsafe`) using explicit `\u` / `\x` escapes so the offending codepoints are unambiguous in source.
Tab and newline are intentionally preserved as legitimate document text.
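The stripping rules listed above can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `isUnsafeRune` and `stripUnsafeSketch` are hypothetical names, and the pre-existing dangerous-Unicode set (not spelled out in this description) is left out, so only the three newly added codepoints are covered. The inputs use explicit `\x` / `\u` escapes, in the same spirit as the unit tests.

```go
package main

import (
	"fmt"
	"strings"
)

// isUnsafeRune reports whether r belongs to one of the classes named in the
// PR description. Assumption: the older dangerous-Unicode set is not listed
// in the PR text, so only U+200D, U+2028 and U+2029 are checked here.
func isUnsafeRune(r rune) bool {
	switch {
	case r == '\t' || r == '\n':
		return false // preserved as legitimate document text
	case r < 0x20:
		return true // remaining C0 controls, e.g. \x00 and \r
	case r == 0x7F:
		return true // DEL
	case r == '\u200D' || r == '\u2028' || r == '\u2029':
		return true // zero-width joiner, line separator, paragraph separator
	}
	return false
}

// stripUnsafeSketch (hypothetical name) removes every rune that
// isUnsafeRune flags and leaves everything else untouched.
func stripUnsafeSketch(s string) string {
	return strings.Map(func(r rune) rune {
		if isUnsafeRune(r) {
			return -1 // a negative result makes strings.Map drop the rune
		}
		return r
	}, s)
}

func main() {
	inputs := []string{
		"a\x00b",          // stray NUL
		"line1\r\nline2",  // Windows line ending: \r dropped, \n kept
		"col1\tcol2",      // tab kept
		"x\u200Dy\u2028z", // zero-width joiner and line separator
		"del\x7F",         // DEL
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, stripUnsafeSketch(in))
	}
}
```

Using `strings.Map` keeps the filter to a single pass and lets one predicate express the whole rule. If the client and the server-side validator share that predicate, the set of stripped runes and the set of rejected runes cannot drift apart.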
Test
- `go test ./internal/helpers/`: pass
- `go test ./internal/helpers/docjsonml/`: pass
- `go build ./...`: clean
- `go vet ./internal/helpers/`: clean
Ported from dws-wukong (`feat: add input safe-character filtering`).