fix: escapeStringJson escapes DEL and C1 control characters #1005
Open
He-Pin wants to merge 1 commit into
This was referenced Jun 20, 2026
Motivation:
std.escapeStringJson and std.escapeStringPython passed DEL (0x7F) and C1 control characters (0x80-0x9F) through literally. go-jsonnet and C++ jsonnet render these values as \uXXXX escapes, so matching them avoids downstream divergence. The renderer fast paths also need to preserve UTF-8 semantics: byte-array scans must not treat UTF-8 continuation bytes in 0x80-0x9F as standalone C1 controls.

Modification:
- Escape DEL and C1 controls in BaseRenderer/BaseCharRenderer/BaseByteRenderer char-level paths.
- Keep byte-level CharSWAR scans limited to JSON-significant bytes plus DEL, while String/char[] scans detect DEL/C1 before UTF-8 encoding.
- Add JVM, JS, and Native CharSWAR updates and regression tests for DEL, C1, long strings, and non-ASCII passthrough.

Result:
DEL and C1 controls now render as \uXXXX without corrupting UTF-8 encoded non-ASCII data. Characters above U+009F still remain literal when escapeUnicode is false.
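
The UTF-8 constraint in the commit message can be illustrated with a small sketch. It is not code from this PR: the object and helper names are hypothetical, and it only shows why a byte in 0x80-0x9F says nothing about whether the decoded character is a C1 control.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helpers for illustration only; none of these names come from the PR.
object Utf8C1Overlap {
  /** Hex dump of the UTF-8 encoding of `s`, e.g. "C3 91". */
  def utf8Hex(s: String): String =
    s.getBytes(UTF_8).map(b => "%02X".format(b & 0xff)).mkString(" ")

  /** True if any encoded byte falls in 0x80-0x9F (what a naive byte scan would flag). */
  def hasC1RangeByte(s: String): Boolean =
    s.getBytes(UTF_8).exists { b =>
      val u = b & 0xff
      u >= 0x80 && u <= 0x9f
    }

  /** True if any decoded char is DEL or a C1 control (what should actually be escaped). */
  def hasDelOrC1Char(s: String): Boolean =
    s.exists(c => c >= 0x7f && c <= 0x9f)
}
```

Here `Utf8C1Overlap.utf8Hex("\u00d1")` returns `C3 91`: the printable letter Ñ contains the continuation byte 0x91, so `hasC1RangeByte` is true while `hasDelOrC1Char` is false. The real C1 control U+0085 encodes as `C2 85`, so both checks are true for it. A byte scan therefore cannot tell the two apart without decoding, which is consistent with detecting C1 at the char level and only DEL at the byte level.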
Motivation
`std.escapeStringJson` and `std.escapeStringPython` passed DEL (0x7F) and C1 control characters (0x80-0x9F) through unescaped. RFC 8259 only requires U+0000-U+001F to be escaped, so the output was still valid JSON, but it diverged from go-jsonnet v0.22.0 and C++ jsonnet, which both emit `\uXXXX` for this range.
Modification
- Extend `BaseRenderer.escape`, `BaseCharRenderer`, and `BaseByteRenderer` to include 0x7F-0x9F (always escaped regardless of the unicode flag).
- Update the `CharSWAR` implementations: byte-level SWAR detects DEL via XOR with `HOLE`; 16-bit SWAR adds `U16_DEL`; scalar paths add a range check.
- Replace `RenderUtils.escapeByte`/`escapeChar` fallback calls with local escape helpers that consistently emit `\uXXXX` for DEL / C1.
- Change the `isAsciiJsonSafe` threshold from `>= 0x80` to `>= 0x7F` to exclude DEL from ASCII-safe classification (fixes the `std.base64Decode` path).
- Add regression tests for DEL, C1, long strings, and non-ASCII preservation.
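
The byte-level DEL check can be sketched as follows. This is a minimal illustration of the general SWAR technique, not the PR's `CharSWAR` code. The definition of `HOLE` is not shown in this description, so the mask `DelMask` below, the object name, and the `pack` helper are all assumptions.

```scala
// Illustrative sketch only; names and constants are assumptions, not taken from the PR.
object SwarDelSketch {
  private final val Ones    = 0x0101010101010101L
  private final val Highs   = 0x8080808080808080L
  private final val DelMask = 0x7f7f7f7f7f7f7f7fL // assumed stand-in for the PR's HOLE

  /** True if any of the 8 bytes packed in `word` equals 0x7F. */
  def hasDel(word: Long): Boolean = {
    val x = word ^ DelMask          // bytes equal to 0x7F become 0x00
    ((x - Ones) & ~x & Highs) != 0L // classic "does any byte equal zero" test
  }

  /** Packs up to 8 bytes from `offset` (little-endian), padding with spaces (0x20). */
  def pack(bytes: Array[Byte], offset: Int): Long = {
    var w = 0L
    var i = 0
    while (i < 8) {
      val idx = offset + i
      val b = if (idx < bytes.length) bytes(idx) & 0xffL else 0x20L
      w |= b << (8 * i)
      i += 1
    }
    w
  }
}
```

XOR with a repeated 0x7F turns only DEL bytes into zero, so UTF-8 continuation bytes in 0x80-0x9F never trigger the check. For example, `hasDel(pack("abc\u007fdef".getBytes("UTF-8"), 0))` is true, while the same call on the UTF-8 bytes of `"\u00d1\u0085"` (C3 91 C2 85) is false.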
Result
`escapeStringJson` and `escapeStringPython` now escape DEL and C1 control characters as `\uXXXX`, matching go-jsonnet v0.22.0 and C++ jsonnet behavior. Characters above 0x9F (NBSP, accented letters, CJK, emoji) remain literal when `unicode=false`. All tests pass on JVM / JS / Native with Scala 3.3.7 and 2.13.18.
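
The resulting behavior can be approximated with a scalar sketch. It is a hypothetical stand-in for the renderer with `unicode=false`, not the actual `BaseRenderer.escape` implementation. The object name, the function name, and the lowercase hex digits are assumptions.

```scala
// Hypothetical scalar model of the escaping rule described above (unicode=false).
object EscapeSketch {
  def escapeJson(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\b' => sb.append("\\b")
      case '\f' => sb.append("\\f")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      // C0 controls, DEL (0x7F), and C1 controls (0x80-0x9F) become \uXXXX.
      case c if c < 0x20 || (c >= 0x7f && c <= 0x9f) =>
        sb.append("\\u").append("%04x".format(c.toInt))
      // Everything above 0x9F stays literal in this mode.
      case c => sb.append(c)
    }
    sb.append('"').toString
  }
}
```

With this sketch, a string containing DEL between `a` and `b` renders as `"a\u007fb"`, U+0085 renders as `"\u0085"`, and NBSP (U+00A0) or `é` pass through unchanged.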