fix: sanitize SVG output to remove invalid XML characters #46
PlantUML may emit numeric character references for control characters, such as backspace (U+0008), in SVG output when the source Javadoc contains inline-code delimiters or `{@link}` tags. These references resolve to control characters that are illegal in XML 1.0, so downstream consumers fail with XML parse errors. Add a `sanitizeXml()` step in `PUMLDiagram` that strips both invalid character references and raw control characters in a single pass, using the XML 1.0 valid-character production as the allowlist.

Bumps version to 3.8.5.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
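A minimal sketch of such a single-pass sanitizer, assuming plain Java and `java.util.regex`. The class name `SvgXmlSanitizer` is hypothetical; `sanitizeXml`, `appendValidChars`, and `parseCharRef` borrow names that appear in this PR, but the bodies are illustrative and not the actual `PUMLDiagram` code. References that decode to a character allowed by the XML 1.0 `Char` production are kept verbatim; everything else (invalid references, oversized references, raw control characters, lone surrogates) is dropped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SvgXmlSanitizer {

    // Decimal (&#8;) or hexadecimal (&#x8;) numeric character references.
    // XML 1.0 only allows a lowercase 'x' for the hex form.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:([0-9]+)|x([0-9a-fA-F]+));");

    private SvgXmlSanitizer() {
    }

    /** XML 1.0 Char production, used as the allowlist. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the referenced code point, or -1 if the digits do not fit in an int. */
    static int parseCharRef(Matcher m) {
        try {
            if (m.group(1) != null) {
                return Integer.parseInt(m.group(1));
            }
            return Integer.parseInt(m.group(2), 16);
        } catch (NumberFormatException e) {
            return -1; // -1 fails isValidXmlChar, so the reference is dropped
        }
    }

    /** Copies svg[from, to) into sb, skipping raw code points outside the allowlist. */
    static void appendValidChars(StringBuilder sb, String svg, int from, int to) {
        int i = from;
        while (i < to) {
            int cp = svg.codePointAt(i);
            if (isValidXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
    }

    /** One left-to-right pass: filter raw text between references, filter each reference. */
    static String sanitizeXml(String svg) {
        StringBuilder sb = new StringBuilder(svg.length());
        Matcher m = CHAR_REF.matcher(svg);
        int last = 0;
        while (m.find()) {
            appendValidChars(sb, svg, last, m.start());
            if (isValidXmlChar(parseCharRef(m))) {
                sb.append(m.group()); // keep valid references verbatim
            }
            last = m.end();
        }
        appendValidChars(sb, svg, last, svg.length());
        return sb.toString();
    }
}
```

For example, `SvgXmlSanitizer.sanitizeXml("a&#8;b")` returns `"ab"`, while `"x&#65;y"` comes back unchanged. Keeping valid references as text (rather than decoding them) leaves the rest of the SVG byte-for-byte identical.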
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dabc60a538
```java
int last = 0;
while (m.find()) {
    appendValidChars(sb, svg, last, m.start());
    int val = m.group(1) != null ? Integer.parseInt(m.group(1)) : Integer.parseInt(m.group(2), 16);
```
Guard numeric char ref parsing against overflow
`sanitizeXml` assumes every matched `&#...;`/`&#x...;` reference fits in a Java `int`, but `Integer.parseInt(...)` throws `NumberFormatException` for oversized values (for example, a reference whose digits exceed `Integer.MAX_VALUE`). In that case diagram generation fails with an uncaught runtime exception instead of sanitizing and dropping the invalid reference, which is a regression for any SVG text that contains out-of-range numeric character references.
Good catch. Addressed in commit 30d7326: extracted parsing into a `parseCharRef` helper that catches `NumberFormatException` and returns -1, so the invalid reference is silently dropped.
Remove trailing whitespace after semicolon in for-loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace ternary with helper method to satisfy `AvoidInlineConditionals`
- Guard `parseInt` against `NumberFormatException` from oversized char refs (e.g. references whose digits exceed `Integer.MAX_VALUE`) per Codex review feedback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip generic type parameters (`<T>`) and parentheses from component names before passing them to PlantUML, to avoid syntax errors when rendering diagrams of C# code. Add C# integration and relationship-extraction tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
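A minimal sketch of that name cleanup, assuming the literal reading of the commit: drop every `<...>` generic parameter list (including nested ones) and drop the `(` and `)` characters themselves. `PumlNames` and `toPlantUmlSafeName` are hypothetical names, not the project's actual code.

```java
final class PumlNames {

    private PumlNames() {
    }

    /** Removes <...> generic parameter lists (nested too) and parenthesis characters. */
    static String toPlantUmlSafeName(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        int depth = 0; // current nesting level inside '<' ... '>'
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '<') {
                depth++;
            } else if (c == '>') {
                if (depth > 0) {
                    depth--;
                }
            } else if (depth == 0 && c != '(' && c != ')') {
                sb.append(c); // outside any generic list and not a parenthesis
            }
        }
        return sb.toString();
    }
}
```

Tracking the nesting depth handles names such as `Dictionary<TKey, List<TValue>>`, which a single non-greedy regex like `<[^>]*>` would leave partially stripped.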
…text

When `filesFilter` is set, Clarpse only parses the filtered files. Relations to non-filtered components (e.g., `extends ClassB` where `ClassB.java` is not in the filter) are silently dropped, because Clarpse classifies unresolved references as external dependencies. This adds an opt-in config toggle that, after the initial filtered parse:

1. Scans component references for unresolved targets
2. Derives source file names from component unique names
3. Parses just those files from the full `ProjectFiles`
4. Merges them into the models before `CodeDiff` creation

Also updates `ExtractedRelationships` to check `externalDependencies` for specialization and realization references, since references to initially unparsed components are classified as external.

Bumps version to 3.9.0. Adds ADR-005 documenting the decision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
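Steps 1 to 3 of that flow can be sketched with plain Java collections. Everything here is hypothetical: `UnresolvedRefs`, the map from parsed component unique names to referenced unique names, and the naming convention that `com.example.ClassB` lives in a file called `ClassB.java` are assumptions for illustration, not Clarpse's actual model or API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class UnresolvedRefs {

    private UnresolvedRefs() {
    }

    /** Step 1: referenced names that are not among the parsed components (the map keys). */
    static Set<String> unresolvedTargets(Map<String, Set<String>> referencesByComponent) {
        Set<String> parsed = referencesByComponent.keySet();
        Set<String> unresolved = new TreeSet<>();
        for (Set<String> targets : referencesByComponent.values()) {
            for (String target : targets) {
                if (!parsed.contains(target)) {
                    unresolved.add(target);
                }
            }
        }
        return unresolved;
    }

    /** Step 2 (assumed convention): "com.example.ClassB" -> "ClassB.java". */
    static String toFileName(String uniqueName) {
        return uniqueName.substring(uniqueName.lastIndexOf('.') + 1) + ".java";
    }

    /** Step 3 input: paths from the full project whose file name matches a derived name. */
    static List<String> filesToParse(Set<String> unresolved, Collection<String> allPaths) {
        Set<String> wanted = unresolved.stream()
                .map(UnresolvedRefs::toFileName)
                .collect(Collectors.toSet());
        List<String> result = new ArrayList<>();
        for (String path : allPaths) {
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            if (wanted.contains(fileName)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

Targets that are genuine external dependencies (such as `java.util.List`) simply match no project file and are ignored. Matching by simple file name is over-inclusive when two packages contain a file with the same name; the sketch accepts that in exchange for simplicity.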
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ening style attrs

GitHub's SVG sanitizer strips `<image>` elements and `style` attributes, causing metric badges and all lines/borders to disappear when diagrams are embedded in GitHub. This adds `SvgImageInliner` post-processing that:

1. Converts `<image>` elements with `data:image/svg+xml;base64` data URIs into inline `<g>` elements with de-duplicated IDs
2. Flattens CSS `style="stroke:X;stroke-width:Y;"` into individual SVG presentation attributes (`stroke`, `stroke-width`)

Also includes improvements to contextual component resolution, relation extraction, and change-set tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
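The two string-level conversions can be sketched as small helpers. `SvgStyleFlattener` and its methods are hypothetical names, not the actual `SvgImageInliner` API. The sketch assumes style values are simple tokens (colors, widths) that can be copied through unchanged, and it does not cover wrapping the decoded markup in `<g>` or de-duplicating IDs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

final class SvgStyleFlattener {

    private static final String SVG_DATA_URI_PREFIX = "data:image/svg+xml;base64,";

    private SvgStyleFlattener() {
    }

    /** "stroke:#000;stroke-width:1.0;" -> {stroke=#000, stroke-width=1.0}, in source order. */
    static Map<String, String> flattenStyle(String style) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':');
            if (colon <= 0) {
                continue; // skip empty or malformed declarations
            }
            String name = decl.substring(0, colon).trim();
            String value = decl.substring(colon + 1).trim();
            if (!name.isEmpty() && !value.isEmpty()) {
                attrs.put(name, value);
            }
        }
        return attrs;
    }

    /** Renders the map as presentation attributes: stroke="#000" stroke-width="1.0". */
    static String toAttributes(Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append("=\"").append(e.getValue()).append('"');
        }
        return sb.toString();
    }

    /**
     * Decodes an image href data URI to SVG markup, or returns null if it is not
     * base64-encoded SVG. Malformed base64 throws IllegalArgumentException.
     */
    static String decodeSvgDataUri(String href) {
        if (!href.startsWith(SVG_DATA_URI_PREFIX)) {
            return null;
        }
        byte[] bytes = Base64.getDecoder().decode(href.substring(SVG_DATA_URI_PREFIX.length()));
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Presentation attributes such as `stroke` and `stroke-width` carry the same meaning as the corresponding CSS properties, so moving them out of `style` keeps the rendering while surviving a sanitizer that strips `style`.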
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- …backspace) originating from `{@link}`/`{@code}` inline-code delimiters in source Javadoc
- Add `sanitizeXml()` in `PUMLDiagram` that strips both invalid `&#N;`/`&#xN;` character references and raw control characters in a single pass

Test plan
🤖 Generated with Claude Code