v1.6.2 robustness: glyph sanitizer + capacity tolerance + LayerStack row + error messages#40
Open
DemchaAV wants to merge 10 commits into
Open
v1.6.2 robustness: glyph sanitizer + capacity tolerance + LayerStack row + error messages#40DemchaAV wants to merge 10 commits into
DemchaAV wants to merge 10 commits into
Conversation
Promote PdfFont.sanitizeByFont(PDFont, String) to public and add a companion PdfFont.sanitizeForRender(TextStyle, String) entry point. sanitizeForRender resolves the active font from a TextStyle, applies the standard control-character cleanup, then substitutes any code point the font cannot encode with '?'. This is the single seam render handlers will route through in R1.b/R1.c so wrap geometry stays in lockstep with the bytes drawn on the page. getTextWidth now measures against sanitizeForRender so width and render see the same string. Existing ASCII-only callers are unaffected. One snapshot (document/nested_list_three_levels) widened because ListBuilder default markers for deep nesting (U+25E6 white bullet, U+25AA black small square) are outside Helvetica's WinAnsi coverage. Pre-R1.a measurement silently returned 0 for those markers and the renderer would have crashed when actually drawing them; the new width matches the substituted glyphs that R1.b will draw. Follow-up tracked: ListBuilder defaults should be ASCII-safe or font-aware so end users do not see '?' as a bullet. Adds PdfFontSanitizerTest covering: unknown glyph substitution, ASCII pass-through, single/collapsed spaces, empty/null input, width consistency between bullet and substitute inputs, and the direct sanitizeByFont escape hatch. Refs R1.a; precedes R1.b (paragraph render handler wiring) and R1.c (watermark/header/footer/table wiring).
PdfParagraphFragmentRenderHandler now sanitises text via PdfFont.sanitizeForRender(textStyle, text) — the same entry point introduced in R1.a for width measurement — before passing it to PDPageContentStream.showText(). Any code point Helvetica's WinAnsiEncoding cannot encode (arrows U+2192, bullets U+25CF, emoji, custom unicode) is substituted with '?', preventing PDFBox from throwing IllegalArgumentException during render. Width measurement and render now see the same string for the same text, so wrap geometry stays consistent with the painted output. GlyphFallbackLogger (new) emits a single WARN per unique (font, codePoint) pair on first substitution. Subsequent substitutions of the same glyph are suppressed to avoid log flooding on documents that use one unsupported character many times. Logger category: com.demcha.compose.engine.render.pdf.glyph-fallback. Set to DEBUG in logback-test.xml to inspect every substitution. The dead private sanitize() helper and unused TextControlSanitizer import are removed — control-character cleanup is already part of sanitizeForRender. Refs R1.b; precedes R1.c (watermark/header/footer/table wiring) and R1.d (UnicodeFallbackDemoTest visual regression).
… render Routes the remaining six PDF text render seams through the same glyph substitution policy R1.b applied to the paragraph handler. Any code point Helvetica's WinAnsiEncoding cannot encode is replaced with '?' instead of crashing PDFBox showText: - PdfWatermarkRenderer: one sanitise call before render covers both the single-placement and tiled branches. - PdfHeaderFooterRenderer: left / center / right zones each sanitise; the now-uniform empty check folds the null guard into isEmpty() on the sanitised result. - PdfTextRenderHandler: route through font.sanitizeForRender so width and render agree on the same string. - PdfTableRowRenderHandler: per-line sanitise inside the setFont/setColor envelope so a single bad glyph in one cell does not crash the whole table render. - PdfTableRowFragmentRenderHandler: mirror of the row-render handler on the fragment path. - PdfBlockTextRenderHandler: setFont now returns the active PDFont so the body loop can sanitise against it; the IllegalArgumentException catch is removed because showText can no longer throw on unsupported glyphs. To keep both raw-PDFont helpers (watermark, header/footer) and PdfFont-wrapper handlers on the exact same substitution policy, GlyphFallbackLogger.sanitize(PDFont, String) was promoted to a public static helper. PdfFont.sanitizeByFont now delegates to it, so the PdfFontSanitizerTest contract from R1.a still holds end-to-end. Verified: full mvnw verify zelyony (828 tests). No snapshots shifted beyond the deliberate ListBuilder marker baseline already updated in R1.a. Refs R1.c; closes the wiring half of R1 — only R1.d (UnicodeFallbackDemoTest end-to-end visual regression) remains.
End-to-end proof that R1.a-R1.c protect every PDF text render seam
from glyphs Helvetica's WinAnsi encoding cannot cover. Four
independent test methods exercise the four wired paths so a future
regression in any one is localised to a single failure:
- paragraphWithUnsupportedGlyphsRendersWithoutCrash — arrows, bullets,
white circles, small squares, emoji, copyright in a paragraph body
- tableWithUnsupportedGlyphsRendersWithoutCrash — header + body cells
containing bullets and emoji
- watermarkWithUnsupportedGlyphRendersWithoutCrash — diagonal DRAFT
watermark text with arrow, copyright, bullet
- headerFooterWithUnsupportedGlyphRendersWithoutCrash — left / right
header zones and centre footer with arrow, bullet, white circle,
copyright, emoji
Pre-R1 every assertion would never run because PDFBox would throw
IllegalArgumentException ('arrowright is not available in the font
Helvetica, encoding: WinAnsiEncoding') deep inside showText. After
R1.b/c the unsupported code points become '?' and the document
completes; output PDFs land in target/visual-tests/glyph-fallback/
for manual review.
assertValidPdf checks the %PDF- magic header and a > 500 byte body —
small enough to allow single-paragraph PDFBox output, large enough
to catch a truncated/empty render.
Refs R1.d; closes R1. v1.6.2 R2 (capacity tolerance), R3 (LayerStack
row relax), R4 (error messages) follow.
DocumentPageSize.A4 resolves to the exact PDF point value 841.88977 pt. Pre-R2 the page-capacity guard compared against this with EPS=1e-6 tolerance, which meant a node authored with the rounded human input 842.0 pt failed with "requires outer height 842.0 but page capacity is 841.88977" — blocking legitimate full-page background surfaces. Introduces a separate CAPACITY_TOLERANCE = 0.5 pt constant used only where outer height is compared against the full inner-canvas height (row composite, stack composite, atomic leaf). EPS stays 1e-6 for floating-point noise inside split decisions and remaining-height checks — bigger tolerance there would mask real overflow inside the splittable path. 0.5 pt ≈ 0.18 mm — visually indistinguishable, and the test matrix proves the boundary: 156.4 pt on a 156 pt inner area now renders (was throwing); 158 pt on the same area still throws as a real overflow. PaginationEdgeCaseTest gains two boundary cases pinning the new contract — atomicNodeHalfPointOverCapacityShouldFitWithinTolerance and atomicNodeClearlyOverCapacityShouldStillThrow — without weakening the existing oversizedAtomicImageShouldThrowDomainSpecificError which still triggers on the much larger 240 pt vs 156 pt mismatch. Refs R2.a; precedes R2.b (full-A4 surface visual regression).
Renders a single ShapeNode sized 595 x 842 pt on a DocumentSession configured with DocumentPageSize.A4 (exact PDF point value 841.88977) and zero margin. The 0.11 pt difference between the rounded user input (842) and the true inner-canvas height (841.88977) pre-R2 hit the EPS=1e-6 capacity check and threw AtomicNodeTooLargeException. After R2.a the new CAPACITY_TOLERANCE absorbs the rounding and a valid PDF lands in target/visual-tests/page-capacity/PageCapacityToleranceDemo.pdf. Asserts the rendered file has the %PDF- magic header and a > 500 byte body — small enough to allow a single-shape PDFBox output, large enough to catch a truncated render. Refs R2.b; closes R2. R3 (LayerStack row relax) and R4 (error message clarity pass) follow.
…leNodeInFixedSlot Pure plumbing change — no behavior is altered. Introduces a private FixedSlotKind enum and threads it through compileNodeInFixedSlot: - ROW_SLOT (default for row-composite children) — column of a horizontal row band where a nested row would create real composition conflict. - STACK_LAYER_SLOT — child of a LayerStackNode layer rectangle, where a 'horizontal' arrangement is just a column-row inside an already-fixed layer area, not a competing band. Three call sites updated to pass the correct kind: - row composite child (compileRowComposite line ~408): ROW_SLOT - placeStackLayer (line ~913): STACK_LAYER_SLOT - recursive VERTICAL children (compileNodeInFixedSlot line ~1267): propagate the caller's kind so a STACK descendant (column → row → ...) keeps the relaxed policy. Validator at line ~1102 still rejects HORIZONTAL uniformly — the behaviour relax happens in R3.b. This commit alone verifies green (no test sees a row inside a layer yet). Refs R3.a; precedes R3.b (validator relax) and R3.c (visual + snapshot).
Relaxes the no-nested-horizontal-row rule at LayoutCompiler line ~1154 so it fires only when the immediate parent slot is a ROW_SLOT. STACK_LAYER_SLOT parents (LayerStack layers and any section descendants thereof) accept horizontal composites because the surrounding rectangle is already fixed by the layer's alignment — a row there is a normal column-row inside the layer area, not a competing band. Real-world case unlocked: page-level LayerStack with a full-width dark band as one layer and a sidebar+main row as the content layer — exactly what the Noir corporate CV flow was trying to do (see audit feedback item 1). Error message rewritten to point at the new escape hatch and the historical workaround: "Row '...' cannot contain a nested horizontal row. Wrap the inner row in a LayerStack layer (allowed since v1.6.2), or stack horizontal content as sections inside a vertical column." LayerStackRowCompositionTest (new) pins both ends of the contract: - rowSitsDirectlyInsideLayerStackContentLayer - layerStackAtRootWithBackgroundLayerAndContentRowRenders (Noir CV shape: dark band + sidebar+main row) - rowDeepInsideLayerStackThroughVerticalSectionsRenders (verifies STACK_LAYER_SLOT propagates through nested vertical composites) - rowInsideRowStillThrowsAfterRelaxation (negative guard — the relaxation must not leak into the ROW_SLOT path) Refs R3.b; precedes R3.c (visual demo + snapshot baselines).
End-to-end PDF render of the Noir-corporate-CV-style shape from the v1.6.2 audit feedback: A4 page with a full-width dark hero band as one LayerStack layer and a sidebar + main row as the second content layer. Pre-R3 this exact composition crashed with "Row '...' cannot contain a nested horizontal row" because the validator could not tell a row-band parent from a layer rectangle parent. After R3.a (FixedSlotKind plumbing) + R3.b (validator relax) the document renders cleanly to target/visual-tests/layer-stack/LayerStackRowDemo.pdf for manual review. Sidebar is intentionally sized well under the page capacity (no fillRemainingHeight() primitive yet — see v1.7.0 B6 in the execution plan) so this regression isolates R3 from the fill-to-page-bottom feature. Refs R3.c; closes R3. R4 (error message clarity pass for remaining exceptions) follows.
Rewrites the five remaining engine-thrown exception messages to
include an action verb so developers see what to do next, not just
which rule fired. Existing substring assertions in
PaginationEdgeCaseTest and LayerStackRowCompositionTest stay
intact — every message still contains its original anchor phrase
plus the new action sentence.
Changes:
- "Node '...' has no horizontal layout space."
-> + "Reduce padding or margin on the parent, or increase the
page width."
- "Node '...' measured width X exceeds available width Y."
-> + "Reduce the node width, shorten inline content, or wrap
content in a smaller container."
- "Row '...' child '...' measured height X exceeds row inner
height."
-> + "Reduce the child height, shorten its content, or increase
the row height."
- "Split did not make progress for node '...'."
-> + "The node's NodeDefinition.split() returned the original
input as the tail — check the definition for an infinite
split loop and ensure each split advances."
- "Node '...' requires outer height X but page capacity is Y."
(AtomicNodeTooLargeException)
-> + "Reduce the node height, split content into multiple atomic
blocks, or increase the page size. Differences under 0.5 pt
are tolerated as rounding noise (v1.6.2+)."
Refs R4; closes the v1.6.2 robustness pass (R1 + R2 + R3 + R4).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the v1.6.2 robustness pass (R1-R4). Every change is additive — no public API removed or renamed, all 828+ existing tests still pass.
R1 — Glyph sanitizer (4 commits)
Any Unicode codepoint Helvetica/WinAnsi cannot encode (arrows, bullets, emoji, ©, custom symbols) is now substituted with
?on the render path instead of crashing PDFBox withIllegalArgumentException. Width measurement and render see the same sanitized string so wrap geometry stays consistent.feat(engine): expose sanitizeForRender entry point on PdfFontfeat(engine): wire glyph sanitizer into paragraph render handlerfeat(engine): wire glyph sanitizer into watermark/header/footer/table rendertest(engine): visual regression for unicode fallbackR2 — Capacity tolerance (2 commits)
DocumentPageSize.A4.height()= 841.88977. Pre-R2 a node sized at rounded842.0failed the capacity check withEPS=1e-6. NewCAPACITY_TOLERANCE = 0.5ptabsorbs human-input rounding without masking real overflow (>1pt still throws).fix(engine): allow 0.5pt rounding tolerance in page capacity checktest(engine): visual regression for full-A4 surfaceR3 — LayerStack + Row (3 commits)
Compiler validator no longer rejects
Rowinside aLayerStackcontent layer. NewFixedSlotKindenum distinguishes row-band parent (still strict) from stack-layer parent (relaxed). Real-world unlock: full-width dark band + sidebar/main row page layout from Noir corporate CV.refactor(engine): introduce FixedSlotKind enum threaded through compileNodeInFixedSlotfeat(engine): allow Row inside LayerStack content layertest(engine): visual regression for row-in-layer-stackR4 — Error message clarity (1 commit)
Five engine exceptions now include action verbs (Reduce, Wrap, Stack, Increase) so developers see what to do, not just which rule fired. All existing substring assertions still pass.
chore(engine): exception messages now suggest concrete actionsTest plan
mvnw verifyzelyony (~828 tests) after every atomic committarget/visual-tests/:glyph-fallback/UnicodeFallbackDemo*.pdfpage-capacity/PageCapacityToleranceDemo.pdflayer-stack/LayerStackRowDemo.pdfdocument/nested_list_three_levels.json) — documented deliberate change in commit bodyFollow-ups (separate work)
?after R1.a wiring — should use ASCII-safe or font-aware fallback in v1.7 (issue/task to be filed separately)docs/private/v1.6.2-v1.7-execution-plan.mdManual decisions still owned by author
v1.6.2develop->main