Skip to content

v1.6.2 robustness: glyph sanitizer + capacity tolerance + LayerStack row + error messages#40

Open
DemchaAV wants to merge 10 commits into
developfrom
release/v1.6.2
Open

v1.6.2 robustness: glyph sanitizer + capacity tolerance + LayerStack row + error messages#40
DemchaAV wants to merge 10 commits into
developfrom
release/v1.6.2

Conversation

@DemchaAV
Copy link
Copy Markdown
Owner

@DemchaAV DemchaAV commented May 19, 2026

Summary

Closes the v1.6.2 robustness pass (R1-R4). Every change is additive — no public API removed or renamed, all 828+ existing tests still pass.

R1 — Glyph sanitizer (4 commits)

Any Unicode codepoint Helvetica/WinAnsi cannot encode (arrows, bullets, emoji, ©, custom symbols) is now substituted with ? on the render path instead of crashing PDFBox with IllegalArgumentException. Width measurement and render see the same sanitized string so wrap geometry stays consistent.

  • feat(engine): expose sanitizeForRender entry point on PdfFont
  • feat(engine): wire glyph sanitizer into paragraph render handler
  • feat(engine): wire glyph sanitizer into watermark/header/footer/table render
  • test(engine): visual regression for unicode fallback

R2 — Capacity tolerance (2 commits)

DocumentPageSize.A4.height() = 841.88977. Pre-R2 a node sized at rounded 842.0 failed the capacity check with EPS=1e-6. New CAPACITY_TOLERANCE = 0.5pt absorbs human-input rounding without masking real overflow (>1pt still throws).

  • fix(engine): allow 0.5pt rounding tolerance in page capacity check
  • test(engine): visual regression for full-A4 surface

R3 — LayerStack + Row (3 commits)

Compiler validator no longer rejects Row inside a LayerStack content layer. New FixedSlotKind enum distinguishes row-band parent (still strict) from stack-layer parent (relaxed). Real-world unlock: full-width dark band + sidebar/main row page layout from Noir corporate CV.

  • refactor(engine): introduce FixedSlotKind enum threaded through compileNodeInFixedSlot
  • feat(engine): allow Row inside LayerStack content layer
  • test(engine): visual regression for row-in-layer-stack

R4 — Error message clarity (1 commit)

Five engine exceptions now include action verbs (Reduce, Wrap, Stack, Increase) so developers see what to do, not just which rule fired. All existing substring assertions still pass.

  • chore(engine): exception messages now suggest concrete actions

Test plan

  • mvnw verify zelyony (~828 tests) after every atomic commit
  • New visual demos under target/visual-tests/:
    • glyph-fallback/UnicodeFallbackDemo*.pdf
    • page-capacity/PageCapacityToleranceDemo.pdf
    • layer-stack/LayerStackRowDemo.pdf
  • One snapshot baseline updated (document/nested_list_three_levels.json) — documented deliberate change in commit body
  • Negative tests pin contracts that must keep throwing (row-in-row, >1pt overflow, etc.)

Follow-ups (separate work)

  • ListBuilder default markers (◦ U+25E6, ▪ U+25AA) substitute to ? after R1.a wiring — should use ASCII-safe or font-aware fallback in v1.7 (issue/task to be filed separately)
  • v1.7 DX helpers (heading levels, field rows, two-column, heading bar, inline shape runs) tracked in docs/private/v1.6.2-v1.7-execution-plan.md

Manual decisions still owned by author

  • Tagging v1.6.2
  • Merging develop -> main
  • GitHub Release publication
  • README install snippet bump

DemchaAV added 10 commits May 19, 2026 14:46
Promote PdfFont.sanitizeByFont(PDFont, String) to public and add a
companion PdfFont.sanitizeForRender(TextStyle, String) entry point.
sanitizeForRender resolves the active font from a TextStyle, applies
the standard control-character cleanup, then substitutes any code
point the font cannot encode with '?'. This is the single seam render
handlers will route through in R1.b/R1.c so wrap geometry stays in
lockstep with the bytes drawn on the page.

getTextWidth now measures against sanitizeForRender so width and
render see the same string. Existing ASCII-only callers are
unaffected. One snapshot (document/nested_list_three_levels) widened
because ListBuilder default markers for deep nesting (U+25E6 white
bullet, U+25AA black small square) are outside Helvetica's WinAnsi
coverage. Pre-R1.a measurement silently returned 0 for those markers
and the renderer would have crashed when actually drawing them; the
new width matches the substituted glyphs that R1.b will draw.
Follow-up tracked: ListBuilder defaults should be ASCII-safe or
font-aware so end users do not see '?' as a bullet.

Adds PdfFontSanitizerTest covering: unknown glyph substitution, ASCII
pass-through, single/collapsed spaces, empty/null input, width
consistency between bullet and substitute inputs, and the direct
sanitizeByFont escape hatch.

Refs R1.a; precedes R1.b (paragraph render handler wiring) and R1.c
(watermark/header/footer/table wiring).
PdfParagraphFragmentRenderHandler now sanitises text via
PdfFont.sanitizeForRender(textStyle, text) — the same entry point
introduced in R1.a for width measurement — before passing it to
PDPageContentStream.showText(). Any code point Helvetica's
WinAnsiEncoding cannot encode (arrows U+2192, bullets U+25CF, emoji,
custom unicode) is substituted with '?', preventing PDFBox from
throwing IllegalArgumentException during render.

Width measurement and render now see the same string for the same
text, so wrap geometry stays consistent with the painted output.

GlyphFallbackLogger (new) emits a single WARN per unique
(font, codePoint) pair on first substitution. Subsequent substitutions
of the same glyph are suppressed to avoid log flooding on documents
that use one unsupported character many times. Logger category:
com.demcha.compose.engine.render.pdf.glyph-fallback. Set to DEBUG in
logback-test.xml to inspect every substitution.

The dead private sanitize() helper and unused TextControlSanitizer
import are removed — control-character cleanup is already part of
sanitizeForRender.

Refs R1.b; precedes R1.c (watermark/header/footer/table wiring) and
R1.d (UnicodeFallbackDemoTest visual regression).
… render

Routes the remaining six PDF text render seams through the same glyph
substitution policy R1.b applied to the paragraph handler. Any code
point Helvetica's WinAnsiEncoding cannot encode is replaced with '?'
instead of crashing PDFBox showText:

- PdfWatermarkRenderer: one sanitise call before render covers both
  the single-placement and tiled branches.
- PdfHeaderFooterRenderer: left / center / right zones each
  sanitise; the now-uniform empty check folds the null guard into
  isEmpty() on the sanitised result.
- PdfTextRenderHandler: route through font.sanitizeForRender so
  width and render agree on the same string.
- PdfTableRowRenderHandler: per-line sanitise inside the
  setFont/setColor envelope so a single bad glyph in one cell does
  not crash the whole table render.
- PdfTableRowFragmentRenderHandler: mirror of the row-render handler
  on the fragment path.
- PdfBlockTextRenderHandler: setFont now returns the active PDFont so
  the body loop can sanitise against it; the IllegalArgumentException
  catch is removed because showText can no longer throw on
  unsupported glyphs.

To keep both raw-PDFont helpers (watermark, header/footer) and
PdfFont-wrapper handlers on the exact same substitution policy,
GlyphFallbackLogger.sanitize(PDFont, String) was promoted to a public
static helper. PdfFont.sanitizeByFont now delegates to it, so the
PdfFontSanitizerTest contract from R1.a still holds end-to-end.

Verified: full mvnw verify zelyony (828 tests). No snapshots shifted
beyond the deliberate ListBuilder marker baseline already updated in
R1.a.

Refs R1.c; closes the wiring half of R1 — only R1.d
(UnicodeFallbackDemoTest end-to-end visual regression) remains.
End-to-end proof that R1.a-R1.c protect every PDF text render seam
from glyphs Helvetica's WinAnsi encoding cannot cover. Four
independent test methods exercise the four wired paths so a future
regression in any one is localised to a single failure:

- paragraphWithUnsupportedGlyphsRendersWithoutCrash — arrows, bullets,
  white circles, small squares, emoji, copyright in a paragraph body
- tableWithUnsupportedGlyphsRendersWithoutCrash — header + body cells
  containing bullets and emoji
- watermarkWithUnsupportedGlyphRendersWithoutCrash — diagonal DRAFT
  watermark text with arrow, copyright, bullet
- headerFooterWithUnsupportedGlyphRendersWithoutCrash — left / right
  header zones and centre footer with arrow, bullet, white circle,
  copyright, emoji

Pre-R1 every assertion would never run because PDFBox would throw
IllegalArgumentException ('arrowright is not available in the font
Helvetica, encoding: WinAnsiEncoding') deep inside showText. After
R1.b/c the unsupported code points become '?' and the document
completes; output PDFs land in target/visual-tests/glyph-fallback/
for manual review.

assertValidPdf checks the %PDF- magic header and a > 500 byte body —
small enough to allow single-paragraph PDFBox output, large enough
to catch a truncated/empty render.

Refs R1.d; closes R1. v1.6.2 R2 (capacity tolerance), R3 (LayerStack
row relax), R4 (error messages) follow.
DocumentPageSize.A4 resolves to the exact PDF point value
841.88977 pt. Pre-R2 the page-capacity guard compared against this
with EPS=1e-6 tolerance, which meant a node authored with the
rounded human input 842.0 pt failed with
"requires outer height 842.0 but page capacity is 841.88977" —
blocking legitimate full-page background surfaces.

Introduces a separate CAPACITY_TOLERANCE = 0.5 pt constant used only
where outer height is compared against the full inner-canvas height
(row composite, stack composite, atomic leaf). EPS stays 1e-6 for
floating-point noise inside split decisions and remaining-height
checks — bigger tolerance there would mask real overflow inside the
splittable path.

0.5 pt ≈ 0.18 mm — visually indistinguishable, and the test matrix
proves the boundary: 156.4 pt on a 156 pt inner area now renders
(was throwing); 158 pt on the same area still throws as a real
overflow.

PaginationEdgeCaseTest gains two boundary cases pinning the new
contract — atomicNodeHalfPointOverCapacityShouldFitWithinTolerance
and atomicNodeClearlyOverCapacityShouldStillThrow — without
weakening the existing oversizedAtomicImageShouldThrowDomainSpecificError
which still triggers on the much larger 240 pt vs 156 pt mismatch.

Refs R2.a; precedes R2.b (full-A4 surface visual regression).
Renders a single ShapeNode sized 595 x 842 pt on a DocumentSession
configured with DocumentPageSize.A4 (exact PDF point value
841.88977) and zero margin. The 0.11 pt difference between the
rounded user input (842) and the true inner-canvas height (841.88977)
pre-R2 hit the EPS=1e-6 capacity check and threw
AtomicNodeTooLargeException. After R2.a the new CAPACITY_TOLERANCE
absorbs the rounding and a valid PDF lands in
target/visual-tests/page-capacity/PageCapacityToleranceDemo.pdf.

Asserts the rendered file has the %PDF- magic header and a
> 500 byte body — small enough to allow a single-shape PDFBox
output, large enough to catch a truncated render.

Refs R2.b; closes R2. R3 (LayerStack row relax) and R4 (error
message clarity pass) follow.
…leNodeInFixedSlot

Pure plumbing change — no behavior is altered. Introduces a private
FixedSlotKind enum and threads it through compileNodeInFixedSlot:

- ROW_SLOT (default for row-composite children) — column of a
  horizontal row band where a nested row would create real
  composition conflict.
- STACK_LAYER_SLOT — child of a LayerStackNode layer rectangle,
  where a 'horizontal' arrangement is just a column-row inside an
  already-fixed layer area, not a competing band.

Three call sites updated to pass the correct kind:
- row composite child (compileRowComposite line ~408): ROW_SLOT
- placeStackLayer (line ~913): STACK_LAYER_SLOT
- recursive VERTICAL children (compileNodeInFixedSlot line ~1267):
  propagate the caller's kind so a STACK descendant
  (column → row → ...) keeps the relaxed policy.

Validator at line ~1102 still rejects HORIZONTAL uniformly — the
behaviour relax happens in R3.b. This commit alone verifies
green (no test sees a row inside a layer yet).

Refs R3.a; precedes R3.b (validator relax) and R3.c (visual + snapshot).
Relaxes the no-nested-horizontal-row rule at LayoutCompiler line
~1154 so it fires only when the immediate parent slot is a
ROW_SLOT. STACK_LAYER_SLOT parents (LayerStack layers and any
section descendants thereof) accept horizontal composites because
the surrounding rectangle is already fixed by the layer's
alignment — a row there is a normal column-row inside the layer
area, not a competing band.

Real-world case unlocked: page-level LayerStack with a full-width
dark band as one layer and a sidebar+main row as the content layer
— exactly what the Noir corporate CV flow was trying to do
(see audit feedback item 1).

Error message rewritten to point at the new escape hatch and the
historical workaround:
  "Row '...' cannot contain a nested horizontal row.
   Wrap the inner row in a LayerStack layer (allowed since v1.6.2),
   or stack horizontal content as sections inside a vertical column."

LayerStackRowCompositionTest (new) pins both ends of the contract:
- rowSitsDirectlyInsideLayerStackContentLayer
- layerStackAtRootWithBackgroundLayerAndContentRowRenders (Noir CV
  shape: dark band + sidebar+main row)
- rowDeepInsideLayerStackThroughVerticalSectionsRenders (verifies
  STACK_LAYER_SLOT propagates through nested vertical composites)
- rowInsideRowStillThrowsAfterRelaxation (negative guard — the
  relaxation must not leak into the ROW_SLOT path)

Refs R3.b; precedes R3.c (visual demo + snapshot baselines).
End-to-end PDF render of the Noir-corporate-CV-style shape from the
v1.6.2 audit feedback: A4 page with a full-width dark hero band as
one LayerStack layer and a sidebar + main row as the second content
layer.

Pre-R3 this exact composition crashed with
"Row '...' cannot contain a nested horizontal row" because the
validator could not tell a row-band parent from a layer rectangle
parent. After R3.a (FixedSlotKind plumbing) + R3.b (validator
relax) the document renders cleanly to
target/visual-tests/layer-stack/LayerStackRowDemo.pdf for manual
review.

Sidebar is intentionally sized well under the page capacity (no
fillRemainingHeight() primitive yet — see v1.7.0 B6 in the
execution plan) so this regression isolates R3 from the
fill-to-page-bottom feature.

Refs R3.c; closes R3. R4 (error message clarity pass for remaining
exceptions) follows.
Rewrites the five remaining engine-thrown exception messages to
include an action verb so developers see what to do next, not just
which rule fired. Existing substring assertions in
PaginationEdgeCaseTest and LayerStackRowCompositionTest stay
intact — every message still contains its original anchor phrase
plus the new action sentence.

Changes:

- "Node '...' has no horizontal layout space."
  -> + "Reduce padding or margin on the parent, or increase the
       page width."

- "Node '...' measured width X exceeds available width Y."
  -> + "Reduce the node width, shorten inline content, or wrap
       content in a smaller container."

- "Row '...' child '...' measured height X exceeds row inner
  height."
  -> + "Reduce the child height, shorten its content, or increase
       the row height."

- "Split did not make progress for node '...'."
  -> + "The node's NodeDefinition.split() returned the original
       input as the tail — check the definition for an infinite
       split loop and ensure each split advances."

- "Node '...' requires outer height X but page capacity is Y."
  (AtomicNodeTooLargeException)
  -> + "Reduce the node height, split content into multiple atomic
       blocks, or increase the page size. Differences under 0.5 pt
       are tolerated as rounding noise (v1.6.2+)."

Refs R4; closes the v1.6.2 robustness pass (R1 + R2 + R3 + R4).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant