Ontology rendering#113

Merged
jnu merged 3 commits into main from onto-paint
Apr 15, 2026
Conversation

@jnu
Contributor

@jnu jnu commented Apr 15, 2026

Add a new paint module (distinct from the general render module, which operates on RedactedText) that annotates the input PDF with the results of the ontology extraction.

Refactor some existing modules to pass the appropriate data through the pipeline context.

Comment thread bc2/core/common/ontopainter.py Outdated
@jnu jnu merged commit 5b730c8 into main Apr 15, 2026
4 checks passed
@jnu jnu deleted the onto-paint branch April 15, 2026 16:50
scaled_points = [
    (p[0] * page_width, p[1] * page_height) for p in region.points
]
shape.draw_rect(pymupdf.Quad(*scaled_points).rect)

Bug: pymupdf.Quad(*scaled_points) crashes if region.points has anything other than exactly 4 points.
Severity: MEDIUM

Suggested Fix

Validate that scaled_points has exactly 4 points before building the Quad (slicing with scaled_points[:4] only guards against extra points, not missing ones), or construct the rect directly via pymupdf.Rect(min_x, min_y, max_x, max_y) computed from all points, avoiding the Quad constraint entirely.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: bc2/core/common/ontopainter.py#L127

Potential issue: In `_paint_rect` (line 127), `pymupdf.Quad(*scaled_points)` unpacks all
points from `region.points` as positional arguments. `pymupdf.Quad` accepts exactly 4
corner points (ul, ur, ll, lr). The polygon from Azure DI is converted in `openai.py` by
iterating `range(0, len(polygon), 2)`, which typically yields 4 points (8-float
polygon). However, `SourceChunkBoundingRegion.points` is typed as an unconstrained
`list[tuple[float, float]]`, so if Azure DI returns a polygon with more or fewer than 4
points (e.g., for irregular regions or future API changes),
`pymupdf.Quad(*scaled_points)` will raise a `TypeError` about incorrect number of
arguments, crashing the painting pipeline for that document.
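
A minimal sketch of the second suggested fix, assuming the region points are normalized (x, y) pairs as in the snippet under review. The helper name `bounding_box` is hypothetical and not part of the PR. It computes the axis-aligned bounds from however many points are present, so the point count no longer matters:

```python
from typing import Iterable

Point = tuple[float, float]


def bounding_box(
    points: Iterable[Point], page_width: float, page_height: float
) -> tuple[float, float, float, float]:
    """Scale normalized points to page units and return (x0, y0, x1, y1).

    Hypothetical helper: accepts any non-empty number of points, unlike a
    4-corner quad constructor.
    """
    scaled = [(x * page_width, y * page_height) for x, y in points]
    if not scaled:
        # An empty region has no meaningful rectangle; fail loudly.
        raise ValueError("region has no points")
    xs = [x for x, _ in scaled]
    ys = [y for _, y in scaled]
    return (min(xs), min(ys), max(xs), max(ys))


# A regular 4-point quad and a 5-point polygon go through the same code path.
quad = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.5), (0.25, 0.5)]
pentagon = quad + [(0.5, 0.75)]

quad_box = bounding_box(quad, 600, 800)
pentagon_box = bounding_box(pentagon, 600, 800)
```

In the painter, the resulting tuple could be unpacked into `pymupdf.Rect(x0, y0, x1, y1)` and passed to `shape.draw_rect`. For a 4-point region this gives the same rectangle that enclosing the quad would, so the common case should be unaffected.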

Comment on lines +113 to +120
OntoPainterFieldConfig(
    accessor=lambda report: [subject.dob for subject in report.subjects],
    label="Subject DOB",
    mark=OntoPainterMark.RECT,
    fill=None,
    stroke=Palette.Cyan1,
    stroke_width=2,
),

Bug: Subject DOB field is registered twice in painter, causing it to be painted twice on every document.
Severity: LOW

Suggested Fix

Remove the duplicate entry (lines 113-120) or replace it with the intended field configuration (e.g., subject.seq).

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: bc2/core/paint/ontology.py#L113-L120

Potential issue: In `bc2/core/paint/ontology.py`, lines 105-120, the
`OntoPainterFieldConfig` for `Subject DOB` is defined twice with identical configuration
(`accessor=lambda report: [subject.dob for subject in report.subjects]`, label `"Subject
DOB"`, same stroke color). This causes the painter to iterate over DOB fields twice per
document: once at lines 105-112 and again at lines 113-120. Each DOB annotation will be
drawn twice on top of itself, wasting rendering time and potentially confusing
downstream users who try to match labels to fields. This is clearly a copy-paste
mistake—one entry should likely be for a different field (e.g., subject `seq`).
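
One way to catch this kind of copy-paste duplication early is a check over the field configuration list. The sketch below is an assumption-laden illustration, not code from the PR: `FieldConfig` is a hypothetical stand-in for `OntoPainterFieldConfig`, the `"Subject Name"` entry is invented for the example, and labels are assumed to be unique per field.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldConfig:
    """Hypothetical, simplified stand-in for OntoPainterFieldConfig."""

    label: str
    stroke: str


def duplicate_labels(configs: list[FieldConfig]) -> list[str]:
    """Return the labels that appear more than once, sorted alphabetically."""
    counts = Counter(config.label for config in configs)
    return sorted(label for label, n in counts.items() if n > 1)


# Mirrors the reported situation: "Subject DOB" registered twice.
configs = [
    FieldConfig(label="Subject Name", stroke="Cyan1"),  # invented entry
    FieldConfig(label="Subject DOB", stroke="Cyan1"),
    FieldConfig(label="Subject DOB", stroke="Cyan1"),
]

dupes = duplicate_labels(configs)
```

A unit test asserting that `duplicate_labels(...)` returns an empty list for the real configuration would flag the duplicate before it reaches rendering.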
