fix: stabilize formula/layout inference, dedup VL masks, and tidy code & docs by GreatV · Pull Request #129 · GreatV/oar-ocr

GreatV · 2026-06-03T14:16:38Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces several improvements, refactorings, and bug fixes across the OCR pipeline. Key changes include setting `CUDA_LAUNCH_BLOCKING=1` up front to prevent CUDA EP arena buffer races in PP-FormulaNet, introducing a shared `create_generation_mask` helper to mask out left-padding positions during autoregressive decoding, and removing stub table results in favor of surfacing errors. Additionally, the PR fixes a bug in PicoDet argmax initialization by seeding with `f32::NEG_INFINITY` instead of `0.0f32`, guards against under-width outputs in PP-DocLayout to prevent panics, and cleans up various unused methods and comments. Feedback on the changes highlights the use of unstable let-chains in `pp_formulanet.rs`, which should be rewritten with nested `if` statements to ensure compatibility with stable Rust.
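For illustration, the nested-`if` rewrite suggested in the feedback could look like the sketch below. The function `parse_positive` and its logic are hypothetical and are not taken from `pp_formulanet.rs`. Let chains are accepted on Rust 1.88 or later with the 2024 edition, so whether the concern applies depends on the crate's edition and minimum supported toolchain, which this thread does not state.

```rust
/// Hypothetical helper, used only to contrast the two forms.
/// Returns the parsed value when the input is present, numeric, and positive.
fn parse_positive(input: Option<&str>) -> Option<u32> {
    // Let-chain form (needs a toolchain and edition that accept let chains):
    //
    //     if let Some(raw) = input
    //         && let Ok(n) = raw.trim().parse::<u32>()
    //         && n > 0
    //     {
    //         return Some(n);
    //     }
    //
    // Equivalent nested form that compiles on older stable toolchains:
    if let Some(raw) = input {
        if let Ok(n) = raw.trim().parse::<u32>() {
            if n > 0 {
                return Some(n);
            }
        }
    }
    None
}
```

Both forms short-circuit in the same order, so the rewrite changes only the syntax, not the behavior.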

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Copilot

Pull request overview

This PR tightens and stabilizes several OCR pipeline behaviors (notably table analysis, model preset inference, and CUDA formula decoding), deduplicates VLM generation masking logic across models, and cleans up a broad set of comments/docs to better reflect current behavior.

Changes:

- Table analysis now surfaces errors instead of emitting stub/placeholder TableResults, and the public API is simplified to accept only the page image + layout elements.
- Structure builder/model plumbing is made more robust: layout model preset matching is normalized, adapters are labeled with caller-provided model names, and a CUDA workaround is applied early for PP-FormulaNet (plus CUDA arena tuning).
- VLM decoding gains a shared left-padding generation mask helper; layout postprocess gets additional guards/tests to prevent panics and improve argmax correctness.
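The idea behind a left-padding generation mask can be sketched in plain Rust as follows. This is not the `create_generation_mask` helper from `oar-ocr-vl/src/attention.rs`, whose signature and tensor types are not shown in this thread. The name `left_pad_generation_mask`, the `Vec<Vec<f32>>` layout, and the caller-supplied `masked_value` are all assumptions made for illustration.

```rust
/// Builds an additive attention mask for one decode step over a left-padded batch.
///
/// `pad_lens[b]` is the number of padding slots at the start of sequence `b`
/// in the KV cache, and `kv_len` is the current cache length. Padding slots
/// receive `masked_value`; real tokens receive 0.0 so their scores are unchanged.
fn left_pad_generation_mask(
    pad_lens: &[usize],
    kv_len: usize,
    masked_value: f32,
) -> Vec<Vec<f32>> {
    pad_lens
        .iter()
        .map(|&pad| {
            (0..kv_len)
                .map(|pos| if pos < pad { masked_value } else { 0.0 })
                .collect()
        })
        .collect()
}
```

Leaving `masked_value` to the caller keeps the sketch neutral on which constant is safe for a given dtype.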

Reviewed changes

Copilot reviewed 46 out of 47 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `src/oarocr/table_analyzer.rs` | Removes formula/text-region inputs, removes stub tables, and surfaces structure/cell failures as errors. |
| `src/oarocr/structure.rs` | Normalizes layout preset parsing, sets CUDA workaround early for formula on CUDA, and threads model-name metadata into adapters. |
| `src/oarocr/stitching.rs` | Clarifies which layout labels are excluded from OCR matching and how formula exclusion is controlled. |
| `src/oarocr/result.rs` | Minor doc/comment cleanup on OCR result fields. |
| `src/oarocr/ocr.rs` | Docstring argument ordering/wording cleanup for word-box helpers. |
| `src/lib.rs` | Fixes doc comment formatting/indentation in the prelude docs. |
| `oar-ocr-vl/src/paddleocr_vl/model.rs` | Adds per-step generation mask to avoid attending to left padding in KV cache during decode. |
| `oar-ocr-vl/src/paddleocr_vl/ernie.rs` | Condenses KV-cache comment to a clearer, single rationale. |
| `oar-ocr-vl/src/mineru/model.rs` | Deduplicates generation-mask logic by using the shared attention helper. |
| `oar-ocr-vl/src/hunyuanocr/vision.rs` | Removes unused/dead-code helper. |
| `oar-ocr-vl/src/hunyuanocr/model.rs` | Adds per-step generation mask to avoid attending to left padding during decode; comment cleanup. |
| `oar-ocr-vl/src/hsd/verify.rs` | Clarifies position-id formula for HSD verification. |
| `oar-ocr-vl/src/glmocr/vision.rs` | Deduplicates RoPE inv_freq computation via shared helper. |
| `oar-ocr-vl/src/attention.rs` | Adds shared `create_generation_mask` and updates attention doc wording to match implementation. |
| `oar-ocr-core/src/utils/transform.rs` | Updates perspective warp docs to match bicubic interpolation behavior. |
| `oar-ocr-core/src/utils/topk.rs` | Updates docs to match Unknown class-name formatting behavior. |
| `oar-ocr-core/src/utils/tensor.rs` | Updates module docs to reflect that no helpers are currently exported. |
| `oar-ocr-core/src/processors/uvdoc_postprocess.rs` | Removes unused transformation helpers. |
| `oar-ocr-core/src/processors/unimernet_preprocess.rs` | Updates padding comment to match actual normalized padding fill behavior. |
| `oar-ocr-core/src/processors/types.rs` | Doc cleanup around color channel ordering description. |
| `oar-ocr-core/src/processors/normalization.rs` | Updates docs to reflect default RGB channel order. |
| `oar-ocr-core/src/processors/layout_sorting.rs` | Updates comment to match the new distance threshold constant. |
| `oar-ocr-core/src/processors/layout_postprocess.rs` | Fixes argmax seed for negative scores, adds PP-DocLayout shape guards, and adds regression tests. |
| `oar-ocr-core/src/processors/geometry.rs` | Clarifies IoU commentary (AABB approximation rationale). |
| `oar-ocr-core/src/models/rectification/uvdoc.rs` | Removes outdated/duplicative comment about BGR handling. |
| `oar-ocr-core/src/models/recognition/unimernet.rs` | Ensures UniMERNet model is identified via `model_name` in inference config. |
| `oar-ocr-core/src/models/recognition/pp_formulanet.rs` | Adds CUDA arena strategy tuning and ensures PP-FormulaNet model is identified via `model_name`. |
| `oar-ocr-core/src/domain/structure.rs` | Updates heading-formatting docs to match the current API semantics. |
| `oar-ocr-core/src/domain/adapters/text_recognition_adapter.rs` | Adds model-name override support for adapter identification. |
| `oar-ocr-core/src/domain/adapters/text_detection_adapter.rs` | Adds model-name override support for adapter identification. |
| `oar-ocr-core/src/domain/adapters/layout_detection_adapter.rs` | Updates PP-DocLayout builder docs to match normalized/preset naming expectations. |
| `oar-ocr-core/src/domain/adapters/formula_recognition_adapter.rs` | Condenses explanation of why confidence/thresholding is not applied. |
| `oar-ocr-core/src/core/traits/task_def.rs` | Doc cleanup to match current macro/registry wording. |
| `oar-ocr-core/src/core/traits/granular.rs` | Doc cleanup for the granular trait module summary. |
| `oar-ocr-core/src/core/mod.rs` | Removes stale commented re-exports and outdated note about tracing init location. |
| `oar-ocr-core/src/core/inference/ort_infer_builders.rs` | Extracts and exposes `ensure_cuda_launch_blocking()` for early, pipeline-wide use. |
| `oar-ocr-core/src/core/inference/mod.rs` | Re-exports `ensure_cuda_launch_blocking`. |
| `oar-ocr-core/src/core/constants.rs` | Comment simplifications for default constants. |
| `oar-ocr-core/src/core/config/model_input.rs` | Corrects dynamic-dimension documentation and clarifies default shape semantics. |
| `oar-ocr-core/src/core/config/errors.rs` | Doc cleanup for config validation errors. |
| `oar-ocr-core/src/core/config/builder.rs` | Removes redundant comment on `Default` impl. |
| `oar-ocr-core/src/core/batch/dynamic/types.rs` | Fixes comment to reflect `(width, height)` ordering. |
| `examples/utils/markdown.rs` | Removes stale trailing comment. |
| `examples/table_structure_recognition.rs` | Improves CLI docs and examples section wording. |
| `examples/table_cell_detection.rs` | Tightens example description wording. |
| `examples/formula_recognition.rs` | Fixes redundant phrasing in example docs. |
| `examples/document_rectification.rs` | Removes unused/obsolete input shape override CLI options and related logging. |
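
The `layout_postprocess.rs` entry above mentions two fixes that are easy to show in isolation. The sketch below is not the code from that file. `argmax_seeded`, `checked_row`, and the row width of 6 are assumptions chosen to illustrate why a `0.0` seed misreports all-negative scores and how a width check avoids an out-of-bounds panic.

```rust
/// Assumed minimum number of values per detection row (illustrative only).
const ASSUMED_ROW_WIDTH: usize = 6;

/// Linear argmax that starts from `seed` instead of the first element.
/// Returns the winning index and score.
fn argmax_seeded(scores: &[f32], seed: f32) -> (usize, f32) {
    let mut best_idx = 0usize;
    let mut best = seed;
    for (i, &s) in scores.iter().enumerate() {
        if s > best {
            best = s;
            best_idx = i;
        }
    }
    (best_idx, best)
}

/// Returns the first `ASSUMED_ROW_WIDTH` values of a row, or `None` when the
/// row is too short, so callers can skip it rather than index past the end.
fn checked_row(row: &[f32]) -> Option<&[f32]> {
    row.get(..ASSUMED_ROW_WIDTH)
}
```

With scores `[-3.0, -0.5, -2.0]`, a `0.0` seed never gets beaten, so the function reports index 0 with a score of `0.0` that is not in the input. Seeding with `f32::NEG_INFINITY` yields index 1 and `-0.5`.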

GreatV · 2026-06-04T01:54:16Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors and enhances the OCR pipeline, notably introducing a per-step attention mask to handle left-padded batches during autoregressive decoding, configuring PP-FormulaNet's CUDA BFC arena to grow tightly to prevent OOM errors, and modifying TableAnalyzer to surface errors rather than emitting stub results. Feedback on these changes highlights a compilation issue on stable Rust due to the use of unstable let-chains in pp_formulanet.rs. Additionally, suggestions were made to improve the robustness of analyze_tables by skipping failed tables instead of aborting the entire page analysis, and to use more idiomatic tensor creation methods (Tensor::zeros and Tensor::full) in attention.rs.
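The trade-off behind the `analyze_tables` suggestion can be sketched as two collection strategies. Everything in this block is hypothetical: `TableOutcome`, `analyze_one`, and the string-based crops stand in for the real types in `table_analyzer.rs`, which are not shown in this thread.

```rust
/// Hypothetical per-table result.
#[derive(Debug, PartialEq)]
struct TableOutcome {
    index: usize,
    html: String,
}

/// Hypothetical per-table analysis; fails on an empty crop.
fn analyze_one(index: usize, crop: &str) -> Result<TableOutcome, String> {
    if crop.is_empty() {
        Err(format!("table {index}: empty crop"))
    } else {
        Ok(TableOutcome { index, html: format!("<table>{crop}</table>") })
    }
}

/// Strict: the first failure aborts the whole page.
fn analyze_all_strict(crops: &[&str]) -> Result<Vec<TableOutcome>, String> {
    crops
        .iter()
        .enumerate()
        .map(|(i, crop)| analyze_one(i, crop))
        .collect()
}

/// Lenient: failed tables are skipped and their errors are returned alongside
/// the successes, so nothing is silently replaced by a stub.
fn analyze_all_lenient(crops: &[&str]) -> (Vec<TableOutcome>, Vec<String>) {
    let mut tables = Vec::new();
    let mut errors = Vec::new();
    for (i, crop) in crops.iter().enumerate() {
        match analyze_one(i, crop) {
            Ok(table) => tables.push(table),
            Err(err) => errors.push(err),
        }
    }
    (tables, errors)
}
```

The lenient variant still surfaces every error, which fits the PR's goal of not hiding failures, while keeping results for the tables that did succeed.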

GreatV · 2026-06-04T08:27:57Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors several components of the OCR pipeline, including sharing the create_generation_mask helper across models, introducing model name overrides for identification in logs, configuring the CUDA arena growth strategy for PP-FormulaNet to prevent VRAM ballooning, and updating TableAnalyzer to surface errors instead of emitting stub results. It also fixes an argmax issue with negative scores in LayoutPostProcess and adds shape validation guards. Feedback on the changes highlights that the use of unstable let_chains syntax in pp_formulanet.rs will fail on stable Rust, the attention mask value of -1e9_f32 in attention.rs will overflow to -inf in f16 precision, and removing the binary-splitting retry logic in structure.rs reduces robustness by skipping entire batches upon a single crop failure.
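For context on the last point, a binary-splitting retry generally works as sketched below. This is a generic illustration, not the logic that was removed from `structure.rs`. The name `run_with_split`, the `String` error type, and the `Option` per-item result are assumptions.

```rust
/// Runs `run_batch` on `items`. If the batch fails (or returns the wrong
/// number of outputs), it is split in half and each half is retried, down to
/// single items. An item that fails on its own yields `None`.
fn run_with_split<T, R, F>(items: &[T], run_batch: &mut F) -> Vec<Option<R>>
where
    F: FnMut(&[T]) -> Result<Vec<R>, String>,
{
    if items.is_empty() {
        return Vec::new();
    }
    match run_batch(items) {
        // Whole batch succeeded: wrap each output.
        Ok(out) if out.len() == items.len() => out.into_iter().map(Some).collect(),
        // A single item failed: give up on just this one.
        _ if items.len() == 1 => vec![None],
        // Otherwise isolate the failure by retrying each half.
        _ => {
            let mid = items.len() / 2;
            let mut results = run_with_split(&items[..mid], run_batch);
            results.extend(run_with_split(&items[mid..], run_batch));
            results
        }
    }
}
```

A single bad crop then costs a few extra inference calls instead of the whole batch, at the price of more complex control flow.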

Copilot

Pull request overview

Copilot reviewed 46 out of 47 changed files in this pull request and generated 3 comments.

fix: stabilize formula/layout inference, dedup VL masks, and tidy code & docs

f794e78

GreatV requested a review from Copilot June 3, 2026 14:16

Copilot started reviewing on behalf of GreatV June 3, 2026 14:17

gemini-code-assist Bot reviewed Jun 3, 2026

Comment thread oar-ocr-core/src/models/recognition/pp_formulanet.rs

Copilot AI reviewed Jun 3, 2026

Comment thread src/oarocr/table_analyzer.rs Outdated

refactor(table_analyzer): improve error handling and optimize table result processing

e2a3ae8

gemini-code-assist Bot reviewed Jun 4, 2026

Comment thread oar-ocr-core/src/models/recognition/pp_formulanet.rs

Comment thread src/oarocr/table_analyzer.rs

Comment thread oar-ocr-vl/src/attention.rs Outdated

refactor(attention): simplify mask tensor construction in create_generation_mask

32629e8

GreatV requested a review from Copilot June 4, 2026 08:28

Copilot started reviewing on behalf of GreatV June 4, 2026 08:28

gemini-code-assist Bot reviewed Jun 4, 2026

Comment thread oar-ocr-core/src/models/recognition/pp_formulanet.rs

Comment thread oar-ocr-vl/src/attention.rs Outdated

Comment thread src/oarocr/structure.rs

Copilot AI reviewed Jun 4, 2026

Comment thread src/oarocr/table_analyzer.rs Outdated

Comment thread src/oarocr/table_analyzer.rs Outdated

Comment thread src/oarocr/structure.rs

GreatV added 2 commits June 4, 2026 08:55

fix the CI preflight checks

2a00939

fix the CI preflight checks

3881e05

GreatV merged commit 0516248 into main Jun 4, 2026
7 checks passed

GreatV deleted the tidycode branch June 4, 2026 09:28

2 participants