Conversation
Code Review
This pull request introduces several improvements, refactorings, and bug fixes across the OCR pipeline. Key changes include setting CUDA_LAUNCH_BLOCKING=1 up front to prevent CUDA EP arena buffer races in PP-FormulaNet, introducing a shared create_generation_mask helper that masks out left-padding positions during autoregressive decoding, and removing stub table results in favor of surfacing errors. The PR also fixes a bug in PicoDet argmax initialization by seeding with f32::NEG_INFINITY instead of 0.0f32, guards against under-width outputs in PP-DocLayout to prevent panics, and cleans up various unused methods and comments. The review feedback flags the use of unstable let-chains in pp_formulanet.rs, which should be rewritten as nested if statements to stay compatible with stable Rust.
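To make the let-chains feedback concrete, here is a minimal sketch of the nested-if rewrite. The function name, its parameters, and the megabyte-to-byte conversion are hypothetical and are not taken from pp_formulanet.rs; only the control-flow shape is the point.

```rust
/// Hypothetical helper: returns an arena limit in bytes only when CUDA is in
/// use and a positive limit (in MiB) was configured.
fn arena_limit_bytes(use_cuda: bool, limit_mb: Option<u64>) -> Option<u64> {
    // Let-chain form, which needs a toolchain/edition that supports it:
    //   if use_cuda && let Some(mb) = limit_mb && mb > 0 { ... }
    //
    // Equivalent nested form that does not rely on let-chains:
    if use_cuda {
        if let Some(mb) = limit_mb {
            if mb > 0 {
                return Some(mb * 1024 * 1024);
            }
        }
    }
    None
}
```

Both forms evaluate the conditions in the same order and short-circuit at the first one that fails, so the rewrite should not change behavior.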
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Pull request overview
This PR tightens and stabilizes several OCR pipeline behaviors (notably table analysis, model preset inference, and CUDA formula decoding), deduplicates VLM generation masking logic across models, and cleans up a broad set of comments/docs to better reflect current behavior.
Changes:
- Table analysis now surfaces errors instead of emitting stub/placeholder TableResults, and the public API is simplified to accept only the page image + layout elements.
- Structure builder/model plumbing is made more robust: layout model preset matching is normalized, adapters are labeled with caller-provided model names, and a CUDA workaround is applied early for PP-FormulaNet (plus CUDA arena tuning).
- VLM decoding gains a shared left-padding generation mask helper; layout postprocess gets additional guards/tests to prevent panics and improve argmax correctness.
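As an illustration of the left-padding mask idea, the sketch below builds an additive per-step mask over the KV cache using plain vectors. The real create_generation_mask helper presumably operates on tensors and its signature is not shown in this conversation, so the function name, parameters, and the caller-supplied mask value here are all assumptions.

```rust
/// Hypothetical stand-in for a generation mask helper.
///
/// `pad_lens[b]` is the number of left-padding positions in sequence `b`,
/// `kv_len` is the current key/value cache length, and `masked` is the
/// additive value applied to padded key positions before softmax.
/// Returns one row per sequence; each row has `kv_len` entries.
fn generation_mask(pad_lens: &[usize], kv_len: usize, masked: f32) -> Vec<Vec<f32>> {
    pad_lens
        .iter()
        .map(|&pad| {
            (0..kv_len)
                // Positions before `pad` hold padding and must not be attended to.
                .map(|pos| if pos < pad { masked } else { 0.0 })
                .collect()
        })
        .collect()
}
```

With a batch where the first sequence has two padding tokens and the second has none, `generation_mask(&[2, 0], 4, -1.0e4)` masks the first two key positions of row 0 and leaves row 1 all zeros.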
Reviewed changes
Copilot reviewed 46 out of 47 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/oarocr/table_analyzer.rs | Removes formula/text-region inputs, removes stub tables, and surfaces structure/cell failures as errors. |
| src/oarocr/structure.rs | Normalizes layout preset parsing, sets CUDA workaround early for formula on CUDA, and threads model-name metadata into adapters. |
| src/oarocr/stitching.rs | Clarifies which layout labels are excluded from OCR matching and how formula exclusion is controlled. |
| src/oarocr/result.rs | Minor doc/comment cleanup on OCR result fields. |
| src/oarocr/ocr.rs | Docstring argument ordering/wording cleanup for word-box helpers. |
| src/lib.rs | Fixes doc comment formatting/indentation in the prelude docs. |
| oar-ocr-vl/src/paddleocr_vl/model.rs | Adds per-step generation mask to avoid attending to left padding in KV cache during decode. |
| oar-ocr-vl/src/paddleocr_vl/ernie.rs | Condenses KV-cache comment to a clearer, single rationale. |
| oar-ocr-vl/src/mineru/model.rs | Deduplicates generation-mask logic by using the shared attention helper. |
| oar-ocr-vl/src/hunyuanocr/vision.rs | Removes unused/dead-code helper. |
| oar-ocr-vl/src/hunyuanocr/model.rs | Adds per-step generation mask to avoid attending to left padding during decode; comment cleanup. |
| oar-ocr-vl/src/hsd/verify.rs | Clarifies position-id formula for HSD verification. |
| oar-ocr-vl/src/glmocr/vision.rs | Deduplicates RoPE inv_freq computation via shared helper. |
| oar-ocr-vl/src/attention.rs | Adds shared create_generation_mask and updates attention doc wording to match implementation. |
| oar-ocr-core/src/utils/transform.rs | Updates perspective warp docs to match bicubic interpolation behavior. |
| oar-ocr-core/src/utils/topk.rs | Updates docs to match Unknown class-name formatting behavior. |
| oar-ocr-core/src/utils/tensor.rs | Updates module docs to reflect that no helpers are currently exported. |
| oar-ocr-core/src/processors/uvdoc_postprocess.rs | Removes unused transformation helpers. |
| oar-ocr-core/src/processors/unimernet_preprocess.rs | Updates padding comment to match actual normalized padding fill behavior. |
| oar-ocr-core/src/processors/types.rs | Doc cleanup around color channel ordering description. |
| oar-ocr-core/src/processors/normalization.rs | Updates docs to reflect default RGB channel order. |
| oar-ocr-core/src/processors/layout_sorting.rs | Updates comment to match the new distance threshold constant. |
| oar-ocr-core/src/processors/layout_postprocess.rs | Fixes argmax seed for negative scores, adds PP-DocLayout shape guards, and adds regression tests. |
| oar-ocr-core/src/processors/geometry.rs | Clarifies IoU commentary (AABB approximation rationale). |
| oar-ocr-core/src/models/rectification/uvdoc.rs | Removes outdated/duplicative comment about BGR handling. |
| oar-ocr-core/src/models/recognition/unimernet.rs | Ensures UniMERNet model is identified via model_name in inference config. |
| oar-ocr-core/src/models/recognition/pp_formulanet.rs | Adds CUDA arena strategy tuning and ensures PP-FormulaNet model is identified via model_name. |
| oar-ocr-core/src/domain/structure.rs | Updates heading-formatting docs to match the current API semantics. |
| oar-ocr-core/src/domain/adapters/text_recognition_adapter.rs | Adds model-name override support for adapter identification. |
| oar-ocr-core/src/domain/adapters/text_detection_adapter.rs | Adds model-name override support for adapter identification. |
| oar-ocr-core/src/domain/adapters/layout_detection_adapter.rs | Updates PP-DocLayout builder docs to match normalized/preset naming expectations. |
| oar-ocr-core/src/domain/adapters/formula_recognition_adapter.rs | Condenses explanation of why confidence/thresholding is not applied. |
| oar-ocr-core/src/core/traits/task_def.rs | Doc cleanup to match current macro/registry wording. |
| oar-ocr-core/src/core/traits/granular.rs | Doc cleanup for the granular trait module summary. |
| oar-ocr-core/src/core/mod.rs | Removes stale commented re-exports and outdated note about tracing init location. |
| oar-ocr-core/src/core/inference/ort_infer_builders.rs | Extracts and exposes ensure_cuda_launch_blocking() for early, pipeline-wide use. |
| oar-ocr-core/src/core/inference/mod.rs | Re-exports ensure_cuda_launch_blocking. |
| oar-ocr-core/src/core/constants.rs | Comment simplifications for default constants. |
| oar-ocr-core/src/core/config/model_input.rs | Corrects dynamic-dimension documentation and clarifies default shape semantics. |
| oar-ocr-core/src/core/config/errors.rs | Doc cleanup for config validation errors. |
| oar-ocr-core/src/core/config/builder.rs | Removes redundant comment on Default impl. |
| oar-ocr-core/src/core/batch/dynamic/types.rs | Fixes comment to reflect (width, height) ordering. |
| examples/utils/markdown.rs | Removes stale trailing comment. |
| examples/table_structure_recognition.rs | Improves CLI docs and examples section wording. |
| examples/table_cell_detection.rs | Tightens example description wording. |
| examples/formula_recognition.rs | Fixes redundant phrasing in example docs. |
| examples/document_rectification.rs | Removes unused/obsolete input shape override CLI options and related logging. |
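The argmax seed fix listed for layout_postprocess.rs can be illustrated with a small sketch. The helper below is hypothetical and is not the PicoDet code; it only shows why a running maximum seeded with 0.0 misbehaves when every score is negative.

```rust
/// Hypothetical helper: scans `scores` and returns the index and value of the
/// largest entry, starting the running maximum from `seed`.
/// For an empty slice it returns `(0, seed)`.
fn argmax_with_seed(scores: &[f32], seed: f32) -> (usize, f32) {
    let mut best_idx = 0usize;
    let mut best = seed;
    for (i, &s) in scores.iter().enumerate() {
        if s > best {
            best = s;
            best_idx = i;
        }
    }
    (best_idx, best)
}
```

For scores `[-3.2, -0.7, -1.5]`, a seed of `0.0` is never beaten, so the function reports index 0 with a score of 0.0 that does not exist in the input. Seeding with `f32::NEG_INFINITY` yields index 1 and score -0.7.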
/gemini review
Code Review
This pull request refactors and enhances the OCR pipeline, notably introducing a per-step attention mask to handle left-padded batches during autoregressive decoding, configuring PP-FormulaNet's CUDA BFC arena to grow tightly to prevent OOM errors, and modifying TableAnalyzer to surface errors rather than emitting stub results. Feedback on these changes highlights a compilation issue on stable Rust due to the use of unstable let-chains in pp_formulanet.rs. Additionally, suggestions were made to improve the robustness of analyze_tables by skipping failed tables instead of aborting the entire page analysis, and to use more idiomatic tensor creation methods (Tensor::zeros and Tensor::full) in attention.rs.
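The suggestion to skip failed tables rather than abort the page can be sketched as follows. `analyze_one` is a made-up stand-in that fails for odd ids; the actual analyze_tables signature and error type are not shown here, so everything below is an assumption about shape only.

```rust
/// Made-up per-table analysis: succeeds for even ids, fails for odd ids.
fn analyze_one(id: u32) -> Result<String, String> {
    if id % 2 == 0 {
        Ok(format!("table-{id}"))
    } else {
        Err(format!("table {id}: structure failed"))
    }
}

/// Strict variant: the first failure aborts the whole page.
fn analyze_all_strict(ids: &[u32]) -> Result<Vec<String>, String> {
    ids.iter().map(|&id| analyze_one(id)).collect()
}

/// Lenient variant: keeps successful tables and records failures separately,
/// so the caller can log or surface them without losing the good results.
fn analyze_all_lenient(ids: &[u32]) -> (Vec<String>, Vec<String>) {
    let mut tables = Vec::new();
    let mut errors = Vec::new();
    for &id in ids {
        match analyze_one(id) {
            Ok(table) => tables.push(table),
            Err(err) => errors.push(err),
        }
    }
    (tables, errors)
}
```

The trade-off is that the lenient variant needs a second channel for errors (a returned list, as here, or logging), whereas the strict variant keeps the plain `Result` return type.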
/gemini review
Code Review
This pull request refactors several components of the OCR pipeline, including sharing the create_generation_mask helper across models, introducing model name overrides for identification in logs, configuring the CUDA arena growth strategy for PP-FormulaNet to prevent VRAM ballooning, and updating TableAnalyzer to surface errors instead of emitting stub results. It also fixes an argmax issue with negative scores in LayoutPostProcess and adds shape validation guards. Feedback on the changes highlights that the use of unstable let_chains syntax in pp_formulanet.rs will fail on stable Rust, the attention mask value of -1e9_f32 in attention.rs will overflow to -inf in f16 precision, and removing the binary-splitting retry logic in structure.rs reduces robustness by skipping entire batches upon a single crop failure.
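For readers unfamiliar with the binary-splitting retry mentioned in the feedback, a generic sketch is shown below. It is not the logic that was removed from structure.rs; the function name, the `String` error type, and the assumption that a successful batch returns exactly one result per input are all illustrative.

```rust
/// Runs `run_batch` on `items`; if the batch fails, splits it in half and
/// retries each half, so a single bad item only costs its own result.
/// Returns one entry per input item, with `None` for items that failed alone.
/// Assumes a successful call returns exactly one result per input item.
fn run_with_split<T, R, F>(items: &[T], run_batch: &F) -> Vec<Option<R>>
where
    F: Fn(&[T]) -> Result<Vec<R>, String>,
{
    if items.is_empty() {
        return Vec::new();
    }
    match run_batch(items) {
        Ok(results) => results.into_iter().map(Some).collect(),
        // A batch of one that still fails is the culprit: record it and move on.
        Err(_) if items.len() == 1 => vec![None],
        Err(_) => {
            let (left, right) = items.split_at(items.len() / 2);
            let mut out = run_with_split(left, run_batch);
            out.extend(run_with_split(right, run_batch));
            out
        }
    }
}
```

With one bad item in a batch of n, this costs roughly 2·log2(n) extra batch calls in exchange for keeping the other n - 1 results, which is the robustness the reviewer notes is lost when the whole batch is skipped.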
No description provided.