【PaddlePaddle Hackathon 10】Add Doc2Prototype OpenVINO demo by Dryoung95 · Pull Request #548 · openvinotoolkit/openvino_build_deploy

Dryoung95 · 2026-05-15T19:56:03Z

Description

This PR adds a Doc2Prototype MVP demo for the Intel/OpenVINO PaddlePaddle Hackathon 10 task.

The demo implements a reproducible document-understanding-to-downstream-processing workflow:

document / diagram image -> PaddleOCR-VL deployed with OpenVINO -> structured JSON -> downstream Agent/Coder workflow -> generated prototype artifact -> visual report

Primary scenarios:

API documentation image -> endpoint JSON -> generated FastAPI skeleton
Flowchart image -> node/edge JSON -> generated Mermaid diagram
Technical/reference document image -> structured sections -> Markdown summary
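The first scenario can be sketched as a deterministic template step. This is only an illustration, not the demo's actual generator: the endpoint schema (`method`, `path`, `summary` keys) and the helper name `render_fastapi_skeleton` are assumptions made for this example.

```python
import re


def render_fastapi_skeleton(endpoints):
    """Render a FastAPI source skeleton from a list of endpoint dicts.

    Each dict is assumed (hypothetically) to carry "method", "path",
    and an optional "summary".
    """
    lines = ["from fastapi import FastAPI", "", "app = FastAPI()", ""]
    for ep in endpoints:
        method = ep["method"].lower()
        path = ep["path"]
        # Path parameters such as {user_id} become handler arguments.
        params = re.findall(r"\{(\w+)\}", path)
        # Build a valid function name from the path.
        slug = re.sub(r"[^0-9a-zA-Z]+", "_", path).strip("_").lower() or "root"
        args = ", ".join(f"{p}: str" for p in params)
        # Collapse whitespace so the summary stays on one comment line.
        summary = " ".join(ep.get("summary", "").split()) or "TODO"
        lines += [
            f'@app.{method}("{path}")',
            f"def {method}_{slug}({args}):",
            f"    # {summary}",
            '    return {"detail": "not implemented"}',
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [{"method": "GET", "path": "/users/{user_id}", "summary": "Fetch a user"}]
    print(render_fastapi_skeleton(sample))
```

A template step like this needs no second model, which is presumably why a deterministic path suits a fast smoke run.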

Requirement Alignment

Adds a standalone demo under demos/doc2prototype_demo.
Uses OpenVINO IR deployment for PaddleOCR-VL inference.
Shows the complete handoff from document/visual understanding to downstream intelligent processing.
Recommends OpenVINO GenAI for the real local Coder model path.
Keeps deterministic generation as the fast reviewer smoke path.
Provides source code, README, pinned dependencies, model preparation commands, example inputs, tracked smoke outputs, screenshots, and run metadata.

Downstream Agent / Coder Flow

The structured output is passed through downstream_agent.py:

PlannerAgent builds the downstream generation plan from structured JSON.
GeneratorAgent creates the downstream artifact.
ReviewAgent checks whether the generated artifact covers extracted endpoints, nodes, edges, or document sections.
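The review step above amounts to a coverage check. A minimal sketch follows, assuming the review compares extracted identifiers against the artifact text by substring match; `ReviewResult` and `review_coverage` are hypothetical names, and the real logic in downstream_agent.py may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewResult:
    status: str  # "pass" or "needs_attention"
    missing: list = field(default_factory=list)


def review_coverage(expected_items, artifact_text):
    """Report which extracted items do not appear in the generated artifact."""
    missing = [item for item in expected_items if item not in artifact_text]
    # Assumption: an empty extraction is also flagged, since there is
    # nothing to confirm coverage against.
    if not expected_items or missing:
        return ReviewResult("needs_attention", missing)
    return ReviewResult("pass", [])


if __name__ == "__main__":
    result = review_coverage(["/users", "/orders"], '@app.get("/users")')
    print(result.status, result.missing)
```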

Coder backend policy:

Recommended real local Coder path: --code-model-backend openvino with OpenVINO/Qwen2.5-Coder-0.5B-Instruct-int4-ov.
Fast default reviewer path: deterministic template backend, no second model download required.
Fallback/comparison path: HuggingFace backend with --code-model-backend hf.

Verified OpenVINO Coder path:

python -c "from code_generator import download_openvino_code_model; print(download_openvino_code_model())"
python main.py examples/api_doc_sample.png --task api_doc --device CPU --code-model-path _models/OpenVINO/Qwen2.5-Coder-0.5B-Instruct-int4-ov --code-model-backend openvino --code-max-new-tokens 768 --output-dir outputs/mvp_api_image_ov_coder

Result:

| Scenario | Parser backend | Coder backend | Device | Structured output | Agent review | Total |
| --- | --- | --- | --- | --- | --- | --- |
| API image + OpenVINO Coder | PaddleOCR-VL OpenVINO | OpenVINO GenAI Coder | CPU | 5 endpoints | pass | 27.162 s |

Effect Display

API document image -> deterministic FastAPI skeleton:

This image shows examples/api_doc_sample.png parsed by PaddleOCR-VL with OpenVINO. The structured JSON contains five endpoints. The downstream Agent checks that all five extracted endpoints are represented in the generated FastAPI skeleton. The timing chart separates model load, OpenVINO inference, structure extraction, and generation time. The layout overlay marks detected text regions, and the heatmap shows text-density concentration.
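Per-stage timing of the kind shown in the chart can be collected with a small context manager. This is a sketch under assumptions: `StageTimer` and the stage names are hypothetical and not taken from the demo's source.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock seconds per named pipeline stage."""

    def __init__(self):
        self.seconds = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed

    def total(self):
        return sum(self.seconds.values())


if __name__ == "__main__":
    timer = StageTimer()
    with timer.stage("model_load"):
        time.sleep(0.01)  # stand-in for loading the model
    with timer.stage("inference"):
        time.sleep(0.01)  # stand-in for running inference
    print(timer.seconds, timer.total())
```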

API document image -> OpenVINO Coder model:

This image uses the same API input but switches downstream generation to OpenVINO/Qwen2.5-Coder-0.5B-Instruct-int4-ov through --code-model-backend openvino. A correct run should show OpenVINO: True, Backend: OpenVINO Coder model inside agent workflow, five extracted endpoints, and Agent review status pass.

Flowchart image -> Mermaid diagram:

This image shows examples/flowchart_sample.png parsed by PaddleOCR-VL with OpenVINO. The structured JSON contains six nodes and five directed edges. The downstream Agent generates a Mermaid diagram, and the review passes when all extracted nodes are represented in the generated diagram.
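Converting node/edge JSON to Mermaid is a direct mapping. The sketch below assumes a hypothetical schema in which nodes carry `id` and `label` and edges carry `source` and `target`; the demo's actual field names may differ.

```python
def render_mermaid(nodes, edges):
    """Render a top-down Mermaid flowchart from node and edge dicts."""
    lines = ["flowchart TD"]
    for node in nodes:
        # Double quotes would end the quoted Mermaid label, so swap them.
        label = node["label"].replace('"', "'")
        lines.append(f'    {node["id"]}["{label}"]')
    for edge in edges:
        lines.append(f'    {edge["source"]} --> {edge["target"]}')
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = [{"id": "n1", "label": "Start"}, {"id": "n2", "label": "End"}]
    edges = [{"source": "n1", "target": "n2"}]
    print(render_mermaid(nodes, edges))
```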

Reproduction

From demos/doc2prototype_demo:

python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
python prepare_model.py --device CPU
python scripts/make_sample_images.py
python main.py examples/api_doc_sample.png --task api_doc --device CPU --output-dir outputs/mvp_api_image_smoke
python main.py examples/flowchart_sample.png --task flowchart --device CPU --output-dir outputs/mvp_flow_image_smoke

Expected smoke results:

| Scenario | OpenVINO | Device | Structured output | Downstream artifact | Agent review |
| --- | --- | --- | --- | --- | --- |
| API document image | yes | CPU | 5 endpoints | `generated_api.py` | pass |
| Flowchart image | yes | CPU | 6 nodes / 5 edges | `generated_flowchart.mmd` | pass |

Intel Hardware Scope

Validated on a local Core Ultra / RTX 5070 Ti laptop environment, but the benchmark and recommendation focus remains Intel/OpenVINO:

CPU: primary reproducible path for this PR.
GPU.0: Intel iGPU path, available for optional Intel GPU validation.
NPU and AUTO: visible but currently limited by the stateful/dynamic-shape LLM path in the PaddleOCR-VL export, so they are documented as limitations rather than successful benchmarks.
GPU.1: NVIDIA dGPU on this machine; not used as a project highlight or primary benchmark for the Intel/OpenVINO task.

If final validation needs to match the provided GMK Intel Core Ultra mini PC more closely, I can migrate the same branch and commands back to that device and report CPU / Intel iGPU / NPU behavior there.

Artifacts

Tracked smoke reports:

demos/doc2prototype_demo/outputs/mvp_api_image_smoke/visual_report.html
demos/doc2prototype_demo/outputs/mvp_api_image_smoke/agent_review.md
demos/doc2prototype_demo/outputs/mvp_flow_image_smoke/visual_report.html
demos/doc2prototype_demo/outputs/mvp_flow_image_smoke/agent_review.md

Screenshots:

demos/doc2prototype_demo/assets/doc2prototype_api_report.png
demos/doc2prototype_demo/assets/doc2prototype_api_openvino_coder_report.png
demos/doc2prototype_demo/assets/doc2prototype_flowchart_report.png

Dryoung95 · 2026-05-16T07:35:47Z

This PR is submitted for the PaddlePaddle Hackathon 10 Intel advanced task.

It adds the Doc2Prototype MVP demo:

PaddleOCR-VL exported to OpenVINO IR
API document and flowchart parsing
structured JSON output
FastAPI / Mermaid prototype generation
static visual reports with timing, layout overlay, and text heatmap

Validated locally with the commands listed in the PR description.

uvv-01 · 2026-05-29T08:04:41Z

@Dryoung95 This looks really interesting. I had one question though: what happens if the input image is blurry, low quality, or OCR is not able to extract much text from it? Does the demo handle such cases gracefully, or show any specific error message? It might be helpful to mention these scenarios in the documentation as well since new users may run into them while testing.

Dryoung95 · 2026-05-29T08:33:39Z

Thanks, good point. The demo should not crash in that case. If OCR extracts very little text, it still writes the normal outputs, but structured.json may have empty fields and agent_review.md will show needs_attention instead of pass. I added a short README note for blurry / low-quality inputs in 536a439.

uvv-01 · 2026-05-29T08:47:32Z

@Dryoung95 Thanks for the clarification and for adding the README note.

I was also wondering, would it make sense to show a warning in the console output or visual report when very little text is extracted? I feel that could help users quickly understand that the input quality might be affecting the results, especially when testing the demo for the first time.

Dryoung95 · 2026-05-29T10:38:03Z

Yes, agreed. I added this in 3fff826: weak OCR runs now print `[mvp] warning:` lines in the CLI, and visual_report.html shows a Warnings section. I also verified that a low-text probe still writes the normal artifacts and marks the review as needs_attention.
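A weak-extraction check of this kind might look like the sketch below. It is not the code from 3fff826: `collect_ocr_warnings` and the 40-character threshold are hypothetical, and only the `[mvp] warning:` prefix comes from this thread.

```python
def collect_ocr_warnings(ocr_text, structured_items, min_chars=40):
    """Return human-readable warnings for weak OCR extraction.

    min_chars is an illustrative threshold, not a value from the demo.
    """
    warnings = []
    char_count = len("".join(ocr_text.split()))
    if char_count < min_chars:
        warnings.append(
            f"only {char_count} non-whitespace characters extracted; "
            "the input may be blurry or low quality"
        )
    if not structured_items:
        warnings.append("no structured items extracted")
    return warnings


if __name__ == "__main__":
    for message in collect_ocr_warnings("  a b  ", []):
        print(f"[mvp] warning: {message}")
```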

Add Doc2Prototype OpenVINO demo

93eabb8

Dryoung95 added 4 commits May 16, 2026 16:41

Document visual report screenshot workflow

a374703

Add official Doc2Prototype reference sample workflow

0b30b17

Add downstream agent workflow and local coder backend

8125539

Add PR evidence and agent report display

c6445eb

Dryoung95 changed the title from "Add Doc2Prototype OpenVINO demo" to "【PaddlePaddle Hackathon 10】Add Doc2Prototype OpenVINO demo" on May 22, 2026

Recommend OpenVINO Coder backend

b45e7ac

Document weak OCR input behavior

536a439

Warn on weak OCR extraction

3fff826

Dryoung95 wants to merge 8 commits into openvinotoolkit:master from Dryoung95:doc2prototype-mvp
