【PaddlePaddle Hackathon 10】Add Doc2Prototype OpenVINO demo#548
Dryoung95 wants to merge 8 commits into
Conversation
This PR is submitted for the PaddlePaddle Hackathon 10 Intel advanced task. It adds the Doc2Prototype MVP demo:
Validated locally with the commands listed in the PR description.
@Dryoung95 This looks really interesting. I had one question though: what happens if the input image is blurry, low quality, or OCR is not able to extract much text from it? Does the demo handle such cases gracefully, or show any specific error message? It might be helpful to mention these scenarios in the documentation as well since new users may run into them while testing.
Thanks, good point. The demo should not crash in that case. If OCR extracts very little text, it still writes the normal outputs, but |
@Dryoung95 Thanks for the clarification and for adding the README note. I was also wondering, would it make sense to show a warning in the console output or visual report when very little text is extracted? I feel that could help users quickly understand that the input quality might be affecting the results, especially when testing the demo for the first time.
Yes, agreed. I added this in |
Description
This PR adds a Doc2Prototype MVP demo for the Intel/OpenVINO PaddlePaddle Hackathon 10 task.
The demo implements a reproducible document-understanding-to-downstream-processing workflow:
document / diagram image -> PaddleOCR-VL deployed with OpenVINO -> structured JSON -> downstream Agent/Coder workflow -> generated prototype artifact -> visual report
Primary scenarios:
Requirement Alignment
demos/doc2prototype_demo.
Downstream Agent / Coder Flow
The structured output is passed through downstream_agent.py:
- PlannerAgent builds the downstream generation plan from structured JSON.
- GeneratorAgent creates the downstream artifact.
- ReviewAgent checks whether the generated artifact covers extracted endpoints, nodes, edges, or document sections.
Coder backend policy:
- --code-model-backend openvino with OpenVINO/Qwen2.5-Coder-0.5B-Instruct-int4-ov.
- --code-model-backend hf.
Verified OpenVINO Coder path:
    python -c "from code_generator import download_openvino_code_model; print(download_openvino_code_model())"
    python main.py examples/api_doc_sample.png --task api_doc --device CPU --code-model-path _models/OpenVINO/Qwen2.5-Coder-0.5B-Instruct-int4-ov --code-model-backend openvino --code-max-new-tokens 768 --output-dir outputs/mvp_api_image_ov_coder
Result:
Effect Display
API document image -> deterministic FastAPI skeleton:
This image shows examples/api_doc_sample.png parsed by PaddleOCR-VL with OpenVINO. The structured JSON contains five endpoints. The downstream Agent checks that all five extracted endpoints are represented in the generated FastAPI skeleton. The timing chart separates model load, OpenVINO inference, structure extraction, and generation time. The layout overlay marks detected text regions, and the heatmap shows text-density concentration.
API document image -> OpenVINO Coder model:
This image uses the same API input but switches downstream generation to OpenVINO/Qwen2.5-Coder-0.5B-Instruct-int4-ov through --code-model-backend openvino. A correct run should show "OpenVINO: True", "Backend: OpenVINO Coder model inside agent workflow", five extracted endpoints, and Agent review status "pass".
Flowchart image -> Mermaid diagram:
This image shows examples/flowchart_sample.png parsed by PaddleOCR-VL with OpenVINO. The structured JSON contains six nodes and five directed edges. The downstream Agent generates a Mermaid diagram, and the review passes when all extracted nodes are represented in the generated diagram.
Reproduction
From demos/doc2prototype_demo:
    python3 -m venv venv
    source venv/bin/activate
    python -m pip install --upgrade pip
    pip install -r requirements.txt
    python prepare_model.py --device CPU
    python scripts/make_sample_images.py
    python main.py examples/api_doc_sample.png --task api_doc --device CPU --output-dir outputs/mvp_api_image_smoke
    python main.py examples/flowchart_sample.png --task flowchart --device CPU --output-dir outputs/mvp_flow_image_smoke
Expected smoke results:
- generated_api.py
- generated_flowchart.mmd
Intel Hardware Scope
Validated on a local Core Ultra / RTX 5070 Ti laptop environment, but the benchmark and recommendation focus remains Intel/OpenVINO:
- CPU: primary reproducible path for this PR.
- GPU.0: Intel iGPU path, available for optional Intel GPU validation.
- NPU and AUTO: visible but currently limited by the stateful/dynamic-shape LLM path in the PaddleOCR-VL export, so they are documented as limitations rather than successful benchmarks.
- GPU.1: NVIDIA dGPU on this machine; not used as a project highlight or primary benchmark for the Intel/OpenVINO task.
If final validation needs to match the provided GMK Intel Core Ultra mini PC more closely, I can migrate the same branch and commands back to that device and report CPU / Intel iGPU / NPU behavior there.
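For anyone reproducing on different hardware, the device scope above can be expressed as a small selection helper. This is only a sketch and not code from this PR: the function name choose_device and the example device list are assumptions for illustration. With OpenVINO installed, the real list can be read from the available_devices property of openvino.Core, as noted in the comment.

```python
def choose_device(available, preferred=("CPU",)):
    """Return the first preferred device that the runtime reports.

    `available` is a list of OpenVINO device names such as "CPU" or "GPU.0".
    NPU and AUTO are deliberately left out of the default preference because
    the PR documents them as current limitations.
    """
    for name in preferred:
        if name in available:
            return name
    raise ValueError(f"None of {preferred} found in {available}")


# With OpenVINO installed, the list can be obtained from the runtime:
#   import openvino as ov
#   available = ov.Core().available_devices
available = ["CPU", "GPU.0", "GPU.1", "NPU"]  # hypothetical example list

print(choose_device(available))                      # default CPU path
print(choose_device(available, ("GPU.0", "CPU")))    # optional Intel iGPU path
```

The returned name can then be passed to the --device flag used in the reproduction commands.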
Artifacts
Tracked smoke reports:
- demos/doc2prototype_demo/outputs/mvp_api_image_smoke/visual_report.html
- demos/doc2prototype_demo/outputs/mvp_api_image_smoke/agent_review.md
- demos/doc2prototype_demo/outputs/mvp_flow_image_smoke/visual_report.html
- demos/doc2prototype_demo/outputs/mvp_flow_image_smoke/agent_review.md
Screenshots:
- demos/doc2prototype_demo/assets/doc2prototype_api_report.png
- demos/doc2prototype_demo/assets/doc2prototype_api_openvino_coder_report.png
- demos/doc2prototype_demo/assets/doc2prototype_flowchart_report.png
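The agent_review.md files listed above record whether the generated artifact covers what was extracted. For readers who want a feel for that kind of check, here is a minimal sketch of a node-coverage review for the Mermaid case. It is not the ReviewAgent implementation from this PR: the function name review_node_coverage, the matching rule (case-insensitive substring match on node labels), and the sample data are all assumptions for illustration.

```python
import re


def review_node_coverage(node_labels, mermaid_source):
    """Return (status, missing) for a generated Mermaid diagram.

    status is "pass" when every extracted node label appears in the
    diagram text, otherwise "fail"; missing lists the absent labels.
    """
    def norm(text):
        # Collapse whitespace and lowercase so minor formatting differences
        # between OCR output and generated text do not cause false failures.
        return re.sub(r"\s+", " ", text).strip().lower()

    haystack = norm(mermaid_source)
    missing = [label for label in node_labels if norm(label) not in haystack]
    return ("pass" if not missing else "fail"), missing


# Hypothetical extracted nodes and generated diagram.
nodes = ["Upload image", "Run OCR", "Build JSON"]
diagram = """flowchart TD
    A[Upload image] --> B[Run OCR]
    B --> C[Build JSON]
"""

print(review_node_coverage(nodes, diagram))
print(review_node_coverage(nodes + ["Render report"], diagram))
```

A substring match is intentionally lenient; a stricter review could parse the Mermaid node definitions and compare edges as well.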