diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.04.27~2026.05.08.md b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.04.27~2026.05.08.md
new file mode 100644
index 00000000..7e8bd25d
--- /dev/null
+++ b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.04.27~2026.05.08.md
@@ -0,0 +1,82 @@
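+### Claimant GitHub ID
+megemini
+
+### Task Information
+
+- **Advanced task number**: #15
+- **Task title**: Innovative applications based on Iluvatar CoreX hardware and ERNIE multimodal models
+- **Associated vendor**: Iluvatar CoreX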
+
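+### Work This Week
+
+1. **RFC document**
+
+   - The RFC document is complete
+   - AI Studio link: https://aistudio.baidu.com/project/edit/10221576
+
+2. **Code implementation**
+
+   - The notebook for the AI Studio project is complete
+   - A dual-card Iluvatar environment has been created
+
+3. **README**
+
+   - See the notebook in the AI Studio project
+
+4. **Demo video / screenshots**
+
+   - To be done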
+
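+5. **Problems and solutions**
+
+   - Problem: ERNIE-4.5-0.3B-Paddle cannot be called correctly in the AI Studio notebook
+
+    There is a strange problem: in the AI Studio notebook, the ERNIE-4.5-0.3B-Paddle model cannot be called `correctly`. The model runs without errors, but its output is `unrelated to the question`.
+
+    See the screenshot below, where I manually put the PaddleOCR-VL-1.5 recognition result into the prompt:
+
+    ![images/cli_prompt.png](images/cli_prompt.png)
+
+    Calling the model from the command line gives normal output:
+
+    ![images/cli_ok.png](images/cli_ok.png)
+
+    In the notebook, however, the output is a long run of whitespace (spaces and newlines)!
+
+    I manually changed the prompt in the notebook to `你是谁` ("Who are you") to test the model's output:
+
+    ![images/notebook_prompt.png](images/notebook_input.png)
+
+    The output is something strange:
+
+    ![images/notebook_output.png](images/notebook_output.png)
+
+    Sometimes it even outputs a cloze (fill-in-the-blank) exercise.
+
+    I tried calling the model through a function call inside the notebook, and also through a subprocess; neither works!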
+
+    The notebook file `medical_pipeline_20260503.ipynb` is attached and can be run directly.
+
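+    Separately, I found another problem: in AI Studio, GPU memory is sometimes not released. As the screenshot shows, 45% of GPU memory is occupied even when nothing is running. I am not sure whether this is an AI Studio problem or a problem with FastDeploy on Iluvatar hardware. Please help take a look.
+
+   - Problem: the dual-card Iluvatar framework development environment only has a command-line mode; notebooks cannot be used, and the project cannot be made public
+
+    The current workaround is to get the notebook working in the single-card environment first, and then verify in the dual-card environment that the pipeline runs end to end.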
+
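+### Plan for Next Week
+
+1. Debug the notebook
+2. Debug the dual-card environment
+
+### Current Blockers (write "None" if there are none)
+
+- Resolve the problem that ERNIE-4.5-0.3B-Paddle cannot be called correctly in the notebook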
+
+### Deliverable Progress
+
+| Deliverable | Status | Notes |
+|-------------|:------:|-------|
+| RFC document | ✅ Done | - |
+| Code implementation | 🔄 | - |
+| README | 🔄 | - |
+| Demo video / screenshots | 🔄 | - |
\ No newline at end of file
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.05.08~2026.05.28.md b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.05.08~2026.05.28.md
new file mode 100644
index 00000000..52612ccf
--- /dev/null
+++ b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.05.08~2026.05.28.md
@@ -0,0 +1,206 @@
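+### Claimant GitHub ID
+megemini
+
+### Task Information
+
+- **Advanced task number**: #15
+- **Task title**: Innovative applications based on Iluvatar CoreX hardware and ERNIE multimodal models
+- **Associated vendor**: Iluvatar CoreX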
+
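+### Work This Week
+
+1. **RFC document**
+
+   - The RFC document is complete
+   - AI Studio link: https://aistudio.baidu.com/project/edit/10221576
+
+2. **Code implementation**
+
+   - The notebook for the AI Studio project is complete
+   - A dual-card Iluvatar environment has been created
+   - The CLI script `drug_ocr_cli.py` is complete
+   - The AI Studio notebook project has been published: https://aistudio.baidu.com/projectdetail/10413884
+   > Note: because of the AI Studio environment problem described below, ERNIE-4.5-0.3B-Paddle produces garbled output in this notebook. The notebook is therefore for reference only; it can be run and debugged in a local, up-to-date Iluvatar environment.
+
+3. **README**
+
+   - See the notebook in the AI Studio project
+
+4. **Demo video / screenshots**
+
+   - To be done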
+
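+5. **Problems and solutions**
+
+   - Problem: ERNIE-4.5-0.3B-Paddle cannot be called correctly in the AI Studio notebook
+
+   Solution: it has been confirmed that the AI Studio notebook environment itself is at fault; the CLI will be used from now on
+
+   ![notebook](images/notebook.png)
+
+   - Problem: the latest FastDeploy version cannot be compiled in the dual-card Iluvatar framework development environment; see https://github.com/PaddlePaddle/FastDeploy/issues/7948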
+
+   ```shell
+    /home/aistudio/FastDeploy/custom_ops/build/fastdeploy_ops/temp.linux-x86_64-cpython-310/build/fastdeploy_ops/temp.linux-x86_64-cpython-310/iluvatar_ops/runtime/iluvatar_context.o is compiled
+    /home/aistudio/FastDeploy/custom_ops/iluvatar_ops/paged_attn.cu:199:37: error: no matching constructor for initialization of 'PageAttentionWithKVCacheArguments'
+    199 |   PageAttentionWithKVCacheArguments args{
+        |                                     ^   ~
+    200 |       static_cast<float>(scale),
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~
+    201 |       1.0,
+        |       ~~~~
+    202 |       1.0,
+        |       ~~~~
+    203 |       static_cast<float>(softcap),
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    204 |       window_left,
+        |       ~~~~~~~~~~~~
+    205 |       window_right,
+        |       ~~~~~~~~~~~~~
+    206 |       causal,
+        |       ~~~~~~~
+    207 |       use_sqrt_alibi,
+        |       ~~~~~~~~~~~~~~~
+    208 |       enable_cuda_graph,
+        |       ~~~~~~~~~~~~~~~~~~
+    209 |       false,
+        |       ~~~~~~
+    210 |       alibi_slopes_ptr,
+        |       ~~~~~~~~~~~~~~~~~
+    211 |       key_ptr,
+        |       ~~~~~~~~
+    212 |       value_ptr,
+        |       ~~~~~~~~~~
+    213 |       workspace_ptr,
+        |       ~~~~~~~~~~~~~~
+    214 |       merged_qkv,
+        |       ~~~~~~~~~~~
+    /usr/local/corex-4.3.8/include/ixinfer.h:3699:3: note: candidate constructor not viable: requires at most 27 arguments, but 28 were provided
+    3699 |   PageAttentionWithKVCacheArguments(
+        |   ^
+    3700 |       float scale = 1.f, float k_scale = 1.f, float v_scale = 1.f,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3701 |       float softcap = 0.f, int window_size_left = -1,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3702 |       int window_size_right = -1, bool is_causal = false,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3703 |       bool alibi_sqrt = false, bool enable_cuda_graph = false,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3704 |       bool is_bbhh = false, const float *alibi_slopes_ptr = nullptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3705 |       const void *key = nullptr, const void *value = nullptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3706 |       void *workspace = nullptr, bool merge_qkv = false,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3707 |       const float *rope_sin = nullptr, const float *rope_cos = nullptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3708 |       const float *qScalePtr = nullptr, const float *kScalePtr = nullptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3709 |       const float *vScalePtr = nullptr, const float *kScaleVec = nullptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3710 |       int qLength = 1, int keyStride = 0, int valueStride = 0,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3711 |       const void *aux = nullptr, const size_t rope_batch_stride = 0,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3712 |       const cuinferAttentionRopeMode_t rope_type = CUINFER_ATTEN_NORMAL)
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    /usr/local/corex-4.3.8/include/ixinfer.h:3666:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 28 were provided
+    3666 | struct PageAttentionWithKVCacheArguments {
+        |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    /usr/local/corex-4.3.8/include/ixinfer.h:3666:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 28 were provided
+    3666 | struct PageAttentionWithKVCacheArguments {
+        |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    /home/aistudio/FastDeploy/custom_ops/iluvatar_ops/mixed_fused_attn.cu:269:37: error: no matching constructor for initialization of 'PageAttentionWithKVCacheArguments'
+    269 |   PageAttentionWithKVCacheArguments args{
+        |                                     ^   ~
+    270 |       static_cast<float>(scale),
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~
+    271 |       1.0,
+        |       ~~~~
+    272 |       1.0,
+        |       ~~~~
+    273 |       static_cast<float>(softcap),
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    274 |       window_left,
+        |       ~~~~~~~~~~~~
+    275 |       window_right,
+        |       ~~~~~~~~~~~~~
+    276 |       causal,
+        |       ~~~~~~~
+    277 |       use_sqrt_alibi,
+        |       ~~~~~~~~~~~~~~~
+    278 |       enable_cuda_graph,
+        |       ~~~~~~~~~~~~~~~~~~
+    279 |       false,
+        |       ~~~~~~
+    280 |       nullptr,
+        |       ~~~~~~~~
+    281 |       decode_qkv_ptr,
+        |       ~~~~~~~~~~~~~~~
+    282 |       decode_qkv_ptr,
+        |       ~~~~~~~~~~~~~~~
+    283 |       decode_workspace_ptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~
+    284 |       true,
+        |       ~~~~~
+    /usr/local/corex-4.3.8/include/ixinfer.h:3699:3: note: candidate constructor not viable: requires at most 27 arguments, but 28 were provided
+    3699 |   PageAttentionWithKVCacheArguments(
+        |   ^
+    3700 |       float scale = 1.f, float k_scale = 1.f, float v_scale = 1.f,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3701 |       float softcap = 0.f, int window_size_left = -1,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3702 |       int window_size_right = -1, bool is_causal = false,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3703 |       bool alibi_sqrt = false, bool enable_cuda_graph = false,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3704 |       bool is_bbhh = false, const float *alibi_slopes_ptr = nullptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3705 |       const void *key = nullptr, const void *value = nullptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3706 |       void *workspace = nullptr, bool merge_qkv = false,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3707 |       const float *rope_sin = nullptr, const float *rope_cos = nullptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3708 |       const float *qScalePtr = nullptr, const float *kScalePtr = nullptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3709 |       const float *vScalePtr = nullptr, const float *kScaleVec = nullptr,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3710 |       int qLength = 1, int keyStride = 0, int valueStride = 0,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3711 |       const void *aux = nullptr, const size_t rope_batch_stride = 0,
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    3712 |       const cuinferAttentionRopeMode_t rope_type = CUINFER_ATTEN_NORMAL)
+        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    /usr/local/corex-4.3.8/include/ixinfer.h:3666:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 28 were provided
+    3666 | struct PageAttentionWithKVCacheArguments {
+        |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    /usr/local/corex-4.3.8/include/ixinfer.h:3666:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 28 were provided
+    3666 | struct PageAttentionWithKVCacheArguments {
+        |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    1 error generated when compiling for ivcore11.
+    /home/aistudio/FastDeploy/custom_ops/iluvatar_ops/paged_attn.cu compile failed, command '/usr/local/corex/bin/clang++' failed with exit code 1
+    /home/aistudio/FastDeploy/custom_ops/build/fastdeploy_ops/temp.linux-x86_64-cpython-310/build/fastdeploy_ops/temp.linux-x86_64-cpython-310/iluvatar_ops/paged_attn.cu.o is compiled
+    1 error generated when compiling for ivcore11.
+    /home/aistudio/FastDeploy/custom_ops/iluvatar_ops/mixed_fused_attn.cu compile failed, command '/usr/local/corex/bin/clang++' failed with exit code 1
+
+   ```
+
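+   Solution: rebuild from commit `172ab6020dbe1ccb730f09df74764d6ea388d88f`. As the log above shows, the `PageAttentionWithKVCacheArguments` constructor in the `ixinfer.h` shipped with corex-4.3.8 accepts at most 27 arguments, while the latest FastDeploy source passes 28.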
+
+### Plan for Next Week
+
+1. Debug the dual-card environment
+
+### Current Blockers (write "None" if there are none)
+
+- Rebuild FastDeploy
+
+### Deliverable Progress
+
+| Deliverable | Status | Notes |
+|-------------|:------:|-------|
+| RFC document | ✅ Done | - |
+| Code implementation | 🔄 | - |
+| README | 🔄 | - |
+| Demo video / screenshots | 🔄 | - |
\ No newline at end of file
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.05.28~2026.06.11.md b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.05.28~2026.06.11.md
new file mode 100644
index 00000000..f3d3d44a
--- /dev/null
+++ b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.05.28~2026.06.11.md
@@ -0,0 +1,72 @@
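+### Claimant GitHub ID
+megemini
+
+### Task Information
+
+- **Advanced task number**: #15
+- **Task title**: Innovative applications based on Iluvatar CoreX hardware and ERNIE multimodal models
+- **Associated vendor**: Iluvatar CoreX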
+
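+### Work This Week
+
+1. **RFC document**
+
+   - The RFC document is complete
+   - AI Studio link: https://aistudio.baidu.com/project/edit/10221576
+
+2. **Code implementation**
+
+   - The notebook for the AI Studio project is complete
+   - A dual-card Iluvatar environment has been created
+   - The CLI script `drug_ocr_cli.py` is complete
+      The latest version of the script has two major improvements:
+      1. Added a function that patches the aistudio sdk. PaddleSpeech and FastDeploy depend on different versions of the aistudio sdk, so the script first modifies the source code to make the two consistent.
+      2. Added a function that adjusts the volume of the audio file produced by TTS synthesis. The audio codec library in the Iluvatar framework development environment seems to have a compatibility problem with PaddleSpeech, which causes the synthesized audio to be cut off; after discussion with the PaddleSpeech developers, we decided to add this post-processing function.
+   - The AI Studio notebook project has been published: https://aistudio.baidu.com/projectdetail/10413884
+   > Note: because of the AI Studio environment problem described below, ERNIE-4.5-0.3B-Paddle produces garbled output in this notebook. The notebook is therefore for reference only; it can be run and debugged in a local, up-to-date Iluvatar environment.
+   - Verified in the dual-card Iluvatar environment that, with `tensor_parallel_size` set to `2`, the `ERNIE-4.5-VL-28B-A3B-Thinking` model can be loaded and used; each card uses about `20G` of GPU memory, `40G` in total.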
+
+3. **README**
+
+   - See the notebook in the AI Studio project
+
+4. **Demo video / screenshots**
+
+    ![ernie28b](images/ernie_28b.png)
+
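+5. **Problems and solutions**
+
+   - Problem: the Iluvatar framework development environment is extremely unstable, so the optimized script cannot be verified continuously.
+
+   The Iluvatar framework development environment appears to be shared; the so-called start and stop actions only control whether a user can connect to the server over ssh. As a result, the following happen frequently:
+
+   - The connection is dropped suddenly and I am kicked out of the environment
+   - A newly connected environment already has its GPU memory fully occupied
+   - A run reports that the disk is out of space, although the aistudio working directory holds only 76G of files (including model files)
+   - A run reports that the model cannot be found, although the model is fine; running again may simply work
+   - Model loading is slow; loading `ERNIE-4.5-VL-28B-A3B-Thinking` sometimes takes nearly 10 minutes
+   - After the model loads, token output stalls partway; on the next run it may stall somewhere else
+   - The ixsmi command sometimes does not reflect the actual GPU memory usage; for example, it shows only 64MB in use after the model has fully loaded
+
+   These are only some of the environment problems from the past two weeks. As a result, since debugging started last week, the script has run to completion only `2` times; every other attempt was interrupted by one of these issues.
+
+   Current status: the script itself should be fine, but it still needs at least one more complete run to capture a full log.
+
+   `drug_ernie03.log` in the weekly report directory is the log of a complete run with `ERNIE-4.5-0.3B-Paddle`. `ERNIE-4.5-VL-28B-A3B-Thinking` also completed one full run, but at the time I had not noticed that this model emits its thinking section first, which left too few tokens for the final output. The latest `drug_ocr_cli.py` has been changed accordingly, but in the more than a full week since then it has not completed another full run.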
+
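+### Plan for Next Week
+
+1. Debug the dual-card environment
+
+### Current Blockers (write "None" if there are none)
+
+- The AI Studio environment is extremely unstable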
+
+### Deliverable Progress
+
+| Deliverable | Status | Notes |
+|-------------|:------:|-------|
+| RFC document | ✅ Done | - |
+| Code implementation | ✅ Done | - |
+| README | ✅ Done | - |
+| Demo video / screenshots | ✅ Done | - |
\ No newline at end of file
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/drug_ernie03.log b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/drug_ernie03.log
new file mode 100644
index 00000000..e80ec62b
--- /dev/null
+++ b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/drug_ernie03.log
@@ -0,0 +1,822 @@
+aistudio@ssh-942478-10243234-79b6d74556-jggs6:~$ python drug_ocr_cli.py --image resource/test.jpg --no-split --ocr-tokens 200 --llm-tokens 100
+File already patched.
+20:24:01 [drug_ocr] INFO: ============================================================
+20:24:01 [drug_ocr] INFO: Drug OCR pipeline started (subprocess mode)
+20:24:01 [drug_ocr] INFO:   Image path: resource/test.jpg
+20:24:01 [drug_ocr] INFO:   Image split: False (num_splits=4, overlap=0.10)
+20:24:01 [drug_ocr] INFO: ============================================================
+20:24:01 [drug_ocr] INFO: [OCR Step] Loading image...
+20:24:01 [drug_ocr] INFO: [OCR Step] Image loaded, size: (2014, 2881)
+20:24:01 [drug_ocr] INFO: [OCR Step] Skipping image split
+20:24:03 [drug_ocr] INFO: [OCR Step] Starting OCR subprocess...
+I0601 20:24:06.824122  7196 init.cc:254] ENV [CUSTOM_DEVICE_ROOT]=/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device
+I0601 20:24:06.824187  7196 init.cc:162] Try loading custom device libs from: [/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device]
+I0601 20:24:06.954595  7196 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device/libpaddle-iluvatar-gpu.so
+I0601 20:24:06.954654  7196 custom_device_load.cc:58] Skipped lib [/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device/libpaddle-iluvatar-gpu.so]: no custom engine Plugin symbol in this lib.
+I0601 20:24:06.964550  7196 custom_kernel.cc:68] Succeed in loading 913 custom kernel(s) from loaded lib(s), will be used like native ones.
+I0601 20:24:06.964919  7196 init.cc:174] Finished in LoadCustomDevice with libs_path: [/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device]
+I0601 20:24:06.964968  7196 init.cc:260] CustomDevice: iluvatar_gpu, visible devices count: 2
+WARNING  2026-06-01 20:24:19,431 7196  transfer_manager.py[line:30] cupy not available, falling back to synchronous transfers
+[OCR Worker] Loading OCR model (PaddleOCR-VL)...
+WARNING  2026-06-01 20:24:23,436 7196  common.py[line:63] Model path 'baidu/PaddleOCR-VL-1.5' is not a local directory or file, will try to download from huggingface hub.
+WARNING  2026-06-01 20:24:26,471 7196  common.py[line:73] Cannot reach huggingface.co. If the model is stored locally, please check the path 'baidu/PaddleOCR-VL-1.5'. Otherwise check network/proxy settings (DOWNLOAD_SOURCE=huggingface).
+INFO     2026-06-01 20:24:26,764 7196  log.py[line:76] Downloading Model from remote to directory: /home/aistudio/PaddlePaddle/PaddleOCR-VL-1.5
+INFO     2026-06-01 20:24:26,981 7196  log.py[line:76] Got 18 files, start to download ...
+Processing 18 items:   0%|                                                                                                                          | 0.00/18.0 [00:00<?, ?it/s]INFO     2026-06-01 20:24:52,732 7196  log.py[line:76] 
+File added_tokens.json already in cache with identical hash, skip downloading!
+Processing 18 items:   6%|██████▍                                                                                                            | 1.00/18.0 [00:00<00:02, 7.82it/s]INFO     2026-06-01 20:24:52,736 7196  log.py[line:76] 
+File README.md already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:24:52,739 7196  log.py[line:76] 
+File LICENSE already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:24:52,827 7196  log.py[line:76] 
+File chat_template.jinja already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:24:52,923 7196  log.py[line:76] 
+File generation_config.json already in cache with identical hash, skip downloading!
+Processing 18 items:  28%|███████████████████████████████▉                                                                                   | 5.00/18.0 [00:00<00:00, 16.8it/s]INFO     2026-06-01 20:24:53,023 7196  log.py[line:76] 
+File image_processing_paddleocr_vl.py already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:24:53,123 7196  log.py[line:76] 
+File inference.yml already in cache with identical hash, skip downloading!
+Processing 18 items:  39%|████████████████████████████████████████████▋                                                                      | 7.00/18.0 [00:00<00:00, 13.3it/s]INFO     2026-06-01 20:24:53,213 7196  log.py[line:76] 
+File model.safetensors already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:24:53,332 7196  log.py[line:76] 
+File modeling_paddleocr_vl.py already in cache with identical hash, skip downloading!
+Processing 18 items:  50%|█████████████████████████████████████████████████████████▌                                                         | 9.00/18.0 [00:00<00:00, 11.8it/s]INFO     2026-06-01 20:24:53,421 7196  log.py[line:76] 
+File preprocessor_config.json already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:24:53,529 7196  log.py[line:76] 
+File processing_paddleocr_vl.py already in cache with identical hash, skip downloading!
+Processing 18 items:  61%|██████████████████████████████████████████████████████████████████████▎                                            | 11.0/18.0 [00:00<00:00, 11.2it/s]INFO     2026-06-01 20:24:53,623 7196  log.py[line:76] 
+File processor_config.json already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:24:53,715 7196  log.py[line:76] 
+File special_tokens_map.json already in cache with identical hash, skip downloading!
+Processing 18 items:  72%|███████████████████████████████████████████████████████████████████████████████████                                | 13.0/18.0 [00:01<00:00, 11.0it/s]INFO     2026-06-01 20:24:53,813 7196  log.py[line:76] 
+File tokenizer.json already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:24:53,852 7196  log.py[line:76] 
+File configuration_paddleocr_vl.py already in cache with identical hash, skip downloading!
+Processing 18 items:  83%|███████████████████████████████████████████████████████████████████████████████████████████████▊                   | 15.0/18.0 [00:01<00:00, 11.9it/s]INFO     2026-06-01 20:24:53,906 7196  log.py[line:76] 
+File tokenizer.model already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:24:53,956 7196  log.py[line:76] 
+File tokenizer_config.json already in cache with identical hash, skip downloading!
+Processing 18 items:  94%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▌      | 17.0/18.0 [00:01<00:00, 13.6it/s]INFO     2026-06-01 20:24:54,339 7196  log.py[line:76] 
+File config.json already in cache with identical hash, skip downloading!
+Processing 18 items: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18.0/18.0 [00:01<00:00, 10.4it/s]
+INFO     2026-06-01 20:24:54,340 7196  log.py[line:76] Download model 'PaddlePaddle/PaddleOCR-VL-1.5' successfully.
+INFO     2026-06-01 20:24:54,340 7196  log.py[line:76] Target directory already exists, skipping creation.
+INFO     2026-06-01 20:24:54,358 7196  args_utils.py[line:658] Parameter `engine_worker_queue_port` is not specified, found available ports for possible use: [46509]
+INFO     2026-06-01 20:24:54,360 7196  args_utils.py[line:658] Parameter `cache_queue_port` is not specified, found available ports for possible use: [59850]
+INFO     2026-06-01 20:24:54,361 7196  args_utils.py[line:658] Parameter `rdma_comm_ports` is not specified, found available ports for possible use: [25285]
+INFO     2026-06-01 20:24:54,362 7196  args_utils.py[line:658] Parameter `pd_comm_port` is not specified, found available ports for possible use: [60151]
+INFO     2026-06-01 20:24:54,363 7196  download.py[line:146] Using download source: huggingface
+INFO     2026-06-01 20:24:54,363 7196  configuration_utils.py[line:1208] Loading configuration file /home/aistudio/PaddlePaddle/PaddleOCR-VL-1.5/config.json
+WARNING  2026-06-01 20:24:54,368 7196  configuration_utils.py[line:1239] You are using a model of type paddleocr_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
+WARNING  2026-06-01 20:24:54,369 7196  configuration_utils.py[line:1239] You are using a model of type paddleocr_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
+INFO     2026-06-01 20:25:00,291 7196  flash_attn_backend.py[line:105] Only support CUDA version flash attention.
+/home/aistudio/.local/lib/python3.10/site-packages/paddle/compat/proxy.py:440: UserWarning: Extending PyTorch compat scope, previous scope: {'triton'}, new scope: {'flash_mla'}.
+/home/aistudio/.local/lib/python3.10/site-packages/fastdeploy/model_executor/graph_optimization/utils.py:21: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
+WARNING  2026-06-01 20:25:01,688 7196  moe.py[line:41] import noaux_tc Failed!
+INFO:legacy.config:The model format is Hugging Face Torch
+INFO:legacy.config:use_sequence_parallel_moe: False
+INFO:legacy.config:cudagraph sizes specified by model runner [1, 2, 4, 8, 8] is overridden by config [8, 1, 2, 4]
+INFO:legacy.config:Doing profile, the total_block_num:528
+INFO:legacy.config:Auto-setting num_max_dispatch_tokens_per_rank from 128 to 8 (max_num_seqs=8).
+INFO:legacy.config:register_info: {'role': 'mixed', 'host_ip': '10.234.11.170', 'port': None, 'metrics_port': None, 'connector_port': 60151, 'rdma_ports': [25285], 'engine_worker_queue_port': 46509, 'device_ids': ['0'], 'transfer_protocol': ['ipc', 'rdma'], 'tp_size': 1, 'is_paused': False, 'version': 'init', 'connected_decodes': []}
+INFO:legacy.config:=================== Configuration Information ===============
+INFO:legacy.config:Model Configuration Information :
+INFO:legacy.config:model               :      /home/aistudio/PaddlePaddle/PaddleOCR-VL-1.5
+INFO:legacy.config:is_quantized        :      False
+INFO:legacy.config:is_moe_quantized    :      False
+INFO:legacy.config:max_model_len       :      8192
+INFO:legacy.config:dtype               :      bfloat16
+INFO:legacy.config:enable_logprob      :      False
+INFO:legacy.config:max_logprobs        :      20
+INFO:legacy.config:logprobs_mode       :      raw_logprobs
+INFO:legacy.config:redundant_experts_num:      0
+INFO:legacy.config:seed                :      0
+INFO:legacy.config:quantization        :      wint8
+INFO:legacy.config:pad_token_id        :      0
+INFO:legacy.config:eos_tokens_lens     :      2
+INFO:legacy.config:lm_head_fp32        :      False
+INFO:legacy.config:moe_gate_fp32       :      False
+INFO:legacy.config:model_format        :      torch
+INFO:legacy.config:runner              :      auto
+INFO:legacy.config:convert             :      auto
+INFO:legacy.config:pooler_config       :      None
+INFO:legacy.config:override_pooler_config:      None
+INFO:legacy.config:revision            :      master
+INFO:legacy.config:prefix_layer_name   :      layers
+INFO:legacy.config:kv_cache_quant_scale_path:      /home/aistudio/PaddlePaddle/PaddleOCR-VL-1.5/kv_cache_scale.json
+INFO:legacy.config:enable_entropy      :      False
+INFO:legacy.config:model_impl          :      auto
+INFO:legacy.config:version             :      init
+INFO:legacy.config:partial_rotary_factor:      1.0
+INFO:legacy.config:num_nextn_predict_layers:      0
+INFO:legacy.config:mm_max_tokens_per_item:      None
+INFO:legacy.config:pretrained_config   :      PretrainedConfig {
+  "architectures": [
+    "PaddleOCRVLForConditionalGeneration"
+  ],
+  "attention_probs_dropout_prob": 0.0,
+  "auto_map": {
+    "AutoConfig": "configuration_paddleocr_vl.PaddleOCRVLConfig",
+    "AutoModel": "modeling_paddleocr_vl.PaddleOCRVLForConditionalGeneration",
+    "AutoModelForCausalLM": "modeling_paddleocr_vl.PaddleOCRVLForConditionalGeneration"
+  },
+  "compression_ratio": 1.0,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_dropout_prob": 0.0,
+  "hidden_size": 1024,
+  "ignored_index": -100,
+  "image_token_id": 100295,
+  "intermediate_size": 3072,
+  "max_position_embeddings": 131072,
+  "max_sequence_length": null,
+  "num_attention_heads": 16,
+  "num_hidden_layers": 18,
+  "num_key_value_heads": 2,
+  "pad_token_id": 0,
+  "paddleformers_version": "1.1.1",
+  "rms_norm_eps": 1e-05,
+  "rope_is_neox_style": true,
+  "rope_scaling": {
+    "mrope_section": [
+      16,
+      24,
+      24
+    ],
+    "rope_type": "default",
+    "type": "default"
+  },
+  "rope_theta": 500000,
+  "sliding_window": null,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "use_3d_rope": true,
+  "use_bias": false,
+  "use_cache": false,
+  "use_flash_attention": false,
+  "video_token_id": 101307,
+  "vision_config": {
+    "architectures": [
+      "PaddleOCRVisionModel"
+    ],
+    "attention_dropout": 0.0,
+    "auto_map": {
+      "AutoConfig": "configuration_paddleocr_vl.PaddleOCRVLConfig",
+      "AutoModel": "modeling_paddleocr_vl.PaddleOCRVisionModel"
+    },
+    "hidden_act": "gelu_pytorch_tanh",
+    "hidden_size": 1152,
+    "image_size": 384,
+    "intermediate_size": 4304,
+    "layer_norm_eps": 1e-06,
+    "model_type": "paddleocr_vl",
+    "num_attention_heads": 16,
+    "num_channels": 3,
+    "num_hidden_layers": 27,
+    "pad_token_id": 0,
+    "patch_size": 14,
+    "spatial_merge_size": 2,
+    "temporal_patch_size": 2,
+    "tokens_per_second": 2,
+    "torch_dtype": "bfloat16"
+  },
+  "vision_end_token_id": 101306,
+  "vision_start_token_id": 101305,
+  "vocab_size": 103424,
+  "weight_share_add_bias": true
+}
+
+INFO:legacy.config:architectures       :      ['PaddleOCRVLForConditionalGeneration']
+INFO:legacy.config:attention_probs_dropout_prob:      0.0
+INFO:legacy.config:auto_map            :      {'AutoConfig': 'configuration_paddleocr_vl.PaddleOCRVLConfig', 'AutoModel': 'modeling_paddleocr_vl.PaddleOCRVLForConditionalGeneration', 'AutoModelForCausalLM': 'modeling_paddleocr_vl.PaddleOCRVLForConditionalGeneration'}
+INFO:legacy.config:compression_ratio   :      1.0
+INFO:legacy.config:head_dim            :      128
+INFO:legacy.config:hidden_act          :      silu
+INFO:legacy.config:hidden_dropout_prob :      0.0
+INFO:legacy.config:hidden_size         :      1024
+INFO:legacy.config:ignored_index       :      -100
+INFO:legacy.config:image_token_id      :      100295
+INFO:legacy.config:intermediate_size   :      3072
+INFO:legacy.config:max_position_embeddings:      131072
+INFO:legacy.config:max_sequence_length :      None
+INFO:legacy.config:model_type          :      paddleocr_vl
+INFO:legacy.config:num_attention_heads :      16
+INFO:legacy.config:num_hidden_layers   :      18
+INFO:legacy.config:num_key_value_heads :      2
+INFO:legacy.config:rms_norm_eps        :      1e-05
+INFO:legacy.config:rope_scaling        :      {'mrope_section': [16, 24, 24], 'rope_type': 'default', 'type': 'default'}
+INFO:legacy.config:rope_theta          :      500000
+INFO:legacy.config:sliding_window      :      None
+INFO:legacy.config:tie_word_embeddings :      False
+INFO:legacy.config:torch_dtype         :      bfloat16
+INFO:legacy.config:transformers_version:      4.55.0
+INFO:legacy.config:use_bias            :      False
+INFO:legacy.config:use_cache           :      False
+INFO:legacy.config:use_flash_attention :      False
+INFO:legacy.config:video_token_id      :      101307
+INFO:legacy.config:vision_config       :      PretrainedConfig {
+  "architectures": [
+    "PaddleOCRVisionModel"
+  ],
+  "attention_dropout": 0.0,
+  "auto_map": {
+    "AutoConfig": "configuration_paddleocr_vl.PaddleOCRVLConfig",
+    "AutoModel": "modeling_paddleocr_vl.PaddleOCRVisionModel"
+  },
+  "hidden_act": "gelu_pytorch_tanh",
+  "hidden_size": 1152,
+  "image_size": 384,
+  "intermediate_size": 4304,
+  "layer_norm_eps": 1e-06,
+  "num_attention_heads": 16,
+  "num_channels": 3,
+  "num_hidden_layers": 27,
+  "pad_token_id": 0,
+  "paddleformers_version": "1.1.1",
+  "patch_size": 14,
+  "spatial_merge_size": 2,
+  "temporal_patch_size": 2,
+  "tie_word_embeddings": true,
+  "tokens_per_second": 2,
+  "torch_dtype": "bfloat16"
+}
+
+INFO:legacy.config:vision_start_token_id:      101305
+INFO:legacy.config:vision_end_token_id :      101306
+INFO:legacy.config:vocab_size          :      103424
+INFO:legacy.config:weight_share_add_bias:      True
+INFO:legacy.config:use_3d_rope         :      True
+INFO:legacy.config:rope_is_neox_style  :      True
+INFO:legacy.config:top_p               :      1.0
+INFO:legacy.config:temperature         :      1.0
+INFO:legacy.config:penalty_score       :      1.0
+INFO:legacy.config:frequency_score     :      0.0
+INFO:legacy.config:presence_score      :      0.0
+INFO:legacy.config:min_length          :      1
+INFO:legacy.config:start_layer_index   :      0
+INFO:legacy.config:moe_num_shared_experts:      0
+INFO:legacy.config:moe_layer_start_index:      0
+INFO:legacy.config:num_max_dispatch_tokens_per_rank:      8
+INFO:legacy.config:moe_use_aux_free    :      False
+INFO:legacy.config:initializer_range   :      0.02
+INFO:legacy.config:quantization_config :      None
+INFO:legacy.config:moe_num_experts     :      None
+INFO:legacy.config:moe_layer_end_index :      None
+INFO:legacy.config:rope_3d             :      True
+INFO:legacy.config:freq_allocation     :      16
+INFO:legacy.config:ori_vocab_size      :      103424
+INFO:legacy.config:think_start_id      :      -1
+INFO:legacy.config:think_end_id        :      -1
+INFO:legacy.config:im_patch_id         :      -1
+INFO:legacy.config:line_break_id       :      -1
+INFO:legacy.config:think_truncate_prompt_ids:      [-1]
+INFO:legacy.config:reasoning_allowed_token_ids:      []
+INFO:legacy.config:is_unified_ckpt     :      True
+INFO:legacy.config:runner_type         :      generate
+INFO:legacy.config:convert_type        :      none
+INFO:legacy.config:is_reasoning_model  :      False
+INFO:legacy.config:enable_mm           :      True
+INFO:legacy.config:supported_tasks     :      []
+INFO:legacy.config:_model_info         :      ModelInfo(architecture='PaddleOCRVLForConditionalGeneration', category=<ModelCategory.MULTIMODAL: 2>, is_text_generation=False, is_multimodal=True, is_reasoning=False, is_pooling=False, module_path='paddleocr_vl.paddleocr_vl', default_pooling_type='LAST')
+INFO:legacy.config:_architecture       :      PaddleOCRVLForConditionalGeneration
+INFO:legacy.config:mla_use_absorb      :      False
+INFO:legacy.config:max_stop_seqs_num   :      5
+INFO:legacy.config:stop_seqs_max_len   :      8
+INFO:legacy.config:model_config        :      {'architectures': ['PaddleOCRVLForConditionalGeneration'], 'attention_probs_dropout_prob': 0.0, 'auto_map': {'AutoConfig': 'configuration_paddleocr_vl.PaddleOCRVLConfig', 'AutoModel': 'modeling_paddleocr_vl.PaddleOCRVLForConditionalGeneration', 'AutoModelForCausalLM': 'modeling_paddleocr_vl.PaddleOCRVLForConditionalGeneration'}, 'compression_ratio': 1.0, 'head_dim': 128, 'hidden_act': 'silu', 'hidden_dropout_prob': 0.0, 'hidden_size': 1024, 'ignored_index': -100, 'image_token_id': 100295, 'intermediate_size': 3072, 'max_position_embeddings': 131072, 'max_sequence_length': None, 'model_type': 'paddleocr_vl', 'num_attention_heads': 16, 'num_hidden_layers': 18, 'num_key_value_heads': 2, 'pad_token_id': 0, 'rms_norm_eps': 1e-05, 'rope_scaling': {'mrope_section': [16, 24, 24], 'rope_type': 'default', 'type': 'default'}, 'rope_theta': 500000, 'sliding_window': None, 'tie_word_embeddings': False, 'torch_dtype': 'bfloat16', 'transformers_version': '4.55.0', 'use_bias': False, 'use_cache': False, 'use_flash_attention': False, 'video_token_id': 101307, 'vision_config': {'architectures': ['PaddleOCRVisionModel'], 'attention_dropout': 0.0, 'auto_map': {'AutoConfig': 'configuration_paddleocr_vl.PaddleOCRVLConfig', 'AutoModel': 'modeling_paddleocr_vl.PaddleOCRVisionModel'}, 'hidden_act': 'gelu_pytorch_tanh', 'hidden_size': 1152, 'image_size': 384, 'intermediate_size': 4304, 'layer_norm_eps': 1e-06, 'model_type': 'paddleocr_vl', 'num_attention_heads': 16, 'num_channels': 3, 'num_hidden_layers': 27, 'pad_token_id': 0, 'patch_size': 14, 'spatial_merge_size': 2, 'temporal_patch_size': 2, 'tokens_per_second': 2, 'torch_dtype': 'bfloat16'}, 'vision_start_token_id': 101305, 'vision_end_token_id': 101306, 'vocab_size': 103424, 'weight_share_add_bias': True, 'use_3d_rope': True, 'rope_is_neox_style': True}
+INFO:legacy.config:moe_phase           :      <fastdeploy.config.MoEPhase object at 0x7f26d8cda860>
+INFO:legacy.config:=============================================================
+INFO:legacy.config:Cache Configuration Information :
+INFO:legacy.config:block_size          :      16
+INFO:legacy.config:gpu_memory_utilization:      0.9
+INFO:legacy.config:num_gpu_blocks_override:      None
+INFO:legacy.config:kv_cache_ratio      :      0.75
+INFO:legacy.config:enc_dec_block_num   :      2
+INFO:legacy.config:prealloc_dec_block_slot_num_threshold:      12
+INFO:legacy.config:cache_dtype         :      bfloat16
+INFO:legacy.config:model_cfg           :      <fastdeploy.config.ModelConfig object at 0x7f26f7445990>
+INFO:legacy.config:enable_chunked_prefill:      False
+INFO:legacy.config:rdma_comm_ports     :      [25285]
+INFO:legacy.config:local_rdma_comm_ports:      [25285]
+INFO:legacy.config:cache_transfer_protocol:      ipc,rdma
+INFO:legacy.config:pd_comm_port        :      [60151]
+INFO:legacy.config:local_pd_comm_port  :      60151
+INFO:legacy.config:enable_prefix_caching:      False
+INFO:legacy.config:enable_ssd_cache    :      False
+INFO:legacy.config:cache_queue_port    :      [59850]
+INFO:legacy.config:local_cache_queue_port:      59850
+INFO:legacy.config:swap_space          :      None
+INFO:legacy.config:max_encoder_cache   :      0
+INFO:legacy.config:max_processor_cache :      -1
+INFO:legacy.config:enable_output_caching:      False
+INFO:legacy.config:disable_chunked_mm_input:      False
+INFO:legacy.config:kvcache_storage_backend:      None
+INFO:legacy.config:write_policy        :      write_through
+INFO:legacy.config:write_through_threshold:      2
+INFO:legacy.config:num_cpu_blocks      :      0
+INFO:legacy.config:use_mla_cache       :      False
+INFO:legacy.config:head_num            :      2
+INFO:legacy.config:head_dim            :      128
+INFO:legacy.config:byte_size           :      2
+INFO:legacy.config:kv_factor           :      2
+INFO:legacy.config:bytes_per_token_per_layer:      1024
+INFO:legacy.config:bytes_per_block     :      294912
+INFO:legacy.config:max_block_num_per_seq:      512
+INFO:legacy.config:dec_token_num       :      32
+INFO:legacy.config:total_block_num     :      528
+INFO:legacy.config:prefill_kvcache_block_num:      528
+INFO:legacy.config:=============================================================
+INFO:legacy.config:LocalScheduler Configuration Information :
+INFO:legacy.config:max_size            :      -1
+INFO:legacy.config:ttl                 :      900
+INFO:legacy.config:max_model_len       :      8192
+INFO:legacy.config:enable_chunked_prefill:      False
+INFO:legacy.config:max_num_partial_prefills:      1
+INFO:legacy.config:max_long_partial_prefills:      1
+INFO:legacy.config:long_prefill_token_threshold:      327
+INFO:legacy.config:=============================================================
+INFO:legacy.config:Parallel Configuration Information :
+INFO:legacy.config:sequence_parallel   :      False
+INFO:legacy.config:use_ep              :      False
+INFO:legacy.config:msg_queue_id        :      1
+INFO:legacy.config:tensor_parallel_rank:      0
+INFO:legacy.config:tensor_parallel_size:      1
+INFO:legacy.config:expert_parallel_rank:      0
+INFO:legacy.config:expert_parallel_size:      1
+INFO:legacy.config:data_parallel_rank  :      0
+INFO:legacy.config:data_parallel_size  :      1
+INFO:legacy.config:enable_expert_parallel:      False
+INFO:legacy.config:enable_chunked_moe  :      False
+INFO:legacy.config:chunked_moe_size    :      256
+INFO:legacy.config:local_data_parallel_id:      0
+INFO:legacy.config:engine_worker_queue_port:      [46509]
+INFO:legacy.config:local_engine_worker_queue_port:      46509
+INFO:legacy.config:device_ids          :      0
+INFO:legacy.config:first_token_id      :      1
+INFO:legacy.config:engine_pid          :      None
+INFO:legacy.config:do_profile          :      False
+INFO:legacy.config:use_internode_ll_two_stage:      False
+INFO:legacy.config:disable_sequence_parallel_moe:      False
+INFO:legacy.config:shutdown_comm_group_if_worker_idle:      True
+INFO:legacy.config:ep_prefill_use_worst_num_tokens:      False
+INFO:legacy.config:pod_ip              :      None
+INFO:legacy.config:disable_custom_all_reduce:      False
+INFO:legacy.config:enable_flashinfer_allreduce_fusion:      False
+INFO:legacy.config:pd_disaggregation_mode:      None
+INFO:legacy.config:prefill_one_step_stop:      False
+INFO:legacy.config:use_sequence_parallel_moe:      False
+INFO:legacy.config:=============================================================
+INFO:legacy.config:speculative_config  :      {"method_list": ["ngram", "mtp", "naive", "suffix"], "mtp_strategy_list": ["default", "with_ngram"], "mtp_strategy": "default", "num_speculative_tokens": 1, "num_model_steps": 1, "max_candidate_len": 5, "verify_window": 2, "max_ngram_size": 5, "min_ngram_size": 2, "suffix_decoding_max_tree_depth": 64, "suffix_decoding_max_cached_requests": -1, "suffix_decoding_max_spec_factor": 1.0, "suffix_decoding_min_token_prob": 0.1, "model": "/home/aistudio/PaddlePaddle/PaddleOCR-VL-1.5", "quantization": "wint8", "num_gpu_block_expand_ratio": 1.0, "model_type": "main", "benchmark_mode": false, "enf_gen_phase_tag": false, "enable_draft_logprob": false, "verify_strategy": "target_match", "accept_policy": "normal", "model_config": {}, "num_extra_cache_layer": 0}
+INFO:legacy.config:eplb_config         :      <fastdeploy.config.EPLBConfig object at 0x7f26d8cda6e0>
+INFO:legacy.config:device_config       :      None
+INFO:legacy.config:load_config         :      {"load_choices": "default_v1", "is_pre_sharded": false, "dynamic_load_weight": false, "load_strategy": "normal", "rsync_config": null, "model_loader_extra_config": null}
+INFO:legacy.config:quant_config        :      None
+INFO:legacy.config:graph_opt_config    :      {"graph_opt_level": 0, "sot_warmup_sizes": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16, 32, 64, 128], "use_cudagraph": false, "cudagraph_capture_sizes": [8, 4, 2, 1], "flag_cudagraph_capture_sizes_initlized": true, "cudagraph_capture_sizes_prefill": [512, 512, 480, 448, 416, 384, 352, 320, 288, 256, 240, 224, 208, 192, 176, 160, 144, 128, 120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32, 24, 16, 8, 4, 2, 1], "cudagraph_num_of_warmups": 2, "cudagraph_copy_inputs": false, "cudagraph_splitting_ops": [], "cudagraph_only_prefill": false, "full_cuda_graph": true, "max_capture_size": 8, "real_shape_to_captured_size": {"4": 4, "5": 8, "6": 8, "7": 8, "2": 2, "3": 4, "1": 1, "0": 0, "8": 8}, "real_bsz_to_captured_size": {}, "use_unique_memory_pool": true, "draft_model_use_cudagraph": true, "max_capture_shape_prefill": 512, "max_capture_size_prefill": 512, "real_shape_to_captured_size_prefill": {"480": 480, "481": 512, "482": 512, "483": 512, "484": 512, "485": 512, "486": 512, "487": 512, "488": 512, "489": 512, "490": 512, "491": 512, "492": 512, "493": 512, "494": 512, "495": 512, "496": 512, "497": 512, "498": 512, "499": 512, "500": 512, "501": 512, "502": 512, "503": 512, "504": 512, "505": 512, "506": 512, "507": 512, "508": 512, "509": 512, "510": 512, "511": 512, "448": 448, "449": 480, "450": 480, "451": 480, "452": 480, "453": 480, "454": 480, "455": 480, "456": 480, "457": 480, "458": 480, "459": 480, "460": 480, "461": 480, "462": 480, "463": 480, "464": 480, "465": 480, "466": 480, "467": 480, "468": 480, "469": 480, "470": 480, "471": 480, "472": 480, "473": 480, "474": 480, "475": 480, "476": 480, "477": 480, "478": 480, "479": 480, "416": 416, "417": 448, "418": 448, "419": 448, "420": 448, "421": 448, "422": 448, "423": 448, "424": 448, "425": 448, "426": 448, "427": 448, "428": 448, "429": 448, "430": 448, "431": 448, "432": 448, "433": 448, "434": 448, "435": 448, "436": 448, "437": 448, "438": 448, "439": 448, 
"440": 448, "441": 448, "442": 448, "443": 448, "444": 448, "445": 448, "446": 448, "447": 448, "384": 384, "385": 416, "386": 416, "387": 416, "388": 416, "389": 416, "390": 416, "391": 416, "392": 416, "393": 416, "394": 416, "395": 416, "396": 416, "397": 416, "398": 416, "399": 416, "400": 416, "401": 416, "402": 416, "403": 416, "404": 416, "405": 416, "406": 416, "407": 416, "408": 416, "409": 416, "410": 416, "411": 416, "412": 416, "413": 416, "414": 416, "415": 416, "352": 352, "353": 384, "354": 384, "355": 384, "356": 384, "357": 384, "358": 384, "359": 384, "360": 384, "361": 384, "362": 384, "363": 384, "364": 384, "365": 384, "366": 384, "367": 384, "368": 384, "369": 384, "370": 384, "371": 384, "372": 384, "373": 384, "374": 384, "375": 384, "376": 384, "377": 384, "378": 384, "379": 384, "380": 384, "381": 384, "382": 384, "383": 384, "320": 320, "321": 352, "322": 352, "323": 352, "324": 352, "325": 352, "326": 352, "327": 352, "328": 352, "329": 352, "330": 352, "331": 352, "332": 352, "333": 352, "334": 352, "335": 352, "336": 352, "337": 352, "338": 352, "339": 352, "340": 352, "341": 352, "342": 352, "343": 352, "344": 352, "345": 352, "346": 352, "347": 352, "348": 352, "349": 352, "350": 352, "351": 352, "288": 288, "289": 320, "290": 320, "291": 320, "292": 320, "293": 320, "294": 320, "295": 320, "296": 320, "297": 320, "298": 320, "299": 320, "300": 320, "301": 320, "302": 320, "303": 320, "304": 320, "305": 320, "306": 320, "307": 320, "308": 320, "309": 320, "310": 320, "311": 320, "312": 320, "313": 320, "314": 320, "315": 320, "316": 320, "317": 320, "318": 320, "319": 320, "256": 256, "257": 288, "258": 288, "259": 288, "260": 288, "261": 288, "262": 288, "263": 288, "264": 288, "265": 288, "266": 288, "267": 288, "268": 288, "269": 288, "270": 288, "271": 288, "272": 288, "273": 288, "274": 288, "275": 288, "276": 288, "277": 288, "278": 288, "279": 288, "280": 288, "281": 288, "282": 288, "283": 288, "284": 288, "285": 288, "286": 
288, "287": 288, "240": 240, "241": 256, "242": 256, "243": 256, "244": 256, "245": 256, "246": 256, "247": 256, "248": 256, "249": 256, "250": 256, "251": 256, "252": 256, "253": 256, "254": 256, "255": 256, "224": 224, "225": 240, "226": 240, "227": 240, "228": 240, "229": 240, "230": 240, "231": 240, "232": 240, "233": 240, "234": 240, "235": 240, "236": 240, "237": 240, "238": 240, "239": 240, "208": 208, "209": 224, "210": 224, "211": 224, "212": 224, "213": 224, "214": 224, "215": 224, "216": 224, "217": 224, "218": 224, "219": 224, "220": 224, "221": 224, "222": 224, "223": 224, "192": 192, "193": 208, "194": 208, "195": 208, "196": 208, "197": 208, "198": 208, "199": 208, "200": 208, "201": 208, "202": 208, "203": 208, "204": 208, "205": 208, "206": 208, "207": 208, "176": 176, "177": 192, "178": 192, "179": 192, "180": 192, "181": 192, "182": 192, "183": 192, "184": 192, "185": 192, "186": 192, "187": 192, "188": 192, "189": 192, "190": 192, "191": 192, "160": 160, "161": 176, "162": 176, "163": 176, "164": 176, "165": 176, "166": 176, "167": 176, "168": 176, "169": 176, "170": 176, "171": 176, "172": 176, "173": 176, "174": 176, "175": 176, "144": 144, "145": 160, "146": 160, "147": 160, "148": 160, "149": 160, "150": 160, "151": 160, "152": 160, "153": 160, "154": 160, "155": 160, "156": 160, "157": 160, "158": 160, "159": 160, "128": 128, "129": 144, "130": 144, "131": 144, "132": 144, "133": 144, "134": 144, "135": 144, "136": 144, "137": 144, "138": 144, "139": 144, "140": 144, "141": 144, "142": 144, "143": 144, "120": 120, "121": 128, "122": 128, "123": 128, "124": 128, "125": 128, "126": 128, "127": 128, "112": 112, "113": 120, "114": 120, "115": 120, "116": 120, "117": 120, "118": 120, "119": 120, "104": 104, "105": 112, "106": 112, "107": 112, "108": 112, "109": 112, "110": 112, "111": 112, "96": 96, "97": 104, "98": 104, "99": 104, "100": 104, "101": 104, "102": 104, "103": 104, "88": 88, "89": 96, "90": 96, "91": 96, "92": 96, "93": 96, "94": 
96, "95": 96, "80": 80, "81": 88, "82": 88, "83": 88, "84": 88, "85": 88, "86": 88, "87": 88, "72": 72, "73": 80, "74": 80, "75": 80, "76": 80, "77": 80, "78": 80, "79": 80, "64": 64, "65": 72, "66": 72, "67": 72, "68": 72, "69": 72, "70": 72, "71": 72, "56": 56, "57": 64, "58": 64, "59": 64, "60": 64, "61": 64, "62": 64, "63": 64, "48": 48, "49": 56, "50": 56, "51": 56, "52": 56, "53": 56, "54": 56, "55": 56, "40": 40, "41": 48, "42": 48, "43": 48, "44": 48, "45": 48, "46": 48, "47": 48, "32": 32, "33": 40, "34": 40, "35": 40, "36": 40, "37": 40, "38": 40, "39": 40, "24": 24, "25": 32, "26": 32, "27": 32, "28": 32, "29": 32, "30": 32, "31": 32, "16": 16, "17": 24, "18": 24, "19": 24, "20": 24, "21": 24, "22": 24, "23": 24, "8": 8, "9": 16, "10": 16, "11": 16, "12": 16, "13": 16, "14": 16, "15": 16, "4": 4, "5": 8, "6": 8, "7": 8, "2": 2, "3": 4, "1": 1, "0": 0, "512": 512}}
+INFO:legacy.config:early_stop_config   :      {"enable_early_stop": false, "strategy": "repetition", "window_size": 3000, "threshold": 0.99}
+INFO:legacy.config:plas_attention_config:      {"plas_encoder_top_k_left": null, "plas_encoder_top_k_right": null, "plas_decoder_top_k_left": null, "plas_decoder_top_k_right": null, "plas_use_encoder_seq_limit": null, "plas_use_decoder_seq_limit": null, "plas_block_size": 128, "mlp_weight_name": "plas_attention_mlp_weight.safetensors", "plas_max_seq_length": 131072}
+INFO:legacy.config:structured_outputs_config:      {"reasoning_parser": null, "guided_decoding_backend": "off", "disable_any_whitespace": true, "logits_processors": null}
+INFO:legacy.config:router_config       :      {"router": null, "api_server_host": "10.234.11.170", "api_server_port": null, "metrics_port": null}
+INFO:legacy.config:routing_replay_config:      {"enable_routing_replay": false, "routing_store_type": "local", "local_store_dir": "./routing_replay_output", "rdma_store_server": "", "only_last_turn": false, "use_fused_put": false}
+INFO:legacy.config:deploy_modality     :      mixed
+INFO:legacy.config:tokenizer           :      /home/aistudio/PaddlePaddle/PaddleOCR-VL-1.5
+INFO:legacy.config:ips                 :      None
+INFO:legacy.config:tool_parser         :      None
+INFO:legacy.config:master_ip           :      0.0.0.0
+INFO:legacy.config:host_ip             :      10.234.11.170
+INFO:legacy.config:nnode               :      1
+INFO:legacy.config:node_rank           :      0
+INFO:legacy.config:limit_mm_per_prompt :      None
+INFO:legacy.config:mm_processor_kwargs :      None
+INFO:legacy.config:use_warmup          :      0
+INFO:legacy.config:max_num_partial_prefills:      1
+INFO:legacy.config:max_long_partial_prefills:      1
+INFO:legacy.config:long_prefill_token_threshold:      327
+INFO:legacy.config:max_prefill_batch   :      8
+INFO:legacy.config:max_chips_per_node  :      16
+INFO:legacy.config:worker_num_per_node :      1
+INFO:legacy.config:is_master           :      True
+INFO:legacy.config:paddle_commit_id    :      28667cd939ab01444ead356a35b2dfea066dd39b
+INFO:legacy.config:local_device_ids    :      ['0']
+INFO:legacy.config:splitwise_version   :      v1
+INFO:legacy.config:register_info       :      {'role': 'mixed', 'host_ip': '10.234.11.170', 'port': None, 'metrics_port': None, 'connector_port': 60151, 'rdma_ports': [25285], 'engine_worker_queue_port': 46509, 'device_ids': ['0'], 'transfer_protocol': ['ipc', 'rdma'], 'tp_size': 1, 'is_paused': False, 'version': 'init', 'connected_decodes': []}
+INFO:legacy.config:=============================================================
+INFO:legacy.prefix_cache_manager:Prefix cache manager is initialized with 528 gpu blocks and 0 cpu blocks, bytes_per_token_per_layer for each rank: 1024.0
+INFO     2026-06-01 20:25:58,230 7196  download.py[line:146] Using download source: huggingface
+INFO     2026-06-01 20:25:58,238 7196  configuration_utils.py[line:425] Loading configuration file /home/aistudio/PaddlePaddle/PaddleOCR-VL-1.5/generation_config.json
+INFO     2026-06-01 20:25:58,296 7196  tokenizer_utils.py[line:257] Using download source: huggingface
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+	- Avoid using `tokenizers` before the fork if possible
+	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+INFO     2026-06-01 20:26:03,785 7196  engine.py[line:159] Waiting for worker processes to be ready...
+Loading Weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:36<00:00,  2.73it/s]
+Loading Layers: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 2195970.68it/s]
+INFO:legacy.config:Reset block num, the total_block_num:10922, prefill_kvcache_block_num:10922
+Loading Layers: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 199.69it/s]
+INFO     2026-06-01 20:26:46,032 7196  engine.py[line:218] Worker processes are launched with 102.24678826332092 seconds.
+INFO     2026-06-01 20:26:46,033 7196  engine.py[line:229] Detected 10922 gpu blocks and 0 cpu blocks in cache (block size: 16).
+INFO     2026-06-01 20:26:46,033 7196  engine.py[line:232] FastDeploy will be serving 8 running requests if each sequence reaches its maximum length: 8192
+[OCR Worker] OCR model loaded, elapsed: 142.60s
+[OCR Worker] Recognizing image 1/1, size: (2014, 2881)
+Processed prompts:   0%|                                                                              | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
+INFO:legacy.prefix_cache_manager:req_id:1ca18bda-601c-428e-bb41-2a0b48fb011c allocate_gpu_blocks: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76], len(self.gpu_free_block_list) 10845
+[2026-06-01 20:26:48] [7196] [INFO] Prefill batch, dp_rank: 0, #new-seq: 1, #new-token: 1231, #cached-token: 0, token usage: 0.01, #free-block: 10845, #evictable-block: 0, #running-req: 1, #queue-req: 0, 
+INFO:legacy.prefix_cache_manager:req_id:1ca18bda-601c-428e-bb41-2a0b48fb011c allocate_gpu_blocks: [77, 78], len(self.gpu_free_block_list) 10843
+INFO:legacy.prefix_cache_manager:req_id:1ca18bda-601c-428e-bb41-2a0b48fb011c allocate_gpu_blocks: [79, 80], len(self.gpu_free_block_list) 10841
+INFO:legacy.prefix_cache_manager:req_id:1ca18bda-601c-428e-bb41-2a0b48fb011c allocate_gpu_blocks: [81, 82], len(self.gpu_free_block_list) 10839
+INFO:legacy.prefix_cache_manager:req_id:1ca18bda-601c-428e-bb41-2a0b48fb011c allocate_gpu_blocks: [83, 84], len(self.gpu_free_block_list) 10837
+INFO:legacy.prefix_cache_manager:req_id:1ca18bda-601c-428e-bb41-2a0b48fb011c allocate_gpu_blocks: [85, 86], len(self.gpu_free_block_list) 10835
+[2026-06-01 20:26:52] [7196] [INFO] Decode batch, dp_rank: 0, #running-req: 1, #token: 1392, token usage: 0.01, #free-block: 10835, #evictable-block: 0, cuda graph: False, gen throughput (token/s): 1.07, #queue-req: 0, 
+INFO:legacy.prefix_cache_manager:req_id:1ca18bda-601c-428e-bb41-2a0b48fb011c allocate_gpu_blocks: [87, 88], len(self.gpu_free_block_list) 10833
+INFO:legacy.prefix_cache_manager:req_id:1ca18bda-601c-428e-bb41-2a0b48fb011c allocate_gpu_blocks: [89, 90], len(self.gpu_free_block_list) 10831
+INFO:legacy.prefix_cache_manager:req_id:1ca18bda-601c-428e-bb41-2a0b48fb011c recycle_gpu_blocks: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90], len(self.gpu_free_block_list) 10831
+Processed prompts: 100%|██████████████████████████████████████████████████████████████████████| 1/1 [00:09<00:00,  9.39s/it, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
+[OCR Worker] Image 1 done, text length: 277
+[OCR Worker] All images done, total text length: 277
+[OCR Worker] OCR model released
+/usr/local/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 16 leaked shared_memory objects to clean up at shutdown
+  warnings.warn('resource_tracker: There appear to be %d '
+20:26:58 [drug_ocr] INFO: [OCR Step] OCR complete, total text length: 277, elapsed: 176.95s
+20:26:58 [drug_ocr] INFO: [LLM Step] LLM extraction...
+20:26:58 [drug_ocr] INFO: [LLM Step] Starting LLM subprocess...
+I0601 20:27:00.538198 24555 init.cc:254] ENV [CUSTOM_DEVICE_ROOT]=/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device
+I0601 20:27:00.538252 24555 init.cc:162] Try loading custom device libs from: [/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device]
+I0601 20:27:00.678949 24555 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device/libpaddle-iluvatar-gpu.so
+I0601 20:27:00.678995 24555 custom_device_load.cc:58] Skipped lib [/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device/libpaddle-iluvatar-gpu.so]: no custom engine Plugin symbol in this lib.
+I0601 20:27:00.687072 24555 custom_kernel.cc:68] Succeed in loading 913 custom kernel(s) from loaded lib(s), will be used like native ones.
+I0601 20:27:00.687419 24555 init.cc:174] Finished in LoadCustomDevice with libs_path: [/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device]
+I0601 20:27:00.687461 24555 init.cc:260] CustomDevice: iluvatar_gpu, visible devices count: 2
+WARNING  2026-06-01 20:27:16,785 24555 transfer_manager.py[line:30] cupy not available, falling back to synchronous transfers
+[LLM Worker] Loading LLM model (ERNIE)...
+WARNING  2026-06-01 20:27:19,888 24555 common.py[line:63] Model path 'baidu/ERNIE-4.5-0.3B-Paddle' is not a local directory or file, will try to download from huggingface hub.
+WARNING  2026-06-01 20:27:22,912 24555 common.py[line:73] Cannot reach huggingface.co. If the model is stored locally, please check the path 'baidu/ERNIE-4.5-0.3B-Paddle'. Otherwise check network/proxy settings (DOWNLOAD_SOURCE=huggingface).
+INFO     2026-06-01 20:27:23,163 24555 log.py[line:76] Downloading Model from remote to directory: /home/aistudio/PaddlePaddle/ERNIE-4.5-0.3B-Paddle
+INFO     2026-06-01 20:27:23,367 24555 log.py[line:76] Got 9 files, start to download ...
+Processing 9 items:   0%|                                                                                                                           | 0.00/9.00 [00:00<?, ?it/s]INFO     2026-06-01 20:27:23,477 24555 log.py[line:76] 
+File README.md already in cache with identical hash, skip downloading!
+Processing 9 items:  11%|████████████▉                                                                                                       | 1.00/9.00 [00:00<00:00, 9.15it/s]INFO     2026-06-01 20:27:23,484 24555 log.py[line:76] 
+File added_tokens.json already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:27:23,579 24555 log.py[line:76] 
+File config.json already in cache with identical hash, skip downloading!
+Processing 9 items:  33%|██████████████████████████████████████▋                                                                             | 3.00/9.00 [00:00<00:00, 15.2it/s]INFO     2026-06-01 20:27:23,581 24555 log.py[line:76] 
+File generation_config.json already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:27:23,678 24555 log.py[line:76] 
+File model.safetensors already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:27:23,688 24555 log.py[line:76] 
+File special_tokens_map.json already in cache with identical hash, skip downloading!
+Processing 9 items:  67%|█████████████████████████████████████████████████████████████████████████████▎                                      | 6.00/9.00 [00:00<00:00, 20.9it/s]INFO     2026-06-01 20:27:23,767 24555 log.py[line:76] 
+File tokenizer.model already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:27:23,788 24555 log.py[line:76] 
+File tokenizer_config.json already in cache with identical hash, skip downloading!
+INFO     2026-06-01 20:27:24,966 24555 log.py[line:76] 
+File LICENSE already in cache with identical hash, skip downloading!
+Processing 9 items: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.00/9.00 [00:01<00:00, 5.63it/s]
+INFO     2026-06-01 20:27:24,968 24555 log.py[line:76] Download model 'PaddlePaddle/ERNIE-4.5-0.3B-Paddle' successfully.
+INFO     2026-06-01 20:27:24,968 24555 log.py[line:76] Target directory already exists, skipping creation.
+INFO     2026-06-01 20:27:24,984 24555 args_utils.py[line:658] Parameter `engine_worker_queue_port` is not specified, found available ports for possible use: [51568]
+INFO     2026-06-01 20:27:24,986 24555 args_utils.py[line:658] Parameter `cache_queue_port` is not specified, found available ports for possible use: [31913]
+INFO     2026-06-01 20:27:24,987 24555 args_utils.py[line:658] Parameter `rdma_comm_ports` is not specified, found available ports for possible use: [15270, 15271]
+INFO     2026-06-01 20:27:24,988 24555 args_utils.py[line:658] Parameter `pd_comm_port` is not specified, found available ports for possible use: [36146]
+INFO     2026-06-01 20:27:24,989 24555 download.py[line:146] Using download source: huggingface
+INFO     2026-06-01 20:27:24,989 24555 configuration_utils.py[line:1208] Loading configuration file /home/aistudio/PaddlePaddle/ERNIE-4.5-0.3B-Paddle/config.json
+WARNING  2026-06-01 20:27:24,993 24555 configuration_utils.py[line:1239] You are using a model of type ernie4_5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
+INFO     2026-06-01 20:27:28,484 24555 flash_attn_backend.py[line:105] Only support CUDA version flash attention.
+/home/aistudio/.local/lib/python3.10/site-packages/paddle/compat/proxy.py:440: UserWarning: Extending PyTorch compat scope, previous scope: {'triton'}, new scope: {'flash_mla'}.
+/home/aistudio/.local/lib/python3.10/site-packages/fastdeploy/model_executor/graph_optimization/utils.py:21: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
+WARNING  2026-06-01 20:27:29,300 24555 moe.py[line:41] import noaux_tc Failed!
+INFO:legacy.config:Parameter `COMPRESSION_RATIO` will use default value 1.0.
+INFO:legacy.config:The model format is Paddle
+INFO:legacy.config:use_sequence_parallel_moe: False
+INFO:legacy.config:cudagraph sizes specified by model runner [1, 2, 4, 8, 8] is overridden by config [8, 1, 2, 4]
+INFO:legacy.config:Doing profile, the total_block_num:272
+INFO:legacy.config:Auto-setting num_max_dispatch_tokens_per_rank from 128 to 8 (max_num_seqs=8).
+INFO:legacy.config:register_info: {'role': 'mixed', 'host_ip': '10.234.11.170', 'port': None, 'metrics_port': None, 'connector_port': 36146, 'rdma_ports': [15270, 15271], 'engine_worker_queue_port': 51568, 'device_ids': ['0', '1'], 'transfer_protocol': ['ipc', 'rdma'], 'tp_size': 2, 'is_paused': False, 'version': 'init', 'connected_decodes': []}
+INFO:legacy.config:=================== Configuration Information ===============
+INFO:legacy.config:Model Configuration Information :
+INFO:legacy.config:model               :      /home/aistudio/PaddlePaddle/ERNIE-4.5-0.3B-Paddle
+INFO:legacy.config:is_quantized        :      False
+INFO:legacy.config:is_moe_quantized    :      False
+INFO:legacy.config:max_model_len       :      4096
+INFO:legacy.config:dtype               :      bfloat16
+INFO:legacy.config:enable_logprob      :      False
+INFO:legacy.config:max_logprobs        :      20
+INFO:legacy.config:logprobs_mode       :      raw_logprobs
+INFO:legacy.config:redundant_experts_num:      0
+INFO:legacy.config:seed                :      0
+INFO:legacy.config:quantization        :      wint8
+INFO:legacy.config:pad_token_id        :      0
+INFO:legacy.config:eos_tokens_lens     :      2
+INFO:legacy.config:lm_head_fp32        :      False
+INFO:legacy.config:moe_gate_fp32       :      False
+INFO:legacy.config:model_format        :      paddle
+INFO:legacy.config:runner              :      auto
+INFO:legacy.config:convert             :      auto
+INFO:legacy.config:pooler_config       :      None
+INFO:legacy.config:override_pooler_config:      None
+INFO:legacy.config:revision            :      master
+INFO:legacy.config:prefix_layer_name   :      layers
+INFO:legacy.config:kv_cache_quant_scale_path:      /home/aistudio/PaddlePaddle/ERNIE-4.5-0.3B-Paddle/kv_cache_scale.json
+INFO:legacy.config:enable_entropy      :      False
+INFO:legacy.config:model_impl          :      auto
+INFO:legacy.config:version             :      init
+INFO:legacy.config:partial_rotary_factor:      1.0
+INFO:legacy.config:num_nextn_predict_layers:      0
+INFO:legacy.config:mm_max_tokens_per_item:      None
+INFO:legacy.config:pretrained_config   :      PretrainedConfig {
+  "architectures": [
+    "Ernie4_5_ForCausalLM"
+  ],
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 1024,
+  "intermediate_size": 3072,
+  "max_position_embeddings": 131072,
+  "num_attention_heads": 16,
+  "num_hidden_layers": 18,
+  "num_key_value_heads": 2,
+  "pad_token_id": 0,
+  "paddleformers_version": "1.1.1",
+  "rms_norm_eps": 1e-05,
+  "rope_theta": 500000,
+  "tie_word_embeddings": true,
+  "torch_dtype": "bfloat16",
+  "use_bias": false,
+  "use_cache": false,
+  "use_rmsnorm": true,
+  "vocab_size": 103424
+}
+
+INFO:legacy.config:architectures       :      ['Ernie4_5_ForCausalLM']
+INFO:legacy.config:bos_token_id        :      1
+INFO:legacy.config:eos_token_id        :      2
+INFO:legacy.config:hidden_act          :      silu
+INFO:legacy.config:hidden_size         :      1024
+INFO:legacy.config:intermediate_size   :      3072
+INFO:legacy.config:max_position_embeddings:      131072
+INFO:legacy.config:model_type          :      ernie4_5
+INFO:legacy.config:num_attention_heads :      16
+INFO:legacy.config:num_key_value_heads :      2
+INFO:legacy.config:head_dim            :      128
+INFO:legacy.config:num_hidden_layers   :      18
+INFO:legacy.config:rms_norm_eps        :      1e-05
+INFO:legacy.config:use_cache           :      False
+INFO:legacy.config:vocab_size          :      103424
+INFO:legacy.config:rope_theta          :      500000
+INFO:legacy.config:use_rmsnorm         :      True
+INFO:legacy.config:tie_word_embeddings :      True
+INFO:legacy.config:use_bias            :      False
+INFO:legacy.config:top_p               :      1.0
+INFO:legacy.config:temperature         :      1.0
+INFO:legacy.config:penalty_score       :      1.0
+INFO:legacy.config:frequency_score     :      0.0
+INFO:legacy.config:presence_score      :      0.0
+INFO:legacy.config:min_length          :      1
+INFO:legacy.config:start_layer_index   :      0
+INFO:legacy.config:moe_num_shared_experts:      0
+INFO:legacy.config:moe_layer_start_index:      0
+INFO:legacy.config:num_max_dispatch_tokens_per_rank:      8
+INFO:legacy.config:moe_use_aux_free    :      False
+INFO:legacy.config:hidden_dropout_prob :      0.0
+INFO:legacy.config:initializer_range   :      0.02
+INFO:legacy.config:quantization_config :      None
+INFO:legacy.config:moe_num_experts     :      None
+INFO:legacy.config:moe_layer_end_index :      None
+INFO:legacy.config:ori_vocab_size      :      103424
+INFO:legacy.config:think_start_id      :      -1
+INFO:legacy.config:think_end_id        :      -1
+INFO:legacy.config:im_patch_id         :      -1
+INFO:legacy.config:line_break_id       :      -1
+INFO:legacy.config:think_truncate_prompt_ids:      [-1]
+INFO:legacy.config:reasoning_allowed_token_ids:      []
+INFO:legacy.config:is_unified_ckpt     :      True
+INFO:legacy.config:runner_type         :      generate
+INFO:legacy.config:convert_type        :      none
+INFO:legacy.config:is_reasoning_model  :      False
+INFO:legacy.config:enable_mm           :      False
+INFO:legacy.config:supported_tasks     :      ['generate']
+INFO:legacy.config:_model_info         :      ModelInfo(architecture='Ernie4_5_ForCausalLM', category=<ModelCategory.TEXT_GENERATION: 1>, is_text_generation=True, is_multimodal=False, is_reasoning=False, is_pooling=False, module_path='ernie4_5_moe', default_pooling_type='LAST')
+INFO:legacy.config:_architecture       :      Ernie4_5_ForCausalLM
+INFO:legacy.config:mla_use_absorb      :      False
+INFO:legacy.config:max_stop_seqs_num   :      5
+INFO:legacy.config:stop_seqs_max_len   :      8
+INFO:legacy.config:compression_ratio   :      1.0
+INFO:legacy.config:model_config        :      {'architectures': ['Ernie4_5_ForCausalLM'], 'bos_token_id': 1, 'eos_token_id': 2, 'hidden_act': 'silu', 'hidden_size': 1024, 'intermediate_size': 3072, 'max_position_embeddings': 131072, 'model_type': 'ernie4_5', 'num_attention_heads': 16, 'num_key_value_heads': 2, 'head_dim': 128, 'num_hidden_layers': 18, 'pad_token_id': 0, 'rms_norm_eps': 1e-05, 'use_cache': False, 'vocab_size': 103424, 'rope_theta': 500000, 'use_rmsnorm': True, 'tie_word_embeddings': True, 'use_bias': False, 'dtype': 'bfloat16'}
+INFO:legacy.config:moe_phase           :      <fastdeploy.config.MoEPhase object at 0x7f26d8cd6680>
+INFO:legacy.config:=============================================================
+INFO:legacy.config:Cache Configuration Information :
+INFO:legacy.config:block_size          :      16
+INFO:legacy.config:gpu_memory_utilization:      0.9
+INFO:legacy.config:num_gpu_blocks_override:      None
+INFO:legacy.config:kv_cache_ratio      :      0.75
+INFO:legacy.config:enc_dec_block_num   :      2
+INFO:legacy.config:prealloc_dec_block_slot_num_threshold:      12
+INFO:legacy.config:cache_dtype         :      bfloat16
+INFO:legacy.config:model_cfg           :      <fastdeploy.config.ModelConfig object at 0x7f26f7449180>
+INFO:legacy.config:enable_chunked_prefill:      False
+INFO:legacy.config:rdma_comm_ports     :      [15270, 15271]
+INFO:legacy.config:local_rdma_comm_ports:      [15270, 15271]
+INFO:legacy.config:cache_transfer_protocol:      ipc,rdma
+INFO:legacy.config:pd_comm_port        :      [36146]
+INFO:legacy.config:local_pd_comm_port  :      36146
+INFO:legacy.config:enable_prefix_caching:      False
+INFO:legacy.config:enable_ssd_cache    :      False
+INFO:legacy.config:cache_queue_port    :      [31913]
+INFO:legacy.config:local_cache_queue_port:      31913
+INFO:legacy.config:swap_space          :      None
+INFO:legacy.config:max_encoder_cache   :      0
+INFO:legacy.config:max_processor_cache :      -1
+INFO:legacy.config:enable_output_caching:      False
+INFO:legacy.config:disable_chunked_mm_input:      False
+INFO:legacy.config:kvcache_storage_backend:      None
+INFO:legacy.config:write_policy        :      write_through
+INFO:legacy.config:write_through_threshold:      2
+INFO:legacy.config:num_cpu_blocks      :      0
+INFO:legacy.config:use_mla_cache       :      False
+INFO:legacy.config:head_num            :      2
+INFO:legacy.config:head_dim            :      128
+INFO:legacy.config:byte_size           :      2
+INFO:legacy.config:kv_factor           :      2
+INFO:legacy.config:bytes_per_token_per_layer:      1024
+INFO:legacy.config:bytes_per_block     :      294912
+INFO:legacy.config:max_block_num_per_seq:      256
+INFO:legacy.config:dec_token_num       :      32
+INFO:legacy.config:total_block_num     :      272
+INFO:legacy.config:prefill_kvcache_block_num:      272
+INFO:legacy.config:=============================================================
+INFO:legacy.config:LocalScheduler Configuration Information :
+INFO:legacy.config:max_size            :      -1
+INFO:legacy.config:ttl                 :      900
+INFO:legacy.config:max_model_len       :      4096
+INFO:legacy.config:enable_chunked_prefill:      False
+INFO:legacy.config:max_num_partial_prefills:      1
+INFO:legacy.config:max_long_partial_prefills:      1
+INFO:legacy.config:long_prefill_token_threshold:      163
+INFO:legacy.config:=============================================================
+INFO:legacy.config:Parallel Configuration Information :
+INFO:legacy.config:sequence_parallel   :      False
+INFO:legacy.config:use_ep              :      False
+INFO:legacy.config:msg_queue_id        :      1
+INFO:legacy.config:tensor_parallel_rank:      0
+INFO:legacy.config:tensor_parallel_size:      2
+INFO:legacy.config:expert_parallel_rank:      0
+INFO:legacy.config:expert_parallel_size:      1
+INFO:legacy.config:data_parallel_rank  :      0
+INFO:legacy.config:data_parallel_size  :      1
+INFO:legacy.config:enable_expert_parallel:      False
+INFO:legacy.config:enable_chunked_moe  :      False
+INFO:legacy.config:chunked_moe_size    :      256
+INFO:legacy.config:local_data_parallel_id:      0
+INFO:legacy.config:engine_worker_queue_port:      [51568]
+INFO:legacy.config:local_engine_worker_queue_port:      51568
+INFO:legacy.config:device_ids          :      0,1
+INFO:legacy.config:first_token_id      :      1
+INFO:legacy.config:engine_pid          :      None
+INFO:legacy.config:do_profile          :      False
+INFO:legacy.config:use_internode_ll_two_stage:      False
+INFO:legacy.config:disable_sequence_parallel_moe:      False
+INFO:legacy.config:shutdown_comm_group_if_worker_idle:      True
+INFO:legacy.config:ep_prefill_use_worst_num_tokens:      False
+INFO:legacy.config:pod_ip              :      None
+INFO:legacy.config:disable_custom_all_reduce:      False
+INFO:legacy.config:enable_flashinfer_allreduce_fusion:      False
+INFO:legacy.config:pd_disaggregation_mode:      None
+INFO:legacy.config:prefill_one_step_stop:      False
+INFO:legacy.config:use_sequence_parallel_moe:      False
+INFO:legacy.config:=============================================================
+INFO:legacy.config:speculative_config  :      {"method_list": ["ngram", "mtp", "naive", "suffix"], "mtp_strategy_list": ["default", "with_ngram"], "mtp_strategy": "default", "num_speculative_tokens": 1, "num_model_steps": 1, "max_candidate_len": 5, "verify_window": 2, "max_ngram_size": 5, "min_ngram_size": 2, "suffix_decoding_max_tree_depth": 64, "suffix_decoding_max_cached_requests": -1, "suffix_decoding_max_spec_factor": 1.0, "suffix_decoding_min_token_prob": 0.1, "model": "/home/aistudio/PaddlePaddle/ERNIE-4.5-0.3B-Paddle", "quantization": "wint8", "num_gpu_block_expand_ratio": 1.0, "model_type": "main", "benchmark_mode": false, "enf_gen_phase_tag": false, "enable_draft_logprob": false, "verify_strategy": "target_match", "accept_policy": "normal", "model_config": {}, "num_extra_cache_layer": 0}
+INFO:legacy.config:eplb_config         :      <fastdeploy.config.EPLBConfig object at 0x7f26d8cd6530>
+INFO:legacy.config:device_config       :      None
+INFO:legacy.config:load_config         :      {"load_choices": "default_v1", "is_pre_sharded": false, "dynamic_load_weight": false, "load_strategy": "normal", "rsync_config": null, "model_loader_extra_config": null}
+INFO:legacy.config:quant_config        :      None
+INFO:legacy.config:graph_opt_config    :      {"graph_opt_level": 0, "sot_warmup_sizes": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16, 32, 64, 128], "use_cudagraph": false, "cudagraph_capture_sizes": [8, 4, 2, 1], "flag_cudagraph_capture_sizes_initlized": true, "cudagraph_capture_sizes_prefill": [512, 512, 480, 448, 416, 384, 352, 320, 288, 256, 240, 224, 208, 192, 176, 160, 144, 128, 120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32, 24, 16, 8, 4, 2, 1], "cudagraph_num_of_warmups": 2, "cudagraph_copy_inputs": false, "cudagraph_splitting_ops": [], "cudagraph_only_prefill": false, "full_cuda_graph": true, "max_capture_size": 8, "real_shape_to_captured_size": {"4": 4, "5": 8, "6": 8, "7": 8, "2": 2, "3": 4, "1": 1, "0": 0, "8": 8}, "real_bsz_to_captured_size": {}, "use_unique_memory_pool": true, "draft_model_use_cudagraph": true, "max_capture_shape_prefill": 512, "max_capture_size_prefill": 512, "real_shape_to_captured_size_prefill": {"480": 480, "481": 512, "482": 512, "483": 512, "484": 512, "485": 512, "486": 512, "487": 512, "488": 512, "489": 512, "490": 512, "491": 512, "492": 512, "493": 512, "494": 512, "495": 512, "496": 512, "497": 512, "498": 512, "499": 512, "500": 512, "501": 512, "502": 512, "503": 512, "504": 512, "505": 512, "506": 512, "507": 512, "508": 512, "509": 512, "510": 512, "511": 512, "448": 448, "449": 480, "450": 480, "451": 480, "452": 480, "453": 480, "454": 480, "455": 480, "456": 480, "457": 480, "458": 480, "459": 480, "460": 480, "461": 480, "462": 480, "463": 480, "464": 480, "465": 480, "466": 480, "467": 480, "468": 480, "469": 480, "470": 480, "471": 480, "472": 480, "473": 480, "474": 480, "475": 480, "476": 480, "477": 480, "478": 480, "479": 480, "416": 416, "417": 448, "418": 448, "419": 448, "420": 448, "421": 448, "422": 448, "423": 448, "424": 448, "425": 448, "426": 448, "427": 448, "428": 448, "429": 448, "430": 448, "431": 448, "432": 448, "433": 448, "434": 448, "435": 448, "436": 448, "437": 448, "438": 448, "439": 448, 
"440": 448, "441": 448, "442": 448, "443": 448, "444": 448, "445": 448, "446": 448, "447": 448, "384": 384, "385": 416, "386": 416, "387": 416, "388": 416, "389": 416, "390": 416, "391": 416, "392": 416, "393": 416, "394": 416, "395": 416, "396": 416, "397": 416, "398": 416, "399": 416, "400": 416, "401": 416, "402": 416, "403": 416, "404": 416, "405": 416, "406": 416, "407": 416, "408": 416, "409": 416, "410": 416, "411": 416, "412": 416, "413": 416, "414": 416, "415": 416, "352": 352, "353": 384, "354": 384, "355": 384, "356": 384, "357": 384, "358": 384, "359": 384, "360": 384, "361": 384, "362": 384, "363": 384, "364": 384, "365": 384, "366": 384, "367": 384, "368": 384, "369": 384, "370": 384, "371": 384, "372": 384, "373": 384, "374": 384, "375": 384, "376": 384, "377": 384, "378": 384, "379": 384, "380": 384, "381": 384, "382": 384, "383": 384, "320": 320, "321": 352, "322": 352, "323": 352, "324": 352, "325": 352, "326": 352, "327": 352, "328": 352, "329": 352, "330": 352, "331": 352, "332": 352, "333": 352, "334": 352, "335": 352, "336": 352, "337": 352, "338": 352, "339": 352, "340": 352, "341": 352, "342": 352, "343": 352, "344": 352, "345": 352, "346": 352, "347": 352, "348": 352, "349": 352, "350": 352, "351": 352, "288": 288, "289": 320, "290": 320, "291": 320, "292": 320, "293": 320, "294": 320, "295": 320, "296": 320, "297": 320, "298": 320, "299": 320, "300": 320, "301": 320, "302": 320, "303": 320, "304": 320, "305": 320, "306": 320, "307": 320, "308": 320, "309": 320, "310": 320, "311": 320, "312": 320, "313": 320, "314": 320, "315": 320, "316": 320, "317": 320, "318": 320, "319": 320, "256": 256, "257": 288, "258": 288, "259": 288, "260": 288, "261": 288, "262": 288, "263": 288, "264": 288, "265": 288, "266": 288, "267": 288, "268": 288, "269": 288, "270": 288, "271": 288, "272": 288, "273": 288, "274": 288, "275": 288, "276": 288, "277": 288, "278": 288, "279": 288, "280": 288, "281": 288, "282": 288, "283": 288, "284": 288, "285": 288, "286": 
288, "287": 288, "240": 240, "241": 256, "242": 256, "243": 256, "244": 256, "245": 256, "246": 256, "247": 256, "248": 256, "249": 256, "250": 256, "251": 256, "252": 256, "253": 256, "254": 256, "255": 256, "224": 224, "225": 240, "226": 240, "227": 240, "228": 240, "229": 240, "230": 240, "231": 240, "232": 240, "233": 240, "234": 240, "235": 240, "236": 240, "237": 240, "238": 240, "239": 240, "208": 208, "209": 224, "210": 224, "211": 224, "212": 224, "213": 224, "214": 224, "215": 224, "216": 224, "217": 224, "218": 224, "219": 224, "220": 224, "221": 224, "222": 224, "223": 224, "192": 192, "193": 208, "194": 208, "195": 208, "196": 208, "197": 208, "198": 208, "199": 208, "200": 208, "201": 208, "202": 208, "203": 208, "204": 208, "205": 208, "206": 208, "207": 208, "176": 176, "177": 192, "178": 192, "179": 192, "180": 192, "181": 192, "182": 192, "183": 192, "184": 192, "185": 192, "186": 192, "187": 192, "188": 192, "189": 192, "190": 192, "191": 192, "160": 160, "161": 176, "162": 176, "163": 176, "164": 176, "165": 176, "166": 176, "167": 176, "168": 176, "169": 176, "170": 176, "171": 176, "172": 176, "173": 176, "174": 176, "175": 176, "144": 144, "145": 160, "146": 160, "147": 160, "148": 160, "149": 160, "150": 160, "151": 160, "152": 160, "153": 160, "154": 160, "155": 160, "156": 160, "157": 160, "158": 160, "159": 160, "128": 128, "129": 144, "130": 144, "131": 144, "132": 144, "133": 144, "134": 144, "135": 144, "136": 144, "137": 144, "138": 144, "139": 144, "140": 144, "141": 144, "142": 144, "143": 144, "120": 120, "121": 128, "122": 128, "123": 128, "124": 128, "125": 128, "126": 128, "127": 128, "112": 112, "113": 120, "114": 120, "115": 120, "116": 120, "117": 120, "118": 120, "119": 120, "104": 104, "105": 112, "106": 112, "107": 112, "108": 112, "109": 112, "110": 112, "111": 112, "96": 96, "97": 104, "98": 104, "99": 104, "100": 104, "101": 104, "102": 104, "103": 104, "88": 88, "89": 96, "90": 96, "91": 96, "92": 96, "93": 96, "94": 
96, "95": 96, "80": 80, "81": 88, "82": 88, "83": 88, "84": 88, "85": 88, "86": 88, "87": 88, "72": 72, "73": 80, "74": 80, "75": 80, "76": 80, "77": 80, "78": 80, "79": 80, "64": 64, "65": 72, "66": 72, "67": 72, "68": 72, "69": 72, "70": 72, "71": 72, "56": 56, "57": 64, "58": 64, "59": 64, "60": 64, "61": 64, "62": 64, "63": 64, "48": 48, "49": 56, "50": 56, "51": 56, "52": 56, "53": 56, "54": 56, "55": 56, "40": 40, "41": 48, "42": 48, "43": 48, "44": 48, "45": 48, "46": 48, "47": 48, "32": 32, "33": 40, "34": 40, "35": 40, "36": 40, "37": 40, "38": 40, "39": 40, "24": 24, "25": 32, "26": 32, "27": 32, "28": 32, "29": 32, "30": 32, "31": 32, "16": 16, "17": 24, "18": 24, "19": 24, "20": 24, "21": 24, "22": 24, "23": 24, "8": 8, "9": 16, "10": 16, "11": 16, "12": 16, "13": 16, "14": 16, "15": 16, "4": 4, "5": 8, "6": 8, "7": 8, "2": 2, "3": 4, "1": 1, "0": 0, "512": 512}}
+INFO:legacy.config:early_stop_config   :      {"enable_early_stop": false, "strategy": "repetition", "window_size": 3000, "threshold": 0.99}
+INFO:legacy.config:plas_attention_config:      {"plas_encoder_top_k_left": null, "plas_encoder_top_k_right": null, "plas_decoder_top_k_left": null, "plas_decoder_top_k_right": null, "plas_use_encoder_seq_limit": null, "plas_use_decoder_seq_limit": null, "plas_block_size": 128, "mlp_weight_name": "plas_attention_mlp_weight.safetensors", "plas_max_seq_length": 131072}
+INFO:legacy.config:structured_outputs_config:      {"reasoning_parser": null, "guided_decoding_backend": "off", "disable_any_whitespace": true, "logits_processors": null}
+INFO:legacy.config:router_config       :      {"router": null, "api_server_host": "10.234.11.170", "api_server_port": null, "metrics_port": null}
+INFO:legacy.config:routing_replay_config:      {"enable_routing_replay": false, "routing_store_type": "local", "local_store_dir": "./routing_replay_output", "rdma_store_server": "", "only_last_turn": false, "use_fused_put": false}
+INFO:legacy.config:deploy_modality     :      mixed
+INFO:legacy.config:tokenizer           :      /home/aistudio/PaddlePaddle/ERNIE-4.5-0.3B-Paddle
+INFO:legacy.config:ips                 :      None
+INFO:legacy.config:tool_parser         :      None
+INFO:legacy.config:master_ip           :      0.0.0.0
+INFO:legacy.config:host_ip             :      10.234.11.170
+INFO:legacy.config:nnode               :      1
+INFO:legacy.config:node_rank           :      0
+INFO:legacy.config:limit_mm_per_prompt :      None
+INFO:legacy.config:mm_processor_kwargs :      None
+INFO:legacy.config:use_warmup          :      0
+INFO:legacy.config:max_num_partial_prefills:      1
+INFO:legacy.config:max_long_partial_prefills:      1
+INFO:legacy.config:long_prefill_token_threshold:      163
+INFO:legacy.config:max_prefill_batch   :      3
+INFO:legacy.config:max_chips_per_node  :      16
+INFO:legacy.config:worker_num_per_node :      2
+INFO:legacy.config:is_master           :      True
+INFO:legacy.config:paddle_commit_id    :      28667cd939ab01444ead356a35b2dfea066dd39b
+INFO:legacy.config:local_device_ids    :      ['0', '1']
+INFO:legacy.config:splitwise_version   :      v1
+INFO:legacy.config:register_info       :      {'role': 'mixed', 'host_ip': '10.234.11.170', 'port': None, 'metrics_port': None, 'connector_port': 36146, 'rdma_ports': [15270, 15271], 'engine_worker_queue_port': 51568, 'device_ids': ['0', '1'], 'transfer_protocol': ['ipc', 'rdma'], 'tp_size': 2, 'is_paused': False, 'version': 'init', 'connected_decodes': []}
+INFO:legacy.config:=============================================================
+INFO:legacy.prefix_cache_manager:Prefix cache manager is initialized with 272 gpu blocks and 0 cpu blocks, bytes_per_token_per_layer for each rank: 512.0
+INFO     2026-06-01 20:27:59,772 24555 download.py[line:146] Using download source: huggingface
+INFO     2026-06-01 20:27:59,772 24555 configuration_utils.py[line:425] Loading configuration file /home/aistudio/PaddlePaddle/ERNIE-4.5-0.3B-Paddle/generation_config.json
+/home/aistudio/.local/lib/python3.10/site-packages/paddleformers/generation/configuration_utils.py:250: UserWarning: using greedy search strategy. However, `temperature` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `decode_strategy="greedy_search" ` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
+/home/aistudio/.local/lib/python3.10/site-packages/paddleformers/generation/configuration_utils.py:255: UserWarning: using greedy search strategy. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `decode_strategy="greedy_search" ` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
+WARNING  2026-06-01 20:27:59,790 24555 log.py[line:135] PretrainedTokenizer will be deprecated and removed in the next major release. Please migrate to Hugging Face's transformers.PreTrainedTokenizer. use class QWenTokenizer(PaddleTokenizerMixin, hf.PreTrainedTokenizer) to support multisource download and Paddle tokenizer operations.
+INFO     2026-06-01 20:28:05,026 24555 engine.py[line:159] Waiting for worker processes to be ready...
+Loading Weights:   0%|                                                                                                                                  | 0/100 [02:28<?, ?it/s]INFO:legacy.config:Reset block num, the total_block_num:21845, prefill_kvcache_block_num:21845
+Loading Weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [02:29<00:00,  1.50s/it]
+Loading Layers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 199.70it/s]
+INFO     2026-06-01 20:30:41,291 24555 engine.py[line:218] Worker processes are launched with 190.76685070991516 seconds.
+INFO     2026-06-01 20:30:41,291 24555 engine.py[line:229] Detected 21845 gpu blocks and 0 cpu blocks in cache (block size: 16).
+INFO     2026-06-01 20:30:41,291 24555 engine.py[line:232] FastDeploy will be serving 8 running requests if each sequence reaches its maximum length: 4096
+[LLM Worker] LLM model loaded, elapsed: 201.41s
+[LLM Worker] Generating response (max_new_tokens=100)...
+Processed prompts:   0%|                                                                                   | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]INFO:legacy.prefix_cache_manager:req_id:75dcc4f8-d383-413f-bf0f-4ab346246b4f allocate_gpu_blocks: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], len(self.gpu_free_block_list) 21820
+[2026-06-01 20:30:41] [24555] [INFO] Prefill batch, dp_rank: 0, #new-seq: 1, #new-token: 399, #cached-token: 0, token usage: 0.00, #free-block: 21820, #evictable-block: 0, #running-req: 1, #queue-req: 0, 
+INFO:legacy.prefix_cache_manager:req_id:75dcc4f8-d383-413f-bf0f-4ab346246b4f allocate_gpu_blocks: [25, 26], len(self.gpu_free_block_list) 21818
+INFO:legacy.prefix_cache_manager:req_id:75dcc4f8-d383-413f-bf0f-4ab346246b4f allocate_gpu_blocks: [27, 28], len(self.gpu_free_block_list) 21816
+INFO:legacy.prefix_cache_manager:req_id:75dcc4f8-d383-413f-bf0f-4ab346246b4f allocate_gpu_blocks: [29, 30], len(self.gpu_free_block_list) 21814
+INFO:legacy.prefix_cache_manager:req_id:75dcc4f8-d383-413f-bf0f-4ab346246b4f allocate_gpu_blocks: [31, 32], len(self.gpu_free_block_list) 21812
+INFO:legacy.prefix_cache_manager:req_id:75dcc4f8-d383-413f-bf0f-4ab346246b4f recycle_gpu_blocks: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32], len(self.gpu_free_block_list) 21812
+Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [01:20<00:00, 80.82s/it, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
+[LLM Worker] Extraction done, gen elapsed: 80.86s, result length: 118
+[LLM Worker] LLM model released
+/usr/local/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 16 leaked shared_memory objects to clean up at shutdown
+  warnings.warn('resource_tracker: There appear to be %d '
+20:32:11 [drug_ocr] INFO: [LLM Step] LLM extraction done, result length: 118, elapsed: 313.15s
+20:32:11 [drug_ocr] INFO: [TTS Step] TTS synthesis...
+20:32:11 [drug_ocr] INFO: [TTS Step] Starting TTS subprocess...
+I0601 20:32:14.633864 473466 init.cc:254] ENV [CUSTOM_DEVICE_ROOT]=/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device
+I0601 20:32:14.633916 473466 init.cc:162] Try loading custom device libs from: [/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device]
+I0601 20:32:14.785195 473466 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device/libpaddle-iluvatar-gpu.so
+I0601 20:32:14.785243 473466 custom_device_load.cc:58] Skipped lib [/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device/libpaddle-iluvatar-gpu.so]: no custom engine Plugin symbol in this lib.
+I0601 20:32:14.794520 473466 custom_kernel.cc:68] Succeed in loading 913 custom kernel(s) from loaded lib(s), will be used like native ones.
+I0601 20:32:14.795017 473466 init.cc:174] Finished in LoadCustomDevice with libs_path: [/home/aistudio/.local/lib/python3.10/site-packages/paddle/paddle_custom_device]
+I0601 20:32:14.795059 473466 init.cc:260] CustomDevice: iluvatar_gpu, visible devices count: 2
+2026-06-01 20:32:33.043320992 [W:onnxruntime:Default, cpuid_info.cc:91 LogEarlyWarning] Unknown CPU vendor. cpuinfo_vendor value: 16
+[nltk_data] Downloading package averaged_perceptron_tagger to
+[nltk_data]     /home/aistudio/nltk_data...
+[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
+[nltk_data] Downloading package cmudict to /home/aistudio/nltk_data...
+[nltk_data]   Unzipping corpora/cmudict.zip.
+/usr/local/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
+  warnings.warn(
+[TTS Worker] Loading TTS model (PaddleSpeech)...
+[TTS Worker] TTS model loaded, elapsed: 0.00s
+[TTS Worker] Synthesis start, input text length: 118
+100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 489M/489M [00:11<00:00, 42.2MB/s]
+100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 915M/915M [00:33<00:00, 27.2MB/s]
+100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 589M/589M [00:14<00:00, 41.2MB/s]
+20:58:54 [paddlenlp.utils.download.bos_download] INFO: downloading https://bj.bcebos.com/paddle-hapi/models/bert/bert-base-chinese-vocab.txt to /home/aistudio/.paddlenlp/models/tmp1_niy_ma
+(…)/models/bert/bert-base-chinese-vocab.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 110k/110k [00:00<00:00, 35.1MB/s]
+20:58:54 [paddlenlp.utils.download.bos_download] INFO: storing https://bj.bcebos.com/paddle-hapi/models/bert/bert-base-chinese-vocab.txt in cache at /home/aistudio/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
+[2026-06-01 20:58:54,559] [    INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
+[2026-06-01 20:58:54,569] [    INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
+Building prefix dict from the default dictionary ...
+20:58:59 [jieba] DEBUG: Building prefix dict from the default dictionary ...
+Dumping model to file cache /tmp/jieba.cache
+20:59:00 [jieba] DEBUG: Dumping model to file cache /tmp/jieba.cache
+Loading model cost 0.818 seconds.
+20:59:00 [jieba] DEBUG: Loading model cost 0.818 seconds.
+Prefix dict has been built successfully.
+20:59:00 [jieba] DEBUG: Prefix dict has been built successfully.
+[TTS Worker] Synthesis done, audio duration: 23.27s, sample rate: 24000 Hz
+[TTS Worker] TTS model released
+20:59:07 [drug_ocr] INFO: [TTS Step] TTS synthesis done, audio duration: 23.27s, elapsed: 1615.98s
+20:59:07 [drug_ocr] INFO: ============================================================
+20:59:07 [drug_ocr] INFO: Pipeline complete, total elapsed: 2106.08s
+20:59:07 [drug_ocr] INFO: ============================================================
+
+============================================================
+OCR Result:
+============================================================
+核准日期：2010年05月07日
+修改日期：2024年09月29日
+\(^{®}\)
+久正
+脂必妥胶囊说明书
+请仔细阅读说明书并在医师指导下使用
+【药品名称】通用名称：脂必妥胶囊
+汉语拼音：Zhibituo Jiaonang
+【成份】红曲。
+【性状】本品为硬胶囊剂，内容物为紫红色至紫褐色的粉末；气微，味微酸。
+【功能主治】健脾消食，除湿祛痰，活血化瘀。用于脾瘀阻滞，症见气短，乏力，头晕，头痛，胸闷，腹胀，食少纳呆等；高脂血症；也可用于高脂血症及动脉粥样硬化引起的其他心脑血管疾病的辅助治疗。
+【规格】每粒装0.35g
+【用法用量】口服。一次3粒，一日
+
+============================================================
+Extracted Info:
+============================================================
+请确保答案准确清晰易懂
+
+脂必妥胶囊
+
+本品为硬胶囊剂，内容物为紫红色至紫褐色的粉末；气微，味微酸
+
+使用方法：一次3粒，一日
+
+本药能健脾消食，除湿祛痰，活血化瘀，用于脾瘀阻滞，症见气短，乏力，头晕，头痛，胸闷，腹胀，食少纳呆等；高脂
+
+Audio saved to: output.wav (duration: 23.27s)
+aistudio@ssh-942478-10243234-79b6d74556-jggs6:~$ 
+
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/drug_ocr_cli.py b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/drug_ocr_cli.py
new file mode 100644
index 00000000..66dbfd43
--- /dev/null
+++ b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/drug_ocr_cli.py
@@ -0,0 +1,677 @@
+#!/usr/bin/env python3
+"""CLI for the drug instruction leaflet pipeline: OCR, LLM extraction, and TTS broadcast.
+
+Usage:
+    python drug_ocr_cli.py --image resource/1.jpg
+    python drug_ocr_cli.py --image resource/1.jpg --no-split --ocr-tokens 5120 --llm-tokens 1024
+    python drug_ocr_cli.py --image resource/1.jpg --num-splits 9 --overlap 0.15
+"""
+
+import argparse
+import base64
+import gc
+import io
+import logging
+import math
+import os
+import sys
+import tempfile
+import time
+from multiprocessing import Process, Queue
+from pathlib import Path
+
+import numpy as np
+from PIL import Image
+from scipy.io.wavfile import read as wav_read
+from scipy.io.wavfile import write as wav_write
+
+logger = logging.getLogger("drug_ocr")
+
+
+# ============================================================================
+# patch_aistudio_utils
+# ============================================================================
+"""Patch script to fix aistudio_sdk import in paddlenlp.
+
+Uses importlib.util.find_spec to locate paddlenlp WITHOUT importing it,
+so this can be run before paddlenlp is imported to prevent the ImportError.
+"""
+import importlib.util
+import os
+import subprocess
+
+def _find_paddlenlp_dir():
+    # Method 1: find_spec (no import, just metadata)
+    spec = importlib.util.find_spec("paddlenlp")
+    if spec and spec.origin:
+        return os.path.dirname(spec.origin)
+
+    # Method 2: pip show as fallback
+    result = subprocess.run(
+        ["pip", "show", "paddlenlp"],
+        capture_output=True, text=True,
+    )
+    for line in result.stdout.splitlines():
+        if line.startswith("Location:"):
+            return os.path.join(line.split(":", 1)[1].strip(), "paddlenlp")
+
+    raise RuntimeError("Cannot locate paddlenlp installation directory")
+
+
+def patch_aistudio_utils():
+    pkg_dir = _find_paddlenlp_dir()
+    target_file = os.path.join(pkg_dir, "transformers", "aistudio_utils.py")
+
+    if not os.path.isfile(target_file):
+        raise FileNotFoundError(f"Target file not found: {target_file}")
+
+    old_line = "from aistudio_sdk.hub import download"
+    new_line = "from aistudio_sdk import snapshot_download as download"
+
+    with open(target_file, "r", encoding="utf-8") as f:
+        content = f.read()
+
+    if old_line not in content:
+        if new_line in content:
+            print("File already patched.")
+        else:
+            print(f"Target import not found in {target_file}")
+        return
+
+    patched = content.replace(old_line, new_line)
+
+    with open(target_file, "w", encoding="utf-8") as f:
+        f.write(patched)
+
+    print(f"Patched: {target_file}")
+    print(f"  {old_line}  =>  {new_line}")
+
+
+patch_aistudio_utils()
+
+
+# ============================================================================
+# Image splitting
+# ============================================================================
+
+def split_image(image, num_splits=4, overlap_ratio=0.1):
+    """Split an image into num_splits parts (NxN grid) with overlap."""
+    grid_size = int(math.sqrt(num_splits))
+    if grid_size * grid_size != num_splits:
+        raise ValueError(f"num_splits must be a perfect square (e.g. 4, 9, 16), got: {num_splits}")
+
+    w, h = image.size
+    cell_w = w / grid_size
+    cell_h = h / grid_size
+    overlap_w = cell_w * overlap_ratio
+    overlap_h = cell_h * overlap_ratio
+
+    sub_images = []
+    for row in range(grid_size):
+        for col in range(grid_size):
+            left = max(0, col * cell_w - overlap_w)
+            upper = max(0, row * cell_h - overlap_h)
+            right = min(w, (col + 1) * cell_w + overlap_w)
+            lower = min(h, (row + 1) * cell_h + overlap_h)
+            sub_img = image.crop((int(left), int(upper), int(right), int(lower)))
+            sub_images.append(sub_img)
+
+    return sub_images
+
+
+# ============================================================================
+# OCR module (subprocess)
+# ============================================================================
+
+def ocr_worker_process(ocr_model_dir, image_data_list, max_new_tokens, result_queue):
+    """Worker function for OCR subprocess - loads model, performs OCR, returns result."""
+    try:
+        import time
+        import base64
+        import io
+        from PIL import Image
+        from fastdeploy import LLM, SamplingParams
+
+        # Load OCR model
+        print("[OCR Worker] Loading OCR model (PaddleOCR-VL)...")
+        start = time.perf_counter()
+        ocr_model = LLM(
+            model=ocr_model_dir,
+            tensor_parallel_size=1,
+            max_model_len=8192,
+            block_size=16,
+            quantization="wint8",
+            graph_optimization_config={"use_cudagraph": False},
+        )
+        elapsed = time.perf_counter() - start
+        print(f"[OCR Worker] OCR model loaded, elapsed: {elapsed:.2f}s")
+
+        # Process each image
+        all_ocr_texts = []
+        for i, img_bytes in enumerate(image_data_list):
+            image = Image.open(io.BytesIO(img_bytes)).convert("RGB")
+            print(f"[OCR Worker] Recognizing image {i+1}/{len(image_data_list)}, size: {image.size}")
+
+            # Prepare image for OCR
+            buf = io.BytesIO()
+            image.save(buf, format="PNG")
+            base64_image = base64.b64encode(buf.getvalue()).decode("utf-8")
+            image_url = f"data:image/png;base64,{base64_image}"
+
+            prompts = [{
+                "messages": [{
+                    "role": "user",
+                    "content": [
+                        {"type": "image_url", "image_url": {"url": image_url}},
+                        {"type": "text", "text": "OCR:"},
+                    ],
+                }]
+            }]
+            sampling_params = SamplingParams(
+                temperature=0.8, top_p=0.95, max_tokens=max_new_tokens,
+            )
+            outputs = ocr_model.generate(prompts, sampling_params)
+            response = outputs[0].outputs.text
+            all_ocr_texts.append(response)
+            print(f"[OCR Worker] Image {i+1} done, text length: {len(response)}")
+
+        # Combine results
+        combined_text = "\n\n".join(all_ocr_texts)
+        print(f"[OCR Worker] All images done, total text length: {len(combined_text)}")
+
+        # Put result in queue
+        result_queue.put(("success", combined_text))
+
+        # Clean up
+        del ocr_model
+        import gc
+        gc.collect()
+        print("[OCR Worker] OCR model released")
+
+    except Exception as e:
+        import traceback
+        result_queue.put(("error", str(e) + "\n" + traceback.format_exc()))
+
+
+def ocr_step(
+    ocr_model_dir,
+    image_path,
+    enable_split=True,
+    num_splits=4,
+    overlap_ratio=0.1,
+    max_new_tokens=5120,
+):
+    """Execute the OCR step in a subprocess: load image, optionally split, and run OCR."""
+    step_start = time.perf_counter()
+    logger.info("[OCR Step] Loading image...")
+    image = Image.open(image_path).convert("RGB")
+    logger.info("[OCR Step] Image loaded, size: %s", image.size)
+
+    if enable_split:
+        logger.info("[OCR Step] Splitting image (num_splits=%d, overlap=%.2f)...", num_splits, overlap_ratio)
+        sub_images = split_image(image, num_splits=num_splits, overlap_ratio=overlap_ratio)
+        ocr_images = [image] + sub_images
+        logger.info("[OCR Step] Split done, 1 original + %d split = %d total", len(sub_images), len(ocr_images))
+    else:
+        logger.info("[OCR Step] Skipping image split")
+        ocr_images = [image]
+
+    # Serialize images to bytes for subprocess
+    image_data_list = []
+    for img in ocr_images:
+        buf = io.BytesIO()
+        img.save(buf, format="PNG")
+        image_data_list.append(buf.getvalue())
+
+    # Create subprocess for OCR
+    logger.info("[OCR Step] Starting OCR subprocess...")
+    result_queue = Queue()
+    ocr_process = Process(
+        target=ocr_worker_process,
+        args=(str(ocr_model_dir), image_data_list, max_new_tokens, result_queue)
+    )
+    ocr_process.start()
+
+    # Wait for result
+    status, result = result_queue.get()
+    ocr_process.join()
+    ocr_process.close()
+
+    if status == "error":
+        logger.error("[OCR Step] OCR subprocess failed: %s", result)
+        raise RuntimeError(f"OCR subprocess failed: {result}")
+
+    combined_ocr_text = result
+    logger.info("[OCR Step] OCR complete, total text length: %d, elapsed: %.2fs", len(combined_ocr_text), time.perf_counter() - step_start)
+
+    return {"ocr_text": combined_ocr_text, "ocr_images": ocr_images}
+
+
+# ============================================================================
+# LLM module (subprocess)
+# ============================================================================
+
+def clean_for_tts(text):
+    """Clean text for TTS synthesis by removing emojis and markdown formatting."""
+    import re
+    # Remove emojis (Unicode ranges for common emojis)
+    # NOTE: Ranges must avoid CJK punctuation (U+3000-U+303F) and CJK ideographs (U+4E00-U+9FFF)
+    text = re.sub(
+        r"[\U0001F600-\U0001F64F"  # emoticons
+        r"\U0001F300-\U0001F5FF"   # symbols & pictographs
+        r"\U0001F680-\U0001F6FF"   # transport & map
+        r"\U0001F1E0-\U0001F1FF"   # flags
+        r"\U00002702-\U000027B0"   # dingbats
+        r"\U000024C2-\U00002BFF"   # enclosed alphanumerics through misc symbols & arrows (stops before U+3000)
+        r"\U0001F200-\U0001F251"   # enclosed CJK supplement (above CJK range)
+        r"\U0001F900-\U0001F9FF"   # supplemental symbols
+        r"\U0001FA00-\U0001FA6F"   # chess symbols
+        r"\U0001FA70-\U0001FAFF"   # symbols extended-A
+        r"\U00002600-\U000026FF"   # misc symbols
+        r"\U0000FE00-\U0000FE0F"   # variation selectors
+        r"\U0000200D"              # zero-width joiner
+        r"]+",
+        "",
+        text,
+    )
+    # Remove <think>...</think> blocks (including a bare closing </think>)
+    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
+    text = re.sub(r"^.*?</think>\s*", "", text, flags=re.DOTALL)
+    # Remove markdown code blocks (```...```)
+    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
+    # Remove inline code (`...`) -> content
+    text = re.sub(r"`([^`\n]+)`", r"\1", text)
+    # Remove markdown headers (# ## ### etc.) at line start
+    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
+    # Remove markdown bold (**text**) -> text
+    text = re.sub(r"\*\*([^*\n]+?)\*\*", r"\1", text)
+    # Remove markdown bold (__text__) -> text
+    text = re.sub(r"__([^_\n]+?)__", r"\1", text)
+    # Remove markdown italic (*text*) -> text
+    text = re.sub(r"\*([^*\n]+?)\*", r"\1", text)
+    # Remove markdown italic (_text_) -> text (only when _ is at word boundary)
+    text = re.sub(r"(?<!\w)_([^_\n]+?)_(?!\w)", r"\1", text)
+    # Remove markdown images ![alt](url) first, so the link rule below does not leave "!alt" behind
+    text = re.sub(r"!\[[^\]]*\]\([^)]+\)", "", text)
+    # Remove markdown links [text](url) -> text
+    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
+    # Remove markdown horizontal rules (---, ***, ___)
+    text = re.sub(r"^[-*_]{3,}\s*$", "", text, flags=re.MULTILINE)
+    # Remove markdown bullet list markers (- , * , + ) at line start, keep content
+    text = re.sub(r"^(\s*)[-*+]\s+", r"\1", text, flags=re.MULTILINE)
+    # Remove markdown numbered list markers (1. 2. etc.) at line start, keep content
+    text = re.sub(r"^(\s*)\d+\.\s+", r"\1", text, flags=re.MULTILINE)
+    # Remove markdown table pipes
+    text = re.sub(r"\|", " ", text)
+    # Remove markdown table separator lines (---:---:---)
+    text = re.sub(r"^[-: ]+$", "", text, flags=re.MULTILINE)
+    # Collapse multiple blank lines into one
+    text = re.sub(r"\n{3,}", "\n\n", text)
+    # Strip leading/trailing whitespace per line
+    lines = [line.strip() for line in text.splitlines()]
+    text = "\n".join(lines)
+    # Remove leading/trailing whitespace overall
+    text = text.strip()
+    return text
+
+
+def llm_worker_process(llm_model_dir, ocr_text, max_new_tokens, result_queue, tensor_parallel_size=2):
+    """Worker function for LLM subprocess - loads model, extracts info, returns result."""
+    try:
+        import time
+        from fastdeploy import LLM, SamplingParams
+
+        import os
+        gpu_ids = ",".join(str(i) for i in range(tensor_parallel_size))
+        os.environ["ILUVATAR_VISIBLE_DEVICES"] = gpu_ids
+
+        # Load LLM model
+        print(f"[LLM Worker] Loading LLM model (ERNIE) with tensor_parallel_size={tensor_parallel_size}...")
+        start = time.perf_counter()
+        llm_model = LLM(
+            model=llm_model_dir,
+            tensor_parallel_size=tensor_parallel_size,
+            max_model_len=4096,
+            block_size=16,
+            quantization="wint8",
+            graph_optimization_config={"use_cudagraph": False},
+        )
+        elapsed = time.perf_counter() - start
+        print(f"[LLM Worker] LLM model loaded, elapsed: {elapsed:.2f}s")
+
+        # Prepare prompt
+        prompt_text = f"""以下是药品说明书的 OCR 识别结果，供参考：
+
+{ocr_text}
+
+请根据以上 OCR 识别结果，提取并整理以下关键信息，用清晰易懂的语言重新表述，方便老年人阅读理解：
+
+1. 药品名称
+2. 药品适应症（这个药治什么病）
+3. 药品的用法与用量（怎么吃、吃多少）
+4. 药品的禁忌（什么人不能吃、什么情况不能吃）
+5. 药品的不良反应（吃药后可能出现的不舒服）
+
+要求：
+- 只输出整理后的关键信息，不要重复或复述 OCR 原文
+- 用简洁、通俗的语言回答，避免使用专业术语
+- 不要使用表情符号、emoji
+- 不要使用markdown格式符号（如#、**、-等），直接用纯文本输出
+- 用自然流畅的口语化表达，方便语音播报
+- 总字数控制在 {max_new_tokens} 字以内"""
+
+        prompts = [prompt_text]
+        sampling_params = SamplingParams(
+            temperature=0.8, top_p=0.95, max_tokens=max_new_tokens,
+        )
+
+        print(f"[LLM Worker] Generating response (max_new_tokens={max_new_tokens})...")
+        gen_start = time.perf_counter()
+        outputs = llm_model.generate(prompts, sampling_params)
+        result = outputs[0].outputs.text
+        gen_elapsed = time.perf_counter() - gen_start
+
+        # Clean result
+        result = clean_for_tts(result)
+        print(f"[LLM Worker] Extraction done, gen elapsed: {gen_elapsed:.2f}s, result length: {len(result)}")
+
+        # Put result in queue
+        result_queue.put(("success", result))
+
+        # Clean up
+        del llm_model
+        import gc
+        gc.collect()
+        print("[LLM Worker] LLM model released")
+
+    except Exception as e:
+        import traceback
+        result_queue.put(("error", str(e) + "\n" + traceback.format_exc()))
+
+
+def llm_step(
+    llm_model_dir,
+    ocr_text,
+    max_new_tokens=1024,
+    tensor_parallel_size=2,
+):
+    """Execute the LLM extraction step in a subprocess."""
+    step_start = time.perf_counter()
+    logger.info("[LLM Step] LLM extraction...")
+
+    # Create subprocess for LLM
+    logger.info("[LLM Step] Starting LLM subprocess...")
+    result_queue = Queue()
+    llm_process = Process(
+        target=llm_worker_process,
+        args=(str(llm_model_dir), ocr_text, max_new_tokens, result_queue, tensor_parallel_size)
+    )
+    llm_process.start()
+
+    # Wait for result
+    status, result = result_queue.get()
+    llm_process.join()
+    llm_process.close()
+
+    if status == "error":
+        logger.error("[LLM Step] LLM subprocess failed: %s", result)
+        raise RuntimeError(f"LLM subprocess failed: {result}")
+
+    extracted_info = result
+    logger.info("[LLM Step] LLM extraction done, result length: %d, elapsed: %.2fs", len(extracted_info), time.perf_counter() - step_start)
+
+    return {"extracted_info": extracted_info}
+
+
+# ============================================================================
+# TTS module (subprocess)
+# ============================================================================
+
+def tts_worker_process(text, output_path, result_queue, reduce_volume=False):
+    """Worker function for TTS subprocess - loads model, synthesizes speech, returns result."""
+    try:
+        import time
+        import subprocess as _sp
+        from paddlespeech.cli.tts.infer import TTSExecutor
+        from scipy.io.wavfile import read as wav_read
+
+        # Load TTS model
+        print("[TTS Worker] Loading TTS model (PaddleSpeech)...")
+        start = time.perf_counter()
+        tts_model = TTSExecutor()
+        elapsed = time.perf_counter() - start
+        print(f"[TTS Worker] TTS model loaded, elapsed: {elapsed:.2f}s")
+
+        # Synthesize speech
+        print(f"[TTS Worker] Synthesis start, input text length: {len(text)}")
+        tts_model(text=text, output=output_path)
+
+        if reduce_volume:
+            temp_path = output_path + ".tmp.wav"
+            os.rename(output_path, temp_path)
+            print("[TTS Worker] Reducing volume by -90dB via ffmpeg...")
+            _sp.run(
+                ["ffmpeg", "-i", temp_path, "-af", "volume=-90dB", output_path],
+                check=True, capture_output=True,
+            )
+            os.remove(temp_path)
+            print("[TTS Worker] Volume reduction done")
+
+        # Read audio data
+        sr, wav_data = wav_read(output_path)
+
+        if wav_data is not None:
+            audio_duration = len(wav_data) / sr
+            print(f"[TTS Worker] Synthesis done, audio duration: {audio_duration:.2f}s, sample rate: {sr} Hz")
+            result_queue.put(("success", (sr, wav_data.tolist())))  # Convert to list for serialization
+        else:
+            print("[TTS Worker] Synthesis failed")
+            result_queue.put(("error", "TTS synthesis failed"))
+
+        # Clean up
+        del tts_model
+        import gc
+        gc.collect()
+        print("[TTS Worker] TTS model released")
+
+    except Exception as e:
+        import traceback
+        result_queue.put(("error", str(e) + "\n" + traceback.format_exc()))
+
+
+def tts_step(
+    text,
+    output_path="output.wav",
+    reduce_volume=False,
+):
+    """Execute the TTS synthesis step in a subprocess."""
+    step_start = time.perf_counter()
+    logger.info("[TTS Step] TTS synthesis...")
+
+    # Create subprocess for TTS
+    logger.info("[TTS Step] Starting TTS subprocess...")
+    result_queue = Queue()
+    tts_process = Process(
+        target=tts_worker_process,
+        args=(text, output_path, result_queue, reduce_volume)
+    )
+    tts_process.start()
+
+    # Wait for result
+    status, result = result_queue.get()
+    tts_process.join()
+    tts_process.close()
+
+    if status == "error":
+        logger.error("[TTS Step] TTS subprocess failed: %s", result)
+        logger.warning("[TTS Step] TTS synthesis failed")
+        return {"audio": None}
+
+    sr, wav_data_list = result
+    wav_data = np.array(wav_data_list, dtype=np.int16)  # Convert back from list
+
+    audio_duration = len(wav_data) / sr
+    logger.info("[TTS Step] TTS synthesis done, audio duration: %.2fs, elapsed: %.2fs", audio_duration, time.perf_counter() - step_start)
+
+    return {"audio": (sr, wav_data)}
+
+
+# ============================================================================
+# Pipeline
+# ============================================================================
+
+def drug_ocr_pipeline(
+    ocr_model_dir,
+    llm_model_dir,
+    image_path,
+    enable_split=True,
+    num_splits=4,
+    overlap_ratio=0.1,
+    ocr_max_new_tokens=5120,
+    llm_max_new_tokens=1024,
+    tensor_parallel_size=2,
+    reduce_volume=False,
+):
+    """Drug instruction leaflet intelligent recognition and voice broadcast pipeline.
+
+    Uses subprocess for each model to ensure proper memory cleanup.
+    """
+    pipeline_start = time.perf_counter()
+    logger.info("=" * 60)
+    logger.info("Drug OCR pipeline started (subprocess mode)")
+    logger.info("  Image path: %s", image_path)
+    logger.info("  Image split: %s (num_splits=%d, overlap=%.2f)", enable_split, num_splits, overlap_ratio)
+    logger.info("=" * 60)
+
+    result = {}
+
+    # Step 1: OCR (runs in subprocess, automatically cleaned up)
+    ocr_result = ocr_step(
+        ocr_model_dir=ocr_model_dir,
+        image_path=image_path,
+        enable_split=enable_split,
+        num_splits=num_splits,
+        overlap_ratio=overlap_ratio,
+        max_new_tokens=ocr_max_new_tokens,
+    )
+    result["ocr_text"] = ocr_result["ocr_text"]
+
+    # Step 2: LLM extraction (runs in subprocess, automatically cleaned up)
+    llm_result = llm_step(
+        llm_model_dir=llm_model_dir,
+        ocr_text=ocr_result["ocr_text"],
+        max_new_tokens=llm_max_new_tokens,
+        tensor_parallel_size=tensor_parallel_size,
+    )
+    result["extracted_info"] = llm_result["extracted_info"]
+
+    # Step 3: TTS synthesis (runs in subprocess, automatically cleaned up)
+    tts_result = tts_step(
+        text=llm_result["extracted_info"],
+        reduce_volume=reduce_volume,
+    )
+    result["audio"] = tts_result["audio"]
+
+    pipeline_elapsed = time.perf_counter() - pipeline_start
+    logger.info("=" * 60)
+    logger.info("Pipeline complete, total elapsed: %.2fs", pipeline_elapsed)
+    logger.info("=" * 60)
+
+    return result
+
+
+# ============================================================================
+# CLI entry point
+# ============================================================================
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Drug instruction leaflet intelligent recognition and voice broadcast pipeline",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""Examples:
+  python drug_ocr_cli.py --image resource/1.jpg
+  python drug_ocr_cli.py --image resource/1.jpg --no-split
+  python drug_ocr_cli.py --image resource/1.jpg --num-splits 9 --overlap 0.15
+  python drug_ocr_cli.py --image resource/1.jpg --ocr-tokens 5120 --llm-tokens 1024
+""",
+    )
+    parser.add_argument("--image", required=True, help="Path to the drug instruction leaflet image")
+    parser.add_argument("--ocr-model", default="baidu/PaddleOCR-VL-1.5", help="OCR model directory (default: baidu/PaddleOCR-VL-1.5)")
+    parser.add_argument("--llm-model", default="baidu/ERNIE-4.5-0.3B-Paddle", help="LLM model directory (default: baidu/ERNIE-4.5-0.3B-Paddle)")
+    parser.add_argument("--no-split", dest="enable_split", action="store_false", help="Disable image splitting")
+    parser.add_argument("--num-splits", type=int, default=4, choices=[4, 9, 16], help="Number of image splits (must be perfect square, default: 4)")
+    parser.add_argument("--overlap", type=float, default=0.1, help="Overlap ratio for image splits (default: 0.1)")
+    parser.add_argument("--ocr-tokens", type=int, default=5120, help="OCR max new tokens (default: 5120)")
+    parser.add_argument("--llm-tokens", type=int, default=1024, help="LLM max new tokens (default: 1024)")
+    parser.add_argument("--tensor-parallel-size", type=int, default=2, choices=[1, 2], help="Tensor parallel size for LLM (default: 2)")
+    parser.add_argument("--reduce-volume", action="store_true", help="Apply ffmpeg volume=-90dB to TTS output audio")
+    parser.add_argument("--output-audio", default=None, help="Output audio file path (default: output.wav in current directory)")
+    parser.add_argument("--output-text", default=None, help="Output extracted text file path (default: print to stdout only)")
+
+    args = parser.parse_args()
+
+    # Configure logging
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(asctime)s [%(name)s] %(levelname)s: %(message)s",
+        datefmt="%H:%M:%S",
+    )
+
+    # Validate image path
+    if not os.path.isfile(args.image):
+        print(f"Error: Image file not found: {args.image}", file=sys.stderr)
+        sys.exit(1)
+
+    # Validate model directories
+    # if not Path(args.ocr_model).exists():
+    #     print(f"Error: OCR model directory not found: {args.ocr_model}", file=sys.stderr)
+    #     sys.exit(1)
+    # if not Path(args.llm_model).exists():
+    #     print(f"Error: LLM model directory not found: {args.llm_model}", file=sys.stderr)
+    #     sys.exit(1)
+
+    # Run pipeline
+    result = drug_ocr_pipeline(
+        ocr_model_dir=args.ocr_model,
+        llm_model_dir=args.llm_model,
+        image_path=args.image,
+        enable_split=args.enable_split,
+        num_splits=args.num_splits,
+        overlap_ratio=args.overlap,
+        ocr_max_new_tokens=args.ocr_tokens,
+        llm_max_new_tokens=args.llm_tokens,
+        tensor_parallel_size=args.tensor_parallel_size,
+        reduce_volume=args.reduce_volume,
+    )
+
+    # Print results
+    print("\n" + "=" * 60)
+    print("OCR Result:")
+    print("=" * 60)
+    print(result["ocr_text"])
+
+    print("\n" + "=" * 60)
+    print("Extracted Info:")
+    print("=" * 60)
+    print(result["extracted_info"])
+
+    # Save extracted text if requested
+    if args.output_text:
+        with open(args.output_text, "w", encoding="utf-8") as f:
+            f.write(result["extracted_info"])
+        print(f"\nExtracted text saved to: {args.output_text}")
+
+    # Save audio
+    if result["audio"] is not None:
+        sr, wav_data = result["audio"]
+        audio_path = args.output_audio or "output.wav"
+        wav_write(audio_path, sr, wav_data)  # data is already int16 PCM; an unscaled float32 cast would clip
+        audio_duration = len(wav_data) / sr
+        print(f"\nAudio saved to: {audio_path} (duration: {audio_duration:.2f}s)")
+    else:
+        print("\nTTS synthesis failed, no audio output.")
+
+
+if __name__ == "__main__":
+    main()
+
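The grid arithmetic in `split_image` can be checked without loading an image. The sketch below reproduces only the crop-box computation; `grid_crop_boxes` is a hypothetical helper, not part of drug_ocr_cli.py, and it assumes the same clamp-then-truncate rounding as the function above.

```python
import math


def grid_crop_boxes(width, height, num_splits=4, overlap_ratio=0.1):
    """Return (left, upper, right, lower) boxes for an NxN grid with overlap."""
    grid = math.isqrt(num_splits)
    if grid * grid != num_splits:
        raise ValueError(f"num_splits must be a perfect square, got: {num_splits}")

    cell_w, cell_h = width / grid, height / grid
    pad_w, pad_h = cell_w * overlap_ratio, cell_h * overlap_ratio

    boxes = []
    for row in range(grid):
        for col in range(grid):
            # Grow each cell by the overlap, then clamp to the image bounds.
            boxes.append((
                int(max(0, col * cell_w - pad_w)),
                int(max(0, row * cell_h - pad_h)),
                int(min(width, (col + 1) * cell_w + pad_w)),
                int(min(height, (row + 1) * cell_h + pad_h)),
            ))
    return boxes


boxes = grid_crop_boxes(100, 100, num_splits=4, overlap_ratio=0.1)
print(boxes)
```

Border cells gain overlap on their inner edges only, because the outer edges are clamped to the image. Tiles along the border are therefore slightly smaller than interior tiles once the grid is 3x3 or larger.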
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_ok.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_ok.png
new file mode 100644
index 00000000..d1b3e20f
Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_ok.png differ
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_prompt.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_prompt.png
new file mode 100644
index 00000000..d2efd019
Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_prompt.png differ
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/ernie_28b.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/ernie_28b.png
new file mode 100644
index 00000000..07a5aadc
Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/ernie_28b.png differ
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook.png
new file mode 100644
index 00000000..1363c70a
Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook.png differ
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_input.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_input.png
new file mode 100644
index 00000000..84853809
Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_input.png differ
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_output.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_output.png
new file mode 100644
index 00000000..d2f6e2af
Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_output.png differ
diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/medical_pipeline_20260503.ipynb b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/medical_pipeline_20260503.ipynb
new file mode 100644
index 00000000..ec34bd1e
--- /dev/null
+++ b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/medical_pipeline_20260503.ipynb
@@ -0,0 +1,1512 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a1b2c3d4",
+   "metadata": {},
+   "source": [
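+    "# Drug Leaflet Intelligent Recognition and Voice Broadcast System\n",
+    "\n",
+    "## Project Description\n",
+    "\n",
+    "Drug leaflets are printed in small type that elderly readers struggle to see and understand. This project recognizes and extracts the key content of a leaflet and reads it aloud in three steps:\n",
+    "\n",
+    "1. **OCR**: Use the PaddleOCR-VL-1.5 model to recognize the text in the leaflet image\n",
+    "2. **LLM summarization**: Use the ERNIE-4.5 model to organize the recognized text and extract the key information\n",
+    "3. **Speech synthesis**: Use a PaddleSpeech speech synthesis model to convert the organized text into an audio file\n",
+    "\n",
+    "### Key information extracted:\n",
+    "1. Drug name\n",
+    "2. Indications\n",
+    "3. Dosage and administration\n",
+    "4. Contraindications\n",
+    "5. Adverse reactions\n",
+    "\n",
+    "### Tech stack:\n",
+    "- OCR: PaddleOCR-VL-1.5\n",
+    "- LLM: ERNIE-4.5-0.3B-Paddle\n",
+    "- TTS: PaddleSpeech bert-base-chinese\n",
+    "\n",
+    "### Memory optimization (subprocess mode):\n",
+    "To make sure memory is fully released, the system runs each model in **subprocess mode**:\n",
+    "- Each model is loaded and executed in its own subprocess\n",
+    "- The subprocess is destroyed when it finishes, so its memory is fully released\n",
+    "- The main process only passes data and controls the flow; it loads no models\n",
+    "- For example: OCR runs in a subprocess, that subprocess is destroyed when it finishes, and then the LLM subprocess starts\n",
+    "\n",
+    "#### Contents:\n",
+    "- [Model download and check](#模型下载与检查)\n",
+    "- [Generation parameter settings](#生成参数设置)\n",
+    "- [OCR module](#OCR-模块)\n",
+    "- [LLM module](#LLM-模块)\n",
+    "- [TTS module](#TTS-模块)\n",
+    "- [Pipeline orchestration and model management](#管线编排与模型管理)\n",
+    "- [Main flow](#主流程)\n",
+    "- [Gradio interactive interface](#Gradio-交互界面)"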
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "088dfe7b-8df9-47d3-b94d-70db4eb1a2a9",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:08:00.811267Z",
+     "iopub.status.busy": "2026-05-03T06:08:00.811134Z"
+    }
+   },
+   "source": [
+    "%pip install -r requirements.txt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "bdb8d7d5",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:11:06.332147Z",
+     "iopub.status.busy": "2026-05-03T06:11:06.332023Z",
+     "iopub.status.idle": "2026-05-03T06:11:11.425006Z",
+     "shell.execute_reply": "2026-05-03T06:11:11.423507Z",
+     "shell.execute_reply.started": "2026-05-03T06:11:06.332128Z"
+    },
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Found existing installation: opencc-python-reimplemented 0.1.6\r\n",
+      "Uninstalling opencc-python-reimplemented-0.1.6:\r\n",
+      "  Successfully uninstalled opencc-python-reimplemented-0.1.6\r\n",
+      "Note: you may need to restart the kernel to use updated packages.\r\n",
+      "Looking in indexes: http://mirrors.baidubce.com/pypi/simple/\r\n",
+      "Collecting opencc-python-reimplemented==0.1.6\r\n",
+      "  Using cached opencc_python_reimplemented-0.1.6-py2.py3-none-any.whl\r\n",
+      "Installing collected packages: opencc-python-reimplemented\r\n",
+      "Successfully installed opencc-python-reimplemented-0.1.6\r\n",
+      "Note: you may need to restart the kernel to use updated packages.\r\n",
+      "Found existing installation: aistudio-sdk 0.3.8\r\n",
+      "Uninstalling aistudio-sdk-0.3.8:\r\n",
+      "  Successfully uninstalled aistudio-sdk-0.3.8\r\n",
+      "Note: you may need to restart the kernel to use updated packages.\r\n",
+      "Looking in indexes: http://mirrors.baidubce.com/pypi/simple/\r\n",
+      "Collecting aistudio-sdk==0.3.8\r\n",
+      "  Using cached http://mirrors.baidubce.com/pypi/packages/cb/77/cd71a481bb7a76b0e9d0b6bf47711c627b1dd079001ea246893f19a9d04c/aistudio_sdk-0.3.8-py3-none-any.whl (62 kB)\r\n",
+      "Requirement already satisfied: psutil in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (7.2.1)\r\n",
+      "Requirement already satisfied: requests in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (2.32.5)\r\n",
+      "Requirement already satisfied: tqdm in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (4.67.1)\r\n",
+      "Requirement already satisfied: bce-python-sdk in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (0.9.59)\r\n",
+      "Requirement already satisfied: prettytable in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (3.17.0)\r\n",
+      "Requirement already satisfied: click in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (8.3.1)\r\n",
+      "Requirement already satisfied: pycryptodome>=3.8.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from bce-python-sdk->aistudio-sdk==0.3.8) (3.23.0)\r\n",
+      "Requirement already satisfied: future>=0.6.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from bce-python-sdk->aistudio-sdk==0.3.8) (1.0.0)\r\n",
+      "Requirement already satisfied: six>=1.4.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from bce-python-sdk->aistudio-sdk==0.3.8) (1.17.0)\r\n",
+      "Requirement already satisfied: wcwidth in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from prettytable->aistudio-sdk==0.3.8) (0.2.14)\r\n",
+      "Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (3.4.4)\r\n",
+      "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (3.11)\r\n",
+      "Requirement already satisfied: urllib3<3,>=1.21.1 in ./external-libraries/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (1.26.20)\r\n",
+      "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (2026.1.4)\r\n",
+      "Installing collected packages: aistudio-sdk\r\n",
+      "\u001b[33m  WARNING: The script aistudio is installed in '/home/aistudio/external-libraries/bin' which is not on PATH.\r\n",
+      "  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\u001b[33m\r\n",
+      "\u001b[0mSuccessfully installed aistudio-sdk-0.3.8\r\n",
+      "Note: you may need to restart the kernel to use updated packages.\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "%pip uninstall opencc-python-reimplemented -y\n",
+    "%pip install opencc-python-reimplemented==0.1.6\n",
+    "%pip uninstall aistudio-sdk -y\n",
+    "%pip install aistudio-sdk==0.3.8\n",
+    "# PaddleSpeech uses aistudio-sdk 0.2.6; paddlenlp must be patched to work with 0.3.8"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "71b16cd1",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:11:11.426579Z",
+     "iopub.status.busy": "2026-05-03T06:11:11.426259Z",
+     "iopub.status.idle": "2026-05-03T06:11:11.436812Z",
+     "shell.execute_reply": "2026-05-03T06:11:11.435719Z",
+     "shell.execute_reply.started": "2026-05-03T06:11:11.426550Z"
+    },
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "File already patched.\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "\"\"\"Patch script to fix aistudio_sdk import in paddlenlp.\n",
+    "\n",
+    "Uses importlib.util.find_spec to locate paddlenlp WITHOUT importing it,\n",
+    "so this can be run before paddlenlp is imported to prevent the ImportError.\n",
+    "\"\"\"\n",
+    "\n",
+    "import importlib.util\n",
+    "import os\n",
+    "import subprocess\n",
+    "\n",
+    "\n",
+    "def _find_paddlenlp_dir():\n",
+    "    # Method 1: find_spec (no import, just metadata)\n",
+    "    spec = importlib.util.find_spec(\"paddlenlp\")\n",
+    "    if spec and spec.origin:\n",
+    "        return os.path.dirname(spec.origin)\n",
+    "\n",
+    "    # Method 2: pip show as fallback\n",
+    "    result = subprocess.run(\n",
+    "        [\"pip\", \"show\", \"paddlenlp\"],\n",
+    "        capture_output=True, text=True,\n",
+    "    )\n",
+    "    for line in result.stdout.splitlines():\n",
+    "        if line.startswith(\"Location:\"):\n",
+    "            return os.path.join(line.split(\":\", 1)[1].strip(), \"paddlenlp\")\n",
+    "\n",
+    "    raise RuntimeError(\"Cannot locate paddlenlp installation directory\")\n",
+    "\n",
+    "\n",
+    "def patch_aistudio_utils():\n",
+    "    pkg_dir = _find_paddlenlp_dir()\n",
+    "    target_file = os.path.join(pkg_dir, \"transformers\", \"aistudio_utils.py\")\n",
+    "\n",
+    "    if not os.path.isfile(target_file):\n",
+    "        raise FileNotFoundError(f\"Target file not found: {target_file}\")\n",
+    "\n",
+    "    old_line = \"from aistudio_sdk.hub import download\"\n",
+    "    new_line = \"from aistudio_sdk import snapshot_download as download\"\n",
+    "\n",
+    "    with open(target_file, \"r\", encoding=\"utf-8\") as f:\n",
+    "        content = f.read()\n",
+    "\n",
+    "    if old_line not in content:\n",
+    "        if new_line in content:\n",
+    "            print(\"File already patched.\")\n",
+    "        else:\n",
+    "            print(f\"Target import not found in {target_file}\")\n",
+    "        return\n",
+    "\n",
+    "    patched = content.replace(old_line, new_line)\n",
+    "\n",
+    "    with open(target_file, \"w\", encoding=\"utf-8\") as f:\n",
+    "        f.write(patched)\n",
+    "\n",
+    "    print(f\"Patched: {target_file}\")\n",
+    "    print(f\"  {old_line}  =>  {new_line}\")\n",
+    "\n",
+    "\n",
+    "patch_aistudio_utils()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c9d0e1f2",
+   "metadata": {},
+   "source": [
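+    "## Model download and check\n",
+    "[Back to contents ⬆️](#目录：)\n",
+    "\n",
+    "Download the OCR and LLM models from AIStudio (skipped if the local directory already exists). The TTS model is downloaded automatically on first use.\n",
+    "\n",
+    "> **Note**: This step only downloads and checks the models; it **does not load them into memory**. Models are loaded on demand when the pipeline runs."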
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "a3b4c5d6",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:14:12.765664Z",
+     "iopub.status.busy": "2026-05-03T06:14:12.765530Z",
+     "iopub.status.idle": "2026-05-03T06:14:12.771812Z",
+     "shell.execute_reply": "2026-05-03T06:14:12.770795Z",
+     "shell.execute_reply.started": "2026-05-03T06:14:12.765642Z"
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "OCR 模型已存在: baidu/PaddleOCR-VL-1.5，跳过下载\r\n",
+      "LLM 模型已存在: baidu/ERNIE-4.5-0.3B-Paddle，跳过下载\r\n",
+      "TTS 模型将在首次使用时自动下载\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "from pathlib import Path\n",
+    "import subprocess\n",
+    "\n",
+    "# --- OCR model ---\n",
+    "ocr_model_dir = Path(\"baidu/PaddleOCR-VL-1.5\")\n",
+    "\n",
+    "if not ocr_model_dir.exists():\n",
+    "    subprocess.run([\"aistudio\", \"download\", \"--model\", \"PaddlePaddle/PaddleOCR-VL-1.5\", \"--local_dir\", str(ocr_model_dir)], check=True)\n",
+    "    print(f\"OCR 模型已下载到: {ocr_model_dir}\")\n",
+    "else:\n",
+    "    print(f\"OCR 模型已存在: {ocr_model_dir}，跳过下载\")\n",
+    "\n",
+    "# --- LLM model ---\n",
+    "llm_model_dir = Path(\"baidu/ERNIE-4.5-0.3B-Paddle\")\n",
+    "\n",
+    "if not llm_model_dir.exists():\n",
+    "    subprocess.run([\"aistudio\", \"download\", \"--model\", \"PaddlePaddle/ERNIE-4.5-0.3B-Paddle\", \"--local_dir\", str(llm_model_dir)], check=True)\n",
+    "    print(f\"LLM 模型已下载到: {llm_model_dir}\")\n",
+    "else:\n",
+    "    print(f\"LLM 模型已存在: {llm_model_dir}，跳过下载\")\n",
+    "\n",
+    "# --- TTS model ---\n",
+    "# PaddleSpeech (bert-base-chinese) is downloaded automatically on first use\n",
+    "print(\"TTS 模型将在首次使用时自动下载\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e7f8a9b0",
+   "metadata": {},
+   "source": [
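+    "## Generation parameters\n",
+    "[Back to contents ⬆️](#目录：)\n",
+    "\n",
+    "Set each model's `max_new_tokens`, the maximum number of tokens it may generate:\n",
+    "- **OCR max_new_tokens**: maximum generation length when PaddleOCR-VL recognizes text; increase it for long leaflets\n",
+    "- **LLM max_new_tokens**: maximum generation length when ERNIE extracts information; increase it for a more detailed summary"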
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "c1d2e3f4",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:14:12.773035Z",
+     "iopub.status.busy": "2026-05-03T06:14:12.772880Z",
+     "iopub.status.idle": "2026-05-03T06:14:12.777084Z",
+     "shell.execute_reply": "2026-05-03T06:14:12.775954Z",
+     "shell.execute_reply.started": "2026-05-03T06:14:12.773015Z"
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "OCR max_new_tokens: 200\r\n",
+      "LLM max_new_tokens: 200\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Maximum number of tokens generated by OCR (increase for long leaflets; default 5120)\n",
+    "ocr_max_new_tokens = 200\n",
+    "\n",
+    "# Maximum number of tokens generated by the LLM (increase for a more detailed summary; default 1024)\n",
+    "llm_max_new_tokens = 200\n",
+    "\n",
+    "print(f\"OCR max_new_tokens: {ocr_max_new_tokens}\")\n",
+    "print(f\"LLM max_new_tokens: {llm_max_new_tokens}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "md_ocr_module",
+   "metadata": {},
+   "source": [
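+    "## OCR module\n",
+    "[Back to contents ⬆️](#目录：)\n",
+    "\n",
+    "Contains image splitting, the OCR subprocess worker function, and `ocr_step`, which can be run on its own.\n",
+    "\n",
+    "**Subprocess mode**: the OCR model is loaded and run in a separate subprocess, which is destroyed when it finishes so that its memory is fully released."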
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "code_ocr_module",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:14:12.777863Z",
+     "iopub.status.busy": "2026-05-03T06:14:12.777725Z",
+     "iopub.status.idle": "2026-05-03T06:14:12.993437Z",
+     "shell.execute_reply": "2026-05-03T06:14:12.992174Z",
+     "shell.execute_reply.started": "2026-05-03T06:14:12.777846Z"
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "✅ OCR 模块定义完成 (子进程模式)\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "import base64\n",
+    "import gc\n",
+    "import io\n",
+    "import logging\n",
+    "import math\n",
+    "import time\n",
+    "import multiprocessing as mp\n",
+    "from multiprocessing import Process, Queue\n",
+    "\n",
+    "from PIL import Image\n",
+    "\n",
+    "logger = logging.getLogger(\"drug_ocr\")\n",
+    "\n",
+    "\n",
+    "# ---- Image splitting ----\n",
+    "\n",
+    "def split_image(image, num_splits=4, overlap_ratio=0.1):\n",
+    "    \"\"\"Split an image into num_splits parts (NxN grid) with overlap.\"\"\"\n",
+    "    grid_size = int(math.sqrt(num_splits))\n",
+    "    if grid_size * grid_size != num_splits:\n",
+    "        raise ValueError(f\"num_splits must be a perfect square (e.g. 4, 9, 16), got: {num_splits}\")\n",
+    "\n",
+    "    w, h = image.size\n",
+    "    cell_w = w / grid_size\n",
+    "    cell_h = h / grid_size\n",
+    "    overlap_w = cell_w * overlap_ratio\n",
+    "    overlap_h = cell_h * overlap_ratio\n",
+    "\n",
+    "    sub_images = []\n",
+    "    for row in range(grid_size):\n",
+    "        for col in range(grid_size):\n",
+    "            left = max(0, col * cell_w - overlap_w)\n",
+    "            upper = max(0, row * cell_h - overlap_h)\n",
+    "            right = min(w, (col + 1) * cell_w + overlap_w)\n",
+    "            lower = min(h, (row + 1) * cell_h + overlap_h)\n",
+    "            sub_img = image.crop((int(left), int(upper), int(right), int(lower)))\n",
+    "            sub_images.append(sub_img)\n",
+    "\n",
+    "    return sub_images\n",
+    "\n",
+    "\n",
+    "# ---- OCR subprocess worker function ----\n",
+    "\n",
+    "def ocr_worker_process(ocr_model_dir, image_data_list, max_new_tokens, result_queue):\n",
+    "    \"\"\"Worker function for OCR subprocess - loads model, performs OCR, returns result.\"\"\"\n",
+    "    try:\n",
+    "        import time\n",
+    "        import base64\n",
+    "        import io\n",
+    "        from PIL import Image\n",
+    "        from fastdeploy import LLM, SamplingParams\n",
+    "\n",
+    "        # Load OCR model\n",
+    "        print(\"[OCR Worker] 加载 OCR 模型 (PaddleOCR-VL)...\")\n",
+    "        start = time.perf_counter()\n",
+    "        ocr_model = LLM(\n",
+    "            model=ocr_model_dir,\n",
+    "            tensor_parallel_size=1,\n",
+    "            max_model_len=8192,\n",
+    "            block_size=16,\n",
+    "            quantization=\"wint8\",\n",
+    "            graph_optimization_config={\"use_cudagraph\": False},\n",
+    "        )\n",
+    "        elapsed = time.perf_counter() - start\n",
+    "        print(f\"[OCR Worker] OCR 模型加载完成, 耗时: {elapsed:.2f}s\")\n",
+    "\n",
+    "        # Process each image\n",
+    "        all_ocr_texts = []\n",
+    "        for i, img_bytes in enumerate(image_data_list):\n",
+    "            image = Image.open(io.BytesIO(img_bytes)).convert(\"RGB\")\n",
+    "            print(f\"[OCR Worker] 识别图片 {i+1}/{len(image_data_list)}, 尺寸: {image.size}\")\n",
+    "\n",
+    "            # Prepare image for OCR\n",
+    "            buf = io.BytesIO()\n",
+    "            image.save(buf, format=\"PNG\")\n",
+    "            base64_image = base64.b64encode(buf.getvalue()).decode(\"utf-8\")\n",
+    "            image_url = f\"data:image/png;base64,{base64_image}\"\n",
+    "\n",
+    "            prompts = [{\n",
+    "                \"messages\": [{\n",
+    "                    \"role\": \"user\",\n",
+    "                    \"content\": [\n",
+    "                        {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n",
+    "                        {\"type\": \"text\", \"text\": \"OCR:\"},\n",
+    "                    ],\n",
+    "                }]\n",
+    "            }]\n",
+    "            sampling_params = SamplingParams(\n",
+    "                temperature=0.8, top_p=0.95, max_tokens=max_new_tokens,\n",
+    "            )\n",
+    "            outputs = ocr_model.generate(prompts, sampling_params)\n",
+    "            response = outputs[0].outputs.text\n",
+    "            all_ocr_texts.append(response)\n",
+    "            print(f\"[OCR Worker] 图片 {i+1} 识别完成, 文字长度: {len(response)}\")\n",
+    "\n",
+    "        # Combine results\n",
+    "        combined_text = \"\\n\\n\".join(all_ocr_texts)\n",
+    "        print(f\"[OCR Worker] 全部识别完成, 总文字长度: {len(combined_text)}\")\n",
+    "\n",
+    "        # Put result in queue\n",
+    "        result_queue.put((\"success\", combined_text))\n",
+    "\n",
+    "        # Clean up\n",
+    "        del ocr_model\n",
+    "        import gc\n",
+    "        gc.collect()\n",
+    "        print(\"[OCR Worker] OCR 模型已释放\")\n",
+    "\n",
+    "    except Exception as e:\n",
+    "        import traceback\n",
+    "        result_queue.put((\"error\", str(e) + \"\\n\" + traceback.format_exc()))\n",
+    "\n",
+    "\n",
+    "# ---- Standalone OCR step (runs in a subprocess) ----\n",
+    "\n",
+    "def ocr_step(\n",
+    "    ocr_model_dir,\n",
+    "    image_path,\n",
+    "    enable_split=True,\n",
+    "    num_splits=4,\n",
+    "    overlap_ratio=0.1,\n",
+    "    max_new_tokens=5120,\n",
+    "):\n",
+    "    \"\"\"Execute the OCR step in a subprocess: load image, optionally split, and run OCR.\"\"\"\n",
+    "    step_start = time.perf_counter()\n",
+    "    logger.info(\"[OCR Step] 加载图片...\")\n",
+    "    image = Image.open(image_path).convert(\"RGB\")\n",
+    "    logger.info(\"[OCR Step] 图片加载完成, 尺寸: %s\", image.size)\n",
+    "\n",
+    "    if enable_split:\n",
+    "        logger.info(\"[OCR Step] 图片分割 (num_splits=%d, overlap=%.2f)...\", num_splits, overlap_ratio)\n",
+    "        sub_images = split_image(image, num_splits=num_splits, overlap_ratio=overlap_ratio)\n",
+    "        ocr_images = [image] + sub_images\n",
+    "        logger.info(\"[OCR Step] 图片分割完成, 原始1张 + 分割%d张 = 共%d张\", len(sub_images), len(ocr_images))\n",
+    "    else:\n",
+    "        logger.info(\"[OCR Step] 跳过图片分割\")\n",
+    "        ocr_images = [image]\n",
+    "\n",
+    "    # Serialize images to bytes for subprocess\n",
+    "    image_data_list = []\n",
+    "    for img in ocr_images:\n",
+    "        buf = io.BytesIO()\n",
+    "        img.save(buf, format=\"PNG\")\n",
+    "        image_data_list.append(buf.getvalue())\n",
+    "\n",
+    "    # Create subprocess for OCR\n",
+    "    logger.info(\"[OCR Step] 启动 OCR 子进程...\")\n",
+    "    result_queue = Queue()\n",
+    "    ocr_process = Process(\n",
+    "        target=ocr_worker_process,\n",
+    "        args=(str(ocr_model_dir), image_data_list, max_new_tokens, result_queue)\n",
+    "    )\n",
+    "    ocr_process.start()\n",
+    "\n",
+    "    # Wait for result\n",
+    "    status, result = result_queue.get()\n",
+    "    ocr_process.join()\n",
+    "    ocr_process.close()\n",
+    "\n",
+    "    if status == \"error\":\n",
+    "        logger.error(\"[OCR Step] OCR 子进程执行失败: %s\", result)\n",
+    "        raise RuntimeError(f\"OCR subprocess failed: {result}\")\n",
+    "\n",
+    "    combined_ocr_text = result\n",
+    "    logger.info(\"[OCR Step] OCR 识别全部完成, 总文字长度: %d, 耗时: %.2fs\", len(combined_ocr_text), time.perf_counter() - step_start)\n",
+    "\n",
+    "    return {\"ocr_text\": combined_ocr_text, \"ocr_images\": ocr_images}\n",
+    "\n",
+    "print(\"✅ OCR 模块定义完成 (子进程模式)\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "md_llm_module",
+   "metadata": {},
+   "source": [
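+    "## LLM module\n",
+    "[Back to contents ⬆️](#目录：)\n",
+    "\n",
+    "Contains text cleaning (`clean_for_tts`), the LLM subprocess worker function, and `llm_step`, which can be run on its own.\n",
+    "\n",
+    "**Subprocess mode**: the LLM is loaded and run in a separate subprocess, which is destroyed when it finishes so that its memory is fully released."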
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "code_llm_module",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:14:12.994715Z",
+     "iopub.status.busy": "2026-05-03T06:14:12.994434Z",
+     "iopub.status.idle": "2026-05-03T06:14:13.009434Z",
+     "shell.execute_reply": "2026-05-03T06:14:13.008415Z",
+     "shell.execute_reply.started": "2026-05-03T06:14:12.994692Z"
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "✅ LLM 模块定义完成 (子进程模式)\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "import re\n",
+    "from multiprocessing import Process, Queue\n",
+    "\n",
+    "\n",
+    "# ---- Text cleaning ----\n",
+    "\n",
+    "def clean_for_tts(text):\n",
+    "    \"\"\"Clean text for TTS synthesis by removing emojis and markdown formatting.\"\"\"\n",
+    "    # Remove emojis (Unicode ranges for common emojis)\n",
+    "    # NOTE: Must avoid ranges that overlap with CJK characters (U+4E00-U+9FFF)\n",
+    "    text = re.sub(\n",
+    "        r\"[\\U0001F600-\\U0001F64F\"  # emoticons\n",
+    "        r\"\\U0001F300-\\U0001F5FF\"   # symbols & pictographs\n",
+    "        r\"\\U0001F680-\\U0001F6FF\"   # transport & map\n",
+    "        r\"\\U0001F1E0-\\U0001F1FF\"   # flags\n",
+    "        r\"\\U00002702-\\U000027B0\"   # dingbats\n",
+    "        r\"\\U000024C2-\\U00002BFF\"   # enclosed alphanumerics through misc symbols and arrows (stops before CJK punctuation at U+3000)\n",
+    "        r\"\\U0001F200-\\U0001F251\"   # enclosed CJK supplement (above CJK range)\n",
+    "        r\"\\U0001F900-\\U0001F9FF\"   # supplemental symbols\n",
+    "        r\"\\U0001FA00-\\U0001FA6F\"   # chess symbols\n",
+    "        r\"\\U0001FA70-\\U0001FAFF\"   # symbols extended-A\n",
+    "        r\"\\U00002600-\\U000026FF\"   # misc symbols\n",
+    "        r\"\\U0000FE00-\\U0000FE0F\"   # variation selectors\n",
+    "        r\"\\U0000200D\"              # zero-width joiner\n",
+    "        r\"]+\",\n",
+    "        \"\",\n",
+    "        text,\n",
+    "    )\n",
+    "    # Remove markdown code blocks (```...```)\n",
+    "    text = re.sub(r\"```.*?```\", \"\", text, flags=re.DOTALL)\n",
+    "    # Remove inline code (`...`) -> content\n",
+    "    text = re.sub(r\"`([^`\\n]+)`\", r\"\\1\", text)\n",
+    "    # Remove markdown headers (# ## ### etc.) at line start\n",
+    "    text = re.sub(r\"^#{1,6}\\s+\", \"\", text, flags=re.MULTILINE)\n",
+    "    # Remove markdown bold (**text**) -> text\n",
+    "    text = re.sub(r\"\\*\\*([^*\\n]+?)\\*\\*\", r\"\\1\", text)\n",
+    "    # Remove markdown bold (__text__) -> text\n",
+    "    text = re.sub(r\"__([^_\\n]+?)__\", r\"\\1\", text)\n",
+    "    # Remove markdown italic (*text*) -> text\n",
+    "    text = re.sub(r\"\\*([^*\\n]+?)\\*\", r\"\\1\", text)\n",
+    "    # Remove markdown italic (_text_) -> text (only when _ is at word boundary)\n",
+    "    text = re.sub(r\"(?<!\\w)_([^_\\n]+?)_(?!\\w)\", r\"\\1\", text)\n",
+    "    # Remove markdown images ![alt](url); must run before the link rule, which would otherwise leave a stray !alt\n",
+    "    text = re.sub(r\"!\\[[^\\]]*\\]\\([^)]+\\)\", \"\", text)\n",
+    "    # Remove markdown links [text](url) -> text\n",
+    "    text = re.sub(r\"\\[([^\\]]+)\\]\\([^)]+\\)\", r\"\\1\", text)\n",
+    "    # Remove markdown horizontal rules (---, ***, ___)\n",
+    "    text = re.sub(r\"^[-*_]{3,}\\s*$\", \"\", text, flags=re.MULTILINE)\n",
+    "    # Remove markdown bullet list markers (- , * , + ) at line start, keep content\n",
+    "    text = re.sub(r\"^(\\s*)[-*+]\\s+\", r\"\\1\", text, flags=re.MULTILINE)\n",
+    "    # Remove markdown numbered list markers (1. 2. etc.) at line start, keep content\n",
+    "    text = re.sub(r\"^(\\s*)\\d+\\.\\s+\", r\"\\1\", text, flags=re.MULTILINE)\n",
+    "    # Remove markdown table pipes\n",
+    "    text = re.sub(r\"\\|\", \" \", text)\n",
+    "    # Remove markdown table separator lines (---:---:---)\n",
+    "    text = re.sub(r\"^[-: ]+$\", \"\", text, flags=re.MULTILINE)\n",
+    "    # Collapse multiple blank lines into one\n",
+    "    text = re.sub(r\"\\n{3,}\", \"\\n\\n\", text)\n",
+    "    # Strip leading/trailing whitespace per line\n",
+    "    lines = [line.strip() for line in text.splitlines()]\n",
+    "    text = \"\\n\".join(lines)\n",
+    "    # Remove leading/trailing whitespace overall\n",
+    "    text = text.strip()\n",
+    "    return text\n",
+    "\n",
+    "\n",
+    "# ---- LLM subprocess worker function ----\n",
+    "\n",
+    "def llm_worker_process(llm_model_dir, ocr_text, max_new_tokens, result_queue):\n",
+    "    \"\"\"Worker function for LLM subprocess - loads model, extracts info, returns result.\"\"\"\n",
+    "    try:\n",
+    "        import time\n",
+    "        from fastdeploy import LLM, SamplingParams\n",
+    "\n",
+    "        # Load LLM model\n",
+    "        print(\"[LLM Worker] 加载 LLM 模型 (ERNIE)...\")\n",
+    "        start = time.perf_counter()\n",
+    "        llm_model = LLM(\n",
+    "            model=llm_model_dir,\n",
+    "            tensor_parallel_size=1,\n",
+    "            max_model_len=8192,\n",
+    "            block_size=16,\n",
+    "            quantization=\"wint8\",\n",
+    "            graph_optimization_config={\"use_cudagraph\": False},\n",
+    "        )\n",
+    "        elapsed = time.perf_counter() - start\n",
+    "        print(f\"[LLM Worker] LLM 模型加载完成, 耗时: {elapsed:.2f}s\")\n",
+    "\n",
+    "        # Prepare prompt\n",
+    "        prompt_text = f\"\"\"以下是药品说明书的 OCR 识别结果，供参考：\n",
+    "\n",
+    "{ocr_text}\n",
+    "\n",
+    "请根据以上 OCR 识别结果，提取并整理以下关键信息，用清晰易懂的语言重新表述，方便老年人阅读理解：\n",
+    "\n",
+    "1. 药品名称\n",
+    "2. 药品适应症（这个药治什么病）\n",
+    "3. 药品的用法与用量（怎么吃、吃多少）\n",
+    "4. 药品的禁忌（什么人不能吃、什么情况不能吃）\n",
+    "5. 药品的不良反应（吃药后可能出现的不舒服）\n",
+    "\n",
+    "要求：\n",
+    "- 只输出整理后的关键信息，不要重复或复述 OCR 原文\n",
+    "- 用简洁、通俗的语言回答，避免使用专业术语\n",
+    "- 不要使用表情符号、emoji\n",
+    "- 不要使用markdown格式符号（如#、**、-等），直接用纯文本输出\n",
+    "- 用自然流畅的口语化表达，方便语音播报\n",
+    "- 总字数控制在 {max_new_tokens} 字以内\"\"\"\n",
+    "\n",
+    "\n",
+    "\n",
+    "        # TODO: temporary debug override that discards the prompt built above; remove after debugging\n",
+    "        prompt_text = \"你是谁\"\n",
+    "\n",
+    "        prompts = [prompt_text]\n",
+    "        sampling_params = SamplingParams(\n",
+    "            temperature=0.8, top_p=0.95, max_tokens=max_new_tokens,\n",
+    "        )\n",
+    "\n",
+    "        print(f\"[LLM Worker] 正在生成回复 (max_new_tokens={max_new_tokens})...\")\n",
+    "        gen_start = time.perf_counter()\n",
+    "        outputs = llm_model.generate(prompts, sampling_params)\n",
+    "        result = outputs[0].outputs.text\n",
+    "        gen_elapsed = time.perf_counter() - gen_start\n",
+    "\n",
+    "        # Clean result\n",
+    "        result = clean_for_tts(result)\n",
+    "        print(f\"[LLM Worker] 信息提取完成, 生成耗时: {gen_elapsed:.2f}s, 结果长度: {len(result)}\")\n",
+    "\n",
+    "        # TODO: debug print of the cleaned result; remove after debugging\n",
+    "        print(\">>>\", result)\n",
+    "\n",
+    "        # Put result in queue\n",
+    "        result_queue.put((\"success\", result))\n",
+    "\n",
+    "        # Clean up\n",
+    "        del llm_model\n",
+    "        import gc\n",
+    "        gc.collect()\n",
+    "        print(\"[LLM Worker] LLM 模型已释放\")\n",
+    "\n",
+    "    except Exception as e:\n",
+    "        import traceback\n",
+    "        result_queue.put((\"error\", str(e) + \"\\n\" + traceback.format_exc()))\n",
+    "\n",
+    "\n",
+    "# ---- Standalone LLM step (runs in a subprocess) ----\n",
+    "\n",
+    "def llm_step(\n",
+    "    llm_model_dir,\n",
+    "    ocr_text,\n",
+    "    max_new_tokens=1024,\n",
+    "):\n",
+    "    \"\"\"Execute the LLM extraction step in a subprocess.\"\"\"\n",
+    "    step_start = time.perf_counter()\n",
+    "    logger.info(\"[LLM Step] LLM 大模型信息提取...\")\n",
+    "\n",
+    "    # Create subprocess for LLM\n",
+    "    logger.info(\"[LLM Step] 启动 LLM 子进程...\")\n",
+    "    result_queue = Queue()\n",
+    "    llm_process = Process(\n",
+    "        target=llm_worker_process,\n",
+    "        args=(str(llm_model_dir), ocr_text, max_new_tokens, result_queue)\n",
+    "    )\n",
+    "    llm_process.start()\n",
+    "\n",
+    "    # Wait for result\n",
+    "    status, result = result_queue.get()\n",
+    "    llm_process.join()\n",
+    "    llm_process.close()\n",
+    "\n",
+    "    if status == \"error\":\n",
+    "        logger.error(\"[LLM Step] LLM 子进程执行失败: %s\", result)\n",
+    "        raise RuntimeError(f\"LLM subprocess failed: {result}\")\n",
+    "\n",
+    "    extracted_info = result\n",
+    "    logger.info(\"[LLM Step] LLM 信息提取完成, 结果长度: %d, 耗时: %.2fs\", len(extracted_info), time.perf_counter() - step_start)\n",
+    "\n",
+    "    return {\"extracted_info\": extracted_info}\n",
+    "\n",
+    "print(\"✅ LLM 模块定义完成 (子进程模式)\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "md_tts_module",
+   "metadata": {},
+   "source": [
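+    "## TTS module\n",
+    "[Back to contents ⬆️](#目录：)\n",
+    "\n",
+    "Contains the TTS subprocess worker function and `tts_step`, which can be run on its own.\n",
+    "\n",
+    "**Subprocess mode**: the TTS model is loaded and run in a separate subprocess, which is destroyed when it finishes so that its memory is fully released."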
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "code_tts_module",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:14:13.010352Z",
+     "iopub.status.busy": "2026-05-03T06:14:13.010203Z",
+     "iopub.status.idle": "2026-05-03T06:14:13.390794Z",
+     "shell.execute_reply": "2026-05-03T06:14:13.389438Z",
+     "shell.execute_reply.started": "2026-05-03T06:14:13.010334Z"
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "✅ TTS 模块定义完成 (子进程模式)\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "from multiprocessing import Process, Queue\n",
+    "from scipy.io.wavfile import read as wav_read\n",
+    "\n",
+    "\n",
+    "# ---- TTS subprocess worker function ----\n",
+    "\n",
+    "def tts_worker_process(text, output_path, result_queue):\n",
+    "    \"\"\"Worker function for TTS subprocess - loads model, synthesizes speech, returns result.\"\"\"\n",
+    "    try:\n",
+    "        import time\n",
+    "        from paddlespeech.cli.tts.infer import TTSExecutor\n",
+    "        from scipy.io.wavfile import read as wav_read\n",
+    "\n",
+    "        # Load TTS model\n",
+    "        print(\"[TTS Worker] 加载 TTS 模型 (PaddleSpeech)...\")\n",
+    "        start = time.perf_counter()\n",
+    "        tts_model = TTSExecutor()\n",
+    "        elapsed = time.perf_counter() - start\n",
+    "        print(f\"[TTS Worker] TTS 模型加载完成, 耗时: {elapsed:.2f}s\")\n",
+    "\n",
+    "        # Synthesize speech\n",
+    "        print(f\"[TTS Worker] 语音合成开始, 输入文字长度: {len(text)}\")\n",
+    "        tts_model(text=text, output=output_path)\n",
+    "\n",
+    "        # Read audio data\n",
+    "        sr, wav_data = wav_read(output_path)\n",
+    "\n",
+    "        if wav_data is not None:\n",
+    "            audio_duration = len(wav_data) / sr\n",
+    "            print(f\"[TTS Worker] 语音合成完成, 音频时长: {audio_duration:.2f}s, 采样率: {sr} Hz\")\n",
+    "            result_queue.put((\"success\", (sr, wav_data.tolist())))  # Convert to list for serialization\n",
+    "        else:\n",
+    "            print(\"[TTS Worker] 语音合成失败\")\n",
+    "            result_queue.put((\"error\", \"TTS synthesis failed\"))\n",
+    "\n",
+    "        # Clean up\n",
+    "        del tts_model\n",
+    "        import gc\n",
+    "        gc.collect()\n",
+    "        print(\"[TTS Worker] TTS 模型已释放\")\n",
+    "\n",
+    "    except Exception as e:\n",
+    "        import traceback\n",
+    "        result_queue.put((\"error\", str(e) + \"\\n\" + traceback.format_exc()))\n",
+    "\n",
+    "\n",
+    "# ---- Standalone TTS step (runs in a subprocess) ----\n",
+    "\n",
+    "def tts_step(\n",
+    "    text,\n",
+    "    output_path=\"output.wav\",\n",
+    "):\n",
+    "    \"\"\"Execute the TTS synthesis step in a subprocess.\"\"\"\n",
+    "    step_start = time.perf_counter()\n",
+    "    logger.info(\"[TTS Step] TTS 语音合成...\")\n",
+    "\n",
+    "    # Create subprocess for TTS\n",
+    "    logger.info(\"[TTS Step] 启动 TTS 子进程...\")\n",
+    "    result_queue = Queue()\n",
+    "    tts_process = Process(\n",
+    "        target=tts_worker_process,\n",
+    "        args=(text, output_path, result_queue)\n",
+    "    )\n",
+    "    tts_process.start()\n",
+    "\n",
+    "    # Wait for result\n",
+    "    status, result = result_queue.get()\n",
+    "    tts_process.join()\n",
+    "    tts_process.close()\n",
+    "\n",
+    "    if status == \"error\":\n",
+    "        logger.error(\"[TTS Step] TTS 子进程执行失败: %s\", result)\n",
+    "        logger.warning(\"[TTS Step] TTS 语音合成失败\")\n",
+    "        return {\"audio\": None}\n",
+    "\n",
+    "    sr, wav_data_list = result\n",
+    "    import numpy as np\n",
+    "    wav_data = np.array(wav_data_list, dtype=np.int16)  # Convert back from list\n",
+    "\n",
+    "    audio_duration = len(wav_data) / sr\n",
+    "    logger.info(\"[TTS Step] TTS 语音合成完成, 音频时长: %.2fs, 耗时: %.2fs\", audio_duration, time.perf_counter() - step_start)\n",
+    "\n",
+    "    return {\"audio\": (sr, wav_data)}\n",
+    "\n",
+    "print(\"✅ TTS 模块定义完成 (子进程模式)\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "md_orchestration",
+   "metadata": {},
+   "source": [
+    "## Pipeline orchestration\n",
+    "[Back to contents ⬆️](#目录：)\n",
+    "\n",
+    "`drug_ocr_pipeline` chains the three steps OCR → LLM → TTS, each run in its own subprocess; `make_demo` builds the Gradio interface."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "code_orchestration",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:14:13.674124Z",
+     "iopub.status.busy": "2026-05-03T06:14:13.673982Z",
+     "iopub.status.idle": "2026-05-03T06:14:16.051569Z",
+     "shell.execute_reply": "2026-05-03T06:14:16.050351Z",
+     "shell.execute_reply.started": "2026-05-03T06:14:13.674106Z"
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "✅ 管线编排与 Gradio 界面定义完成 (子进程模式)\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "import tempfile\n",
+    "\n",
+    "import numpy as np\n",
+    "import gradio as gr\n",
+    "from scipy.io.wavfile import write as wav_write\n",
+    "\n",
+    "\n",
+    "def drug_ocr_pipeline(\n",
+    "    ocr_model_dir,\n",
+    "    llm_model_dir,\n",
+    "    image_path,\n",
+    "    enable_split=True,\n",
+    "    num_splits=4,\n",
+    "    overlap_ratio=0.1,\n",
+    "    ocr_max_new_tokens=5120,\n",
+    "    llm_max_new_tokens=1024,\n",
+    "):\n",
+    "    \"\"\"Drug instruction leaflet intelligent recognition and voice broadcast pipeline.\n",
+    "\n",
+    "    Uses subprocess for each model to ensure proper memory cleanup.\n",
+    "    \"\"\"\n",
+    "    pipeline_start = time.perf_counter()\n",
+    "    logger.info(\"=\" * 60)\n",
+    "    logger.info(\"药品说明书识别管线启动 (子进程模式)\")\n",
+    "    logger.info(\"  图片路径: %s\", image_path)\n",
+    "    logger.info(\"  图片分割: %s (num_splits=%d, overlap=%.2f)\", enable_split, num_splits, overlap_ratio)\n",
+    "    logger.info(\"=\" * 60)\n",
+    "\n",
+    "    result = {}\n",
+    "\n",
+    "    # Step 1: OCR (runs in subprocess, automatically cleaned up)\n",
+    "    ocr_result = ocr_step(\n",
+    "        ocr_model_dir=ocr_model_dir,\n",
+    "        image_path=image_path,\n",
+    "        enable_split=enable_split,\n",
+    "        num_splits=num_splits,\n",
+    "        overlap_ratio=overlap_ratio,\n",
+    "        max_new_tokens=ocr_max_new_tokens,\n",
+    "    )\n",
+    "    result[\"ocr_text\"] = ocr_result[\"ocr_text\"]\n",
+    "\n",
+    "    # Step 2: LLM extraction (runs in subprocess, automatically cleaned up)\n",
+    "    llm_result = llm_step(\n",
+    "        llm_model_dir=llm_model_dir,\n",
+    "        ocr_text=ocr_result[\"ocr_text\"],\n",
+    "        max_new_tokens=llm_max_new_tokens,\n",
+    "    )\n",
+    "    result[\"extracted_info\"] = llm_result[\"extracted_info\"]\n",
+    "\n",
+    "    # Step 3: TTS synthesis (runs in subprocess, automatically cleaned up)\n",
+    "    tts_result = tts_step(\n",
+    "        text=llm_result[\"extracted_info\"],\n",
+    "    )\n",
+    "    result[\"audio\"] = tts_result[\"audio\"]\n",
+    "\n",
+    "    pipeline_elapsed = time.perf_counter() - pipeline_start\n",
+    "    logger.info(\"=\" * 60)\n",
+    "    logger.info(\"管线执行完成, 总耗时: %.2fs\", pipeline_elapsed)\n",
+    "    logger.info(\"=\" * 60)\n",
+    "\n",
+    "    return result\n",
+    "\n",
+    "\n",
+    "def make_demo(ocr_model_dir, llm_model_dir, ocr_max_new_tokens=5120, llm_max_new_tokens=1024):\n",
+    "    \"\"\"Create Gradio demo for Drug OCR Pipeline.\"\"\"\n",
+    "\n",
+    "    def gradio_pipeline(\n",
+    "        image_input,\n",
+    "        enable_split,\n",
+    "        num_splits,\n",
+    "        overlap_ratio,\n",
+    "        ocr_max_tokens,\n",
+    "        llm_max_tokens,\n",
+    "        progress=gr.Progress(track_tqdm=True),\n",
+    "    ):\n",
+    "        \"\"\"Gradio interface main processing function\"\"\"\n",
+    "        if image_input is None:\n",
+    "            return \"请上传药品说明书图片\", \"\", None\n",
+    "\n",
+    "        # Normalize the upload (file path, PIL image, or array) to an RGB PIL image,\n",
+    "        # so that saving as JPEG below cannot fail on an alpha channel\n",
+    "        if isinstance(image_input, str):\n",
+    "            image = Image.open(image_input)\n",
+    "        elif isinstance(image_input, Image.Image):\n",
+    "            image = image_input\n",
+    "        else:\n",
+    "            image = Image.fromarray(image_input)\n",
+    "        image = image.convert(\"RGB\")\n",
+    "\n",
+    "        # Save as temp file for pipeline\n",
+    "        with tempfile.NamedTemporaryFile(suffix=\".jpg\", delete=False) as tmp:\n",
+    "            image.save(tmp.name)\n",
+    "            tmp_path = tmp.name\n",
+    "\n",
+    "        try:\n",
+    "            result = drug_ocr_pipeline(\n",
+    "                ocr_model_dir=ocr_model_dir,\n",
+    "                llm_model_dir=llm_model_dir,\n",
+    "                image_path=tmp_path,\n",
+    "                enable_split=enable_split,\n",
+    "                num_splits=int(num_splits),\n",
+    "                overlap_ratio=overlap_ratio,\n",
+    "                ocr_max_new_tokens=int(ocr_max_tokens),\n",
+    "                llm_max_new_tokens=int(llm_max_tokens),\n",
+    "            )\n",
+    "\n",
+    "            ocr_text = result[\"ocr_text\"]\n",
+    "            extracted_info = result[\"extracted_info\"]\n",
+    "\n",
+    "            # Save audio as temp file\n",
+    "            audio_path = None\n",
+    "            if result[\"audio\"] is not None:\n",
+    "                sr, wav_data = result[\"audio\"]\n",
+    "                with tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False) as audio_tmp:\n",
+    "                    audio_path = audio_tmp.name\n",
+    "                # tts_step returns int16 PCM samples; write them unchanged, because casting to\n",
+    "                # float32 without rescaling to [-1.0, 1.0] yields an out-of-range float WAV\n",
+    "                wav_write(audio_path, sr, wav_data)\n",
+    "\n",
+    "            return ocr_text, extracted_info, audio_path\n",
+    "        finally:\n",
+    "            import os\n",
+    "            os.unlink(tmp_path)\n",
+    "\n",
+    "    with gr.Blocks(title=\"药品说明书智能识别与语音播报\") as demo:\n",
+    "        gr.Markdown(\"# 药品说明书智能识别与语音播报系统\")\n",
+    "        gr.Markdown(\"上传药品说明书图片，系统将自动识别文字、提取关键信息并语音播报，帮助老年人看清读懂药品说明书。\")\n",
+    "\n",
+    "        with gr.Row():\n",
+    "            with gr.Column(scale=1):\n",
+    "                image_input = gr.Image(label=\"药品说明书图片\", type=\"filepath\")\n",
+    "\n",
+    "                with gr.Accordion(\"图片分割设置\", open=True):\n",
+    "                    enable_split = gr.Checkbox(value=True, label=\"启用图片分割（文字太小时建议开启）\")\n",
+    "                    num_splits = gr.Dropdown(choices=[4, 9, 16], value=4, label=\"分割数量\")\n",
+    "                    overlap_ratio = gr.Slider(minimum=0.0, maximum=0.3, value=0.1, step=0.05, label=\"重叠比例\")\n",
+    "\n",
+    "                with gr.Accordion(\"生成参数设置\", open=True):\n",
+    "                    ocr_max_tokens = gr.Slider(minimum=100, maximum=8192, value=ocr_max_new_tokens, step=1, label=\"OCR 最大生成 token 数\")\n",
+    "                    llm_max_tokens = gr.Slider(minimum=100, maximum=4096, value=llm_max_new_tokens, step=1, label=\"LLM 最大生成 token 数\")\n",
+    "\n",
+    "                run_btn = gr.Button(\"开始识别\", variant=\"primary\")\n",
+    "\n",
+    "            with gr.Column(scale=1):\n",
+    "                ocr_output = gr.Textbox(label=\"OCR 识别结果\", lines=10, max_lines=20)\n",
+    "                info_output = gr.Textbox(label=\"关键信息整理\", lines=15, max_lines=30)\n",
+    "                audio_output = gr.Audio(label=\"语音播报\", type=\"filepath\")\n",
+    "\n",
+    "        run_btn.click(\n",
+    "            fn=gradio_pipeline,\n",
+    "            inputs=[\n",
+    "                image_input,\n",
+    "                enable_split,\n",
+    "                num_splits,\n",
+    "                overlap_ratio,\n",
+    "                ocr_max_tokens,\n",
+    "                llm_max_tokens,\n",
+    "            ],\n",
+    "            outputs=[ocr_output, info_output, audio_output],\n",
+    "        )\n",
+    "\n",
+    "    return demo\n",
+    "\n",
+    "print(\"✅ 管线编排与 Gradio 界面定义完成 (子进程模式)\")"
+   ]
+  },
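+  {
+   "cell_type": "markdown",
+   "id": "md_subprocess_sketch",
+   "metadata": {},
+   "source": [
+    "The subprocess-per-step idea used by `drug_ocr_pipeline` can be sketched with the standard library alone. This is only a minimal illustration under stated assumptions, not the actual `ocr_step` / `llm_step` / `tts_step` implementation: `_WORKER_SRC` and `run_isolated` are hypothetical names, and the stub worker measures a string instead of loading a model. The payload travels as JSON over stdin/stdout, so whatever the worker allocated is gone once its interpreter exits."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "code_subprocess_sketch",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import subprocess\n",
+    "import sys\n",
+    "\n",
+    "# Hypothetical worker source: reads a JSON payload from stdin, writes a JSON result to stdout.\n",
+    "_WORKER_SRC = \"\"\"\n",
+    "import json, sys\n",
+    "payload = json.load(sys.stdin)\n",
+    "# A real worker would load its model here; this stub only measures the text.\n",
+    "json.dump({\"length\": len(payload[\"text\"])}, sys.stdout)\n",
+    "\"\"\"\n",
+    "\n",
+    "\n",
+    "def run_isolated(payload, timeout=60):\n",
+    "    \"\"\"Run the worker in a fresh interpreter and return its JSON result.\"\"\"\n",
+    "    proc = subprocess.run(\n",
+    "        [sys.executable, \"-c\", _WORKER_SRC],\n",
+    "        input=json.dumps(payload),\n",
+    "        capture_output=True,\n",
+    "        text=True,\n",
+    "        timeout=timeout,\n",
+    "        check=True,  # raise CalledProcessError if the worker exits with an error\n",
+    "    )\n",
+    "    return json.loads(proc.stdout)\n",
+    "\n",
+    "\n",
+    "print(run_isolated({\"text\": \"hello\"}))"
+   ]
+  },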
+  {
+   "cell_type": "markdown",
+   "id": "a5b6c7d8",
+   "metadata": {},
+   "source": [
+    "## 主流程\n",
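+    "[Back to contents ⬆️](#目录：)\n",
+    "\n",
+    "The main flow consists of the following steps:\n",
+    "1. Load the image\n",
+    "2. Split the image (optional; for leaflets whose text is too small, the image is cut into several overlapping parts that are recognized separately)\n",
+    "3. OCR text recognition (**the model is loaded in a subprocess, which is destroyed when the step finishes**)\n",
+    "4. LLM text organization (**the model is loaded in a subprocess, which is destroyed when the step finishes**)\n",
+    "5. Speech synthesis (**the model is loaded in a subprocess, which is destroyed when the step finishes**)\n",
+    "\n",
+    "> Each step runs in its own subprocess, which is destroyed automatically once it finishes, so that the memory held by the model is fully released."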
+   ]
+  },
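+  {
+   "cell_type": "markdown",
+   "id": "md_tile_sketch",
+   "metadata": {},
+   "source": [
+    "Step 2 above splits the image into overlapping parts. The box arithmetic can be sketched as follows, assuming a square grid (matching the `num_splits` choices 4, 9, 16) and an overlap measured as a fraction of the tile size. `tile_boxes` is a hypothetical helper; the notebook's own splitting code may compute the regions differently."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "code_tile_sketch",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import math\n",
+    "\n",
+    "\n",
+    "def tile_boxes(width, height, num_splits=4, overlap_ratio=0.1):\n",
+    "    \"\"\"Return (left, top, right, bottom) boxes for a square grid with overlap.\"\"\"\n",
+    "    grid = math.isqrt(num_splits)\n",
+    "    if grid * grid != num_splits:\n",
+    "        raise ValueError(\"num_splits must be a perfect square\")\n",
+    "    tile_w, tile_h = width / grid, height / grid\n",
+    "    # Each tile is padded on every side, then clamped to the image bounds\n",
+    "    pad_w, pad_h = tile_w * overlap_ratio, tile_h * overlap_ratio\n",
+    "    boxes = []\n",
+    "    for row in range(grid):\n",
+    "        for col in range(grid):\n",
+    "            left = max(0, int(col * tile_w - pad_w))\n",
+    "            top = max(0, int(row * tile_h - pad_h))\n",
+    "            right = min(width, int((col + 1) * tile_w + pad_w))\n",
+    "            bottom = min(height, int((row + 1) * tile_h + pad_h))\n",
+    "            boxes.append((left, top, right, bottom))\n",
+    "    return boxes\n",
+    "\n",
+    "\n",
+    "# Same image size as the sample run in this notebook\n",
+    "for box in tile_boxes(2014, 2881, num_splits=4, overlap_ratio=0.1):\n",
+    "    print(box)"
+   ]
+  },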
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "e9f0a1b2",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:14:16.146943Z",
+     "iopub.status.busy": "2026-05-03T06:14:16.146807Z",
+     "iopub.status.idle": "2026-05-03T06:14:16.151226Z",
+     "shell.execute_reply": "2026-05-03T06:14:16.150312Z",
+     "shell.execute_reply.started": "2026-05-03T06:14:16.146924Z"
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "✅ 日志配置完成 (级别: INFO)\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "import logging\n",
+    "\n",
+    "logging.basicConfig(\n",
+    "    level=logging.INFO,\n",
+    "    format=\"%(asctime)s [%(name)s] %(levelname)s: %(message)s\",\n",
+    "    datefmt=\"%H:%M:%S\",\n",
+    ")\n",
+    "\n",
+    "print(\"✅ 日志配置完成 (级别: INFO)\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "c3d4e5f6",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:14:16.152272Z",
+     "iopub.status.busy": "2026-05-03T06:14:16.152120Z",
+     "iopub.status.idle": "2026-05-03T06:14:16.155642Z",
+     "shell.execute_reply": "2026-05-03T06:14:16.154737Z",
+     "shell.execute_reply.started": "2026-05-03T06:14:16.152254Z"
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "✅ 子进程模式已启用 - 模型将在需要时自动加载和释放\r\n"
+     ]
+    }
+   ],
+   "source": [
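+    "# The model manager has been removed; subprocess mode is used instead\n",
+    "# Each model is loaded, run, and then destroyed automatically in its own subprocess\n",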
+    "print(\"✅ 子进程模式已启用 - 模型将在需要时自动加载和释放\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a7b8c9d0",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-03T06:14:16.156170Z",
+     "iopub.status.busy": "2026-05-03T06:14:16.156041Z"
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "14:14:16 [drug_ocr] INFO: ============================================================\r\n",
+      "14:14:16 [drug_ocr] INFO: 药品说明书识别管线启动 (子进程模式)\r\n",
+      "14:14:16 [drug_ocr] INFO:   图片路径: resource/1.jpg\r\n",
+      "14:14:16 [drug_ocr] INFO:   图片分割: False (num_splits=4, overlap=0.10)\r\n",
+      "14:14:16 [drug_ocr] INFO: ============================================================\r\n",
+      "14:14:16 [drug_ocr] INFO: [OCR Step] 加载图片...\r\n",
+      "14:14:16 [drug_ocr] INFO: [OCR Step] 图片加载完成, 尺寸: (2014, 2881)\r\n",
+      "14:14:16 [drug_ocr] INFO: [OCR Step] 跳过图片分割\r\n",
+      "14:14:17 [drug_ocr] INFO: [OCR Step] 启动 OCR 子进程...\r\n",
+      "I0503 14:14:18.096459 1035908 init.cc:238] ENV [CUSTOM_DEVICE_ROOT]=/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device\r\n",
+      "I0503 14:14:18.096537 1035908 init.cc:146] Try loading custom device libs from: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n",
+      "I0503 14:14:18.217633 1035908 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so\r\n",
+      "I0503 14:14:18.217679 1035908 custom_device_load.cc:58] Skipped lib [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so]: no custom engine Plugin symbol in this lib.\r\n",
+      "I0503 14:14:18.224740 1035908 custom_kernel.cc:68] Succeed in loading 887 custom kernel(s) from loaded lib(s), will be used like native ones.\r\n",
+      "I0503 14:14:18.225076 1035908 init.cc:158] Finished in LoadCustomDevice with libs_path: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n",
+      "I0503 14:14:18.225135 1035908 init.cc:244] CustomDevice: iluvatar_gpu, visible devices count: 1\r\n",
+      "WARNING  2026-05-03 14:14:18,795 1035908 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_dad76550-a346-4423-aa82-44018eeaf3ba was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.\r\n",
+      "None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\r\n",
+      "\u001b[33m[2026-05-03 14:14:19,226] [ WARNING]\u001b[0m - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults `transformers.utils.import_utils.is_torch_available` and `transformers.utils.import_utils.is_torchvision_available` to False. If you need to use PyTorch in transformers or torchvision, please add `del sys.modules['transformers']` before using them.\u001b[0m\r\n",
+      "WARNING  2026-05-03 14:14:19,740 1035908 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_dad76550-a346-4423-aa82-44018eeaf3ba was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.\r\n",
+      "WARNING  2026-05-03 14:14:19,750 1035908 ops.py[line:125] Failed to import cache manager ops: Prefix cache ops only supported CUDA nor XPU platform \r\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[OCR Worker] 加载 OCR 模型 (PaddleOCR-VL)...\r\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO     2026-05-03 14:14:21,132 1035908 args_utils.py[line:639] Parameter `engine_worker_queue_port` is not specified, found available ports for possible use: [28305]\r\n",
+      "INFO     2026-05-03 14:14:21,134 1035908 args_utils.py[line:639] Parameter `cache_queue_port` is not specified, found available ports for possible use: [38724]\r\n",
+      "INFO     2026-05-03 14:14:21,136 1035908 args_utils.py[line:639] Parameter `rdma_comm_ports` is not specified, found available ports for possible use: [14751]\r\n",
+      "INFO     2026-05-03 14:14:21,139 1035908 args_utils.py[line:639] Parameter `pd_comm_port` is not specified, found available ports for possible use: [19484]\r\n",
+      "INFO     2026-05-03 14:14:21,140 1035908 download.py[line:142] Using download source: huggingface\r\n",
+      "INFO     2026-05-03 14:14:21,141 1035908 configuration_utils.py[line:1215] Loading configuration file baidu/PaddleOCR-VL-1.5/config.json\r\n",
+      "WARNING  2026-05-03 14:14:21,143 1035908 configuration_utils.py[line:1246] You are using a model of type paddleocr_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.\r\n",
+      "WARNING  2026-05-03 14:14:21,144 1035908 configuration_utils.py[line:1246] You are using a model of type paddleocr_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.\r\n",
+      "INFO     2026-05-03 14:14:22,130 1035908 flash_attn_backend.py[line:105] Only support CUDA version flash attention.\r\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "current sm_version=71\r\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "WARNING  2026-05-03 14:14:22,285 1035908 moe.py[line:41] import noaux_tc Failed!\r\n",
+      "INFO     2026-05-03 14:14:24,261 1035908 download.py[line:142] Using download source: huggingface\r\n",
+      "INFO     2026-05-03 14:14:24,264 1035908 configuration_utils.py[line:425] Loading configuration file baidu/PaddleOCR-VL-1.5/generation_config.json\r\n",
+      "INFO     2026-05-03 14:14:24,284 1035908 tokenizer_utils.py[line:257] Using download source: huggingface\r\n",
+      "INFO     2026-05-03 14:14:25,941 1035908 engine.py[line:151] Waiting for worker processes to be ready...\r\n",
+      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\r\n",
+      "To disable this warning, you can either:\r\n",
+      "\t- Avoid using `tokenizers` before the fork if possible\r\n",
+      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\r\n",
+      "Loading Weights: 100%|██████████| 100/100 [00:07<00:00, 13.23it/s] \r\n",
+      "Loading Layers: 100%|██████████| 100/100 [00:00<00:00, 198.88it/s]   \r\n",
+      "INFO     2026-05-03 14:14:39,443 1035908 engine.py[line:209] Worker processes are launched with 16.92835831642151 seconds.\r\n",
+      "INFO     2026-05-03 14:14:39,445 1035908 engine.py[line:220] Detected 10922 gpu blocks and 0 cpu blocks in cache (block size: 16).\r\n",
+      "INFO     2026-05-03 14:14:39,446 1035908 engine.py[line:223] FastDeploy will be serving 8 running requests if each sequence reaches its maximum length: 8192\r\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[OCR Worker] OCR 模型加载完成, 耗时: 18.32s\r\n",
+      "[OCR Worker] 识别图片 1/1, 尺寸: (2014, 2881)\r\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][2026-05-03 14:14:41] [1035908] [INFO] Prefill batch, dp_rank: 0, #new-seq: 1, #new-token: 1231, #cached-token: 0, token usage: 0.01, #running-req: 1, #queue-req: 0, \r\n",
+      "[2026-05-03 14:14:44] [1035908] [INFO] Decode batch, dp_rank: 0, #running-req: 1, #token: 1392, token usage: 0.01, cuda graph: False, gen throughput (token/s): 5.34, #queue-req: 0, \r\n",
+      "Processed prompts: 100%|██████████| 1/1 [00:18<00:00, 18.79s/it, est. speed input: 0.00 toks/s, output: 0.00 toks/s]\r\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[OCR Worker] 图片 1 识别完成, 文字长度: 294\r\n",
+      "[OCR Worker] 全部识别完成, 总文字长度: 294\r\n",
+      "[OCR Worker] OCR 模型已释放\r\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "14:15:01 [drug_ocr] INFO: [OCR Step] OCR 识别全部完成, 总文字长度: 294, 耗时: 45.70s\r\n",
+      "14:15:01 [drug_ocr] INFO: [LLM Step] LLM 大模型信息提取...\r\n",
+      "14:15:01 [drug_ocr] INFO: [LLM Step] 启动 LLM 子进程...\r\n",
+      "I0503 14:15:02.384891 1076624 init.cc:238] ENV [CUSTOM_DEVICE_ROOT]=/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device\r\n",
+      "I0503 14:15:02.384958 1076624 init.cc:146] Try loading custom device libs from: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n",
+      "I0503 14:15:02.493517 1076624 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so\r\n",
+      "I0503 14:15:02.493556 1076624 custom_device_load.cc:58] Skipped lib [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so]: no custom engine Plugin symbol in this lib.\r\n",
+      "I0503 14:15:02.502454 1076624 custom_kernel.cc:68] Succeed in loading 887 custom kernel(s) from loaded lib(s), will be used like native ones.\r\n",
+      "I0503 14:15:02.502794 1076624 init.cc:158] Finished in LoadCustomDevice with libs_path: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n",
+      "I0503 14:15:02.502851 1076624 init.cc:244] CustomDevice: iluvatar_gpu, visible devices count: 1\r\n",
+      "WARNING  2026-05-03 14:15:03,083 1076624 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_24ad65e3-2498-460c-97d2-9a88e46fe8f6 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.\r\n",
+      "None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\r\n",
+      "\u001b[33m[2026-05-03 14:15:03,510] [ WARNING]\u001b[0m - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults `transformers.utils.import_utils.is_torch_available` and `transformers.utils.import_utils.is_torchvision_available` to False. If you need to use PyTorch in transformers or torchvision, please add `del sys.modules['transformers']` before using them.\u001b[0m\r\n",
+      "WARNING  2026-05-03 14:15:03,878 1076624 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_24ad65e3-2498-460c-97d2-9a88e46fe8f6 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.\r\n",
+      "WARNING  2026-05-03 14:15:03,886 1076624 ops.py[line:125] Failed to import cache manager ops: Prefix cache ops only supported CUDA nor XPU platform \r\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[LLM Worker] 加载 LLM 模型 (ERNIE)...\r\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO     2026-05-03 14:15:04,961 1076624 args_utils.py[line:639] Parameter `engine_worker_queue_port` is not specified, found available ports for possible use: [58094]\r\n",
+      "INFO     2026-05-03 14:15:04,964 1076624 args_utils.py[line:639] Parameter `cache_queue_port` is not specified, found available ports for possible use: [56896]\r\n",
+      "INFO     2026-05-03 14:15:04,967 1076624 args_utils.py[line:639] Parameter `rdma_comm_ports` is not specified, found available ports for possible use: [41390]\r\n",
+      "INFO     2026-05-03 14:15:04,970 1076624 args_utils.py[line:639] Parameter `pd_comm_port` is not specified, found available ports for possible use: [19643]\r\n",
+      "INFO     2026-05-03 14:15:04,972 1076624 download.py[line:142] Using download source: huggingface\r\n",
+      "INFO     2026-05-03 14:15:04,973 1076624 configuration_utils.py[line:1215] Loading configuration file baidu/ERNIE-4.5-0.3B-Paddle/config.json\r\n",
+      "WARNING  2026-05-03 14:15:04,975 1076624 configuration_utils.py[line:1246] You are using a model of type ernie4_5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.\r\n",
+      "INFO     2026-05-03 14:15:06,143 1076624 flash_attn_backend.py[line:105] Only support CUDA version flash attention.\r\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "current sm_version=71\r\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "WARNING  2026-05-03 14:15:06,307 1076624 moe.py[line:41] import noaux_tc Failed!\r\n",
+      "INFO     2026-05-03 14:15:07,224 1076624 download.py[line:142] Using download source: huggingface\r\n",
+      "INFO     2026-05-03 14:15:07,227 1076624 configuration_utils.py[line:425] Loading configuration file baidu/ERNIE-4.5-0.3B-Paddle/generation_config.json\r\n",
+      "WARNING  2026-05-03 14:15:07,229 1076624 log.py[line:135] PretrainedTokenizer will be deprecated and removed in the next major release. Please migrate to Hugging Face's transformers.PreTrainedTokenizer. use class QWenTokenizer(PaddleTokenizerMixin, hf.PreTrainedTokenizer) to support multisource download and Paddle tokenizer operations.\r\n",
+      "INFO     2026-05-03 14:15:09,492 1076624 engine.py[line:151] Waiting for worker processes to be ready...\r\n",
+      "Loading Weights: 100%|██████████| 100/100 [00:04<00:00, 24.85it/s] \r\n",
+      "Loading Layers: 100%|██████████| 100/100 [00:00<00:00, 199.46it/s]    \r\n",
+      "INFO     2026-05-03 14:15:20,035 1076624 engine.py[line:209] Worker processes are launched with 13.396349906921387 seconds.\r\n",
+      "INFO     2026-05-03 14:15:20,036 1076624 engine.py[line:220] Detected 10922 gpu blocks and 0 cpu blocks in cache (block size: 16).\r\n",
+      "INFO     2026-05-03 14:15:20,037 1076624 engine.py[line:223] FastDeploy will be serving 8 running requests if each sequence reaches its maximum length: 8192\r\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[LLM Worker] LLM 模型加载完成, 耗时: 15.08s\r\n",
+      "[LLM Worker] 正在生成回复 (max_new_tokens=200)...\r\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s][2026-05-03 14:15:20] [1076624] [INFO] Prefill batch, dp_rank: 0, #new-seq: 1, #new-token: 1, #cached-token: 0, token usage: 0.00, #running-req: 1, #queue-req: 0, \r\n",
+      "[2026-05-03 14:15:22] [1076624] [INFO] Decode batch, dp_rank: 0, #running-req: 1, #token: 176, token usage: 0.00, cuda graph: False, gen throughput (token/s): 8.23, #queue-req: 0, \r\n",
+      "Processed prompts: 100%|██████████| 1/1 [00:03<00:00,  3.45s/it, est. speed input: 0.00 toks/s, output: 0.00 toks/s]\r\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[LLM Worker] 信息提取完成, 生成耗时: 3.46s, 结果长度: 289\r\n",
+      ">>> 在这样一支粉色的手指往前一拉，我像一只蝴蝶似的飞到了你的身边\r\n",
+      "\r\n",
+      "你轻轻地将我的手贴在脸颊，柔软的触感瞬间让我一下子陷了进去\r\n",
+      "\r\n",
+      "“喜欢就好，别舍不得，我们一起去海边好不好？”\r\n",
+      "\r\n",
+      "我微微一笑，眼神带着一丝甜蜜，嘴角不自觉地扬起了\r\n",
+      "\r\n",
+      "“好，那就一起去，我保证不弄疼你，我们一起海边，好不好？”\r\n",
+      "\r\n",
+      "我环住你，紧紧地靠在你的身上，感受着你的温度和怀抱的柔软\r\n",
+      "\r\n",
+      "你轻轻地将我搂入怀中，仿佛一只受伤的小动物，任由我紧紧地依靠着你\r\n",
+      "\r\n",
+      "随着一阵海风轻拂，我们来到了海边\r\n",
+      "\r\n",
+      "风轻轻掀起了我的长发，海浪一波一波地涌来\r\n",
+      "\r\n",
+      "我仰头看着那片广阔无垠的蓝，心中满是向往\r\n",
+      "\r\n",
+      "“这就是我想要的，这是我第一次来这里\r\n",
+      "[LLM Worker] LLM 模型已释放\r\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "14:15:30 [drug_ocr] INFO: [LLM Step] LLM 信息提取完成, 结果长度: 289, 耗时: 28.24s\r\n",
+      "14:15:30 [drug_ocr] INFO: [TTS Step] TTS 语音合成...\r\n",
+      "14:15:30 [drug_ocr] INFO: [TTS Step] 启动 TTS 子进程...\r\n",
+      "I0503 14:15:30.627210 1088334 init.cc:238] ENV [CUSTOM_DEVICE_ROOT]=/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device\r\n",
+      "I0503 14:15:30.627287 1088334 init.cc:146] Try loading custom device libs from: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n",
+      "I0503 14:15:30.751516 1088334 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so\r\n",
+      "I0503 14:15:30.751560 1088334 custom_device_load.cc:58] Skipped lib [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so]: no custom engine Plugin symbol in this lib.\r\n",
+      "I0503 14:15:30.759230 1088334 custom_kernel.cc:68] Succeed in loading 887 custom kernel(s) from loaded lib(s), will be used like native ones.\r\n",
+      "I0503 14:15:30.759569 1088334 init.cc:158] Finished in LoadCustomDevice with libs_path: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n",
+      "I0503 14:15:30.759625 1088334 init.cc:244] CustomDevice: iluvatar_gpu, visible devices count: 1\r\n",
+      "\u001b[0;93m2026-05-03 14:15:34.944031381 [W:onnxruntime:Default, cpuid_info.cc:91 LogEarlyWarning] Unknown CPU vendor. cpuinfo_vendor value: 16\u001b[m\r\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[TTS Worker] 加载 TTS 模型 (PaddleSpeech)...\r\n",
+      "[TTS Worker] TTS 模型加载完成, 耗时: 0.00s\r\n",
+      "[TTS Worker] 语音合成开始, 输入文字长度: 289\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "from pathlib import Path\n",
+    "\n",
+    "sample_image_path = str(Path(\"resource/1.jpg\"))\n",
+    "\n",
+    "result = drug_ocr_pipeline(\n",
+    "    ocr_model_dir=ocr_model_dir,\n",
+    "    llm_model_dir=llm_model_dir,\n",
+    "    image_path=sample_image_path,\n",
+    "    enable_split=False,\n",
+    "    num_splits=4,\n",
+    "    overlap_ratio=0.1,\n",
+    "    ocr_max_new_tokens=ocr_max_new_tokens,\n",
+    "    llm_max_new_tokens=llm_max_new_tokens,\n",
+    ")\n",
+    "\n",
+    "print(\"\\n\" + \"=\" * 60)\n",
+    "print(\"📋 OCR 识别结果:\")\n",
+    "print(\"=\" * 60)\n",
+    "print((result[\"ocr_text\"][:500] + \"...\") if len(result[\"ocr_text\"]) > 500 else result[\"ocr_text\"])\n",
+    "\n",
+    "print(\"\\n\" + \"=\" * 60)\n",
+    "print(\"📝 大模型整理结果:\")\n",
+    "print(\"=\" * 60)\n",
+    "print(result[\"extracted_info\"])\n",
+    "\n",
+    "# Play the audio\n",
+    "if result[\"audio\"] is not None:\n",
+    "    import IPython.display as ipd\n",
+    "    sr, wav_data = result[\"audio\"]\n",
+    "    print(\"\\n🔊 播放语音...\")\n",
+    "    ipd.display(ipd.Audio(wav_data, rate=sr))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e1f2a3b4",
+   "metadata": {},
+   "source": [
+    "## Gradio 交互界面\n",
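+    "[Back to contents ⬆️](#目录：)\n",
+    "\n",
+    "Through the Gradio interface, users can:\n",
+    "- Upload an image of a drug instruction leaflet\n",
+    "- Choose whether to split the image and into how many parts\n",
+    "- Adjust each model's generation parameter (`max_new_tokens`)\n",
+    "- View the recognition and extraction results\n",
+    "- Play the synthesized audio\n",
+    "\n",
+    "> Each time the \"开始识别\" (Start recognition) button is clicked, every model runs in its own subprocess, which is destroyed automatically on completion to release memory."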
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c5d6e7f8",
+   "metadata": {
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "demo = make_demo(\n",
+    "    ocr_model_dir=ocr_model_dir,\n",
+    "    llm_model_dir=llm_model_dir,\n",
+    "    ocr_max_new_tokens=ocr_max_new_tokens,\n",
+    "    llm_max_new_tokens=llm_max_new_tokens,\n",
+    ")\n",
+    "\n",
+    "try:\n",
+    "    demo.launch(server_name=\"0.0.0.0\", server_port=7860, debug=True)\n",
+    "except Exception:\n",
+    "    demo.launch(debug=True, share=True)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "py35-paddle1.2.0"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}