diff --git a/README.md b/README.md
index 021ae116..b8811815 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@
 
 
 <p align="center">
-   MiniCPM-V 4.6 <a href="https://huggingface.co/openbmb/MiniCPM-V-4.6">🤗</a> <a href="https://huggingface.co/spaces/openbmb/MiniCPM-V-4.6-Demo">🤖</a> <a href="https://github.com/OpenBMB/MiniCPM-V-Apps/blob/main/DOWNLOAD.md">📱</a> | MiniCPM-o 4.5 <a href="https://huggingface.co/openbmb/MiniCPM-o-4_5">🤗</a> <a href="https://openbmb.github.io/MiniCPM-o-Demo/">📞</a> <a href="http://211.93.21.133:18121/">🤖</a> | <a href="https://huggingface.co/papers/2604.27393">📄 Technical Report</a> | <a href="https://github.com/OpenSQZ/MiniCPM-V-Cookbook">🍳 Cookbook</a>
+   MiniCPM-V 4.6 <a href="https://huggingface.co/openbmb/MiniCPM-V-4.6">🤗</a> <a href="https://huggingface.co/spaces/openbmb/MiniCPM-V-4.6-Demo">🤖</a> <a href="https://github.com/OpenBMB/MiniCPM-V-Apps/blob/main/DOWNLOAD.md">📱</a> | MiniCPM-o 4.5 <a href="https://huggingface.co/openbmb/MiniCPM-o-4_5">🤗</a> <a href="https://openbmb.github.io/MiniCPM-o-Demo/">📞</a> <a href="https://minicpmo45.modelbest.cn">🤖</a> | <a href="https://huggingface.co/papers/2604.27393">📄 Technical Report</a> | <a href="https://github.com/OpenSQZ/MiniCPM-V-Cookbook">🍳 Cookbook</a> | <a href="./docs/api.md">🌐 API Guide</a>
 </p>
 
 </div>
@@ -2424,6 +2424,8 @@ We would like to thank the following projects:
 
 The [MiniCPM-V & o Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) provides scenario-based recipes for deploying, fine-tuning, quantizing, and building demos with MiniCPM-V and MiniCPM-o. The [documentation website](https://minicpm-o.readthedocs.io/en/latest/index.html) presents these recipes in a structured way for quick lookup.
 
+To get started quickly with MiniCPM-V 4.6 or MiniCPM-o 4.5 through the hosted API, see the [API Guide](./docs/api.md).
+
 It is organized for:
 
 * **Individuals**: Local inference, quantized deployment, and end-device demos.
diff --git a/README_zh.md b/README_zh.md
index cedb15e6..00cb8b73 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -20,7 +20,7 @@
 
 <!-- <br> -->
 <p align="center">
-   MiniCPM-V 4.6 <a href="https://huggingface.co/openbmb/MiniCPM-V-4.6">🤗</a> <a href="https://huggingface.co/spaces/openbmb/MiniCPM-V-4.6-Demo">🤖</a> <a href="https://github.com/OpenBMB/MiniCPM-V-Apps/blob/main/DOWNLOAD_zh.md">📱</a> | MiniCPM-o 4.5 <a href="https://huggingface.co/openbmb/MiniCPM-o-4_5">🤗</a> <a href="https://openbmb.github.io/MiniCPM-o-Demo/">📞</a> <a href="http://211.93.21.133:18121/">🤖</a> | <a href="https://huggingface.co/papers/2604.27393">📄 技术报告</a> | <a href="https://github.com/OpenSQZ/MiniCPM-V-Cookbook">🍳 使用指南</a>
+   MiniCPM-V 4.6 <a href="https://huggingface.co/openbmb/MiniCPM-V-4.6">🤗</a> <a href="https://huggingface.co/spaces/openbmb/MiniCPM-V-4.6-Demo">🤖</a> <a href="https://github.com/OpenBMB/MiniCPM-V-Apps/blob/main/DOWNLOAD_zh.md">📱</a> | MiniCPM-o 4.5 <a href="https://huggingface.co/openbmb/MiniCPM-o-4_5">🤗</a> <a href="https://openbmb.github.io/MiniCPM-o-Demo/">📞</a> <a href="https://minicpmo45.modelbest.cn">🤖</a> | <a href="https://huggingface.co/papers/2604.27393">📄 技术报告</a> | <a href="https://github.com/OpenSQZ/MiniCPM-V-Cookbook">🍳 使用指南</a> | <a href="./docs/api.md">🌐 API 指南</a>
 </p>
 
 </div>
@@ -2405,6 +2405,8 @@ MiniCPM-o 4.5 支持 vLLM, SGLang, llama.cpp, Ollama 等[推理框架](#训练
 
 欢迎探索我们整理的 [MiniCPM-V & o 使用手册 (Cookbook)](https://github.com/OpenSQZ/MiniCPM-V-CookBook)。Cookbook 提供面向场景的教程，覆盖 MiniCPM-V 和 MiniCPM-o 的部署、微调、量化和 Demo 构建等常见任务；配套的[文档网站](https://minicpm-o.readthedocs.io/en/latest/index.html)以结构化方式呈现这些方案，便于快速查找。
 
+如需快速使用 MiniCPM-V 4.6 或 MiniCPM-o 4.5，可参考 [API Guide](./docs/api.md)。
+
 它面向以下使用场景组织内容：
 
 * **个人用户**：本地推理、量化部署和端侧 Demo。
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 00000000..b3b7a919
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,120 @@
+# API Guide
+
+This guide explains how to call the hosted APIs for MiniCPM-V 4.6 and MiniCPM-o 4.5.
+
+## MiniCPM-V 4.6
+
+MiniCPM-V 4.6 can be called through the Chat Completions API. The interface supports both text-only and vision-language requests.
+
+### Endpoint
+
+```text
+Base URL: https://api.modelbest.cn/v1
+Chat API: POST /chat/completions
+Authorization: Bearer <API_KEY>
+Content-Type: application/json
+```
+
+A **free** public API key is currently available for trying the service. The `curl` examples below read the key from the `API_KEY` environment variable:
+
+```text
+sk-pQ8L2zF3XmR5kY9wV4jB7hN1tC6vM0xG3aD5sH2bJ9lK4cZ8
+```
+
+Available model IDs:
+
+```text
+MiniCPM-V-4.6-1.3B-Instruct
+MiniCPM-V-4.6-1.3B-think-0506_tau2_rl
+```
+
+### Text-Only Request
+
+```bash
+curl https://api.modelbest.cn/v1/chat/completions \
+  -H "Authorization: Bearer $API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "MiniCPM-V-4.6-1.3B-Instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Introduce yourself in one sentence."
+      }
+    ]
+  }'
+```
+
+To use the Thinking model, replace the `model` value with:
+
+```json
+"MiniCPM-V-4.6-1.3B-think-0506_tau2_rl"
+```
+
+### Vision-Language Request
+
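+Images are passed as base64-encoded data URLs inside an `image_url` content part. Replace `<BASE64_IMAGE>` with the encoded image bytes, and match the MIME type (`image/png` here) to the image format.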
+
+```bash
+curl https://api.modelbest.cn/v1/chat/completions \
+  -H "Authorization: Bearer $API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "MiniCPM-V-4.6-1.3B-Instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": [
+          {
+            "type": "text",
+            "text": "Describe this image."
+          },
+          {
+            "type": "image_url",
+            "image_url": {
+              "url": "data:image/png;base64,<BASE64_IMAGE>"
+            }
+          }
+        ]
+      }
+    ]
+  }'
+```
+
+
+### Python Example
+
+```python
+import json
+import urllib.request
+
+api_key = "<API_KEY>"  # replace with your API key
+payload = {
+    "model": "MiniCPM-V-4.6-1.3B-Instruct",
+    "messages": [
+        {
+            "role": "user",
+            "content": "List three use cases for MiniCPM-V.",
+        }
+    ],
+}
+
+request = urllib.request.Request(
+    "https://api.modelbest.cn/v1/chat/completions",
+    data=json.dumps(payload).encode("utf-8"),
+    headers={
+        "Authorization": f"Bearer {api_key}",
+        "Content-Type": "application/json",
+    },
+    method="POST",
+)
+
+with urllib.request.urlopen(request) as response:
+    data = json.loads(response.read().decode("utf-8"))
+
+print(data["choices"][0]["message"]["content"])
+```
+
+## MiniCPM-o 4.5
+
+MiniCPM-o 4.5 provides a Realtime API for full-duplex multimodal interaction. For the current API documentation, see the [Realtime API Overview](https://minicpmo45.modelbest.cn/docs/en/realtime-api/overview/).
diff --git a/docs/minicpm_v4dot5_en.md b/docs/minicpm_v4dot5_en.md
index 458c5917..677dbad4 100644
--- a/docs/minicpm_v4dot5_en.md
+++ b/docs/minicpm_v4dot5_en.md
@@ -16,7 +16,7 @@ Based on [LLaVA-UHD](https://arxiv.org/pdf/2403.11703) architecture, MiniCPM-V 4
 
 
 -  💫  **Easy Usage.**
-MiniCPM-V 4.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/Support-MiniCPM-V-4.5/docs/multimodal/minicpmv4.5.md) and [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4), [GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf) and [AWQ](https://github.com/tc-mb/AutoAWQ) format quantized models in 16 sizes, (3) [SGLang](https://github.com/tc-mb/sglang/tree/main) and [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [Transformers](https://github.com/tc-mb/transformers/tree/main) and [LLaMA-Factory](./docs/llamafactory_train_and_infer.md), (5) quick [local WebUI demo](#chat-with-our-demo-on-gradio), (6) optimized [local iOS app](https://github.com/OpenBMB/MiniCPM-V-Apps) on iPhone and iPad, and (7) online web demo on [server](http://101.126.42.235:30910/). See our [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) for full usage!
+MiniCPM-V 4.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/Support-MiniCPM-V-4.5/docs/multimodal/minicpmv4.5.md) and [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4), [GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf) and [AWQ](https://github.com/tc-mb/AutoAWQ) format quantized models in 16 sizes, (3) [SGLang](https://github.com/tc-mb/sglang/tree/main) and [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [Transformers](https://github.com/tc-mb/transformers/tree/main) and [LLaMA-Factory](./docs/llamafactory_train_and_infer.md), (5) quick [local WebUI demo](#chat-with-our-demo-on-gradio), (6) optimized [local iOS app](https://github.com/OpenBMB/MiniCPM-V-Apps) on iPhone and iPad, and (7) an online [web demo](https://huggingface.co/spaces/openbmb/MiniCPM-V-4_5-Demo). See our [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) for full usage!
 
 
 ### Key Techniques <!-- omit in toc -->
diff --git a/docs/minicpm_v4dot5_zh.md b/docs/minicpm_v4dot5_zh.md
index 3534ff4d..c3e3ac59 100644
--- a/docs/minicpm_v4dot5_zh.md
+++ b/docs/minicpm_v4dot5_zh.md
@@ -18,7 +18,7 @@
   基于 [LLaVA-UHD](https://arxiv.org/pdf/2403.11703) 架构，MiniCPM-V 4.5 能处理任意长宽比、最高达 180 万像素（如 1344x1344） 的高分辨率图像，同时使用的视觉 token 数仅为多数 MLLM 的 1/4。其在 OCRBench 上取得超越 GPT-4o-latest 与 Gemini 2.5 等闭源模型的性能，并在 OmniDocBench 上展现了业界顶尖的 PDF 文档解析能力。借助最新的 [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) 和 [VisCPM](https://github.com/OpenBMB/VisCPM) 技术，模型在可靠性上表现优异，在 MMHal-Bench 上超越 GPT-4o-latest，并支持 30+ 种语言的多语言能力。
 
 -  💫 **便捷易用的部署方式**
-  MiniCPM-V 4.5 提供丰富灵活的使用方式：(1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/master/docs/multimodal/minicpmo4.5.md) 与 [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) 支持本地 CPU 高效推理；(2) 提供 [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4)、[GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf)、[AWQ](https://github.com/tc-mb/AutoAWQ) 等 16 种规格的量化模型；(3)兼容 SGLang 与 [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) (4) 借助 [Transformers](https://github.com/tc-mb/transformers/tree/main) 与 [LLaMA-Factory](./docs/llamafactory_train_and_infer.md) 在新领域与任务上进行微调；(5) 快速启动本地 [WebUI demo](#chat-with-our-demo-on-gradio)；(6) 优化适配的 [iOS 本地应用](https://github.com/OpenBMB/MiniCPM-V-Apps)，可在 iPhone 与 iPad 上高效运行；(7) 在线 [Web demo](http://101.126.42.235:30910/) 体验。更多使用方式请见 [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook)。
+  MiniCPM-V 4.5 提供丰富灵活的使用方式：(1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/master/docs/multimodal/minicpmo4.5.md) 与 [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) 支持本地 CPU 高效推理；(2) 提供 [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4)、[GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf)、[AWQ](https://github.com/tc-mb/AutoAWQ) 等 16 种规格的量化模型；(3)兼容 SGLang 与 [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) (4) 借助 [Transformers](https://github.com/tc-mb/transformers/tree/main) 与 [LLaMA-Factory](./docs/llamafactory_train_and_infer.md) 在新领域与任务上进行微调；(5) 快速启动本地 [WebUI demo](#chat-with-our-demo-on-gradio)；(6) 优化适配的 [iOS 本地应用](https://github.com/OpenBMB/MiniCPM-V-Apps)，可在 iPhone 与 iPad 上高效运行；(7) 在线 [Web demo](https://huggingface.co/spaces/openbmb/MiniCPM-V-4_5-Demo) 体验。更多使用方式请见 [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook)。
 
 ### 技术亮点 <!-- omit in toc -->