diff --git a/README.md b/README.md
index 021ae116..b8811815 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@
- MiniCPM-V 4.6 🤗 🤖 📱 | MiniCPM-o 4.5 🤗 📞 🤖 | 📄 Technical Report | 🍳 Cookbook
+ MiniCPM-V 4.6 🤗 🤖 📱 | MiniCPM-o 4.5 🤗 📞 🤖 | 📄 Technical Report | 🍳 Cookbook | 🌐 API Guide
@@ -2424,6 +2424,8 @@ We would like to thank the following projects:
The [MiniCPM-V & o Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) provides scenario-based recipes for deploying, fine-tuning, quantizing, and building demos with MiniCPM-V and MiniCPM-o. The [documentation website](https://minicpm-o.readthedocs.io/en/latest/index.html) presents these recipes in a structured way for quick lookup.
+To get started quickly with MiniCPM-V 4.6 or MiniCPM-o 4.5 through the API, see the [API Guide](./docs/api.md).
+
It is organized for:
* **Individuals**: Local inference, quantized deployment, and end-device demos.
diff --git a/README_zh.md b/README_zh.md
index cedb15e6..00cb8b73 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -20,7 +20,7 @@
- MiniCPM-V 4.6 🤗 🤖 📱 | MiniCPM-o 4.5 🤗 📞 🤖 | 📄 技术报告 | 🍳 使用指南
+ MiniCPM-V 4.6 🤗 🤖 📱 | MiniCPM-o 4.5 🤗 📞 🤖 | 📄 技术报告 | 🍳 使用指南 | 🌐 API 指南
@@ -2405,6 +2405,8 @@ MiniCPM-o 4.5 支持 vLLM, SGLang, llama.cpp, Ollama 等[推理框架](#训练
欢迎探索我们整理的 [MiniCPM-V & o 使用手册 (Cookbook)](https://github.com/OpenSQZ/MiniCPM-V-CookBook)。Cookbook 提供面向场景的教程,覆盖 MiniCPM-V 和 MiniCPM-o 的部署、微调、量化和 Demo 构建等常见任务;配套的[文档网站](https://minicpm-o.readthedocs.io/en/latest/index.html)以结构化方式呈现这些方案,便于快速查找。
+如需快速使用 MiniCPM-V 4.6 或 MiniCPM-o 4.5,可参考 [API Guide](./docs/api.md)。
+
它面向以下使用场景组织内容:
* **个人用户**:本地推理、量化部署和端侧 Demo。
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 00000000..b3b7a919
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,120 @@
+# API Guide
+
+This document describes how to use the APIs for MiniCPM-V 4.6 and MiniCPM-o 4.5.
+
+## MiniCPM-V 4.6
+
+MiniCPM-V 4.6 can be called through the Chat Completions API. The interface supports both text-only and vision-language requests.
+
+### Endpoint
+
+```text
+Base URL: https://api.modelbest.cn/v1
+Chat API: POST /chat/completions
+Authorization: Bearer <API_KEY>
+Content-Type: application/json
+```
+
+A **free** public API key is currently available for trying the service:
+
+```text
+sk-pQ8L2zF3XmR5kY9wV4jB7hN1tC6vM0xG3aD5sH2bJ9lK4cZ8
+```
+
+Available model IDs:
+
+```text
+MiniCPM-V-4.6-1.3B-Instruct
+MiniCPM-V-4.6-1.3B-think-0506_tau2_rl
+```
+
+### Text-Only Request
+
+```bash
+curl https://api.modelbest.cn/v1/chat/completions \
+  -H "Authorization: Bearer $API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "MiniCPM-V-4.6-1.3B-Instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Introduce yourself in one sentence."
+      }
+    ]
+  }'
+```
+
+To use the Thinking model, replace the `model` value with:
+
+```json
+"MiniCPM-V-4.6-1.3B-think-0506_tau2_rl"
+```
+
+### Vision-Language Request
+
+Images are passed as base64-encoded data URLs in a content part of type `image_url`.
+
+```bash
+curl https://api.modelbest.cn/v1/chat/completions \
+  -H "Authorization: Bearer $API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "MiniCPM-V-4.6-1.3B-Instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": [
+          {
+            "type": "text",
+            "text": "Describe this image."
+          },
+          {
+            "type": "image_url",
+            "image_url": {
+              "url": "data:image/png;base64,<BASE64_IMAGE_DATA>"
+            }
+          }
+        ]
+      }
+    ]
+  }'
+```
+
+
+### Python Example
+
+```python
+import json
+import urllib.request
+
+api_key = "<API_KEY>"  # replace with your API key
+payload = {
+    "model": "MiniCPM-V-4.6-1.3B-Instruct",
+    "messages": [
+        {
+            "role": "user",
+            "content": "List three use cases for MiniCPM-V.",
+        }
+    ],
+}
+
+request = urllib.request.Request(
+    "https://api.modelbest.cn/v1/chat/completions",
+    data=json.dumps(payload).encode("utf-8"),
+    headers={
+        "Authorization": f"Bearer {api_key}",
+        "Content-Type": "application/json",
+    },
+    method="POST",
+)
+
+with urllib.request.urlopen(request) as response:
+    data = json.loads(response.read().decode("utf-8"))
+
+print(data["choices"][0]["message"]["content"])
+```
+
+## MiniCPM-o 4.5
+
+MiniCPM-o 4.5 provides a Realtime API for full-duplex multimodal interaction. For the current MiniCPM-o 4.5 API documentation, see the [Realtime API Overview](https://minicpmo45.modelbest.cn/docs/en/realtime-api/overview/).
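Note on `docs/api.md`: the guide shows the vision-language request only as a curl command, and its Python example covers the text-only case. The following is a minimal sketch of how the same vision-language payload might be assembled in Python, assuming the endpoint accepts the request shape shown in the curl example. The helper names `image_to_data_url`, `build_vision_payload`, and `build_request` are hypothetical and are not part of any MiniCPM or ModelBest SDK; the base URL and model ID are copied from the guide. The sketch only builds the request and does not send it.

```python
import base64
import json
import urllib.request

# Base URL taken from the guide's Endpoint section.
BASE_URL = "https://api.modelbest.cn/v1"


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_payload(
    prompt: str,
    image_bytes: bytes,
    model: str = "MiniCPM-V-4.6-1.3B-Instruct",
    mime_type: str = "image/png",
) -> dict:
    """Build a payload with one text part and one image_url part,
    mirroring the structure of the guide's curl example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_to_data_url(image_bytes, mime_type)
                        },
                    },
                ],
            }
        ],
    }


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap the payload in a POST request; nothing is sent here."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, read the image with `open(path, "rb").read()`, pass the bytes to `build_vision_payload`, and hand the result of `build_request` to `urllib.request.urlopen` in the same way as the guide's Python example. Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.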
diff --git a/docs/minicpm_v4dot5_en.md b/docs/minicpm_v4dot5_en.md
index 458c5917..677dbad4 100644
--- a/docs/minicpm_v4dot5_en.md
+++ b/docs/minicpm_v4dot5_en.md
@@ -16,7 +16,7 @@ Based on [LLaVA-UHD](https://arxiv.org/pdf/2403.11703) architecture, MiniCPM-V 4
- 💫 **Easy Usage.**
-MiniCPM-V 4.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/Support-MiniCPM-V-4.5/docs/multimodal/minicpmv4.5.md) and [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4), [GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf) and [AWQ](https://github.com/tc-mb/AutoAWQ) format quantized models in 16 sizes, (3) [SGLang](https://github.com/tc-mb/sglang/tree/main) and [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [Transformers](https://github.com/tc-mb/transformers/tree/main) and [LLaMA-Factory](./docs/llamafactory_train_and_infer.md), (5) quick [local WebUI demo](#chat-with-our-demo-on-gradio), (6) optimized [local iOS app](https://github.com/OpenBMB/MiniCPM-V-Apps) on iPhone and iPad, and (7) online web demo on [server](http://101.126.42.235:30910/). See our [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) for full usage!
+MiniCPM-V 4.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/Support-MiniCPM-V-4.5/docs/multimodal/minicpmv4.5.md) and [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4), [GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf) and [AWQ](https://github.com/tc-mb/AutoAWQ) format quantized models in 16 sizes, (3) [SGLang](https://github.com/tc-mb/sglang/tree/main) and [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [Transformers](https://github.com/tc-mb/transformers/tree/main) and [LLaMA-Factory](./docs/llamafactory_train_and_infer.md), (5) quick [local WebUI demo](#chat-with-our-demo-on-gradio), (6) optimized [local iOS app](https://github.com/OpenBMB/MiniCPM-V-Apps) on iPhone and iPad, and (7) online web demo on [server](https://huggingface.co/spaces/openbmb/MiniCPM-V-4_5-Demo). See our [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) for full usage!
### Key Techniques
diff --git a/docs/minicpm_v4dot5_zh.md b/docs/minicpm_v4dot5_zh.md
index 3534ff4d..c3e3ac59 100644
--- a/docs/minicpm_v4dot5_zh.md
+++ b/docs/minicpm_v4dot5_zh.md
@@ -18,7 +18,7 @@
基于 [LLaVA-UHD](https://arxiv.org/pdf/2403.11703) 架构,MiniCPM-V 4.5 能处理任意长宽比、最高达 180 万像素(如 1344x1344) 的高分辨率图像,同时使用的视觉 token 数仅为多数 MLLM 的 1/4。其在 OCRBench 上取得超越 GPT-4o-latest 与 Gemini 2.5 等闭源模型的性能,并在 OmniDocBench 上展现了业界顶尖的 PDF 文档解析能力。借助最新的 [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) 和 [VisCPM](https://github.com/OpenBMB/VisCPM) 技术,模型在可靠性上表现优异,在 MMHal-Bench 上超越 GPT-4o-latest,并支持 30+ 种语言的多语言能力。
- 💫 **便捷易用的部署方式**
- MiniCPM-V 4.5 提供丰富灵活的使用方式:(1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/master/docs/multimodal/minicpmo4.5.md) 与 [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) 支持本地 CPU 高效推理;(2) 提供 [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4)、[GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf)、[AWQ](https://github.com/tc-mb/AutoAWQ) 等 16 种规格的量化模型;(3)兼容 SGLang 与 [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) (4) 借助 [Transformers](https://github.com/tc-mb/transformers/tree/main) 与 [LLaMA-Factory](./docs/llamafactory_train_and_infer.md) 在新领域与任务上进行微调;(5) 快速启动本地 [WebUI demo](#chat-with-our-demo-on-gradio);(6) 优化适配的 [iOS 本地应用](https://github.com/OpenBMB/MiniCPM-V-Apps),可在 iPhone 与 iPad 上高效运行;(7) 在线 [Web demo](http://101.126.42.235:30910/) 体验。更多使用方式请见 [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook)。
+ MiniCPM-V 4.5 提供丰富灵活的使用方式:(1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/master/docs/multimodal/minicpmo4.5.md) 与 [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) 支持本地 CPU 高效推理;(2) 提供 [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4)、[GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf)、[AWQ](https://github.com/tc-mb/AutoAWQ) 等 16 种规格的量化模型;(3)兼容 SGLang 与 [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) (4) 借助 [Transformers](https://github.com/tc-mb/transformers/tree/main) 与 [LLaMA-Factory](./docs/llamafactory_train_and_infer.md) 在新领域与任务上进行微调;(5) 快速启动本地 [WebUI demo](#chat-with-our-demo-on-gradio);(6) 优化适配的 [iOS 本地应用](https://github.com/OpenBMB/MiniCPM-V-Apps),可在 iPhone 与 iPad 上高效运行;(7) 在线 [Web demo](https://huggingface.co/spaces/openbmb/MiniCPM-V-4_5-Demo) 体验。更多使用方式请见 [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook)。
### 技术亮点