diff --git a/README.md b/README.md index 021ae116..b8811815 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,7 @@

- MiniCPM-V 4.6 🤗 🤖 📱 | MiniCPM-o 4.5 🤗 📞 🤖 | 📄 Technical Report | 🍳 Cookbook + MiniCPM-V 4.6 🤗 🤖 📱 | MiniCPM-o 4.5 🤗 📞 🤖 | 📄 Technical Report | 🍳 Cookbook | 🌐 API Guide

@@ -2424,6 +2424,8 @@ We would like to thank the following projects: The [MiniCPM-V & o Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) provides scenario-based recipes for deploying, fine-tuning, quantizing, and building demos with MiniCPM-V and MiniCPM-o. The [documentation website](https://minicpm-o.readthedocs.io/en/latest/index.html) presents these recipes in a structured way for quick lookup. +To quickly use MiniCPM-V 4.6 or MiniCPM-o 4.5, see the [API Guide](./docs/api.md). + It is organized for: * **Individuals**: Local inference, quantized deployment, and end-device demos. diff --git a/README_zh.md b/README_zh.md index cedb15e6..00cb8b73 100644 --- a/README_zh.md +++ b/README_zh.md @@ -20,7 +20,7 @@

- MiniCPM-V 4.6 🤗 🤖 📱 | MiniCPM-o 4.5 🤗 📞 🤖 | 📄 技术报告 | 🍳 使用指南 + MiniCPM-V 4.6 🤗 🤖 📱 | MiniCPM-o 4.5 🤗 📞 🤖 | 📄 技术报告 | 🍳 使用指南 | 🌐 API 指南

@@ -2405,6 +2405,8 @@ MiniCPM-o 4.5 支持 vLLM, SGLang, llama.cpp, Ollama 等[推理框架](#训练 欢迎探索我们整理的 [MiniCPM-V & o 使用手册 (Cookbook)](https://github.com/OpenSQZ/MiniCPM-V-CookBook)。Cookbook 提供面向场景的教程,覆盖 MiniCPM-V 和 MiniCPM-o 的部署、微调、量化和 Demo 构建等常见任务;配套的[文档网站](https://minicpm-o.readthedocs.io/en/latest/index.html)以结构化方式呈现这些方案,便于快速查找。 +如需快速使用 MiniCPM-V 4.6 或 MiniCPM-o 4.5,可参考 [API Guide](./docs/api.md)。 + 它面向以下使用场景组织内容: * **个人用户**:本地推理、量化部署和端侧 Demo。 diff --git a/docs/api.md b/docs/api.md new file mode 100644 index 00000000..b3b7a919 --- /dev/null +++ b/docs/api.md @@ -0,0 +1,120 @@ +# API Guide + +This document introduces how to use the APIs for MiniCPM-V 4.6 and MiniCPM-o 4.5. + +## MiniCPM-V 4.6 + +MiniCPM-V 4.6 can be called through the Chat Completions API. The interface supports both text-only and vision-language requests. + +### Endpoint + +```text +Base URL: https://api.modelbest.cn/v1 +Chat API: POST /chat/completions +Authorization: Bearer +Content-Type: application/json +``` + +A **free** public API key is currently available for trying the service: + +```text +sk-pQ8L2zF3XmR5kY9wV4jB7hN1tC6vM0xG3aD5sH2bJ9lK4cZ8 +``` + +Available model IDs: + +```text +MiniCPM-V-4.6-1.3B-Instruct +MiniCPM-V-4.6-1.3B-think-0506_tau2_rl +``` + +### Text-Only Request + +```bash +curl https://api.modelbest.cn/v1/chat/completions \ + -H "Authorization: Bearer $API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "model": "MiniCPM-V-4.6-1.3B-Instruct", + "messages": [ + { + "role": "user", + "content": "Introduce yourself in one sentence." + } + ] + }' +``` + +To use the Thinking model, replace the `model` value with: + +```json +"MiniCPM-V-4.6-1.3B-think-0506_tau2_rl" +``` + +### Vision-Language Request + +Images are passed as base64 data URLs in the `image_url` content format. + +```bash +curl https://api.modelbest.cn/v1/chat/completions \ + -H "Authorization: Bearer $API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "model": "MiniCPM-V-4.6-1.3B-Instruct", + "messages": [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": "Describe this image." + }, + { + "type": "image_url", + "image_url": { + "url": "data:image/png;base64," + } + } + ] + } + ] + }' +``` + + +### Python Example + +```python +import json +import urllib.request + +api_key = "" +payload = { + "model": "MiniCPM-V-4.6-1.3B-Instruct", + "messages": [ + { + "role": "user", + "content": "List three use cases for MiniCPM-V.", + } + ], +} + +request = urllib.request.Request( + "https://api.modelbest.cn/v1/chat/completions", + data=json.dumps(payload).encode("utf-8"), + headers={ + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json", + }, + method="POST", +) + +with urllib.request.urlopen(request) as response: + data = json.loads(response.read().decode("utf-8")) + +print(data["choices"][0]["message"]["content"]) +``` + +## MiniCPM-o 4.5 + +MiniCPM-o 4.5 provides a Realtime API for full-duplex multimodal interaction. For the current MiniCPM-o 4.5 API documentation, see the [Realtime API Overview](https://minicpmo45.modelbest.cn/docs/en/realtime-api/overview/). diff --git a/docs/minicpm_v4dot5_en.md b/docs/minicpm_v4dot5_en.md index 458c5917..677dbad4 100644 --- a/docs/minicpm_v4dot5_en.md +++ b/docs/minicpm_v4dot5_en.md @@ -16,7 +16,7 @@ Based on [LLaVA-UHD](https://arxiv.org/pdf/2403.11703) architecture, MiniCPM-V 4 - 💫 **Easy Usage.** -MiniCPM-V 4.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/Support-MiniCPM-V-4.5/docs/multimodal/minicpmv4.5.md) and [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4), [GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf) and [AWQ](https://github.com/tc-mb/AutoAWQ) format quantized models in 16 sizes, (3) [SGLang](https://github.com/tc-mb/sglang/tree/main) and [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [Transformers](https://github.com/tc-mb/transformers/tree/main) and [LLaMA-Factory](./docs/llamafactory_train_and_infer.md), (5) quick [local WebUI demo](#chat-with-our-demo-on-gradio), (6) optimized [local iOS app](https://github.com/OpenBMB/MiniCPM-V-Apps) on iPhone and iPad, and (7) online web demo on [server](http://101.126.42.235:30910/). See our [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) for full usage! +MiniCPM-V 4.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/Support-MiniCPM-V-4.5/docs/multimodal/minicpmv4.5.md) and [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) support for efficient CPU inference on local devices, (2) [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4), [GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf) and [AWQ](https://github.com/tc-mb/AutoAWQ) format quantized models in 16 sizes, (3) [SGLang](https://github.com/tc-mb/sglang/tree/main) and [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with [Transformers](https://github.com/tc-mb/transformers/tree/main) and [LLaMA-Factory](./docs/llamafactory_train_and_infer.md), (5) quick [local WebUI demo](#chat-with-our-demo-on-gradio), (6) optimized [local iOS app](https://github.com/OpenBMB/MiniCPM-V-Apps) on iPhone and iPad, and (7) online web demo on [server](https://huggingface.co/spaces/openbmb/MiniCPM-V-4_5-Demo). See our [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) for full usage! ### Key Techniques diff --git a/docs/minicpm_v4dot5_zh.md b/docs/minicpm_v4dot5_zh.md index 3534ff4d..c3e3ac59 100644 --- a/docs/minicpm_v4dot5_zh.md +++ b/docs/minicpm_v4dot5_zh.md @@ -18,7 +18,7 @@ 基于 [LLaVA-UHD](https://arxiv.org/pdf/2403.11703) 架构,MiniCPM-V 4.5 能处理任意长宽比、最高达 180 万像素(如 1344x1344) 的高分辨率图像,同时使用的视觉 token 数仅为多数 MLLM 的 1/4。其在 OCRBench 上取得超越 GPT-4o-latest 与 Gemini 2.5 等闭源模型的性能,并在 OmniDocBench 上展现了业界顶尖的 PDF 文档解析能力。借助最新的 [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) 和 [VisCPM](https://github.com/OpenBMB/VisCPM) 技术,模型在可靠性上表现优异,在 MMHal-Bench 上超越 GPT-4o-latest,并支持 30+ 种语言的多语言能力。 - 💫 **便捷易用的部署方式** - MiniCPM-V 4.5 提供丰富灵活的使用方式:(1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/master/docs/multimodal/minicpmo4.5.md) 与 [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) 支持本地 CPU 高效推理;(2) 提供 [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4)、[GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf)、[AWQ](https://github.com/tc-mb/AutoAWQ) 等 16 种规格的量化模型;(3)兼容 SGLang 与 [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) (4) 借助 [Transformers](https://github.com/tc-mb/transformers/tree/main) 与 [LLaMA-Factory](./docs/llamafactory_train_and_infer.md) 在新领域与任务上进行微调;(5) 快速启动本地 [WebUI demo](#chat-with-our-demo-on-gradio);(6) 优化适配的 [iOS 本地应用](https://github.com/OpenBMB/MiniCPM-V-Apps),可在 iPhone 与 iPad 上高效运行;(7) 在线 [Web demo](http://101.126.42.235:30910/) 体验。更多使用方式请见 [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook)。 + MiniCPM-V 4.5 提供丰富灵活的使用方式:(1) [llama.cpp](https://github.com/tc-mb/llama.cpp/blob/master/docs/multimodal/minicpmo4.5.md) 与 [ollama](https://github.com/tc-mb/ollama/tree/MIniCPM-V) 支持本地 CPU 高效推理;(2) 提供 [int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4)、[GGUF](https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf)、[AWQ](https://github.com/tc-mb/AutoAWQ) 等 16 种规格的量化模型;(3)兼容 SGLang 与 [vLLM](#efficient-inference-with-llamacpp-ollama-vllm) (4) 借助 [Transformers](https://github.com/tc-mb/transformers/tree/main) 与 [LLaMA-Factory](./docs/llamafactory_train_and_infer.md) 在新领域与任务上进行微调;(5) 快速启动本地 [WebUI demo](#chat-with-our-demo-on-gradio);(6) 优化适配的 [iOS 本地应用](https://github.com/OpenBMB/MiniCPM-V-Apps),可在 iPhone 与 iPad 上高效运行;(7) 在线 [Web demo](https://huggingface.co/spaces/openbmb/MiniCPM-V-4_5-Demo) 体验。更多使用方式请见 [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook)。 ### 技术亮点