Contributors Forks Stargazers Issues MIT License

Introduction

This repository uses Docker to package large language models (LLMs) and multimodal models optimized for Rockchip platforms. It provides a unified calling interface compatible with the OpenAI API, making it easy to integrate and use these models.
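Because the interface is OpenAI-API-compatible, any standard chat-completion client should work. The sketch below uses only the Python standard library; the host, port, and URL path are assumptions — adjust them to wherever your Docker container exposes the API.

```python
import json
import urllib.request

# Assumed endpoint: change host/port to match your container's published port.
API_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model, prompt):
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model, prompt):
    """Send a chat request and return the assistant's reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running container serving this model):
# print(chat("rk3588-deepseek-r1-distill-qwen:1.5b-w8a8-latest", "Hello!"))
```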

Hardware Preparation

Supported devices: reComputer RK3588 and reComputer RK3576.

LLM

Quick start

| Device | Model |
|--------|-------|
| RK3588 | rk3588-deepseek-r1-distill-qwen:7b-w8a8-latest |
| RK3588 | rk3588-deepseek-r1-distill-qwen:1.5b-fp16-latest |
| RK3588 | rk3588-deepseek-r1-distill-qwen:1.5b-w8a8-latest |
| RK3576 | rk3576-deepseek-r1-distill-qwen:7b-w4a16-g128-latest |
| RK3576 | rk3576-deepseek-r1-distill-qwen:7b-w4a16-latest |
| RK3576 | rk3576-deepseek-r1-distill-qwen:1.5b-fp16-latest |
| RK3576 | rk3576-deepseek-r1-distill-qwen:1.5b-w4a16-g128-latest |
| RK3576 | rk3576-deepseek-r1-distill-qwen:1.5b-w4a16-latest |

VLM

Quick start

| Device | Model |
|--------|-------|
| RK3588 | rk3588-qwen2-vl:7b-w8a8-latest |
| RK3588 | rk3588-qwen2-vl:2b-w8a8-latest |
| RK3576 | rk3576-qwen2.5-vl:3b-w4a16-latest |
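Vision-language models take an image alongside the text prompt. A common way to do this through an OpenAI-compatible API is to inline the image as a base64 data URL in the message content, as sketched below; whether this particular server accepts data URLs is an assumption, so check the container's documentation.

```python
import base64

def build_vision_request(model, image_bytes, question):
    """Build an OpenAI-style chat payload pairing a text question with an
    inline base64-encoded image (standard OpenAI vision content schema)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

# Example (requires a running container and an image file):
# with open("cat.jpg", "rb") as f:
#     payload = build_vision_request(
#         "rk3588-qwen2-vl:2b-w8a8-latest", f.read(),
#         "What is in this picture?")
```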

Speed test

Note: A rough estimate of a model's inference speed includes both TTFT (time to first token) and TPOT (time per output token).
Note: Run python test_inference_speed.py --help to view the available options.

python -m venv .env && source .env/bin/activate
pip install requests
python llm_speed_test.py
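For reference, TTFT and TPOT can be derived from per-token arrival times on a streaming response: TTFT is the delay before the first token, and TPOT is the mean gap between subsequent tokens. The helper below is a minimal sketch of that computation; the repository's own speed-test script may calculate these differently.

```python
def ttft_tpot(timestamps):
    """Given timestamps[0] = time the request was sent and timestamps[1:] =
    arrival times of each streamed token, return (TTFT, TPOT) in seconds.

    TTFT = time to first token; TPOT = mean gap between consecutive tokens
    after the first (0.0 when only one token arrived).
    """
    if len(timestamps) < 2:
        raise ValueError("need the request time and at least one token")
    ttft = timestamps[1] - timestamps[0]
    gaps = [b - a for a, b in zip(timestamps[1:], timestamps[2:])]
    tpot = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, tpot
```

Tokens per second is then approximately 1 / TPOT once the stream is warmed up.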

💞 Top contributors:

contrib.rocks image

🌟 Star History

Star History Chart

Reference: rknn-llm