Contributors Forks Stargazers Issues MIT License

Introduction

This repository uses Docker to package large language models (LLMs) and multimodal models optimized for Rockchip platforms. It provides a unified calling interface compatible with the OpenAI API, making it easy to integrate and use these models.
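Because the interface is OpenAI-API-compatible, any standard chat-completion client should work. The sketch below uses only the Python standard library; the host, port, and URL path are assumptions — adjust them to wherever your Docker container exposes the API.

```python
import json
import urllib.request

# Assumed endpoint: change host/port to match your container's published port.
API_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model, prompt):
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model, prompt):
    """Send a chat request and return the assistant's reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running container serving this model):
# print(chat("rk3588-deepseek-r1-distill-qwen:1.5b-w8a8-latest", "Hello!"))
```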

Hardware Preparation

Supported devices: reComputer RK3588 and reComputer RK3576.

LLM

Quick start

| Device | Model |
|--------|-------|
| RK3588 | rk3588-deepseek-r1-distill-qwen:7b-w8a8-latest |
| RK3588 | rk3588-deepseek-r1-distill-qwen:1.5b-fp16-latest |
| RK3588 | rk3588-deepseek-r1-distill-qwen:1.5b-w8a8-latest |
| RK3576 | rk3576-deepseek-r1-distill-qwen:7b-w4a16-g128-latest |
| RK3576 | rk3576-deepseek-r1-distill-qwen:7b-w4a16-latest |
| RK3576 | rk3576-deepseek-r1-distill-qwen:1.5b-fp16-latest |
| RK3576 | rk3576-deepseek-r1-distill-qwen:1.5b-w4a16-g128-latest |
| RK3576 | rk3576-deepseek-r1-distill-qwen:1.5b-w4a16-latest |

VLM

Quick start

| Device | Model |
|--------|-------|
| RK3588 | rk3588-qwen2-vl:7b-w8a8-latest |
| RK3588 | rk3588-qwen2-vl:2b-w8a8-latest |
| RK3576 | rk3576-qwen2.5-vl:3b-w4a16-latest |
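Vision-language models take an image alongside the text prompt. A common way to do this through an OpenAI-compatible API is to inline the image as a base64 data URL in the message content, as sketched below; whether this particular server accepts data URLs is an assumption, so check the container's documentation.

```python
import base64

def build_vision_request(model, image_bytes, question):
    """Build an OpenAI-style chat payload pairing a text question with an
    inline base64-encoded image (standard OpenAI vision content schema)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

# Example (requires a running container and an image file):
# with open("cat.jpg", "rb") as f:
#     payload = build_vision_request(
#         "rk3588-qwen2-vl:2b-w8a8-latest", f.read(),
#         "What is in this picture?")
```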

Speed test

Note: A rough estimate of a model's inference speed includes both TTFT (time to first token) and TPOT (time per output token).
Note: Run python test_inference_speed.py --help to view the available options.

python -m venv .env && source .env/bin/activate
pip install requests
python llm_speed_test.py
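For reference, TTFT and TPOT can be derived from per-token arrival times on a streaming response: TTFT is the delay before the first token, and TPOT is the mean gap between subsequent tokens. The helper below is a minimal sketch of that computation; the repository's own speed-test script may calculate these differently.

```python
def ttft_tpot(timestamps):
    """Given timestamps[0] = time the request was sent and timestamps[1:] =
    arrival times of each streamed token, return (TTFT, TPOT) in seconds.

    TTFT = time to first token; TPOT = mean gap between consecutive tokens
    after the first (0.0 when only one token arrived).
    """
    if len(timestamps) < 2:
        raise ValueError("need the request time and at least one token")
    ttft = timestamps[1] - timestamps[0]
    gaps = [b - a for a, b in zip(timestamps[1:], timestamps[2:])]
    tpot = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, tpot
```

Tokens per second is then approximately 1 / TPOT once the stream is warmed up.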

💞 Top contributors:

contrib.rocks image

🌟 Star History

Star History Chart

Reference: rknn-llm