[Release 2026/0] Documentation cherry-picks #3996
**Merged.** przepeck merged 10 commits into `releases/2026/0` from `przepeck/release/2026/0-cherrypicks` on Feb 23, 2026.
Commits (10, all authored by przepeck):
- `13ae37b` flag for export model to enable export InternVL2 (#3970)
- `af0eb40` Cherry-pick:
- `01bbf54` Fixing tests for agentic demo (#3980)
- `c235192` Missing option in help (#3963)
- `9f3a09c` Cherry-pick: Trivy/dependabot issues fix (#3964)
- `6e0a586` fix
- `b8007cc` Adding env var for Qwen3-coder - continue demo (#3991)
- `056ad5d` Merge branch 'releases/2026/0' into przepeck/release/2026/0-cherrypicks
- `ba7fcee` Update README.md
- `ba07aaf` Merge branch 'releases/2026/0' into przepeck/release/2026/0-cherrypicks
````diff
@@ -1,5 +1,5 @@
-tritonclient[grpc]==2.41.0
+tritonclient[grpc]
 ffmpeg-python==0.2.0
 opencv-python==4.9.0.80
-protobuf==4.25.8
+protobuf==5.29.6
 numpy<2.0.0
````
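Since `tritonclient[grpc]` is now unpinned while `protobuf` is bumped to a 5.x pin, a quick sanity check after installing can confirm that the resolved versions work together. A minimal, illustrative sketch (the expected version in the comment reflects this diff's pin, not guaranteed output):

```python
# Sanity check for the updated pins; assumes this requirements file
# has been installed into the current environment.
import google.protobuf
import tritonclient.grpc as grpcclient

# protobuf is pinned, so this should report a 5.29.x build.
print("protobuf:", google.protobuf.__version__)

# tritonclient[grpc] is unpinned; a successful import verifies that the
# resolved release is compatible with the pinned protobuf.
print("tritonclient.grpc imported:", grpcclient.__name__)
```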
````diff
@@ -237,13 +237,14 @@ ovms.exe --rest_port 8000 --source_model OpenVINO/Phi-4-mini-instruct-int4-ov --
 :::{tab-item} Qwen3-Coder-30B-A3B-Instruct
 :sync: Qwen3-Coder-30B-A3B-Instruct
 ```bat
+set MOE_USE_MICRO_GEMM_PREFILL=0
 ovms.exe --rest_port 8000 --source_model Qwen/Qwen3-Coder-30B-A3B-Instruct --model_repository_path models --tool_parser qwen3coder --target_device GPU --task text_generation --cache_dir .cache --enable_prefix_caching true
 ```
 :::
 :::{tab-item} gpt-oss-20b
 :sync: gpt-oss-20b
 ```bat
-ovms.exe --rest_port 8000 --source_model openai/gpt-oss-20b --model_repository_path models --tool_parser gptoss --reasoning_parser gptoss --target_device GPU --task text_generation --enable_prefix_caching true --target_device GPU
+ovms.exe --rest_port 8000 --source_model openai/gpt-oss-20b --model_repository_path models --tool_parser gptoss --reasoning_parser gptoss --task text_generation --enable_prefix_caching true --target_device GPU
 ```
 > **Note:** Use the `--pipeline_type LM` parameter in the export command for version 2025.4.*; it disables continuous batching. Not needed in the latest weekly or 2026.0+ releases.
 :::
````
````diff
@@ -294,12 +295,6 @@ ovms.exe --rest_port 8000 --source_model OpenVINO/Qwen3-4B-int4-ov --model_repos
 ovms.exe --rest_port 8000 --source_model OpenVINO/Mistral-7B-Instruct-v0.3-int4-cw-ov --model_repository_path models --tool_parser mistral --target_device NPU --task text_generation --enable_prefix_caching true --cache_dir .cache --max_prompt_len 4000
 ```
 :::
-:::{tab-item} Phi-3-mini-4k-instruct-int4-cw-ov
-:sync: Phi-3-mini-4k-instruct-int4-cw-ov
-```bat
-ovms.exe --rest_port 8000 --source_model OpenVINO/Phi-3-mini-4k-instruct-int4-cw-ov --model_repository_path models --tool_parser phi4 --target_device NPU --task text_generation --enable_tool_guided_generation true --enable_prefix_caching true --cache_dir .cache --max_prompt_len 4000
-```
-:::
 ::::

 > **Note:** Setting the `--max_prompt_len` parameter too high may lead to performance degradation. It is recommended to use the smallest value that meets your requirements.
````
````diff
@@ -380,8 +375,8 @@ docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/model
 :::{tab-item} Qwen3-Coder-30B-A3B-Instruct
 :sync: Qwen3-Coder-30B-A3B-Instruct
 ```bash
-docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/models openvino/model_server:weekly \
---rest_port 8000 --source_model Qwen/Qwen3-Coder-30B-A3B-Instruct --model_repository_path models --tool_parser qwen3coder --task text_generation --cache_dir .cache --enable_prefix_caching true
+docker run -d --user $(id -u):$(id -g) --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 -p 8000:8000 -v $(pwd)/models:/models openvino/model_server:weekly \
+--rest_port 8000 --source_model Qwen/Qwen3-Coder-30B-A3B-Instruct --model_repository_path models --tool_parser qwen3coder --task text_generation --enable_prefix_caching true
 ```
 :::
 ::::
````
````diff
@@ -467,7 +462,7 @@ docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/model
 :::{tab-item} Qwen3-Coder-30B-A3B-Instruct
 :sync: Qwen3-Coder-30B-A3B-Instruct
 ```bash
-docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/models --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) openvino/model_server:weekly \
+docker run -d --user $(id -u):$(id -g) -e MOE_USE_MICRO_GEMM_PREFILL=0 --rm -p 8000:8000 -v $(pwd)/models:/models --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) openvino/model_server:weekly \
 --rest_port 8000 --source_model Qwen/Qwen3-Coder-30B-A3B-Instruct --model_repository_path models --tool_parser qwen3coder --target_device GPU --task text_generation --enable_tool_guided_generation true --enable_prefix_caching true
 ```
 :::
````
@@ -538,13 +533,6 @@ docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/model | |
| --rest_port 8000 --model_repository_path models --source_model OpenVINO/Mistral-7B-Instruct-v0.3-int4-cw-ov --tool_parser mistral --target_device NPU --task text_generation --enable_prefix_caching true --max_prompt_len 4000 | ||
| ``` | ||
| ::: | ||
| :::{tab-item} Phi-3-mini-4k-instruct-int4-cw-ov | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. did we stop supporting phi?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. its non agentic model, it got there by mistake |
||
| :sync: Phi-3-mini-4k-instruct-int4-cw-ov | ||
| ```bash | ||
| docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/models --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) openvino/model_server:weekly \ | ||
| --rest_port 8000 --model_repository_path models --source_model OpenVINO/Phi-3-mini-4k-instruct-int4-cw-ov --tool_parser phi4 --target_device NPU --task text_generation --enable_tool_guided_generation true --enable_prefix_caching true --max_prompt_len 4000 | ||
| ``` | ||
| ::: | ||
| :::: | ||
|
|
||
| ### Deploy all models in a single container | ||
|
|
@@ -621,7 +609,7 @@ python openai_agent.py --query "List the files in folder /root" --model meta-lla | |
| ::: | ||
| :::{tab-item} Phi-4-mini-instruct | ||
| :sync: Phi-4-mini-instruct | ||
| ```console | ||
| ```bash | ||
| python openai_agent.py --query "What is the current weather in Tokyo?" --model microsoft/Phi-4-mini-instruct --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather | ||
| ``` | ||
| ::: | ||
|
|
@@ -651,7 +639,7 @@ python openai_agent.py --query "What is the current weather in Tokyo?" --model Q | |
| ::: | ||
| :::{tab-item} gpt-oss-20b | ||
| :sync: gpt-oss-20b | ||
| ```bash | ||
| ```console | ||
| python openai_agent.py --query "What is the current weather in Tokyo?" --model openai/gpt-oss-20b --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather | ||
| ``` | ||
| ::: | ||
|
|
||
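Since all of the demo clients point at `http://localhost:8000/v3`, a quick request against the OpenAI-compatible endpoint can confirm a deployment before running `openai_agent.py`. Below is a minimal, illustrative smoke test, assuming the `openai` Python package is installed and the gpt-oss-20b container from the sections above is listening on port 8000 (the API key value is just a placeholder):

```python
# Minimal smoke test against the model server's OpenAI-compatible API.
# Assumes the gpt-oss-20b deployment shown above is running on port 8000.
from openai import OpenAI

# The base URL matches the --base-url used by openai_agent.py in this demo;
# the key is a placeholder, not a real credential.
client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What is the current weather in Tokyo?"}],
)
print(response.choices[0].message.content)
```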
````diff
@@ -4,5 +4,5 @@
 openvino==2025.4.*
 numpy<2.0
 transformers<=4.53.0
-pillow==10.3.0
+pillow==12.1.1
 torch==2.8.0+cpu
````
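This Pillow bump follows the Trivy-driven dependency updates discussed below. A quick, illustrative check that the upgrade took effect (assumes this demo's requirements are installed in the current environment):

```python
# Verify the Pillow upgrade from the diff above.
import PIL

print("Pillow:", PIL.__version__)  # expected: 12.1.1 per the updated pin
```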
````diff
@@ -1,72 +1,65 @@
 # Prediction Example with an ONNX Model {#ovms_demo_using_onnx_model}
 
-Steps are similar to when you work with IR model format. Model Server accepts ONNX models as well with no differences in versioning. Locate ONNX model file in separate model version directory.
+This demo demonstrates the steps required to deploy an ONNX-based vision model. The workflow is optimized for rapid integration and ease of use: no model-conversion step is needed, as the model is provided directly in ONNX format.
+To further simplify deployment, the server applies all necessary image-preprocessing operations, removing the need for the client to implement preprocessing pipelines such as normalization or color-space transformation. This approach reduces development effort, ensures consistency with the model's training configuration, and accelerates end-to-end deployment.
+The server accepts image data in multiple formats, offering flexibility depending on the client environment. Images can be sent as:
+
+- Raw arrays directly obtained from OpenCV or Pillow
+- Encoded images, including JPEG or PNG formats
+
+This enables seamless integration with a wide range of applications and client libraries.
 Below is a complete functional use case using Python 3.7 or higher.
 For this example let's use a public [ONNX ResNet](https://github.com/onnx/models/tree/main/validated/vision/classification/resnet) model - resnet50-caffe2-v1-9.onnx.
 
-This model requires an additional [preprocessing function](https://github.com/onnx/models/tree/main/validated/vision/classification/resnet#preprocessing). Preprocessing can be performed in the client by manipulating data before sending the request. Preprocessing can also be delegated to the server by setting preprocessing parameters. Both methods will be explained below.
+This model was trained with additional [preprocessing](https://github.com/onnx/models/tree/main/validated/vision/classification/resnet#preprocessing). For inference, preprocessing can be executed on the client side by transforming the input data before sending the request. However, a more efficient approach is to delegate preprocessing to the server by configuring the appropriate preprocessing parameters.
+Here the `mean`, `scale`, `color` and `layout` parameters will be adjusted. In addition, input precision conversion from fp32 to uint8 can improve performance and bandwidth efficiency: raw images can be transmitted as more compact uint8 data, significantly reducing the payload size and lowering client-side compute requirements.
+More details about [parameters](../../../docs/parameters.md).
 
-[Option 1: Adding preprocessing to the client side](#option-1-adding-preprocessing-to-the-client-side)
-[Option 2: Adding preprocessing to the server side](#option-2-adding-preprocessing-to-the-server-side)
-
-## Option 1: Adding preprocessing to the client side
+## Model deployment with preprocessing
 
 Clone the repository and enter using_onnx_model directory
 
 ```bash
 git clone https://github.com/openvinotoolkit/model_server.git
 cd model_server/demos/using_onnx_model/python
 ```
 
-Download classification model
+Prepare environment
 ```bash
 curl --fail -L --create-dirs https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet50-caffe2-v1-9.onnx -o workspace/resnet50-onnx/1/resnet50-caffe2-v1-9.onnx
 ```
 
 You should see `workspace` directory created with the following content:
 ```bash
 workspace/
 └── resnet50-onnx
     └── 1
         └── resnet50-caffe2-v1-9.onnx
 ```
 
-Start the OVMS container with single model instance:
+Start the OVMS container with additional preprocessing options:
 ```bash
 docker run -d -u $(id -u):$(id -g) -v $(pwd)/workspace:/workspace -p 9001:9001 openvino/model_server:latest \
---model_path /workspace/resnet50-onnx --model_name resnet --port 9001
+--model_path /workspace/resnet50-onnx --model_name resnet --port 9001 --layout NHWC:NCHW --mean "[123.675,116.28,103.53]" --scale "[58.395,57.12,57.375]" --shape "(1,224,224,3)" --color_format BGR:RGB --precision uint8:fp32
 ```
 
-Install python client dependencies:
-```bash
-pip3 install -r requirements.txt
-```
+## Running the client
 
-The `onnx_model_demo.py` script can run inference both with and without performing preprocessing. Since in this variant we want to run preprocessing on the client side, let's set the `--run_preprocessing` flag.
+The `onnx_model_demo.py` script can run inference both with and without performing preprocessing. Since in this variant preprocessing is done by the model server, there is no need to perform any image preprocessing on the client side, so run without the `--run_preprocessing` option. See the [preprocessing function](https://github.com/openvinotoolkit/model_server/blob/releases/2026/0/demos/using_onnx_model/python/onnx_model_demo.py#L26-L33) run in the client.
 
-Run the client with preprocessing:
 ```bash
-python onnx_model_demo.py --service_url localhost:9001 --run_preprocessing
-Running with preprocessing on client side
-../../common/static/images/bee.jpeg (1, 3, 224, 224) ; data range: -2.117904 : 2.64
-Class is with highest score: 309
+pip3 install -r requirements.txt
+python onnx_model_demo.py --service_url localhost:9001
 ```
+Output:
+```
+Running inference with image: ../../common/static/images/bee.jpeg
+Class with highest score: 309
+Detected class name: bee
+```
 
-## Option 2: Adding preprocessing to the server side
-
-Start the OVMS container with additional preprocessing options:
+The client can also be run with the `--send_tensor` flag, which reads the encoded input image and sends it with uint8 precision.
 ```bash
-docker run -d -u $(id -u):$(id -g) -v $(pwd)/workspace:/workspace -p 9001:9001 openvino/model_server:latest \
---model_path /workspace/resnet50-onnx --model_name resnet --port 9001 --layout NHWC:NCHW --mean "[123.675,116.28,103.53]" --scale "[58.395,57.12,57.375]" --shape "(1,224,224,3)" --color_format BGR:RGB
+python onnx_model_demo.py --service_url localhost:9001 --send_tensor
 ```
 
-The `onnx_model_demo.py` script can run inference both with and without performing preprocessing. Since in this variant preprocessing is done by the model server, there's no need to perform any image preprocessing on the client side. In that case, run without `--run_preprocessing` option.
-
-Run the client without preprocessing:
-```bash
-python onnx_model_demo.py --service_url localhost:9001
-Running without preprocessing on client side
-Class is with highest score: 309
-```
+Output:
+```
+Running inference with image: ../../common/static/images/bee.jpeg
+Class with highest score: 309
+Detected class name: bee
+```
+> **Note:** When adding preprocessing to the model input, the shape needs to be set as static.
````
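For readers who still want client-side preprocessing (the removed Option 1 flow), the server-side options above can be mirrored in a few lines of NumPy/OpenCV. The following is an illustrative sketch using the `--mean`, `--scale`, layout, and color-format values from the docker command, not the demo's exact `preprocess` helper (the official ResNet recipe additionally resizes to 256 and center-crops, which is omitted here for brevity):

```python
# Client-side equivalent of the server-side preprocessing configured above:
# BGR -> RGB, resize to 224x224, mean/scale normalization, NHWC -> NCHW, batch dim.
import cv2
import numpy as np

MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
SCALE = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(image_path: str) -> np.ndarray:
    img = cv2.imread(image_path)                 # HWC, BGR, uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # mirrors --color_format BGR:RGB
    img = cv2.resize(img, (224, 224)).astype(np.float32)
    img = (img - MEAN) / SCALE                   # mirrors --mean / --scale
    img = img.transpose(2, 0, 1)[np.newaxis, :]  # NHWC -> NCHW, add batch dim
    return img                                   # shape (1, 3, 224, 224)

# Example: preprocess(".../bee.jpeg") yields roughly the data range shown in
# the old Option 1 output (about -2.12 to 2.64).
```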
Review discussion on the pillow update:

> @mzegla isn't the Python client EOL? When and why did we change it? @przepeck

> Trivy tests raised this issue and we needed to update the version of this package.

> You mean ovmsclient; it's EOL, but these clients, I think, should be supported.