reduce ram&vram usage for vlm calib stage by WeiweiZhang1 · Pull Request #1488 · intel/auto-round

WeiweiZhang1 · 2026-03-03T06:40:31Z

Description

Qwen3-VL-8B-Instruct example:
before:

after:

Type of Change

Related Issues

Fixes or relates to #1214

Checklist Before Submitting

My code has been tested locally.
Documentation has been updated as needed.
New or updated tests are included where applicable.

Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>

Copilot

Pull request overview

This PR aims to reduce RAM/VRAM usage during the MLLM/VLM calibration stage by limiting dataset item caching and by enabling earlier forward-stop behavior during input caching.

Changes:

Add an optional bounded (LRU) runtime cache for MLLM dataset samples, configurable via AR_MLLM_DATASET_CACHE_SIZE, and clear it after calibration.
Refactor MLLM dataset instantiation to pass optional cache_size only when supported.
Improve early-stop caching logic in the base compressor by inferring last_cache_name for MLLMs and stopping forward once the last target is reached.
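As a rough illustration of the first two changes, the sketch below shows one way a bounded LRU cache around `__getitem__` and a "pass `cache_size` only when supported" factory could look. Only the environment variable name `AR_MLLM_DATASET_CACHE_SIZE` comes from this PR; the class `BoundedCacheDataset`, the helper `build_dataset`, the `clear_cache` method, and the default of 0 (caching disabled) are hypothetical and are not taken from the auto-round code.

```python
import inspect
import os
from collections import OrderedDict


class BoundedCacheDataset:
    """Hypothetical dataset with a bounded LRU cache for __getitem__."""

    def __init__(self, samples, cache_size=None):
        self.samples = samples
        if cache_size is None:
            # Env var name is from the PR; the default of 0 (no caching) is an assumption.
            cache_size = int(os.environ.get("AR_MLLM_DATASET_CACHE_SIZE", "0"))
        self.cache_size = max(cache_size, 0)
        self._cache = OrderedDict()
        self.loads = 0  # counts expensive loads, for illustration only

    def _load(self, idx):
        # Stand-in for expensive preprocessing (image decoding, tokenization, ...).
        self.loads += 1
        return {"index": idx, "text": self.samples[idx]}

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        if self.cache_size == 0:
            return self._load(idx)
        if idx in self._cache:
            self._cache.move_to_end(idx)  # mark as most recently used
            return self._cache[idx]
        item = self._load(idx)
        self._cache[idx] = item
        if len(self._cache) > self.cache_size:
            self._cache.popitem(last=False)  # evict the least recently used item
        return item

    def clear_cache(self):
        # Called after calibration so cached samples do not linger in RAM.
        self._cache.clear()


def build_dataset(dataset_cls, samples, cache_size=None):
    """Pass cache_size only if the dataset class declares that parameter."""
    kwargs = {}
    params = inspect.signature(dataset_cls.__init__).parameters
    if cache_size is not None and "cache_size" in params:
        kwargs["cache_size"] = cache_size
    return dataset_cls(samples, **kwargs)
```

With a cache size of 2, reading indices 0, 1, 0, 2 triggers three loads, and the entry for index 1 is the one evicted. A dataset class without a `cache_size` parameter is constructed unchanged.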

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| auto_round/compressors/mllm/dataset.py | Adds bounded LRU caching for dataset `__getitem__` and wiring for `cache_size` via env var. |
| auto_round/compressors/mllm/compressor.py | Reduces VRAM during calib by passing `use_cache=False` and clears dataset runtime cache post-calib. |
| auto_round/compressors/base.py | Infers the last cache target for MLLMs and uses early-stop to reduce runtime/memory during caching. |
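The early-stop idea described for `auto_round/compressors/base.py` can be sketched without any framework: cache the inputs of the target blocks, and abort the forward pass as soon as the last target is reached so that later blocks never run. The name `last_cache_name` comes from the PR overview. `StopForward`, `Block`, `forward`, and `cache_block_inputs` are hypothetical, and inferring the last target from list order is an assumption made for this sketch, not the project's actual logic.

```python
class StopForward(Exception):
    """Raised to abort the forward pass once all needed inputs are cached."""


class Block:
    """Stand-in for a model block; counts how often it actually runs."""

    def __init__(self, name):
        self.name = name
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x + 1


def forward(blocks, x, pre_hook):
    # Minimal "model": call a pre-hook before each block, then run the block.
    for block in blocks:
        pre_hook(block.name, x)
        x = block(x)
    return x


def cache_block_inputs(blocks, target_names, sample):
    """Cache inputs of the target blocks and stop at the last one."""
    order = [b.name for b in blocks]
    # Assumption: execution order equals list order, so the last target
    # is the target with the highest position in the list.
    last_cache_name = max(target_names, key=order.index)
    cached = {}

    def pre_hook(name, x):
        if name in target_names:
            cached[name] = x
        if name == last_cache_name:
            raise StopForward  # nothing after this point is needed

    try:
        forward(blocks, sample, pre_hook)
    except StopForward:
        pass
    return cached
```

With four blocks and targets `layers.0` and `layers.1`, only the first block executes. The remaining blocks are skipped, which is where the runtime and memory savings during input caching would come from.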

auto_round/compressors/mllm/dataset.py

for more information, see https://pre-commit.ci

auto_round/compressors/base.py

wenhuach21 · 2026-03-03T06:55:01Z

Also, please help test an MoE model such as qwen35-35B; the patching code may introduce some issues.

Signed-off-by: n1ck-guo <heng.guo@intel.com>

wenhuach21 · 2026-03-09T07:50:52Z

auto_round/compressors/mllm/compressor.py

        lr (float): The learning rate (default is 0.005).
        minmax_lr (float): The learning rate for min-max tuning (default is None).
-        low_gpu_mem_usage (bool): Whether to use low GPU memory (default is False).
+        low_gpu_mem_usage (bool): Whether to use low GPU memory (default is True).


Why set it to True?

@yiliu30 please review the PR carefully.

Because your last modification reset this default to True; this is a doc fix.

reduce ram usage for vlm calib stage

b1c79a7

Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>

Copilot AI review requested due to automatic review settings March 3, 2026 06:40

Copilot started reviewing on behalf of WeiweiZhang1 March 3, 2026 06:40

Copilot AI reviewed Mar 3, 2026

auto_round/compressors/mllm/dataset.py (two review comments, both resolved)

WeiweiZhang1 and others added 2 commits March 3, 2026 14:46

Merge branch 'main' into reduce_vram/ram_usage_for_vlm_in_calib_stage

31aa2f5

[pre-commit.ci] auto fixes from pre-commit.com hooks

0c22523

for more information, see https://pre-commit.ci

wenhuach21 reviewed Mar 3, 2026

auto_round/compressors/base.py (outdated, resolved)

WeiweiZhang1 added the WIP label Mar 3, 2026

n1ck-guo and others added 9 commits March 4, 2026 09:09

gguf better support for transformers5.0 and fix bug of Qwen3Next (#1474)

3a49037

Signed-off-by: n1ck-guo <heng.guo@intel.com>

Merge branch 'main' into reduce_vram/ram_usage_for_vlm_in_calib_stage

e3f1fdb

fix meta issue for patch model

736e7f3

Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

54e1a7c

for more information, see https://pre-commit.ci

fix meta issue of patching model like qwen3.5-35B-A3B

576a417

Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>

fix scan issue

cf668ce

Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>

refine early stop calib logic, add ut for mllm calib cache check

d7c67d7

Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

fa4f342

for more information, see https://pre-commit.ci

Merge branch 'main' into reduce_vram/ram_usage_for_vlm_in_calib_stage

c7f81d0

WeiweiZhang1 removed the WIP label Mar 6, 2026

WeiweiZhang1 requested review from n1ck-guo and yiliu30 March 6, 2026 07:49

yiliu30 approved these changes Mar 6, 2026

Merge branch 'main' into reduce_vram/ram_usage_for_vlm_in_calib_stage

0e8b48e

WeiweiZhang1 merged commit be38713 into main Mar 9, 2026
29 checks passed

WeiweiZhang1 deleted the reduce_vram/ram_usage_for_vlm_in_calib_stage branch March 9, 2026 07:35

wenhuach21 reviewed Mar 9, 2026

WeiweiZhang1 merged 13 commits into main from reduce_vram/ram_usage_for_vlm_in_calib_stage

5 participants
