FIX OOM converting large models on 24Gb cards by FearL0rd · Pull Request #29 · silveroxides/convert_to_quant

FearL0rd · 2026-03-26T20:27:18Z

Optimizes the quantization pipeline to handle 1B+ element layers (e.g., Gemma 3
embeddings) on 24GB hardware. Resolves consecutive CUDA OOMs by
dynamically switching processing devices based on layer volume.

Introduced mem_threshold to trigger full CPU optimization for massive layers.
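The device switch can be pictured as an element-count check. The sketch below is illustrative rather than the PR's code: `pick_device`, `fp32_gib`, and the default threshold value are assumptions made for this example; only the name `mem_threshold` comes from the description above.

```python
def fp32_gib(numel: int) -> float:
    """Size in GiB of one FP32 tensor holding `numel` elements."""
    return numel * 4 / 2**30


def pick_device(numel: int, mem_threshold: int = 500_000_000,
                cuda_available: bool = True) -> str:
    """Return "cpu" for massive layers, otherwise prefer the GPU.

    `mem_threshold` is treated as an element count here; the default
    is a made-up value, not the one used in the PR.
    """
    is_massive = numel >= mem_threshold
    if is_massive or not cuda_available:
        return "cpu"
    return "cuda"


# One FP32 copy of a 1B-element layer is about 3.7 GiB, before counting
# gradients, optimizer state, or temporaries.
print(round(fp32_gib(1_000_000_000), 2))  # 3.73
print(pick_device(1_000_000_000))         # cpu
print(pick_device(4096 * 4096))           # cuda
```

Several such copies alive at once (weights, rounded weights, error tensor, gradient) is how a single layer can exhaust a 24GB card.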

Implemented lazy dequantization in _compute_loss_and_grad to avoid GPU
temporary tensor overhead.
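One way to read "lazy dequantization" is to dequantize only a slice of the layer at a time while accumulating the loss and gradient, so the full dequantized tensor never exists. The following is a minimal sketch under that assumption; `loss_and_grad_chunked`, the per-row scale layout, and the squared-error loss are hypothetical and not taken from `_compute_loss_and_grad`.

```python
import torch


def loss_and_grad_chunked(W, Q, scale, chunk_rows=1024):
    """Squared reconstruction error and its gradient w.r.t. a per-row scale.

    W:     (rows, cols) float32 reference weights
    Q:     (rows, cols) integer codes
    scale: (rows,) float32 per-row scale
    Only `chunk_rows` rows are dequantized at any moment.
    """
    loss = torch.zeros((), dtype=torch.float32)
    grad = torch.empty_like(scale)
    for start in range(0, W.shape[0], chunk_rows):
        end = min(start + chunk_rows, W.shape[0])
        q = Q[start:end].to(torch.float32)
        # Dequantized chunk minus the reference chunk.
        err = q * scale[start:end, None] - W[start:end]
        loss += err.pow(2).sum()
        # d/ds_i sum_j (q_ij * s_i - w_ij)^2 = 2 * sum_j err_ij * q_ij
        grad[start:end] = 2.0 * (err * q).sum(dim=1)
        del q, err  # drop the temporaries before the next chunk
    return loss, grad
```

Peak temporary memory then scales with `chunk_rows` rather than with the full layer size.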

Migrated optimization loops to use in-place arithmetic (div_, add_, mul_).
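For illustration, an update step written with the in-place methods named above might look like this. The momentum rule and the function name `momentum_step_` are assumptions for the example, not the optimizer used in the PR.

```python
import torch


def momentum_step_(param, grad, velocity, lr=0.01, momentum=0.9, grad_scale=1.0):
    """Momentum update that writes into existing storage.

    Note: `grad` and `velocity` are modified in place as well.
    """
    grad.div_(grad_scale)               # grad = grad / grad_scale
    velocity.mul_(momentum).add_(grad)  # v = momentum * v + grad
    param.add_(velocity, alpha=-lr)     # p = p - lr * v
    return param


p = torch.ones(4)
g = torch.full((4,), 2.0)
v = torch.zeros(4)
momentum_step_(p, g, v, lr=0.1, momentum=0.9, grad_scale=2.0)
print(p)  # tensor([0.9000, 0.9000, 0.9000, 0.9000])
```

The out-of-place form `p = p - lr * v` would allocate a tensor for `lr * v` and another for the result, which matters when each one is several gigabytes.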

Added proactive CUDA cache flushing and state offloading to CPU.
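A sketch of what offloading plus cache flushing can look like in PyTorch follows. The helper name `offload_state_to_cpu` and the dict-of-tensors layout are assumptions; `torch.cuda.empty_cache()` is the standard call for returning cached, unused blocks to the driver.

```python
import gc

import torch


def offload_state_to_cpu(state: dict) -> dict:
    """Copy every tensor in `state` to the CPU, then release the originals."""
    cpu_state = {name: t.detach().to("cpu") for name, t in state.items()}
    state.clear()   # drop the references to the (possibly GPU) tensors
    gc.collect()    # collect anything kept alive only by reference cycles
    if torch.cuda.is_available():
        # Return cached, now-unused blocks to the driver.
        torch.cuda.empty_cache()
    return cpu_state
```

`empty_cache()` only releases blocks that no live tensor uses, so the references have to be dropped first for the call to have any effect.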

fix OOM GPU Memory Error During Model Quantization

Fixed a lifecycle bug in _optimize_original where W_float32 was accessed after deletion during subspace optimization.

Reordered operations to ensure the initial rounded state is captured before VRAM liberation.

Streamlined _optimize_original to calculate P_orig and W_q_refined consecutively before deleting original FP32 weights.

Optimized subspace updates using torch.addmm_ for zero-overhead matrix modifications.

Fixed variable scoping for is_massive logic to ensure safe execution on large embedding layers.
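The ordering fix and the `addmm_` update can be sketched together. This is an assumption-laden illustration: `low_rank_update_`, the factors `A` and `B`, and the scalar scale are hypothetical, and the real `_optimize_original` is not reproduced here. The example shows deriving everything from the FP32 weights before deleting them, then applying a low-rank correction in place.

```python
import torch


def low_rank_update_(W_q, A, B, step):
    """W_q <- W_q + step * (A @ B), written into W_q's existing storage."""
    return W_q.addmm_(A, B, alpha=step)


torch.manual_seed(0)
W_float32 = torch.randn(6, 4)
scale = W_float32.abs().max() / 127

# Everything that depends on W_float32 is computed first ...
W_q = torch.round(W_float32 / scale)
# ... and only then is the FP32 reference released.
del W_float32

A = torch.randn(6, 2)  # hypothetical subspace factors
B = torch.randn(2, 4)
low_rank_update_(W_q, A, B, step=0.5)
```

Compared with `W_q = W_q + step * (A @ B)`, this avoids holding the product and the sum as separate full-size tensors alongside `W_q`.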

FearL0rd added 2 commits March 26, 2026 14:44

GPU Memory Error During Model Quantization

5e06f48

fix OOM GPU Memory Error During Model Quantization

FearL0rd mentioned this pull request Mar 26, 2026

3090 OutOfMemoryError #28

Open

FearL0rd wants to merge 3 commits into silveroxides:main from FearL0rd:main
