
FIX OOM converting large models on 24GB cards #29

Open

FearL0rd wants to merge 3 commits into silveroxides:main from FearL0rd:main

Conversation

@FearL0rd

Optimizes the quantization pipeline to handle layers with over a billion elements (e.g., Gemma 3 embeddings) on 24GB hardware. Resolves consecutive CUDA OOMs by dynamically switching the processing device based on layer size.

Introduced mem_threshold to trigger full CPU optimization for massive layers.
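A minimal sketch of what threshold-based device routing could look like. The helper name `pick_device` and the cutoff value are assumptions for illustration; the PR's actual `mem_threshold` logic may differ.

```python
import torch

# Assumed cutoff: layers at or above this element count are optimized on CPU.
MEM_THRESHOLD = 1_000_000_000  # ~1B elements, e.g. Gemma 3 embeddings

def pick_device(weight: torch.Tensor, mem_threshold: int = MEM_THRESHOLD) -> torch.device:
    """Route massive layers to CPU; keep smaller ones on the GPU if present."""
    if weight.numel() >= mem_threshold or not torch.cuda.is_available():
        return torch.device("cpu")
    return torch.device("cuda")
```

Routing on `numel()` rather than byte size keeps the check dtype-agnostic; a byte-based threshold would be an equally valid design.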

Implemented lazy dequantization in _compute_loss_and_grad to avoid GPU
temporary tensor overhead.
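The idea behind lazy dequantization can be sketched as accumulating the loss block-by-block, so the full dequantized FP32 tensor is never materialized at once. All names here (`lazy_dequant_loss`, `block_rows`) are illustrative; the PR's `_compute_loss_and_grad` differs in detail.

```python
import torch

def lazy_dequant_loss(q: torch.Tensor, scale: torch.Tensor, target: torch.Tensor,
                      block_rows: int = 1024) -> torch.Tensor:
    """Squared-error loss computed over row blocks; each dequantized
    block is a small temporary freed at the end of its iteration."""
    loss = torch.zeros((), dtype=torch.float32)
    for start in range(0, q.shape[0], block_rows):
        end = min(start + block_rows, q.shape[0])
        # Dequantize only this block instead of the whole layer.
        w_block = q[start:end].to(torch.float32) * scale[start:end]
        loss += (w_block - target[start:end]).pow(2).sum()
    return loss
```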

Migrated optimization loops to use in-place arithmetic (div_, add_, mul_).
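A hedged illustration of the in-place pattern: each out-of-place op (`a / b`, `a + b`) allocates a fresh tensor the size of the layer, while `div_`/`add_`/`mul_` mutate existing storage. The update rule below is made up for demonstration, not the PR's optimizer.

```python
import torch

def momentum_step_inplace(grad: torch.Tensor, lr: float, state: torch.Tensor) -> torch.Tensor:
    """Update `state` entirely in place; no temporary of the layer's size
    is allocated at any step."""
    state.mul_(0.9)             # decay momentum in place
    state.add_(grad, alpha=lr)  # accumulate scaled gradient in place
    state.div_(1.0 + lr)        # renormalize in place
    return state
```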

Added proactive CUDA cache flushing and state offloading to CPU.
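The offload-and-flush pattern might look like the sketch below: move state tensors to host memory, then ask the allocator to return cached blocks to the driver. The dict-of-tensors state shape is an assumption.

```python
import torch

def offload_and_flush(state: dict) -> dict:
    """Move any CUDA tensors in `state` to CPU, then release cached
    CUDA memory. Safe to call on CPU-only machines."""
    for key, value in state.items():
        if torch.is_tensor(value) and value.is_cuda:
            state[key] = value.cpu()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached blocks to the driver
    return state
```

Note that `empty_cache()` only frees memory the caching allocator holds but no tensor uses, which is why offloading the tensors has to happen first.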

@FearL0rd FearL0rd mentioned this pull request Mar 26, 2026
Fixed a lifecycle bug in _optimize_original where W_float32 was accessed after deletion during subspace optimization. Reordered operations so the initial rounded state is captured before the FP32 weights are freed from VRAM.
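The safe ordering can be sketched as follows: capture the rounded state first, then release the FP32 weights. Swapping the two lines reproduces the use-after-delete bug described above. Names (`w_float32`, `scale`) are illustrative.

```python
import torch

def quantize_and_free(w_float32: torch.Tensor, scale: float) -> torch.Tensor:
    """Capture the initial rounded state BEFORE freeing the FP32 weights."""
    w_rounded = torch.round(w_float32 / scale)  # capture first
    del w_float32                               # then release the reference
    if torch.cuda.is_available():
        torch.cuda.empty_cache()                # reclaim the freed VRAM
    return w_rounded
```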

Streamlined _optimize_original to compute P_orig and W_q_refined consecutively before deleting the original FP32 weights.

Optimized subspace updates with torch.addmm_, performing the matrix update in place without allocating a temporary for the product.
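For reference, `Tensor.addmm_` fuses a matrix multiply and accumulate in place: `C ← beta*C + alpha*(A @ B)`, with no separate tensor allocated for `A @ B`. The shapes below are illustrative, not the PR's subspace dimensions.

```python
import torch

C = torch.ones(2, 2)
A = torch.eye(2)
B = torch.full((2, 2), 3.0)

# In place: C becomes 1.0 + 0.5 * 3.0 = 2.5 everywhere.
C.addmm_(A, B, beta=1.0, alpha=0.5)
```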

Fixed variable scoping for is_massive logic to ensure safe execution
on large embedding layers.
