
[quantization] Introduce wrappers for Qwen3VLTextDecoderLayer and Qwen3VLTextModel #535

Draft
dayo09 wants to merge 4 commits into Samsung:main from dayo09:0303-text-models

Conversation

@dayo09 (Contributor) commented Mar 5, 2026

Let's add wrappers for the upper-level Qwen3VL layers.

TICO-DCO-1.0-Signed-off-by: Dayoung Lee dayoung.lee@samsung.com

…n3VLTextModel

- Add `QuantQwen3VLTextDecoderLayer`: wraps attention, MLP, and layernorm
  blocks; pre-builds static causal mask and RoPE templates to avoid
  dynamic ops in forward pass
- Add `QuantQwen3VLTextModel`: pre-computes shared causal mask and RoPE
  once and passes them to every decoder layer, so they are quantized
  exactly once rather than independently in each layer
- Register both wrappers in `_CORE_MODULES`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
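The pre-building idea in the bullets above can be sketched as follows. This is a minimal illustration of the pattern, not the actual TICO implementation: the class name, `max_seq_len`, and buffer name are assumptions. The point is that the mask is materialized once in `__init__`, so `forward` contains only a static slice rather than dynamic mask-construction ops.

```python
import torch

class CausalMaskTemplate(torch.nn.Module):
    """Hypothetical sketch: build the causal mask once, slice it at runtime."""

    def __init__(self, max_seq_len: int = 4096):
        super().__init__()
        # Strictly upper-triangular -inf mask, built once at wrap time.
        mask = torch.full((max_seq_len, max_seq_len), float("-inf"))
        mask = torch.triu(mask, diagonal=1)
        self.register_buffer("causal_mask_template", mask[None, None])

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        seq_len = hidden_states.size(1)
        # Static slice instead of rebuilding the mask on every call.
        return self.causal_mask_template[..., :seq_len, :seq_len]

tmpl = CausalMaskTemplate(max_seq_len=8)
x = torch.zeros(1, 5, 16)
print(tmpl(x).shape)  # torch.Size([1, 1, 5, 5])
```

Sharing one template at the model level (as `QuantQwen3VLTextModel` does per the bullets) additionally means the mask is observed and quantized once instead of per layer.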
Comment on lines +190 to +191
self._fq(cos, self.obs_cos),
self._fq(sin, self.obs_sin),
@dayo09
Sorry for the disturbance, but

self._fq(cos[:, : hidden_states.size(1), :], self.obs_cos),

will remove the dependence on the input size (this proved useful for Llama).
It's similar to self.causal_mask_template[..., :seq_len, :seq_len].to(device) above (Ln 127).
IMHO.
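A standalone sketch of the suggestion: slice a precomputed RoPE `cos` template to the runtime sequence length before fake-quantizing, so the captured graph does not bake in the calibration input's length. The RoPE construction, `max_seq_len`, `head_dim`, and the `fq` stand-in are all assumptions made for illustration, not the PR's code.

```python
import torch

# Precompute a RoPE cos template once, up to a maximum sequence length.
max_seq_len, head_dim = 16, 8
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2) / head_dim))
t = torch.arange(max_seq_len).float()
freqs = torch.outer(t, inv_freq)
cos = torch.cat([freqs, freqs], dim=-1).cos()[None]  # (1, max_seq_len, head_dim)

def fq(x):
    # Stand-in for self._fq(x, observer); identity here for the sketch.
    return x

hidden_states = torch.zeros(1, 5, 32)
# The comment's proposal: slice to hidden_states' sequence length.
sliced = fq(cos[:, : hidden_states.size(1), :])
print(sliced.shape)  # torch.Size([1, 5, 8])
```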

@dayo09 dayo09 force-pushed the 0303-text-models branch from 4829fa4 to e71a9b1 Compare March 11, 2026 07:15
print(f"│ Mean |diff|: {(q_out - fp_out).abs().mean().item():.6f}")
print(f"│ PEIR : {compute_peir(fp_out, q_out) * 100:.6f} %")
print("└──────────────────────────────────────────────────────")
print(plot_two_outputs(fp_out, q_out))
@dayo09 (Author):

┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.071578
│ PEIR       : 9.253764 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 5.1┤                                         •  │
 3.4┤                              • ••••    •   │
 1.7┤                        ••••••••••          │
 0.0┤                 ••••••••••                 │
-1.7┤            • ••••••                        │
-3.4┤   ••••••••                                 │
-5.1┤  •                                         │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -5.1       -2.5        0.0       2.5       5.1 
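For readers following along, here is a sketch of the two metrics printed above. Mean |diff| is unambiguous; the PEIR formula below (peak absolute error over the float output's value interval) is an assumption about what `compute_peir` does, not its actual definition in the repo.

```python
import torch

def mean_abs_diff(fp_out: torch.Tensor, q_out: torch.Tensor) -> float:
    # Mean absolute difference between quantized and float outputs.
    return (q_out - fp_out).abs().mean().item()

def peir(fp_out: torch.Tensor, q_out: torch.Tensor) -> float:
    # Assumed definition: peak error divided by the float output's range.
    peak_error = (q_out - fp_out).abs().max()
    interval = fp_out.max() - fp_out.min()
    return (peak_error / interval).item()

fp = torch.tensor([0.0, 1.0, 2.0, 4.0])
q = torch.tensor([0.1, 1.0, 1.9, 4.2])
print(f"Mean |diff|: {mean_abs_diff(fp, q):.6f}")  # 0.100000
print(f"PEIR       : {peir(fp, q) * 100:.6f} %")   # 5.000000 %
```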

print(f"│ Mean |diff|: {(q_out - fp_out).abs().mean().item():.6f}")
print(f"│ PEIR : {compute_peir(fp_out, q_out) * 100:.6f} %")
print("└──────────────────────────────────────────────────────")
print(plot_two_outputs(fp_out, q_out))
@dayo09 (Author):

python3 tico/quantization/wrapq/examples/qwen/quantize_text_model.py 
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.904804
│ PEIR       : 351.709125 %
└──────────────────────────────────────────────────────
      ┌──────────────────────────────────────────┐
  28.2┤                                          │
      │                                          │
      │                              •• •• •     │
   4.7┤                                •••••     │
      │                               •••••      │
      │                              ••••        │
 -18.7┤                             •••          │
      │                                          │
      │                                          │
 -42.2┤                                          │
      │                                          │
      │                                          │
      │                                          │
 -65.6┤                                          │
      │                                          │
      │                                          │
 -89.1┤                                          │
      │                                          │
      │                                       •  │
-112.5┤                                          │
      └┬─────────┬──────────┬─────────┬─────────┬┘
    -112.5     -77.4      -42.2     -7.0     28.2 

@dayo09 (Author):

There is one big outlier. 😢
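One quick way to chase down such an outlier is an argmax over the absolute error, which gives the flat index of the worst element so it can be traced back to a specific token/channel. This helper is hypothetical, not part of the PR:

```python
import torch

def find_worst_element(fp_out: torch.Tensor, q_out: torch.Tensor):
    # Return the flat index and magnitude of the largest absolute error.
    err = (q_out - fp_out).abs()
    idx = err.argmax()
    return idx.item(), err.flatten()[idx].item()

fp = torch.tensor([0.2, -1.5, 0.9, 3.1])
q = torch.tensor([0.2, -1.4, 0.9, -112.5])  # one big outlier, as in the plot
idx, err = find_worst_element(fp, q)
print(idx, round(err, 2))  # 3 115.6
```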
