[quantization] Introduce QuantConv3dDecomposed wrapper for Conv3d #516
mhs4670go merged 1 commit into Samsung:main
Conversation
Force-pushed from 35e7ff4 to c066948
Good! Now I can see that all operators are quantized well. However, we could remove the remaining 'add' operation too.
Patch embed operation from this PR
Our final optimized patch embed suggestion
However, patch embed Conv3d can be lowered to a Linear projection when the kernel size perfectly fits the stride size, as below; this optimization was done within PR #518.

```python
class Conv3dWithPerfectFitKernel(torch.nn.Module):
    """Conv3D with perfect fitting kernel"""

    def __init__(self):
        super().__init__()
        self.conv3d = torch.nn.Conv3d(
            in_channels=3,
            out_channels=1024,
            kernel_size=(2, 16, 16),
            stride=(2, 16, 16),
            padding=(0, 0, 0),
        )

    def forward(self, input):
        return self.conv3d(input)

    def get_example_inputs(self):
        return (torch.randn(5, 3, 2, 16, 16),), {}
```
This Conv3dToConv2d optimization logic is implemented in a TICO legalization pass, so it needs to be performed in the quantization wrapper as well. Could you please apply more optimizations in the same way the PR above does? To grab the full context, see:
Copied from #430 (comment)
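To make the perfect-fit lowering concrete, here is a minimal sketch (not TICO code; the shapes mirror the `Conv3dWithPerfectFitKernel` example above) showing that a `Conv3d` whose kernel exactly matches its stride, applied to an input holding exactly one patch, is numerically equivalent to a `Linear` projection over the flattened patch:

```python
import torch

# Hedged sketch: when kernel_size == stride and the input is one patch,
# Conv3d reduces to a Linear projection with the same (reshaped) weights.
conv = torch.nn.Conv3d(3, 1024, kernel_size=(2, 16, 16), stride=(2, 16, 16))
linear = torch.nn.Linear(3 * 2 * 16 * 16, 1024)
with torch.no_grad():
    linear.weight.copy_(conv.weight.reshape(1024, -1))  # (C_out, C_in*kT*kH*kW)
    linear.bias.copy_(conv.bias)

x = torch.randn(5, 3, 2, 16, 16)
out_conv = conv(x).reshape(5, 1024)    # Conv3d output is (5, 1024, 1, 1, 1)
out_linear = linear(x.reshape(5, -1))  # flatten each sample to one patch vector
print(torch.allclose(out_conv, out_linear, atol=1e-4))  # True
```

The weight layout matters: `conv.weight.reshape(1024, -1)` flattens in (C_in, kT, kH, kW) order, which matches `x.reshape(5, -1)` for a single-patch input.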
Force-pushed from b69449f to 5b32757
Force-pushed from 14b659f to 3527997
Hi @dayo09 , thanks for your feedback!
```python
obs_name = f"{obs_name_prefix}{dict_key}"
obs = self._make_obs(obs_name)
obs_dictionary[dict_key] = obs
self.add_module(obs_name, obs)
```
Could you let me know the full warning message? Even when I comment out this line, I get no such warnings.
@mhs4670go Could you please clarify what warning message you are referring to?
Do you mean the 2 warnings in the unit tests (20 passed, 2 warnings in 5.25s)?
If it's about the 2 warnings in the unit tests, here's the full context.
Warning Message
```shell
$ python -m pytest test/quantization/wrapq/wrappers/nn/test_quant_conv3d_decomposed.py -v
...
test/quantization/wrapq/wrappers/nn/test_quant_conv3d_decomposed.py::TestQuantConv3dDecomposed::test_registration_in_registry
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
test/quantization/wrapq/wrappers/nn/test_quant_conv3d_decomposed.py::TestQuantConv3dDecomposed::test_registration_in_registry
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
...
```
Root Cause
The warnings are triggered during the test execution when lookup(nn.Conv3d) is called, which invokes _lazy_init() in the registry. This lazy initialization imports all core wrapper modules, which in turn import PyTorch components.
The Issue
The warnings come from PyTorch's Swig-generated C++ types (SwigPyPacked and SwigPyObject) that don't have a __module__ attribute. Python's importlib checks for this attribute during imports, and Python 3.12+ is stricter about this check, emitting DeprecationWarning.
Why It Happens
- Test calls `lookup(nn.Conv3d)` → triggers `_lazy_init()`
- `_lazy_init()` imports all `_CORE_MODULES` including PyTorch-dependent modules
- During import, PyTorch loads its C++ extensions (Swig-wrapped)
- Swig types `SwigPyPacked` and `SwigPyObject` don't have a `__module__` attribute
- Python's importlib detects this and emits deprecation warnings
I've Just Fixed It
By adding these lines to def test_registration_in_registry(self):

```python
import warnings

# Suppress warnings from PyTorch's Swig-generated types
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", message="builtin type SwigPyPacked has no __module__ attribute")
    warnings.filterwarnings("ignore", message="builtin type SwigPyObject has no __module__ attribute")
```
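For reference, `warnings.filterwarnings` treats the `message` argument as a regular expression matched against the start of the warning message, which is why literal message prefixes are enough here. A small self-contained demonstration (generic stdlib behavior, not project code):

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Filters added later take precedence; this one matches by message prefix.
    warnings.filterwarnings("ignore", message="builtin type SwigPyPacked")
    warnings.warn("builtin type SwigPyPacked has no __module__ attribute",
                  DeprecationWarning)
    warnings.warn("an unrelated warning", UserWarning)

print(len(caught))  # 1: only the unrelated warning was recorded
```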
Ah, no. What I referred to is self.add_module(obs_name, obs). The comment below from L100 explains why the self.add_module(obs_name, obs) call is needed. But I can't see the warnings when I run the test.
Oh, I see now.
Here's the full warning message generated by the example script tico/quantization/wrapq/examples/nn/quantize_conv3d.py when calling tico.convert, unless we add the dynamic observers as submodules through self.add_module(obs_name, obs):
```
torch/export/_unlift.py:75: UserWarning: Attempted to insert a get_attr Node with no underlying reference in the owning GraphModule! Call GraphModule.add_submodule to add the necessary submodule, GraphModule.add_parameter to add the necessary Parameter, or nn.Module.register_buffer to add the necessary buffer
  getattr_node = gm.graph.get_attr(lifted_node)
torch/fx/graph.py:1801: UserWarning: Node l__self___wrapped__input_slice_obs_0__cached_scale target L__self___wrapped__input_slice_obs_0__cached_scale L__self___wrapped__input_slice_obs_0__cached_scale of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target
  warnings.warn(
torch/fx/graph.py:1801: UserWarning: Node l__self___wrapped__input_slice_obs_0__cached_zp target L__self___wrapped__input_slice_obs_0__cached_zp L__self___wrapped__input_slice_obs_0__cached_zp of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target
  warnings.warn(
torch/fx/graph.py:1801: UserWarning: Node l__self___wrapped__conv2d_obs_0__cached_scale target L__self___wrapped__conv2d_obs_0__cached_scale L__self___wrapped__conv2d_obs_0__cached_scale of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target
  warnings.warn(
torch/fx/graph.py:1801: UserWarning: Node l__self___wrapped__conv2d_obs_0__cached_zp target L__self___wrapped__conv2d_obs_0__cached_zp L__self___wrapped__conv2d_obs_0__cached_zp of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target
  warnings.warn(
torch/fx/graph.py:1801: UserWarning: Node l__self___wrapped__input_slice_obs_1__cached_scale target L__self___wrapped__input_slice_obs_1__cached_scale L__self___wrapped__input_slice_obs_1__cached_scale of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target
  warnings.warn(
torch/fx/graph.py:1810: UserWarning: Additional 9 warnings suppressed about get_attr references
  warnings.warn(
```
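These get_attr warnings come down to module registration: a submodule stored only in a plain Python dict is invisible to `named_modules()` and to the exported GraphModule, while `add_module` registers it properly. A minimal illustration with hypothetical names (this is not the actual wrapper code, and `Identity` stands in for an observer):

```python
import torch

class Wrapper(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.obs_dict = {}         # plain dict: its contents are NOT registered
        obs = torch.nn.Identity()  # stand-in for a dynamic observer
        self.obs_dict["input"] = obs
        self.add_module("input_obs_0", obs)  # registration makes it visible

w = Wrapper()
# Only the registered name shows up among submodules:
print("input_obs_0" in dict(w.named_modules()))  # True
print("obs_dict" in dict(w.named_modules()))     # False
```

Without the `add_module` call, export-time graph nodes that reference the observer have no owning submodule to resolve to, which is exactly what the warnings above complain about.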
I got no warnings. Could you let me know the versions of transformers and torch? I'll reproduce the warnings with them.
Anyway, the code makes sense to me. I'll approve this.
Hi @mhs4670go
Sure. Here we go:
```shell
$ pip freeze | grep -e torch -e transformers
torch==2.6.0
torchvision==0.21.0
transformers==5.0.0
```
I've double-checked that the warnings appear if I comment out one line in tico/quantization/wrapq/wrappers/nn/quant_conv3d_decomposed.py:
```python
def create_observer(obs_name_prefix, obs_dictionary, dict_key):
    obs_name = f"{obs_name_prefix}{dict_key}"
    obs = self._make_obs(obs_name)
    obs_dictionary[dict_key] = obs
    # self.add_module(obs_name, obs)
```
Make sure you're running tico/quantization/wrapq/examples/nn/quantize_conv3d.py to check that:
```shell
$ python tico/quantization/wrapq/examples/nn/quantize_conv3d.py
```
Ah, thanks for the information. Turns out that I ran the tico/quantization/wrapq/examples/nn/quantize_conv3d_special_case.py script instead.
```python
# Convert to Circle format
example_input = torch.randn(1, in_channels, depth, height, width)
circle_model = tico.convert(quantized_model, (example_input,))
```
Suggested change:
```diff
-circle_model = tico.convert(quantized_model, (example_input,))
+circle_model = tico.convert(quantized_model.eval(), (example_input,))
```
```python
# Convert to Circle format
print("\nConverting to Circle format...")
example_input = torch.randn(1, in_channels, depth, height, width)
circle_model = tico.convert(quantized_model, (example_input,))
```
Suggested change:
```diff
-circle_model = tico.convert(quantized_model, (example_input,))
+circle_model = tico.convert(quantized_model.eval(), (example_input,))
```
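The `.eval()` suggestion matters because conversion captures the module's current train/eval mode, and train-mode-only behavior would otherwise leak into the exported graph. A small generic PyTorch illustration of the mode difference (independent of the TICO API):

```python
import torch

torch.manual_seed(0)
m = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

m.train()
print(bool(m(x).eq(0).any()))  # True: train mode drops roughly half the elements

m.eval()
print(torch.equal(m(x), x))    # True: eval mode is the identity
```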
Additionally, there's a higher PEIR from the patch embed script; this should be investigated.
$ python tico/quantization/wrapq/examples/qwen/quantize_vision_patch_embed.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.650984
│ PEIR : 73.866314 %
└──────────────────────────────────────────────────────
[ASCII scatter plot of model outputs over roughly (-3.2, 3.2) omitted]
Force-pushed from 3527997 to 1b4771d
@mhs4670go thanks for catching this 👍. Indeed, there was a bug in the special case handling code. I've also modified the example script:
$ python tico/quantization/wrapq/examples/qwen/quantize_vision_patch_embed.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.008683
│ PEIR : 0.744628 %
└──────────────────────────────────────────────────────
[ASCII scatter plot of model outputs over roughly (-3.2, 3.8) omitted]
Circle model saved as 'quantized_vision_patch_embed.circle'
This change introduces the QuantConv3dDecomposed wrapper to support post-training quantization of the Conv3d operation, which uses Conv2d and Add operations internally. TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
Force-pushed from 1b4771d to 2484e0f



Why?
`Conv3d` is decomposed into `Conv2d` + `Add` operations during the conversion to Circle (`tico.convert`) via the `ConvertConv3dToConv2d` pass. The `ConvertConv3dToConv2d` pass introduces `Conv2d` and `Add` operations that don't participate in quantization (their outputs are not calibrated and quantized) and therefore remain floating-point, which undermines the whole task of model quantization (#489). Therefore we need to decompose `Conv3d` at the Quant wrapper level (before calibration/quantization) to inject observers for the `Conv2d` and `Add` outputs.

What
This change introduces:
- `QuantConv3dDecomposed` (tico/quantization/wrapq/wrappers/nn/quant_conv3d_decomposed.py).
- `class TestQuantConv3dDecomposed` (test/quantization/wrapq/wrappers/nn/test_quant_conv3d_decomposed.py).
- Registration in `_CORE_MODULES` (tico/quantization/wrapq/wrappers/registry.py).
- Example scripts for `Conv3d` quantization and conversion to Circle:
  - tico/quantization/wrapq/examples/nn/quantize_conv3d.py
  - tico/quantization/wrapq/examples/nn/quantize_conv3d_special_case.py

Unit Tests
Coverage info (irrelevant files skipped):
Not covered lines are related to invalid padding scheme exception generation in `QuantConv3dDecomposed._parse_padding`. These lines are not covered because the creation of `Conv3d` with an invalid padding scheme fails before we have a chance to reach `QuantConv3dDecomposed._parse_padding`.

Example Script (quantize Conv3d and convert to Circle)
1. Note For Reviewers
I've decided to introduce a new `QuantConv3dDecomposed` wrapper rather than modify the existing `QuantConv3d` because:
- `QuantConv3d` is much simpler.
- `QuantConv3d` might become useful in the future (e.g. when Circle starts supporting `Conv3d`).

2. Note For Reviewers
Decomposition of `Conv3d` means applying `Conv2d` in a loop to the temporal slices of the input tensors and the temporal slices of the `Conv3d` kernel. The loop takes place in the `QuantConv3dDecomposed.forward` method, and the number of loop iterations depends on:
- `kernel_depth` (kT): fixed in `__init__`
- `T_out`: depends on the input size, only known during forward
- `N`, `C_in`: depend on the input shape, only known during forward

The number of loop iterations determines the number of `Conv2d` + `Add` operations and hence the number of observers to be injected. Therefore the observers cannot be created until `QuantConv3dDecomposed.forward` is called for the first time in CALIB mode. Hence, the observers for the `Conv2d` + `Add` operations are created in `QuantConv3dDecomposed.forward`.

Example Script: Special Case (kernel_size = input_size = stride)