fix(bit_exact): support grouped/depthwise Conv1D/2D in produce_kif#1485

Open
LarocheC wants to merge 1 commit into
fastmachinelearning:mainfrom
LarocheC:fix/grouped-depthwise-produce-kif
Contributor

@LarocheC LarocheC commented Jun 4, 2026

Description

The produce_kif handler for Conv1D/Conv2D in the bit_exact precision propagation pass assumes a non-grouped kernel, whose second-to-last axis spans every input channel (kernel.shape[-2] == n_chan). A grouped or depthwise convolution stores only in_per_group input channels per filter, so the im2col buffer no longer matches the full-channel input. As a result, converting a model that contains a grouped or depthwise quantized conv fails inside the pass:

ValueError: could not broadcast input array from shape (48,) into shape (3,)

This is reproduced with an HGQ2 QConv1D(..., groups=g) for any g > 1 (the pass is triggered by the FixedPointQuantizer the layer inserts).
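The shape mismatch can be illustrated outside hls4ml with plain NumPy. This is only a sketch, not the pass's actual code: `fill_window` is a hypothetical helper, and the sizes (48 input channels, 16 groups, hence 3 channels per group) are an assumption chosen to be consistent with the shapes in the error message above.

```python
import numpy as np

# Assumed sizes, chosen to match the shapes in the reported error.
n_chan, groups, ksize, filters = 48, 16, 3, 32
in_per_group = n_chan // groups  # 3

x = np.zeros((10, n_chan))                         # channels-last input
kernel = np.zeros((ksize, in_per_group, filters))  # grouped kernel layout

def fill_window(x, kernel, pos=0):
    """Hypothetical im2col-style helper that sizes its buffer from the kernel."""
    ksize, ch_in = kernel.shape[0], kernel.shape[-2]
    buf = np.empty((ksize, ch_in))
    for j in range(ksize):
        # For a grouped kernel, x[pos + j] has n_chan entries but the
        # buffer row only has in_per_group, so NumPy raises ValueError.
        buf[j] = x[pos + j]
    return buf

try:
    fill_window(x, kernel)
except ValueError as err:
    print(err)
```

With a dense kernel of shape (ksize, n_chan, filters) the same helper runs without error, which is why only groups > 1 is affected.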

The fix detects the grouped case (kernel.shape[-2] != n_chan), processes each group as an independent standard convolution over its own channel slice, and concatenates the per-group results along the channel axis. Depthwise is the degenerate case groups == n_chan (in_per_group == 1). The non-grouped branch is the original code, left byte-for-byte unchanged, so existing models are unaffected.
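The per-group decomposition can be sketched numerically in NumPy. This is a minimal illustration on plain values, not the produce_kif code itself (which propagates precision rather than data). The function names are hypothetical, and the sketch assumes a channels-last input, a kernel of shape (ksize, in_per_group, filters), valid padding, stride 1, and output filters assigned to groups in contiguous blocks.

```python
import numpy as np

def conv1d_dense(x, kernel):
    """Standard Conv1D. x: (length, ch_in); kernel: (ksize, ch_in, ch_out)."""
    ksize, ch_in, ch_out = kernel.shape
    assert x.shape[1] == ch_in
    n_out = x.shape[0] - ksize + 1
    # im2col: one flattened (ksize * ch_in) window per output position
    cols = np.stack([x[i:i + ksize].reshape(-1) for i in range(n_out)])
    return cols @ kernel.reshape(ksize * ch_in, ch_out)

def conv1d_grouped(x, kernel):
    """Grouped Conv1D as independent dense convs, concatenated on channels."""
    n_chan = x.shape[1]
    ksize, in_per_group, ch_out = kernel.shape
    if in_per_group == n_chan:
        return conv1d_dense(x, kernel)  # non-grouped path, unchanged
    assert n_chan % in_per_group == 0
    groups = n_chan // in_per_group
    assert ch_out % groups == 0
    out_per_group = ch_out // groups
    outs = []
    for g in range(groups):
        x_g = x[:, g * in_per_group:(g + 1) * in_per_group]
        k_g = kernel[:, :, g * out_per_group:(g + 1) * out_per_group]
        outs.append(conv1d_dense(x_g, k_g))
    return np.concatenate(outs, axis=-1)
```

A depthwise kernel, with in_per_group == 1, goes through the same loop with groups == n_chan, so it needs no separate branch.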

Scope: this fixes precision propagation only. After this change a grouped or depthwise model converts instead of crashing, but the converter and backend codegen still emit a dense matmul for the grouped case, so the compiled numerics are not yet correct for groups > 1. Routing grouped/depthwise convolutions to a real grouped kernel is a separate follow-up. This PR is its prerequisite, since any grouped conv support needs the bit_exact pass to handle grouped kernels rather than crash on them. The non-grouped path is unaffected.

Type of change

  • Bug fix (non-breaking change that fixes an issue)

Tests

Added test/pytest/test_bit_exact_grouped_conv.py. For grouped, depthwise and dense (control) QConv1D and QConv2D models it runs the full bit_exact conversion flow (which raised ValueError before this fix for groups > 1) and asserts that the output precision the pass assigns represents the quantized Keras output exactly, with no rounding and no saturation. The check is done both on the single assigned result_t and per channel through produce_kif, across the Vivado, Vitis and oneAPI backends.
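The kind of exactness check described above can be sketched as follows. This is not the code in the test file: `is_exact` is a hypothetical helper, and it assumes the convention that a (k, i, f) triple means a sign bit k (0 or 1), i integer bits excluding the sign, and f fractional bits, giving the range [-k * 2^i, 2^i - 2^-f] with step 2^-f.

```python
import numpy as np

def is_exact(values, k, i, f):
    """True if every value fits the assumed (k, i, f) fixed-point format
    without rounding (on the 2**-f grid) and without saturation (in range)."""
    v = np.asarray(values, dtype=np.float64)
    scaled = v * 2.0 ** f
    no_rounding = np.all(scaled == np.round(scaled))
    lo = -k * 2.0 ** i
    hi = 2.0 ** i - 2.0 ** (-f)
    no_saturation = np.all((v >= lo) & (v <= hi))
    return bool(no_rounding and no_saturation)
```

Under that assumed convention, a signed format with i = 0 and f = 1 holds -1.0, -0.5, 0.0 and 0.5, so 0.25 would fail the rounding check and 1.0 would fail the saturation check.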

To reproduce:

pytest test/pytest/test_bit_exact_grouped_conv.py

Test Configuration: Keras 3.14 (torch backend), HGQ2 0.1.8, Python 3.13. Result: 18 passed.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation. (not applicable, internal pass fix)
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

The Conv1D/Conv2D produce_kif handler assumed a non-grouped kernel
(kernel.shape[-2] == n_chan). For a grouped or depthwise convolution the
kernel stores only in_per_group input channels per filter, so the im2col
buffer no longer matched the full-channel input and the bit_exact pass
raised "could not broadcast input array from shape ..." for groups > 1.

Detect the grouped case (kernel.shape[-2] != n_chan) and process each
group as an independent standard convolution over its own channel slice,
then concatenate along the channel axis. Depthwise is the degenerate
groups == n_chan case. The non-grouped path is left byte-for-byte unchanged.

Add a test that runs grouped, depthwise and dense QConv1D/QConv2D models
through the bit_exact flow on the Vivado, Vitis and oneAPI backends and
asserts the assigned output precision represents the quantized Keras
output exactly.