
[Contribution] Gemma3-27B VLM#34

Open
plienhar wants to merge 48 commits into aws-neuron:main from plienhar:contrib/gemma3-27b-vlm

Conversation


@plienhar plienhar commented Feb 5, 2026

Description

This PR contributes the Gemma3 VLM model family (4B, 12B and 27B) to NxDI.

Model Information

Model Name: Gemma3 4B, 12B & 27B

Model Architecture: LLaVA-style VLM with fixed-resolution SigLIP (400M) vision encoder and Transformer decoder text backbone.

Purpose: Image+Text to Text.
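As a minimal sketch of that LLaVA-style wiring (an illustration under assumptions, not the PR's actual code): the vision encoder produces image embeddings that replace the image-placeholder positions in the text embedding sequence before the decoder runs. The function and toy embeddings below are hypothetical.

```python
def merge_multimodal(text_embeds, image_embeds, image_token_positions):
    """Splice image embeddings into the text embedding sequence at the
    placeholder positions (simplified illustration of LLaVA-style fusion)."""
    merged = list(text_embeds)
    for pos, img in zip(image_token_positions, image_embeds):
        merged[pos] = img  # replace the <image> placeholder embedding
    return merged

# Toy 1-dimensional "embeddings": position 1 holds the image placeholder.
text = [[0.0], [1.0], [2.0]]
image = [[9.0]]
print(merge_multimodal(text, image, [1]))  # [[0.0], [9.0], [2.0]]
```

The merged sequence is then fed to the text decoder exactly like ordinary token embeddings.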

Checklist

Please ensure your PR includes the following items. Refer to the contrib/CONTRIBUTING.md for detailed guidelines.

Required Components

  • Accuracy Test (e.g. test/integration/test_model.py)

    • At least one integration test that validates model accuracy
    • Uses logit validation or equivalent accuracy verification
    • Test can compile and run the model on Neuron
  • README.md with the following sections:

    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types (Trn1/Trn2/Inf2)
    • Example Checkpoints: Links to compatible model checkpoints (e.g., HuggingFace Hub)
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)

    • Modeling code following NxD Inference patterns
    • Properly structured in the contrib folder hierarchy

Optional Components

  • Unit Tests (CPU or Neuron-based)
    • Tests for individual modeling components
    • Located in test/unit/ directory

Folder Structure

/contrib/models/gemma3-vision/
  README.md
  /src
    <modeling code>
  /test
    /unit
      /gemma3
      /siglip
    /integration
      test_model.py
  /vllm

Testing

How did you test this change?
Tests included:

  • Unit tests for all components of both the vision and text models.
  • Integration test validating token and logit matching, as well as performance.
  • End-to-end tests validating vLLM integration (see the vllm folder).

Test Results:

  • End-to-end tests using actual checkpoints show consistent output.
  • Integration test: with 4-layer vision and text models and random FP16 weights, logit matching passes only partially (12 of 16 checked positions fully match):
neuronx_distributed_inference.utils.exceptions.LogitMatchingValidationError: No divergence. Validating the remaining 16 tokens in each batch.
Test failed at batch 0 token 0. Top k = None error 0.09357890486717224 > 0.05. Top k = 1000 error 0.06944305449724197 > 0.03. Top k = 50 error 0.05631783977150917 > 0.02. Top k = 5 error 0.028122223913669586 > 0.01.
Test failed at batch 0 token 5. Top k = 5 error 0.017814606428146362 > 0.01.
Test failed at batch 0 token 14. Top k = 5 error 0.022793518379330635 > 0.01.
Test failed at batch 0 token 15. Top k = 5 error 0.01664724200963974 > 0.01.
Summary: Max divergence difference = 0 at index (batch 0 token 0), Top k = None max error = 0.09357890486717224 at index (batch 0 token 0), Top k = 1000 max error = 0.06944305449724197 at index (batch 0 token 0), Top k = 50 max error = 0.05631783977150917 at index (batch 0 token 0), Top k = 5 max error = 0.028122223913669586 at index (batch 0 token 0)
Complete Error Summary
These errors are normalized, for each token, by the largest expected logit for that token.
{
    "max_top_k_errors": {
        "null": {
            "error": 0.09357890486717224,
            "batch_index": 0,
            "token_index": 0
        },
        "1000": {
            "error": 0.06944305449724197,
            "batch_index": 0,
            "token_index": 0
        },
        "50": {
            "error": 0.05631783977150917,
            "batch_index": 0,
            "token_index": 0
        },
        "5": {
            "error": 0.028122223913669586,
            "batch_index": 0,
            "token_index": 0
        }
    },
    "average_over_tokens": {
        "null": {
            "mean_abs_error": 0.006117195017911044,
            "mean_squared_error": 7.054628936696939e-05,
            "max_abs_error": 0.03580721630714834,
            "max_squared_error": 0.0015160405286273435
        },
        "1000": {
            "max_abs_error": 0.026974295498803258,
            "max_squared_error": 0.0008563175740166724
        },
        "50": {
            "max_abs_error": 0.020097059023100883,
            "max_squared_error": 0.0004982783846769678
        },
        "5": {
            "max_abs_error": 0.013028468907577917,
            "max_squared_error": 0.00020805770526506807
        }
    }
}
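For context, the per-token normalization described above can be sketched roughly as follows. This is a simplified illustration, not NxDI's actual implementation (the real check in neuronx_distributed_inference also handles batching, divergence detection, and the per-k thresholds); the function name and toy logits are hypothetical.

```python
def top_k_normalized_error(expected, actual, k=None):
    """Per-token logit error, normalized by the largest-magnitude expected
    logit for that token, optionally restricted to the top-k expected logits.
    Simplified sketch of the check described in the error summary above."""
    scale = max(abs(x) for x in expected)          # largest expected logit
    errs = [abs(e - a) / scale for e, a in zip(expected, actual)]
    if k is not None:
        # Keep only the error at the positions of the k largest expected logits.
        top = sorted(range(len(expected)), key=lambda i: expected[i], reverse=True)[:k]
        errs = [errs[i] for i in top]
    return max(errs)

expected = [4.0, 2.0, -1.0, 0.5]
actual   = [4.5, 2.0, -2.0, 0.5]
print(top_k_normalized_error(expected, actual))       # 0.25 (driven by the -1.0 logit)
print(top_k_normalized_error(expected, actual, k=2))  # 0.125 (top-2 logits only)
```

With smaller k, the check focuses on the logits that actually drive sampling, which is why the thresholds in the log tighten as k shrinks.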

Compatibility

Tested with:

  • Neuron SDK Version(s): 2.27.1
  • Instance Type(s): trn1.32xlarge, trn2.3xlarge
  • vLLM-Neuron: 0.3.0
  • PyTorch Version: 2.9.0
  • Python Version: 3.10.12

Additional Information

  • A few broken NxDI components need to be patched at runtime. See contrib/models/gemma3-vision/src/gemma3_vision/ndxi_patch.py.
  • The 4B and 12B models have a head dim of 256, which is not supported by the FA kernel.
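The runtime-patching approach mentioned above typically follows the standard monkey-patching pattern sketched below. The class and method names here are purely illustrative stand-ins; the actual patches live in ndxi_patch.py.

```python
class SomeNxdiComponent:
    """Stand-in for a broken upstream NxDI class (hypothetical name)."""
    def scale(self, x):
        return x * 0  # pretend this upstream behaviour is the bug

def patched_scale(self, x):
    """Corrected behaviour supplied by the contrib code (illustrative)."""
    return x * 2

def apply_patches():
    # Rebind the method on the class itself so every existing and future
    # instance picks up the fix, without forking the library.
    SomeNxdiComponent.scale = patched_scale

apply_patches()
print(SomeNxdiComponent().scale(3))  # 6
```

Applying the patch at import time (before any instances are built) keeps the rest of the library unaware that anything changed.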

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

For vLLM integration details, see: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/onboarding-models.html#nxdi-onboarding-models-vllm


By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included


yidezou commented Feb 12, 2026

Reviewed and it looks good.
