
[Contribution] Gemma3-27B VLM#34

Open
plienhar wants to merge 48 commits into aws-neuron:main from plienhar:contrib/gemma3-27b-vlm

Conversation


@plienhar plienhar commented Feb 5, 2026

Description

This PR contributes the Gemma3 VLM model family (4B, 12B and 27B) to NxDI.

Model Information

Model Name: Gemma3 4B, 12B & 27B

Model Architecture: LLaVA-style VLM with fixed-resolution SigLIP (400M) vision encoder and Transformer decoder text backbone.

Purpose: Image+Text to Text.
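As a minimal sketch of that LLaVA-style wiring (an illustration under assumptions, not the PR's actual code): the vision encoder produces image embeddings that replace the image-placeholder positions in the text embedding sequence before the decoder runs. The function and toy embeddings below are hypothetical.

```python
def merge_multimodal(text_embeds, image_embeds, image_token_positions):
    """Splice image embeddings into the text embedding sequence at the
    placeholder positions (simplified illustration of LLaVA-style fusion)."""
    merged = list(text_embeds)
    for pos, img in zip(image_token_positions, image_embeds):
        merged[pos] = img  # replace the <image> placeholder embedding
    return merged

# Toy 1-dimensional "embeddings": position 1 holds the image placeholder.
text = [[0.0], [1.0], [2.0]]
image = [[9.0]]
print(merge_multimodal(text, image, [1]))  # [[0.0], [9.0], [2.0]]
```

The merged sequence is then fed to the text decoder exactly like ordinary token embeddings.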

Checklist

Please ensure your PR includes the following items. Refer to the contrib/CONTRIBUTING.md for detailed guidelines.

Required Components

  • Accuracy Test (e.g. test/integration/test_model.py)

    • At least one integration test that validates model accuracy
    • Uses logit validation or equivalent accuracy verification
    • Test can compile and run the model on Neuron
  • README.md with the following sections:

    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types (Trn1/Trn2/Inf2)
    • Example Checkpoints: Links to compatible model checkpoints (e.g., HuggingFace Hub)
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)

    • Modeling code following NxD Inference patterns
    • Properly structured in the contrib folder hierarchy

Optional Components

  • Unit Tests (CPU or Neuron-based)
    • Tests for individual modeling components
    • Located in test/unit/ directory

Folder Structure

/contrib/models/gemma3-vision/
  README.md
  /src
    <modeling code>
  /test
    /unit
      /gemma3
      /siglip
    /integration
      test_model.py
  /vllm

Testing

How did you test this change?
Tests included:

  • Unit tests for all components of both the vision and text models.
  • Integration test validating token and logit matching, as well as performance.
  • End-to-end tests validating vLLM integration (see the vllm folder).

Test Results:

  • End-to-end tests using actual checkpoints show consistent output.
  • Integration test: with 4-layer vision and text models and random FP16 weights, logit matching passes only partially (12 of 16 checked positions fully match):
neuronx_distributed_inference.utils.exceptions.LogitMatchingValidationError: No divergence. Validating the remaining 16 tokens in each batch.
Test failed at batch 0 token 0. Top k = None error 0.09357890486717224 > 0.05. Top k = 1000 error 0.06944305449724197 > 0.03. Top k = 50 error 0.05631783977150917 > 0.02. Top k = 5 error 0.028122223913669586 > 0.01.
Test failed at batch 0 token 5. Top k = 5 error 0.017814606428146362 > 0.01.
Test failed at batch 0 token 14. Top k = 5 error 0.022793518379330635 > 0.01.
Test failed at batch 0 token 15. Top k = 5 error 0.01664724200963974 > 0.01.
Summary: Max divergence difference = 0 at index (batch 0 token 0), Top k = None max error = 0.09357890486717224 at index (batch 0 token 0), Top k = 1000 max error = 0.06944305449724197 at index (batch 0 token 0), Top k = 50 max error = 0.05631783977150917 at index (batch 0 token 0), Top k = 5 max error = 0.028122223913669586 at index (batch 0 token 0)
Complete Error Summary
These errors are normalized, for each token, by the largest expected logit for that token.
{
    "max_top_k_errors": {
        "null": {
            "error": 0.09357890486717224,
            "batch_index": 0,
            "token_index": 0
        },
        "1000": {
            "error": 0.06944305449724197,
            "batch_index": 0,
            "token_index": 0
        },
        "50": {
            "error": 0.05631783977150917,
            "batch_index": 0,
            "token_index": 0
        },
        "5": {
            "error": 0.028122223913669586,
            "batch_index": 0,
            "token_index": 0
        }
    },
    "average_over_tokens": {
        "null": {
            "mean_abs_error": 0.006117195017911044,
            "mean_squared_error": 7.054628936696939e-05,
            "max_abs_error": 0.03580721630714834,
            "max_squared_error": 0.0015160405286273435
        },
        "1000": {
            "max_abs_error": 0.026974295498803258,
            "max_squared_error": 0.0008563175740166724
        },
        "50": {
            "max_abs_error": 0.020097059023100883,
            "max_squared_error": 0.0004982783846769678
        },
        "5": {
            "max_abs_error": 0.013028468907577917,
            "max_squared_error": 0.00020805770526506807
        }
    }
}
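For context, the per-token normalization described above can be sketched roughly as follows. This is a simplified illustration, not NxDI's actual implementation (the real check in neuronx_distributed_inference also handles batching, divergence detection, and the per-k thresholds); the function name and toy logits are hypothetical.

```python
def top_k_normalized_error(expected, actual, k=None):
    """Per-token logit error, normalized by the largest-magnitude expected
    logit for that token, optionally restricted to the top-k expected logits.
    Simplified sketch of the check described in the error summary above."""
    scale = max(abs(x) for x in expected)          # largest expected logit
    errs = [abs(e - a) / scale for e, a in zip(expected, actual)]
    if k is not None:
        # Keep only the error at the positions of the k largest expected logits.
        top = sorted(range(len(expected)), key=lambda i: expected[i], reverse=True)[:k]
        errs = [errs[i] for i in top]
    return max(errs)

expected = [4.0, 2.0, -1.0, 0.5]
actual   = [4.5, 2.0, -2.0, 0.5]
print(top_k_normalized_error(expected, actual))       # 0.25 (driven by the -1.0 logit)
print(top_k_normalized_error(expected, actual, k=2))  # 0.125 (top-2 logits only)
```

With smaller k, the check focuses on the logits that actually drive sampling, which is why the thresholds in the log tighten as k shrinks.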

Compatibility

Tested with:

  • Neuron SDK Version(s): 2.27.1
  • Instance Type(s): trn1.32xlarge, trn2.3xlarge
  • vLLM-Neuron: 0.3.0
  • PyTorch Version: 2.9.0
  • Python Version: 3.10.12

Additional Information

  • A few broken NxDI components need to be patched at runtime. See contrib/models/gemma3-vision/src/gemma3_vision/ndxi_patch.py.
  • The 4B and 12B models have a head dim of 256, which is not supported by the FA kernel.
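The runtime-patching approach mentioned above typically follows the standard monkey-patching pattern sketched below. The class and method names here are purely illustrative stand-ins; the actual patches live in ndxi_patch.py.

```python
class SomeNxdiComponent:
    """Stand-in for a broken upstream NxDI class (hypothetical name)."""
    def scale(self, x):
        return x * 0  # pretend this upstream behaviour is the bug

def patched_scale(self, x):
    """Corrected behaviour supplied by the contrib code (illustrative)."""
    return x * 2

def apply_patches():
    # Rebind the method on the class itself so every existing and future
    # instance picks up the fix, without forking the library.
    SomeNxdiComponent.scale = patched_scale

apply_patches()
print(SomeNxdiComponent().scale(3))  # 6
```

Applying the patch at import time (before any instances are built) keeps the rest of the library unaware that anything changed.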

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

For vLLM integration details, see: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/onboarding-models.html#nxdi-onboarding-models-vllm


By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included


yidezou commented Feb 12, 2026

Reviewed and it looks good.
