Support transformers v5 #994

Draft

Tcc0403 wants to merge 25 commits into main from transformers-5.0.0rc1

Conversation

@Tcc0403 (Collaborator) commented Jan 2, 2026

Important

Do not merge this PR before all issues are resolved!

Testing with three versions: 4.49.0, 4.57.6, and the latest stable version.

Note

nvi-ci is split into a correctness-test CI and a convergence-test CI to speed up testing in this PR, with additional jobs for testing backward compatibility with transformers v4 (4.49.0 and 4.57.6).

Whether to keep this change is yet to be discussed.

Summary

This is a dev branch for aggregating PRs related to transformers v5 changes.

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@Tcc0403 Tcc0403 mentioned this pull request Jan 2, 2026
@Tcc0403 Tcc0403 marked this pull request as ready for review January 2, 2026 11:11
@Tcc0403 Tcc0403 marked this pull request as draft January 2, 2026 12:17
@Tcc0403 Tcc0403 force-pushed the transformers-5.0.0rc1 branch from 0f3f8eb to 1599bfc on January 14, 2026 09:49
Signed-off-by: Tcc0403 <76503978+Tcc0403@users.noreply.github.com>
## Summary
Fix #1013 

Transformers v5 introduces a new attribute `rope_parameters` in the model config, which contains all RoPE-related parameters, and deprecates the standalone RoPE attributes such as `rope_scaling`, `rope_theta`, etc.

Most `TokenizerFast` classes are now the default tokenizers in v5, so the `tokenization_xxx_fast` import paths have been removed.

This PR
- replaces the deprecated config attributes with `rope_parameters`
- replaces fast tokenizer import paths with the default ones (see the sketch below)
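
For illustration, a minimal sketch of reading the RoPE base frequency in a version-agnostic way; the helper name `get_rope_theta` is hypothetical and not part of this PR.

```python
def get_rope_theta(config, default=10000.0):
    """Illustrative helper: transformers v5 stores RoPE settings in a
    `rope_parameters` dict, while v4 exposes a standalone `rope_theta`
    attribute."""
    rope_params = getattr(config, "rope_parameters", None)
    if isinstance(rope_params, dict) and "rope_theta" in rope_params:
        return rope_params["rope_theta"]
    return getattr(config, "rope_theta", default)
```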



## Testing Done


- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

---------

Signed-off-by: Tcc0403 <76503978+Tcc0403@users.noreply.github.com>
## Summary
Follow-up to #1014 

Changes all occurrences in the convergence tests.


## Testing Done


- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

Signed-off-by: Tcc0403 <76503978+Tcc0403@users.noreply.github.com>
## Summary

`position_ids` has been removed from `apply_rotary_pos_emb` in huggingface/transformers#43255.
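
A hedged sketch of what the call-site update amounts to; the wrapper below is illustrative only and not code from this PR.

```python
import inspect

from transformers.models.llama.modeling_llama import apply_rotary_pos_emb


def rope_compat(q, k, cos, sin, position_ids=None):
    """Illustrative wrapper: only pass position_ids if the installed
    transformers version still accepts it (the argument is removed in v5)."""
    if "position_ids" in inspect.signature(apply_rotary_pos_emb).parameters:
        return apply_rotary_pos_emb(q, k, cos, sin, position_ids)
    return apply_rotary_pos_emb(q, k, cos, sin)
```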
## Testing Done
```
❯ python3 -m pytest test/transformers/test_rope.py -q

test/transformers/test_rope.py::test_correctness[True-dtype0-1e-05-1e-05-1-128-32-32-64] PASSED                              [  2%]
test/transformers/test_rope.py::test_correctness[True-dtype0-1e-05-1e-05-2-128-32-32-64] PASSED                              [  5%]
test/transformers/test_rope.py::test_correctness[True-dtype0-1e-05-1e-05-1-128-32-8-64] PASSED                               [  8%]
test/transformers/test_rope.py::test_correctness[True-dtype0-1e-05-1e-05-2-128-32-8-64] PASSED                               [ 11%]
test/transformers/test_rope.py::test_correctness[True-dtype0-1e-05-1e-05-3-423-73-213-92] PASSED                             [ 13%]
test/transformers/test_rope.py::test_correctness[True-dtype0-1e-05-1e-05-3-423-73-155-92] PASSED                             [ 16%]
test/transformers/test_rope.py::test_correctness[True-dtype1-0.1-1e-05-1-128-32-32-64] PASSED                                [ 19%]
test/transformers/test_rope.py::test_correctness[True-dtype1-0.1-1e-05-2-128-32-32-64] PASSED                                [ 22%]
test/transformers/test_rope.py::test_correctness[True-dtype1-0.1-1e-05-1-128-32-8-64] PASSED                                 [ 25%]
test/transformers/test_rope.py::test_correctness[True-dtype1-0.1-1e-05-2-128-32-8-64] PASSED                                 [ 27%]
test/transformers/test_rope.py::test_correctness[True-dtype1-0.1-1e-05-3-423-73-213-92] PASSED                               [ 30%]
test/transformers/test_rope.py::test_correctness[True-dtype1-0.1-1e-05-3-423-73-155-92] PASSED                               [ 33%]
test/transformers/test_rope.py::test_correctness[False-dtype0-1e-05-1e-05-1-128-32-32-64] PASSED                             [ 36%]
test/transformers/test_rope.py::test_correctness[False-dtype0-1e-05-1e-05-2-128-32-32-64] PASSED                             [ 38%]
test/transformers/test_rope.py::test_correctness[False-dtype0-1e-05-1e-05-1-128-32-8-64] PASSED                              [ 41%]
test/transformers/test_rope.py::test_correctness[False-dtype0-1e-05-1e-05-2-128-32-8-64] PASSED                              [ 44%]
test/transformers/test_rope.py::test_correctness[False-dtype0-1e-05-1e-05-3-423-73-213-92] PASSED                            [ 47%]
test/transformers/test_rope.py::test_correctness[False-dtype0-1e-05-1e-05-3-423-73-155-92] PASSED                            [ 50%]
test/transformers/test_rope.py::test_correctness[False-dtype1-0.1-1e-05-1-128-32-32-64] PASSED                               [ 52%]
test/transformers/test_rope.py::test_correctness[False-dtype1-0.1-1e-05-2-128-32-32-64] PASSED                               [ 55%]
test/transformers/test_rope.py::test_correctness[False-dtype1-0.1-1e-05-1-128-32-8-64] PASSED                                [ 58%]
test/transformers/test_rope.py::test_correctness[False-dtype1-0.1-1e-05-2-128-32-8-64] PASSED                                [ 61%]
test/transformers/test_rope.py::test_correctness[False-dtype1-0.1-1e-05-3-423-73-213-92] PASSED                              [ 63%]
test/transformers/test_rope.py::test_correctness[False-dtype1-0.1-1e-05-3-423-73-155-92] PASSED                              [ 66%]
test/transformers/test_rope.py::test_functional_correctness[True-dtype0-1e-05-1e-05-1-2-2-2-8] PASSED                        [ 69%]
test/transformers/test_rope.py::test_functional_correctness[True-dtype0-1e-05-1e-05-1-2-1-2-8] PASSED                        [ 72%]
test/transformers/test_rope.py::test_functional_correctness[True-dtype0-1e-05-1e-05-9-7-41-41-41] PASSED                     [ 75%]
test/transformers/test_rope.py::test_functional_correctness[True-dtype1-0.1-1e-05-1-2-2-2-8] PASSED                          [ 77%]
test/transformers/test_rope.py::test_functional_correctness[True-dtype1-0.1-1e-05-1-2-1-2-8] PASSED                          [ 80%]
test/transformers/test_rope.py::test_functional_correctness[True-dtype1-0.1-1e-05-9-7-41-41-41] PASSED                       [ 83%]
test/transformers/test_rope.py::test_functional_correctness[False-dtype0-1e-05-1e-05-1-2-2-2-8] PASSED                       [ 86%]
test/transformers/test_rope.py::test_functional_correctness[False-dtype0-1e-05-1e-05-1-2-1-2-8] PASSED                       [ 88%]
test/transformers/test_rope.py::test_functional_correctness[False-dtype0-1e-05-1e-05-9-7-41-41-41] PASSED                    [ 91%]
test/transformers/test_rope.py::test_functional_correctness[False-dtype1-0.1-1e-05-1-2-2-2-8] PASSED                         [ 94%]
test/transformers/test_rope.py::test_functional_correctness[False-dtype1-0.1-1e-05-1-2-1-2-8] PASSED                         [ 97%]
test/transformers/test_rope.py::test_functional_correctness[False-dtype1-0.1-1e-05-9-7-41-41-41] PASSED                      [100%]
```

- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

Signed-off-by: Tcc0403 <76503978+Tcc0403@users.noreply.github.com>
@Tcc0403 Tcc0403 force-pushed the transformers-5.0.0rc1 branch from df188d7 to 2cd6e39 on January 20, 2026 06:44
## Summary
Update Gemma tokenizer usage in convergence tests for Transformers v5 by
removing deprecated `GemmaTokenizerFast` imports and renaming usages to
the supported non-fast tokenizer class. This fixes the `No module named
transformers.models.gemma.tokenization_gemma_fast` error when running
convergence tests under Transformers v5.

## Details
Transformers v5 moves away from parallel “fast” and “slow” tokenizer implementations and adopts a single tokenizer implementation (see huggingface/transformers#40936).
- Convergence tests were importing and instantiating the fast tokenizer class, causing import errors.
- This change updates both the import path and the tokenizer class name used in code (`GemmaTokenizerFast` → `GemmaTokenizer`), following the new Transformers v5 API (see the sketch below).
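
For reference, a hedged sketch of a version guard around the tokenizer import; this commit itself simply renames the class, and the checkpoint name below is illustrative.

```python
from packaging import version
import transformers

if version.parse(transformers.__version__).major >= 5:
    # v5: a single GemmaTokenizer implementation, no tokenization_gemma_fast module.
    from transformers import GemmaTokenizer as TokenizerCls
else:
    # v4.x: keep the Rust-backed fast tokenizer.
    from transformers import GemmaTokenizerFast as TokenizerCls

tokenizer = TokenizerCls.from_pretrained("google/gemma-2b")  # illustrative checkpoint
```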

## Testing Done
- Hardware Type: A100-40G-PCIe
- [ ] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
@kashif (Contributor) commented Jan 26, 2026

@Tcc0403 In the CI tests, in my opinion we should not be setting any RoPE-related settings in the model. The model will read the defaults based on whatever version of transformers is installed, and setting the rope params etc. does not do anything in terms of the test. What do you think?

Tcc0403 and others added 5 commits January 27, 2026 22:35
Signed-off-by: Tcc0403 <76503978+Tcc0403@users.noreply.github.com>
## Summary
Fix multiple failing monkey_patch tests for transformers v5. The tests
failed because of
huggingface/transformers#41541.

These changes are backward compatible.

This change fixes the following failing tests:
```
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe_for_conditional_generation - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe_text - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_llama4_for_conditional_generation - AttributeError: 'Llama4Config' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_glm4v - AttributeError: 'Glm4vTextConfig' object has no attribute 'pad_token_id'
```


Fixes #1059.
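
A minimal sketch of a backward-compatible access pattern for `pad_token_id`; the helper name and fallback order are assumptions for illustration, not this PR's exact code.

```python
def resolve_pad_token_id(config, default=None):
    """Illustrative helper: in transformers v5 some text sub-configs
    (e.g. Qwen3VLMoeTextConfig) no longer carry pad_token_id, so read it
    defensively from the config or its nested text_config."""
    for candidate in (config, getattr(config, "text_config", None)):
        pad_token_id = getattr(candidate, "pad_token_id", None)
        if pad_token_id is not None:
            return pad_token_id
    return default
```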


## Testing Done


- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
## Summary

Fixes #1013

- Remove deprecated `rope_theta` and `rope_scaling` from test configurations
- Retain `mrope_section` in `rope_parameters` for VL models (Qwen2-VL, Qwen2.5-VL), as it is required by Transformers v5's direct `config.rope_parameters["mrope_section"]` access pattern (see the sketch after this list)
- Remove `rope_parameters` dicts containing only the deprecated `rope_theta` from non-VL models, relying on model defaults instead
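
For illustration, the shape such a test config ends up with; the concrete values below are placeholders, not this PR's exact configuration.

```python
# Hypothetical mini VL config kwargs: mrope_section stays under rope_parameters
# because transformers v5 reads config.rope_parameters["mrope_section"] directly.
mini_qwen2_vl_kwargs = dict(
    rope_parameters={
        # The three sections split the rotary dimensions across the
        # temporal / height / width axes.
        "mrope_section": [8, 12, 12],
    },
    # Deprecated standalone rope_theta / rope_scaling entries are dropped,
    # relying on the model defaults instead.
)
```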


## Testing Done

- Hardware Type: RTX3090 (Nvidia Ampere)
- [x] run `python -m pytest test/transformers/test_monkey_patch.py -v`
to ensure correctness
- [x] run `make checkstyle` to ensure code style
    - Linter checks pass with no new issues introduced.
- [x] run `make test-convergence` to ensure convergence

### test_monkey_patch.py
- **5 failed** - Unrelated to this PR (MoE experts structure changes in
Transformers v5):
- `test_apply_liger_kernel_to_instance_for_mixtral` -
`'MixtralDecoderLayer' has no attribute 'block_sparse_moe'`
- `test_apply_liger_kernel_to_instance_for_qwen3_moe` -
`'Qwen3MoeExperts' object is not iterable`
- `test_apply_liger_kernel_to_instance_for_glm4v_moe` -
`'Glm4vMoeTextNaiveMoe' object is not iterable`
- `test_apply_liger_kernel_to_instance_for_qwen3_next` -
`'Qwen3NextExperts' object is not iterable`
- `test_apply_liger_kernel_to_instance_for_hunyuan_v1_moe` -
`'HunYuanMoEV1Experts' object is not iterable`

### convergence tests

| Error Type | Affected Models | Root Cause |
|------------|-----------------|------------|
| `AttributeError: '...Config' has no attribute 'pad_token_id'` | Qwen3VLMoe, Glm4v, Llama4, Exaone4 | Transformers v5 config structure changes |
| `RuntimeError: _histc_cuda...deterministic` | Qwen3Moe, GptOss, Glm4vMoe, Qwen3Next, HunYuanV1Moe | CUDA deterministic implementation issue |
| `AssertionError: [Loss] Number of mismatched` | Llama4 | Numerical precision issue |
| `TypeError: 'NoneType' object is not subscriptable` | Llava | Model initialization issue |


## Out of Scope

The following issues require separate PRs:
1. MoE models' `experts` structure changes (no longer iterable in Transformers v5)
2. `pad_token_id` attribute location changes in composite configs
3. CUDA deterministic implementation issues
@vaibhavjindal (Collaborator) commented:

@Mecoli1219 @Tcc0403 Should we fix this branch to make all the tests pass for v4.49.0 and v4.57.6 and then merge it into main? After that we can keep adding new PRs for v5 support in main directly. Once all the issues with v5 are fixed, we can release a new version. Wdyt?

…v5 (#1062)

Fixes #1012
⚠️ Dependency: This PR depends on #1060. Please review and merge #1060
first.

## Summary
- Fix `AttributeError: 'Qwen2VLConfig' object has no attribute
'hidden_size'` for Qwen2-VL and Qwen2.5-VL models
- Update test configurations to use the new `text_config` structure
required by Transformers v5

## Changes
1. **Model code** (`src/liger_kernel/transformers/model/qwen2_vl.py`,
`qwen2_5_vl.py`):
- Changed `self.config.hidden_size` →
`self.config.text_config.hidden_size`
- Changed `self.config.vocab_size` →
`self.config.text_config.vocab_size`

2. **Test configurations** (`test/convergence/bf16/test_mini_models.py`,
`fp32/test_mini_models.py`):
- Restructured `mini_qwen2_vl` and `mini_qwen2_5_vl` configurations to
use `text_config` dictionary for text-related parameters

## Background
In Transformers v5, `Qwen2VLConfig` and `Qwen2_5_VLConfig` moved
text-related parameters (such as `hidden_size`, `vocab_size`) into a
nested `text_config` attribute, following the pattern used by other
multimodal models.
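
For illustration only, a version-tolerant variant of that access; the PR itself switches directly to `self.config.text_config.hidden_size`, and the helper name below is hypothetical.

```python
def get_text_config_value(config, name):
    """Illustrative helper: read a text-model field (e.g. hidden_size,
    vocab_size) from the nested text_config used by Qwen2-VL configs in
    transformers v5, falling back to the flat v4 layout."""
    text_config = getattr(config, "text_config", None)
    if text_config is not None and hasattr(text_config, name):
        return getattr(text_config, name)
    return getattr(config, name)
```

For example, `hidden_size = get_text_config_value(self.config, "hidden_size")` would work under either layout.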

## Test plan
- [x] `python -m pytest test/convergence/bf16/test_mini_models.py -k
"qwen2_vl or qwen2_5_vl"` passes
- [x] `python -m pytest test/convergence/fp32/test_mini_models.py -k
"qwen2_vl or qwen2_5_vl"` passes
@Mecoli1219 (Contributor) commented:

Sounds great. Let me fix this branch for testing in v4.

Mecoli1219 and others added 2 commits February 5, 2026 13:54
## Summary
This PR restores backward compatibility for convergence tests with
transformers v4 (v4.49.0 ~ v4.57.6). During the initial development
phase for transformers v5 support, backward compatibility was
intentionally deprioritized, leading to significant test regressions.
This PR fixes those regressions while maintaining a stable foundation
for the ongoing v5 integration.

## Related Issues & PRs
- #978 
- #994 

## Details

The current codebase assumes transformers v5 conventions, which broke compatibility with the v4.x series in two major areas:

1. RoPE parameters: some models are missing RoPE parameters such as `rope_scaling`, since these were unified into `rope_parameters` in transformers v5.
2. Tokenizer consistency: v5 and v4 handle the tokenizer interfaces differently. v5's tokenizer selects the appropriate backend, while v4's slow tokenizer is the Python-based implementation using SentencePiece as the backend.

Key Fixes:

- Added conditional logic to provide different RoPE parameters for different transformers versions (see the sketch after this list).
- Enforced `TokenizerFast` usage for transformers < v5 to resolve interface mismatches.
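
A hedged sketch of the kind of version guard used; the parameter values and tokenizer class below are placeholders, not this PR's code.

```python
from packaging import version
import transformers

IS_TRANSFORMERS_V5 = version.parse(transformers.__version__).major >= 5

# RoPE settings: v5 expects the unified rope_parameters dict, while v4.x
# still reads the standalone attributes (placeholder values).
if IS_TRANSFORMERS_V5:
    rope_kwargs = {"rope_parameters": {"rope_theta": 10000.0}}
else:
    rope_kwargs = {"rope_theta": 10000.0}

# Tokenizer: force the fast (Rust-backed) class on v4 so both versions
# exercise the same interface; v5 ships a single tokenizer implementation.
if IS_TRANSFORMERS_V5:
    from transformers import LlamaTokenizer as TestTokenizer  # illustrative class choice
else:
    from transformers import LlamaTokenizerFast as TestTokenizer  # illustrative class choice
```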

## Testing Done
I ran `python -m pytest test/convergence/*` on different versions of
transformers on the original branch and after making changes. The result
is shown below:

| Branch | v4.49.0 | v4.57.6 | v5.0.0 |
|---|---|---|---|
| transformers-5.0.0rc1 | 8 failed, 37 passed, 98 skipped, 1 warning | 42 failed, 92 passed, 9 skipped, 3 warnings | 19 failed, 115 passed, 9 skipped, 29 warnings |
| This PR | 0 failed, 45 passed, 98 skipped, 1 warning | 0 failed, 134 passed, 9 skipped, 19 warnings | 19 failed, 115 passed, 9 skipped, 29 warnings |

All of the failing tests on v5 were inspected carefully; they are identical to the previously reported failures.




- Hardware Type: H100
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence
@vaibhavjindal (Collaborator) commented:

@Tcc0403 @Mecoli1219 Should we merge this into main now that #978 has been updated?

@Mecoli1219 (Contributor) commented:

Sure

vaibhavjindal and others added 7 commits February 6, 2026 14:57
)

## Summary
Fix pad_token_id in convergence tests for transformers v5 support. Fixes
#1081.

Fixes the following tests for transformers v5.0.0 compatibility while
maintaining backward compatibility with < v5.0.0:
```
pytest test/convergence/fp32/test_mini_models.py::test_mini_model[mini_glm4v-32-0.0001-dtype16-1e-08-1e-05-0.005-1e-05-0.005-1e-05] \
    test/convergence/fp32/test_mini_models.py::test_mini_model[mini_exaone4-32-1e-05-dtype30-1e-08-1e-05-0.005-1e-05-0.005-1e-05] \
    test/convergence/fp32/test_mini_models_with_logits.py::test_mini_model[mini_glm4v-32-0.0001-dtype15-1e-08-1e-05-0.005-1e-05-0.005-1e-05] \
    test/convergence/fp32/test_mini_models_with_logits.py::test_mini_model[mini_exaone4-32-1e-05-dtype29-1e-08-1e-05-0.005-1e-05-0.005-1e-05] \
    test/convergence/bf16/test_mini_models.py::test_mini_model[mini_glm4v-32-1e-05-dtype18-0.01-0.02-0.1-0.01-0.01-0.01] \
    test/convergence/bf16/test_mini_models.py::test_mini_model[mini_exaone4-32-1e-05-dtype29-0.01-0.05-0.1-0.01-0.01-0.01] \
    test/convergence/bf16/test_mini_models_with_logits.py::test_mini_model[mini_exaone4-32-1e-05-dtype28-0.01-0.05-0.1-0.01-0.01-0.01] \
    test/convergence/bf16/test_mini_models_with_logits.py::test_mini_model[mini_glm4v-32-1e-05-dtype19-0.01-0.02-0.1-0.01-0.01-0.01]
```

## Testing Done
Yes, on 1xH100.


- Hardware Type: <BLANK>
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence
…for Transformers v5 (#1079)

## Summary

Fixes the following unit test failures for Transformers v5.

```
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_mixtral - AttributeError: 'MixtralDecoderLayer' object has no attribute 'block_sparse_moe'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_moe - TypeError: 'Qwen3MoeExperts' object is not iterable
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_glm4v_moe - TypeError: 'Glm4vMoeTextNaiveMoe' object is not iterable
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_next - TypeError: 'Qwen3NextExperts' object is not iterable
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_hunyuan_v1_moe - TypeError: 'HunYuanMoEV1Experts' object is not iterable
```

## Related Issues & PRs
- #978 

## Details
The implementation of MoE experts is updated in transformers v5. Originally it was a module list of MLPs, but some models updated their implementation to a single `<ModelName>Experts` class.

These classes' implementations are identical, so this PR creates a `LigerExperts` class that uses `LigerSiLUMulFunction` to replace the experts of these models (a sketch of the idea follows the note below):
- [x] mixtral
- [x] qwen3_moe
- [x] glm4v_moe
- [x] qwen3_next
- [x] hunyuan_v1_moe

> [!NOTE]
> The `LigerExperts` in this PR is a workaround to fix the unit tests for v5. We can replace it with a better kernel, as mentioned in #958, in the future.
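
A minimal, hedged sketch of the idea: route each expert's SwiGLU activation through `LigerSiLUMulFunction`. The real `<ModelName>Experts` classes in transformers v5 keep stacked expert weights, so the class below is a simplified illustration, not the `LigerExperts` implementation added by this PR.

```python
import torch
import torch.nn as nn

from liger_kernel.ops.swiglu import LigerSiLUMulFunction


class SketchLigerExperts(nn.Module):
    """Simplified illustration: one SwiGLU MLP per expert, with the
    silu(gate) * up product computed by LigerSiLUMulFunction."""

    def __init__(self, num_experts: int, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.ModuleList(
            nn.Linear(hidden_size, intermediate_size, bias=False) for _ in range(num_experts)
        )
        self.up_proj = nn.ModuleList(
            nn.Linear(hidden_size, intermediate_size, bias=False) for _ in range(num_experts)
        )
        self.down_proj = nn.ModuleList(
            nn.Linear(intermediate_size, hidden_size, bias=False) for _ in range(num_experts)
        )

    def forward_expert(self, expert_idx: int, hidden_states: torch.Tensor) -> torch.Tensor:
        gate = self.gate_proj[expert_idx](hidden_states)
        up = self.up_proj[expert_idx](hidden_states)
        return self.down_proj[expert_idx](LigerSiLUMulFunction.apply(gate, up))
```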

## Testing Done
Unit tests are added to check the correctness of the original `LigerBlockSparseTop2MLP` implementation for v4 and of `LigerExperts` for v5.

- Hardware Type: H100 * 1
- [x] run `make test` to ensure correctness in v5.0.0
- [x] run `make test` to ensure correctness in v4.57.6
- [x] run `make test` to ensure correctness in v4.49.0
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence in v5.0.0
    - 5 unrelated existing errors 
- [x] run `make test-convergence` to ensure convergence in v4.57.6
- [x] run `make test-convergence` to ensure convergence in v4.49.0

Signed-off-by: Tcc0403 <76503978+Tcc0403@users.noreply.github.com>
@Tcc0403 Tcc0403 force-pushed the transformers-5.0.0rc1 branch from 545fb48 to 8e3f826 on February 7, 2026 11:29
… transformers v5 (#1061)

Fixes #1011

## Summary
- Fix `TypeError: 'NoneType' object is not subscriptable` error in Llava
convergence test caused by `image_outputs.hidden_states` returning
`None`
- Remove unnecessary `importlib.reload(modeling_clip)` from
`revert_liger_kernel_to_llava()` function

## Problem
In transformers v5, calling `importlib.reload(modeling_clip)` breaks
`CLIPVisionModel`'s `output_hidden_states` functionality. When the Llava
convergence test runs:

1. First run (without Liger): creates model, runs test, then calls
`revert_liger_kernel_to_llava()` which reloads `modeling_clip`
2. Second run (with Liger): creates new model, but
`CLIPVisionModel.forward()` now returns `hidden_states=None` even when
`output_hidden_states=True` is passed

This causes the error at:
```python
selected_image_feature = image_outputs.hidden_states[vision_feature_layer]
# TypeError: 'NoneType' object is not subscriptable
```

## Solution
Remove `importlib.reload(modeling_clip)` from
`revert_liger_kernel_to_llava()`. This is safe because:
- Liger kernel does not patch `modeling_clip` when `model=None` (which
is the case in convergence tests)
- Only `modeling_llava` and `modeling_llama` need to be reloaded to revert Liger patches (a sketch of the trimmed revert helper follows)
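
A hedged sketch of what the trimmed revert helper boils down to; the actual function in the repo differs, and only the module names come from the description above.

```python
import importlib

from transformers.models.llama import modeling_llama
from transformers.models.llava import modeling_llava


def revert_liger_kernel_to_llava_sketch():
    """Illustrative only: reload just the modules Liger patches in the
    convergence tests; modeling_clip is deliberately not reloaded, since
    reloading it breaks CLIPVisionModel's output_hidden_states under
    transformers v5."""
    importlib.reload(modeling_llava)
    importlib.reload(modeling_llama)
```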

## Test plan
- [x] `python -m pytest
test/convergence/bf16/test_mini_models_multimodal.py -k llava` passes
- [x] Verified `hidden_states` is no longer `None` after the fix
