Fix RoPE dimension split issue for balanced head configurations #7

We are currently using 32 RoPE dims + 0 NoPE dims instead of the intended 16/16 split, because of constraints in a HuggingFace RoPE utility function. As a result, the model effectively uses a single key head instead of the intended multi-head setup.

**Current Impact:**
- Key-value latent space is only being used for **value** heads
- Model uses only **1 key head** (instead of multiple) but still has 8 query heads
- Configuration: 8 query heads, 1 key head, 8 value heads
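
The effect can be illustrated with a toy sketch. It assumes, as I understand the DeepSeekV3 attention layout, that the RoPE part of the key is a single vector shared across heads while the NoPE part is produced per head from the key-value latent. The helper names (`build_keys`, `count_distinct_key_heads`) are hypothetical and the values are random stand-ins, not real projections.

```python
import random


def build_keys(num_heads, qk_nope_head_dim, qk_rope_head_dim, seed=0):
    """Toy per-head key vectors: per-head NoPE part + shared RoPE part.

    Assumption: the RoPE slice is computed once and broadcast to every
    head, while the NoPE slice differs per head.
    """
    rng = random.Random(seed)
    # Shared rotary slice: one vector, reused by all heads.
    k_rot = [rng.gauss(0.0, 1.0) for _ in range(qk_rope_head_dim)]
    keys = []
    for _ in range(num_heads):
        # Per-head non-rotary slice (empty when qk_nope_head_dim == 0).
        k_pass = [rng.gauss(0.0, 1.0) for _ in range(qk_nope_head_dim)]
        keys.append(tuple(k_pass + k_rot))
    return keys


def count_distinct_key_heads(keys):
    """Number of heads whose key vectors actually differ."""
    return len(set(keys))


# Current setup: 32 RoPE + 0 NoPE, so every head sees the same key.
current = build_keys(num_heads=8, qk_nope_head_dim=0, qk_rope_head_dim=32)
# Intended setup: 16 RoPE + 16 NoPE, so each head gets its own NoPE slice.
intended = build_keys(num_heads=8, qk_nope_head_dim=16, qk_rope_head_dim=16)

print(count_distinct_key_heads(current), count_distinct_key_heads(intended))
```

Under that assumption, the 32/0 split collapses all 8 key heads into one shared vector, while the 16/16 split keeps the total key dimension at 32 and gives each head a distinct key.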

**Tasks:**
- [ ] Investigate the HuggingFace RoPE utility function error.
  - [x] Locate the source code (done, see references below).
  - [ ] Determine if we can get it working with the right config settings.
- [ ] If not, some options are:
  - [ ] Surgical patch to DeepSeekV3 code
  - [ ] Replace entire Attention class
  - [ ] Create Decoder variant of SubspaceEncoder
- [ ] Test that 16 RoPE / 16 NoPE configuration works correctly
- [ ] Compare performance with current 32/0 setup

**References:**
- The `rope_type` is `default`; see [here](https://github.com/huggingface/transformers/blob/e7d351cebad5f6dcdd169b0c034fdee0a000e6a9/src/transformers/models/deepseek_v3/modeling_deepseek_v3.py#L66).
- The RoPE init function gets pulled in from [here](https://github.com/huggingface/transformers/blob/34595cf296b1eafce294fd9aa5f43cb53d014930/src/transformers/modeling_rope_utils.py#L382).
- The source of the problem is [here](https://github.com/huggingface/transformers/blob/34595cf296b1eafce294fd9aa5f43cb53d014930/src/transformers/modeling_rope_utils.py#L113).
    - The function looks like it is designed to support a RoPE dimension smaller than the full head dimension, but only if the config settings are right.
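
To make the "config settings" point concrete, here is a minimal sketch of how a default RoPE init could derive the rotary dimension from a config. It is paraphrased from memory and is an assumption to verify against the linked source, not a copy of the library code. The function names (`default_rotary_dim`, `inv_freq`) and the `hidden_size=256` value are hypothetical; the attribute names `head_dim`, `partial_rotary_factor`, `hidden_size`, and `num_attention_heads` are the ones I would expect the real function to read.

```python
from types import SimpleNamespace


def default_rotary_dim(config):
    """Hypothetical re-statement of how the rotary dim might be chosen.

    Assumption: use `head_dim` if set, otherwise fall back to
    hidden_size // num_attention_heads, then scale by
    `partial_rotary_factor` (default 1.0).
    """
    partial = getattr(config, "partial_rotary_factor", 1.0)
    head_dim = getattr(config, "head_dim", None) or (
        config.hidden_size // config.num_attention_heads
    )
    return int(head_dim * partial)


def inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies: one per pair of rotary dims."""
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


# Hypothetical config without head_dim: falls back to 256 // 8 = 32.
fallback = SimpleNamespace(hidden_size=256, num_attention_heads=8)
# Same config with head_dim pinned to the intended RoPE width.
pinned = SimpleNamespace(hidden_size=256, num_attention_heads=8, head_dim=16)

print(default_rotary_dim(fallback), default_rotary_dim(pinned))
print(len(inv_freq(default_rotary_dim(pinned))))
```

If the real function behaves this way, a 16-dim rotary slice would require either `head_dim=16` or `partial_rotary_factor=0.5` in the config; whether the rest of the attention code accepts those values is what the first task needs to establish.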


