[bugfix] fix gather_from_sp #63

Merged
Jintao-Huang merged 2 commits into modelscope:main from Jintao-Huang:fix_gather_from_sp
May 5, 2026
Conversation

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refines sequence parallel operations across several model files. Key updates include replacing reduce_scatter_to_sequence_parallel_region with scatter_to_sequence_parallel_region, which removes the need for manual scaling by the tensor model parallel size. Additionally, the gather_from_sequence_parallel_region calls now explicitly set tensor_parallel_output_grad=False, and forward passes in Qwen models are wrapped with a CUDA RNG tracker fork to ensure consistent randomness. I have no feedback to provide as there were no review comments to evaluate.
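The point about manual scaling can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions: it assumes the input is replicated (identical) on every tensor-parallel rank, and that the scaling removed by the PR was a division by the tensor model parallel size. The helpers `split_chunks`, `simulated_scatter`, and `simulated_reduce_scatter` are hypothetical stand-ins written for this example; they are not the library functions named in the summary, and plain Python lists stand in for per-rank tensors.

```python
# Toy single-process simulation; plain lists stand in for per-rank tensors.
# All helpers here are hypothetical illustrations, not the real library API.

def split_chunks(seq, world_size):
    """Split a sequence into world_size equal contiguous chunks."""
    assert len(seq) % world_size == 0
    size = len(seq) // world_size
    return [seq[r * size:(r + 1) * size] for r in range(world_size)]

def simulated_scatter(per_rank_inputs):
    """Each rank keeps only its own chunk of its local input."""
    world_size = len(per_rank_inputs)
    return [split_chunks(x, world_size)[rank]
            for rank, x in enumerate(per_rank_inputs)]

def simulated_reduce_scatter(per_rank_inputs):
    """Sum inputs elementwise across ranks, then hand each rank one chunk."""
    world_size = len(per_rank_inputs)
    summed = [sum(vals) for vals in zip(*per_rank_inputs)]
    return split_chunks(summed, world_size)

tp_size = 4  # assumed tensor model parallel size
sequence = [float(i) for i in range(8)]
# Assumption: every rank holds the same full sequence.
replicated = [list(sequence) for _ in range(tp_size)]

scattered = simulated_scatter(replicated)
reduced = simulated_reduce_scatter(replicated)
# Reduce-scatter summed tp_size identical copies, so it must be rescaled.
rescaled = [[v / tp_size for v in chunk] for chunk in reduced]

assert rescaled == scattered
```

Under these assumptions, the plain scatter yields the same per-rank chunks as reduce-scatter followed by division by `tp_size`, which is consistent with the summary's remark that the manual scaling is no longer needed.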

@Jintao-Huang Jintao-Huang merged commit dab4f16 into modelscope:main May 5, 2026
1 check passed