Test new tf on axlearn for EFA #2116

Draft
Steboss wants to merge 11 commits into main from sbosisio/test-axlearn-new-tf

Conversation

@Steboss
Contributor

@Steboss Steboss commented May 20, 2026

No description provided.

@copy-pr-bot

copy-pr-bot Bot commented May 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Steboss
Contributor Author

Steboss commented May 20, 2026

/ok to test 7665ae4

@Steboss
Contributor Author

Steboss commented May 20, 2026

/ok to test 7665ae4

@Steboss
Contributor Author

Steboss commented May 20, 2026

/ok to test 7665ae4

@Steboss
Contributor Author

Steboss commented May 20, 2026

/ok to test e6dab45

@Steboss
Contributor Author

Steboss commented May 20, 2026

/ok to test bac0bf6

@Steboss Steboss requested review from aybchan and olupton May 20, 2026 13:33
olupton previously approved these changes May 20, 2026
Collaborator

@olupton olupton left a comment

LGTM if the test job passes

@Steboss
Contributor Author

Steboss commented May 20, 2026

/ok to test 9523b6d

Comment thread .github/workflows/_ci.yaml Outdated
Comment on lines 549 to 550
Member

@aybchan aybchan May 20, 2026

Suggested change

Remove for MaxText on EKS as well

@Steboss
Contributor Author

Steboss commented May 20, 2026

/ok to test c2a0937

@Steboss Steboss requested a review from aybchan May 21, 2026 09:33
fsdp: 2
tensor-parallel: 2
envs: |-
  OFI_NCCL_PROTOCOL=SENDRECV
Member

Observed a significant regression for MaxText, from > 260 TFLOP/s/device to < 20 TFLOP/s/device, without the SENDRECV protocol. Is this expected?

Contributor Author

Ah wait, I forgot to make sure MaxText can have the latest TF. Working on it.

@Steboss
Contributor Author

Steboss commented May 29, 2026

/ok to test 7b0d42b

@Steboss
Contributor Author

Steboss commented May 29, 2026

/ok to test c56948a

@Steboss Steboss marked this pull request as draft May 29, 2026 12:40

3 participants