[Common] Enable determinism for cuDNN >= 9.18 on Blackwell #2584
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Greptile Summary
This PR enables deterministic FusedAttention on Blackwell GPUs (SM 100+) for FP16/BF16 with cuDNN >= 9.18.0.
Key Changes:
Implementation Notes:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant PyTorch/JAX
participant Backend Selection
participant cuDNN Frontend
participant Forward Pass
participant Backward Pass
User->>PyTorch/JAX: Set NVTE_ALLOW_NONDETERMINISTIC_ALGO
Note over User,PyTorch/JAX: 0=deterministic, 1=non-deterministic
PyTorch/JAX->>Backend Selection: get_fused_attn_backend(deterministic)
Note over Backend Selection: New parameter: deterministic
alt Blackwell (sm_arch >= 100) Training
Backend Selection->>Backend Selection: Check cuDNN version & constraints
alt Non-deterministic (cuDNN >= 9.7.0)
Note over Backend Selection: Requires: dropout=0 OR bias=NONE
else Deterministic (cuDNN >= 9.18.0)
Note over Backend Selection: Requires: dropout=0 AND bias=NONE
end
end
Backend Selection->>PyTorch/JAX: Return backend (arbitrary_seqlen or max512)
PyTorch/JAX->>Forward Pass: nvte_fused_attn_fwd(deterministic=false)
Note over Forward Pass: Always uses deterministic algorithm
Forward Pass->>cuDNN Frontend: Execute deterministic forward
cuDNN Frontend-->>Forward Pass: Return O, aux tensors
alt Training Mode
PyTorch/JAX->>Backward Pass: nvte_fused_attn_bwd(deterministic)
Note over Backward Pass: Uses actual deterministic flag
alt Deterministic
Backward Pass->>cuDNN Frontend: Execute deterministic backward (9.18+)
else Non-deterministic
Backward Pass->>cuDNN Frontend: Execute non-deterministic backward (9.7+)
end
cuDNN Frontend-->>Backward Pass: Return dQ, dK, dV
end
Backward Pass-->>User: Gradients
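The flag convention noted in the diagram (0 = deterministic, 1 = non-deterministic) can be sketched in Python. This is a minimal illustration rather than Transformer Engine's own code: the helper name `deterministic_requested` is hypothetical, and treating an unset variable as "1" is an assumption.

```python
import os


def deterministic_requested(env=None):
    """Return True when NVTE_ALLOW_NONDETERMINISTIC_ALGO is set to 0.

    Hypothetical helper for illustration; the default of "1" for an
    unset variable is an assumption, not taken from the PR.
    """
    env = os.environ if env is None else env
    return int(env.get("NVTE_ALLOW_NONDETERMINISTIC_ALGO", "1")) == 0


# Request deterministic algorithms for this process
# (the Python equivalent of exporting the variable in the shell).
os.environ["NVTE_ALLOW_NONDETERMINISTIC_ALGO"] = "0"
print(deterministic_requested())
```

Setting the variable before the framework is imported is the safer choice, since a library may read it only once at import or first use.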
Greptile Overview
Greptile Summary
Overview
This PR enables determinism for FusedAttention on Blackwell GPUs (SM 100) with cuDNN version 9.18.0 or higher. The implementation moves determinism checking logic from Python to the C++ backend selection layer.
Key Changes
Architecture
The change follows a layered approach:
The implementation correctly restricts deterministic FusedAttention to cases where cuDNN guarantees deterministic behavior, avoiding silent non-determinism.
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant User as User/Test
participant PyAPI as Python API
participant Utils as utils.py
participant CppExt as C++ Extensions
participant Backend as Backend Selection
participant cuDNN as cuDNN Library
User->>PyAPI: Call attention with deterministic=True
PyAPI->>Utils: get_attention_backend(params)
Utils->>Utils: Extract deterministic from params
Utils->>CppExt: get_fused_attn_backend(..., deterministic)
CppExt->>Backend: nvte_get_fused_attn_backend(..., deterministic)
alt Blackwell (sm_arch >= 100) & Training & Deterministic
Backend->>Backend: Check cuDNN version >= 9.18.0
Backend->>Backend: Check bias_type == NO_BIAS
Backend->>Backend: Check dropout == 0.0
alt All checks pass
Backend-->>CppExt: F16_arbitrary_seqlen backend
else Any check fails
Backend-->>CppExt: No_Backend (disabled)
end
else Other architectures or inference
Backend->>Backend: Apply standard backend selection
Backend-->>CppExt: Selected backend
end
CppExt-->>Utils: Backend choice
Utils-->>PyAPI: Backend configuration
alt Forward Pass
PyAPI->>CppExt: nvte_fused_attn_fwd(..., deterministic=true)
Note over PyAPI,CppExt: Forward always uses deterministic=true
else Backward Pass
PyAPI->>CppExt: nvte_fused_attn_bwd(..., deterministic)
Note over PyAPI,CppExt: Backward respects user's deterministic flag
end
CppExt->>cuDNN: Execute attention operation
cuDNN-->>CppExt: Results
CppExt-->>PyAPI: Output tensors
PyAPI-->>User: Attention output
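The deterministic-training branch of the diagram above can be sketched as a small decision function. This is an illustration of the checks as the diagram states them, not the C++ implementation in the PR: `AttnConfig`, `select_backend_blackwell_deterministic`, and the string constants are hypothetical names, and the version tuple layout is an assumption.

```python
from dataclasses import dataclass

# String stand-ins for the backend names shown in the diagram (hypothetical).
F16_ARBITRARY_SEQLEN = "F16_arbitrary_seqlen"
NO_BACKEND = "No_Backend"


@dataclass(frozen=True)
class AttnConfig:
    sm_arch: int            # e.g. 100 for Blackwell
    cudnn_version: tuple    # (major, minor, patch), assumed layout
    is_training: bool
    deterministic: bool
    bias_type: str = "NO_BIAS"
    dropout: float = 0.0


def select_backend_blackwell_deterministic(cfg):
    """Apply the Blackwell + training + deterministic checks from the diagram.

    Returns None when the config falls outside that branch, meaning the
    standard backend selection would apply instead.
    """
    in_branch = cfg.sm_arch >= 100 and cfg.is_training and cfg.deterministic
    if not in_branch:
        return None
    all_checks_pass = (
        cfg.cudnn_version >= (9, 18, 0)
        and cfg.bias_type == "NO_BIAS"
        and cfg.dropout == 0.0
    )
    return F16_ARBITRARY_SEQLEN if all_checks_pass else NO_BACKEND


cfg = AttnConfig(sm_arch=100, cudnn_version=(9, 18, 0), is_training=True, deterministic=True)
print(select_backend_blackwell_deterministic(cfg))
```

Returning a "no backend" value when a check fails, rather than falling back to a non-deterministic kernel, matches the stated goal of avoiding silent non-determinism.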
make .xml file specific to deterministic tests in qa/ Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
fix typo Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
fix indentation Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
/te-ci L0
/te-ci jax L0
/te-ci L1
fix and/or logic Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Cool, we are currently suffering from this issue.
Description
This PR enables determinism for FusedAttention on Blackwell for FP16/BF16 precisions and cuDNN >= 9.18.0.
To run with determinism, please set this flag:
export NVTE_ALLOW_NONDETERMINISTIC_ALGO=0
Type of change
Changes
Please see Description.
Checklist: