perf(aggregation): Prevent cuda sync in normalize #557
Merged
ValerianRey merged 2 commits into main on Feb 5, 2026
Conversation
Force-pushed from 7fb2c75 to fcd865a
Codecov Report: ✅ All modified and coverable lines are covered by tests.
PierreQuinton (Contributor) approved these changes on Feb 5, 2026 and left a comment:
Very nice. We should also keep this in mind, because I doubt this is the only place where we do something like that (the aggregators, maybe).
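As a possible way to spot other such places, here is a small usage sketch of PyTorch's synchronization debug mode (torch.cuda.set_sync_debug_mode); the tensor and the zero check are only illustrative, not code from this repository:

```python
import torch

# Ask PyTorch to warn whenever an operation implicitly synchronizes with the GPU.
# Use "error" instead of "warn" to raise an exception at the offending call site.
torch.cuda.set_sync_debug_mode("warn")

weights = torch.rand(64, device="cuda")

# A Python `if` on a CUDA scalar calls .item() under the hood, which copies the
# value to the host and therefore synchronizes; with the debug mode enabled,
# this line emits a warning pointing at the culprit.
if weights.sum() == 0.0:
    pass
```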
With the previous implementation, normalize causes a CUDA synchronization. It's not a big issue at all, because we need a CUDA synchronization right after, during the call to project_weights, which is always done on CPU. But I think this change is good for three reasons:

There might be a slight performance drop due to using torch.where, which is element-wise with a condition that is scalar (and thus broadcast). But since the gramian is never huge (especially when using UPGrad), this is really fine IMO. In my profiling, this torch.where takes 0.028 ms with a batch size of 64.

So this is extremely minor, but positive IMO.