Switch default copy strategy from source-stride-order to HPTT#119

Merged
shinaoka merged 2 commits into main from fix/use-hptt-default-copy
Feb 20, 2026
@shinaoka
Member

Summary

  • Replace copy_strided_src_order with strided_kernel::copy_into_col_major (HPTT) in prepare_input_owned, matching what prepare_input_view already uses
  • Remove dead code: copy_strided_src_order and copy_strided_src_order_par (~170 lines)
  • Add benchmark documentation
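
For readers unfamiliar with the two strategies, the following is a minimal sketch of the difference in loop order when copying a 2-D strided source into a dense column-major buffer. It is an illustration only and not the code in this PR: the function names `copy_dst_order`, `copy_src_order`, and `copy_blocked` are hypothetical, and the blocked variant only approximates the cache-tiling idea behind HPTT-style transposition; the actual `strided_kernel::copy_into_col_major` implementation may differ.

```rust
/// Copy a 2-D strided source into a dense column-major destination,
/// walking the destination in memory order (writes are sequential).
fn copy_dst_order(src: &[f64], shape: [usize; 2], src_strides: [usize; 2], dst: &mut [f64]) {
    let [rows, cols] = shape;
    for j in 0..cols {
        for i in 0..rows {
            dst[i + j * rows] = src[i * src_strides[0] + j * src_strides[1]];
        }
    }
}

/// Same copy, but the innermost loop follows the smaller source stride,
/// so reads are sequential and writes may be scattered.
fn copy_src_order(src: &[f64], shape: [usize; 2], src_strides: [usize; 2], dst: &mut [f64]) {
    let [rows, cols] = shape;
    if src_strides[0] <= src_strides[1] {
        // The source is already fastest along dimension 0: both orders coincide.
        copy_dst_order(src, shape, src_strides, dst);
    } else {
        for i in 0..rows {
            for j in 0..cols {
                dst[i + j * rows] = src[i * src_strides[0] + j * src_strides[1]];
            }
        }
    }
}

/// Blocked copy: process small tiles so that both the reads and the writes
/// of one tile stay cache-resident. `block` must be greater than zero.
fn copy_blocked(
    src: &[f64],
    shape: [usize; 2],
    src_strides: [usize; 2],
    dst: &mut [f64],
    block: usize,
) {
    let [rows, cols] = shape;
    for j0 in (0..cols).step_by(block) {
        for i0 in (0..rows).step_by(block) {
            for j in j0..(j0 + block).min(cols) {
                for i in i0..(i0 + block).min(rows) {
                    dst[i + j * rows] = src[i * src_strides[0] + j * src_strides[1]];
                }
            }
        }
    }
}
```

All three functions produce the same destination buffer; they differ only in memory access pattern, which is what the benchmarks in this PR measure.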

Benchmark evidence

Direct comparison with copy elision disabled (opt_flops, 1T, AMD EPYC 7713P):

| Instance | src-order (ms) | HPTT (ms) | Winner |
|---|---|---|---|
| lm_brackets_4_4d | 35 | 24 | HPTT 31% faster |
| lm_sentence_4_4d | 34 | 24 | HPTT 29% faster |
| str_matrix_chain_100 | 20 | 14 | HPTT 30% faster |
| mera_closed | 1739 | 1567 | HPTT 10% faster |
| mera_open | 1129 | 1142 | ~same |
| tn_focus | 400 | 568 | src 30% faster |
| tn_light | 401 | 560 | src 28% faster |

HPTT wins on 8/10 instances. On the two instances where src-order wins (tn_focus and tn_light), the Rust implementation still outperforms Julia's OMEinsum.jl once copy elision is enabled.
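
The percentages in the Winner column are consistent with a relative reduction in wall time measured against the slower strategy. A small sketch of that arithmetic follows; the helper name `time_reduction_pct` is hypothetical and is not part of the benchmark scripts.

```rust
/// Relative time saved by the faster strategy, as a percentage of the
/// slower strategy's wall time.
fn time_reduction_pct(slower_ms: f64, faster_ms: f64) -> f64 {
    (slower_ms - faster_ms) / slower_ms * 100.0
}
```

For lm_brackets_4_4d this gives (35 - 24) / 35, which rounds to 31%, and for tn_focus it gives (568 - 400) / 568, which rounds to 30% in favor of src-order.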

Test plan

  • All tests pass (cargo test and cargo test --features parallel)
  • No warnings
  • HPTT+OpenBLAS config outperforms OMEinsum.jl on all instances (1T and 4T)

Closes #118

🤖 Generated with Claude Code

shinaoka and others added 2 commits February 20, 2026 12:16
Replace copy_strided_src_order with strided_kernel::copy_into_col_major
(HPTT) in prepare_input_owned, matching what prepare_input_view already
uses. Remove the now-dead copy_strided_src_order and
copy_strided_src_order_par functions.

Benchmarks show HPTT outperforms source-stride-order on 8/10 instances
(16-43% faster). Even on the two instances where source-order was faster
(tn_focus/tn_light), copy elision compensates and Rust still outperforms
Julia's OMEinsum.jl.

Closes #118

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shinaoka merged commit 53fe0a3 into main on Feb 20, 2026
5 checks passed
@shinaoka deleted the fix/use-hptt-default-copy branch on February 20, 2026 03:21