docs: permutation optimization analysis + use strided-perm in finalize_into #113

Merged: shinaoka merged 1 commit into main from docs/permutation-optimization on Feb 19, 2026
@shinaoka (Member)

Summary

  • Add docs/permutation-optimization.md documenting the lazy vs eager permutation tradeoff
  • Change einsum2::finalize_into to use strided_perm::copy_into directly (HPTT-optimized writeback)

Documentation covers

  • Lazy vs eager permutation: strided-rs (metadata-only) vs OMEinsum.jl (always materialize)
  • HPTT-inspired strided-perm: bilateral fusion, 2D micro-kernel, macro-kernel blocking
  • Current strategy: lazy + fast HPTT copy when materialization is needed
  • Open questions: always-materialize heuristic, two-stage permutation for scattered sources
  • Benchmark results: strided-rs now competitive with OMEinsum.jl (faster at 4T)
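The lazy vs eager distinction above can be illustrated with a minimal sketch. The `View` type and its methods below are hypothetical and written only for this illustration; they are not the strided-rs API. A lazy permutation reorders shape and stride metadata and moves no data, while an eager permutation copies every element into a new contiguous buffer.

```rust
/// Hypothetical minimal strided view over a flat buffer (NOT the strided-rs API).
#[derive(Debug, Clone)]
pub struct View<'a> {
    pub data: &'a [f64],
    pub shape: Vec<usize>,
    pub strides: Vec<usize>,
}

impl<'a> View<'a> {
    /// Wrap a buffer as a row-major (last axis fastest) view.
    pub fn row_major(data: &'a [f64], shape: &[usize]) -> Self {
        assert_eq!(data.len(), shape.iter().product::<usize>());
        let mut strides = vec![1; shape.len()];
        for i in (0..shape.len().saturating_sub(1)).rev() {
            strides[i] = strides[i + 1] * shape[i + 1];
        }
        View { data, shape: shape.to_vec(), strides }
    }

    /// Lazy permutation: reorder shape and strides only; no element is moved.
    pub fn permute(&self, perm: &[usize]) -> View<'a> {
        View {
            data: self.data,
            shape: perm.iter().map(|&p| self.shape[p]).collect(),
            strides: perm.iter().map(|&p| self.strides[p]).collect(),
        }
    }

    /// Eager permutation: copy the elements into a fresh row-major buffer.
    pub fn materialize(&self) -> Vec<f64> {
        let total: usize = self.shape.iter().product();
        let mut out = Vec::with_capacity(total);
        let mut idx = vec![0usize; self.shape.len()];
        for _ in 0..total {
            let offset: usize = idx.iter().zip(&self.strides).map(|(i, s)| i * s).sum();
            out.push(self.data[offset]);
            // Advance the multi-index in row-major order (last axis fastest).
            for axis in (0..idx.len()).rev() {
                idx[axis] += 1;
                if idx[axis] < self.shape[axis] {
                    break;
                }
                idx[axis] = 0;
            }
        }
        out
    }
}
```

In this sketch `permute` costs time proportional to the number of axes, whereas `materialize` touches every element with a potentially cache-unfriendly access pattern. That copy is the step a faster permutation kernel would target when materialization cannot be avoided.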

Code change

One-line change in strided-einsum2/src/contiguous.rs:

// Before: strided_kernel::copy_into (falls back to non-HPTT map_into)
// After:  strided_perm::copy_into (HPTT-optimized)
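To give a sense of why a blocked copy can beat a naive element-by-element map, here is a simplified sketch of a tiled 2D transpose. It is not the `strided_perm::copy_into` implementation: the function name `blocked_transpose` and the constant `TILE` are hypothetical, and an HPTT-style kernel would additionally use SIMD micro-kernels and tuned block sizes.

```rust
/// Hypothetical tile edge length; a real kernel would tune this to cache and SIMD width.
const TILE: usize = 8;

/// Copy a `rows x cols` row-major matrix `src` into `dst` as its
/// `cols x rows` row-major transpose, one tile at a time.
pub fn blocked_transpose(src: &[f64], dst: &mut [f64], rows: usize, cols: usize) {
    assert_eq!(src.len(), rows * cols);
    assert_eq!(dst.len(), rows * cols);
    // Outer loops walk over tiles (the "macro" level).
    for i0 in (0..rows).step_by(TILE) {
        for j0 in (0..cols).step_by(TILE) {
            let i1 = (i0 + TILE).min(rows);
            let j1 = (j0 + TILE).min(cols);
            // Inner loops transpose a single tile (the "micro" level).
            for i in i0..i1 {
                for j in j0..j1 {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
}
```

Working tile by tile keeps both the source rows and the destination rows of the current tile in cache, so neither side of the copy is accessed with a large stride for long.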

Test plan

  • cargo test -p strided-einsum2 (84 tests pass)
  • cargo test -p strided-opteinsum (all pass)
  • Benchmark improvement verified (tensornetwork_permutation_light_415: 277 ms -> 208 ms at 1T, 206 ms -> 142 ms at 4T)

🤖 Generated with Claude Code

…inalize_into

Add docs/permutation-optimization.md summarizing the lazy vs eager permutation
tradeoff, the HPTT-inspired strided-perm implementation, and open questions
(always-materialize heuristic, two-stage permutation).

Also includes the one-line change in einsum2 finalize_into to call
strided_perm::copy_into directly, ensuring HPTT optimization is used for
the GEMM writeback path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shinaoka shinaoka merged commit 3d75a9b into main Feb 19, 2026
5 checks passed
@shinaoka shinaoka deleted the docs/permutation-optimization branch February 19, 2026 02:50