refactor(kernels): deepen primitive standalone layer #1

Merged
LessUp merged 1 commit into master from refactor/primitive-layer-deepening
May 22, 2026

Conversation

@LessUp (Collaborator) commented May 22, 2026

Summary

  • centralize standalone primitive wrapper validation and kernel launch helpers
  • unify FP32/FP16 conversion with TypeAdapter across tile I/O and typed kernels
  • add regression coverage for online_softmax cross-warp correctness and matmul 64x64x128 launch
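
For readers unfamiliar with what "cross-warp correctness" has to guarantee for online_softmax, the following is a minimal host-side sketch, not the kernel from this PR. It assumes each warp produces a partial (running max, running sum) pair over its slice and that the partials are then merged. All names here (`SoftmaxState`, `merge`, `reduce_chunk`, `softmax_chunked`, `softmax_reference`) are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical partial state: the maximum seen so far and sum of exp(x - max).
struct SoftmaxState {
    float max;
    float sum;
};

// State representing "no elements seen yet".
inline SoftmaxState empty_state() {
    SoftmaxState s;
    s.max = -std::numeric_limits<float>::infinity();
    s.sum = 0.0f;
    return s;
}

// Merge two partial states. Each sum is rescaled to the shared maximum first;
// this is the step a cross-warp reduction has to get right.
inline SoftmaxState merge(SoftmaxState a, SoftmaxState b) {
    const float m = std::max(a.max, b.max);
    if (std::isinf(m) && m < 0.0f) return empty_state();  // both inputs empty
    SoftmaxState out;
    out.max = m;
    out.sum = a.sum * std::exp(a.max - m) + b.sum * std::exp(b.max - m);
    return out;
}

// Single-pass reduction over one slice (stands in for one warp's work).
inline SoftmaxState reduce_chunk(const float* data, std::size_t n) {
    SoftmaxState s = empty_state();
    for (std::size_t i = 0; i < n; ++i) {
        SoftmaxState one;
        one.max = data[i];
        one.sum = 1.0f;  // exp(x - x) == 1
        s = merge(s, one);
    }
    return s;
}

// Softmax computed by reducing fixed-size slices and merging their states.
inline std::vector<float> softmax_chunked(const std::vector<float>& x,
                                          std::size_t chunk) {
    SoftmaxState total = empty_state();
    for (std::size_t start = 0; start < x.size(); start += chunk) {
        const std::size_t n = std::min(chunk, x.size() - start);
        total = merge(total, reduce_chunk(x.data() + start, n));
    }
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i] - total.max) / total.sum;
    }
    return out;
}

// Conventional two-pass softmax used as the comparison baseline.
inline std::vector<float> softmax_reference(const std::vector<float>& x) {
    std::vector<float> out(x.size());
    if (x.empty()) return out;
    const float m = *std::max_element(x.begin(), x.end());
    float sum = 0.0f;
    for (float v : x) sum += std::exp(v - m);
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = std::exp(x[i] - m) / sum;
    return out;
}
```

A regression check in this spirit would compare `softmax_chunked(x, 32)` against `softmax_reference(x)` for an input spanning several 32-element slices and require agreement within a small floating-point tolerance.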

Test Plan

  • git diff --check
  • cmake --preset release (blocked locally: nvcc/CUDA toolkit not installed on this machine)
  • ctest --preset release (blocked locally: configure cannot complete without nvcc)

- centralize standalone primitive wrapper validation and launch helpers
- unify FP32/FP16 conversion through TypeAdapter
- add online_softmax and matmul regression coverage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@LessUp LessUp merged commit cc298de into master May 22, 2026
@LessUp LessUp deleted the refactor/primitive-layer-deepening branch May 22, 2026 03:12
LessUp added a commit that referenced this pull request May 22, 2026