Do you have any benchmarks showing where the extra overhead of mx.matmul comes from, relative to a regular matmul? Is it in the quantization step (calculating scales, rounding, etc.)? If so, do you know whether devices with native MX support will do this rounding in hardware, and whether the overhead would become negligible there because of the hardware support?
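For reference, here is a minimal sketch of the kind of benchmark being asked about. It simulates an MX-style quantization step in software (per-block power-of-two scale plus mantissa rounding, done with NumPy; the block size, mantissa width, and `mx_quantize` helper are hypothetical, not the library's actual implementation) and times a plain matmul against quantize-then-matmul, so the software cost of the scale/round step shows up as measurable overhead:

```python
import time
import numpy as np

def mx_quantize(x, block=32, mant_bits=3):
    """Simulated MX-style quantization: shared power-of-two scale per
    block of `block` elements, values rounded to `mant_bits` of mantissa.
    Hypothetical helper for timing purposes only."""
    flat = x.reshape(-1, block)
    amax = np.abs(flat).max(axis=1, keepdims=True)
    # Power-of-two shared scale per block (avoid log2(0) on all-zero blocks).
    scale = 2.0 ** np.floor(np.log2(np.maximum(amax, 1e-30)))
    step = 2.0 ** -mant_bits
    q = np.round(flat / scale / step) * step * scale
    return q.reshape(x.shape).astype(x.dtype)

def bench(fn, *args, iters=10):
    fn(*args)  # warm-up
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) / iters

rng = np.random.default_rng(0)
a = rng.standard_normal((1024, 1024)).astype(np.float32)
b = rng.standard_normal((1024, 1024)).astype(np.float32)

t_plain = bench(np.matmul, a, b)
t_mx = bench(lambda x, y: np.matmul(mx_quantize(x), mx_quantize(y)), a, b)
print(f"plain matmul: {t_plain * 1e3:.2f} ms, "
      f"quantize + matmul: {t_mx * 1e3:.2f} ms")
```

On hardware without native MX support, the quantize step runs as ordinary arithmetic like this, so its cost scales with tensor size; the open question above is whether hardware that performs the scaling/rounding inline makes that extra term negligible.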