Triton dequantization/config framework #336
Draft
blepping wants to merge 22 commits into city96:main from
Continuation of #331. Sorry for making a new pull request; I couldn't reopen the existing one since it had been force-pushed.
Wrapping the existing dequant functions with `torch.compile` in my benchmark tool brought the results to parity with Triton: https://gist.github.com/blepping/963459b244a4140b081cebdec24c56b2

However, I was not able to replicate that in the real world, so the Triton kernels are back!
This includes a generic approach to pass configuration parameters around, including overriding the dequant functions (potentially with compiled or Triton versions). There is an `optimize` parameter in the advanced loader that lets you choose between `none`, `compile` and `triton`. For me, enabling Triton is noticeably faster, while using `torch.compile` is actually slower than no optimization at all.

This also fixes an issue with the existing approach to setting configuration like `dequant_dtype`:

This is not kosher because while `ops` is an instance of `GGMLOps`, attributes like `ops.Linear` are not instance-specific, so setting it here will change that attribute in the global class. This means that if there are multiple loaders with different settings, every loader ends up with the configuration of whichever loader overwrote that value last.