
Triton dequantization/config framework#336

Draft
blepping wants to merge 22 commits into city96:main from blepping:feat_optimized_dequant

Conversation

@blepping
Contributor

Continuation of #331. Sorry for making a new pull; I couldn't reopen the existing one since it had been force-pushed.

Wrapping the existing dequant functions with torch.compile in my benchmark tool brought the results to parity with Triton: https://gist.github.com/blepping/963459b244a4140b081cebdec24c56b2

However, I was not able to replicate that in the real world so the Triton kernels are back!

This includes a generic approach to passing configuration parameters around, including overriding the dequant functions (potentially with compiled or Triton versions). There is an optimize parameter in the advanced loader that lets you choose between none, compile and triton. For me, enabling Triton is noticeably faster, while torch.compile is actually slower than no optimization at all.
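To illustrate the idea (this is a hedged sketch, not the PR's actual API — the function names and the `optimize` values are stand-ins, and the real code would return `torch.compile`d or Triton-backed callables), selecting a dequant backend from an `optimize` parameter could look like:

```python
# Stand-in for the plain eager dequant function; the real one operates
# on quantized GGML tensor blocks.
def dequant_eager(blocks):
    return [b * 2 for b in blocks]


def make_dequant(optimize="none"):
    """Pick a dequant implementation based on the optimize setting.

    Hypothetical dispatcher: 'compile' and 'triton' here just return the
    eager function; the actual PR would substitute torch.compile output
    or a Triton kernel wrapper respectively.
    """
    if optimize == "none":
        return dequant_eager
    if optimize == "compile":
        # Real code: return torch.compile(dequant_eager)
        return dequant_eager
    if optimize == "triton":
        # Real code: return a Triton kernel wrapper
        return dequant_eager
    raise ValueError(f"unknown optimize mode: {optimize!r}")
```

The point of routing everything through one factory is that per-loader configuration stays local to each loader instead of being written onto shared state.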

This also fixes an issue with the existing approach to setting configuration like dequant_dtype:

        ops = GGMLOps()

        if dequant_dtype in ("default", None):
            ops.Linear.dequant_dtype = None
        elif dequant_dtype in ["target"]:
            ops.Linear.dequant_dtype = dequant_dtype
        else:
            ops.Linear.dequant_dtype = getattr(torch, dequant_dtype)

This is not kosher: while ops is an instance of GGMLOps, attributes like ops.Linear are classes rather than instance-specific values, so assigning here mutates the attribute on the shared class itself. If there are multiple loaders with different settings, every loader ends up with the configuration of whichever one wrote the value last.
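The pitfall can be demonstrated with stand-in classes (these are minimal stubs, not the real GGMLOps), along with one way to make the setting instance-specific by giving each instance its own subclass:

```python
class GGMLOps:
    # Linear is a class attribute, shared by every GGMLOps instance.
    class Linear:
        dequant_dtype = None


ops_a = GGMLOps()
ops_b = GGMLOps()
ops_a.Linear.dequant_dtype = "float16"  # intended to affect only ops_a...
# ...but ops_b.Linear is the same class object, so it sees "float16" too.


class SafeGGMLOps:
    class Linear:
        dequant_dtype = None

    def __init__(self):
        # Create a fresh per-instance subclass, so attribute writes land
        # on this instance's class and never leak to other instances.
        self.Linear = type("Linear", (SafeGGMLOps.Linear,), {})


safe_a = SafeGGMLOps()
safe_b = SafeGGMLOps()
safe_a.Linear.dequant_dtype = "float16"
# safe_b.Linear.dequant_dtype is still None.
```

The subclass-per-instance approach is just one possible fix; the key observation is that writes must target something owned by the instance rather than the class shared across all loaders.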


5 participants