
Sadly new nodes still recompile constantly. #19

@Ph0rk0z


On Turing, I have to set orig_dtype to float16 so it doesn't complain about the bfloat-to-half conversion. I guess it doesn't get set in forward. That's easily fixable, though. Unfortunately, every run of the node triggers a recompile, so it's about 90s each time. If I run it without compile, it's not as fast as bob's node with compilation.
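A minimal sketch of the Turing workaround, assuming the dtype is picked from the CUDA compute capability. The helper name `pick_orig_dtype` is hypothetical and not part of the node; the only fact it relies on is that bfloat16 is natively supported from compute capability 8.0 (Ampere) onward, while Turing is 7.5.

```python
def pick_orig_dtype(capability: tuple[int, int]) -> str:
    """Return the dtype name to use for orig_dtype on a CUDA device.

    Hypothetical helper: falls back to float16 on pre-Ampere cards
    (compute capability below 8.0), which lack native bfloat16.
    """
    major, _minor = capability
    return "bfloat16" if major >= 8 else "float16"


# With PyTorch installed, the capability would come from
# torch.cuda.get_device_capability(), and the result could be
# turned into a dtype with getattr(torch, pick_orig_dtype(cap)).
print(pick_orig_dtype((7, 5)))  # Turing -> float16
```

Setting the value once, outside forward, also keeps the dtype constant across runs.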

I already went in and exposed only one Triton autotune config, so it's not related to that. Something is breaking the graphs. There is some partial offload because TE+Chroma doesn't fully fit in 22 GB of VRAM, but bob's node can handle it. The difference is between 8.x seconds on new prompts and 10.x seconds with the same workflow.
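One way to narrow down what breaks the graphs is to turn on PyTorch's `recompiles` and `graph_breaks` log artifacts through the `TORCH_LOGS` environment variable. The sketch below is only an illustration: the helper name `enable_compile_diagnostics` is hypothetical, and it assumes the variable is set before `torch` is imported (to my knowledge that is when it gets read).

```python
import os


def enable_compile_diagnostics(env=os.environ) -> str:
    """Add 'recompiles' and 'graph_breaks' to TORCH_LOGS, keeping existing entries.

    Hypothetical helper; call it before importing torch.
    """
    entries = [e for e in env.get("TORCH_LOGS", "").split(",") if e]
    for name in ("recompiles", "graph_breaks"):
        if name not in entries:
            entries.append(name)
    env["TORCH_LOGS"] = ",".join(entries)
    return env["TORCH_LOGS"]


enable_compile_diagnostics()
# import torch  # import only after the variable is set
```

The same effect can be had by exporting `TORCH_LOGS=recompiles,graph_breaks` in the shell before launching. The recompile log should then name the guard that failed on each run, which would show whether the partial offload or a changing dtype is the trigger.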

Output from the TE is now improved, though, and tensorwise models run without NaN on quantops. Dynamic VRAM is disabled both globally and in the nodes.
