Wonder why it can't be directly used with llama.cpp? Also, it would be great to have multi-GPU support.