Right now we have Q4_K GGUF, but FP4 is the format the model actually ships in, and it is numerically different from Q4_K, which is uniform quantization. Given that llama.cpp can (I believe) run inference of GPT120B OSS fast enough, it should be possible to support the FP4 weights, at least as an optional format, in order to really run inference of the Real Thing that DeepSeek shipped. I doubt the differences are large, since even the 2-bit quants work well, but... still. Not a priority, but worth remembering to investigate.
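For reference, a minimal sketch of what decoding FP4 could look like, assuming "FP4" here means the E2M1 encoding from the MXFP4 microscaling format (blocks of 32 values sharing an 8-bit power-of-two scale). The block layout and function name are illustrative, not an existing llama.cpp API:

```c
#include <stdint.h>
#include <math.h>

/* E2M1 lookup table: sign(1) exp(2) mantissa(1). The 8 non-negative
 * magnitudes are fixed by the format; the high bit mirrors the sign. */
static const float fp4_e2m1[16] = {
     0.0f,  0.5f,  1.0f,  1.5f,  2.0f,  3.0f,  4.0f,  6.0f,
    -0.0f, -0.5f, -1.0f, -1.5f, -2.0f, -3.0f, -4.0f, -6.0f
};

/* Decode one MXFP4-style block: 32 FP4 nibbles sharing one E8M0
 * scale (an 8-bit exponent with bias 127, i.e. a power of two). */
void decode_fp4_block(const uint8_t nibbles[16], uint8_t e8m0_scale,
                      float out[32]) {
    float scale = ldexpf(1.0f, (int)e8m0_scale - 127); /* 2^(e-127) */
    for (int i = 0; i < 16; i++) {
        out[2*i + 0] = fp4_e2m1[nibbles[i] & 0x0F] * scale;
        out[2*i + 1] = fp4_e2m1[nibbles[i] >> 4]   * scale;
    }
}
```

The E2M1 code points {0, 0.5, 1, 1.5, 2, 3, 4, 6} are non-uniformly spaced (denser near zero), which is exactly why this is numerically different from Q4_K's linear scale-and-min mapping of the 4-bit codes.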