Thanks for sharing your work!
I noticed that there are a few MoE models under /examples, and that each config.json specifies the expert placement for different expert-parallelism degrees. However, I'm not sure how to use these configs to run inference. I presume HF transformers can be used to load the model config and ModelForCausalLM, but how should I load the model weights from the .safetensors files, given that Occult's MoE layers are custom, self-modified nn.Module subclasses? Could you provide inference code for the models under /examples, with data parallelism and expert parallelism supported?
Looking forward to your reply.