flash-moe is an external mesh-llm plugin for connecting mesh-llm to a
Flash-MoE OpenAI-compatible HTTP server.
The plugin is intentionally decoupled from mesh-llm internals. mesh-llm only
launches the plugin through the plugin API, passes [[plugin]].url as
MESH_LLM_PLUGIN_URL, and consumes the plugin-declared inference endpoint.
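As a rough illustration of that contract, the sketch below shows how a plugin process might pick up MESH_LLM_PLUGIN_URL from its environment. It is written in Python purely for illustration. The helper name `backend_url_from_env` and its validation rules are assumptions, not the plugin's actual implementation.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

ENV_VAR = "MESH_LLM_PLUGIN_URL"  # variable name taken from the text above


def backend_url_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the configured backend URL, or None when no URL was passed.

    Hypothetical helper: the validation below is an illustrative assumption.
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        # No [[plugin]].url was configured for this plugin.
        return None
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{ENV_VAR} is not an HTTP(S) URL: {raw!r}")
    # Drop a trailing slash so paths can be appended consistently.
    return raw.rstrip("/")
```

Returning None when the variable is unset lets the caller distinguish between attaching to an existing server and some other mode of operation.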
Install the plugin:
mesh-llm plugins install flash-moe
You can also install directly from GitHub:
mesh-llm plugins install Mesh-LLM/flash-moe
If Flash-MoE is already running, attach the endpoint:
[[plugin]]
name = "flash-moe"
url = "http://127.0.0.1:8000/v1"
To let the plugin supervise a local Flash-MoE infer process, pass the backend
command and backend args to the plugin:
[[plugin]]
name = "flash-moe"
args = [
  "--backend-command", "/opt/flash-moe/metal_infer/infer",
  "--",
  "--model", "/models/qwen3.5-397b/model.gguf",
  "--weights", "/models/qwen3.5-397b/experts.bin",
  "--manifest", "/models/qwen3.5-397b/manifest.json",
  "--vocab", "/models/qwen3.5-397b/vocab.json"
]
The plugin allocates a local port and appends:
--serve <port>
Do not pass --serve in backend args.
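The port handling described above can be sketched as follows. This is a minimal Python illustration, not the plugin's own code. The function names `allocate_local_port` and `build_backend_argv` are hypothetical, and binding to port 0 is one common way to obtain a free port, assumed here rather than confirmed by the source.

```python
import socket
from typing import List, Sequence


def allocate_local_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def build_backend_argv(command: str, backend_args: Sequence[str], port: int) -> List[str]:
    """Compose the backend command line, appending `--serve <port>`.

    Rejects a user-supplied --serve, since the port is chosen by the supervisor.
    """
    if "--serve" in backend_args:
        raise ValueError("do not pass --serve in backend args; it is appended automatically")
    return [command, *backend_args, "--serve", str(port)]
```

Note that a port obtained this way is released before the backend starts, so another process could claim it in between. A real supervisor would need to handle that startup failure.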
Build the plugin with Cargo:
cargo build
GitHub releases package archives using the same contract as
Mesh-LLM/blackboard:
flash-moe-<target>.tar.gz or .zip
flash-moe-<version>-<target>.tar.gz or .zip
Each archive contains:
flash-moe/flash-moe
flash-moe/plugin.toml
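To make the packaging contract concrete, here is a small Python sketch that composes an archive name and checks a .tar.gz for the two expected entries. The helpers `archive_name` and `missing_members` are hypothetical. The exact format of `<target>` and `<version>`, and which targets ship as .zip, are not specified here, so they are left to the caller.

```python
import tarfile
from typing import BinaryIO, Optional, Set

# Entries every archive is expected to contain, per the list above.
REQUIRED_MEMBERS = {"flash-moe/flash-moe", "flash-moe/plugin.toml"}


def archive_name(target: str, version: Optional[str] = None, ext: str = "tar.gz") -> str:
    """Build flash-moe-<target>.<ext> or flash-moe-<version>-<target>.<ext>."""
    if ext not in ("tar.gz", "zip"):
        raise ValueError(f"unsupported archive extension: {ext!r}")
    middle = f"{version}-{target}" if version else target
    return f"flash-moe-{middle}.{ext}"


def missing_members(fileobj: BinaryIO) -> Set[str]:
    """Return the required entries absent from a gzip-compressed tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        # Normalize a leading "./" that some tar tools add to member names.
        names = {n[2:] if n.startswith("./") else n for n in tar.getnames()}
    return REQUIRED_MEMBERS - names
```

An empty set from `missing_members` means the archive has both required entries. The sketch covers only .tar.gz, and a .zip variant would need the `zipfile` module instead.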