Mesh-LLM/flash-moe

flash-moe

flash-moe is an external mesh-llm plugin that connects mesh-llm to a Flash-MoE server exposing an OpenAI-compatible HTTP API.

The plugin is intentionally decoupled from mesh-llm internals. mesh-llm only launches the plugin through the plugin API, passes the [[plugin]].url value to it as MESH_LLM_PLUGIN_URL, and consumes the inference endpoint the plugin declares.
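A minimal sketch of how a plugin could pick up that value, assuming MESH_LLM_PLUGIN_URL arrives as a process environment variable. The helper names (normalize_endpoint, configured_endpoint, chat_completions_url) are hypothetical and not taken from the flash-moe source; the only external fact relied on is that OpenAI-compatible servers serve chat completions under <base>/chat/completions.

```rust
use std::env;

/// Name under which mesh-llm passes the [[plugin]].url value.
/// Assumption: it is delivered as a process environment variable.
const PLUGIN_URL_VAR: &str = "MESH_LLM_PLUGIN_URL";

/// Trims whitespace and any trailing slashes; treats an empty value as "not set".
fn normalize_endpoint(raw: Option<String>) -> Option<String> {
    raw.map(|url| url.trim().trim_end_matches('/').to_string())
        .filter(|url| !url.is_empty())
}

/// Hypothetical helper: the endpoint configured by mesh-llm, if any.
fn configured_endpoint() -> Option<String> {
    normalize_endpoint(env::var(PLUGIN_URL_VAR).ok())
}

/// Joins the OpenAI-compatible chat completions path onto a base URL such as
/// "http://127.0.0.1:8000/v1".
fn chat_completions_url(base: &str) -> String {
    format!("{}/chat/completions", base.trim_end_matches('/'))
}
```

Keeping the normalization in a pure function (rather than reading the environment inside it) makes it easy to test without mutating process state.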

Install

mesh-llm plugins install flash-moe

You can also install directly from GitHub:

mesh-llm plugins install Mesh-LLM/flash-moe

Existing Endpoint Mode

If a Flash-MoE server is already running, point the plugin at its endpoint with url:

[[plugin]]
name = "flash-moe"
url = "http://127.0.0.1:8000/v1"

Managed Process Mode

To have the plugin launch and supervise a local Flash-MoE infer process, pass --backend-command with the backend executable, then --, then the arguments to forward to the backend:

[[plugin]]
name = "flash-moe"
args = [
  "--backend-command", "/opt/flash-moe/metal_infer/infer",
  "--",
  "--model", "/models/qwen3.5-397b/model.gguf",
  "--weights", "/models/qwen3.5-397b/experts.bin",
  "--manifest", "/models/qwen3.5-397b/manifest.json",
  "--vocab", "/models/qwen3.5-397b/vocab.json"
]
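The args list above mixes plugin options and backend arguments, separated by a bare --. A minimal sketch of that split, assuming --backend-command is the only plugin option and that everything after the first -- is forwarded unchanged; BackendSpec and parse_backend_spec are hypothetical names, and rejecting unknown plugin options is a choice made for this sketch, not documented behavior.

```rust
/// Hypothetical parsed form of the plugin's args in managed process mode.
#[derive(Debug, PartialEq)]
struct BackendSpec {
    /// Path to the backend executable, from --backend-command.
    command: String,
    /// Arguments forwarded verbatim to the backend (everything after --).
    args: Vec<String>,
}

fn parse_backend_spec(plugin_args: &[String]) -> Result<BackendSpec, String> {
    // Split at the first bare "--": plugin options before it, backend args after.
    let (plugin_part, backend_part) = match plugin_args.iter().position(|a| a == "--") {
        Some(i) => (&plugin_args[..i], &plugin_args[i + 1..]),
        None => (plugin_args, &plugin_args[plugin_args.len()..]),
    };

    let mut command = None;
    let mut iter = plugin_part.iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "--backend-command" => {
                let value = iter.next().ok_or("--backend-command needs a value")?;
                command = Some(value.clone());
            }
            // Assumption: this sketch treats unknown plugin options as errors.
            other => return Err(format!("unknown plugin argument: {other}")),
        }
    }

    let command = command.ok_or("missing --backend-command")?;
    Ok(BackendSpec { command, args: backend_part.to_vec() })
}
```

Splitting on the first -- means the backend can still receive its own -- tokens later in its argument list.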

The plugin allocates a local port and appends the following to the backend args:

--serve <port>

Do not pass --serve in the backend args yourself.
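A minimal sketch of the port allocation and argument appending described above, using the common technique of binding 127.0.0.1:0 so the OS picks a free port. This is an illustration, not the plugin's actual code; allocate_local_port and backend_args_with_serve are hypothetical names, and also rejecting a --serve=<port> spelling is a defensive assumption.

```rust
use std::net::TcpListener;

/// Asks the OS for a free TCP port on the loopback interface by binding port 0.
/// The listener is dropped on return, so the port is only probably free.
fn allocate_local_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    Ok(listener.local_addr()?.port())
}

/// Returns the user's backend args followed by "--serve <port>".
/// Fails if the backend args already contain --serve.
fn backend_args_with_serve(backend_args: &[String], port: u16) -> Result<Vec<String>, String> {
    // Defensive assumption: also reject the "--serve=<port>" form.
    if backend_args.iter().any(|a| a == "--serve" || a.starts_with("--serve=")) {
        return Err("do not pass --serve in backend args; the plugin appends it".into());
    }
    let mut args = backend_args.to_vec();
    args.push("--serve".to_string());
    args.push(port.to_string());
    Ok(args)
}
```

Because the probe listener is closed before the backend starts, another process could grab the port in between; a supervisor may want to retry on a bind failure.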

Build

cargo build

Release Archives

GitHub releases publish archives using the same naming and layout contract as Mesh-LLM/blackboard:

  • flash-moe-<target>.tar.gz or .zip
  • flash-moe-<version>-<target>.tar.gz or .zip

Each archive contains:

  • flash-moe/flash-moe
  • flash-moe/plugin.toml
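The naming and layout contract above can be expressed as a small helper, for example in a release or install check. A minimal sketch: archive_names and missing_entries are hypothetical, the version string format is whatever the caller supplies, and which targets use .tar.gz versus .zip is not specified here, so the extension is a parameter.

```rust
/// Paths every release archive is expected to contain.
const ARCHIVE_ENTRIES: [&str; 2] = ["flash-moe/flash-moe", "flash-moe/plugin.toml"];

/// Both archive names for one target: unversioned and versioned.
/// `ext` is "tar.gz" or "zip"; the caller decides which applies.
fn archive_names(version: &str, target: &str, ext: &str) -> [String; 2] {
    [
        format!("flash-moe-{target}.{ext}"),
        format!("flash-moe-{version}-{target}.{ext}"),
    ]
}

/// Lists the required entries that are absent from an archive's file list.
fn missing_entries(entries: &[&str]) -> Vec<&'static str> {
    ARCHIVE_ENTRIES
        .iter()
        .copied()
        .filter(|e| !entries.contains(e))
        .collect()
}
```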

About

Flash-MoE inference endpoint plugin for mesh-llm