> *The spirit of jazz is the spirit of openness.*
>
> — Herbie Hancock, on software licensing

> *I’ll play it first and tell you what it is later.*
>
> — Miles Davis, on vibe-coding
This repository collects a complete "open AI stack": everything you need to run a smart language model and the interfaces that help it complete useful tasks, all deployed on Modal.

The language model is Z.ai's GLM-5.
It is run using:

- NVIDIA B200 GPUs
- The Modal cloud deployment platform (project sponsor)
- The SGLang inference server
- An OpenAI-compatible API interface (based on `/chat/completions`; a sample call is sketched just below)
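
For orientation, here is a minimal sketch of what a call against that API looks like from Python. The base URL, API key, and model ID are placeholders, not values from this repo; substitute the ones from your own deployment.

```python
from openai import OpenAI

# Placeholder values: substitute the URL and key from your own deployment.
client = OpenAI(
    base_url="https://your-workspace--sglang-serve.modal.run/v1",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="your-model-id",  # use whatever the server reports at /v1/models
    messages=[{"role": "user", "content": "Say hello from the open stack."}],
)
print(response.choices[0].message.content)
```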
To speed up model weight downloads, you'll need a Hugging Face access token stored as a Modal Secret.
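
As a sketch of how that token gets used, here is a Modal function that attaches the Secret so Hugging Face downloads authenticate. The secret name `huggingface-secret`, its `HF_TOKEN` variable, and the model repo ID are assumptions for illustration, not this repo's actual choices.

```python
import modal

app = modal.App("weight-download-example")
image = modal.Image.debian_slim().pip_install("huggingface_hub")

# "huggingface-secret" is an assumed name; create the Secret in your
# Modal workspace with an HF_TOKEN entry before running this.
@app.function(image=image, secrets=[modal.Secret.from_name("huggingface-secret")])
def download_weights():
    from huggingface_hub import snapshot_download

    # Modal injects the Secret's contents as environment variables, and
    # huggingface_hub reads HF_TOKEN automatically for authentication.
    snapshot_download("your-org/your-model")  # model repo ID is a placeholder
```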
For a single user, this achieves more than 60 tok/s of output.
You can also use a free multitenant endpoint hosted on Modal. It is free until April 30, 2026, and each user is limited to one concurrent request. See the instructions there for the API URL and authentication details.
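
One practical consequence of that concurrency cap: fanning requests out with threads or `asyncio.gather` will get rejected, so issue them sequentially. A minimal sketch, again with placeholder URL, key, and model ID:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://the-free-endpoint.modal.run/v1",  # placeholder
    api_key="your-api-key",  # placeholder
)

prompts = ["What is Modal?", "What is SGLang?"]
for prompt in prompts:  # one request at a time, per the endpoint's limit
    response = client.chat.completions.create(
        model="your-model-id",  # placeholder
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
```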
OpenCode is a terminal user interface for connecting human users, language models, and computer terminals, akin to Anthropic's Claude Code but with broader LLM API support.
We provide instructions for integrating the self-hosted LLM with OpenCode and for deploying OpenCode servers on Modal here.
OpenClaw is an agentic assistant system designed for maximum integrability.
We provide instructions for integrating the self-hosted LLM with OpenClaw here.
The Vercel AI SDK offers Core and UI sub-SDKs for integrating JavaScript applications with LLMs.

We demonstrate two simple integrations of this stack with the self-hosted LLM: a "hello world"-level Node.js CLI here and a proper Next.js app here.

The Next.js app is deployed here.
We like Simon Willison's llm CLI tool for running quick LLM queries from the terminal. It integrates with OpenAI-compatible API providers, like our self-hosted LLM, through the same interface as OpenAI's models. Docs are here.
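
Registering an OpenAI-compatible endpoint with llm goes through its `extra-openai-models.yaml` config file. A sketch with placeholder names and URLs (see the llm docs for the config directory on your OS):

```yaml
# extra-openai-models.yaml, in llm's config directory
# (locate it with: dirname "$(llm logs path)")
- model_id: glm-local                 # the alias you'll pass to `llm -m`
  model_name: your-served-model-name  # the model ID your server expects
  api_base: "https://your-endpoint.modal.run/v1"
  api_key_name: modal                 # key stored via `llm keys set modal`
```

With that in place, `llm -m glm-local "your prompt"` should route queries to the self-hosted endpoint.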
We demonstrate a small plugin in llm_show_reasoning that prints the LLM's reasoning output, which OpenAI's reasoning models withhold but open models expose. Streaming the reasoning as it arrives reduces apparent latency.
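
For a sense of what the plugin works with: when streaming from SGLang, reasoning typically arrives in a `reasoning_content` field on each delta (vLLM uses the same field for some models; the exact field name varies by server, so treat this as an assumption). A minimal sketch with the plain OpenAI client and placeholder connection details:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.modal.run/v1",  # placeholder
    api_key="your-api-key",  # placeholder
)

stream = client.chat.completions.create(
    model="your-model-id",  # placeholder
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    # Reasoning tokens arrive before the final answer, so printing them as
    # they stream is what makes the response feel faster.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)
```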