A composable Python DSL for building GBNF grammars compatible with llama.cpp.

- Define context-free grammars using expressive Python functions
- Compile them into valid GBNF strings for constrained LLM generation
- Real-time rule matching during inference
```shell
pip install pygbnf        # core DSL only
pip install pygbnf[llm]   # + openai (for GrammarLLM)
pip install pygbnf[all]   # everything
```

For grammar visualization (DOT / SVG export), install Graphviz:

```shell
brew install graphviz     # macOS
apt install graphviz      # Debian / Ubuntu
```

Start `llama-server` with your favorite GGUF model:

```shell
$ llama-server -m LFM2-8B-A1B-Q4_K_M.gguf
```
Build a grammar and constrain the model:
```python
from pygbnf import Grammar, GrammarLLM, select

g = Grammar()

@g.rule
def answer():
    return select(["yes", "no", "maybe"])

g.start("answer")

llm = GrammarLLM("http://localhost:8080/v1")
text, _ = llm.complete(
    messages=[{"role": "user", "content": "Is the sky blue?"}],
    grammar=g,
)
print(text)
```

The grammar constrains the LLM output — it can only produce `yes`, `no`, or `maybe`.
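Because every token is constrained, downstream code can branch on the output without validation; a plain-Python sketch, with `text` standing in for the completion:

```python
# `text` stands in for the grammar-constrained completion above.
text = "maybe"
assert text in {"yes", "no", "maybe"}   # guaranteed by the grammar
verdict = {"yes": True, "no": False, "maybe": None}[text]
```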
```python
import pygbnf as cfg
from pygbnf import select, one_or_more, zero_or_more

g = cfg.Grammar()

@g.rule
def number():
    n = one_or_more(select("0123456789"))
    return select(['-' + n, n])

@g.rule
def operator():
    return select(['+', '*', '**', '/', '-'])

@g.rule
def expression():
    return select([
        number(),
        expression() + zero_or_more(" ") + operator()
        + zero_or_more(" ") + expression(),
        "(" + expression() + ")",
    ])

g.start("expression")
print(g.to_gbnf())
```

Output:

```gbnf
root ::= expression
number ::= "-" [0123456789]+ | [0123456789]+
operator ::= "+" | "*" | "**" | "/" | "-"
expression ::=
      number
    | expression " "* operator " "* expression
    | "(" expression ")"
```
pygbnf includes `GrammarLLM`, a thin wrapper around any OpenAI-compatible endpoint (llama.cpp, vLLM, Ollama, …) that injects the GBNF grammar automatically.

Enable `match=True` (or pass `only`/`exclude`) to get real-time `RuleEvent`s as the LLM generates tokens:
```python
from pygbnf import Grammar, GrammarLLM, select, one_or_more

g = Grammar()

@g.rule
def name():
    """A person's name."""
    return one_or_more(select("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "))

@g.rule
def greeting():
    """A greeting message."""
    return select(["hello", "hi", "hey"]) + " " + name()

g.start("greeting")

llm = GrammarLLM("http://localhost:8080/v1")
for token, events in llm.stream(
    messages=[{"role": "user", "content": "Greet Alice."}],
    grammar=g,
    match=True,
):
    print(token, end="", flush=True)
    if events:
        for ev in events:
            print(f"\n  ← [{ev.rule}] {ev.text!r} (doc: {ev.doc})")
print()
```

Each `RuleEvent` carries:

- `rule` — the matched rule name
- `text` — the matched text
- `fn` — the original Python function
- `doc` — the function's docstring
The non-streaming `complete()` call returns the collected events alongside the text:

```python
text, events = llm.complete(
    messages=[{"role": "user", "content": "Is the sky blue?"}],
    grammar=g,
    match=True,
)
print(text)
for ev in events:
    print(f"  [{ev.rule}] {ev.text!r}")
```

Combine `grammar_from_type` with `GrammarLLM` to constrain output to a JSON schema:
```python
from dataclasses import dataclass
from pygbnf import grammar_from_type, GrammarLLM

@dataclass
class City:
    name: str
    country: str
    population: int

g = grammar_from_type(City)
llm = GrammarLLM("http://localhost:8080/v1")
text, _ = llm.complete(
    messages=[{"role": "user", "content": "Describe Tokyo in JSON."}],
    grammar=g,
)
print(text)
# → {"name": "Tokyo", "country": "Japan", "population": 13960000}
```

`Toolkit` is a decorator-based tool registry. Register functions with `@toolkit.tool`, then pass the toolkit to `llm.stream()` or `llm.complete()` — the grammar and system prompt are injected automatically.
```python
import enum
from pygbnf import GrammarLLM, Toolkit

toolkit = Toolkit()

class Units(enum.Enum):
    CELSIUS = "celsius"
    FAHRENHEIT = "fahrenheit"

@toolkit.tool
def get_weather(city: str, units: Units = Units.CELSIUS) -> str:
    """Get current weather for a city."""
    return f"22° {units.value} in {city}"

@toolkit.tool
def search_web(query: str, max_results: int = 5) -> str:
    """Search the web."""
    return f"Found {max_results} results for {query!r}"

llm = GrammarLLM("http://localhost:8080/v1")

# Stream with toolkit — grammar + system prompt auto-injected
result = ""
for token, _ in llm.stream(
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    toolkit=toolkit,
):
    print(token, end="", flush=True)
    result += token

# Dispatch the JSON result to the matching function
output = toolkit.dispatch(result)
print(output)  # → "22° celsius in Tokyo"
```

The toolkit:
- Builds a GBNF grammar constraining the LLM to produce `{"function": "...", "arguments": {...}}` with only registered tool names and typed arguments
- Generates a system prompt listing available tools with signatures and docstrings
- Dispatches the parsed JSON to the right function, converting enum strings back to Python `Enum` instances automatically
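The dispatch step itself is ordinary Python. A stand-alone sketch of the idea (a hypothetical `dispatch` helper, not pygbnf's actual implementation), showing how enum-valued strings can be restored from the constrained JSON:

```python
import enum
import inspect
import json

class Units(enum.Enum):
    CELSIUS = "celsius"
    FAHRENHEIT = "fahrenheit"

def get_weather(city: str, units: Units = Units.CELSIUS) -> str:
    return f"22° {units.value} in {city}"

TOOLS = {"get_weather": get_weather}  # stand-in for the Toolkit registry

def dispatch(payload: str) -> str:
    """Parse {"function": ..., "arguments": ...} and call the matching tool."""
    call = json.loads(payload)
    fn = TOOLS[call["function"]]
    args = dict(call["arguments"])
    # Convert enum-valued strings back to Enum instances.
    for pname, param in inspect.signature(fn).parameters.items():
        ann = param.annotation
        if pname in args and isinstance(ann, type) and issubclass(ann, enum.Enum):
            args[pname] = ann(args[pname])
    return fn(**args)

print(dispatch('{"function": "get_weather", "arguments": {"city": "Tokyo"}}'))
# → 22° celsius in Tokyo
```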
You can also use `llm.tool_call()` as a one-liner that streams and dispatches:

```python
output = llm.tool_call(toolkit, "Weather in Tokyo?")
print(output)  # → "22° celsius in Tokyo"
```

Note: `GrammarLLM` requires the `openai` package (`pip install openai`). The LLM server must support the `grammar` field in its API (llama.cpp does natively).
Every grammar construct is a frozen dataclass node. Nodes compose via `+` (sequence) and `|` (alternative):
| Node | Description | GBNF |
|---|---|---|
| `Literal` | Double-quoted string | `"hello"` |
| `CharacterClass` | Character class | `[0-9]` |
| `Sequence` | Ordered concatenation | `a b c` |
| `Alternative` | Choice between options | `a \| b \| c` |
| `Repeat` | Quantified repetition | `x+`, `x*`, `x?`, `x{2,5}` |
| `RuleReference` | Reference to a named rule | `expression` |
| `TokenReference` | Token-level constraint | `<think>`, `<[1000]>` |
| `Group` | Parenthesised group | `(a b)` |
| `Optional_` | Optional element | `x?` |
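For reference, a hand-written GBNF fragment combining several of these constructs (illustrative, not pygbnf output):

```gbnf
word     ::= [a-z]+
greeting ::= ("hi" | "hey") " " word?
```

Here `[a-z]+` is a CharacterClass with a Repeat, `("hi" | "hey")` is a Group over an Alternative, `word?` is an Optional rule reference, and the whole right-hand side of `greeting` is a Sequence.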
Helpers build these nodes concisely:

```python
from pygbnf import select, one_or_more, zero_or_more, optional, repeat, group

# Character class from string
select("0123456789")        # → [0123456789]

# Alternative from list
select(["+", "-", "*"])     # → "+" | "-" | "*"

# Repetition
one_or_more(x)              # → x+
zero_or_more(x)             # → x*
optional(x)                 # → x?
repeat(x, 2, 5)             # → x{2,5}

# Grouping
group(a + b)                # → (a b)

# Operators
a + b                       # → a b (sequence)
a | b                       # → a | b (alternative)
```

Rules are defined with the `@g.rule` decorator. Calling a rule function inside another rule creates a rule reference (not an inline expansion):
```python
g = cfg.Grammar()

@g.rule
def digit():
    return select("0123456789")

@g.rule
def number():
    return one_or_more(digit())  # → digit+ (reference, not inlined)
```

Forward references work naturally — rules can reference rules defined later.
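One way to see why this works: calling a decorated rule only records a name-based reference, and rule bodies run later, once every name can be resolved. A toy sketch of the late-binding idea (plain Python, not pygbnf's actual implementation):

```python
# Toy name-based late binding (NOT pygbnf's real code): a decorated rule
# is replaced by a function returning its own name, and bodies run only
# on demand, so forward references resolve fine.
rules = {}

def rule(fn):
    rules[fn.__name__] = fn
    return lambda: fn.__name__  # calling a rule yields a reference

@rule
def number():
    return digit() + "+"        # forward reference to `digit`

@rule
def digit():
    return "[0-9]"

# By the time number's body runs, `digit` exists.
print(rules["number"]())  # → digit+
```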
llama.cpp supports token-level matching:

```python
from pygbnf import token, token_id, not_token, not_token_id

token("think")        # → <think>
token_id(1000)        # → <[1000]>
not_token("think")    # → !<think>
not_token_id(1001)    # → !<[1001]>
```

Common patterns come prebuilt:
```python
from pygbnf import (
    WS, ws, ws_required,            # whitespace
    keyword, identifier, number,    # basic tokens
    float_number, string_literal,   # complex tokens
    comma_list, between,            # structural patterns
    separated_by, spaced_comma_list,
)

comma_list(identifier())   # → ident ("," " "* ident)*
between("(", expr, ")")    # → "(" expr ")"
```

Detect left recursion in your grammar:
```python
cycles = g.detect_left_recursion()
# Warns:    "Left recursion detected: expression -> expression"
# Suggests: rewrite as base (op base)*
```

See the `examples/` directory:
| File | Description |
|---|---|
| `quickstart.py` | The quick-start example from this README |
| `arithmetic.py` | Arithmetic expressions with operator precedence |
| `csv_grammar.py` | CSV file format |
| `json_grammar.py` | Full JSON grammar |
| `simple_lang.py` | A small programming language |
| `token_demo.py` | Token-level constraints |
| `demo_schema.py` | Schema → grammar examples |
| `demo_enum_select.py` | Enum-based selection |
| `demo_simple_lang.py` | Mini-language generation with LLM |
| `demo_vision.py` | Vision + grammar: solve math from an image |
| `demo_visualization.py` | Export grammar NFA as DOT / SVG |
Run any example:

```shell
python examples/arithmetic.py
```

Auto-generate grammars from Python types and dataclasses:
```python
from dataclasses import dataclass
from pygbnf import grammar_from_type

@dataclass
class Movie:
    title: str
    year: int
    rating: float

g = grammar_from_type(Movie)
print(g.to_gbnf())
```

Also supports function signatures:
```python
from pygbnf import grammar_from_args

def search(query: str, limit: int = 10):
    ...

g = grammar_from_args(search)
print(g.to_gbnf())
```

Export any grammar as an NFA diagram in DOT or SVG format:
```python
import pygbnf as cfg
from pygbnf import select, one_or_more, optional
from pygbnf.visualization import write_grammar_svg

g = cfg.Grammar()

@g.rule
def number():
    return optional("-") + one_or_more(select("0123456789"))

@g.rule
def operator():
    return select(["+", "-", "*", "/"])

@g.rule
def expression():
    atom = select([number(), "(" + expression() + ")"])
    return atom + cfg.zero_or_more(cfg.group(" " + operator() + " " + expression()))

g.start("expression")

# Generates .dot + .svg (requires Graphviz)
write_grammar_svg(g, "arithmetic.svg")
```

When `rule_names` is omitted, only user-defined rules are included (auto-generated infrastructure rules like `ws`, `json-string`, etc. are filtered out).
- Python 3.8+
- Optional: `openai>=1.0` for `GrammarLLM` (`pip install pygbnf[llm]`)
- Optional: Graphviz CLI for SVG rendering
- guidance-ai — pygbnf's composable API is inspired by their approach to constrained generation
- llama.cpp — for the GBNF format and the underlying inference engine