pygbnf

A composable Python DSL for building GBNF grammars compatible with llama.cpp.

Define context-free grammars using expressive Python functions,
Compile them into valid GBNF strings for constrained LLM generation.
Real-time rule matching during inference.

Installation

pip install pygbnf          # core DSL only
pip install pygbnf[llm]     # + openai (for GrammarLLM)
pip install pygbnf[all]     # everything

For grammar visualization (DOT / SVG export), install Graphviz:

brew install graphviz   # macOS
apt install graphviz    # Debian / Ubuntu

Quick Start

Start llama-server with your favorite GGUF model.

$ llama-server -m LFM2-8B-A1B-Q4_K_M.gguf

Build grammar and constraint the model.

from pygbnf import Grammar, GrammarLLM, select

g = Grammar()

@g.rule
def answer():
    return select(["yes", "no", "maybe"])

g.start("answer")

llm = GrammarLLM("http://localhost:8080/v1")

text, _ = llm.complete(
    messages=[{"role": "user", "content": "Is the sky blue?"}],
    grammar=g
)

print(text)

The grammar constrains the LLM output — it can only produce yes, no, or maybe.

Guidance-Style GBNF

import pygbnf as cfg
from pygbnf import select, one_or_more, zero_or_more

g = cfg.Grammar()

@g.rule
def number():
    n = one_or_more(select("0123456789"))
    return select(['-' + n, n])

@g.rule
def operator():
    return select(['+', '*', '**', '/', '-'])

@g.rule
def expression():
    return select([
        number(),
        expression() + zero_or_more(" ") + operator()
            + zero_or_more(" ") + expression(),
        "(" + expression() + ")"
    ])

g.start("expression")
print(g.to_gbnf())

Output:

root ::= expression

number ::= "-" [0123456789]+ | [0123456789]+
operator ::= "+" | "*" | "**" | "/" | "-"
expression ::=
    number
  | expression " "* operator " "* expression
  | "(" expression ")"

LLM Usage

pygbnf includes GrammarLLM, a thin wrapper around any OpenAI-compatible endpoint (llama.cpp, vLLM, Ollama…) that injects the GBNF grammar automatically.

Streaming with rule matching

Enable match=True (or pass only/exclude) to get real-time RuleEvents as the LLM generates tokens:

from pygbnf import Grammar, GrammarLLM, select, one_or_more

g = Grammar()

@g.rule
def name():
    """A person's name."""
    return one_or_more(select("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "))

@g.rule
def greeting():
    """A greeting message."""
    return select(["hello", "hi", "hey"]) + " " + name()

g.start("greeting")

llm = GrammarLLM("http://localhost:8080/v1")

for token, events in llm.stream(
    messages=[{"role": "user", "content": "Greet Alice."}],
    grammar=g,
    match=True,
):
    print(token, end="", flush=True)
    if events:
        for ev in events:
            print(f"\n  ← [{ev.rule}] {ev.text!r} (doc: {ev.doc})")
print()

Each RuleEvent carries:

rule — the matched rule name
text — the matched text
fn — the original Python function
doc — the function's docstring

Non-streaming completion

text, events = llm.complete(
    messages=[{"role": "user", "content": "Is the sky blue?"}],
    grammar=g,
    match=True,
)
print(text)
for ev in events:
    print(f"  [{ev.rule}] {ev.text!r}")

Schema-based grammar with LLM

Combine grammar_from_type with GrammarLLM to constrain output to a JSON schema:

from dataclasses import dataclass
from pygbnf import grammar_from_type, GrammarLLM

@dataclass
class City:
    name: str
    country: str
    population: int

g = grammar_from_type(City)
llm = GrammarLLM("http://localhost:8080/v1")

text, _ = llm.complete(
    messages=[{"role": "user", "content": "Describe Tokyo in JSON."}],
    grammar=g,
)
print(text)
# → {"name": "Tokyo", "country": "Japan", "population": 13960000}

Tool calling with Toolkit

Toolkit is a decorator-based tool registry. Register functions with @toolkit.tool, then pass the toolkit to llm.stream() or llm.complete() — the grammar and system prompt are injected automatically.

import enum
from pygbnf import GrammarLLM, Toolkit

toolkit = Toolkit()

class Units(enum.Enum):
    CELSIUS = "celsius"
    FAHRENHEIT = "fahrenheit"

@toolkit.tool
def get_weather(city: str, units: Units = Units.CELSIUS) -> str:
    """Get current weather for a city."""
    return f"22° {units.value} in {city}"

@toolkit.tool
def search_web(query: str, max_results: int = 5) -> str:
    """Search the web."""
    return f"Found {max_results} results for {query!r}"

llm = GrammarLLM("http://localhost:8080/v1")

# Stream with toolkit — grammar + system prompt auto-injected
result = ""
for token, _ in llm.stream(
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    toolkit=toolkit,
):
    print(token, end="", flush=True)
    result += token

# Dispatch the JSON result to the matching function
output = toolkit.dispatch(result)
print(output)  # → "22° celsius in Tokyo"

The toolkit:

Builds a GBNF grammar constraining the LLM to produce {"function": "...", "arguments": {...}} with only registered tool names and typed arguments
Generates a system prompt listing available tools with signatures and docstrings
Dispatches the parsed JSON to the right function, converting enum strings back to Python Enum instances automatically

You can also use llm.tool_call() as a one-liner that streams + dispatches:

output = llm.tool_call(toolkit, "Weather in Tokyo?")
print(output)  # → "22° celsius in Tokyo"

Note: GrammarLLM requires the openai package: pip install openai. The LLM server must support the grammar field in its API (llama.cpp does natively).

Architecture

AST Nodes

Every grammar construct is a frozen dataclass node. Nodes compose via + (sequence) and | (alternative):

Node	Description	GBNF
`Literal`	Double-quoted string	`"hello"`
`CharacterClass`	Character class	`[0-9]`
`Sequence`	Ordered concatenation	`a b c`
`Alternative`	Choice between options	`a \| b \| c`
`Repeat`	Quantified repetition	`x+`, `x*`, `x?`, `x{2,5}`
`RuleReference`	Reference to named rule	`expression`
`TokenReference`	Token-level constraint	`<think>`, `<[1000]>`
`Group`	Parenthesised group	`(a b)`
`Optional_`	Optional element	`x?`

DSL Combinators

from pygbnf import select, one_or_more, zero_or_more, optional, repeat, group

# Character class from string
select("0123456789")          # → [0123456789]

# Alternative from list
select(["+", "-", "*"])       # → "+" | "-" | "*"

# Repetition
one_or_more(x)                # → x+
zero_or_more(x)               # → x*
optional(x)                   # → x?
repeat(x, 2, 5)              # → x{2,5}

# Grouping
group(a + b)                  # → (a b)

# Operators
a + b                         # → a b   (sequence)
a | b                         # → a | b (alternative)

Rule Definition

Rules are defined with the @g.rule decorator. Calling a rule function inside another rule creates a rule reference (not an inline expansion):

g = cfg.Grammar()

@g.rule
def digit():
    return select("0123456789")

@g.rule
def number():
    return one_or_more(digit())  # → digit+  (reference, not inlined)

Forward references work naturally — rules can reference rules defined later.

Token Constraints

llama.cpp supports token-level matching:

from pygbnf import token, token_id, not_token, not_token_id

token("think")        # → <think>
token_id(1000)        # → <[1000]>
not_token("think")    # → !<think>
not_token_id(1001)    # → !<[1001]>

Grammar Helpers

Common patterns prebuilt:

from pygbnf import (
    WS, ws, ws_required,           # whitespace
    keyword, identifier, number,    # basic tokens
    float_number, string_literal,   # complex tokens
    comma_list, between,           # structural patterns
    separated_by, spaced_comma_list,
)

comma_list(identifier())   # → ident ("," " "* ident)*
between("(", expr, ")")    # → "(" expr ")"

Recursion Analysis

Detect left recursion in your grammar:

cycles = g.detect_left_recursion()
# Warns: "Left recursion detected: expression -> expression"
# Suggests: rewrite as base (op base)*

Examples

See the examples/ directory:

File	Description
`quickstart.py`	The quick-start example from this README
`arithmetic.py`	Arithmetic expressions with operator precedence
`csv_grammar.py`	CSV file format
`json_grammar.py`	Full JSON grammar
`simple_lang.py`	A small programming language
`token_demo.py`	Token-level constraints
`demo_schema.py`	Schema → grammar examples
`demo_enum_select.py`	Enum-based selection
`demo_simple_lang.py`	Mini-language generation with LLM
`demo_vision.py`	Vision + grammar: solve math from an image
`demo_visualization.py`	Export grammar NFA as DOT / SVG

Run any example:

python examples/arithmetic.py

Schema Generation

Auto-generate grammars from Python types and dataclasses:

from dataclasses import dataclass
from pygbnf import grammar_from_type

@dataclass
class Movie:
    title: str
    year: int
    rating: float

g = grammar_from_type(Movie)
print(g.to_gbnf())

Also supports function signatures:

from pygbnf import grammar_from_args

def search(query: str, limit: int = 10):
    ...

g = grammar_from_args(search)
print(g.to_gbnf())

Visualization

Export any grammar as an NFA diagram in DOT or SVG format:

import pygbnf as cfg
from pygbnf import select, one_or_more, optional
from pygbnf.visualization import write_grammar_svg

g = cfg.Grammar()

@g.rule
def number():
    return optional("-") + one_or_more(select("0123456789"))

@g.rule
def operator():
    return select(["+", "-", "*", "/"])

@g.rule
def expression():
    atom = select([number(), "(" + expression() + ")"])
    return atom + cfg.zero_or_more(cfg.group(" " + operator() + " " + expression()))

g.start("expression")

# Generates .dot + .svg (requires Graphviz)
write_grammar_svg(g, "arithmetic.svg")

When rule_names is omitted, only user-defined rules are included (auto-generated infrastructure rules like ws, json-string, etc. are filtered out).

Requirements

Python 3.8+
Optional: openai>=1.0 for GrammarLLM (pip install pygbnf[llm])
Optional: Graphviz CLI for SVG rendering

Acknowledgements

guidance-ai — pygbnf's composable API is inspired by their approach to constrained generation
llama.cpp — for the GBNF format and the underlying inference engine

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
examples		examples
pygbnf		pygbnf
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
arithmetic_nfa.dot		arithmetic_nfa.dot
arithmetic_nfa.svg		arithmetic_nfa.svg
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pygbnf

Installation

Quick Start

Guidance-Style GBNF

LLM Usage

Streaming with rule matching

Non-streaming completion

Schema-based grammar with LLM

Tool calling with Toolkit

Architecture

AST Nodes

DSL Combinators

Rule Definition

Token Constraints

Grammar Helpers

Recursion Analysis

Examples

Schema Generation

Visualization

Requirements

Acknowledgements

About

Uh oh!

Releases 1

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pygbnf

Installation

Quick Start

Guidance-Style GBNF

LLM Usage

Streaming with rule matching

Non-streaming completion

Schema-based grammar with LLM

Tool calling with Toolkit

Architecture

AST Nodes

DSL Combinators

Rule Definition

Token Constraints

Grammar Helpers

Recursion Analysis

Examples

Schema Generation

Visualization

Requirements

Acknowledgements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Contributors

Uh oh!

Languages