Structured JSON extraction from any LLM. Schema-enforced, Pydantic-native, multi-provider.
Prompture is a Python library that turns LLM responses into validated, structured data. Define a schema or Pydantic model, point it at any provider, and get typed output back — with token tracking, cost calculation, and automatic JSON repair built in.
from pydantic import BaseModel
from prompture import extract_with_model
class Person(BaseModel):
name: str
age: int
profession: str
person = extract_with_model(Person, "Maria is 32, a developer in NYC.", model_name="openai/gpt-4")
print(person.name) # Maria- Structured output — JSON schema enforcement and direct Pydantic model population
- 12 providers — OpenAI, Claude, Google, Groq, Grok, Azure, Ollama, LM Studio, OpenRouter, HuggingFace, AirLLM, and generic HTTP
- TOON input conversion — 45-60% token savings when sending structured data via Token-Oriented Object Notation
- Stepwise extraction — Per-field prompts with smart type coercion (shorthand numbers, multilingual booleans, dates)
- Field registry — 50+ predefined extraction fields with template variables and Pydantic integration
- Conversations — Stateful multi-turn sessions with sync and async support
- Tool use — Function calling and streaming across supported providers, with automatic prompt-based simulation for models without native tool support
- Caching — Built-in response cache with memory, SQLite, and Redis backends
- Plugin system — Register custom drivers via entry points
- Usage tracking — Token counts and cost calculation on every call
- Auto-repair — Optional second LLM pass to fix malformed JSON
- Batch testing — Spec-driven suites to compare models side by side
Projects powered by Prompture at their core:
- CachiBot — AI-powered bot built on Prompture's structured extraction and multi-provider driver system
- AgentSite — Agent-driven web platform using Prompture for LLM orchestration and structured output
pip install promptureOptional extras:
pip install prompture[redis] # Redis cache backend
pip install prompture[serve] # FastAPI server mode
pip install prompture[airllm] # AirLLM local inferenceSet API keys for the providers you use. Prompture reads from environment variables or a .env file:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
GROQ_API_KEY=...
GROK_API_KEY=...
OPENROUTER_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_API_KEY=...Local providers (Ollama, LM Studio) work out of the box with no keys required.
Pass API keys at runtime via ProviderEnvironment — useful for multi-tenant apps, web backends, or anywhere you don't want to set os.environ:
from prompture import AsyncAgent, ProviderEnvironment
env = ProviderEnvironment(
openai_api_key="sk-...",
claude_api_key="sk-ant-...",
)
agent = AsyncAgent("openai/gpt-4o", env=env)
result = await agent.run("Hello!")Works on Agent, AsyncAgent, Conversation, and AsyncConversation.
Model strings use "provider/model" format. The provider prefix routes to the correct driver automatically.
| Provider | Example Model | Cost |
|---|---|---|
openai |
openai/gpt-4 |
Automatic |
claude |
claude/claude-3 |
Automatic |
google |
google/gemini-1.5-pro |
Automatic |
groq |
groq/llama2-70b-4096 |
Automatic |
grok |
grok/grok-4-fast-reasoning |
Automatic |
azure |
azure/deployed-name |
Automatic |
openrouter |
openrouter/anthropic/claude-2 |
Automatic |
ollama |
ollama/llama3.1:8b |
Free (local) |
lmstudio |
lmstudio/local-model |
Free (local) |
huggingface |
hf/model-name |
Free (local) |
http |
http/self-hosted |
Free |
Single LLM call, returns a validated Pydantic instance:
from typing import List, Optional
from pydantic import BaseModel
from prompture import extract_with_model
class Person(BaseModel):
name: str
age: int
profession: str
city: str
hobbies: List[str]
education: Optional[str] = None
person = extract_with_model(
Person,
"Maria is 32, a software developer in New York. She loves hiking and photography.",
model_name="openai/gpt-4"
)
print(person.model_dump())One LLM call per field. Higher accuracy, per-field error recovery:
from prompture import stepwise_extract_with_model
result = stepwise_extract_with_model(
Person,
"Maria is 32, a software developer in New York. She loves hiking and photography.",
model_name="openai/gpt-4"
)
print(result["model"].model_dump())
print(result["usage"]) # per-field and total token usage| Aspect | extract_with_model |
stepwise_extract_with_model |
|---|---|---|
| LLM calls | 1 | N (one per field) |
| Speed / cost | Faster, cheaper | Slower, higher |
| Accuracy | Good global coherence | Higher per-field accuracy |
| Error handling | All-or-nothing | Per-field recovery |
For raw JSON output with full control:
from prompture import ask_for_json
schema = {
"type": "object",
"required": ["name", "age"],
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
}
}
result = ask_for_json(
content_prompt="Extract the person's info from: John is 28 and lives in Miami.",
json_schema=schema,
model_name="openai/gpt-4"
)
print(result["json_object"]) # {"name": "John", "age": 28}
print(result["usage"]) # token counts and costAnalyze structured data with automatic TOON conversion for 45-60% fewer tokens:
from prompture import extract_from_data
products = [
{"id": 1, "name": "Laptop", "price": 999.99, "rating": 4.5},
{"id": 2, "name": "Book", "price": 19.99, "rating": 4.2},
{"id": 3, "name": "Headphones", "price": 149.99, "rating": 4.7},
]
result = extract_from_data(
data=products,
question="What is the average price and highest rated product?",
json_schema={
"type": "object",
"properties": {
"average_price": {"type": "number"},
"highest_rated": {"type": "string"}
}
},
model_name="openai/gpt-4"
)
print(result["json_object"])
# {"average_price": 389.99, "highest_rated": "Headphones"}
print(f"Token savings: {result['token_savings']['percentage_saved']}%")Works with Pandas DataFrames via extract_from_pandas().
Use the built-in field registry for consistent extraction across models:
from pydantic import BaseModel
from prompture import field_from_registry, stepwise_extract_with_model
class Person(BaseModel):
name: str = field_from_registry("name")
age: int = field_from_registry("age")
email: str = field_from_registry("email")
occupation: str = field_from_registry("occupation")
result = stepwise_extract_with_model(
Person,
"John Smith, 25, software engineer at TechCorp, john@example.com",
model_name="openai/gpt-4"
)Register custom fields with template variables:
from prompture import register_field
register_field("document_date", {
"type": "str",
"description": "Document creation date",
"instructions": "Use {{current_date}} if not specified",
"default": "{{current_date}}",
"nullable": False
})Stateful multi-turn sessions:
from prompture import Conversation
conv = Conversation(model_name="openai/gpt-4")
conv.add_message("system", "You are a helpful assistant.")
response = conv.send("What is the capital of France?")
follow_up = conv.send("What about Germany?") # retains contextRegister Python functions as tools the LLM can call during a conversation:
from prompture import Conversation, ToolRegistry
registry = ToolRegistry()
@registry.tool
def get_weather(city: str, units: str = "celsius") -> str:
"""Get the current weather for a city."""
return f"Weather in {city}: 22 {units}"
conv = Conversation("openai/gpt-4", tools=registry)
result = conv.ask("What's the weather in London?")For models without native function calling (Ollama, LM Studio, etc.), Prompture automatically simulates tool use by describing tools in the prompt and parsing structured JSON responses:
# Auto-detect: uses native tool calling if available, simulation otherwise
conv = Conversation("ollama/llama3.1:8b", tools=registry, simulated_tools="auto")
# Force simulation even on capable models
conv = Conversation("openai/gpt-4", tools=registry, simulated_tools=True)
# Disable tool use entirely
conv = Conversation("openai/gpt-4", tools=registry, simulated_tools=False)The simulation loop describes tools in the system prompt, asks the model to respond with JSON (tool_call or final_answer), executes tools, and feeds results back — all transparent to the caller.
Set cost and token limits with policy-based enforcement:
from prompture import AsyncAgent
agent = AsyncAgent(
"openai/gpt-4o",
max_cost=0.50,
budget_policy="hard_stop", # accepts strings or BudgetPolicy enum
fallback_models=["openai/gpt-4o-mini"],
)Policies: "hard_stop" (raise BudgetExceededError on exceed), "warn_and_continue" (log and proceed), "degrade" (auto-switch to cheaper model at 80% budget).
Extract provider info from model strings:
from prompture import provider_for_model, parse_model_string
provider_for_model("claude/claude-sonnet-4-6") # "claude"
provider_for_model("claude/claude-sonnet-4-6", canonical=True) # "anthropic"
parse_model_string("openai/gpt-4o") # ("openai", "gpt-4o")Auto-detect available models from configured providers:
from prompture import get_available_models
models = get_available_models()
for model in models:
print(model) # "openai/gpt-4", "ollama/llama3:latest", ...import logging
from prompture import configure_logging
configure_logging(logging.DEBUG)All extraction functions return a consistent structure:
{
"json_string": str, # raw JSON text
"json_object": dict, # parsed result
"usage": {
"prompt_tokens": int,
"completion_tokens": int,
"total_tokens": int,
"cost": float,
"model_name": str
}
}prompture run <spec-file>Run spec-driven extraction suites for cross-model comparison.
The most common integration pattern — an AI chat endpoint with database-backed tools:
from fastapi import APIRouter, Depends
from prompture import AsyncAgent, ToolRegistry, ProviderEnvironment, BudgetExceededError
router = APIRouter()
def build_tools(db) -> ToolRegistry:
registry = ToolRegistry()
@registry.tool
async def search_records(query: str) -> str:
"""Search the database for matching records."""
results = await db.execute(...)
return format_results(results)
return registry
@router.post("/chat")
async def chat(message: str, db=Depends(get_db)):
env = ProviderEnvironment(openai_api_key=get_api_key_from_db(db))
agent = AsyncAgent(
"openai/gpt-4o",
env=env,
tools=build_tools(db),
system_prompt="You are a helpful assistant with database access.",
max_cost=0.25,
budget_policy="hard_stop",
)
try:
result = await agent.run(message)
return {"reply": result.output_text, "usage": result.usage}
except BudgetExceededError:
return {"error": "Cost limit exceeded"}, 429Stream responses via Server-Sent Events:
from fastapi.responses import StreamingResponse
from prompture import AsyncAgent, StreamEventType
@router.post("/chat/stream")
async def chat_stream(message: str):
agent = AsyncAgent("claude/claude-sonnet-4-6", env=env, system_prompt="...")
async def event_stream():
async for event in agent.run_stream(message):
match event.event_type:
case StreamEventType.text_delta:
yield f"data: {json.dumps({'type': 'text', 'content': event.data})}\n\n"
case StreamEventType.tool_call:
yield f"data: {json.dumps({'type': 'tool_call', 'name': event.data['name']})}\n\n"
case StreamEventType.output:
yield f"data: {json.dumps({'type': 'done'})}\n\n"
return StreamingResponse(event_stream(), media_type="text/event-stream")Use AsyncConversation.ask_for_json() for one-shot structured data extraction:
from prompture import AsyncConversation
@router.get("/insights")
async def get_insights():
conv = AsyncConversation("openai/gpt-4o", system_prompt="You analyze data.")
result = await conv.ask_for_json(
f"Analyze this data and produce insights:\n\n{context}",
{"type": "object", "properties": {
"insights": {"type": "array", "items": {"type": "object", ...}},
"summary": {"type": "string"},
}},
)
return result["json_object"]Key exceptions to catch in production:
from prompture import BudgetExceededError, DriverError, ExtractionError, ValidationError
try:
result = await agent.run(message)
except BudgetExceededError:
# Cost or token limit exceeded — return 429
pass
except DriverError:
# Provider API error (auth, rate limit, network) — return 502
pass
except ExtractionError:
# JSON parsing/validation failed — return 422
pass
except ValidationError:
# Schema validation failed — return 422
pass# Install with dev dependencies
pip install -e ".[test,dev]"
# Run tests
pytest
# Run integration tests (requires live LLM access)
pytest --run-integration
# Lint and format
ruff check .
ruff format .PRs welcome. Please add tests for new functionality and examples under examples/ for new drivers or patterns.