A Python framework for self-optimizing LLM agents. The agent runs against an evaluation dataset, and an optimizer LLM automatically rewrites the agent's configuration to improve its score — iterating until performance peaks or a target score is reached.
The agent is driven by agent.toml, a plain-text config that defines its system prompt and tool usage instructions. At eval time:
- The agent runs every dataset pair in parallel (two-stage LLM loop per pair).
- Outputs are scored by cosine similarity against expected answers using OpenAI embeddings.
- The optimizer LLM receives the full execution log and the current agent.toml, then rewrites it to improve the score.
- The new config is kept only if the score improves; otherwise the previous config is restored.
- When the agent hits a perfect score, the winning config is saved as agent.<timestamp>.toml.
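
The scoring and keep-or-restore steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `cosine_similarity`, `optimize`, and the `evaluate` / `rewrite` callables are hypothetical names, embeddings are plain lists of floats rather than real OpenAI responses, and the `max_rounds` cap is an assumption added so the loop always terminates.

```python
import math
from typing import Callable, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


def optimize(
    config: str,
    evaluate: Callable[[str], float],
    rewrite: Callable[[str, float], str],
    target: float = 1.0,
    max_rounds: int = 10,
) -> Tuple[str, float]:
    """Keep a rewritten config only when it scores strictly higher.

    `evaluate` stands in for the full dataset run plus scoring, and
    `rewrite` stands in for the optimizer LLM call (both hypothetical).
    """
    best_config, best_score = config, evaluate(config)
    for _ in range(max_rounds):
        if best_score >= target:
            break  # target reached, stop iterating
        candidate = rewrite(best_config, best_score)
        score = evaluate(candidate)
        if score > best_score:
            best_config, best_score = candidate, score
        # Otherwise the candidate is discarded and the previous config stays.
    return best_config, best_score
```

Passing the scorer and the rewriter in as callables keeps the accept-or-rollback logic testable without any API calls.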
# One-shot agent run
uv run python src/manager.py
# Evaluation + self-optimization loop
uv run python src/eval.py

Requires OPENAI_API_KEY in .env.
[instructions]
system = "... {{prompt}} ..."
[tool]
spec = "... tool usage instructions ..."

{{prompt}} is replaced with the user's query at runtime. The optimizer rewrites this file to improve agent performance.
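
As a rough illustration of the placeholder substitution, the sketch below fills `{{prompt}}` with a plain string replacement. The function name `render_system_prompt` and the error raised on a missing placeholder are assumptions for this example; the project may perform the substitution differently.

```python
PLACEHOLDER = "{{prompt}}"


def render_system_prompt(template: str, query: str) -> str:
    """Return the system prompt with the placeholder replaced by the query."""
    if PLACEHOLDER not in template:
        # Assumption: a template without the placeholder is a config error,
        # e.g. the optimizer rewrote the prompt and dropped it.
        raise ValueError(f"template is missing {PLACEHOLDER}")
    return template.replace(PLACEHOLDER, query)


if __name__ == "__main__":
    print(render_system_prompt("Answer concisely: {{prompt}}", "What is TOML?"))
```

Checking for the placeholder is worth considering because the optimizer rewrites this file freely, and a rewrite that drops `{{prompt}}` would otherwise silently discard the user's query.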