promptctl is the definitive CLI for Prompt Engineering as Code. Use it to develop, test, evaluate, and monitor your LLM prompts with the rigor of software engineering.
Brought to you by 360labs.dev
We build advanced AI agents and developer tools to accelerate the next generation of software.
- 📄 Prompts as Code: Define prompts in Markdown files with YAML frontmatter. Version control your prompts alongside your code.
- ✅ Test-Driven Development: Write test cases in JSON. Assert output quality using exact matches, regex, semantic similarity, or JSON schema validation.
- 📊 Bulk Evaluation: Run hundreds of test cases against your prompts in seconds. Get pass/fail rates and latency metrics instantly.
- ⚖️ A/B Testing: Scientifically compare two versions of a prompt. Calculate win rates and statistical confidence intervals (Bootstrapping) to make data-driven decisions.
- 📈 Local Dashboard: Visualize run history, inspect traces, and analyze costs with a built-in React-based dashboard.
- 🔌 Multi-Provider Support: First-class support for Google Gemini (including the 1.5, 2.0, 2.5, and 3.0 series), with an extensible architecture for OpenAI, Anthropic, and local models.
- Node.js 18+
- pnpm
Clone the repository and install dependencies:

```bash
git clone https://github.com/360labs/promptctl.git
cd promptctl
pnpm install
pnpm run build
```

Link the CLI globally to use promptctl anywhere:
```bash
cd packages/cli
pnpm link --global
```

Navigate to your workspace and initialize the directory structure:
```bash
mkdir my-ai-project
cd my-ai-project
promptctl init
```

This generates:

- `prompts/`: Directory for your Markdown prompt files.
- `tests/`: Directory for your JSON test cases.
- `.promptctl/`: Local configuration and logging directory.
Create `prompts/sentiment.md`. Use `{{variable}}` syntax for dynamic inputs.

```markdown
---
name: sentiment-classifier
model: gemini-1.5-flash
temperature: 0.0
max_output_tokens: 128
---
You are a sentiment analysis expert.
Classify the following review as POSITIVE, NEGATIVE, or NEUTRAL.

Review: "{{review}}"

Classification:
```

Create `tests/sentiment.json`. Define inputs and expected assertions.
```json
[
  {
    "id": "case-1",
    "input": { "review": "The product exceeded my expectations in every way!" },
    "assert": [{ "type": "contains", "value": "POSITIVE" }]
  },
  {
    "id": "case-2",
    "input": { "review": "It arrived broken and support was unhelpful." },
    "assert": [{ "type": "contains", "value": "NEGATIVE" }]
  }
]
```

Set your API key (currently supporting Google Gemini):
```bash
export GOOGLE_API_KEY=AIzaSy...
```

Run the evaluation:

```bash
promptctl eval prompts/sentiment.md --tests tests/sentiment.json
```

Output:
| Test ID | Pass | Score | Latency | Tokens | Cost ($) |
|---|---|---|---|---|---|
| case-1 | PASS | 1.0 | 450ms | 45 | 0.000002 |
| case-2 | PASS | 1.0 | 412ms | 48 | 0.000002 |
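Conceptually, each eval step renders the `{{variable}}` placeholders into the prompt, calls the model, and scores the output against every assertion. A minimal Python sketch of that idea (illustrative only, not promptctl's actual implementation; `render` and `check` are hypothetical helper names):

```python
import re


def render(template: str, variables: dict) -> str:
    """Substitute {{name}} placeholders with values from the test case input."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(variables[m.group(1)]), template)


def check(output: str, assertion: dict) -> bool:
    """Score a model output against a single assertion object."""
    if assertion["type"] == "contains":
        return assertion["value"] in output
    if assertion["type"] == "regex":
        return re.search(assertion["value"], output) is not None
    raise ValueError(f"unknown assertion type: {assertion['type']}")


# Render the sentiment prompt for case-1, then score a hypothetical model reply.
prompt = render('Review: "{{review}}"\nClassification:', {"review": "Loved it!"})
passed = check("POSITIVE", {"type": "contains", "value": "POSITIVE"})
```

A test case passes only when all of its assertions return true, which is how a row in the table above collapses to a single PASS/FAIL and a 0–1 score.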
Compare `prompts/v1.md` against `prompts/v2.md` to see whether your changes actually improved performance.

```bash
promptctl ab prompts/v1.md prompts/v2.md --tests tests/sentiment.json
```

Promptctl will run both prompts against the dataset and compute a 95% confidence interval on the performance delta.
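The bootstrapping technique behind that interval is easy to sketch: resample the per-case pass/fail deltas with replacement many times and read the interval off the percentiles of the resampled means. This is a generic illustration of the method, not promptctl's own A/B engine:

```python
import random


def bootstrap_ci(a: list, b: list, n_resamples: int = 10_000,
                 alpha: float = 0.05, seed: int = 0):
    """Bootstrap a (1 - alpha) CI for mean(b) - mean(a), paired per test case."""
    rng = random.Random(seed)
    n = len(a)
    deltas = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        deltas.append(sum(b[i] - a[i] for i in idx) / n)
    deltas.sort()
    lo = deltas[int(alpha / 2 * n_resamples)]
    hi = deltas[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Pass/fail (1/0) results for two prompt versions on the same 10-case suite.
v1 = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
v2 = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
low, high = bootstrap_ci(v1, v2)
```

If the resulting interval excludes 0, the observed improvement is unlikely to be resampling noise, which is the "statistical confidence" the `ab` command reports.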
Launch the local analysis dashboard to browse history:

```bash
promptctl dashboard
```

Open http://localhost:3000 to view your execution logs, costs, and traces.
Detailed documentation is available in the docs/ directory:
- Getting Started: detailed setup guide.
- Writing Assertions: learn about `equals`, `regex`, `json_schema`, and more.
- Providers: configuring Google, OpenAI, etc.
Promptctl is a monorepo managed by Turbo and pnpm workspaces:
- `packages/core`: The heart of the logic (eval runner, A/B engine, assertions).
- `packages/cli`: The terminal interface.
- `packages/providers`: LLM API adapters.
- `packages/dashboard`: Vite + Express visualization tool.
- `packages/trace`: Structured JSONL logging and cost calculation.
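Structured JSONL logging, as used by the trace package, simply means one self-describing JSON object per line, so runs can be appended cheaply and replayed later. A hedged sketch of what such a record could look like (the field names here are illustrative assumptions, not promptctl's documented schema; values are taken from the eval output above):

```python
import json

# One trace record per LLM call, serialized as a single JSON line.
record = {
    "test_id": "case-1",
    "model": "gemini-1.5-flash",
    "latency_ms": 450,
    "tokens": 45,
    "cost_usd": 0.000002,
    "pass": True,
}
line = json.dumps(record)  # JSONL: newline-delimited, one record per line
# Appending would look like: open(".promptctl/trace.jsonl", "a").write(line + "\n")
restored = json.loads(line)  # each line round-trips independently
```

Because every line is independent, the dashboard can stream, filter, and sum costs over the log without parsing the whole file as one document.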
We welcome contributions! Please see CONTRIBUTING.md for details on how to set up your dev environment and submit PRs.
This project is licensed under the MIT License.
Built with ❤️ by 360labs.dev