# refine

A CLI tool that wraps LLM API calls in iterative refinement loops to produce higher-quality outputs for complex cognitive tasks.

Instead of a single prompt-and-response, refine decomposes a task, generates an initial response, evaluates it against task-specific quality criteria, and refines through targeted iteration — stopping when quality thresholds are met or the cost budget is exhausted.

```
Setup (decompose) → Loop (generate → evaluate → refine → audit) × N → Teardown (synthesise)
```

## Install

```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

## Configuration

Copy the example environment file and add your keys:

```shell
cp .env.example .env

# Required: Anthropic API key
ANTHROPIC_API_KEY=sk-ant-...

# Optional: Pinboard token for research phase (see below)
PINBOARD_TOKEN=username:TOKEN
```

## Authentication

refine resolves Anthropic credentials using a three-tier cascade:

| Priority | Source | How it's used |
|---|---|---|
| 1 | `ANTHROPIC_API_KEY` in `.env` or environment | Standard PydanticAI agent |
| 2 | `ANTHROPIC_AUTH_TOKEN` | PydanticAI via `AsyncAnthropic(auth_token=...)` |
| 3 | `claude` CLI on PATH | Shells out to `claude -p` (uses Claude Code's own auth) |

Inside Claude Code / Claude Desktop, auth is automatic: the tool detects the CLI and uses it as a fallback when no API key is set.

## Usage

```shell
refine [PROMPT] [OPTIONS]
```

### Research (deep-research profile, default)

```shell
# Research task with default settings
refine "Analyse the competitive landscape for AI code editors"

# Quick draft — fewer iterations, lower cost
refine "Pros and cons of Rust vs Go for CLI tools" --tier quick

# Thorough analysis — more iterations, higher quality targets
refine "Develop a market entry strategy for UK fintech" --tier thorough

# Save to file
refine "State of WebAssembly adoption in 2026" -o output/report.md
```

### Copywriting (copywriting profile)

```shell
# Elevator pitch
refine "Write a 100-word elevator pitch for a B2B SaaS product" --profile copywriting

# Multiple variants
refine "Write 3 tagline variants for a fintech app targeting Gen Z: one punchy, one warm, one investor-facing" --profile copywriting

# Quick first draft — generate + evaluate, no refinement
refine "Write a launch email subject line for our new API product" --profile copywriting --tier quick

# High-stakes copy with more refinement passes
refine "Write the hero copy for our Series B fundraise landing page" --profile copywriting --tier thorough -o output/hero-copy.md
```

### Working with files

```shell
# Refine an existing draft
refine --input draft.md

# Analyse a document with a specific question
refine "Evaluate this against current market conditions" --input business-plan.md
```

### Controlling cost and iterations

```shell
# Set a strict budget
refine "..." --max-budget-usd 1.00

# Disable budget limit entirely — run until quality or iteration cap
refine "..." --no-budget-limit

# Limit to 1 iteration (generate + evaluate only, no refinement)
refine "..." --max-iterations 1

# Use a cheaper model for all phases
refine "..." --model claude-haiku-4-5
```

**Budget behaviour:** When the budget is about to be exceeded, the tool prompts you to continue or stop. If you continue, the budget ceiling is raised to cover the next operation. This avoids losing work mid-run while keeping cost visibility. Use `--no-budget-limit` to skip the prompt entirely.

## Options

| Flag | Short | Description |
|---|---|---|
| `--profile` | `-p` | Refinement profile name (default: `deep-research`) |
| `--tier` | `-t` | Quality tier: `quick`, `standard`, `thorough` (default: `standard`) |
| `--input` | `-i` | Input file path (markdown, text) |
| `--output` | `-o` | Output file path (default: stdout) |
| `--model` | `-m` | Override all models (e.g. `claude-haiku-4-5`) |
| `--max-budget-usd` | | Override per-loop cost budget |
| `--no-budget-limit` | | Disable budget cap (run until quality or iteration cap) |
| `--max-iterations` | | Override maximum iteration count |
| `--dry-run` | | Show decomposition and cost estimate without executing |
| `--verbose` | `-v` | Show evaluation details and debug logging |
| `--quiet` | `-q` | Suppress progress, output result only |
| `--force` | | Skip pre-flight checks (input size warnings) |

## Research phase (optional)

Before running refine on a complex topic, you can optionally prime the prompt with curated research from your Pinboard bookmarks. This requires the Pinboard MCP server and a valid token.

### Setup

```shell
pip install pinboard-mcp-server
```

Add your Pinboard token to `.env`:

```shell
PINBOARD_TOKEN=username:TOKEN
```

Add the MCP server to your Claude Code config (`.claude/settings.local.json`):

```json
{
  "mcpServers": {
    "pinboard": {
      "command": "pinboard-mcp-server",
      "env": {
        "PINBOARD_TOKEN": "${PINBOARD_TOKEN}"
      }
    }
  }
}
```

### Workflow

The research phase is a manual step run inside Claude Code before invoking refine:

1. **Map tags** — Call `listTags` to find tags relevant to your topic
2. **Query bookmarks** — Call `listBookmarksByTags` for each relevant tag
3. **Search keywords** — Call `searchBookmarks` for key phrases the tag approach might miss
4. **Curate** — Deduplicate, rank by relevance, and save a manifest
5. **Extract** — Fetch each URL, extract key ideas, and write a synthesis
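The curate step (4) can be sketched as a small helper. The `url` and `relevance` field names here are assumptions for illustration, not part of the Pinboard MCP schema:

```python
def curate(bookmarks: list[dict], max_items: int = 20) -> list[dict]:
    """Illustrative curation: dedupe by URL (first occurrence wins),
    then rank by a pre-computed relevance score, descending."""
    seen: set[str] = set()
    unique: list[dict] = []
    for b in bookmarks:
        if b["url"] not in seen:
            seen.add(b["url"])
            unique.append(b)
    unique.sort(key=lambda b: b["relevance"], reverse=True)
    return unique[:max_items]
```

The surviving list is what you would write into the manifest before the extract step.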

The curated research can then be passed to refine as input:

```shell
refine "Design a monetisation strategy for a two-sided marketplace" \
  --input output/research-synthesis.md \
  --tier thorough \
  -o output/strategy.md
```

See `docs/research-manifest.md` and `docs/research-synthesis.md` for examples of this workflow's output.

## Profiles

Profiles define what the tool evaluates, how it decomposes tasks, and what "good" looks like. Each profile ships with its own axes, prompt templates, and budget defaults.

### deep-research (default)

Multi-source research synthesis with evidence grading. Decomposes via question tree, refines additively, produces structured reports.

| Axis | Weight | Hard-fail | What it measures |
|---|---|---|---|
| Coverage | 1.0 | No | Has the question space been fully explored? |
| Evidence Quality | 1.2 | Yes | Are claims supported by credible sources? |
| Coherence | 1.0 | No | Is the argument internally consistent? |
| Depth | 0.8 | No | Does the analysis go beyond the surface level? |
| Actionability | 0.8 | No | Can the reader act on the findings? |

### copywriting

Short-form persuasive writing: pitches, taglines, positioning, brand copy. Decomposes via constraint mapping (extracts rules from the brief, identifies variants), refines eliminatively (cuts rather than adds), produces copy directly with no strategic scaffolding.

| Axis | Weight | Hard-fail | What it measures |
|---|---|---|---|
| Precision | 1.2 | No | Every sentence earns its place: no padding, no filler |
| Constraint Compliance | 1.0 | Yes | Follows every hard rule in the brief (naming, claims, format) |
| Audience Fit | 1.0 | No | The target listener recognises their reality in this |
| Distinctiveness | 0.8 | No | Variants sound meaningfully different, not just reworded |
| Coherence | 0.8 | No | Tone is consistent, claims don't contradict each other |

Hard-fail axes must pass before the loop can exit, regardless of other scores.
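That exit rule can be sketched as a small predicate (all names here are illustrative, not refine's internals):

```python
def can_exit(scores: dict[str, float], targets: dict[str, float],
             hard_fail_axes: set[str],
             other_stop_reason: bool = False) -> bool:
    """Illustrative exit gate: hard-fail axes must hit their targets
    before the loop may exit, even if another stop condition fired;
    soft axes may fall short only when some other stop reason applies."""
    hard_ok = all(scores[a] >= targets[a] for a in hard_fail_axes)
    all_ok = all(scores[a] >= targets[a] for a in scores)
    return hard_ok and (all_ok or other_stop_reason)
```

So a copywriting run that aces every soft axis but violates a brief constraint (the hard-fail axis) keeps iterating.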

## Quality tiers

Tiers adjust targets, iteration caps, and budgets relative to the profile's base values:

| Tier | Targets | Max iterations | Budget | Use when |
|---|---|---|---|---|
| quick | Default − 1 (floor: 2) | 2 | 50% of base | Speed matters more than polish |
| standard | As defined in profile | base | 100% of base | Normal use |
| thorough | Default + 1 (cap: 4) | base × 1.5 | 200% of base | High-stakes output worth the spend |
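The adjustments in the table can be sketched as below; the clamping and rounding details (how the floor/cap interact, rounding base × 1.5 up) are assumptions:

```python
import math

# Per-tier adjustments, mirroring the table above.
TIERS = {
    "quick":    {"target_delta": -1, "iters": lambda n: 2,                  "budget": 0.5},
    "standard": {"target_delta": 0,  "iters": lambda n: n,                  "budget": 1.0},
    "thorough": {"target_delta": 1,  "iters": lambda n: math.ceil(n * 1.5), "budget": 2.0},
}


def apply_tier(tier: str, base_targets: dict[str, int],
               base_iterations: int, base_budget_usd: float):
    """Illustrative tier adjustment: shift per-axis targets (clamped
    to the 2..4 range), cap iterations, and scale the budget."""
    t = TIERS[tier]
    targets = {axis: min(4, max(2, v + t["target_delta"]))
               for axis, v in base_targets.items()}
    return targets, t["iters"](base_iterations), base_budget_usd * t["budget"]
```

For example, a `thorough` run of a profile with target 3 and 2 base iterations would aim for 4/4 on each axis over up to 3 iterations, with double the base budget.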

## How it works

Each run follows three phases:

1. **Decomposition** — The prompt is broken into sub-tasks using a task-specific strategy (question tree for research, constraint mapping for copywriting).
2. **Iterative loop** — For each iteration:
   - **Generate** content for each sub-task (or refine the previous iteration's output)
   - **Evaluate** against multi-axis quality criteria
   - **Audit** — a deterministic check: do all axes pass? Is there budget left? Are returns diminishing? Continue or stop.
3. **Synthesis** — Sub-task outputs are assembled into a coherent final document.

The loop terminates when:

1. All quality axes meet their targets
2. The cost budget is exhausted (you'll be prompted to continue or stop)
3. Improvement plateaus for 2 consecutive iterations
4. The maximum iteration count is reached
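The loop and its four termination conditions can be sketched as below. All parameter names are assumptions for illustration; the real orchestrator lives in `refine/engine/loop.py`:

```python
def refinement_loop(generate, evaluate, max_iterations: int, budget_ok,
                    plateau_window: int = 2, epsilon: float = 0.01):
    """Illustrative control flow: generate/refine, evaluate, audit.

    `generate(draft)` returns a new draft (draft is None on iteration 1);
    `evaluate(draft)` returns {'score': float, 'all_axes_pass': bool};
    `budget_ok()` reports whether there is budget left.
    """
    draft, history = None, []
    for _ in range(max_iterations):
        draft = generate(draft)                 # generate or refine
        result = evaluate(draft)                # multi-axis evaluation
        history.append(result["score"])
        if result["all_axes_pass"]:             # condition 1: quality met
            return draft, "quality_threshold_reached"
        if not budget_ok():                     # condition 2: out of budget
            return draft, "budget_exhausted"
        if (len(history) > plateau_window and   # condition 3: plateau
                history[-1] - history[-1 - plateau_window] < epsilon):
            return draft, "plateau"
    return draft, "max_iterations"              # condition 4: iteration cap
```

The audit step is deliberately deterministic (plain comparisons, no LLM call), so a run's stopping behaviour is reproducible given the same scores.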

## Output

All generated content should be saved to the `output/` directory (gitignored by default).

The final output is markdown with a metadata block (an HTML comment, so it doesn't render):

```html
<!-- refine-metadata
profile: deep-research
tier: standard
iterations: 2
termination: quality_threshold_reached
cost_usd: 0.78
duration_seconds: 8.1
axes:
  coverage: 3/3 ✓
  evidence_quality: 3/3 ✓
-->
```

Every run also writes a full JSON audit trail to `.refine/logs/`.
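If you want the run stats programmatically, the metadata comment can be pulled out of a saved report with a few lines. This is a sketch (`read_metadata` is a hypothetical helper, and nested `axes` entries come back as flat strings):

```python
import re


def read_metadata(markdown: str) -> dict[str, str]:
    """Extract the refine-metadata comment's 'key: value' lines into
    a flat dict of strings (no YAML parsing, just line splitting)."""
    m = re.search(r"<!-- refine-metadata\n(.*?)-->", markdown, re.DOTALL)
    if not m:
        return {}
    meta: dict[str, str] = {}
    for line in m.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta
```

For anything beyond a quick script, prefer the JSON audit trail in `.refine/logs/`, which is structured data rather than a comment convention.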

## Tests

```shell
pytest                         # run all tests
pytest -v                      # verbose output
pytest tests/test_auditor.py   # just the convergence detector tests
```

## Project structure

```
.
├── refine/
│   ├── __init__.py
│   ├── cli.py                    # CLI entry point (Typer + Rich)
│   ├── config.py                 # Settings, auth resolution, env loading
│   ├── errors.py                 # Error types
│   ├── engine/
│   │   ├── loop.py               # Core refinement loop orchestrator
│   │   ├── decomposer.py         # Task decomposition (setup)
│   │   ├── generator.py          # Generation + refinement
│   │   ├── evaluator.py          # Multi-axis evaluation
│   │   ├── auditor.py            # Convergence detection (deterministic)
│   │   ├── synthesiser.py        # Final assembly (teardown)
│   │   ├── context.py            # Context window management
│   │   ├── llm.py                # LLM abstraction (PydanticAI + claude -p fallback)
│   │   └── templates.py          # Jinja2 template rendering
│   ├── models/
│   │   ├── evaluation.py         # EvaluationResult, AxisEvaluation
│   │   ├── decomposition.py      # DecompositionResult, SubTask
│   │   ├── profile.py            # Profile config + loading
│   │   ├── trace.py              # Audit trail models
│   │   └── cost.py               # Cost tracking + pricing
│   └── profiles/
│       ├── deep-research/        # Research synthesis profile
│       │   ├── profile.yaml
│       │   └── *.md              # 5 prompt templates
│       └── copywriting/          # Short-form persuasive writing profile
│           ├── profile.yaml
│           └── *.md              # 5 prompt templates
├── tests/
├── docs/
│   ├── spec-v2.md                # Design specification
│   ├── research-manifest.md      # Curated bookmark research
│   └── research-synthesis.md     # Research extraction + synthesis
├── output/                       # Generated content (gitignored)
├── .env.example                  # Environment template
├── pyproject.toml
├── LICENSE
└── README.md
```

## Licence

MIT — see LICENSE.
