reticle

Visual agent toolkit — grid overlays and spatial tools for LLM agents.

LLMs struggle with precise spatial reasoning in images. Reticle narrows this gap by giving agents a coordinate grid overlay and point-plotting tools, so an agent can name a location by its grid coordinates, see where the marker landed, and correct it over several tool calls.

Demo

# Install
pip install "reticle[openai]"    # or [gemini], [all]; the quotes stop shells like zsh from globbing the brackets

# Set your API key
export OPENAI_API_KEY=sk-...

# Point to objects in an image
python main.py examples/grocery.jpg "point to the red onions"
python main.py kitchen.jpg "point to every appliance"
python main.py room.jpg "point to the corners of the table"
python main.py floorplan.png "point to all the doors" --model gemini-3.1-pro-preview

The agent plots labeled markers on the image and saves the annotated copy to <name>_pointed.<ext>.
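The output naming pattern can be illustrated with a short sketch. The helper name `pointed_path` is hypothetical, and placing the output next to the input file is an assumption; the README only states the `<name>_pointed.<ext>` pattern.

```python
from pathlib import Path


def pointed_path(image_path: str) -> Path:
    """Build the <name>_pointed.<ext> path for an input image.

    Assumption: the annotated image is written beside the input file.
    """
    src = Path(image_path)
    # Keep the directory and extension, append "_pointed" to the stem.
    return src.with_name(f"{src.stem}_pointed{src.suffix}")


print(pointed_path("examples/grocery.jpg").as_posix())  # examples/grocery_pointed.jpg
```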

How it works

Grid overlay — renders a normalized coordinate grid on any image (red=x, blue=y) so agents can reference precise locations
Point plotting — agents plot labeled markers on objects using grid coordinates
Agent loop — multi-turn tool-use loop where the agent sees images, calls tools, and iterates
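To make the idea of a normalized grid concrete, here is a minimal sketch of how a grid coordinate could map back to a pixel. It assumes coordinates in the range [0, 1] with the origin at the top-left corner; reticle's actual range and orientation are not stated here, and the function names `to_pixel` and `grid_lines` are hypothetical.

```python
def to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized (x, y) in [0, 1] to integer pixel coordinates.

    Assumption: origin at the top-left, x grows right, y grows down.
    """
    if not (0.0 <= x_norm <= 1.0 and 0.0 <= y_norm <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    # Scale by (size - 1) so that 1.0 lands on the last valid pixel index.
    return round(x_norm * (width - 1)), round(y_norm * (height - 1))


def grid_lines(divisions: int) -> list[float]:
    """Normalized positions of evenly spaced grid lines, both edges included."""
    return [i / divisions for i in range(divisions + 1)]


print(to_pixel(0.5, 0.25, 1921, 1081))  # (960, 270)
print(grid_lines(4))                    # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Because the coordinates are normalized, the same agent output stays valid if the image is resized before plotting.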

Usage

import asyncio
from reticle.agent.loop import AgentLoop
from reticle.llm.routing import get_llm_service
from reticle.tools.grid import render_grid_overlay
from reticle.tools.plot_points import PlotPointTool
from reticle.tools.image import load_image_base64, infer_media_type
from reticle.agent.events import TextDeltaEvent, CompleteEvent

async def analyze(image_path: str):
    llm = get_llm_service("gpt-5.4", thinking_level="low")
    grid_b64 = render_grid_overlay(image_path)

    plot_tool = PlotPointTool()
    plot_tool.set_image_path(image_path)
    plot_tool.set_grid_b64(grid_b64)

    loop = AgentLoop(
        llm=llm,
        tools=[plot_tool],
        system_prompt="You are a visual pointing agent. Use plot_points to mark objects.",
        max_turns=5,
    )

    # Seed the conversation with the image and the user's request
    img_b64 = load_image_base64(image_path)
    media_type = infer_media_type(image_path)
    loop.conversation.append({
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": media_type, "data": img_b64}},
            {"type": "text", "text": "Point to the windows."},
        ],
    })

    async for event in loop.run():
        if isinstance(event, TextDeltaEvent):
            print(event.delta, end="", flush=True)
        elif isinstance(event, CompleteEvent):
            print(f"\nDone: {event.reason}")

asyncio.run(analyze("photo.png"))
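The seed message in the example above pairs a base64-encoded image with a text instruction. For readers who want to see what that payload involves, here is a rough standard-library sketch. The function names are hypothetical and are not reticle's `load_image_base64` or `infer_media_type`, whose behavior may differ.

```python
import base64
import mimetypes
from pathlib import Path


def encode_image_base64(path: str) -> str:
    """Read a file and return its bytes as a base64 ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def guess_media_type(path: str) -> str:
    """Guess an image media type from the file extension."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"cannot infer an image media type from {path!r}")
    return media_type


def build_user_message(path: str, instruction: str) -> dict:
    """Assemble a user message in the same shape as the seed message above."""
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": guess_media_type(path),
                    "data": encode_image_base64(path),
                },
            },
            {"type": "text", "text": instruction},
        ],
    }
```

With a helper like this, the seeding step reduces to `loop.conversation.append(build_user_message(image_path, "Point to the windows."))`.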

Building custom tools

Extend BaseDeclarativeTool to create your own visual tools:

from reticle.agent.tools.base import BaseDeclarativeTool, BaseToolInvocation, ToolResult

class MyTool(BaseDeclarativeTool):
    def __init__(self):
        schema = {
            "type": "function",
            "function": {
                "name": "my_tool",
                "description": "Does something visual",
                "parameters": {"type": "object", "properties": {}, "required": []},
            },
        }
        super().__init__("my_tool", schema)

    async def build(self, params):
        return MyToolInvocation(params)

class MyToolInvocation(BaseToolInvocation):
    def get_description(self):
        return "Running my tool"

    async def execute(self):
        # Return rich content (images + text) for the agent to see
        return ToolResult(
            output="result",
            content_blocks=[
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}},
                {"type": "text", "text": "Feedback for the agent"},
            ],
        )
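The split between a tool (which builds an invocation from parameters) and an invocation (which executes) can be exercised without reticle installed. The sketch below uses stand-in classes, all hypothetical, to show one plausible way a loop might dispatch a tool call by name. How AgentLoop actually dispatches is not described in this README, so treat this as an assumption.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubResult:
    """Stand-in for ToolResult: plain output plus optional content blocks."""
    output: str
    content_blocks: list = field(default_factory=list)


class EchoInvocation:
    """Stand-in invocation: holds validated params and does the work."""

    def __init__(self, params: dict):
        self.params = params

    def get_description(self) -> str:
        return f"Echoing {sorted(self.params)}"

    async def execute(self) -> StubResult:
        text = f"received {sorted(self.params)}"
        return StubResult(output=text, content_blocks=[{"type": "text", "text": text}])


class EchoTool:
    """Stand-in tool: turns raw params into an invocation."""
    name = "echo"

    async def build(self, params: dict) -> EchoInvocation:
        return EchoInvocation(params)


async def dispatch(tools: dict, name: str, params: dict) -> StubResult:
    """Look up a tool by name, build an invocation, then execute it."""
    if name not in tools:
        return StubResult(output=f"unknown tool: {name}")
    invocation = await tools[name].build(params)
    return await invocation.execute()


result = asyncio.run(dispatch({"echo": EchoTool()}, "echo", {"x": 0.5, "y": 0.25}))
print(result.output)  # received ['x', 'y']
```

Keeping `build` separate from `execute` leaves room to validate parameters or describe the pending action before any work is done.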

Supported LLM providers

OpenAI — GPT-4.1, GPT-5, o3, o4 (Responses API with reasoning)
Google — Gemini 2.5/3.x (thinking budgets)

Set API keys via environment variables: OPENAI_API_KEY, GEMINI_API_KEY.
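As an illustration of how a model name could select a provider and its API key, here is a small sketch of prefix-based routing. The prefix table and the function names are hypothetical; the rules reticle actually uses in `routing.py` may differ.

```python
import os
from typing import Mapping, Optional

# Hypothetical prefix table: model-name prefix -> (provider, env var for its key).
PROVIDER_PREFIXES = {
    "gpt-": ("openai", "OPENAI_API_KEY"),
    "o3": ("openai", "OPENAI_API_KEY"),
    "o4": ("openai", "OPENAI_API_KEY"),
    "gemini-": ("gemini", "GEMINI_API_KEY"),
}


def resolve_provider(model: str) -> tuple[str, str]:
    """Return (provider, key_env_var) for the first matching prefix."""
    lowered = model.lower()
    for prefix, target in PROVIDER_PREFIXES.items():
        if lowered.startswith(prefix):
            return target
    raise ValueError(f"no provider registered for model {model!r}")


def missing_key(model: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the unset API key variable, or None if it is set."""
    _, key_name = resolve_provider(model)
    return None if env.get(key_name) else key_name


print(resolve_provider("gemini-3.1-pro-preview"))  # ('gemini', 'GEMINI_API_KEY')
```

A check such as `missing_key("gpt-5.4")` before starting a run gives a clearer error than a failed API call.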

Architecture

reticle/
├── agent/         # Core agent loop, events, tool framework
│   ├── loop.py    # AgentLoop — multi-turn agentic loop
│   ├── events.py  # Typed event system (streaming)
│   └── tools/     # BaseDeclarativeTool, ToolRegistry
├── llm/           # Multi-provider LLM abstraction
│   ├── base.py    # BaseLLMService interface
│   ├── routing.py # Prefix-based provider routing
│   └── ...        # OpenAI, Gemini
└── tools/         # Visual tools
    ├── grid.py    # Coordinate grid overlay
    ├── plot_points.py  # Point plotting + edge detection
    └── image.py   # Image utilities

License

MIT
