johnathanchiu/reticle

reticle

Visual agent toolkit — grid overlays and spatial tools for LLM agents.

LLMs struggle with precise spatial reasoning in images. Reticle bridges this gap by giving agents a coordinate grid overlay and point-plotting tools, enabling iterative spatial understanding through tool use.

Demo

# Install
pip install "reticle[openai]"    # or "reticle[gemini]", "reticle[all]"; quotes keep shells like zsh from expanding the brackets

# Set your API key
export OPENAI_API_KEY=sk-...

# Point to objects in an image
python main.py examples/grocery.jpg "point to the red onions"
python main.py kitchen.jpg "point to every appliance"
python main.py room.jpg "point to the corners of the table"
python main.py floorplan.png "point to all the doors" --model gemini-3.1-pro-preview

The agent plots labeled markers on the image and saves the annotated copy to <name>_pointed.<ext>.
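The output naming pattern can be sketched with pathlib. This is only an illustration: the helper name `pointed_path` is hypothetical, and it assumes the annotated file is written next to the input image, which the text above does not confirm.

```python
from pathlib import Path


def pointed_path(image_path: str) -> Path:
    """Build '<name>_pointed.<ext>' from an input image path.

    Assumption: the annotated image sits in the same directory as the input.
    """
    src = Path(image_path)
    # Keep the directory and extension, append '_pointed' to the stem.
    return src.with_name(f"{src.stem}_pointed{src.suffix}")


print(pointed_path("examples/grocery.jpg"))
```

Under that assumption, `floorplan.png` would map to `floorplan_pointed.png`.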

How it works

  1. Grid overlay — renders a normalized coordinate grid on any image (red=x, blue=y) so agents can reference precise locations
  2. Point plotting — agents plot labeled markers on objects using grid coordinates
  3. Agent loop — multi-turn tool-use loop where the agent sees images, calls tools, and iterates
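To make the idea of a normalized grid concrete, here is a minimal sketch of how grid coordinates could map to pixels. It assumes coordinates in the range 0 to 1 with the origin at the top-left corner; Reticle's actual range, origin, and function names may differ, and `grid_to_pixel` and `grid_lines` are hypothetical helpers, not part of the library.

```python
def grid_to_pixel(x: float, y: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized grid coordinates to integer pixel coordinates.

    Assumes x and y lie in [0, 1] with the origin at the top-left corner.
    """
    # Clamp so slightly out-of-range model output still lands on the image.
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    # Scale to the last valid pixel index (width - 1, height - 1).
    return round(x * (width - 1)), round(y * (height - 1))


def grid_lines(divisions: int) -> list[float]:
    """Normalized positions of evenly spaced grid lines, both edges included."""
    return [i / divisions for i in range(divisions + 1)]


print(grid_to_pixel(0.5, 0.5, 101, 201))
print(grid_lines(4))
```

Because the coordinates are normalized, the same values an agent reads off the overlay stay valid regardless of the image resolution.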

Usage

import asyncio
from reticle.agent.loop import AgentLoop
from reticle.llm.routing import get_llm_service
from reticle.tools.grid import render_grid_overlay
from reticle.tools.plot_points import PlotPointTool
from reticle.tools.image import load_image_base64, infer_media_type
from reticle.agent.events import TextDeltaEvent, CompleteEvent

async def analyze(image_path: str):
    llm = get_llm_service("gpt-5.4", thinking_level="low")
    grid_b64 = render_grid_overlay(image_path)

    plot_tool = PlotPointTool()
    plot_tool.set_image_path(image_path)
    plot_tool.set_grid_b64(grid_b64)

    loop = AgentLoop(
        llm=llm,
        tools=[plot_tool],
        system_prompt="You are a visual pointing agent. Use plot_points to mark objects.",
        max_turns=5,
    )

    # Seed the conversation with the image and the user's request
    img_b64 = load_image_base64(image_path)
    media_type = infer_media_type(image_path)
    loop.conversation.append({
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": media_type, "data": img_b64}},
            {"type": "text", "text": "Point to the windows."},
        ],
    })

    async for event in loop.run():
        if isinstance(event, TextDeltaEvent):
            print(event.delta, end="", flush=True)
        elif isinstance(event, CompleteEvent):
            print(f"\nDone: {event.reason}")

asyncio.run(analyze("photo.png"))
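Conceptually, a multi-turn tool-use loop like the one driven above alternates model turns and tool calls until the model stops requesting tools or a turn limit is hit. The following is a self-contained sketch with a stand-in model and a stand-in tool; it does not use Reticle, and `fake_model`, `run_loop`, and the message shapes are assumptions made for illustration only.

```python
import asyncio


async def fake_model(conversation):
    """Stand-in for an LLM: request one tool call, then finish."""
    already_called = any(m["role"] == "tool" for m in conversation)
    if already_called:
        return {"text": "Marked the window.", "tool_call": None}
    return {"text": "", "tool_call": {"name": "plot_points", "args": {"x": 0.4, "y": 0.2}}}


async def plot_points(x, y):
    """Stand-in tool: report where a marker would be drawn."""
    return f"plotted marker at ({x}, {y})"


async def run_loop(conversation, tools, max_turns=5):
    """Alternate model and tool turns until the model stops calling tools."""
    for turn in range(max_turns):
        reply = await fake_model(conversation)
        conversation.append({"role": "assistant", "content": reply["text"]})
        call = reply["tool_call"]
        if call is None:
            return "completed", turn + 1
        result = await tools[call["name"]](**call["args"])
        # Feed the tool result back so the next model turn can react to it.
        conversation.append({"role": "tool", "content": result})
    return "max_turns", max_turns


conversation = [{"role": "user", "content": "Point to the windows."}]
reason, turns = asyncio.run(run_loop(conversation, {"plot_points": plot_points}))
print(reason, turns)
```

The turn limit plays the same role as `max_turns` in the usage example: it bounds how long the agent can iterate if it never settles on an answer.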

Building custom tools

Extend BaseDeclarativeTool to create your own visual tools:

from reticle.agent.tools.base import BaseDeclarativeTool, BaseToolInvocation, ToolResult

class MyTool(BaseDeclarativeTool):
    def __init__(self):
        schema = {
            "type": "function",
            "function": {
                "name": "my_tool",
                "description": "Does something visual",
                "parameters": {"type": "object", "properties": {}, "required": []},
            },
        }
        super().__init__("my_tool", schema)

    async def build(self, params):
        return MyToolInvocation(params)

class MyToolInvocation(BaseToolInvocation):
    def get_description(self):
        return "Running my tool"

    async def execute(self):
        # Return rich content (images + text) for the agent to see
        return ToolResult(
            output="result",
            content_blocks=[
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}},
                {"type": "text", "text": "Feedback for the agent"},
            ],
        )

Supported LLM providers

  • OpenAI — GPT-4.1, GPT-5, o3, o4 (Responses API with reasoning)
  • Google — Gemini 2.5/3.x (thinking budgets)

Set API keys via environment variables: OPENAI_API_KEY, GEMINI_API_KEY.
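Before starting a run, it can help to check which keys are actually set. A minimal sketch, assuming only the two variable names listed above; the `configured_providers` helper and the provider labels are hypothetical and not part of Reticle.

```python
import os

# Variable names are taken from the text above; the dict keys are illustrative labels.
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


print(configured_providers())
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in tests without touching the real environment.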

Architecture

reticle/
├── agent/         # Core agent loop, events, tool framework
│   ├── loop.py    # AgentLoop — multi-turn agentic loop
│   ├── events.py  # Typed event system (streaming)
│   └── tools/     # BaseDeclarativeTool, ToolRegistry
├── llm/           # Multi-provider LLM abstraction
│   ├── base.py    # BaseLLMService interface
│   ├── routing.py # Prefix-based provider routing
│   └── ...        # OpenAI, Gemini
└── tools/         # Visual tools
    ├── grid.py    # Coordinate grid overlay
    ├── plot_points.py  # Point plotting + edge detection
    └── image.py   # Image utilities
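The tree describes routing.py as prefix-based provider routing. As a rough sketch of that technique, a model name can be matched against known prefixes to pick a provider. The prefix table and the `provider_for` function below are assumptions for illustration; the real routing rules in Reticle may differ.

```python
# Hypothetical prefix table; the first matching prefix wins.
PREFIX_TO_PROVIDER = (
    ("gpt-", "openai"),
    ("o3", "openai"),
    ("o4", "openai"),
    ("gemini-", "gemini"),
)


def provider_for(model: str) -> str:
    """Pick a provider label from the model name's prefix."""
    name = model.lower()
    for prefix, provider in PREFIX_TO_PROVIDER:
        if name.startswith(prefix):
            return provider
    raise ValueError(f"no provider registered for model {model!r}")


print(provider_for("gpt-5.4"))
print(provider_for("gemini-3.1-pro-preview"))
```

Routing on prefixes means a new model from an existing provider works without code changes, at the cost of failing loudly for names that match no prefix.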

License

MIT
