Real-time visual assistance for blind and visually impaired users, powered by Vision Agents.
LightLens streams live camera video to an AI agent that describes surroundings, warns about hazards, and gives step-by-step walking directions using clock-face references ("chair at 2 o'clock, about 3 steps").
LightLens is built on Vision Agents, a framework for building real-time voice and video AI applications. Vision Agents handles the hard parts (WebRTC transport, LLM orchestration, video frame distribution, function calling, and session management) so we can focus on the visual assistance logic.
Agent + Gemini Realtime LLM: Vision Agents connects the user's microphone and camera to Google's Gemini Realtime model for native speech-to-speech conversation with sub-50ms latency. The user speaks, Gemini sees the live video and hears the audio, and speaks back, all through Vision Agents' Agent class and the gemini.Realtime plugin.
Video Processors: Vision Agents distributes video frames to custom processors at independent FPS rates. LightLens uses three:
| Processor | Base Class | FPS | What It Does |
|---|---|---|---|
| YOLO | VideoProcessorPublisher | 20 | Detects objects (person, chair, car, etc.) with bounding boxes |
| MiDaS | VideoProcessorPublisher | 15 | Estimates depth/distance for a 3x3 spatial grid |
| Navigation | Processor | every 5s | Fuses YOLO + MiDaS into step-by-step walking directions |
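The fusion step in the Navigation processor can be sketched roughly as below. This is a simplified stand-in, not the real processor: the actual code consumes live YOLO boxes and a MiDaS depth grid, and the field-of-view and step-length constants here are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical simplified input: the real processor consumes live YOLO
# detections and a MiDaS 3x3 depth grid; these names are illustrative.
@dataclass
class Detection:
    label: str
    x_center: float   # normalized 0..1 across the frame width
    depth_m: float    # estimated distance in meters

STEP_LENGTH_M = 0.75  # assumed average walking step

def clock_direction(x_center: float, fov_deg: float = 120.0) -> int:
    """Map a horizontal frame position to a clock-face hour."""
    # Offset from frame center in degrees; one clock hour spans 30 degrees.
    angle = (x_center - 0.5) * fov_deg
    hour = round(angle / 30.0)
    return (12 + hour - 1) % 12 + 1   # wrap so an offset of 0 reads as 12

def describe(det: Detection) -> str:
    steps = max(1, round(det.depth_m / STEP_LENGTH_M))
    return f"{det.label} at {clock_direction(det.x_center)} o'clock, about {steps} steps"

# An object on the right edge of the frame, just over two meters away:
print(describe(Detection("chair", x_center=0.9, depth_m=2.2)))
# → chair at 2 o'clock, about 3 steps
```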
Function Calling: Vision Agents' @llm.register_function() decorator lets us register Python functions that Gemini can call mid-conversation. LightLens registers get_steps_to_nearest_object() so the agent can answer "how do I get to the chair?" with real-time sensor data.
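The general shape of that pattern looks like the sketch below. This is a local stand-in for illustration, not the framework's implementation, and the signature of get_steps_to_nearest_object here is an assumption.

```python
import json
from typing import Callable

# Minimal stand-in for the function-calling pattern; the real
# @llm.register_function() decorator lives in Vision Agents.
class ToolRegistry:
    def __init__(self) -> None:
        self.tools: dict[str, Callable] = {}

    def register_function(self):
        def wrap(fn: Callable) -> Callable:
            self.tools[fn.__name__] = fn  # exposed to the model by name
            return fn
        return wrap

    def call(self, name: str, arguments: str):
        """Dispatch a model-issued tool call (function name + JSON args)."""
        return self.tools[name](**json.loads(arguments))

llm = ToolRegistry()

@llm.register_function()
def get_steps_to_nearest_object(label: str) -> str:
    # In LightLens this would read live YOLO + MiDaS state; hardcoded here.
    return f"{label} at 2 o'clock, about 3 steps"

# Simulate the model calling the tool mid-conversation:
print(llm.call("get_steps_to_nearest_object", '{"label": "chair"}'))
# → chair at 2 o'clock, about 3 steps
```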
Edge Transport (Stream.io): Vision Agents' getstream.Edge plugin handles WebRTC video/audio transport and chat through Stream's global edge network. The framework manages call creation, user tokens, and session lifecycle.
HTTP Server Mode: Vision Agents' Runner + AgentLauncher serve the agent as a FastAPI application with built-in session management (POST /sessions, DELETE /sessions/{id}), health checks, and concurrency limits.
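The session lifecycle behind those endpoints can be sketched as a toy in-memory manager. This is an illustration of the lifecycle only, not the Runner's actual code; the limit value and error handling are assumptions.

```python
import uuid

# Toy sketch of the session lifecycle served over HTTP
# (POST /sessions creates, DELETE /sessions/{id} tears down).
class SessionManager:
    def __init__(self, max_sessions: int = 4):
        self.max_sessions = max_sessions  # concurrency limit (illustrative)
        self.active: dict[str, dict] = {}

    def create(self, user_id: str) -> str:
        if len(self.active) >= self.max_sessions:
            raise RuntimeError("concurrency limit reached")  # e.g. HTTP 429
        session_id = uuid.uuid4().hex
        self.active[session_id] = {"user": user_id}
        return session_id

    def delete(self, session_id: str) -> None:
        self.active.pop(session_id, None)  # idempotent, like HTTP DELETE

sessions = SessionManager(max_sessions=1)
sid = sessions.create("alice")
print(len(sessions.active))  # → 1
sessions.delete(sid)
print(len(sessions.active))  # → 0
```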
```
Browser (React)                       Backend (Python)
┌──────────────┐   Stream WebRTC    ┌───────────────────────┐
│  Video Call  │◄──────────────────►│  Gemini Realtime LLM  │
│  Chat Panel  │                    │                       │
│              │                    │ YOLO Processor (20fps)│
│  YOLO Panel  │   WebSocket /ws    │ MiDaS Processor(15fps)│
│  MiDaS Panel │◄──────────────────►│  Nav Processor (5s)   │
│  Nav Panel   │                    │                       │
└──────────────┘                    └───────────────────────┘
```
- Python 3.13+
- Node.js 18+
- Stream.io account (API key + secret)
- Google AI Studio Gemini API key
```
git clone <repo-url>
cd LightLens
```

Create the environment file:

```
cp ai/.env.example ai/.env
```

Edit ai/.env and fill in your keys:

```
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
GEMINI_API_KEY=your_gemini_api_key
...
```
```
cd ai
uv sync
uv run main.py serve
```

The backend starts on http://localhost:8000.
```
cd frontend
npm install
npm run dev
```

The frontend starts on http://localhost:5173.
Open http://localhost:5173 in your browser, click Connect, and allow camera + microphone access. The AI agent will join the call and start describing what it sees.
```
ai/
├── main.py              # Entry point
├── app.py               # FastAPI app setup
├── config.py            # Environment variables
├── agent.py             # Agent creation + LLM function tools
├── ws_manager.py        # WebSocket broadcast manager
├── instructions.md      # Agent behavior prompt (16 rules)
├── processors/
│   ├── yolo_processor.py
│   ├── midas_processor.py
│   └── navigation_processor.py
└── routes/
    ├── token.py         # POST /api/token
    └── ws.py            # WebSocket /ws
frontend/
├── src/
│   ├── App.tsx          # Main layout
│   ├── api.ts           # Backend API calls
│   ├── hooks/           # useStreamCall, useWebSocket
│   └── components/      # UI panels
└── vite.config.ts       # Dev proxy to backend
```
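The fan-out that ws_manager.py performs (pushing processor results to every connected /ws client) can be sketched with per-client queues. The queue-per-client design below is an assumption for illustration, not the actual implementation.

```python
import asyncio

# Stripped-down sketch of a WebSocket broadcast manager: each connected
# /ws client gets a queue, and processor results are fanned out to all.
class BroadcastManager:
    def __init__(self) -> None:
        self.clients: set[asyncio.Queue] = set()

    def connect(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self.clients.add(q)
        return q

    def disconnect(self, q: asyncio.Queue) -> None:
        self.clients.discard(q)

    async def broadcast(self, message: dict) -> None:
        for q in self.clients:
            await q.put(message)  # each client drains its own queue

async def main():
    mgr = BroadcastManager()
    q = mgr.connect()
    await mgr.broadcast({"type": "yolo", "objects": ["chair"]})
    print(await q.get())  # → {'type': 'yolo', 'objects': ['chair']}

asyncio.run(main())
```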
MIT