22 changes: 18 additions & 4 deletions .gitignore
@@ -10,10 +10,24 @@
.env*
!.env*.example

# # Old expo
# .expo/
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
.venv/

# uv
.uv/
# Agent environment variables
agents/.env

# SSL certificates
*.pem

# Database
*.db
*.sqlite

# Logs
*.log
224 changes: 99 additions & 125 deletions README.md
@@ -1,151 +1,125 @@
# Blindsighted (Sample App)
# Julie

**A hackathon-ready template for building AI-powered experiences with Ray-Ban Meta smart glasses.**
**AI-powered shopping assistant for visually impaired users.**

Blindsighted is a **sample app** that connects Ray-Ban Meta smart glasses to AI agents via LiveKit. The context is for a visual assistance app for blind/visually impaired users, but the architecture works for any AI-powered glasses experience.
## The Problem

The integration setup with Meta's wearables SDK and LiveKit streaming was finicky to get right. This template gives you a working foundation so you can skip that part and jump straight to the interesting bits.
Grocery shopping is a significant challenge for blind and visually impaired individuals. Identifying products on shelves, reading labels, and locating specific items typically requires assistance from others—limiting independence and privacy.

## Architecture Overview
## The Solution

```
iOS App (Swift) → LiveKit Cloud (WebRTC) → AI Agents (Python)
↓ ↑
└──────→ FastAPI Backend (optional) ───────┘
(sessions, storage, etc.)
```

**Three independent components:**

- **`ios/`** - Native iOS app using Meta Wearables DAT SDK

- Streams video/audio from Ray-Ban Meta glasses to LiveKit
- Receives audio/data responses from agents
- Works standalone if you just want to test the glasses SDK
Julie combines **Ray-Ban Meta smart glasses** with **AI vision and voice** to give users full autonomy while shopping, with enough information to make their own qualitative, subjective choices about product selection. No screen interaction is required; everything works through natural voice and audio feedback.

- **`agents/`** - LiveKit agents (Python)
## How It Works

- Join LiveKit rooms as peers
- Process live video/audio streams with AI models
- Send responses back via audio/video/data channels
- **This is where the magic happens** - build your AI features here
1. **Point** — The user faces a shelf while wearing the glasses
2. **Scan** — Gemini (via ElevenLabs TTS) guides positioning until the full shelf is visible
3. **Identify** — Gemini identifies every product on the shelf
4. **Discuss** — The user has a back-and-forth conversation with the ElevenLabs agent to pick an item
5. **Reach** — AI guides their hand directly to the product using real-time camera feedback

- **`api/`** - FastAPI backend (Python)
- Session management and room creation
- R2 storage for life logs and replays
- Optional but useful for anything ad hoc you need a backend for
The entire experience is **eyes-free**.
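The five steps above behave like a small state machine that advances one mode at a time. A minimal sketch of that flow (the mode and event names here are illustrative, not the app's actual identifiers):

```python
from enum import Enum, auto

class Mode(Enum):
    POSITIONING = auto()   # LOW-res photos: guide the camera until the shelf fills the frame
    IDENTIFYING = auto()   # one HIGH-res photo: Gemini lists every product
    DISCUSSING = auto()    # ElevenLabs agent converses until the user picks an item
    GUIDING = auto()       # LOW-res photos: steer the hand to the chosen product
    DONE = auto()

def next_mode(mode: Mode, event: str) -> Mode:
    """Advance the session when a completion event fires; otherwise stay put."""
    transitions = {
        (Mode.POSITIONING, "shelf_in_frame"): Mode.IDENTIFYING,
        (Mode.IDENTIFYING, "products_listed"): Mode.DISCUSSING,
        (Mode.DISCUSSING, "item_chosen"): Mode.GUIDING,
        (Mode.GUIDING, "item_reached"): Mode.DONE,
    }
    return transitions.get((mode, event), mode)
```

Unrecognized events leave the mode unchanged, which matches the "graceful re-prompting" behavior described under accessibility.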

**You can work on just one part.** Want to build a cool agent but not touch iOS? Great. Want to experiment with the glasses SDK without running agents? Also fine. Want to add interesting storage/indexing features? The backend's there for you.
## Key Features
- **Voice-first interaction** — No buttons, no screens, just conversation
- **Real-time guidance** — Continuous audio feedback using clock positions ("move to 2 o'clock")
- **Product identification** — Recognizes items, brands, prices, and shelf locations
- **Hand guidance** — Guides user's hand to the exact product location
- **Works with existing hardware** — Ray-Ban Meta glasses + iPhone
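The clock-position cues in the feature list can be derived from a 2-D offset between the hand and the target product. A minimal sketch, assuming image-space offsets (`dx` positive right, `dy` positive up); the real guidance logic lives in the Gemini prompting, so this is only an illustration:

```python
import math

def clock_position(dx: float, dy: float) -> str:
    """Map a target offset to a spoken clock direction.

    0 degrees is straight up (12 o'clock), angles grow clockwise.
    """
    angle = math.degrees(math.atan2(dx, dy)) % 360
    hour = round(angle / 30) % 12 or 12  # 30 degrees per hour mark
    return f"{hour} o'clock"
```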

## Quick Start

### iOS App
## System Architecture

```bash
cd ios
open Blindsighted.xcodeproj
# Build and run in Xcode (⌘R)
```

**Requirements**: Xcode 26.2+, iOS 17.0+, Swift 6.2+

See [ios/README.md](ios/README.md) for detailed setup.

### Agents

```bash
cd agents
uv sync
uv run example_agent.py dev
👓 RAY-BAN META GLASSES
│ photos
┌─────────────┐
│ iOS App │
└──────┬──────┘
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ LOW photos │ │ HIGH photo │ │ LOW photos │
│ (position) │ │ (identify) │ │ (guidance) │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ GEMINI VISION AI │
│ │
│ ① Navigation Mode ② Identification Mode ③ Hand Guidance Mode │
│ "Move camera right" "Found 12 products" "Move hand to 2 o'clock" │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ 🔊 TTS Audio CSV Product List 🔊 TTS Audio │
└─────────┬───────────────────────┬───────────────────────────┬────────────────┘
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ FASTAPI BACKEND │ │
│ │ │ │
│ │ POST /csv/upload ←── Gemini │ │
│ │ GET /csv/get-summary ──→ 11L │ │
│ │ POST /user-choice ←── 11L │◄──────────┘
│ │ GET /user-choice/latest ──→ Gemini │
│ │ │ │
│ └────────────────┬────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ ELEVENLABS CONVERSATIONAL AI │ │
│ │ │ │
│ │ 🎤 User: "What's available?" │ │
│ │ 📋 Agent: Reads product list │ │
│ │ 🎤 User: "I want the Coca Cola"│ │
│ │ ✅ Agent: Posts choice to API ─┼───────────┘
│ │ │ triggers ③
│ └─────────────────────────────────┘
🔊 AUDIO OUTPUT (via glasses speakers)
```

**Test without hardware**: Use the [LiveKit Agents Playground](https://agents-playground.livekit.io/) to test agents with your webcam/microphone instead of glasses.
**Flow Summary:**
1. **LOW photos** → Gemini guides camera positioning → Audio feedback
2. **HIGH photo** → Gemini identifies products → CSV uploaded to API
3. **ElevenLabs Agent** reads products, user selects via voice → Choice posted to API
4. **LOW photos** → Gemini reads user choice from API → Hand guidance mode → Audio feedback
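Step 2 hands a CSV product list from Gemini to the API. A defensive parser for such a list might look like the sketch below; note the column layout is an assumption for illustration, not the project's actual schema:

```python
import csv
import io

# Hypothetical column layout; the real schema is defined by the agent's prompt.
FIELDS = ["name", "brand", "price", "shelf_position"]

def parse_product_csv(text: str) -> list[dict[str, str]]:
    """Turn a model-generated CSV block into row dicts, skipping malformed
    lines rather than failing the whole scan on one bad row."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) == len(FIELDS):
            rows.append(dict(zip(FIELDS, (cell.strip() for cell in row))))
    return rows
```

Skipping bad rows matters here because LLM-generated CSV occasionally contains stray commentary lines.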

See [agents/README.md](agents/README.md) for agent development.
| Component | Purpose |
|-----------|---------|
| `ios/` | Captures photos from Ray-Ban Meta glasses |
| `agents/` | Gemini AI for vision analysis + ElevenLabs TTS for audio output |
| `api/` | Backend storing product data and user selections |

### API Backend (Optional)
## Quick Start

```bash
cd api
uv sync
uv run main.py
```

API docs at `http://localhost:8000/docs`

## What's Included

### iOS App Features

- Live video streaming from Ray-Ban Meta glasses
- Audio routing to/from glasses (left/right channel testing)
- Photo capture during streaming
- Video recording and local storage
- Video gallery with playback
- LiveKit integration with WebRTC
- Share videos/photos
# API
cd api && uv sync && uv run main.py

### Agent Template
# Agent
cd agents && uv sync && uv run shelf_assistant.py

- LiveKit room auto-join based on session
- Audio/video stream processing
- AI model integration examples (vision, TTS)
- Bidirectional communication (receive video, send audio)

### Backend API

- Session management endpoints
- LiveKit room creation with tokens
- R2 storage integration for life logs
- FastAPI with dependency injection patterns

## Use It Your Way

**Feel free to:**

- Rip out everything you don't need
- Replace the AI models with your own
- Change the entire agent architecture
- Use a different backend (or no backend)
- Build something completely different on top of the glasses SDK

**This is over-engineered for a hackathon.** The three-component architecture exists because I found the initial setup painful and wanted to provide options. If you have a better approach or this feels too complicated, throw it away! The point is to give you working examples to learn from, not to force an architecture on you.

## Environment Variables & API Keys

The app needs a few API keys to work:

- **LiveKit**: Server URL, API key, API secret (for WebRTC streaming)
- **OpenRouter API Key** (optional, for AI models)
- **ElevenLabs API Key** (optional, for TTS)

**Having trouble getting something running?** Reach out and I'll unblock you.
# iOS
cd ios && open Blindsighted.xcodeproj
```

See `ios/Config.xcconfig.example` and `api/.env.example` for configuration details.
**Required API keys** (in `.env` files):
- `GOOGLE_API_KEY` — Gemini vision AI
- `ELEVENLABS_API_KEY` — Voice synthesis
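A minimal `agents/.env` might look like the fragment below (placeholder values; variable names follow the fields in `agents/config.py`, which pydantic-settings matches case-insensitively). The file is gitignored, so real keys never land in the repo:

```shell
# agents/.env — gitignored; never commit real keys
GOOGLE_API_KEY=your-gemini-key-here
ELEVENLABS_API_KEY=your-elevenlabs-key-here
```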

## Documentation
## Accessibility by Design

- **CLAUDE.md** - Full development guide with architecture details, code patterns, troubleshooting
- **ios/README.md** - iOS-specific setup and configuration
- **agents/README.md** - Agent development guide
- **api/** - Backend API with OpenAPI docs at `/docs`
- **No visual interface required** — All feedback is audio
- **Natural language** — "I want the orange juice" not menu navigation
- **Spatial audio cues** — Clock positions for intuitive direction
- **Confirmation feedback** — "Got it!" when item is reached
- **Error recovery** — Graceful re-prompting if something goes wrong

## License

**In short:** Keep it open source; it's fine to make money with it. I'd love to see what you build with it.

**Exception**: The iOS app incorporates sample code from Meta's [meta-wearables-dat-ios](https://github.com/facebook/meta-wearables-dat-ios) repository, which has its own license terms. Check that repo for Meta's SDK license.

## Why Does This Exist?

I built this because:

1. Getting Meta's wearables SDK working took a fair amount of time without being much fun.
2. I originally had custom WebRTC streaming (which took a lot of time); Pentaform showed me LiveKit, which seems much better suited to a hackathon use case, so I swapped over for this project—though it has its own pain points.
3. Unlike typical hackathons, which are one-and-done, it'd be great to have something people can iterate on.

If this helps you build something cool, that's awesome. If you find a better way to do any of this, even better.

## Contributing

Found a bug? Have a better pattern? PRs welcome. This is meant to help people, so improvements that make it easier to use or understand are great.
MIT License — See [LICENSE](LICENSE)
Binary file added Screenshot 2026-01-17 at 1.46.20 PM.png
6 changes: 3 additions & 3 deletions agents/__init__.py
@@ -1,5 +1,5 @@
"""LiveKit Agents for Blindsighted - Vision-based AI assistance."""
"""Julie Agents - Gemini-powered vision assistance for supermarket shopping."""

from agents.vision_agent import VisionAssistant, vision_agent, server
from shelf_assistant import ShelfAssistant, LocalPhotoManager

__all__ = ["VisionAssistant", "vision_agent", "server"]
__all__ = ["ShelfAssistant", "LocalPhotoManager"]
19 changes: 6 additions & 13 deletions agents/config.py
@@ -10,22 +10,15 @@ class Settings(BaseSettings):
extra="ignore",
)

# LiveKit Agent Configuration
livekit_agent_name: str = "vision-agent"
livekit_url: str = ""
livekit_api_key: str = ""
livekit_api_secret: str = ""
# Google AI API (for Gemini)
google_api_key: str = ""

# OpenRouter API
openrouter_api_key: str = ""
openrouter_base_url: str = "https://openrouter.ai/api/v1"
# API Backend URL
api_base_url: str = "https://localhost:8000"

# ElevenLabs API
# ElevenLabs Conversational AI (for reference)
elevenlabs_api_key: str = ""
elevenlabs_voice_id: str = "21m00Tcm4TlvDq8ikWAM" # Rachel voice

# Deepgram API
deepgram_api_key: str = ""
elevenlabs_agent_id: str = "agent_0701kf5rm5s6f7jtnh7swk9nkx0a"


settings = Settings()