Potency AI

Zero-server intelligence. Runs entirely in your browser.


Potency AI is a full-featured AI workstation that runs entirely inside your web browser. No servers. No API keys. No cloud. Every model — language, vision, speech — executes on-device through WebAssembly and WebGPU. Your data never leaves your machine.

Built on the RunAnywhere SDK, it ships a glassmorphic interface with five integrated tools, a live agent brain debugger, and a complete model management system.


What Makes It Different

                Cloud AI Tools               Potency AI
Data Privacy    Sent to remote servers       Never leaves your device
API Keys        Required, often paid         None needed — ever
Internet        Required for every request   Only for initial model download
Latency         Network round-trip           Instant local inference
Cost            Per-token billing            Free forever
Offline         Broken                       Fully functional

Integrated Tools

1. Deep Research Agent

A multi-stage autonomous research pipeline:

  • Intent Classification — Understands query type (comparison, explanation, evaluation)
  • Research Planning — Breaks the question into sub-tasks with search strategies
  • Source Retrieval — Fetches real-time data from Wikipedia (CORS-free REST API)
  • Architecture Analysis — Extracts patterns, tradeoffs, and key insights
  • Report Synthesis — Streams a complete Markdown report with citations
  • Follow-up Generation — Suggests deeper exploration paths

The entire pipeline is observable in the Agent Brain sidebar — watch each sub-agent (Classifier, Planner, Retriever, Analyst, Writer) activate in real time with live log entries.
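The staged flow above can be pictured as a sequential orchestrator that logs each stage as it runs. The sketch below is illustrative only; the stage names and types are assumptions, not the actual `agent.ts` API.

```typescript
// Illustrative sketch of a staged research pipeline (not the real agent.ts API).
type Stage = {
  name: string;                                    // e.g. "Classifier", "Planner"
  run: (query: string, notes: string[]) => string; // reads prior notes, returns its result
};

// Each stage appends its output to the shared notes, so later stages
// (Analyst, Writer) can build on earlier ones. The log callback stands in
// for the Agent Brain sidebar's live entries.
function runPipeline(query: string, stages: Stage[], log: (s: string) => void): string[] {
  const notes: string[] = [];
  for (const stage of stages) {
    log(`${stage.name}: started`);
    notes.push(stage.run(query, notes));
    log(`${stage.name}: done`);
  }
  return notes;
}

// Toy stages standing in for Classifier → Planner → ... → Writer.
const stages: Stage[] = [
  { name: "Classifier", run: (q) => `intent:${q.includes(" vs ") ? "comparison" : "explanation"}` },
  { name: "Planner", run: (_q, n) => `plan for ${n[0]}` },
  { name: "Writer", run: (_q, n) => `report based on ${n.length} notes` },
];
```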

2. Notes & Chat

A streaming conversation interface backed by local LLM inference:

  • Real-time token streaming with typing animation
  • Per-message performance metrics (tokens, tok/s, latency)
  • Active model indicator with load status
  • Clear conversation with one click
  • Persistent across tab switches
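The per-message metrics fall out of three timestamps: request start, first token, and stream end. A minimal sketch of that arithmetic (a hypothetical helper, not the SDK's API):

```typescript
// Sketch of per-message metrics like those shown in the chat UI.
// All timestamps are in milliseconds.
interface StreamMetrics {
  tokens: number;
  latencyMs: number;       // time to first token
  tokensPerSecond: number; // throughput over the generation window
}

function computeMetrics(startMs: number, firstTokenMs: number, endMs: number, tokens: number): StreamMetrics {
  const genWindowS = Math.max((endMs - firstTokenMs) / 1000, 1e-6); // avoid divide-by-zero
  return {
    tokens,
    latencyMs: firstTokenMs - startMs,
    tokensPerSecond: tokens / genWindowS,
  };
}
```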

3. Speech-to-Intelligence

A complete voice pipeline — speak naturally, get AI responses read back to you:

  • Voice Activity Detection (Silero VAD v5) — Detects when you start and stop talking
  • Speech-to-Text (Whisper Tiny) — Transcribes your speech in real time
  • Language Model — Generates concise responses to what you said
  • Text-to-Speech (Piper TTS) — Reads the response back with natural voice
  • Auto-restart Listening — Continuous conversation without re-tapping
  • Silence Detection — Waits 2.5s after you pause, accumulates multiple segments
  • Built-in Diagnostics — Test mic access, model status, and pipeline health
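The silence-detection behaviour (wait 2.5 s after a pause, accumulate segments) can be sketched as a pure decision function. The threshold and segment shape below are assumptions for illustration, not the SDK's real types:

```typescript
// Sketch of "wait 2.5 s after a pause, accumulate multiple segments".
const SILENCE_MS = 2500;

interface Segment { text: string; endMs: number; }

// Given transcribed segments in order, decide what to send to the LLM:
// concatenate everything, but only once the last segment has been followed
// by at least SILENCE_MS of silence (nowMs is the current clock reading).
function flushIfSilent(segments: Segment[], nowMs: number): string | null {
  if (segments.length === 0) return null;
  const last = segments[segments.length - 1];
  if (nowMs - last.endMs < SILENCE_MS) return null; // user may still be talking
  return segments.map((s) => s.text).join(" ");
}
```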

4. Vision Engine

Grant AI optical access to your camera:

  • Snapshot Mode — Capture a single frame and describe it
  • Continuous Mode — Live feed analysis every 2.5 seconds
  • Smart Frame Diffing — Skips identical frames to save compute
  • Custom Prompts — Ask anything: "Read the text", "Count the objects", "Describe the scene"
  • WASM Crash Recovery — Automatic VLM worker restart with exponential backoff
  • Built-in Diagnostics — Verify camera, model, worker bridge, SharedArrayBuffer
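The crash-recovery schedule described above is a standard exponential backoff. A minimal sketch, with base delay and cap chosen for illustration (the project's real constants may differ):

```typescript
// Sketch of an exponential-backoff restart schedule for a crashed WASM worker.
// attempt 0 → baseMs, doubling each retry, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```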

5. Tool Pipeline Explorer

An interactive sandbox for testing function-calling AI:

  • Pre-loaded demo tools (weather, calculator, time, random number)
  • Visual execution trace showing tool calls, results, and final output
  • Register custom tools with typed parameters at runtime
  • Auto-execute toggle for hands-free pipeline runs
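A tool registry with typed parameters and an execution trace can be sketched in a few lines. Everything below is illustrative, in the spirit of the sandbox, rather than the SDK's actual tool-calling engine:

```typescript
// Minimal sketch of a tool registry with typed parameters and a trace.
type ToolFn = (args: Record<string, unknown>) => unknown;

interface ToolDef {
  name: string;
  params: Record<string, "string" | "number">; // declared parameter types
  fn: ToolFn;
}

class ToolRegistry {
  private tools = new Map<string, ToolDef>();
  readonly trace: string[] = []; // visual execution trace, one line per call

  register(def: ToolDef): void {
    this.tools.set(def.name, def);
  }

  call(name: string, args: Record<string, unknown>): unknown {
    const def = this.tools.get(name);
    if (!def) throw new Error(`unknown tool: ${name}`);
    // Runtime type check against the declared parameter types.
    for (const [key, type] of Object.entries(def.params)) {
      if (typeof args[key] !== type) throw new Error(`${name}.${key}: expected ${type}`);
    }
    const result = def.fn(args);
    this.trace.push(`${name}(${JSON.stringify(args)}) -> ${JSON.stringify(result)}`);
    return result;
  }
}
```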

6. Model Manager

Full control over your on-device model library:

  • View all models grouped by category (LLM, VLM, STT, TTS, VAD)
  • Download, load, unload, and delete models individually
  • Import local models — Drag-and-drop .gguf, .onnx, or .tar.gz files
  • Real-time download progress with percentage tracking
  • Storage usage dashboard (used / available OPFS space)
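The category grouping and storage dashboard reduce to simple bookkeeping over a model list. The record shape below is an assumption; real entries come from the SDK's catalog:

```typescript
// Sketch of the grouped model list and storage-dashboard math.
interface ModelEntry {
  name: string;
  category: "LLM" | "VLM" | "STT" | "TTS" | "VAD";
  sizeMB: number;
  downloaded: boolean;
}

function groupByCategory(models: ModelEntry[]): Map<string, ModelEntry[]> {
  const groups = new Map<string, ModelEntry[]>();
  for (const m of models) {
    const list = groups.get(m.category) ?? [];
    list.push(m);
    groups.set(m.category, list);
  }
  return groups;
}

// "Used" space is the sum of downloaded model sizes; quota would come from
// the browser's storage estimate.
function usedMB(models: ModelEntry[]): number {
  return models.filter((m) => m.downloaded).reduce((sum, m) => sum + m.sizeMB, 0);
}
```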

Architecture

Browser Tab
├── React UI (Glassmorphic shell with Tailwind CSS v4)
│   ├── Sidebar Navigation (5 tools + settings)
│   ├── Agent Brain Panel (live pipeline debugger)
│   └── Settings Panel (theme, accent color, background)
│
├── RunAnywhere SDK Core (TypeScript, no WASM)
│   ├── ModelManager (download, cache, load orchestration)
│   ├── EventBus (cross-component communication)
│   ├── AudioCapture / VideoCapture (media APIs)
│   └── VoicePipeline (STT → LLM → TTS orchestrator)
│
├── LlamaCPP Backend (WASM)
│   ├── LLM inference (LFM2 1.2B Tool / 350M)
│   ├── VLM inference via Web Worker (LFM2-VL 450M)
│   ├── Tool calling engine
│   └── WebGPU acceleration (auto-detected)
│
└── ONNX Backend (sherpa-onnx WASM)
    ├── STT (Whisper Tiny English)
    ├── TTS (Piper Lessac Medium)
    └── VAD (Silero v5)

All models are cached in the browser's Origin Private File System (OPFS) — a persistent, sandboxed storage layer. First download pulls from HuggingFace; subsequent loads are instant from cache.


Models

Model                     Category   Size      Purpose
LFM2 1.2B Tool Q4_K_M     LLM        ~800 MB   Research agent, tool calling, chat
LFM2 350M Q4_K_M          LLM        ~250 MB   Fast chat fallback
LFM2-VL 450M Q4_0         VLM        ~500 MB   Vision + language (camera analysis)
Whisper Tiny English      STT        ~105 MB   Speech recognition
Piper Lessac Medium       TTS        ~65 MB    Voice synthesis
Silero VAD v5             VAD        ~5 MB     Voice activity detection

All models are quantized for efficient browser execution. Import your own .gguf or .onnx models through the Model Manager.
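Routing an imported file to the right backend comes down to its extension. A sketch of that dispatch; the return labels are illustrative, not the Model Manager's actual internals:

```typescript
// Sketch: route an imported model file to a backend by extension.
function importBackend(filename: string): "llama.cpp" | "onnx" | "archive" | null {
  const lower = filename.toLowerCase();
  if (lower.endsWith(".gguf")) return "llama.cpp"; // LLM/VLM weights
  if (lower.endsWith(".onnx")) return "onnx";      // STT/TTS/VAD models
  if (lower.endsWith(".tar.gz")) return "archive"; // bundled model packages
  return null;                                     // unsupported format
}
```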


Quick Start

git clone https://github.com/ayushap18/Potency-AI.git
cd Potency-AI
npm install
npm run dev

Open http://localhost:5173 in Chrome or Edge.

Models download automatically the first time you use each tool. Subsequent visits load from browser cache.

Production Build

npm run build
npm run preview

The build output in dist/ includes all WASM binaries and can be deployed to any static host that supports the required security headers.


Browser Requirements

Potency AI requires modern browser capabilities for on-device neural inference:

Requirement                Why
Chrome 120+ or Edge 120+   WebGPU and WASM SIMD support
SharedArrayBuffer          Multi-threaded WASM execution
Cross-Origin Isolation     Required for SharedArrayBuffer
OPFS                       Persistent model storage
WebGPU (optional)          2-4x faster inference when available

The Vite dev server automatically sets the required Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy headers.
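In Vite, those headers can be set via the dev server's `server.headers` option. A minimal sketch; the project's real vite.config.ts also wires up the WASM copy plugin and may differ in detail:

```typescript
// vite.config.ts (sketch: dev-server headers for cross-origin isolation)
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    headers: {
      "Cross-Origin-Opener-Policy": "same-origin",
      "Cross-Origin-Embedder-Policy": "credentialless",
    },
  },
});
```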

For production deployment, configure your server to send:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: credentialless

Project Structure

potency-ai/
├── public/
│   └── logo.svg                  # Project logo
├── src/
│   ├── main.tsx                  # React entry point
│   ├── App.tsx                   # Shell layout (header, sidebar, brain panel)
│   ├── runanywhere.ts            # SDK init, model catalog, VLM worker wiring
│   ├── agent/
│   │   ├── agent.ts              # Research pipeline orchestrator
│   │   ├── localLLM.ts           # LLM wrapper with JSON retry logic
│   │   ├── prompts.ts            # Prompt templates with input sanitization
│   │   └── retrieval.ts          # Wikipedia source retrieval
│   ├── components/
│   │   ├── AgentTab.tsx          # Research agent interface
│   │   ├── ChatTab.tsx           # Streaming chat
│   │   ├── VoiceTab.tsx          # Voice pipeline + diagnostics
│   │   ├── VisionTab.tsx         # Camera + VLM + diagnostics
│   │   ├── ToolsTab.tsx          # Tool pipeline sandbox
│   │   ├── ModelManagerPanel.tsx  # Model download/load/import UI
│   │   ├── ModelBanner.tsx       # Download progress indicator
│   │   └── CursorGrid.tsx        # Animated background
│   ├── hooks/
│   │   └── useModelLoader.ts     # Model lifecycle hook
│   ├── context/
│   │   └── ThemeContext.tsx       # Theme, accent color, background
│   ├── workers/
│   │   └── vlm-worker.ts         # VLM Web Worker entry point
│   └── styles/
│       └── index.css             # Tailwind v4 theme tokens
├── vite.config.ts                # Vite config with WASM copy plugin
├── tailwind.config.js
├── tsconfig.json
└── package.json

Design System

The interface uses a custom glassmorphism design system built on Tailwind CSS v4:

  • Glass panels with backdrop blur and translucent borders
  • Dynamic theming — Dark / Light mode with smooth transitions
  • 6 accent colors — Switchable in settings
  • Animated grid background — Reactive cursor-following grid
  • Responsive layout — Full sidebar on desktop, slide-out drawer on mobile
  • Material Symbols icon set throughout

Tech Stack

Layer              Technology
Frontend           React 19 + TypeScript 5
Styling            Tailwind CSS v4 (static compilation)
Build              Vite 6
AI Runtime         RunAnywhere Web SDK
LLM/VLM Engine     llama.cpp compiled to WASM
Speech Engine      sherpa-onnx compiled to WASM
GPU Acceleration   WebGPU (auto-detected, CPU fallback)
Model Storage      Browser OPFS (Origin Private File System)
Source Retrieval   Wikipedia REST API

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Roadmap

  • Multi-turn conversation memory for research agent
  • Image generation via on-device Stable Diffusion
  • PDF and document analysis pipeline
  • Collaborative whiteboard with AI annotation
  • Offline-first PWA with service worker caching
  • Custom model fine-tuning in-browser
  • Multi-language STT/TTS support
  • Plugin system for community-built tools

License

MIT License. See LICENSE for details.


Built with local-first principles. Your intelligence, your hardware, your data.
