Enable AI to control your desktop, mobile and HMI devices
Updated Mar 2, 2026 · Python
A local browser automation agent based on Microsoft's Fara-7B model, optimized for LM Studio inference.
Production-grade multi-modal AI platform — 17 real-time vision & audio tabs, 22 SDK modules, 7-tier LLM cascade, 37+ endpoints. Built for Vision Possible Hackathon by WeMakeDevs x Stream.
Real-time visual assistance for blind and visually impaired users, powered by Vision Agents.
Meeting Assistant built to enhance virtual meetings by providing real-time insights, summaries, and intelligent assistance.
Video Analysis Tool: agents that write vision code, developed with OpenAI, Claude, and Gemini.
Next.js + Python demo of a smart video meeting assistant using Stream Video/Chat and realtime LLMs.
Real-time AI agent for querying live courtroom video with sub-500ms latency. Multimodal search combining video intelligence, speech-to-text, and hybrid search. Built with Stream, Twelve Labs, Deepgram, and Gemini Live API.