A fully offline, voice-activated AI assistant with real-time conversation, tool integration, and a personality of your choice
Imagine an AI assistant that lives entirely on your machine – no cloud dependency, no network round-trips, no privacy concerns. An assistant that can hear you, understand context, use real-time tools, and speak back with a voice you choose.
That's EvaVex.
"The empathy of Eva, the precision of Vex.
Your personal AI. Completely offline. Completely yours."
👀 Watch the Demo
- Real-time speech recognition with Whisper.cpp
- Voice Activity Detection (VAD)
- Streaming responses with immediate audio feedback
- Main LLM (Phi-4-mini) for natural conversation
- Tool LLM (Qwen3-0.6B) for smart function detection
- Fully offline – runs entirely on your hardware
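To make the pipeline above concrete, here is a minimal energy-based VAD sketch. It is purely illustrative – EvaVex's actual VAD implementation is not shown in this README – but it demonstrates the core idea: gate the microphone stream on frame energy, with a short "hangover" so brief pauses inside a sentence don't cut off transcription.

```python
# Illustrative energy-based VAD sketch (not EvaVex's actual implementation).

def frame_energy(frame):
    """Root-mean-square energy of one audio frame (a list of float samples)."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def detect_speech(frames, threshold=0.02, hangover=3):
    """Return a per-frame speech/no-speech decision.

    `hangover` keeps the gate open for a few quiet frames after energy
    drops, so short pauses inside a sentence are not clipped.
    """
    decisions, quiet = [], hangover
    for frame in frames:
        if frame_energy(frame) >= threshold:
            quiet = 0          # speech detected: reset the silence counter
        else:
            quiet += 1         # another quiet frame
        decisions.append(quiet < hangover)
    return decisions
```

Frames flagged as speech would then be buffered and handed to Whisper for transcription once the gate closes.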
```text
# Just ask naturally:
🌤️ "What's the weather in Paris?"
📈 "Show me Tesla stock price"
🔍 "Search for quantum computing breakthroughs"
⏰ "What time is it in Tokyo?"
📝 "Explain quantum physics"

# Or combine them:
"Check weather in London and Apple stock price"
"Search for AI news and current time"
```

Choose your companion:
(⌐■_■) VEX
- 🎙️ Male voice
- ⚡ Direct, efficient style
- 🎯 Perfect for productivity
(˶˃ ᵕ ˂˶) EVA
- 🎙️ Female voice
- 💬 Friendly, engaging style
- ☕ Great for casual conversation
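A persona could be wired up with a small config table like the hypothetical sketch below – the names and fields here are illustrative, not EvaVex's actual configuration schema.

```python
# Hypothetical persona config — field names are illustrative only.
PERSONAS = {
    "vex": {
        "voice": "male",
        "system_prompt": "You are Vex. Be direct, precise, and efficient.",
    },
    "eva": {
        "voice": "female",
        "system_prompt": "You are Eva. Be warm, friendly, and engaging.",
    },
}

def load_persona(name):
    """Look up a persona case-insensitively; fail loudly on typos."""
    try:
        return PERSONAS[name.lower()]
    except KeyError:
        raise ValueError(f"Unknown persona: {name!r}") from None
```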
- Streaming TTS – Hear responses as they're generated
- Sentence chunking – Natural speech rhythm
- Concurrent processing – Multiple TTS workers for speed
- Instant interruption – Cmd+Backspace to stop mid-response
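The streaming-TTS features above combine two ideas: split the LLM output into sentences, then synthesize them on a worker pool while playing results back in order. Here is a minimal sketch of that pattern; `synthesize` is a stand-in for a real TTS call, and the splitter is deliberately naive.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def split_sentences(text):
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def synthesize(sentence):
    # Stand-in for a real TTS engine call; returns fake "audio" here.
    return f"<audio:{sentence}>"

def stream_tts(text, workers=3):
    """Synthesize sentences concurrently but yield audio in order,
    so playback starts as soon as the first chunk is ready."""
    sentences = split_sentences(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even though workers run in parallel.
        yield from pool.map(synthesize, sentences)
```

Because `pool.map` yields results in submission order, later sentences can finish synthesizing while an earlier one is still playing – that is what gives the "hear responses as they're generated" effect.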
- Rich markdown rendering
- Color-coded conversations
- Real-time transcription display
- Live loading animations
```mermaid
flowchart TD
    A["🎤 Microphone"] --> B["VAD Detection"]
    B --> C["Whisper Transcription"]
    C --> D["Input Classifier"]
    D -->|General Chat| E["Main LLM - Phi-4-mini"]
    D -->|Tool Request| F["Tool LLM - Qwen3-0.6B"]
    E --> G["Response Generation"]
    F --> H["Function Detection"]
    H --> I["Tool Execution"]
    I --> J["Result Summary"]
    J --> G
    G --> K["Text Processing"]
    K --> L["Sentence Splitter"]
    L --> M["Concurrent TTS"]
    M --> N["Audio Playback"]
    N --> O["🔊 Speakers"]
```
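The Input Classifier step routes each utterance to either the chat LLM or the tool LLM. EvaVex uses a small LLM (Qwen3-0.6B) for this; the keyword router below is only a sketch of the routing contract, not the real classifier.

```python
# Illustrative keyword router — EvaVex actually classifies with Qwen3-0.6B.
# The hint list is a made-up example of tool-ish vocabulary.
TOOL_HINTS = ("weather", "stock", "search", "time in", "price")

def route(utterance):
    """Return 'tool' for requests that look like tool calls, else 'chat'."""
    text = utterance.lower()
    return "tool" if any(hint in text for hint in TOOL_HINTS) else "chat"
```

In the real pipeline, a 'tool' verdict flows to Function Detection and Tool Execution, and the summarized result is merged back into Response Generation, as shown in the flowchart.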
| Component | Latency | Notes |
|---|---|---|
| VAD Detection | < 10ms | Real-time processing |
| Whisper Transcription | 800-1500ms | Depends on model size |
| Tool Classification | 300-900ms | Qwen3-0.6B optimized |
| LLM Generation | Streaming | First token in ~100ms |
| TTS Synthesis | Concurrent | Multiple workers |
| 🔬 Test Hardware | |
|---|---|
| Model | Dell Latitude 5420 |
| OS | Arch Linux (Kernel 6.12.68-1-lts) |
| CPU | Intel Core i7-1185G7 (4 cores) |
| GPU | Intel Iris Xe Graphics @ 1.35 GHz (Integrated GPU) |
| RAM | 32GB (DDR4 2667 MT/s) |
All benchmarks run locally on the hardware above. Performance may vary, but expect similar results on modern CPUs/GPUs.
🌐 100% Offline – No cloud, no tracking, no subscriptions
🎮 Gamers & Power Users – Voice control without alt-tabbing. Perfect for streamers & devs
🔧 Hackable – Swap models, add tools, customize personalities
💰 Free & Open Source – No API costs, no limits, no hidden fees
Private by design. Local by choice. Yours forever.
EvaVex Coming Summer 2026 ദ്ദി (˵ •̀ ᴗ - ˵ )✧
