Speak your mind. AI does the rest.
A fully local, AI-powered voice diary. Record your thoughts through the microphone, and the app automatically transcribes your speech, refines it into a beautiful diary entry using a local LLM, and presents it back as a cinematic story with mood-based visuals — all without a single cloud dependency.
- Voice Recording — one tap to record, one tap to stop
- Local Speech-to-Text — powered by faster-whisper (Whisper base model, runs fully on CPU)
- AI Refinement — raw transcripts refined into emotionally resonant diary entries via Ollama (llama3.2)
- Mood Detection — automatically tags each entry with a mood: happy, sad, nostalgic, anxious, excited, calm, or angry
- Story View — revisit entries as a cinematic full-screen story with mood-matched gradient backgrounds and animated text reveals
- 100% Local — no internet required after setup; all AI runs on your machine
| Layer | Technology |
|---|---|
| Frontend | Next.js 15, Tailwind CSS, Framer Motion |
| Speech-to-Text | faster-whisper (Python + FastAPI) |
| AI Refinement | Ollama (llama3.2) |
| Database | MongoDB (local) |
| Monorepo | Turborepo + npm workspaces |
```
voice-diary/
├── apps/
│   ├── web/          # Next.js app (frontend + API routes)
│   └── stt/          # Python faster-whisper microservice
├── packages/
│   └── db/           # Shared Mongoose models
├── scripts/
│   ├── setup.ps1     # One-time setup script (Windows)
│   ├── start-mongo.ps1
│   ├── start-ollama.ps1
│   └── start-stt.ps1
├── data/             # Local MongoDB data (gitignored)
└── package.json
```
Make sure you have these installed before running setup:
- Node.js v18+
- Python 3.10+
- MongoDB Community Server 6.0+
- Ollama (with llama3.2 pulled)
- Windows 10/11 (scripts are PowerShell; on Linux/macOS, replace the `.ps1` scripts with equivalent `.sh` scripts)
1. Clone the repo

```
git clone https://github.com/Prabodh-dev/voice-diary.git
cd voice-diary
```

2. Create your environment file

```
cp .env.example apps/web/.env.local
```

Edit `apps/web/.env.local`:

```
MONGODB_URI=mongodb://127.0.0.1:27017/voice-diary
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2
STT_URL=http://localhost:8000
```

3. Run one-time setup

```
npm run setup
```

This installs all Node and Python dependencies, creates the Python virtual environment, and pulls the llama3.2 model from Ollama.
```
npm run dev
```

This single command starts all four services together:
| Service | Port | Status |
|---|---|---|
| MongoDB | 27017 | starts automatically |
| Ollama | 11434 | skips if already running |
| Whisper STT | 8000 | starts Python service |
| Next.js Web | 3000 | main app |
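For reference, here is a minimal sketch of how a single `dev` script could launch all four services together (using `concurrently`; the script names and flags below are assumptions for illustration, not the repo's actual configuration):

```json
{
  "scripts": {
    "dev": "concurrently -n mongo,ollama,stt,web \"powershell ./scripts/start-mongo.ps1\" \"powershell ./scripts/start-ollama.ps1\" \"powershell ./scripts/start-stt.ps1\" \"npm run dev --workspace apps/web\""
  }
}
```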
```
🎤 You speak
        ↓
📝 faster-whisper transcribes audio → raw text
        ↓
🤖 Ollama (llama3.2) rewrites it → title, refined entry, mood, tags
        ↓
💾 Saved to MongoDB
        ↓
🎨 Story view renders with mood gradient + Framer Motion animations
```
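The steps above can be sketched end-to-end in TypeScript. The STT endpoint path (`/transcribe`), response field names, and prompt wording are illustrative assumptions; the Ollama call uses its documented `/api/generate` endpoint with `format: "json"`:

```typescript
// Sketch of the transcribe → refine pipeline (persistence omitted).
// Endpoint paths, field names, and the prompt are illustrative assumptions.

// Build the refinement prompt sent to Ollama (pure, easy to test).
export function buildRefinementPrompt(transcript: string): string {
  return [
    "Rewrite the following voice-diary transcript as a polished entry.",
    "Return JSON with keys: title, entry, mood, tags.",
    "Allowed moods: happy, sad, nostalgic, anxious, excited, calm, angry.",
    "",
    `Transcript: ${transcript}`,
  ].join("\n");
}

export async function processRecording(audio: Blob) {
  // 1. Local speech-to-text via the faster-whisper microservice.
  const form = new FormData();
  form.append("file", audio, "recording.webm");
  const sttRes = await fetch(`${process.env.STT_URL}/transcribe`, {
    method: "POST",
    body: form,
  });
  const { text } = await sttRes.json();

  // 2. Refinement via Ollama's /api/generate endpoint.
  const ollamaRes = await fetch(`${process.env.OLLAMA_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: process.env.OLLAMA_MODEL,
      prompt: buildRefinementPrompt(text),
      stream: false,
      format: "json",
    }),
  });
  const { response } = await ollamaRes.json();
  return JSON.parse(response); // { title, entry, mood, tags }
}
```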
Tap the mic, speak your diary entry, tap stop. The app handles the rest — transcription, refinement, and saving happen automatically.
All your entries are listed as mood-colored cards, showing date, title, summary, and tags at a glance.
Full-screen cinematic view of an entry. Mood-matched gradient background, animated text reveal, and a toggle to compare with the original raw transcript.
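Mood-matched backgrounds can be implemented as a simple lookup from the detected mood to a Tailwind gradient class. This is a minimal sketch; the specific class names and the fallback choice are assumptions, not the app's actual palette:

```typescript
// Map each of the seven detected moods to a Tailwind gradient.
// The gradient classes here are illustrative, not the app's real palette.
type Mood =
  | "happy" | "sad" | "nostalgic" | "anxious"
  | "excited" | "calm" | "angry";

const MOOD_GRADIENTS: Record<Mood, string> = {
  happy: "from-amber-300 via-orange-400 to-rose-400",
  sad: "from-slate-700 via-blue-900 to-slate-900",
  nostalgic: "from-amber-200 via-rose-300 to-purple-400",
  anxious: "from-zinc-600 via-stone-700 to-neutral-900",
  excited: "from-fuchsia-500 via-pink-500 to-orange-400",
  calm: "from-teal-300 via-cyan-400 to-sky-500",
  angry: "from-red-600 via-rose-700 to-black",
};

export function gradientFor(mood: string): string {
  // Fall back to the calm palette if the LLM returns an unexpected mood.
  return MOOD_GRADIENTS[mood as Mood] ?? MOOD_GRADIENTS.calm;
}
```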
Pull requests are welcome. For major changes, please open an issue first.