The open-source ElevenLabs alternative.
AI-powered text-to-speech and voice cloning built with Next.js 16, React 19, and Chatterbox TTS.
Learn how to build this entire project from scratch in a free 12-hour video course on YouTube. The tutorial covers every feature - authentication, text-to-speech, voice cloning, billing, deployment, and more.
Each chapter has a matching branch so you can check out the code at any point in the tutorial:
| Branch | Chapter |
|---|---|
| `main` | Final project (all chapters combined) |
| `02-dashboard` | Dashboard layout and navigation |
| `03-text-to-speech-ui` | Text-to-speech UI |
| `04-backend-infrastructure` | Backend infrastructure (tRPC, R2, Prisma) |
| `05-voice-selection` | Voice selection and library |
| `06-tts-generation-audio-player` | TTS generation and audio player |
| `07-tts-history-polish` | TTS history and polish |
| `bonus-sentry-error-monitoring` | Bonus: Sentry error monitoring |
| `08-voice-management` | Voice management and cloning |
| `09-billing` | Billing and usage metering |
```bash
git checkout 04-backend-infrastructure # example: jump to Chapter 4
```

- Text-to-Speech - Generate speech from text with adjustable creativity, variety, expression, and flow parameters
- Zero-Shot Voice Cloning - Upload or record a voice sample (10s minimum) and clone it instantly - no fine-tuning required
- 20 Built-in Voices - Pre-seeded system voices across 12 categories and 5 locales
- Waveform Audio Player - WaveSurfer.js visualization with seek, play/pause, and download
- Multi-Tenant - Team-based access via Clerk Organizations with full data isolation
- Usage-Based Billing - Pay-as-you-go character metering with configurable pricing via Polar products and meters
- Generation History - Browse and replay past generations with preserved voice metadata
- Fully Responsive - Mobile-first with bottom drawers, compact controls, and adaptive layouts
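The 10-second minimum for voice samples is the kind of check that can run before an upload. The sketch below is a minimal illustration, assuming the sample is an uncompressed PCM WAV file held in a `Uint8Array`; `wavDurationSeconds`, `isLongEnough` and `MIN_SAMPLE_SECONDS` are hypothetical names, not the project's actual validation code, and other formats (such as browser recordings in WebM) would need a real decoder.

```typescript
// Hypothetical constant mirroring the "10s minimum" from the feature list.
const MIN_SAMPLE_SECONDS = 10;

// Reads a 4-character ASCII chunk tag at the given byte offset.
function readTag(bytes: Uint8Array, offset: number): string {
  return String.fromCharCode(...Array.from(bytes.subarray(offset, offset + 4)));
}

// Returns the duration of a PCM WAV file in seconds by walking its RIFF chunks.
function wavDurationSeconds(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (readTag(bytes, 0) !== "RIFF" || readTag(bytes, 8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }
  let byteRate = 0;
  let offset = 12; // first chunk header follows "RIFF" + size + "WAVE"
  while (offset + 8 <= bytes.length) {
    const id = readTag(bytes, offset);
    const size = view.getUint32(offset + 4, true); // chunk sizes are little-endian
    if (id === "fmt ") {
      // byteRate (sampleRate * blockAlign) sits 8 bytes into the fmt payload.
      byteRate = view.getUint32(offset + 16, true);
    } else if (id === "data") {
      if (byteRate === 0) throw new Error("data chunk appears before fmt chunk");
      return size / byteRate;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("No data chunk found");
}

// True when the sample meets the minimum length.
function isLongEnough(bytes: Uint8Array): boolean {
  return wavDurationSeconds(bytes) >= MIN_SAMPLE_SECONDS;
}
```

Reading the duration from the header avoids decoding the audio, which keeps the check cheap enough to run in the browser before the file is sent.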
Prerequisites:

- Node.js 20.9 or later
- Prisma Postgres database
- Clerk account (with Organizations enabled)
- Cloudflare R2 bucket
- Modal account (for GPU-hosted TTS)
- Polar account (for billing)
```bash
git clone https://github.com/code-with-antonio/resonance.git
cd resonance
npm install
```

```bash
cp .env.example .env
```

Fill in the blank values in `.env`. Sensible defaults (Clerk routes, Polar meter names, `APP_URL`, etc.) are pre-filled.
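Before moving on, it can help to list which keys are still empty. This is a minimal sketch, assuming `.env` uses simple `KEY=value` lines (no multi-line values or inline comments); `findBlankEnvKeys` is a hypothetical helper, not part of the repository.

```typescript
// Returns the keys in a dotenv-style file whose values are still empty.
function findBlankEnvKeys(contents: string): string[] {
  const blank: string[] = [];
  for (const rawLine of contents.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blank lines and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim().replace(/^export\s+/, "");
    const value = line.slice(eq + 1).trim();
    // Treat empty quotes ("" or '') as blank as well.
    if (value === "" || value === '""' || value === "''") blank.push(key);
  }
  return blank;
}
```

In Node this could be called as `findBlankEnvKeys(readFileSync(".env", "utf8"))` with `readFileSync` from `node:fs`; an empty result means every key has a value, though not necessarily a correct one.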
In your Polar dashboard, create two meters under Meters:

- Voice Creation meter
  - Filter: Name equals `voice_creation`
  - Aggregation: Count
- Text-to-Speech Characters meter
  - Filter: Name equals `tts_generation`
  - Aggregation: Sum over `characters`
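To make the two aggregations concrete: a Count meter counts the events whose name matches its filter, while a Sum meter adds up a numeric property (`characters`) across those events. The sketch below simulates that over an in-memory list; the `UsageEvent` shape and the `aggregate` function are assumptions for illustration, not Polar's SDK or its internal implementation.

```typescript
// Assumed event shape: a name matched by the meter filter, plus numeric metadata.
interface UsageEvent {
  name: string;
  metadata: Record<string, number>;
}

type Aggregation = { kind: "count" } | { kind: "sum"; property: string };

interface MeterConfig {
  filterName: string; // "Filter: Name equals ..."
  aggregation: Aggregation;
}

const voiceCreationMeter: MeterConfig = {
  filterName: "voice_creation",
  aggregation: { kind: "count" },
};

const ttsCharactersMeter: MeterConfig = {
  filterName: "tts_generation",
  aggregation: { kind: "sum", property: "characters" },
};

// Reduces the events that match a meter's filter to a single usage number.
function aggregate(meter: MeterConfig, events: UsageEvent[]): number {
  const matching = events.filter((e) => e.name === meter.filterName);
  const agg = meter.aggregation;
  if (agg.kind === "count") return matching.length;
  const property = agg.property;
  return matching.reduce((total, e) => total + (e.metadata[property] ?? 0), 0);
}
```

Two `tts_generation` events carrying 120 and 30 characters would therefore bill 150 units on the Text-to-Speech Characters meter, while each `voice_creation` event adds exactly one unit to the Voice Creation meter.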
Then create a new product with Recurring subscription pricing. Under Price Type, add two metered prices:

- Click Add metered price and select the Text-to-Speech Characters meter
  - Set the Amount per unit (price per character, e.g. $0.003)
  - Optionally set a Cap amount (e.g. $100)
- Click Add metered price again and select the Voice Creation meter
  - Set the Amount per unit (price per voice generation, e.g. $0.25)
  - Optionally set a Cap amount (e.g. $100)
With only metered prices, the subscription starts at $0/month and scales with usage. If you want a baseline subscription fee (e.g. $20/month), add a third price to the same product, choosing a fixed price instead of a metered price. This requires no code changes, since fixed prices are handled entirely by Polar.
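The pricing arithmetic can be worked through with the example values above. This is a minimal sketch, assuming each metered price charges `units × amount per unit`, limited by its optional cap, with an optional fixed fee added on top; Polar computes the real invoice, and details such as tax and currency rounding are not modeled. Amounts are kept in thousandths of a dollar so the $0.003 rate stays an exact integer.

```typescript
// All amounts are in thousandths of a dollar to keep the arithmetic exact.
interface MeteredPrice {
  amountPerUnit: number; // e.g. 3 = $0.003
  cap?: number; // optional cap, e.g. 100_000 = $100
}

// Charge for one metered price: units times rate, limited by the cap if set.
function meteredCharge(units: number, price: MeteredPrice): number {
  const uncapped = units * price.amountPerUnit;
  return price.cap === undefined ? uncapped : Math.min(uncapped, price.cap);
}

function monthlyTotal(
  characters: number,
  voicesCreated: number,
  ttsPrice: MeteredPrice,
  voicePrice: MeteredPrice,
  fixedFee = 0, // optional baseline subscription fee
): number {
  return fixedFee + meteredCharge(characters, ttsPrice) + meteredCharge(voicesCreated, voicePrice);
}

const ttsPrice: MeteredPrice = { amountPerUnit: 3, cap: 100_000 }; // $0.003/char, $100 cap
const voicePrice: MeteredPrice = { amountPerUnit: 250, cap: 100_000 }; // $0.25/voice, $100 cap

// 20,000 characters = $60, 4 voices = $1, no fixed fee: $61
console.log(monthlyTotal(20_000, 4, ttsPrice, voicePrice) / 1000); // 61
// 50,000 characters would be $150 but is capped at $100; plus $1 and a $20 fixed fee: $121
console.log(monthlyTotal(50_000, 4, ttsPrice, voicePrice, 20_000) / 1000); // 121
```

The second case shows why a cap matters for heavy users: once the cap is reached, further characters in that period add nothing to the metered charge.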
Ensure that "Allow multiple subscriptions" is turned off under Settings > Billing (this is the Polar default).
Copy the product ID into POLAR_PRODUCT_ID. The meter filter names and aggregation property must match the POLAR_METER_* env variables.
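Because the exact variable names are only given here as `POLAR_METER_*`, the sketch below does not assume specific keys. Instead, it scans an environment object for that prefix and reports any value that is neither one of the two meter filter names nor the `characters` aggregation property. `checkMeterEnv` is a hypothetical helper; `.env.example` holds the project's actual variable names.

```typescript
// Values configured in the Polar dashboard above: meter filter names plus the summed property.
const KNOWN_METER_VALUES = new Set(["voice_creation", "tts_generation", "characters"]);

// Returns a message for each POLAR_METER_* entry whose value Polar would not recognise.
function checkMeterEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const [key, value] of Object.entries(env)) {
    if (!key.startsWith("POLAR_METER_")) continue; // ignore unrelated variables
    if (value === undefined || !KNOWN_METER_VALUES.has(value)) {
      problems.push(`${key}=${value ?? ""} does not match a Polar meter name or property`);
    }
  }
  return problems;
}
```

Calling `checkMeterEnv(process.env)` at startup would surface a typo such as `tts_generations` before any usage event is sent under a name that no meter is filtering for.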
```bash
npx prisma migrate deploy
```

The included `chatterbox_tts.py` is adapted from Modal's official Chatterbox TTS example, modified to read voice reference audio directly from your R2 bucket instead of a Modal Volume.
Before deploying, update `chatterbox_tts.py` with your R2 bucket name and account ID:

```python
R2_BUCKET_NAME = "<your-r2-bucket-name-here>"
R2_ACCOUNT_ID = "<your-r2-account-id-here>"
```

Then create the required secrets in your Modal dashboard:
| Secret Name | Keys | Description |
|---|---|---|
| `cloudflare-r2` | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` | R2 API credentials (used for bucket mount) |
| `chatterbox-api-key` | `CHATTERBOX_API_KEY` | API key to protect the endpoint (use any strong random string) |
| `hf-token` | `HF_TOKEN` | Hugging Face token (for downloading the Chatterbox model weights) |
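For the `CHATTERBOX_API_KEY` value, any strong random string works. One way to produce one is sketched below with Node's built-in `crypto` module; `generateApiKey` is just an illustrative name.

```typescript
import { randomBytes } from "node:crypto";

// 32 random bytes encoded as hex: 64 characters, 256 bits of entropy.
function generateApiKey(byteLength = 32): string {
  return randomBytes(byteLength).toString("hex");
}

console.log(generateApiKey());
```

Running it once and pasting the output into the `chatterbox-api-key` secret is enough; the same result could come from any other cryptographically secure generator.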
Deploy to Modal:
```bash
modal deploy chatterbox_tts.py
```

This deploys Chatterbox TTS to a serverless NVIDIA A10G GPU on Modal. The container mounts your R2 bucket read-only for direct access to voice reference audio. Use the resulting Modal URL as `CHATTERBOX_API_URL` in your `.env`.
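The bucket mount talks to R2 through its S3-compatible API, whose endpoint Cloudflare derives from the account ID as `https://<ACCOUNT_ID>.r2.cloudflarestorage.com`; this is presumably why `R2_ACCOUNT_ID` is needed in `chatterbox_tts.py`. The sketch below builds that URL and rejects the unedited placeholder. `r2EndpointUrl` is a hypothetical helper, and the 32-hex-character format check is an assumption based on how Cloudflare account IDs usually look.

```typescript
// Assumption: Cloudflare account IDs are 32 hex characters.
const ACCOUNT_ID_PATTERN = /^[0-9a-f]{32}$/;

// Builds the S3-compatible R2 endpoint for an account, rejecting malformed IDs.
function r2EndpointUrl(accountId: string): string {
  const id = accountId.trim().toLowerCase();
  if (!ACCOUNT_ID_PATTERN.test(id)) {
    throw new Error(`Unexpected R2 account ID format: "${accountId}"`);
  }
  return `https://${id}.r2.cloudflarestorage.com`;
}
```

Failing fast on a placeholder like `<your-r2-account-id-here>` is cheaper than discovering the mistake through a failed mount on a GPU container.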
Note: The first request after a period of inactivity may take longer due to cold starts as Modal provisions the GPU container.
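Because a cold start can make the first request slow enough to time out, a caller may want a small number of retries with increasing delays. A minimal sketch of exponential backoff is shown below; `retryWithBackoff`, the default of three attempts, and the 2-second base delay are illustrative assumptions, not how the project's Chatterbox client behaves.

```typescript
interface RetryOptions {
  attempts: number; // total tries, including the first
  baseDelayMs: number; // delay before the 2nd try; doubles after each failure
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying on rejection with delays of base, 2x base, 4x base, ...
async function retryWithBackoff<T>(
  operation: (attempt: number) => Promise<T>,
  { attempts, baseDelayMs }: RetryOptions = { attempts: 3, baseDelayMs: 2_000 },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

A call could look like `retryWithBackoff(() => fetch(url, { signal: AbortSignal.timeout(120_000) }))`. Note that `fetch` only rejects on network errors and timeouts, so HTTP 5xx responses would need an explicit status check inside the operation, and each retry re-runs the GPU work, which is a reason to keep the attempt count small.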
Once deployed, generate the type-safe Chatterbox client from the OpenAPI spec:
```bash
npm run sync-api
```

Then seed the database:

```bash
npx prisma db seed
```

This seeds 20 built-in voices to the database and R2. The system voice WAV files are included in the repository and originate from Modal's voice sample pack.
```bash
npm run dev
```

Open http://localhost:3000.
Resonance is designed to be self-hosted. You'll need:
- A PostgreSQL database - Prisma Postgres (recommended), or any managed Postgres
- Cloudflare R2 - For audio storage (S3-compatible, generous free tier)
- Modal - For serverless GPU inference (pay-per-second billing)
- Clerk - For authentication and multi-tenancy
- Polar - For metered billing (use sandbox mode with the test card `4242 4242 4242 4242` for testing)
Deploy the Next.js app to any Node.js host (Railway, Docker, etc.).
```text
src/
├── app/                  # Next.js App Router
│   ├── (dashboard)/      # Protected routes (home, TTS, voices)
│   ├── api/              # Audio proxy routes + tRPC handler
│   ├── sign-in/          # Clerk auth pages
│   └── sign-up/
├── components/           # Shared UI components (shadcn/ui + custom)
├── features/
│   ├── dashboard/        # Home page, quick actions
│   ├── text-to-speech/   # TTS form, audio player, settings, history
│   ├── voices/           # Voice library, creation, recording
│   └── billing/          # Usage display, checkout
├── hooks/                # App-wide hooks
├── lib/                  # Core: db, r2, polar, env, chatterbox client
├── trpc/                 # tRPC routers, client, server helpers
├── generated/            # Prisma client
└── types/                # Generated API types
```
| Command | Description |
|---|---|
| `npm run dev` | Start dev server |
| `npm run build` | Production build |
| `npm run start` | Start production server |
| `npm run lint` | Lint with ESLint |
| `npm run sync-api` | Regenerate Chatterbox API types from OpenAPI spec |
- Chatterbox TTS by Resemble AI - the open-source zero-shot voice cloning model powering speech generation
- Modal - serverless GPU deployment example and voice sample pack