A FastAPI service that converts an audio file and a set of timed images into a video. Images switch at specified time markers, chapters are embedded, and conversion progress is streamed in real time via Server-Sent Events (SSE).
A React single-page app (SPA) provides a multi-step UI for the full workflow — audio input, waveform-based image marker studio, live SSE progress tracking, and video preview/download — with additional pages for user documentation, API reference, and company information.
Requirements: Python 3.10+ and ffmpeg installed on PATH.
# 1. Clone / navigate to the project
cd audio2videoHttp
# 2. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate # macOS / Linux
# .venv\Scripts\activate # Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env: at minimum, set API_KEY to a secure value

| Variable | Default | Description |
|---|---|---|
| `API_KEY` | `changeme` | Secret key sent in `X-API-Key` header |
| `HOST` | `0.0.0.0` | Bind address |
| `PORT` | `8000` | Bind port |
| `BASE_URL` | `http://localhost:8000` | Public base URL used in returned URLs |
| `CACHE_DIR` | `cache` | Directory for downloaded audio/images |
| `OUTPUT_DIR` | `output` | Directory for generated videos |
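To illustrate how these variables and their defaults could be read at startup, here is a minimal standard-library sketch. `Settings` and `load_settings` are hypothetical names chosen for this example; the service's actual configuration code may look different.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the documented environment variables."""
    api_key: str
    host: str
    port: int
    base_url: str
    cache_dir: str
    output_dir: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        api_key=env.get("API_KEY", "changeme"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),  # PORT arrives as a string
        base_url=env.get("BASE_URL", "http://localhost:8000"),
        cache_dir=env.get("CACHE_DIR", "cache"),
        output_dir=env.get("OUTPUT_DIR", "output"),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in isolation.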
brew install ffmpeg
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Interactive API docs are available at http://localhost:8000/docs.
The React frontend is in the frontend/ directory. It provides a complete UI for the service:
| Page | Route | Description |
|---|---|---|
| Home | `/` | Step 1: choose audio by URL or file upload; set title & description |
| Studio | `/studio` | Step 2: waveform preview, add timed image slides with chapter titles, configure output options |
| Progress | `/progress/:jobId` | Step 3: live SSE progress bar with percentage and time processed |
| Preview | `/preview/:jobId` | Step 4: in-browser video player, download button |
| Guide | `/guide` | User documentation: step-by-step workflow, supported formats, FAQ |
| API Docs | `/api-docs` | In-app API reference with all endpoints, fields, and curl examples |
| About | `/about` | About Rumsan, mission, values, and Audio2Video tool overview |
Interactive Swagger UI (auto-generated by FastAPI) is also available at /docs on the backend.
Requires Node 18+ and the backend already running on port 8000.
cd frontend
npm install
cp .env.example .env # default proxies to http://localhost:8000
npm run dev

Open http://localhost:5173 in your browser. The Vite dev server proxies API calls to the backend automatically.
cd frontend
npm install
npm run build # outputs to ../frontend-dist/

After building, the FastAPI server serves the SPA at `/`, so no separate frontend process is needed. Just start the backend as usual and open http://localhost:8000.
# Production — builds frontend and starts API in one container
docker-compose up
# Development — API + Vite hot-reload dev server side by side
docker-compose --profile dev up

All endpoints require the `X-API-Key` header matching the value set in `.env`.
X-API-Key: your-secret-key
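For clients written in Python, the header can be attached with the standard library alone. The sketch below only builds the request object and sends nothing over the network; `authed_request` is a hypothetical helper name, and the status URL is used purely as an example target.

```python
import urllib.request
from typing import Optional


def authed_request(url: str, api_key: str, data: Optional[bytes] = None) -> urllib.request.Request:
    """Build a request carrying the X-API-Key header; nothing is sent until urlopen() is called."""
    req = urllib.request.Request(url, data=data)
    req.add_header("X-API-Key", api_key)
    return req


# Example: prepare an authenticated GET for a job's status snapshot.
req = authed_request("http://localhost:8000/jobs/JOB_ID/status", "your-secret-key")
```

Calling `urllib.request.urlopen(req)` would then perform the request against a running backend.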
All URL fields (audio_url, image_url, images[].url) accept:
| Scheme | Example | Notes |
|---|---|---|
| `https://` / `http://` | `https://cdn.example.com/audio.mp3` | Downloaded and cached by SHA-256 of URL |
| `file://` | `file:///Users/alice/audio.mp3` | Read directly from the local filesystem, with no caching |
| server-uploaded | `file:///app/cache/abc123.mp3` | URL returned by `POST /files/upload` |
Docker note: when using `file://` paths inside a Docker container, the path must exist inside the container. Mount your local files as a volume, e.g. `docker-compose run -v /local/media:/media audio2video`.
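The scheme handling described above can be pictured as a small dispatch step: remote URLs go to the downloader, `file://` URLs are turned into filesystem paths. The following is a sketch under that assumption, not the service's actual resolver; `classify_source` is a hypothetical name, and POSIX-style paths are assumed.

```python
from urllib.parse import unquote, urlparse


def classify_source(url: str):
    """Return ("remote", url) for http(s) URLs or ("local", path) for file:// URLs."""
    parts = urlparse(url)
    if parts.scheme in ("http", "https"):
        return ("remote", url)  # would be downloaded and cached
    if parts.scheme == "file":
        # Percent-decoding handles paths with spaces, e.g. file:///tmp/my%20audio.mp3
        return ("local", unquote(parts.path))
    raise ValueError(f"unsupported URL scheme: {parts.scheme!r}")
```

Rejecting unknown schemes early gives callers a clear error before any download or ffmpeg work starts.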
Upload a local audio or image file to the server cache. The returned file_url can be used directly in any URL field of POST /convert.
curl -X POST http://localhost:8000/files/upload \
-H "X-API-Key: your-secret-key" \
-F "file=@/path/to/audio.mp3"

Response:
{
"file_url": "file:///absolute/path/to/cache/abc123def456.mp3",
"preview_url": "http://localhost:8000/cache/abc123def456.mp3",
"filename": "audio.mp3"
}

- `file_url`: use in `audio_url` or `images[].url` of a `/convert` request.
- `preview_url`: browser-accessible HTTP URL for audio waveform preview in the Studio.
Use file_url directly in a subsequent /convert request:
curl -X POST http://localhost:8000/convert \
-H "X-API-Key: your-secret-key" \
-H "Content-Type: application/json" \
-d '{
"title": "My Podcast",
"audio_url": "file:///absolute/path/to/cache/abc123def456.mp3",
"image_url": "file:///Users/alice/cover.jpg"
}'

Submit an audio + images conversion job. Returns immediately with a `job_id`.
Request body:
Notes:
- Either `images` or `image_url` is required, not both.
- If `images` is provided, the first image (by marker order) is the cover image.
- If `image_url` is provided, it is used as a static background for the full video.
- Each image in `images` is displayed from its marker until the next marker.
- Chapters are automatically embedded from image markers. `title` on each image becomes the chapter name.
- The output filename is a URL slug of `title`, e.g. `my-podcast-episode-1.mp4`.
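The marker and naming rules in these notes can be sketched in a few lines of Python. This is an illustration only: `parse_marker`, `chapters`, and `slugify` are hypothetical names, the slug rule is one plausible choice rather than the service's exact algorithm, and the last image is assumed to run until the end of the audio.

```python
import re


def parse_marker(marker: str) -> int:
    """Convert "HH:MM:SS" or "MM:SS" into whole seconds."""
    parts = [int(p) for p in marker.split(":")]
    if len(parts) == 2:
        parts = [0] + parts  # MM:SS has an implicit zero hour
    if len(parts) != 3:
        raise ValueError(f"bad marker: {marker!r}")
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


def chapters(images, audio_duration: int):
    """Return (start, end, title) per image in marker order; untitled images get "Chapter N"."""
    ordered = sorted(images, key=lambda img: parse_marker(img["marker"]))
    starts = [parse_marker(img["marker"]) for img in ordered]
    ends = starts[1:] + [audio_duration]  # assumption: last image lasts until the audio ends
    return [
        (start, end, img.get("title") or f"Chapter {i}")
        for i, (img, start, end) in enumerate(zip(ordered, starts, ends), start=1)
    ]


def slugify(title: str) -> str:
    """One possible slug rule: lowercase, runs of non-alphanumerics collapsed to "-"."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
```

Sorting by marker before pairing start and end times means the caller does not have to submit images in chronological order.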
Response 202 Accepted:
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"progress_url": "http://localhost:8000/jobs/550e8400.../progress",
"status_url": "http://localhost:8000/jobs/550e8400.../status",
"download_url": "http://localhost:8000/jobs/550e8400.../download"
}

Stream conversion progress as Server-Sent Events. Keep the connection open until a `complete` or `error` event is received.
SSE event types:
| `type` | Fields | Description |
|---|---|---|
| `progress` | `percent`, `time_processed` | Incremental progress update |
| `complete` | `download_url` | Conversion finished |
| `error` | `message` | Conversion failed |
| `ping` | (none) | Keep-alive (every 30 s of idle) |
Example stream:
data: {"type":"progress","percent":12.5,"time_processed":"00:00:45"}
data: {"type":"progress","percent":54.1,"time_processed":"00:03:12"}
data: {"type":"complete","download_url":"http://localhost:8000/jobs/.../download"}
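A client consuming this stream needs to pick out the `data:` lines, decode the JSON payload, and stop at a terminal event. The sketch below does that over any iterable of text lines. It assumes each event is a single-line JSON payload and that keep-alives arrive as `{"type":"ping"}`; the exact ping payload is not shown in the example stream, so treat that as an assumption. `iter_events` is a hypothetical name.

```python
import json


def iter_events(lines):
    """Yield decoded events from SSE lines, stopping after a terminal event."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "ping":
            continue  # keep-alive, carries no progress information
        yield event
        if event.get("type") in ("complete", "error"):
            return  # terminal event: the caller can close the connection


# Example input shaped like the stream shown in this section.
stream = [
    'data: {"type":"progress","percent":12.5,"time_processed":"00:00:45"}',
    "",
    'data: {"type":"ping"}',
    'data: {"type":"complete","download_url":"http://localhost:8000/jobs/.../download"}',
]
events = list(iter_events(stream))
```

With a live connection, the same generator can be fed the decoded lines of an HTTP response instead of a list.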
curl example:
curl -N \
-H "X-API-Key: your-secret-key" \
http://localhost:8000/jobs/JOB_ID/progress

Non-streaming status snapshot. Useful if SSE is not convenient.
Response:
{
"job_id": "550e8400-...",
"status": "processing", // pending | processing | complete | error
"percent": 54.1,
"time_processed": "00:03:12",
"error": null,
"download_url": null // set when status is "complete"
}

Download the generated video file. Only available when status is `complete`.
Returns the file as an attachment with the slug-based filename (e.g. my-podcast-episode-1.mp4).
curl example:
curl -L \
-H "X-API-Key: your-secret-key" \
http://localhost:8000/jobs/JOB_ID/download \
-o my-podcast-episode-1.mp4

# 1. Submit job
JOB=$(curl -s -X POST http://localhost:8000/convert \
-H "X-API-Key: your-secret-key" \
-H "Content-Type: application/json" \
-d '{
"title": "My Podcast Episode 1",
"description": "A great episode about things.",
"audio_url": "https://example.com/episode1.mp3",
"images": [
{"url": "https://example.com/cover.jpg", "marker": "00:00:00", "title": "Intro"},
{"url": "https://example.com/topic1.jpg", "marker": "00:02:30", "title": "Part 1"},
{"url": "https://example.com/topic2.jpg", "marker": "00:18:00", "title": "Part 2"},
{"url": "https://example.com/outro.jpg", "marker": "00:45:00", "title": "Outro"}
],
"options": {"resolution": "1920x1080", "crf": 23}
}')
JOB_ID=$(echo "$JOB" | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")
# 2. Stream progress
curl -N -H "X-API-Key: your-secret-key" \
http://localhost:8000/jobs/$JOB_ID/progress
# 3. Download
curl -L -H "X-API-Key: your-secret-key" \
http://localhost:8000/jobs/$JOB_ID/download \
-o episode1.mp4

# Build and start
docker-compose up
# With a custom API key
API_KEY=mysecret docker-compose up

The `cache/` and `output/` directories are mounted as volumes so files persist across container restarts.
Downloaded audio and images are cached in CACHE_DIR keyed by SHA-256 of the URL. Re-submitting a job with the same URLs will reuse cached downloads. The cache is not automatically evicted — delete files from cache/ and output/ manually when no longer needed.
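Keying the cache by a hash of the URL means the same URL always maps to the same file, which is what makes re-submitted jobs reuse earlier downloads. A minimal sketch of such a mapping follows; `cache_path` is a hypothetical name, and keeping the original file extension on the cached file is an assumption made for this example.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def cache_path(url: str, cache_dir: str = "cache") -> Path:
    """Map a URL to a deterministic file path: SHA-256 hex digest of the URL plus its extension."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    suffix = Path(urlparse(url).path).suffix  # e.g. ".mp3"; empty if the URL has no extension
    return Path(cache_dir) / f"{digest}{suffix}"


def is_cached(url: str, cache_dir: str = "cache") -> bool:
    """True if a file for this URL already exists in the cache directory."""
    return cache_path(url, cache_dir).exists()
```

Because the key depends on the URL text only, two different URLs serving identical content are cached separately, and a changed file behind an unchanged URL is not re-downloaded until the cached copy is deleted.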
Annotated request body for `POST /convert`:

{
  "title": "My Podcast Episode 1",             // required: used as video metadata title and output filename slug
  "description": "Episode summary...",         // optional: embedded as video comment/description
  "audio_url": "https://example.com/ep1.mp3",  // required: audio URL (http://, https://, or file://)

  // Provide ONE of: images (timed slides) OR image_url (static cover)
  "images": [
    {
      "url": "https://example.com/cover.jpg",
      "marker": "00:00:00",      // when this image appears, as HH:MM:SS or MM:SS
      "title": "Introduction"    // optional: used as chapter title; defaults to "Chapter N"
    },
    { "url": "https://example.com/slide2.jpg", "marker": "00:03:43", "title": "Main Topic" },
    { "url": "https://example.com/outro.jpg", "marker": "00:45:00" }
  ],

  // OR: a single static image for the full video duration
  "image_url": "https://example.com/cover.jpg",

  "options": {
    "format": "mp4",             // "mp4" (default) | "mkv" | "webm"
    "crf": 23,                   // video quality 0–51; lower = better (default: 23)
    "video_bitrate": null,       // e.g. "2M"; overrides crf if set
    "audio_bitrate": "192k",     // default: "192k"
    "resolution": "1920x1080",   // output canvas size (default: 1920x1080); images are padded to fit
    "normalize_audio": false     // apply loudnorm filter to even out audio levels
  }
}
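Some of these constraints can be checked on the client before a job is submitted, which avoids a round trip for obviously invalid requests. The helper below is a sketch under that assumption: `build_convert_payload` is a hypothetical name, and the server's own validation may be stricter or report errors differently.

```python
def build_convert_payload(title, audio_url, images=None, image_url=None, options=None):
    """Assemble a /convert body, enforcing the documented constraints client-side."""
    if not title or not audio_url:
        raise ValueError("title and audio_url are required")
    if (images is None) == (image_url is None):
        raise ValueError("provide exactly one of images or image_url")
    options = dict(options or {})
    if not 0 <= options.get("crf", 23) <= 51:
        raise ValueError("crf must be between 0 and 51")
    if options.get("format", "mp4") not in ("mp4", "mkv", "webm"):
        raise ValueError("format must be mp4, mkv, or webm")

    payload = {"title": title, "audio_url": audio_url}
    if images is not None:
        payload["images"] = images
    else:
        payload["image_url"] = image_url
    if options:
        payload["options"] = options  # omitted entirely when empty, so server defaults apply
    return payload
```

The returned dict can be serialized with `json.dumps` and sent as the body of the `POST /convert` request.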