rumsan/audio2video

audio2videoHttp

A FastAPI service that converts an audio file and a set of timed images into a video. Images switch at specified time markers, chapters are embedded, and conversion progress is streamed in real-time via Server-Sent Events (SSE).

A React single-page app (SPA) provides a multi-step UI for the full workflow — audio input, waveform-based image marker studio, live SSE progress tracking, and video preview/download — with additional pages for user documentation, API reference, and company information.

Requirements: Python 3.10+ and ffmpeg installed on PATH.


Setup

# 1. Clone / navigate to the project
cd audio2videoHttp

# 2. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate        # macOS / Linux
# .venv\Scripts\activate         # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Edit .env — at minimum set API_KEY to a secure value

.env options

| Variable | Default | Description |
|---|---|---|
| API_KEY | changeme | Secret key that clients must send in the X-API-Key header |
| HOST | 0.0.0.0 | Bind address |
| PORT | 8000 | Bind port |
| BASE_URL | http://localhost:8000 | Public base URL used to build the URLs returned by the API |
| CACHE_DIR | cache | Directory for downloaded audio/images |
| OUTPUT_DIR | output | Directory for generated videos |
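If you script against the same configuration, the variables can be read with their documented defaults. This sketch assumes the .env values have already been exported into the process environment (for example by your shell); `Settings` and `load_settings` are hypothetical names, not the project's own config module.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Hypothetical container mirroring the .env table above
    api_key: str
    host: str
    port: int
    base_url: str
    cache_dir: str
    output_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to the documented defaults."""
    return Settings(
        api_key=env.get("API_KEY", "changeme"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        # Strip a trailing slash so f"{base_url}/jobs/..." never produces "//"
        base_url=env.get("BASE_URL", "http://localhost:8000").rstrip("/"),
        cache_dir=env.get("CACHE_DIR", "cache"),
        output_dir=env.get("OUTPUT_DIR", "output"),
    )
```

Passing an explicit mapping (for example `load_settings({"PORT": "9000"})`) makes the function easy to test without touching the real environment.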

Install ffmpeg (macOS)

brew install ffmpeg

Running the server

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Interactive API docs available at http://localhost:8000/docs


Frontend

The React frontend is in the frontend/ directory. It provides a complete UI for the service:

| Page | Route | Description |
|---|---|---|
| Home | / | Step 1: choose audio by URL or file upload; set title and description |
| Studio | /studio | Step 2: waveform preview; add timed image slides with chapter titles; configure output options |
| Progress | /progress/:jobId | Step 3: live SSE progress bar with percentage and time processed |
| Preview | /preview/:jobId | Step 4: in-browser video player and download button |
| Guide | /guide | User documentation: step-by-step workflow, supported formats, FAQ |
| API Docs | /api-docs | In-app API reference with all endpoints, fields, and curl examples |
| About | /about | About Rumsan: mission, values, and an overview of the Audio2Video tool |

Interactive Swagger UI (auto-generated by FastAPI) is also available at /docs on the backend.

Development (hot-reload)

Requires Node 18+ and the backend already running on port 8000.

cd frontend
npm install
cp .env.example .env   # default proxies to http://localhost:8000
npm run dev

Open http://localhost:5173 in your browser. The Vite dev server proxies API calls to the backend automatically.

Production build (served by the backend)

cd frontend
npm install
npm run build          # outputs to ../frontend-dist/

After building, the FastAPI server serves the SPA at / — no separate frontend process needed. Just start the backend as usual and open http://localhost:8000.

Docker (full stack)

# Production — builds frontend and starts API in one container
docker-compose up

# Development — API + Vite hot-reload dev server side by side
docker-compose --profile dev up

Authentication

All API endpoints below require the X-API-Key header, and its value must match API_KEY in .env.

X-API-Key: your-secret-key
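On the server side, a key check like this is best done in constant time so response timing does not leak how much of the key matched. This is a minimal standard-library sketch; `is_authorized` is a hypothetical helper and is not taken from the project's source.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True if the X-API-Key header matches expected_key.

    Header names are matched case-insensitively, as HTTP requires;
    the key itself is compared in constant time with hmac.compare_digest.
    """
    provided: Optional[str] = None
    for name, value in headers.items():
        if name.lower() == "x-api-key":
            provided = value
            break
    if provided is None:
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected_key.encode("utf-8"))
```

In a FastAPI app the same comparison would typically sit inside a dependency, for instance one built on `fastapi.security.APIKeyHeader(name="X-API-Key")`; whether this project does it that way is not documented here.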

API Reference

Local files and uploads

All URL fields (audio_url, image_url, images[].url) accept:

| Scheme | Example | Notes |
|---|---|---|
| https:// or http:// | https://cdn.example.com/audio.mp3 | Downloaded and cached by SHA-256 of the URL |
| file:// | file:///Users/alice/audio.mp3 | Read directly from the local filesystem; not cached |
| file:// (server upload) | file:///app/cache/abc123.mp3 | The file_url returned by POST /files/upload |

Docker note: when using file:// paths inside a Docker container, the path must exist inside the container. Mount your local files as a volume, e.g. docker-compose run -v /local/media:/media audio2video.
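The scheme rules can be summarized as a small resolver that maps a URL field to a local path. This is a sketch under assumptions: `resolve_source` is hypothetical, and keeping the URL's file extension on the cache file is a guess, not documented behavior.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def resolve_source(url: str, cache_dir: str = "cache") -> tuple[Path, bool]:
    """Map a URL field to a local path.

    Returns (path, needs_download). file:// URLs point straight at the local
    filesystem; http(s):// URLs map to a cache file named by the SHA-256 of the URL.
    """
    parsed = urlparse(url)
    if parsed.scheme == "file":
        # Read in place: no download, no caching
        return Path(url2pathname(parsed.path)), False
    if parsed.scheme in ("http", "https"):
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        # Assumption: keep the extension so ffmpeg can infer the input format
        suffix = Path(parsed.path).suffix
        return Path(cache_dir) / f"{digest}{suffix}", True
    raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
```

Because the key is the hash of the full URL string, two URLs that differ only in a query parameter are cached as separate files.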


POST /files/upload

Upload a local audio or image file to the server cache. The returned file_url can be used directly in any URL field of POST /convert.

curl -X POST http://localhost:8000/files/upload \
  -H "X-API-Key: your-secret-key" \
  -F "file=@/path/to/audio.mp3"

Response:

{
  "file_url": "file:///absolute/path/to/cache/abc123def456.mp3",
  "preview_url": "http://localhost:8000/cache/abc123def456.mp3",
  "filename": "audio.mp3"
}
  • file_url — use in audio_url or images[].url of a /convert request.
  • preview_url — browser-accessible HTTP URL for audio waveform preview in the Studio.
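For clients without curl, the same upload can be built with Python's standard library. `build_upload_request` is a hypothetical helper; it only constructs the multipart/form-data request (field name `file`, as in the curl example) and leaves sending it to `urllib.request.urlopen`.

```python
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request


def build_upload_request(path: str, base_url: str, api_key: str) -> Request:
    """Build (but do not send) a multipart POST to /files/upload."""
    file_path = Path(path)
    boundary = uuid.uuid4().hex  # random boundary unlikely to appear in the file
    content_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/files/upload",
        data=head + file_path.read_bytes() + tail,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, open the request with `with urllib.request.urlopen(req) as resp: info = json.load(resp)` and read `info["file_url"]` for use in /convert.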

Use file_url directly in a subsequent /convert request:

curl -X POST http://localhost:8000/convert \
  -H "X-API-Key: your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "My Podcast",
    "audio_url": "file:///absolute/path/to/cache/abc123def456.mp3",
    "image_url": "file:///Users/alice/cover.jpg"
  }'

POST /convert

Submit a conversion job for an audio file and its images. The call returns immediately (202 Accepted) with a job_id; conversion then runs in the background.

Request body:

{
  "title": "My Podcast Episode 1",          // required — used as video metadata title and output filename slug
  "description": "Episode summary...",       // optional — embedded as video comment/description
  "audio_url": "https://example.com/ep1.mp3", // required: http(s):// or file:// audio URL (see "Local files and uploads")

  // Provide ONE of: images (timed slides) OR image_url (static cover)
  "images": [
    {
      "url": "https://example.com/cover.jpg",
      "marker": "00:00:00",                  // HH:MM:SS or MM:SS — when this image appears
      "title": "Introduction"                // optional — used as chapter title; defaults to "Chapter N"
    },
    {
      "url": "https://example.com/slide2.jpg",
      "marker": "00:03:43",
      "title": "Main Topic"
    },
    {
      "url": "https://example.com/outro.jpg",
      "marker": "00:45:00"
    }
  ],

  // OR: a single static image for the full video duration
  "image_url": "https://example.com/cover.jpg",

  "options": {
    "format": "mp4",          // "mp4" (default) | "mkv" | "webm"
    "crf": 23,                // video quality 0–51; lower = better (default: 23)
    "video_bitrate": null,    // e.g. "2M" — overrides crf if set
    "audio_bitrate": "192k",  // default: "192k"
    "resolution": "1920x1080", // output canvas size (default: 1920x1080); images are padded to fit
    "normalize_audio": false  // apply loudnorm filter to even out audio levels
  }
}
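The annotated body above contains // comments and shows both image options, so it is not valid JSON as written. A client can build the real body programmatically and catch constraint violations before submitting. `build_convert_payload` is a hypothetical helper that checks only the rules stated in this README: exactly one of images/image_url, the three output formats, and the crf range.

```python
import json
from typing import Any, Optional

ALLOWED_FORMATS = ("mp4", "mkv", "webm")


def build_convert_payload(
    title: str,
    audio_url: str,
    images: Optional[list[dict[str, str]]] = None,
    image_url: Optional[str] = None,
    description: Optional[str] = None,
    **options: Any,
) -> dict[str, Any]:
    """Assemble a POST /convert body, checking the documented constraints client-side."""
    # Exactly one of images / image_url must be given
    if (images is None) == (image_url is None):
        raise ValueError("provide exactly one of images or image_url")
    if images is not None and not images:
        raise ValueError("images must contain at least one entry")
    if options.get("format", "mp4") not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")
    crf = options.get("crf", 23)
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51")

    payload: dict[str, Any] = {"title": title, "audio_url": audio_url}
    if description is not None:
        payload["description"] = description
    if images is not None:
        payload["images"] = images
    else:
        payload["image_url"] = image_url
    if options:
        payload["options"] = options  # omitted keys fall back to the server defaults
    return payload


body = json.dumps(build_convert_payload(
    "My Podcast Episode 1",
    "https://example.com/ep1.mp3",
    image_url="https://example.com/cover.jpg",
    crf=23,
))
```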

Notes:

  • Provide exactly one of images or image_url; sending both, or neither, is invalid.
  • If images is provided, the first image (by marker order) is the cover image.
  • If image_url is provided, it is used as a static background for the full video.
  • Each image in images is displayed from its marker until the next marker; the last image stays on screen until the audio ends.
  • Chapters are automatically embedded from image markers. title on each image becomes the chapter name.
  • The output filename is a URL slug of title, e.g. my-podcast-episode-1.mp4.

Response 202 Accepted:

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "progress_url": "http://localhost:8000/jobs/550e8400.../progress",
  "status_url": "http://localhost:8000/jobs/550e8400.../status",
  "download_url": "http://localhost:8000/jobs/550e8400.../download"
}

GET /jobs/{job_id}/progress

Stream conversion progress as Server-Sent Events. Keep the connection open until a complete or error event is received.

SSE event types:

| type | Fields | Description |
|---|---|---|
| progress | percent, time_processed | Incremental progress update |
| complete | download_url | Conversion finished |
| error | message | Conversion failed |
| ping | | Keep-alive, sent every 30 s while no other event is emitted |

Example stream:

data: {"type":"progress","percent":12.5,"time_processed":"00:00:45"}

data: {"type":"progress","percent":54.1,"time_processed":"00:03:12"}

data: {"type":"complete","download_url":"http://localhost:8000/jobs/.../download"}

curl example:

curl -N \
  -H "X-API-Key: your-secret-key" \
  http://localhost:8000/jobs/JOB_ID/progress
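Consuming the stream outside the browser needs a small SSE reader. This sketch follows the SSE framing rules (data: lines accumulate, a blank line ends an event, lines starting with ":" are comments) and assumes every data payload is a single JSON object like those above; `iter_sse_events`, `follow_job`, and `stream_progress` are hypothetical names, and other SSE fields such as event: or id: are ignored.

```python
import json
from typing import Iterable, Iterator
from urllib.request import Request, urlopen


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event."""
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line dispatches the buffered event
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("data:"):
            value = line[5:]
            data.append(value[1:] if value.startswith(" ") else value)
    # Per the SSE spec, an unterminated event at end of stream is discarded


def follow_job(lines: Iterable[str]) -> dict:
    """Print progress until a terminal "complete" or "error" event, then return it."""
    for event in iter_sse_events(lines):
        kind = event.get("type")
        if kind == "progress":
            print(f"{event['percent']:.1f}% ({event['time_processed']})")
        elif kind in ("complete", "error"):
            return event
        # "ping" events are ignored
    raise RuntimeError("stream ended before a complete or error event")


def stream_progress(base_url: str, job_id: str, api_key: str) -> dict:
    """Open GET /jobs/{job_id}/progress and follow it to completion."""
    req = Request(
        f"{base_url.rstrip('/')}/jobs/{job_id}/progress",
        headers={"X-API-Key": api_key, "Accept": "text/event-stream"},
    )
    with urlopen(req) as resp:
        return follow_job(line.decode("utf-8") for line in resp)
```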

GET /jobs/{job_id}/status

Returns a non-streaming snapshot of the job's state. Poll this endpoint when SSE is not convenient.

Response:

{
  "job_id": "550e8400-...",
  "status": "processing",       // pending | processing | complete | error
  "percent": 54.1,
  "time_processed": "00:03:12",
  "error": null,
  "download_url": null          // set when status is "complete"
}
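A polling client has to stop on both terminal states, complete and error. In this sketch `wait_for_job` and `make_status_fetcher` are hypothetical; the 2-second interval and one-hour timeout are arbitrary client-side defaults, not server limits.

```python
import json
import time
from typing import Callable
from urllib.request import Request, urlopen


def make_status_fetcher(base_url: str, job_id: str, api_key: str) -> Callable[[], dict]:
    """Return a function that GETs /jobs/{job_id}/status and decodes the JSON body."""
    url = f"{base_url.rstrip('/')}/jobs/{job_id}/status"

    def fetch() -> dict:
        with urlopen(Request(url, headers={"X-API-Key": api_key})) as resp:
            return json.load(resp)

    return fetch


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status is "complete" (return it) or "error" (raise)."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] == "complete":
            return status
        if status["status"] == "error":
            raise RuntimeError(f"conversion failed: {status.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status['status']} after {timeout} s")
        sleep(interval)
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a running server; in real use call `wait_for_job(make_status_fetcher("http://localhost:8000", job_id, key))`.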

GET /jobs/{job_id}/download

Download the generated video file. Only available when status is complete.

Returns the file as an attachment with the slug-based filename (e.g. my-podcast-episode-1.mp4).

curl example:

curl -L \
  -H "X-API-Key: your-secret-key" \
  http://localhost:8000/jobs/JOB_ID/download \
  -o my-podcast-episode-1.mp4
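To predict the download filename before the job finishes, a slug function along these lines approximates the "URL slug of title" rule. The exact rules the service uses (accent handling, what happens to a title with no letters or digits) are not documented, so `slugify` and its "video" fallback are assumptions.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lowercase, drop accents, and join runs of letters/digits with hyphens."""
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return slug or "video"  # hypothetical fallback for titles with no usable characters
```

Alternatively, `curl -L -O -J` saves the file under the name from the server's Content-Disposition header, so the client does not need to compute it at all.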

Full example

# 1. Submit job
JOB=$(curl -s -X POST http://localhost:8000/convert \
  -H "X-API-Key: your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "My Podcast Episode 1",
    "description": "A great episode about things.",
    "audio_url": "https://example.com/episode1.mp3",
    "images": [
      {"url": "https://example.com/cover.jpg",  "marker": "00:00:00", "title": "Intro"},
      {"url": "https://example.com/topic1.jpg", "marker": "00:02:30", "title": "Part 1"},
      {"url": "https://example.com/topic2.jpg", "marker": "00:18:00", "title": "Part 2"},
      {"url": "https://example.com/outro.jpg",  "marker": "00:45:00", "title": "Outro"}
    ],
    "options": {"resolution": "1920x1080", "crf": 23}
  }')
JOB_ID=$(echo "$JOB" | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")

# 2. Stream progress
curl -N -H "X-API-Key: your-secret-key" \
  http://localhost:8000/jobs/$JOB_ID/progress

# 3. Download
curl -L -H "X-API-Key: your-secret-key" \
  http://localhost:8000/jobs/$JOB_ID/download \
  -o episode1.mp4

Docker

# Build and start
docker-compose up

# With a custom API key
API_KEY=mysecret docker-compose up

The cache/ and output/ directories are mounted as volumes so files persist across container restarts.


Caching

Downloaded audio and images are cached in CACHE_DIR, keyed by the SHA-256 of the URL, so re-submitting a job with the same http(s) URLs reuses the cached downloads (file:// sources are read in place and never cached). Nothing is evicted automatically; delete files from cache/ and output/ manually when they are no longer needed.
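Because nothing is evicted automatically, a periodic cleanup script can help. This sketch deletes top-level files older than a given age, judged by modification time; `prune_old_files` is hypothetical, defaults to a dry run, and should not be run while jobs are in progress, since a job may still be reading from cache/ or writing to output/.

```python
import time
from pathlib import Path


def prune_old_files(directory: str, max_age_days: float, dry_run: bool = True) -> list[Path]:
    """Return regular files in directory older than max_age_days; delete them unless dry_run."""
    cutoff = time.time() - max_age_days * 86400
    stale = [p for p in Path(directory).iterdir() if p.is_file() and p.stat().st_mtime < cutoff]
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

For example, `prune_old_files("cache", 30, dry_run=False)` removes cached downloads untouched for 30 days; run it first with the default `dry_run=True` to review what would go.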
