28 changes: 24 additions & 4 deletions docker-compose.yml
@@ -24,13 +24,12 @@ services:
- ./server/.env
environment:
- IOREDIS_URL=redis://redis:6379
- DATABASE_URL=${DATABASE_URL}
depends_on:
redis:
condition: service_healthy
restart: unless-stopped

# ── Workers (transcode + HLS + thumbnail) ───────────────────────────
# ── Transcode Worker (FFmpeg — HLS + thumbnails) ─────────
worker:
build:
context: ./server
@@ -40,16 +39,37 @@ services:
- ./server/.env
environment:
- IOREDIS_URL=redis://redis:6379
- DATABASE_URL=${DATABASE_URL}
depends_on:
redis:
condition: service_healthy
restart: unless-stopped
# Workers may need more memory for FFmpeg transcoding
deploy:
resources:
limits:
memory: 2G

# ── AI Worker (FFmpeg + faster-whisper transcription) ────
ai-worker:
build:
context: ./server
dockerfile: Dockerfile
target: ai-worker
args:
# Change to tiny/small/medium/large as needed
WHISPER_MODEL: base
env_file:
- ./server/.env
environment:
- IOREDIS_URL=redis://redis:6379
depends_on:
redis:
condition: service_healthy
restart: unless-stopped
deploy:
resources:
limits:
# faster-whisper base model needs ~1 GB; bump for larger models
memory: 2G
Comment on lines +51 to +72
⚠️ Potential issue | 🟡 Minor

Ensure DATABASE_URL is populated in ./server/.env for production.

transcribe.worker.ts imports db from ../config/db.js and the db module requires process.env.DATABASE_URL, exiting if not set. While ./server/.env is loaded via env_file, the DATABASE_URL variable must be present there. The requirement is already documented in server/.env.example, so ensure operators include it in their production environment file.
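A minimal sketch of the required entry — the values below are placeholders, not the project's real credentials; the actual connection string depends on your deployment:

```shell
# server/.env — placeholder values, adjust for your environment
DATABASE_URL=postgres://app:secret@db-host:5432/videos
IOREDIS_URL=redis://redis:6379
```

Since both `worker` and `ai-worker` load the same `env_file`, one entry covers every container that imports `config/db.js`.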

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` around lines 51 - 72, The ai-worker service loads
./server/.env via the env_file but the DB module (imported by
transcribe.worker.ts from ../config/db.js) exits if process.env.DATABASE_URL is
missing; ensure the ./server/.env used in production contains a valid
DATABASE_URL entry (matching the format expected by ../config/db.js) so the db
initialization does not fail when the ai-worker starts.


volumes:
redis_data:
45 changes: 45 additions & 0 deletions server/Dockerfile
@@ -47,3 +47,48 @@ COPY --from=builder /app/dist ./dist
COPY package.json ./

CMD ["node", "dist/worker.js"]


# ── Stage 3c: AI Worker (FFmpeg + faster-whisper) ──────────────────────────
# Uses node:22-slim (Debian) instead of Alpine because PyAV (a faster-whisper
# dependency) has no pre-built musl/Alpine wheels and its Cython compilation
# fails on Python 3.12 + Alpine. Debian has pre-built wheels → no compilation
# needed, faster build, smaller attack surface.
#
# Available WHISPER_MODEL values (ascending size / accuracy):
# tiny (~75 MB) | base (~150 MB) | small (~500 MB) | medium | large
FROM node:22-slim AS ai-worker

# 1. System packages (Debian-based)
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
python3 \
python3-pip \
python3-venv \
&& rm -rf /var/lib/apt/lists/*

# 2. Isolated Python venv + faster-whisper (CTranslate2-based, no PyTorch)
# Pre-built PyAV wheels on Debian → no compilation required.
ENV VIRTUAL_ENV=/opt/whisper-venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

RUN python3 -m venv "$VIRTUAL_ENV" && \
pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir faster-whisper

# 3. Pre-download model weights at build time so startup is instant
# compute_type=int8 → efficient CPU inference, no accuracy loss for most tasks
ARG WHISPER_MODEL=base
ENV WHISPER_MODEL=${WHISPER_MODEL}
RUN python3 -c "\
from faster_whisper import WhisperModel; \
WhisperModel('${WHISPER_MODEL}', device='cpu', compute_type='int8')"

# 4. Node.js application
WORKDIR /app

COPY --from=prod-deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package.json ./

CMD ["node", "dist/ai-worker.js"]
Comment on lines +60 to +94
⚠️ Potential issue | 🟡 Minor

Run the AI worker as a non-root user and consider HEALTHCHECK.

The image runs as root throughout (Checkov CKV_DOCKER_3). Combined with FFmpeg + a Python interpreter that processes user-supplied media, a non-root runtime user materially reduces blast radius. Note: if you add a USER directive later, the build-time model preload at line 83-85 caches under /root/.cache/huggingface/..., which the non-root runtime user won't be able to read — you'll need to either set HF_HOME/XDG_CACHE_HOME to a shared path or run the preload as the same user.

A minimal hardening:

🔧 Proposed fix
+ENV HF_HOME=/opt/hf-cache
+RUN mkdir -p /opt/hf-cache
 RUN python3 -c "\
 from faster_whisper import WhisperModel; \
 WhisperModel('${WHISPER_MODEL}', device='cpu', compute_type='int8')"
 ...
+RUN groupadd --system app && useradd --system --gid app --home /app app \
+    && chown -R app:app /app /opt/hf-cache /opt/whisper-venv
+USER app
 CMD ["node", "dist/ai-worker.js"]

A HEALTHCHECK is optional for a queue worker (Compose's depends_on covers Redis), so I'd skip CKV_DOCKER_2 unless you wire BullMQ readiness through an HTTP probe.

🧰 Tools
🪛 Checkov (3.2.524)

[low] 1-94: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[low] 1-94: Ensure that a user for the container has been created

(CKV_DOCKER_3)

🪛 Hadolint (2.14.0)

[warning] 63-63: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)


[warning] 75-75: Pin versions in pip. Instead of pip install <package> use pip install <package>==<version> or pip install --requirement <requirements file>

(DL3013)
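A hedged sketch of how the pip-pinning warning could be addressed — the version below is illustrative and not checked against the registry; substitute whichever release you actually validated:

```dockerfile
# Illustrative pin only — replace with the faster-whisper version you tested.
# Apt packages can be pinned the same way; look up the exact candidate
# with `apt-cache madison ffmpeg` inside the base image first.
RUN python3 -m venv "$VIRTUAL_ENV" && \
    pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir faster-whisper==1.1.0
```

Pinning trades reproducible builds against having to bump versions manually for security fixes, which is why both lint rules are warnings rather than errors.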

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/Dockerfile` around lines 60 - 94, The Dockerfile runs the ai-worker
image as root and preloads Whisper model into root's cache, so add a non-root
runtime user (e.g., create and use a user like "app" via a USER directive) and
ensure the build-time model preload is accessible to that user by either setting
HF_HOME or XDG_CACHE_HOME to a shared path (e.g., /opt/hf-cache) before the RUN
python3 -c "... WhisperModel(...)" step or by performing the preload as the same
non-root user; update ENV VIRTUAL_ENV and PATH usage remains the same and ensure
the new cache dir is writable by the non-root user (chown/chmod during build).
Optionally add a HEALTHCHECK if you want runtime readiness probing, otherwise
skip it as noted.

2 changes: 2 additions & 0 deletions server/drizzle/0001_glorious_shooting_star.sql
@@ -0,0 +1,2 @@
ALTER TABLE "videoTable" ADD COLUMN "transcript_key" text;--> statement-breakpoint
ALTER TABLE "videoTable" ADD CONSTRAINT "videoTable_user_id_userTable_id_fk" FOREIGN KEY ("user_id") REFERENCES "public"."userTable"("id") ON DELETE no action ON UPDATE no action;
Comment on lines +1 to +2
⚠️ Potential issue | 🟡 Minor

Confirm the FK ON DELETE NO ACTION behavior is intentional, and that the migration is safe to run on existing data.

Two things worth double-checking before this ships to production:

  1. ON DELETE NO ACTION means deleting a userTable row that still has rows in videoTable will hard-fail at the DB layer. If the product expectation is that deleting a user removes (or anonymizes) their videos, you probably want ON DELETE CASCADE or SET NULL here. If retention is intentional, ignore.
  2. Adding the FK constraint will fail the migration on any existing videoTable.user_id value that does not have a matching userTable.id (orphan rows). For a fresh dev DB this is fine; for prod, run a pre-flight check / cleanup before deploying:
SELECT v.id, v.user_id
FROM "videoTable" v
LEFT JOIN "userTable" u ON u.id = v.user_id
WHERE u.id IS NULL;
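If that pre-flight query returns rows and the orphans are safe to discard, one possible cleanup — destructive, and assuming orphaned videos should be deleted rather than remapped to a surviving user:

```sql
BEGIN;
DELETE FROM "videoTable" v
WHERE NOT EXISTS (
  SELECT 1 FROM "userTable" u WHERE u.id = v.user_id
);
COMMIT;
```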
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/drizzle/0001_glorious_shooting_star.sql` around lines 1 - 2, The
migration adds transcript_key to "videoTable" and a FK
"videoTable_user_id_userTable_id_fk" referencing "userTable" with ON DELETE NO
ACTION; confirm that NO ACTION is intentional—if deleting a user should remove
or null their videos change the constraint to ON DELETE CASCADE or ON DELETE SET
NULL on the FK in the migration (or adjust application behavior), and before
applying to prod run a pre-flight check for orphaned rows in "videoTable" (any
v.user_id without a matching userTable.id) and either delete or fix those rows
so adding the FK won't fail; update the migration SQL accordingly and document
the chosen strategy.

232 changes: 232 additions & 0 deletions server/drizzle/meta/0001_snapshot.json
@@ -0,0 +1,232 @@
{
"id": "5c118634-fc94-41c5-b74e-d0a850974c31",
"prevId": "586a540f-f245-48df-be82-03b662009de0",
"version": "7",
"dialect": "postgresql",
"tables": {
"public.userTable": {
"name": "userTable",
"schema": "",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true
},
"email": {
"name": "email",
"type": "text",
"primaryKey": false,
"notNull": true
},
"password": {
"name": "password",
"type": "text",
"primaryKey": false,
"notNull": true
},
"created_at": {
"name": "created_at",
"type": "timestamp",
"primaryKey": false,
"notNull": false,
"default": "now()"
},
"updated_at": {
"name": "updated_at",
"type": "timestamp",
"primaryKey": false,
"notNull": false,
"default": "now()"
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"policies": {},
"checkConstraints": {},
"isRLSEnabled": false
},
"public.videoTable": {
"name": "videoTable",
"schema": "",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true
},
"user_id": {
"name": "user_id",
"type": "text",
"primaryKey": false,
"notNull": true
},
"status": {
"name": "status",
"type": "status",
"typeSchema": "public",
"primaryKey": false,
"notNull": true,
"default": "'not-started'"
},
"trancodeStatus": {
"name": "trancodeStatus",
"type": "transcode_status",
"typeSchema": "public",
"primaryKey": false,
"notNull": true,
"default": "'not-started'"
},
"hlsStatus": {
"name": "hlsStatus",
"type": "hls_status",
"typeSchema": "public",
"primaryKey": false,
"notNull": true,
"default": "'not-started'"
},
"thumbnailStatus": {
"name": "thumbnailStatus",
"type": "thumbnail_status",
"typeSchema": "public",
"primaryKey": false,
"notNull": true,
"default": "'not-started'"
},
"transcriptStatus": {
"name": "transcriptStatus",
"type": "transcript_status",
"typeSchema": "public",
"primaryKey": false,
"notNull": true,
"default": "'not-started'"
},
"transcript_key": {
"name": "transcript_key",
"type": "text",
"primaryKey": false,
"notNull": false
},
"original_video_key": {
"name": "original_video_key",
"type": "text",
"primaryKey": false,
"notNull": false
},
"hls_manifest_key": {
"name": "hls_manifest_key",
"type": "text",
"primaryKey": false,
"notNull": false
},
"thumbnail_video_key": {
"name": "thumbnail_video_key",
"type": "text",
"primaryKey": false,
"notNull": false
},
"created_at": {
"name": "created_at",
"type": "timestamp",
"primaryKey": false,
"notNull": false,
"default": "now()"
},
"updated_at": {
"name": "updated_at",
"type": "timestamp",
"primaryKey": false,
"notNull": false,
"default": "now()"
}
},
"indexes": {},
"foreignKeys": {
"videoTable_user_id_userTable_id_fk": {
"name": "videoTable_user_id_userTable_id_fk",
"tableFrom": "videoTable",
"tableTo": "userTable",
"columnsFrom": [
"user_id"
],
"columnsTo": [
"id"
],
"onDelete": "no action",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"policies": {},
"checkConstraints": {},
"isRLSEnabled": false
}
},
"enums": {
"public.hls_status": {
"name": "hls_status",
"schema": "public",
"values": [
"not-started",
"processing",
"completed",
"failed"
]
},
"public.status": {
"name": "status",
"schema": "public",
"values": [
"not-started",
"processing",
"completed",
"failed"
]
},
"public.thumbnail_status": {
"name": "thumbnail_status",
"schema": "public",
"values": [
"not-started",
"processing",
"completed",
"failed"
]
},
"public.transcode_status": {
"name": "transcode_status",
"schema": "public",
"values": [
"not-started",
"processing",
"completed",
"failed"
]
},
"public.transcript_status": {
"name": "transcript_status",
"schema": "public",
"values": [
"not-started",
"processing",
"completed",
"failed"
]
}
},
"schemas": {},
"sequences": {},
"roles": {},
"policies": {},
"views": {},
"_meta": {
"columns": {},
"schemas": {},
"tables": {}
}
}
7 changes: 7 additions & 0 deletions server/drizzle/meta/_journal.json
@@ -8,6 +8,13 @@
"when": 1776514578126,
"tag": "0000_far_gideon",
"breakpoints": true
},
{
"idx": 1,
"version": "7",
"when": 1776997181954,
"tag": "0001_glorious_shooting_star",
"breakpoints": true
}
]
}
10 changes: 10 additions & 0 deletions server/src/ai-worker.ts
@@ -0,0 +1,10 @@
import 'dotenv/config.js'
import './utils/logger.js'
import './config/db.js'

// Only the transcription worker runs in this container.
// The regular worker (transcode/hls/thumbnail) runs in the worker image
// which doesn't have Python or faster-whisper installed.
import './workers/transcribe.worker.js'

console.log('🚀 AI Worker started and listening for transcription jobs...')
Comment on lines +1 to +10
🧹 Nitpick | 🔵 Trivial

Mirrors worker.ts initialization order — LGTM.

The import order (dotenv → logger → db → worker) matches server/src/worker.ts and the dependency chain documented in transcribe.worker.ts.

One optional hardening: BullMQ workers running in a dedicated container benefit from a top-level process.on('unhandledRejection' | 'uncaughtException', ...) so a stray async error doesn't silently kill the container without producing a useful log line. Worth adding here (and in worker.ts) if not already centralized.
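A minimal sketch of such a safety net — logging goes through console here; the repo's own logger from ./utils/logger.js may hook these events differently:

```typescript
// Top-level safety net for a dedicated worker container. Without it,
// a stray async error can kill the process before any useful log line
// is written, leaving only an opaque container-restart loop.
process.on('unhandledRejection', (reason: unknown) => {
  console.error('Unhandled rejection in worker:', reason);
  process.exit(1);
});

process.on('uncaughtException', (err: Error) => {
  console.error('Uncaught exception in worker:', err);
  process.exit(1);
});
```

Exiting with a non-zero code lets Compose's `restart: unless-stopped` bring the worker back up with a clean slate instead of letting it limp along in an unknown state.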

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/src/ai-worker.ts` around lines 1 - 10, Add top-level handlers for
unhandled promise rejections and uncaught exceptions in ai-worker.ts by
registering process.on('unhandledRejection', ...) and
process.on('uncaughtException', ...) that log the error and exit cleanly; mirror
the same handlers in server/src/worker.ts so both worker containers produce
useful logs for stray async errors and terminate in a controlled way. Ensure the
handlers use the existing logger initialized by importing './utils/logger.js'
and include the error object/details in the log before calling process.exit(1).

4 changes: 3 additions & 1 deletion server/src/models/video.model.ts
@@ -1,4 +1,5 @@
import { pgEnum, pgTable, serial, text, timestamp } from "drizzle-orm/pg-core";
import { userTable } from "./user.model.js";

export const statusEnum = pgEnum('status', ['not-started', 'processing', 'completed', 'failed']);
export const trancodeStatusEnum = pgEnum('transcode_status', ['not-started', 'processing', 'completed', 'failed']);
@@ -8,12 +9,13 @@ export const transciptStatusEnum = pgEnum('transcript_status', ['not-started', '

export const videoTable = pgTable('videoTable', {
id: text('id').primaryKey(),
userId: text('user_id').notNull(),
userId: text('user_id').notNull().references(() => userTable.id),
⚠️ Potential issue | 🟡 Minor

Consider onDelete semantics when implementing user deletion.

The FK constraint is correctly added with ON DELETE NO ACTION (per 0001_glorious_shooting_star.sql). For a new project with an empty database, the migration poses no safety concerns.

However, when a user-deletion endpoint is implemented, NO ACTION will prevent deleting users who own videos. Either:

  • Accept this (preserve user history and associated video records), or
  • Switch to onDelete: 'cascade' (delete associated videos) or 'set null' (orphan video records).

Document whichever behavior is chosen in the model.
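If cascading is the chosen policy, a follow-up migration could swap the constraint — a sketch, reusing the constraint name from 0001_glorious_shooting_star.sql; in Drizzle the equivalent is `.references(() => userTable.id, { onDelete: 'cascade' })`:

```sql
ALTER TABLE "videoTable" DROP CONSTRAINT "videoTable_user_id_userTable_id_fk";
ALTER TABLE "videoTable" ADD CONSTRAINT "videoTable_user_id_userTable_id_fk"
  FOREIGN KEY ("user_id") REFERENCES "public"."userTable"("id")
  ON DELETE CASCADE ON UPDATE NO ACTION;
```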

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/src/models/video.model.ts` at line 12, The foreign-key on userId in
the Video model currently uses the default NO ACTION delete semantics which will
block deleting users who own videos; update the constraint on userId (reference
via userTable.id) to the desired behavior (e.g., onDelete: 'cascade' to remove
videos when a user is deleted, or onDelete: 'set null' to orphan them) and add a
brief JSDoc/comment in the Video model explaining the chosen policy so future
readers and the planned user-deletion endpoint know whether videos are cascaded,
nulled, or preserved; ensure the change is applied where userId:
text('user_id').notNull().references(() => userTable.id) is declared.

status: statusEnum().notNull().default('not-started'),
trancodeStatus: trancodeStatusEnum().notNull().default('not-started'),
hlsStatus: hlsStatusEnum().notNull().default('not-started'),
thumbnailStatus: thumbnailStatusEnum().notNull().default('not-started'),
transcriptStatus: transciptStatusEnum().notNull().default('not-started'),
transcriptKey: text('transcript_key'),
🧹 Nitpick | 🔵 Trivial

transcriptKey is correctly nullable; consider also adding transcriptKey to the cleanup/lifecycle.

The nullable text column matches the async write path in transcribe.worker.ts (set to the S3 key after upload). No issue with the schema itself.

Minor note: the existing enum on line 8 is named transciptStatusEnum (missing the second r) and stored as transcript_status in the DB — the TS identifier typo is pre-existing but would be a good cleanup while you're touching this schema.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/src/models/video.model.ts` at line 18, The schema is fine but you
should (1) add transcriptKey (the nullable column defined as transcriptKey:
text('transcript_key')) into any cleanup/lifecycle paths so uploaded S3 keys are
removed when videos are deleted or expired—update the cleanup routine that
handles video rows to also delete the S3 object referenced by transcriptKey and
null/clear the field in the DB; and (2) fix the TypeScript enum identifier typo
transciptStatusEnum → transcriptStatusEnum (leave the DB enum name
transcript_status unchanged) so references to the enum (e.g., in the model and
any consumers) use the corrected TS identifier.

originalVideoKey: text('original_video_key'),
hlsManifestKey: text('hls_manifest_key'),
thumbnailVideoKey: text('thumbnail_video_key'),
2 changes: 1 addition & 1 deletion server/src/services/transcode.service.ts
@@ -12,7 +12,7 @@ const __dirname = path.dirname(__filename);

export const getPreSignedUrlForDownload = async (fileId: string, userId: string) => {
const currentEnv = process.env.NODE_ENV === 'development' ? 'dev' : 'prod';
const videoObjectId = `${currentEnv}/users/${userId}/original/${fileId}`;
const videoObjectId = `${currentEnv}/users/${userId}/${fileId}/original`;

const videoDownloadSignedUrl = await getDownloadUrl(videoObjectId)
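The key change above moves `original` from a per-user folder into a per-file prefix, so every artifact of one upload shares `<env>/users/<userId>/<fileId>/`. A sketch of the new layout — the helper name is hypothetical:

```typescript
// Hypothetical helper mirroring the new object-key layout from the diff:
// <env>/users/<userId>/<fileId>/original
const buildOriginalKey = (env: string, userId: string, fileId: string): string =>
  `${env}/users/${userId}/${fileId}/original`;

console.log(buildOriginalKey('dev', 'user-1', 'file-1'));
// → dev/users/user-1/file-1/original
```

Grouping by fileId keeps the original, HLS renditions, thumbnail, and transcript under one prefix, which simplifies per-video cleanup and lifecycle rules.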
