Feature/next migration #8
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -47,3 +47,48 @@ COPY --from=builder /app/dist ./dist | |
| COPY package.json ./ | ||
|
|
||
| CMD ["node", "dist/worker.js"] | ||
|
|
||
|
|
||
| # ── Stage 3c: AI Worker (FFmpeg + faster-whisper) ────────────────────────── | ||
| # Uses node:22-slim (Debian) instead of Alpine because PyAV (a faster-whisper | ||
| # dependency) has no pre-built musl/Alpine wheels and its Cython compilation | ||
| # fails on Python 3.12 + Alpine. Debian has pre-built wheels → no compilation | ||
| # needed, faster build, smaller attack surface. | ||
| # | ||
| # Available WHISPER_MODEL values (ascending size / accuracy): | ||
| # tiny (~75 MB) | base (~150 MB) | small (~500 MB) | medium | large | ||
| FROM node:22-slim AS ai-worker | ||
|
|
||
| # 1. System packages (Debian-based) | ||
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
| ffmpeg \ | ||
| python3 \ | ||
| python3-pip \ | ||
| python3-venv \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # 2. Isolated Python venv + faster-whisper (CTranslate2-based, no PyTorch) | ||
| # Pre-built PyAV wheels on Debian → no compilation required. | ||
| ENV VIRTUAL_ENV=/opt/whisper-venv | ||
| ENV PATH="$VIRTUAL_ENV/bin:$PATH" | ||
|
|
||
| RUN python3 -m venv "$VIRTUAL_ENV" && \ | ||
| pip install --no-cache-dir --upgrade pip && \ | ||
| pip install --no-cache-dir faster-whisper | ||
|
|
||
| # 3. Pre-download model weights at build time so startup is instant | ||
| # compute_type=int8 → efficient CPU inference, no accuracy loss for most tasks | ||
| ARG WHISPER_MODEL=base | ||
| ENV WHISPER_MODEL=${WHISPER_MODEL} | ||
| RUN python3 -c "\ | ||
| from faster_whisper import WhisperModel; \ | ||
| WhisperModel('${WHISPER_MODEL}', device='cpu', compute_type='int8')" | ||
|
|
||
| # 4. Node.js application | ||
| WORKDIR /app | ||
|
|
||
| COPY --from=prod-deps /app/node_modules ./node_modules | ||
| COPY --from=builder /app/dist ./dist | ||
| COPY package.json ./ | ||
|
|
||
| CMD ["node", "dist/ai-worker.js"] | ||
|
Comment on lines +60 to +94

Run the AI worker as a non-root user and consider HEALTHCHECK.

The image runs as root throughout (Checkov CKV_DOCKER_3). Combined with FFmpeg + a Python interpreter that processes user-supplied media, a non-root runtime user materially reduces blast radius.

Note: if you add a

A minimal hardening:

🔧 Proposed fix
+ENV HF_HOME=/opt/hf-cache
+RUN mkdir -p /opt/hf-cache
RUN python3 -c "\
from faster_whisper import WhisperModel; \
WhisperModel('${WHISPER_MODEL}', device='cpu', compute_type='int8')"
...
+RUN groupadd --system app && useradd --system --gid app --home /app app \
+ && chown -R app:app /app /opt/hf-cache /opt/whisper-venv
+USER app
CMD ["node", "dist/ai-worker.js"]

🧰 Tools
🪛 Checkov (3.2.524)
[low] 1-94: Ensure that HEALTHCHECK instructions have been added to container images (CKV_DOCKER_2)
[low] 1-94: Ensure that a user for the container has been created (CKV_DOCKER_3)
🪛 Hadolint (2.14.0)
[warning] 63-63: Pin versions in apt get install. Instead of (DL3008)
[warning] 75-75: Pin versions in pip. Instead of (DL3013) |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| ALTER TABLE "videoTable" ADD COLUMN "transcript_key" text;--> statement-breakpoint | ||
| ALTER TABLE "videoTable" ADD CONSTRAINT "videoTable_user_id_userTable_id_fk" FOREIGN KEY ("user_id") REFERENCES "public"."userTable"("id") ON DELETE no action ON UPDATE no action; | ||
|
Comment on lines +1 to +2

Confirm the FK

Two things worth double-checking before this ships to production:

SELECT v.id, v.user_id
FROM "videoTable" v
LEFT JOIN "userTable" u ON u.id = v.user_id
WHERE u.id IS NULL; |
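
The query above is an anti-join: it returns every video row whose user_id has no matching row in userTable, which are the rows that would make ADD CONSTRAINT fail. A minimal in-memory sketch of the same check follows. The UserRow and VideoRow interfaces and the findOrphanVideos helper are hypothetical names introduced for illustration and are not part of this PR.

```typescript
// Hypothetical row shapes; only `id` and `user_id` are taken from the migration.
interface UserRow {
  id: string;
}

interface VideoRow {
  id: string;
  user_id: string;
}

// In-memory equivalent of:
//   LEFT JOIN "userTable" u ON u.id = v.user_id WHERE u.id IS NULL
// Returns the videos whose user_id does not match any known user.
function findOrphanVideos(videos: VideoRow[], users: UserRow[]): VideoRow[] {
  const knownUserIds = new Set(users.map((u) => u.id));
  return videos.filter((v) => !knownUserIds.has(v.user_id));
}

// Sample data (made up for this sketch).
const sampleUsers: UserRow[] = [{ id: "u1" }];
const sampleVideos: VideoRow[] = [
  { id: "v1", user_id: "u1" },
  { id: "v2", user_id: "u-missing" },
];

// Only v2 is orphaned, because "u-missing" is not in sampleUsers.
const orphans = findOrphanVideos(sampleVideos, sampleUsers);
```

If a check like this returns any rows against real data, they would need to be repaired or removed before the constraint can be added.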
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,232 @@ | ||
| { | ||
| "id": "5c118634-fc94-41c5-b74e-d0a850974c31", | ||
| "prevId": "586a540f-f245-48df-be82-03b662009de0", | ||
| "version": "7", | ||
| "dialect": "postgresql", | ||
| "tables": { | ||
| "public.userTable": { | ||
| "name": "userTable", | ||
| "schema": "", | ||
| "columns": { | ||
| "id": { | ||
| "name": "id", | ||
| "type": "text", | ||
| "primaryKey": true, | ||
| "notNull": true | ||
| }, | ||
| "email": { | ||
| "name": "email", | ||
| "type": "text", | ||
| "primaryKey": false, | ||
| "notNull": true | ||
| }, | ||
| "password": { | ||
| "name": "password", | ||
| "type": "text", | ||
| "primaryKey": false, | ||
| "notNull": true | ||
| }, | ||
| "created_at": { | ||
| "name": "created_at", | ||
| "type": "timestamp", | ||
| "primaryKey": false, | ||
| "notNull": false, | ||
| "default": "now()" | ||
| }, | ||
| "updated_at": { | ||
| "name": "updated_at", | ||
| "type": "timestamp", | ||
| "primaryKey": false, | ||
| "notNull": false, | ||
| "default": "now()" | ||
| } | ||
| }, | ||
| "indexes": {}, | ||
| "foreignKeys": {}, | ||
| "compositePrimaryKeys": {}, | ||
| "uniqueConstraints": {}, | ||
| "policies": {}, | ||
| "checkConstraints": {}, | ||
| "isRLSEnabled": false | ||
| }, | ||
| "public.videoTable": { | ||
| "name": "videoTable", | ||
| "schema": "", | ||
| "columns": { | ||
| "id": { | ||
| "name": "id", | ||
| "type": "text", | ||
| "primaryKey": true, | ||
| "notNull": true | ||
| }, | ||
| "user_id": { | ||
| "name": "user_id", | ||
| "type": "text", | ||
| "primaryKey": false, | ||
| "notNull": true | ||
| }, | ||
| "status": { | ||
| "name": "status", | ||
| "type": "status", | ||
| "typeSchema": "public", | ||
| "primaryKey": false, | ||
| "notNull": true, | ||
| "default": "'not-started'" | ||
| }, | ||
| "trancodeStatus": { | ||
| "name": "trancodeStatus", | ||
| "type": "transcode_status", | ||
| "typeSchema": "public", | ||
| "primaryKey": false, | ||
| "notNull": true, | ||
| "default": "'not-started'" | ||
| }, | ||
| "hlsStatus": { | ||
| "name": "hlsStatus", | ||
| "type": "hls_status", | ||
| "typeSchema": "public", | ||
| "primaryKey": false, | ||
| "notNull": true, | ||
| "default": "'not-started'" | ||
| }, | ||
| "thumbnailStatus": { | ||
| "name": "thumbnailStatus", | ||
| "type": "thumbnail_status", | ||
| "typeSchema": "public", | ||
| "primaryKey": false, | ||
| "notNull": true, | ||
| "default": "'not-started'" | ||
| }, | ||
| "transcriptStatus": { | ||
| "name": "transcriptStatus", | ||
| "type": "transcript_status", | ||
| "typeSchema": "public", | ||
| "primaryKey": false, | ||
| "notNull": true, | ||
| "default": "'not-started'" | ||
| }, | ||
| "transcript_key": { | ||
| "name": "transcript_key", | ||
| "type": "text", | ||
| "primaryKey": false, | ||
| "notNull": false | ||
| }, | ||
| "original_video_key": { | ||
| "name": "original_video_key", | ||
| "type": "text", | ||
| "primaryKey": false, | ||
| "notNull": false | ||
| }, | ||
| "hls_manifest_key": { | ||
| "name": "hls_manifest_key", | ||
| "type": "text", | ||
| "primaryKey": false, | ||
| "notNull": false | ||
| }, | ||
| "thumbnail_video_key": { | ||
| "name": "thumbnail_video_key", | ||
| "type": "text", | ||
| "primaryKey": false, | ||
| "notNull": false | ||
| }, | ||
| "created_at": { | ||
| "name": "created_at", | ||
| "type": "timestamp", | ||
| "primaryKey": false, | ||
| "notNull": false, | ||
| "default": "now()" | ||
| }, | ||
| "updated_at": { | ||
| "name": "updated_at", | ||
| "type": "timestamp", | ||
| "primaryKey": false, | ||
| "notNull": false, | ||
| "default": "now()" | ||
| } | ||
| }, | ||
| "indexes": {}, | ||
| "foreignKeys": { | ||
| "videoTable_user_id_userTable_id_fk": { | ||
| "name": "videoTable_user_id_userTable_id_fk", | ||
| "tableFrom": "videoTable", | ||
| "tableTo": "userTable", | ||
| "columnsFrom": [ | ||
| "user_id" | ||
| ], | ||
| "columnsTo": [ | ||
| "id" | ||
| ], | ||
| "onDelete": "no action", | ||
| "onUpdate": "no action" | ||
| } | ||
| }, | ||
| "compositePrimaryKeys": {}, | ||
| "uniqueConstraints": {}, | ||
| "policies": {}, | ||
| "checkConstraints": {}, | ||
| "isRLSEnabled": false | ||
| } | ||
| }, | ||
| "enums": { | ||
| "public.hls_status": { | ||
| "name": "hls_status", | ||
| "schema": "public", | ||
| "values": [ | ||
| "not-started", | ||
| "processing", | ||
| "completed", | ||
| "failed" | ||
| ] | ||
| }, | ||
| "public.status": { | ||
| "name": "status", | ||
| "schema": "public", | ||
| "values": [ | ||
| "not-started", | ||
| "processing", | ||
| "completed", | ||
| "failed" | ||
| ] | ||
| }, | ||
| "public.thumbnail_status": { | ||
| "name": "thumbnail_status", | ||
| "schema": "public", | ||
| "values": [ | ||
| "not-started", | ||
| "processing", | ||
| "completed", | ||
| "failed" | ||
| ] | ||
| }, | ||
| "public.transcode_status": { | ||
| "name": "transcode_status", | ||
| "schema": "public", | ||
| "values": [ | ||
| "not-started", | ||
| "processing", | ||
| "completed", | ||
| "failed" | ||
| ] | ||
| }, | ||
| "public.transcript_status": { | ||
| "name": "transcript_status", | ||
| "schema": "public", | ||
| "values": [ | ||
| "not-started", | ||
| "processing", | ||
| "completed", | ||
| "failed" | ||
| ] | ||
| } | ||
| }, | ||
| "schemas": {}, | ||
| "sequences": {}, | ||
| "roles": {}, | ||
| "policies": {}, | ||
| "views": {}, | ||
| "_meta": { | ||
| "columns": {}, | ||
| "schemas": {}, | ||
| "tables": {} | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| import 'dotenv/config.js' | ||
| import './utils/logger.js' | ||
| import './config/db.js' | ||
|
|
||
| // Only the transcription worker runs in this container. | ||
| // The regular worker (transcode/hls/thumbnail) runs in the worker image | ||
| // which doesn't have Python or faster-whisper installed. | ||
| import './workers/transcribe.worker.js' | ||
|
|
||
| console.log('🚀 AI Worker started and listening for transcription jobs...') | ||
|
Comment on lines +1 to +10

🧹 Nitpick | 🔵 Trivial

Mirrors
The import order (
One optional hardening: BullMQ workers running in a dedicated container benefit from a top-level |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| import { pgEnum, pgTable, serial, text, timestamp } from "drizzle-orm/pg-core"; | ||
| import { userTable } from "./user.model.js"; | ||
|
|
||
| export const statusEnum = pgEnum('status', ['not-started', 'processing', 'completed', 'failed']); | ||
| export const trancodeStatusEnum = pgEnum('transcode_status', ['not-started', 'processing', 'completed', 'failed']); | ||
|
|
@@ -8,12 +9,13 @@ export const transciptStatusEnum = pgEnum('transcript_status', ['not-started', ' | |
|
|
||
| export const videoTable = pgTable('videoTable', { | ||
| id: text('id').primaryKey(), | ||
| userId: text('user_id').notNull(), | ||
| userId: text('user_id').notNull().references(() => userTable.id), | ||
|
🧩 Analysis chain

🏁 Script executed:
# First, locate and read the video model file
find . -path "*server/src/models/video.model.ts" -type f
Repository: vector17002/video-transcoding
Length of output: 108

🏁 Script executed:
# Locate the migration file mentioned
find . -path "*drizzle*" -name "*.sql" | head -20
Repository: vector17002/video-transcoding
Length of output: 159

🏁 Script executed:
# Search for user deletion or deletion-related endpoints
rg -i "delete.*user|user.*delete" --type ts --type js -A 3 -B 1
Repository: vector17002/video-transcoding
Length of output: 55

🏁 Script executed:
# Check schema for FK constraints to understand project conventions
rg "references\(|onDelete|onUpdate" -A 2 -B 1 --type ts
Repository: vector17002/video-transcoding
Length of output: 442

🏁 Script executed:
# Read the video.model.ts file to see the full schema
cat -n server/src/models/video.model.ts
Repository: vector17002/video-transcoding
Length of output: 1737

🏁 Script executed:
# Read both migration files to see the FK details
cat -n server/drizzle/0000_far_gideon.sql
Repository: vector17002/video-transcoding
Length of output: 1658

🏁 Script executed:
# Read the second migration file
cat -n server/drizzle/0001_glorious_shooting_star.sql
Repository: vector17002/video-transcoding
Length of output: 349

🏁 Script executed:
# Check if there are user-related endpoints (especially delete endpoints)
find . -path "*server/src*" -name "*.ts" -type f | xargs rg -l "user" | head -10
Repository: vector17002/video-transcoding
Length of output: 450

🏁 Script executed:
# Look for any other FK references in the codebase to understand conventions
rg "\.references\(" server/src/models/ -A 1 -B 1
Repository: vector17002/video-transcoding
Length of output: 334

🏁 Script executed:
# Check user.service.ts for user deletion endpoints
cat -n server/src/services/user.service.ts
Repository: vector17002/video-transcoding
Length of output: 819

🏁 Script executed:
# Also check if there are any routes/controllers that handle user deletion
find . -path "*server/src*" -name "*.ts" -type f | xargs rg -l "delete|remove" | xargs rg -l "user" | head -5
Repository: vector17002/video-transcoding
Length of output: 108

🏁 Script executed:
# Check the schema creation order - when videoTable was first created
rg -A 5 "CREATE TABLE.*videoTable" server/drizzle/
Repository: vector17002/video-transcoding
Length of output: 548

Consider

The FK constraint is correctly added with
However, when a user-deletion endpoint is implemented,

Document whichever behavior is chosen in the model. |
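
The migration in this PR declares the foreign key with ON DELETE no action, which means PostgreSQL rejects the deletion of a user that is still referenced by a video. The sketch below models that behavior next to ON DELETE cascade on in-memory data, as a way to reason about the choice. The Db shape and the deleteUser helper are hypothetical and exist only for this illustration; they are not code from the repository or from Drizzle.

```typescript
// The two delete behaviors compared in this sketch.
type OnDelete = "no action" | "cascade";

interface User {
  id: string;
}

interface Video {
  id: string;
  userId: string;
}

// Hypothetical in-memory stand-in for the two tables.
interface Db {
  users: User[];
  videos: Video[];
}

// Returns a new Db with the user removed, following the chosen FK behavior.
function deleteUser(db: Db, userId: string, onDelete: OnDelete): Db {
  const owned = db.videos.filter((v) => v.userId === userId);

  // "no action": the delete fails while any video still references the user.
  if (owned.length > 0 && onDelete === "no action") {
    throw new Error(`user ${userId} is still referenced by ${owned.length} video(s)`);
  }

  // "cascade" (or no referencing rows): remove the user and any videos they own.
  return {
    users: db.users.filter((u) => u.id !== userId),
    videos: db.videos.filter((v) => v.userId !== userId),
  };
}

// Sample data (made up for this sketch).
const exampleDb: Db = {
  users: [{ id: "u1" }, { id: "u2" }],
  videos: [{ id: "v1", userId: "u1" }],
};

// With cascade, deleting u1 also removes v1; u2 is untouched.
const afterCascade = deleteUser(exampleDb, "u1", "cascade");
```

Under "no action", a future user-deletion endpoint would have to delete or reassign the user's videos first; under "cascade", the database would remove them automatically.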
||
| status: statusEnum().notNull().default('not-started'), | ||
| trancodeStatus: trancodeStatusEnum().notNull().default('not-started'), | ||
| hlsStatus: hlsStatusEnum().notNull().default('not-started'), | ||
| thumbnailStatus: thumbnailStatusEnum().notNull().default('not-started'), | ||
| transcriptStatus: transciptStatusEnum().notNull().default('not-started'), | ||
| transcriptKey: text('transcript_key'), | ||
|
🧹 Nitpick | 🔵 Trivial

The nullable text column matches the async write path in
Minor note: the existing enum on line 8 is named |
||
| originalVideoKey: text('original_video_key'), | ||
| hlsManifestKey: text('hls_manifest_key'), | ||
| thumbnailVideoKey: text('thumbnail_video_key'), | ||
|
|
||
🧩 Analysis chain
🏁 Script executed:
Repository: vector17002/video-transcoding
Length of output: 583
🏁 Script executed:
Repository: vector17002/video-transcoding
Length of output: 2476
🏁 Script executed:
Repository: vector17002/video-transcoding
Length of output: 688
Ensure DATABASE_URL is populated in ./server/.env for production.
transcribe.worker.ts imports db from ../config/db.js, and the db module requires process.env.DATABASE_URL, exiting if it is not set. While ./server/.env is loaded via env_file, the DATABASE_URL variable must be present there. The requirement is already documented in server/.env.example, so ensure operators include it in their production environment file.
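
A fail-fast check of this kind can be sketched as follows. This is not the code in config/db.js, which is not shown in the diff: the requireEnv helper and the Env type are hypothetical, and the sketch throws an error where the review says the real module exits the process.

```typescript
// Environment passed in explicitly so the check is easy to exercise;
// in a Node.js process this would typically be process.env.
type Env = Record<string, string | undefined>;

// Returns the variable's value, or throws if it is missing or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Placeholder connection string, made up for this sketch.
const exampleEnv: Env = {
  DATABASE_URL: "postgres://user:pass@localhost:5432/app",
};

const databaseUrl = requireEnv(exampleEnv, "DATABASE_URL");
```

Running such a check at module load time surfaces a missing DATABASE_URL when the container starts rather than when the first transcription job touches the database.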