
fix: robust pdfjs-dist worker resolution + Docker deployment#2

Open
Chocksy wants to merge 1 commit into zmeyer44:main from Chocksy:pr/pdfjs-worker-docker



Chocksy commented Apr 7, 2026

Summary

  • Fix pdfjs-dist worker resolution using createRequire() instead of relative imports that break in bundled/Docker environments. This replaces the less robust fix in 3356286.
  • Add Docker deployment support: web app Dockerfile (multi-stage, standalone output), improved ingestion-worker Dockerfile, .dockerignore, and docker-compose.prod.yml for self-hosted deployment.
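The createRequire approach in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in pdf.ts: the helper name resolvePdfWorkerSrc is hypothetical, and it assumes the legacy worker lives at pdfjs-dist/legacy/build/pdf.worker.mjs (the path the worker Dockerfile searches for) and that the installed pdfjs-dist permits that deep path.

```typescript
import { createRequire } from 'node:module';
import { pathToFileURL } from 'node:url';

// Hypothetical helper (not the PR's actual code): resolve the legacy pdf.js
// worker through Node's module resolution instead of a hard-coded relative path.
export function resolvePdfWorkerSrc(
  fromFile: string,
  specifier = 'pdfjs-dist/legacy/build/pdf.worker.mjs',
): string {
  // createRequire anchors lookup at `fromFile`, so node_modules is searched
  // upward from that file's directory, wherever the build output ends up.
  const requireFromHere = createRequire(fromFile);
  // Throws (code MODULE_NOT_FOUND) if the package or file is not installed.
  const workerPath = requireFromHere.resolve(specifier);
  // Convert to a file:// URL, which dynamic import() accepts on every platform.
  return pathToFileURL(workerPath).href;
}
```

In an ESM build this would typically feed pdf.js's global worker setting, e.g. `GlobalWorkerOptions.workerSrc = resolvePdfWorkerSrc(import.meta.url)`. Because lookup starts from the calling file at runtime, it keeps working after the compiled output moves into dist/ inside the container.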

Changes

  • services/ingestion-worker/src/lib/pdf.ts — Use createRequire to resolve the worker path reliably, and pass explicit pdfjs options
  • apps/web/Dockerfile — Multi-stage Node 20 Alpine build with standalone output
  • services/ingestion-worker/Dockerfile — Improved multi-stage build
  • .dockerignore — Standard exclusions
  • docker-compose.prod.yml — Production compose with web, worker, postgres, redis
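For the explicit pdfjs options, here is a hedged sketch of assembling getDocument() parameters for a Node process. The option names (useWorkerFetch, isEvalSupported, useSystemFonts) come from this PR's description; the chosen values, the helper name nodePdfLoadParams, and the Buffer-to-Uint8Array step are assumptions for illustration, not the PR's code.

```typescript
// Subset of pdf.js getDocument() parameters, typed locally so the sketch
// does not depend on a particular pdfjs-dist version's type exports.
export interface NodePdfLoadParams {
  data: Uint8Array;
  useWorkerFetch: boolean;
  isEvalSupported: boolean;
  useSystemFonts: boolean;
}

// Hypothetical helper; the boolean values below are illustrative assumptions,
// not necessarily what the PR's pdf.ts passes.
export function nodePdfLoadParams(bytes: Uint8Array): NodePdfLoadParams {
  // Re-wrap as a plain Uint8Array view over the same memory (no copy).
  // Recent pdf.js releases refuse a Node Buffer (a Uint8Array subclass) as `data`.
  const data = new Uint8Array(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    data,
    useWorkerFetch: false, // don't use fetch() in the worker for CMap/standard font files
    isEvalSupported: false, // don't evaluate strings as JavaScript inside the server process
    useSystemFonts: true, // fall back to system fonts for fonts not embedded in the PDF
  };
}
```

A caller would then do something like `getDocument(nodePdfLoadParams(await readFile(filePath))).promise`, where filePath is whatever the ingestion job receives.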

Test plan

  • Verify PDF parsing works in Docker container (both digital and scanned PDFs)
  • Verify docker compose -f docker-compose.prod.yml up starts all services
  • Verify web app builds and serves correctly from standalone output


vercel Bot commented Apr 7, 2026

@Chocksy is attempting to deploy a commit to the Zach's Projects Team on Vercel.

A member of the Team first needs to authorize it.

Chocksy force-pushed the pr/pdfjs-worker-docker branch from 4828549 to 8a7e832 on April 7, 2026 at 12:28

greptile-apps Bot commented Apr 7, 2026

Greptile Summary

This PR fixes PDF.js worker resolution in Docker/bundled environments using createRequire and adds a full Docker deployment stack (multi-stage web Dockerfile, improved worker Dockerfile, .dockerignore, docker-compose.prod.yml).

  • The web Dockerfile depends on Next.js standalone output (copies .next/standalone, runs server.js), but apps/web/next.config.ts does not set output: 'standalone', so the standalone directory is never generated and the web image cannot be built or started.
  • web and worker in docker-compose.prod.yml depend only on postgres:service_healthy, not on the one-shot migrate container completing, creating a race condition where the app starts against an empty schema.

Confidence Score: 4/5

Not safe to merge as-is: the web container will fail to start due to missing standalone config, and a race condition exists between migrations and app startup.

Two P1 defects block the primary deployment path — the standalone output misconfiguration makes the web Dockerfile non-functional, and the missing migrate dependency causes unpredictable startup failures.

Files needing attention: apps/web/next.config.ts (needs output: 'standalone') and docker-compose.prod.yml (web and worker need depends_on: migrate)

Important Files Changed

| Filename | Overview |
|---|---|
| apps/web/Dockerfile | Multi-stage standalone build; broken because next.config.ts lacks output: 'standalone', so the standalone directory and server.js are unavailable at runtime |
| docker-compose.prod.yml | Production compose with postgres, web, worker, and a one-shot migrate service; web and worker lack a dependency on migrate completing, risking startup against an empty schema |
| services/ingestion-worker/Dockerfile | Improved multi-stage build; copies pdf.worker.mjs into dist/, but that file is never referenced at runtime (dead copy) |
| services/ingestion-worker/src/lib/pdf.ts | Robust worker path resolution via createRequire; the additional pdfjs options (useWorkerFetch, isEvalSupported, useSystemFonts) are reasonable for a Node.js environment |
| .dockerignore | Standard exclusions including node_modules, .env*, .next, .git; no issues |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[docker compose up] --> B[postgres]
    B -->|service_healthy| C[migrate]
    B -->|service_healthy| D[web]
    B -->|service_healthy| E[worker]
    C -->|one-shot: runs db:migrate| F[migrations complete]
    D -->|starts immediately after postgres| G[web app running]
    E -->|starts immediately after postgres| H[worker running]
    F -.->|race condition: may finish AFTER web/worker| G
    F -.->|race condition: may finish AFTER web/worker| H
    style F fill:#f9f,stroke:#f00
    style G fill:#faa,stroke:#f00
    style H fill:#faa,stroke:#f00
```

Reviews (1): Last reviewed commit: "fix: robust pdfjs-dist worker resolution..."

Comment thread apps/web/Dockerfile
Comment on lines +47 to +58
```dockerfile
COPY --from=base /app/apps/web/.next/standalone ./
COPY --from=base /app/apps/web/.next/static ./apps/web/.next/static
COPY --from=base /app/apps/web/public ./apps/web/public

USER nextjs

EXPOSE 3000

ENV PORT=3000
ENV HOSTNAME="0.0.0.0"

CMD ["node", "apps/web/server.js"]
```

P1 output: 'standalone' not set in next.config.ts

The Dockerfile copies from .next/standalone and runs node apps/web/server.js, but apps/web/next.config.ts does not include output: 'standalone'. Without this option Next.js does not generate the standalone directory or the server.js entrypoint, so the COPY --from=base /app/apps/web/.next/standalone ./ step has no source to copy and the image build fails; there is no server.js for the container to start.

Add to apps/web/next.config.ts:

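```ts
import type { NextConfig } from 'next';

const nextConfig: NextConfig = {
  // Emit .next/standalone (including server.js) for the Docker runtime stage.
  output: 'standalone',
  transpilePackages: [
    '@openvitals/common',
    // ...
  ],
};

export default nextConfig;
```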
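A misconfiguration like this can be caught before `docker build` with a small check that the artifacts the Dockerfile copies actually exist after `next build`. This is a sketch under assumptions: the helper name missingStandaloneArtifacts is hypothetical, and the standalone/apps/web/server.js location is inferred from the Dockerfile's COPY and CMD lines rather than confirmed for this repo's Next.js version.

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical pre-build check. Paths mirror what apps/web/Dockerfile copies
// and runs; the monorepo layout inside .next/standalone is an assumption.
export function missingStandaloneArtifacts(repoRoot: string): string[] {
  const webDir = join(repoRoot, 'apps', 'web');
  const required = [
    // Entry point the container starts with `node apps/web/server.js`.
    join(webDir, '.next', 'standalone', 'apps', 'web', 'server.js'),
    // Static assets copied separately into the runtime image.
    join(webDir, '.next', 'static'),
  ];
  // Return only the paths that are absent; an empty array means "ready to copy".
  return required.filter((p) => !existsSync(p));
}
```

A non-empty result right after `next build` suggests output: 'standalone' is not taking effect, or that server.js landed somewhere other than where the Dockerfile's CMD expects it.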

Comment thread docker-compose.prod.yml
Comment on lines +39 to +41
```yaml
depends_on:
  postgres:
    condition: service_healthy
```

P1 web and worker can start before migrations complete

Both services depend only on postgres:service_healthy, not on the one-shot migrate container finishing. If either starts while migrations are still running the application will hit an empty schema and crash. Add migrate: condition: service_completed_successfully to both depends_on blocks.

Suggested change
```yaml
depends_on:
  postgres:
    condition: service_healthy
  migrate:
    condition: service_completed_successfully
```
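The same rule can be checked mechanically. This TypeScript sketch runs over an already-parsed compose file (YAML parsing is out of scope and not shown); the function name servicesMissingMigrateGate and the minimal types are hypothetical, while the service names web, worker, and migrate come from this PR.

```typescript
// Minimal model of a compose `depends_on` block (long syntax only).
type Condition = 'service_started' | 'service_healthy' | 'service_completed_successfully';

interface ServiceSpec {
  depends_on?: Record<string, { condition: Condition }>;
}

// Hypothetical lint: return the app services that do NOT wait for the one-shot
// migration service to exit successfully before starting.
export function servicesMissingMigrateGate(
  services: Record<string, ServiceSpec>,
  appServices: string[] = ['web', 'worker'],
  migrateService = 'migrate',
): string[] {
  return appServices.filter((name) => {
    const dep = services[name]?.depends_on?.[migrateService];
    return dep?.condition !== 'service_completed_successfully';
  });
}
```

Run against the compose file as reviewed, this would flag both web and worker; after the suggested change it returns an empty list.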

Comment thread services/ingestion-worker/Dockerfile (outdated)
Comment on lines +28 to +29
```dockerfile
RUN pnpm run build && \
    cp $(find /app/node_modules -path "*/pdfjs-dist/legacy/build/pdf.worker.mjs" | head -1) dist/pdf.worker.mjs
```

P2 Copied worker file is never used

The cp $(find ...) step copies pdf.worker.mjs into dist/, but the production stage does not include that file, and pdf.ts resolves the worker path at runtime via createRequire pointing to node_modules/pdfjs-dist/ — not dist/. This copy is a no-op in the final image and can be removed.

Comment thread docker-compose.prod.yml
- Fix pdfjs-dist worker resolution using createRequire() instead of
  relative imports, which break in bundled/Docker environments
- Add explicit worker options (useWorkerFetch, isEvalSupported,
  useSystemFonts) for reliable PDF parsing
- Add web app Dockerfile with multi-stage build (alpine, standalone output)
- Add ingestion-worker Dockerfile improvements
- Add .dockerignore and docker-compose.prod.yml for self-hosted deployment
Chocksy force-pushed the pr/pdfjs-worker-docker branch from 8a7e832 to 90d2646 on April 7, 2026 at 12:40