feat(HNT-2097): scaffold monorepo with deployable healthcheck services#1
feat(HNT-2097): scaffold monorepo with deployable healthcheck services#1
Conversation
Initialize hnt-content as a pnpm/Turborepo monorepo with: - Root config: Node 24, pnpm 10, ESM, strict TypeScript - packages/crawl-common: shared library with tsup (CJS+ESM) - packages/eslint-config-custom: ESLint + Prettier config - services/crawl-agent: scheduler with /healthz (stale tick detection) - services/crawl-worker: worker with /healthz - Multi-stage Dockerfile using turbo prune (SCOPE build arg) - Jest test configs for all packages Both services build, containerize, and respond to healthchecks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review fixes (from parallel Plan agent + Codex reviews): - Rewrite ESLint config for flat config (ESLint 10), remove eslint-plugin-prettier - Fix Jest for ESM: .js configs, --experimental-vm-modules, --passWithNoTests - Fix crawl-common exports map (types-first ordering, correct CJS/ESM paths) - Add turbo lint dependsOn ^build to prevent CI race conditions - Add root globalDependencies for tsconfig/eslint/prettier cache invalidation - Clean dist before tsc build to prevent stale spec files in output - Exclude test files from tsc build via tsconfig exclude - Add SIGTERM/SIGINT handlers to both services for graceful K8s shutdown - Restructure eslint-config-custom with proper peerDependencies - Remove redundant preinstall script (corepack handles enforcement) - Simplify corepack setup in Dockerfile - Expand .dockerignore with security patterns Dockerfile modernization (based on ChatGPT/Gemini research + Codex testing): - Use native pnpm 10 deploy (not --legacy) with injectWorkspacePackages - Add BuildKit cache mounts for pnpm store on install and deploy - Copy only pnpm deploy output to runner stage (not full workspace) - Add BuildKit syntax directive Other: - Refactor services into app.ts + main.ts for testability - Add healthcheck tests (supertest): worker 200, agent 200/500 - Add README with quick start, structure, commands, and architecture - Add root prettier format/format:check scripts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Jest required --experimental-vm-modules for ESM support, which relies on unstable Node VM APIs with no stabilization timeline. Vitest runs ESM natively with a Jest-compatible API, so existing tests needed zero changes. - Replace jest + ts-jest with vitest in all packages - Remove NODE_OPTIONS hack from test scripts - Add vitest.config.ts per package (globals mode) - Add vitest/globals types to root tsconfig - Add onlyBuiltDependencies for esbuild in pnpm-workspace.yaml - Add CLAUDE.md with test runner and debugging instructions - Net removal of 244 packages from the dependency tree Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove eslint-config-custom workspace package — flat config makes it unnecessary. ESLint config is now directly in the root eslint.config.js. Reorganize README by intent: development, architecture, deployment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add config.ts per service following content-monorepo pattern — plain objects with env var overrides and documented defaults. Replace magic numbers in healthcheck threshold, tick interval, and port. Compensate for tick duration so the scheduler maintains a consistent interval. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5efb40b to
7b591c8
Compare
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7b591c8 to
62bb5e0
Compare
Build, test, and lint run as parallel jobs on every PR. Claude Code reviews PRs with restricted tools and comment-actor allowlisting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Not available in the Mozilla GitHub org. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Scaffolds the new hnt-content monorepo with Turborepo + pnpm workspaces, introduces a shared crawl-common package, and adds two deployable Node/Express services (crawl-agent, crawl-worker) that expose /healthz endpoints (with tick-staleness detection in crawl-agent).
Changes:
- Add Turborepo + pnpm workspace + TypeScript/Vitest + ESLint/Prettier scaffolding for a Node 24 ESM monorepo.
- Introduce
crawl-agentandcrawl-workerExpress services with healthcheck endpoints and basic test coverage. - Add Dockerfile + GitHub Actions PR workflow for build/test/lint and docs/templates for repo conventions.
Reviewed changes
Copilot reviewed 32 out of 35 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| turbo.json | Defines Turbo tasks for build/test/lint/dev across the monorepo. |
| tsconfig.json | Establishes base TypeScript (ESM/NodeNext) compiler settings for the workspace. |
| package.json | Root workspace scripts (turbo run), tooling deps, Node engine, pnpm pin. |
| eslint.config.js | Flat-config ESLint setup using typescript-eslint + Prettier config. |
| pnpm-workspace.yaml | Declares workspace package globs and pnpm workspace settings. |
| pnpm-lock.yaml | Locks all dependency versions for the new workspace. |
| Dockerfile | Multi-stage Docker build using turbo prune + pnpm deploy for service images. |
| README.md | Documents dev workflow, architecture, and Docker build instructions. |
| CODE_OF_CONDUCT.md | Adds Mozilla Community Participation Guidelines link. |
| CLAUDE.md | Documents testing/debugging commands and git conventions. |
| .prettierrc | Prettier formatting rules for the repo. |
| .nvmrc | Pins local dev Node major version (24). |
| .gitignore | Ignores node_modules, dist, turbo cache, env files, etc. |
| .dockerignore | Reduces Docker build context size and avoids copying secrets/dev files. |
| .github/workflows/pull-request.yml | CI jobs for build/test/lint on PRs. |
| .github/PULL_REQUEST_TEMPLATE.md | Standardizes PR description format. |
| packages/crawl-common/package.json | Defines crawl-common package build/test/lint and ESM exports. |
| packages/crawl-common/tsconfig.json | Package-level TS config (outDir/rootDir) extending root. |
| packages/crawl-common/tsup.config.ts | TSUP config to build ESM + d.ts for the shared package. |
| packages/crawl-common/vitest.config.ts | Vitest config for the shared package. |
| packages/crawl-common/src/index.ts | Initial placeholder entrypoint for shared code. |
| services/crawl-agent/package.json | Service package definition and scripts for build/dev/test/lint. |
| services/crawl-agent/tsconfig.json | Service TS config (dist output) extending root. |
| services/crawl-agent/vitest.config.ts | Vitest config for crawl-agent tests. |
| services/crawl-agent/src/app.ts | Express app + /healthz endpoint with tick staleness logic. |
| services/crawl-agent/src/config.ts | Service configuration values (port/tick intervals). |
| services/crawl-agent/src/main.ts | HTTP server startup + placeholder tick loop + shutdown handling. |
| services/crawl-agent/src/app.spec.ts | Tests for /healthz success and stale tick failure behavior. |
| services/crawl-worker/package.json | Service package definition and scripts for build/dev/test/lint. |
| services/crawl-worker/tsconfig.json | Service TS config (dist output) extending root. |
| services/crawl-worker/vitest.config.ts | Vitest config for crawl-worker tests. |
| services/crawl-worker/src/app.ts | Express app + /healthz endpoint. |
| services/crawl-worker/src/config.ts | Service configuration values (port). |
| services/crawl-worker/src/main.ts | HTTP server startup + SIGTERM shutdown handling. |
| services/crawl-worker/src/app.spec.ts | Tests for /healthz endpoint. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| export default { | ||
| port: parseInt(process.env.PORT ?? '8080', 10), | ||
| tickIntervalMs: 60_000, // ms — how often the scheduler ticks | ||
| staleTickThresholdMinutes: 10, // minutes — healthcheck fails if no tick within this window | ||
| }; |
There was a problem hiding this comment.
staleTickThresholdMinutes (and tickIntervalMs) are hard-coded, but the PR description says the healthcheck threshold is configurable. Consider reading these values from env (with validation and sensible defaults) so the threshold can be tuned per environment without a rebuild/redeploy.
services/crawl-agent/src/main.ts
Outdated
| let sleepTimer: ReturnType<typeof setTimeout> | undefined; | ||
|
|
||
| function shutdown() { | ||
| console.log('Shutting down'); | ||
| stopRunning(); | ||
| if (sleepTimer) clearTimeout(sleepTimer); | ||
| server.close(); | ||
| } |
There was a problem hiding this comment.
shutdown() clears the active sleep timeout, but sleep() never resolves/rejects when that happens. If the agent is in the await sleep(...) call during shutdown, the run() loop will remain stuck and never reach the while (isRunning()) check again (which becomes important if you later add cleanup logic after the loop). Make the sleep cancellable (e.g., resolve immediately on shutdown, or use an AbortSignal) so the loop can exit deterministically.
| export default { | ||
| port: parseInt(process.env.PORT ?? '8080', 10), |
There was a problem hiding this comment.
parseInt(process.env.PORT ?? '8080', 10) can yield NaN for invalid input (e.g. PORT=abc), which will later cause app.listen to throw with a less actionable error. Validate the parsed value (finite integer within 1–65535) and throw a clear startup error if it’s invalid.
| export default { | |
| port: parseInt(process.env.PORT ?? '8080', 10), | |
| const rawPort = process.env.PORT ?? '8080'; | |
| const port = Number.parseInt(rawPort, 10); | |
| if ( | |
| !Number.isFinite(port) || | |
| !Number.isInteger(port) || | |
| port < 1 || | |
| port > 65535 | |
| ) { | |
| throw new Error( | |
| `Invalid PORT environment variable "${rawPort}". ` + | |
| 'Expected an integer between 1 and 65535.' | |
| ); | |
| } | |
| export default { | |
| port, |
| export default { | ||
| port: parseInt(process.env.PORT ?? '8080', 10), |
There was a problem hiding this comment.
parseInt(process.env.PORT ?? '8080', 10) can yield NaN for invalid input (e.g. PORT=abc), which will later cause app.listen to throw with a less actionable error. Validate the parsed value (finite integer within 1–65535) and throw a clear startup error if it’s invalid.
| export default { | |
| port: parseInt(process.env.PORT ?? '8080', 10), | |
| const rawPort = process.env.PORT ?? '8080'; | |
| const parsedPort = Number.parseInt(rawPort, 10); | |
| if (!Number.isFinite(parsedPort) || parsedPort < 1 || parsedPort > 65535) { | |
| throw new Error( | |
| `Invalid PORT environment variable "${rawPort}". Expected an integer between 1 and 65535.`, | |
| ); | |
| } | |
| export default { | |
| port: parsedPort, |
There was a problem hiding this comment.
I don't think this is worth the complexity.
Dockerfile
Outdated
| ARG PORT=8080 | ||
|
|
||
| RUN apk add --no-cache curl libc6-compat | ||
| RUN corepack enable |
There was a problem hiding this comment.
The Docker image build enables Corepack but never pins/activates the repo’s declared pnpm version (package.json has packageManager: pnpm@10.28.0). This can make Docker builds non-reproducible (Node’s bundled pnpm/Corepack version may differ) and may break pnpm deploy semantics across Node image updates. Activate the pinned pnpm version explicitly (e.g., corepack prepare pnpm@10.28.0 --activate, or copy package.json early and run corepack install).
| RUN corepack enable | |
| RUN corepack enable | |
| RUN corepack prepare pnpm@10.28.0 --activate |
Dockerfile
Outdated
| RUN --mount=type=cache,id=pnpm,target=/pnpm/store \ | ||
| pnpm install --frozen-lockfile | ||
|
|
There was a problem hiding this comment.
The --mount=type=cache,...,target=/pnpm/store cache is likely ineffective because pnpm’s store directory isn’t configured to /pnpm/store (by default it uses a path under the user’s home). Configure the store dir (e.g., pnpm config set store-dir /pnpm/store or ENV PNPM_STORE_DIR=/pnpm/store) so the cache mount is actually used.
Sleep now resolves immediately on SIGTERM via AbortController so the tick loop can exit cleanly. Fix pnpm store cache mount path in Dockerfile to match the actual default store location. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 32 out of 35 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ac.signal.addEventListener('abort', () => { | ||
| clearTimeout(timer); | ||
| resolve(); | ||
| }); |
There was a problem hiding this comment.
sleep() adds a new abort event listener on every tick and never removes it. Over time this will accumulate listeners and can cause memory growth / MaxListeners warnings in a long-running pod. Register the handler with { once: true } and/or remove the listener after it fires.
| }); | |
| }, { once: true }); |
| process.on('SIGTERM', () => { | ||
| console.log('SIGTERM received, shutting down'); | ||
| server.close(); | ||
| }); |
There was a problem hiding this comment.
Only SIGTERM is handled here. For local dev (Ctrl+C) and some orchestrators, SIGINT is also common. Consider wiring the same shutdown path for SIGINT, and (similar to other services) consider observing the server.close callback to ensure graceful termination.
| COPY --from=setup /app/out/full/ ./ | ||
| COPY turbo.json turbo.json | ||
| COPY tsconfig.json tsconfig.json | ||
| RUN pnpm run build --filter=${SCOPE}... |
There was a problem hiding this comment.
pnpm run build --filter=${SCOPE}... likely passes --filter through to the build script instead of applying pnpm's workspace filter, which can break the Docker build. Use pnpm's filter syntax (e.g. place --filter before the command) so only the selected scope (and deps) is built.
| RUN pnpm run build --filter=${SCOPE}... | |
| RUN pnpm --filter=${SCOPE}... run build |
corepack enable only installs shims — the pinned pnpm version from packageManager is resolved on first invocation. Without prepare, the base stage uses corepack's default pnpm (no package.json yet) while the builder stage switches to 10.28.0. Explicit prepare ensures a single pnpm version throughout the build. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8215e3b to
79c7035
Compare
Pin node:24.14-alpine to avoid floating across minor versions, matching the content-monorepo convention. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7f4a43b to
d64d82f
Compare
Goal
HNT-2097 Scaffold the hnt-content monorepo with two deployable services for the Crawling Pipeline v2.
Implementation Decisions
"type": "module",moduleResolution: "nodenext") — no CJS compatibility layer--experimental-vm-modulesdependency for ESM testingpnpm deploy --prodwithinjectWorkspacePackagesfor Docker isolation instead of copying the full workspace