Skip to content

feat(HNT-2097): scaffold monorepo with deployable healthcheck services#1

Open
mmiermans wants to merge 13 commits intomainfrom
scaffold-repo
Open

feat(HNT-2097): scaffold monorepo with deployable healthcheck services#1
mmiermans wants to merge 13 commits intomainfrom
scaffold-repo

Conversation

@mmiermans
Copy link
Copy Markdown
Collaborator

@mmiermans mmiermans commented Mar 28, 2026

Goal

HNT-2097 Scaffold the hnt-content monorepo with two deployable services for the Crawling Pipeline v2.

Implementation Decisions

  • ESM-only ("type": "module", moduleResolution: "nodenext") — no CJS compatibility layer
  • Vitest instead of Jest — avoids --experimental-vm-modules dependency for ESM testing
  • pnpm deploy --prod with injectWorkspacePackages for Docker isolation instead of copying the full workspace
  • Crawl-agent healthcheck fails if no tick within a configurable threshold, so K8s restarts stuck pods

mmiermans and others added 6 commits March 27, 2026 16:04
Initialize hnt-content as a pnpm/Turborepo monorepo with:
- Root config: Node 24, pnpm 10, ESM, strict TypeScript
- packages/crawl-common: shared library with tsup (CJS+ESM)
- packages/eslint-config-custom: ESLint + Prettier config
- services/crawl-agent: scheduler with /healthz (stale tick detection)
- services/crawl-worker: worker with /healthz
- Multi-stage Dockerfile using turbo prune (SCOPE build arg)
- Jest test configs for all packages

Both services build, containerize, and respond to healthchecks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review fixes (from parallel Plan agent + Codex reviews):
- Rewrite ESLint config for flat config (ESLint 10), remove eslint-plugin-prettier
- Fix Jest for ESM: .js configs, --experimental-vm-modules, --passWithNoTests
- Fix crawl-common exports map (types-first ordering, correct CJS/ESM paths)
- Add turbo lint dependsOn ^build to prevent CI race conditions
- Add root globalDependencies for tsconfig/eslint/prettier cache invalidation
- Clean dist before tsc build to prevent stale spec files in output
- Exclude test files from tsc build via tsconfig exclude
- Add SIGTERM/SIGINT handlers to both services for graceful K8s shutdown
- Restructure eslint-config-custom with proper peerDependencies
- Remove redundant preinstall script (corepack handles enforcement)
- Simplify corepack setup in Dockerfile
- Expand .dockerignore with security patterns

Dockerfile modernization (based on ChatGPT/Gemini research + Codex testing):
- Use native pnpm 10 deploy (not --legacy) with injectWorkspacePackages
- Add BuildKit cache mounts for pnpm store on install and deploy
- Copy only pnpm deploy output to runner stage (not full workspace)
- Add BuildKit syntax directive

Other:
- Refactor services into app.ts + main.ts for testability
- Add healthcheck tests (supertest): worker 200, agent 200/500
- Add README with quick start, structure, commands, and architecture
- Add root prettier format/format:check scripts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Jest required --experimental-vm-modules for ESM support, which relies on
unstable Node VM APIs with no stabilization timeline. Vitest runs ESM
natively with a Jest-compatible API, so existing tests needed zero changes.

- Replace jest + ts-jest with vitest in all packages
- Remove NODE_OPTIONS hack from test scripts
- Add vitest.config.ts per package (globals mode)
- Add vitest/globals types to root tsconfig
- Add onlyBuiltDependencies for esbuild in pnpm-workspace.yaml
- Add CLAUDE.md with test runner and debugging instructions
- Net removal of 244 packages from the dependency tree

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove eslint-config-custom workspace package — flat config makes it
unnecessary. ESLint config is now directly in the root eslint.config.js.
Reorganize README by intent: development, architecture, deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add config.ts per service following content-monorepo pattern — plain
objects with env var overrides and documented defaults. Replace magic
numbers in healthcheck threshold, tick interval, and port. Compensate
for tick duration so the scheduler maintains a consistent interval.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mmiermans and others added 3 commits March 27, 2026 22:24
Build, test, and lint run as parallel jobs on every PR. Claude Code
reviews PRs with restricted tools and comment-actor allowlisting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Not available in the Mozilla GitHub org.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Scaffolds the new hnt-content monorepo with Turborepo + pnpm workspaces, introduces a shared crawl-common package, and adds two deployable Node/Express services (crawl-agent, crawl-worker) that expose /healthz endpoints (with tick-staleness detection in crawl-agent).

Changes:

  • Add Turborepo + pnpm workspace + TypeScript/Vitest + ESLint/Prettier scaffolding for a Node 24 ESM monorepo.
  • Introduce crawl-agent and crawl-worker Express services with healthcheck endpoints and basic test coverage.
  • Add Dockerfile + GitHub Actions PR workflow for build/test/lint and docs/templates for repo conventions.

Reviewed changes

Copilot reviewed 32 out of 35 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
turbo.json Defines Turbo tasks for build/test/lint/dev across the monorepo.
tsconfig.json Establishes base TypeScript (ESM/NodeNext) compiler settings for the workspace.
package.json Root workspace scripts (turbo run), tooling deps, Node engine, pnpm pin.
eslint.config.js Flat-config ESLint setup using typescript-eslint + Prettier config.
pnpm-workspace.yaml Declares workspace package globs and pnpm workspace settings.
pnpm-lock.yaml Locks all dependency versions for the new workspace.
Dockerfile Multi-stage Docker build using turbo prune + pnpm deploy for service images.
README.md Documents dev workflow, architecture, and Docker build instructions.
CODE_OF_CONDUCT.md Adds Mozilla Community Participation Guidelines link.
CLAUDE.md Documents testing/debugging commands and git conventions.
.prettierrc Prettier formatting rules for the repo.
.nvmrc Pins local dev Node major version (24).
.gitignore Ignores node_modules, dist, turbo cache, env files, etc.
.dockerignore Reduces Docker build context size and avoids copying secrets/dev files.
.github/workflows/pull-request.yml CI jobs for build/test/lint on PRs.
.github/PULL_REQUEST_TEMPLATE.md Standardizes PR description format.
packages/crawl-common/package.json Defines crawl-common package build/test/lint and ESM exports.
packages/crawl-common/tsconfig.json Package-level TS config (outDir/rootDir) extending root.
packages/crawl-common/tsup.config.ts TSUP config to build ESM + d.ts for the shared package.
packages/crawl-common/vitest.config.ts Vitest config for the shared package.
packages/crawl-common/src/index.ts Initial placeholder entrypoint for shared code.
services/crawl-agent/package.json Service package definition and scripts for build/dev/test/lint.
services/crawl-agent/tsconfig.json Service TS config (dist output) extending root.
services/crawl-agent/vitest.config.ts Vitest config for crawl-agent tests.
services/crawl-agent/src/app.ts Express app + /healthz endpoint with tick staleness logic.
services/crawl-agent/src/config.ts Service configuration values (port/tick intervals).
services/crawl-agent/src/main.ts HTTP server startup + placeholder tick loop + shutdown handling.
services/crawl-agent/src/app.spec.ts Tests for /healthz success and stale tick failure behavior.
services/crawl-worker/package.json Service package definition and scripts for build/dev/test/lint.
services/crawl-worker/tsconfig.json Service TS config (dist output) extending root.
services/crawl-worker/vitest.config.ts Vitest config for crawl-worker tests.
services/crawl-worker/src/app.ts Express app + /healthz endpoint.
services/crawl-worker/src/config.ts Service configuration values (port).
services/crawl-worker/src/main.ts HTTP server startup + SIGTERM shutdown handling.
services/crawl-worker/src/app.spec.ts Tests for /healthz endpoint.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +5
export default {
port: parseInt(process.env.PORT ?? '8080', 10),
tickIntervalMs: 60_000, // ms — how often the scheduler ticks
staleTickThresholdMinutes: 10, // minutes — healthcheck fails if no tick within this window
};
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

staleTickThresholdMinutes (and tickIntervalMs) are hard-coded, but the PR description says the healthcheck threshold is configurable. Consider reading these values from env (with validation and sensible defaults) so the threshold can be tuned per environment without a rebuild/redeploy.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +15
let sleepTimer: ReturnType<typeof setTimeout> | undefined;

function shutdown() {
console.log('Shutting down');
stopRunning();
if (sleepTimer) clearTimeout(sleepTimer);
server.close();
}
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shutdown() clears the active sleep timeout, but sleep() never resolves/rejects when that happens. If the agent is in the await sleep(...) call during shutdown, the run() loop will remain stuck and never reach the while (isRunning()) check again (which becomes important if you later add cleanup logic after the loop). Make the sleep cancellable (e.g., resolve immediately on shutdown, or use an AbortSignal) so the loop can exit deterministically.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +2
export default {
port: parseInt(process.env.PORT ?? '8080', 10),
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseInt(process.env.PORT ?? '8080', 10) can yield NaN for invalid input (e.g. PORT=abc), which will later cause app.listen to throw with a less actionable error. Validate the parsed value (finite integer within 1–65535) and throw a clear startup error if it’s invalid.

Suggested change
export default {
port: parseInt(process.env.PORT ?? '8080', 10),
const rawPort = process.env.PORT ?? '8080';
const port = Number.parseInt(rawPort, 10);
if (
!Number.isFinite(port) ||
!Number.isInteger(port) ||
port < 1 ||
port > 65535
) {
throw new Error(
`Invalid PORT environment variable "${rawPort}". ` +
'Expected an integer between 1 and 65535.'
);
}
export default {
port,

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +2
export default {
port: parseInt(process.env.PORT ?? '8080', 10),
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseInt(process.env.PORT ?? '8080', 10) can yield NaN for invalid input (e.g. PORT=abc), which will later cause app.listen to throw with a less actionable error. Validate the parsed value (finite integer within 1–65535) and throw a clear startup error if it’s invalid.

Suggested change
export default {
port: parseInt(process.env.PORT ?? '8080', 10),
const rawPort = process.env.PORT ?? '8080';
const parsedPort = Number.parseInt(rawPort, 10);
if (!Number.isFinite(parsedPort) || parsedPort < 1 || parsedPort > 65535) {
throw new Error(
`Invalid PORT environment variable "${rawPort}". Expected an integer between 1 and 65535.`,
);
}
export default {
port: parsedPort,

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is worth the complexity.

Dockerfile Outdated
ARG PORT=8080

RUN apk add --no-cache curl libc6-compat
RUN corepack enable
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Docker image build enables Corepack but never pins/activates the repo’s declared pnpm version (package.json has packageManager: pnpm@10.28.0). This can make Docker builds non-reproducible (Node’s bundled pnpm/Corepack version may differ) and may break pnpm deploy semantics across Node image updates. Activate the pinned pnpm version explicitly (e.g., corepack prepare pnpm@10.28.0 --activate, or copy package.json early and run corepack install).

Suggested change
RUN corepack enable
RUN corepack enable
RUN corepack prepare pnpm@10.28.0 --activate

Copilot uses AI. Check for mistakes.
Dockerfile Outdated
Comment on lines +35 to +37
RUN --mount=type=cache,id=pnpm,target=/pnpm/store \
pnpm install --frozen-lockfile

Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --mount=type=cache,...,target=/pnpm/store cache is likely ineffective because pnpm’s store directory isn’t configured to /pnpm/store (by default it uses a path under the user’s home). Configure the store dir (e.g., pnpm config set store-dir /pnpm/store or ENV PNPM_STORE_DIR=/pnpm/store) so the cache mount is actually used.

Copilot uses AI. Check for mistakes.
Sleep now resolves immediately on SIGTERM via AbortController so the
tick loop can exit cleanly. Fix pnpm store cache mount path in
Dockerfile to match the actual default store location.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 32 out of 35 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ac.signal.addEventListener('abort', () => {
clearTimeout(timer);
resolve();
});
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sleep() adds a new abort event listener on every tick and never removes it. Over time this will accumulate listeners and can cause memory growth / MaxListeners warnings in a long-running pod. Register the handler with { once: true } and/or remove the listener after it fires.

Suggested change
});
}, { once: true });

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +11
process.on('SIGTERM', () => {
console.log('SIGTERM received, shutting down');
server.close();
});
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only SIGTERM is handled here. For local dev (Ctrl+C) and some orchestrators, SIGINT is also common. Consider wiring the same shutdown path for SIGINT, and (similar to other services) consider observing the server.close callback to ensure graceful termination.

Copilot uses AI. Check for mistakes.
COPY --from=setup /app/out/full/ ./
COPY turbo.json turbo.json
COPY tsconfig.json tsconfig.json
RUN pnpm run build --filter=${SCOPE}...
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pnpm run build --filter=${SCOPE}... likely passes --filter through to the build script instead of applying pnpm's workspace filter, which can break the Docker build. Use pnpm's filter syntax (e.g. place --filter before the command) so only the selected scope (and deps) is built.

Suggested change
RUN pnpm run build --filter=${SCOPE}...
RUN pnpm --filter=${SCOPE}... run build

Copilot uses AI. Check for mistakes.
corepack enable only installs shims — the pinned pnpm version from
packageManager is resolved on first invocation. Without prepare, the
base stage uses corepack's default pnpm (no package.json yet) while
the builder stage switches to 10.28.0. Explicit prepare ensures a
single pnpm version throughout the build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pin node:24.14-alpine to avoid floating across minor versions, matching
the content-monorepo convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants