chore: add monorepo workspace skeleton (no behavior change) #89
Open
nicklamonov wants to merge 3 commits into
This is PR #1 of the planned migration to host the URL-to-Markdown actor (formerly apify/page-scraper) as a sibling actor in this repo. It adds the workspace scaffolding only; RAG Web Browser's source layout, build, and runtime behavior are unchanged in this PR.

Subsequent PRs will:
- PR #2: relocate RAG's src/ and .actor/ into packages/actors/rag-web-browser/
- PR #3: extract the shared scraping engine into packages/scraping-engine/
- PR #4: add packages/actors/url-to-markdown/ consuming the engine
- PR #5: switch CI to a matrix push for both actors

What this PR changes:
- package.json: add "private": true, "workspaces": ["packages/*", "packages/actors/*"], "packageManager": "npm@10.9.2", and lerna + turbo as devDependencies.
- lerna.json: independent versioning, conventional commits, GitHub releases (matching apify/actor-scraper's setup).
- turbo.json: build / test / lint / clean tasks with the standard dependsOn: ["^build"] graph and dist/** outputs.
- tsconfig.base.json: shared base config (extends @apify/tsconfig) that future workspace packages will extend. RAG's own tsconfig.json is unchanged.
- packages/.gitkeep: placeholder so the empty workspace dir is tracked.
- .gitignore: ignore .turbo cache.

Verification:
- npm install completes (1159 packages, patch-package runs).
- npm run build (tsc) succeeds.
- npx turbo run build runs cleanly with "0 packages in scope" (as expected, since no workspace packages exist yet).
- Non-Playwright tests pass (9/11). The 2 Playwright tests fail locally only because Playwright browsers aren't installed; this is independent of the workspace changes.

Tooling note: matches apify/actor-scraper's stack exactly: npm workspaces + Lerna (independent versioning) + Turbo. The earlier draft plan referenced pnpm; npm is the right call to mirror the reference monorepo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
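In `turbo.json`, `dependsOn: ["^build"]` makes a package's `build` wait for the `build` tasks of its workspace dependencies. A minimal sketch of the ordering this produces once the planned packages exist, assuming (hypothetically) that both actors depend on `scraping-engine`; the graph and `buildOrder` are illustrative, and this is a plain depth-first topological sort, not Turbo's scheduler.

```typescript
// Hypothetical dependency graph for the planned packages (assumed, not read
// from real package.json files): each key lists its workspace dependencies.
type Graph = Record<string, readonly string[]>;

const workspaceDeps: Graph = {
  "scraping-engine": [],
  "rag-web-browser": ["scraping-engine"],
  "url-to-markdown": ["scraping-engine"],
};

// Return one valid order in which `build` can run so that every package is
// built after its dependencies, the guarantee `dependsOn: ["^build"]` gives.
// Depth-first topological sort; throws if the graph contains a cycle.
function buildOrder(graph: Graph): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (pkg: string): void => {
    const seen = state.get(pkg);
    if (seen === "done") return;
    if (seen === "visiting") throw new Error(`dependency cycle at ${pkg}`);
    state.set(pkg, "visiting");
    for (const dep of graph[pkg] ?? []) visit(dep); // dependencies first
    state.set(pkg, "done");
    order.push(pkg);
  };

  for (const pkg of Object.keys(graph)) visit(pkg);
  return order;
}

console.log(buildOrder(workspaceDeps).join(" -> "));
// scraping-engine -> rag-web-browser -> url-to-markdown
```

Turbo can run the two actor builds in parallel once the engine is built; the printed sequence is just one valid schedule.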
8fc4ad0 to 00e585c
The previous `node-version: 'latest'` resolved to Node 26.2.0 on current ubuntu-latest runners. Playwright 1.46.0's installer was released in August 2024 and only officially supports Node 18 / 20 / 22. On Node 26, its post-download `unzip` step hangs silently with no progress output, causing the CI step to time out.

Pinning to Node 22:
- Inside Playwright 1.46.0's supported matrix
- A Node LTS release line
- Matches the production base image (apify/actor-node-playwright-firefox:22-*)

Master's last successful CI run on 2026-05-01 happened to land on a Node version that worked with Playwright; the implicit `'latest'` pointer has since rolled over to Node 26. This pin fixes that drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
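The rationale for the pin reduces to a major-version check. A minimal sketch of a preflight guard that could run before the Playwright browser install, assuming the supported-majors list quoted above; `nodeMajor` and `isSupportedByPlaywright146` are hypothetical helpers, not part of this repo or of Playwright.

```typescript
// Node majors this PR cites as officially supported by Playwright 1.46.0.
const PLAYWRIGHT_1_46_NODE_MAJORS: ReadonlySet<number> = new Set([18, 20, 22]);

// Extract the major version from strings like "22", "22.11.0" or "v26.2.0".
function nodeMajor(version: string): number {
  const match = /^v?(\d+)(?:\.|$)/.exec(version.trim());
  if (!match) throw new Error(`unrecognized Node version: ${version}`);
  return Number(match[1]);
}

// True when the version's major is in the supported set above.
function isSupportedByPlaywright146(version: string): boolean {
  return PLAYWRIGHT_1_46_NODE_MAJORS.has(nodeMajor(version));
}

// 'latest' drifted to 26.x on the runners; the pin keeps CI on 22.x.
for (const v of ["22.11.0", "26.2.0"]) {
  console.log(`${v}: ${isSupportedByPlaywright146(v) ? "ok" : "unsupported"}`);
}
// 22.11.0: ok
// 26.2.0: unsupported
```

In CI the input would be the runner's own version (for example `process.versions.node`); failing fast here turns a silent `unzip` hang into an immediate, explicit error.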
Mirror the reference monorepo (apify/actor-scraper), which uses pnpm + Lerna + Turbo, not npm. The earlier "corrected to npm to mirror the reference" note was based on a misread; the reference is on pnpm.

- package.json: drop the npm `workspaces` field (pnpm reads pnpm-workspace.yaml), set packageManager to pnpm@10.33.4, add the devEngines block.
- pnpm-workspace.yaml: workspace globs + nodeLinker: hoisted + onlyBuiltDependencies (esbuild, playwright), matching actor-scraper.
- Regenerate the lockfile (package-lock.json -> pnpm-lock.yaml).
- lerna.json: npmClient: pnpm.
- checks.yml: install via apify/actions/pnpm-install, run via pnpm, add concurrency/cancel-in-progress. Node stays pinned to 22.

patch-package is intentionally kept (not migrated to pnpm's native patchedDependencies): the production actor image builds with npm, which does not understand pnpm patches, so a native migration silently dropped the playwright-core Firefox patch from the prod image. patch-package's postinstall runs under both npm and pnpm.

Verified: pnpm build/lint/test green; test results match the npm baseline (9/11 local, browser-dependent failures unchanged); full docker build succeeds and the Firefox patch is present in the final image.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
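The two workspace globs cover both the shared packages and the nested actor directories. A simplified sketch of how they classify directories, treating `*` as exactly one path segment; `matchesSegmentGlob` and `isWorkspaceCandidate` are hypothetical helpers, and this is not pnpm's actual glob matcher.

```typescript
// Workspace globs declared in pnpm-workspace.yaml by this PR.
const WORKSPACE_GLOBS: readonly string[] = ["packages/*", "packages/actors/*"];

// Match a POSIX-style relative directory against a glob in which each `*`
// segment matches exactly one path segment (no `**`, no partial wildcards).
function matchesSegmentGlob(dir: string, glob: string): boolean {
  const dirParts = dir.split("/").filter(Boolean);
  const globParts = glob.split("/").filter(Boolean);
  if (dirParts.length !== globParts.length) return false;
  return globParts.every((g, i) => g === "*" || g === dirParts[i]);
}

// A directory is a workspace candidate if any glob matches it.
function isWorkspaceCandidate(dir: string): boolean {
  return WORKSPACE_GLOBS.some((glob) => matchesSegmentGlob(dir, glob));
}

for (const dir of ["packages/scraping-engine", "packages/actors/url-to-markdown", "src"]) {
  console.log(`${dir}: ${isWorkspaceCandidate(dir)}`);
}
// packages/scraping-engine: true
// packages/actors/url-to-markdown: true
// src: false
```

Note that `packages/actors` itself also matches `packages/*`. As we understand it, pnpm only treats a matched directory as a project when it contains a `package.json`, so the container directory is harmless as long as it has none.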
Summary
PR 1 of the monorepo migration tracked in epic #90: fold the URL-to-Markdown actor (currently the standalone `apify/page-scraper` repo) into this repo as a sibling actor that shares the scraping engine in-process instead of over HTTP.

This PR adds the workspace scaffolding only. RAG's source layout, build, and runtime behavior are unchanged.
Tracking: #90
What changes
- `package.json`: adds `"private": true`, `"packageManager": "pnpm@10.33.4"` plus a `devEngines.packageManager` block, and `lerna` + `turbo` as devDependencies. Workspaces are declared in `pnpm-workspace.yaml`, not the npm `workspaces` field. `patch-package` stays in `dependencies` (see Tooling note).
- `pnpm-workspace.yaml`: workspace globs (`packages/*`, `packages/actors/*`), `nodeLinker: hoisted`, and `onlyBuiltDependencies` (`esbuild`, `playwright`). Mirrors `apify/actor-scraper`.
- `pnpm-lock.yaml`: replaces `package-lock.json`.
- `lerna.json`: independent per-package versioning, conventional commits, GitHub releases, `npmClient: pnpm`. Mirrors `apify/actor-scraper`.
- `turbo.json`: standard build/test/lint/clean task graph (`dependsOn: ["^build"]`, outputs `dist/**`).
- `tsconfig.base.json`: shared base config that future workspace packages will extend. RAG's own `tsconfig.json` is untouched.
- `packages/.gitkeep`: placeholder for the (currently empty) workspace dir.
- `.gitignore`: ignore the `.turbo` cache.
- `.github/workflows/checks.yml`: pin Node to `22` (was `'latest'`, which now resolves to Node 26) and switch the package manager from npm to pnpm (install via `apify/actions/pnpm-install`, build/lint/test via `pnpm`), plus `concurrency` + `cancel-in-progress`. Fixes a CI hang: Playwright 1.46.0's installer only supports Node 18/20/22, and on Node 26 its post-download unzip step stalls silently until the run is cancelled. Node 22 also matches the production base image (`apify/actor-node-playwright-firefox:22-*`).

What is not changed
- `src/`, `.actor/`, `Dockerfile`, `tsconfig.json`, build scripts, tests.
- `npm run build` / `npm run start:dev` / `apify push` flows.
- … `apify/actor-scraper`); pnpm is only the dev/CI/workspace layer.

Verification
- `pnpm install` completes (1119 packages); `patch-package` applies the `playwright-core@1.46.0` patch under both pnpm and npm.
- `pnpm run build` (tsc) and `pnpm run lint` succeed.
- `turbo run build` runs cleanly with "0 packages in scope", which is expected since no workspace packages exist yet.
- `docker build` of `.actor/Dockerfile` succeeds, and the Firefox `playwright-core` patch is confirmed present in the final image (applied by `patch-package`'s postinstall during the image's `npm install`).
- `actionlint` passes clean; an `act` dry-run resolves the full job graph, including the `apify/actions/pnpm-install` composite (which runs `pnpm install` and caches by `pnpm-lock.yaml` hash).

Upcoming PRs (tracked in #90)
- Relocate RAG's `src/` and `.actor/` into `packages/actors/rag-web-browser/`.
- Extract the shared scraping engine into `packages/scraping-engine/`.
- Add `packages/actors/url-to-markdown/` consuming the engine.
- … `apify/page-scraper` repo + actor.

Tooling note
Matches `apify/actor-scraper`'s stack: pnpm workspaces + Lerna (independent versioning) + Turbo. (An earlier draft of this plan said npm "to mirror the reference"; that was a misread. The reference monorepo is on pnpm [`pnpm-lock.yaml`, `packageManager: pnpm@10.33.4`], so this PR uses pnpm.)

The patch on `playwright-core` is intentionally kept on `patch-package` rather than migrated to pnpm's native `patchedDependencies`: the production actor image builds with npm, which does not understand pnpm patches, so a native migration silently dropped the Firefox patch from the prod image. `patch-package`'s postinstall runs under both npm (Docker) and pnpm (dev/CI), so both paths apply the patch.

🤖 Generated with Claude Code