perf: exclude node_modules from content globbing in monorepos #12199
srpatcha wants to merge 2 commits
In monorepo setups with hoisted/linked dependencies, Globby scans through node_modules directories when sourcing content, causing extreme startup slowness (in the reported case, startup dropped from 34s to 330ms once node_modules was ignored).

Changes:
- Add **/node_modules/** to GlobExcludeDefault in globUtils.ts
- Add an ignore pattern to readCategoriesMetadata() in sidebars/index.ts, which had no ignore patterns at all
- Add test coverage for node_modules exclusion

Fixes facebook#12128
async function readCategoriesMetadata(contentPath: string) {
  const categoryFiles = await Globby('**/_category_.{json,yml,yaml}', {
    cwd: contentPath,
    ignore: ['**/node_modules/**'],
See my comment here: #12129 (comment)
This will make another legit use-case impossible.
We'd like to optimize without preventing the content path from being node_modules/@myCompany/docs. I'd like to see a test covering this edge case that keeps passing after the performance optimization.
Add test proving that when contentPath is inside node_modules (e.g., node_modules/@myCompany/docs), content files at that root are NOT excluded. This works because relative paths computed from the root folder do not contain node_modules.

Addresses review feedback from @slorber.

Signed-off-by: Srikanth Patchava <srpatcha@users.noreply.github.com>
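
The reasoning in this commit message (the pattern is applied to root-relative paths, so a root that itself lives under node_modules is unaffected) can be illustrated with a small, dependency-free sketch. This is not the Docusaurus implementation: `toRelative`, `hasNodeModulesDir`, and `createSketchMatcher` are hypothetical helpers, and the pattern `**/node_modules/**` is approximated as "some directory segment of the root-relative path is `node_modules`". Throwing for paths outside every root is also just a choice made for this sketch.

```typescript
// Simplified stand-in for an absolute-file-path matcher (assumption: the
// exclude pattern is evaluated against the path RELATIVE to a root folder).

// Returns the path relative to `root`, or undefined if it is outside `root`.
function toRelative(root: string, absolutePath: string): string | undefined {
  const prefix = root.endsWith('/') ? root : `${root}/`;
  return absolutePath.startsWith(prefix)
    ? absolutePath.slice(prefix.length)
    : undefined;
}

// Approximates '**/node_modules/**': true when any directory segment
// (the file name itself is dropped) equals 'node_modules'.
function hasNodeModulesDir(relativePath: string): boolean {
  const dirs = relativePath.split('/').slice(0, -1);
  return dirs.includes('node_modules');
}

// Builds a matcher that answers "is this absolute path excluded?".
function createSketchMatcher(roots: string[]) {
  return (absolutePath: string): boolean => {
    for (const root of roots) {
      const relative = toRelative(root, absolutePath);
      if (relative !== undefined) {
        return hasNodeModulesDir(relative);
      }
    }
    throw new Error(`${absolutePath} is not inside any root folder`);
  };
}

// Root inside node_modules: its own content stays included, while a nested
// node_modules directory below that root is excluded.
const sketchMatcher = createSketchMatcher([
  '/project/node_modules/@myCompany/docs',
]);
console.log(sketchMatcher('/project/node_modules/@myCompany/docs/intro.md'));
console.log(
  sketchMatcher(
    '/project/node_modules/@myCompany/docs/node_modules/dep/file.md',
  ),
);
```

The key design point is that the root prefix is stripped before the pattern is checked, so where the root itself lives on disk never influences the result.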
Hi @slorber, thanks for the feedback! You're right to flag this edge case. I've added tests in commit dc24d95 proving that content sourced from a content path inside node_modules is not excluded. New test cases added:
// createAbsoluteFilePathMatcher with root inside node_modules
const nmMatcher = createAbsoluteFilePathMatcher(GlobExcludeDefault, [
'/project/node_modules/@myCompany/docs',
]);
// Content at root NOT excluded
nmMatcher('/project/node_modules/@myCompany/docs/intro.md') → false ✓
nmMatcher('/project/node_modules/@myCompany/docs/guide/setup.mdx') → false ✓
// Nested node_modules inside that root IS excluded
nmMatcher('/project/node_modules/@myCompany/docs/node_modules/dep/file.md') → true ✓
The same reasoning applies to […]
Motivation
Fixes #12128
In monorepo setups with hoisted/linked workspace dependencies (pnpm, yarn workspaces), Globby scans through `node_modules` directories when sourcing docs/blog/pages content. This causes extreme startup slowness: the reporter measured startup dropping from 34 seconds to 330ms after adding `node_modules` to the ignore list.
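
To make the cost concrete, here is a minimal sketch of why skipping `node_modules` helps, assuming the glob walker prunes an ignored directory instead of descending into it and filtering afterwards. The `Dir` type, the `countVisitedDirs` helper, and the sample tree are all hypothetical and only model a directory walk in memory; they are not Globby's API and the numbers are not measurements.

```typescript
// In-memory model of a directory tree (hypothetical, for illustration only).
type Dir = {name: string; files: string[]; dirs: Dir[]};

// Counts how many directories a walker enters when `skip` prunes subtrees.
function countVisitedDirs(dir: Dir, skip: (name: string) => boolean): number {
  let visited = 1; // the directory we are currently in
  for (const child of dir.dirs) {
    if (!skip(child.name)) {
      visited += countVisitedDirs(child, skip);
    }
  }
  return visited;
}

// A tiny content folder with a nested node_modules directory.
const contentTree: Dir = {
  name: 'docs',
  files: ['intro.md'],
  dirs: [
    {name: 'guide', files: ['setup.mdx'], dirs: []},
    {
      name: 'node_modules',
      files: [],
      dirs: [
        {
          name: 'dep-a',
          files: ['README.md'],
          dirs: [{name: 'lib', files: [], dirs: []}],
        },
        {name: 'dep-b', files: ['README.md'], dirs: []},
      ],
    },
  ],
};

// Without an ignore rule every directory is entered.
const visitedWithoutIgnore = countVisitedDirs(contentTree, () => false);
// With the rule, the whole node_modules subtree is never entered.
const visitedWithIgnore = countVisitedDirs(
  contentTree,
  (name) => name === 'node_modules',
);
console.log({visitedWithoutIgnore, visitedWithIgnore});
```

In a real monorepo the pruned subtree can contain many thousands of directories, which is consistent with the large startup difference reported in the issue.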
Changes
- **`packages/docusaurus-utils/src/globUtils.ts`**: Add `'**/node_modules/**'` to `GlobExcludeDefault`, which is used by the docs, blog, and pages plugins when globbing content files.
- **`packages/docusaurus-plugin-content-docs/src/sidebars/index.ts`**: Add `ignore: ['**/node_modules/**']` to `readCategoriesMetadata()`. This Globby call had no ignore patterns at all, meaning it scanned for every `_category_.{json,yml,yaml}` inside `node_modules`.
- **`packages/docusaurus-utils/src/__tests__/globUtils.test.ts`**: Add test coverage for node_modules exclusion in both `createMatcher` and `createAbsoluteFilePathMatcher`.
Test Plan