
perf: exclude node_modules from content globbing in monorepos#12199

Draft
srpatcha wants to merge 2 commits into facebook:main from srpatcha:fix/globby-ignore-node-modules
@srpatcha

Contributor

Motivation

Fixes #12128

In monorepo setups with hoisted/linked workspace dependencies (pnpm, yarn workspaces), Globby scans through node_modules directories when sourcing docs/blog/pages content. This causes extreme startup slowness: the reporter measured startup dropping from 34 seconds to 330 ms after adding node_modules to the ignore list.

Changes

  1. packages/docusaurus-utils/src/globUtils.ts: Add '**/node_modules/**' to GlobExcludeDefault, which is used by the docs, blog, and pages plugins when globbing content files.

  2. packages/docusaurus-plugin-content-docs/src/sidebars/index.ts: Add ignore: ['**/node_modules/**'] to readCategoriesMetadata(). This Globby call had no ignore patterns at all, so it also searched for _category_.{json,yml,yaml} files inside node_modules.

  3. packages/docusaurus-utils/src/__tests__/globUtils.test.ts: Add test coverage for node_modules exclusion in both createMatcher and createAbsoluteFilePathMatcher.
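
To illustrate why an ignore pattern matters for the category-file lookup in change 2, here is a minimal sketch using a toy in-memory directory tree. It is not Globby's implementation, and the names `Tree`, `scanForCategoryFiles`, and `contentTree` are hypothetical. It only assumes that a walker which skips ignored directories never descends into them.

```typescript
// Toy model (assumption): null marks a file, an object marks a directory.
type Tree = { [name: string]: Tree | null };

interface ScanResult {
  categoryFiles: string[];
  directoriesVisited: number;
}

// Walks the tree, collecting _category_.json files and counting every
// directory entered. Directories named in ignoredDirNames are never entered.
function scanForCategoryFiles(tree: Tree, ignoredDirNames: string[]): ScanResult {
  const result: ScanResult = { categoryFiles: [], directoriesVisited: 0 };

  function walk(dir: Tree, prefix: string): void {
    result.directoriesVisited += 1;
    for (const name of Object.keys(dir)) {
      const entry = dir[name];
      const relativePath = prefix === '' ? name : prefix + '/' + name;
      if (entry === null) {
        if (name === '_category_.json') {
          result.categoryFiles.push(relativePath);
        }
      } else if (entry !== undefined && ignoredDirNames.indexOf(name) === -1) {
        walk(entry, relativePath);
      }
    }
  }

  walk(tree, '');
  return result;
}

// Hypothetical content directory that happens to contain a node_modules folder.
const contentTree: Tree = {
  'intro.md': null,
  guide: { '_category_.json': null, 'setup.mdx': null },
  node_modules: {
    dep: { docs: { '_category_.json': null }, lib: { 'index.js': null } },
  },
};

const unfiltered = scanForCategoryFiles(contentTree, []);
const filtered = scanForCategoryFiles(contentTree, ['node_modules']);
```

In this toy tree the unfiltered scan enters 6 directories and reports 2 category files, one of them inside node_modules. The filtered scan enters 2 directories and reports only guide/_category_.json. A real monorepo has a far larger node_modules tree, which is where the reported startup cost comes from.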

Test Plan

  • Added unit tests verifying node_modules paths are excluded by the default glob patterns
  • No behavioral change for users without node_modules inside their content directories

In monorepo setups with hoisted/linked dependencies, Globby scans
through node_modules directories when sourcing content, causing
extreme startup slowness (34s to 330ms in the reported case).

Changes:
- Add **/node_modules/** to GlobExcludeDefault in globUtils.ts
- Add ignore pattern to readCategoriesMetadata() in sidebars/index.ts
  which had no ignore patterns at all
- Add test coverage for node_modules exclusion

Fixes facebook#12128
meta-cla Bot added the CLA Signed label Jun 25, 2026
@netlify

netlify Bot commented Jun 25, 2026

Built without sensitive environment variables

Name Link
🔨 Latest commit dc24d95
🔍 Latest deploy log https://app.netlify.com/projects/docusaurus-2/deploys/6a3ec60befed9c0008bd62b5
😎 Deploy Preview https://deploy-preview-12199--docusaurus-2.netlify.app

async function readCategoriesMetadata(contentPath: string) {
  const categoryFiles = await Globby('**/_category_.{json,yml,yaml}', {
    cwd: contentPath,
    ignore: ['**/node_modules/**'],

Collaborator

See my comment here: #12129 (comment)

This will make another legit use-case impossible.

We'd like to optimize without preventing the content path from being node_modules/@myCompany/docs. I'd like to see a test covering this edge case that keeps passing after the performance optimization.

@slorber slorber marked this pull request as draft June 25, 2026 12:14
Add test proving that when contentPath is inside node_modules
(e.g., node_modules/@myCompany/docs), content files at that root
are NOT excluded. This works because relative paths computed from
the root folder do not contain node_modules.

Addresses review feedback from @slorber.

Signed-off-by: Srikanth Patchava <srpatcha@users.noreply.github.com>
@srpatcha

Contributor Author

Hi @slorber, thanks for the feedback! You're right to flag this edge case.

I've added tests in commit dc24d95 proving that content sourced from node_modules/@myCompany/docs is NOT excluded. Here's why it works:

When contentPath is set to node_modules/@myCompany/docs, Globby uses it as cwd. Files inside are resolved to relative paths like intro.md or guide/setup.mdx — which do not contain node_modules and therefore are not matched by **/node_modules/**.

New test cases added:

// createAbsoluteFilePathMatcher with root inside node_modules
const nmMatcher = createAbsoluteFilePathMatcher(GlobExcludeDefault, [
  '/project/node_modules/@myCompany/docs',
]);
// Content at root is NOT excluded
nmMatcher('/project/node_modules/@myCompany/docs/intro.md'); // → false
nmMatcher('/project/node_modules/@myCompany/docs/guide/setup.mdx'); // → false
// Nested node_modules inside that root IS excluded
nmMatcher('/project/node_modules/@myCompany/docs/node_modules/dep/file.md'); // → true

The same reasoning applies to readCategoriesMetadata in sidebars/index.ts — the ignore: ['**/node_modules/**'] pattern with cwd: contentPath only matches nested node_modules directories, not the root itself.
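
The root-relative reasoning above can be sketched without any glob library. This is a simplified stand-in, not the Docusaurus or micromatch implementation: the hypothetical helpers `toRelative` and `isExcluded` approximate '**/node_modules/**' by checking whether any directory segment of the root-relative path equals node_modules. POSIX-style paths are assumed.

```typescript
// Returns the path relative to root, or null if absolutePath is outside root.
// Assumption: both arguments are normalized POSIX-style absolute paths.
function toRelative(root: string, absolutePath: string): string | null {
  const prefix = root.charAt(root.length - 1) === '/' ? root : root + '/';
  return absolutePath.indexOf(prefix) === 0
    ? absolutePath.slice(prefix.length)
    : null;
}

// Simplified approximation of matching '**/node_modules/**' against the
// root-relative path: true when a directory segment is named node_modules.
function isExcluded(root: string, absolutePath: string): boolean {
  const relative = toRelative(root, absolutePath);
  if (relative === null) {
    return false; // outside the root: out of scope for this sketch
  }
  const directorySegments = relative.split('/').slice(0, -1);
  return directorySegments.indexOf('node_modules') !== -1;
}

const packageRoot = '/project/node_modules/@myCompany/docs';

// Relative path is "intro.md": no node_modules segment, so it is kept.
const rootFile = isExcluded(packageRoot, packageRoot + '/intro.md');

// Relative path is "node_modules/dep/file.md": excluded.
const nestedFile = isExcluded(packageRoot, packageRoot + '/node_modules/dep/file.md');

// Same file seen from "/project": the relative path now starts with
// "node_modules/", so it would be excluded.
const fromProjectRoot = isExcluded('/project', packageRoot + '/intro.md');
```

The contrast between `rootFile` and `fromProjectRoot` is the point: the same file is kept or excluded depending only on which folder the relative path is computed from.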

Development

Successfully merging this pull request may close these issues.

Docusaurus slow on a monorepo site due to globbing node_modules

2 participants