Add unused assets checker script by fulldecent · Pull Request #180 · fulldecent/github-pages-template

fulldecent · 2025-12-04T17:10:55Z

Introduces a new test script to identify potentially unused assets in the build directory.

The script scans all files, creates a mapping of base filenames, and searches for references. Assets whose base filenames are never found are reported as potentially unused.

Features:

Finds orphaned image files, CSS, JS, and other assets
Configurable exclusion patterns (HTML, robots.txt, favicon.ico, sitemap.xml)
Detailed reporting with relative paths
Exits with error code when unused assets found (suitable for CI)

Run with: yarn test:unused-assets

Note: This is a naive checker. Always verify manually before deleting files.

- Scans build directory for files that are never referenced anywhere - Creates base filename mappings and searches for references - Excludes HTML files, robots.txt, favicon.ico, and sitemap.xml by default - Provides detailed reporting of potentially unused assets - Includes useful test script and warning about manual verification Features: - Skips binary files (images, fonts, etc.) to avoid parsing errors - Case-sensitive filename matching - Provides relative paths in the build directory - Exits with error code when unused assets are found Note: This is a naive checker that may have false positives/negatives

fulldecent · 2025-12-04T17:11:39Z

this replaces #152

Copilot

Pull request overview

This PR introduces an automated unused assets detection tool to help maintain a clean build directory by identifying orphaned files that are not referenced anywhere in the codebase.

Key Changes:

Adds a new test script (test/unused-assets.mjs) that scans the build directory for potentially unreferenced assets
Integrates the unused assets check into the main test suite via package.json
Implements a naive string-matching approach that searches for base filenames across all build files

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
test/unused-assets.mjs	New script that identifies potentially unused assets by searching for base filename references across all build files, with configurable exclusion patterns and detailed reporting
package.json	Adds `test:unused-assets` script and integrates it into the main test command chain

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-04T17:13:48Z

test/unused-assets.mjs

+function main() {
+  console.log("🔍 Scanning for unused assets in the build directory...\n");
+
+  // Get all files
+  const allFiles = getAllBuildFiles();
+  console.log(`ℹ️  Found ${allFiles.length} total files in build directory`);


The main() function doesn't check if the BUILD_DIR exists before scanning. If the build directory doesn't exist (e.g., before running yarn build), the script will fail ungracefully. Consider adding a check and providing a helpful error message:

if (!fs.existsSync(BUILD_DIR)) { console.error(`❌ Error: Build directory not found at ${BUILD_DIR}`); console.error('ℹ️ Please run "yarn build" first.'); process.exit(1); }

Copilot · 2025-12-04T17:13:49Z

test/unused-assets.mjs

+
+    // Check each remaining base name to see if it appears in this file
+    for (const baseName of unreferenced.keys()) {
+      if (content.includes(baseName)) {


Using content.includes(baseName) for searching could produce false positives if the base name is a substring of other content (e.g., searching for "app.js" would match "webapp.js" or "my-app.js-backup"). While this is acceptable for the stated "naive" approach and the script already documents this as a limitation, consider documenting this specific case in the limitations section at the top of the file.

Copilot · 2025-12-04T17:13:49Z

test/unused-assets.mjs

+ * Limitations (under reporting):
+ * 1. If the basename of an asset is mentioned anywhere (even if not actually used), it will NOT be reported
+ *
+ * Limitations (over reporting):
+ * 1. If another website or external source references the asset, it will be reported as unused
+ * 2. If the asset is referenced in a way that does not include the base filename (e.g., dynamically constructed paths), it will be reported as unused
+ * 3. If the asset is referenced with URL encoding or special characters, it may be reported as unused
+ *


The documentation states "Limitations (under reporting)" and "Limitations (over reporting)" but these terms may be confusing. Consider using clearer terminology:

"False negatives" instead of "under reporting" (assets that ARE unused but NOT reported)

"False positives" instead of "over reporting" (assets that ARE reported but NOT actually unused)

This would make it clearer what each limitation means.

Copilot · 2025-12-04T17:13:49Z

package.json

    "build": "bundle exec jekyll build",
    "generate-sitemap": "node scripts/generate-sitemap.mjs",
-    "test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\"' --",
+    "test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\" && yarn test:unused-assets' --",


[nitpick] The test:unused-assets script is not following the same pattern as other test scripts. While test:html-validate and test:dirty-file-paths-checker accept command-line arguments through "$@", this script doesn't pass arguments. For consistency, consider using:

"test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\" && yarn test:unused-assets \"$@\"' --",

This would maintain consistency even if the unused-assets script doesn't currently use the arguments.

Suggested change

"test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\" && yarn test:unused-assets' --",

"test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\" && yarn test:unused-assets \"$@\"' --",

That is intentional because you must always consider the whole site to figure out which assets are not being used

Copilot · 2025-12-04T17:13:50Z

test/unused-assets.mjs

+
+  for (const file of allFiles) {
+    // Skip binary files that we can't meaningfully search
+    if (/\.(png|jpg|jpeg|gif|ico|woff|woff2|ttf|eot|otf)$/i.test(file)) {


The binary file exclusion pattern is missing several common binary file types that could cause the script to fail or behave unexpectedly when attempting to read them as UTF-8. Consider adding extensions like .webp, .svg, .pdf, .zip, .mp4, .webm, .mp3, .wav, etc.

Also note that .svg files are actually text-based XML and could be searched, so they might want to be excluded from the binary list while other formats should be added.

Suggested change

if (/\.(png|jpg|jpeg|gif|ico|woff|woff2|ttf|eot|otf)$/i.test(file)) {

if (/\.(png|jpg|jpeg|gif|ico|webp|pdf|zip|mp4|webm|mp3|wav|woff|woff2|ttf|eot|otf)$/i.test(file)) {

Copilot · 2025-12-04T17:13:50Z

test/unused-assets.mjs

+    // Check each remaining base name to see if it appears in this file
+    for (const baseName of unreferenced.keys()) {
+      if (content.includes(baseName)) {
+        unreferenced.delete(baseName);
+
+        // Early exit if we've found all files
+        if (unreferenced.size === 0) {
+          return unreferenced;
+        }
+      }
+    }


Iterating over all remaining base names for every file could be inefficient when there are many assets and files. The current implementation has O(n*m) complexity where n is the number of files and m is the number of unreferenced assets. Consider optimizing by:

Building a single search pattern with all base names (e.g., using a regex with alternation)

Or at minimum, converting the unreferenced.keys() to an array once before the outer loop to avoid repeated iterator creation

Suggested change

// Check each remaining base name to see if it appears in this file

for (const baseName of unreferenced.keys()) {

if (content.includes(baseName)) {

unreferenced.delete(baseName);

// Early exit if we've found all files

if (unreferenced.size === 0) {

return unreferenced;

}

}

}

// Build a regex that matches any of the remaining base names

const baseNames = Array.from(unreferenced.keys());

if (baseNames.length === 0) {

return unreferenced;

}

// Escape regex special characters in base names

const escapedBaseNames = baseNames.map(name => name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));

const pattern = new RegExp(escapedBaseNames.join('|'), 'g');

const matches = new Set();

let match;

while ((match = pattern.exec(content)) !== null) {

matches.add(match[0]);

}

for (const found of matches) {

unreferenced.delete(found);

}

// Early exit if we've found all files

if (unreferenced.size === 0) {

return unreferenced;

}

Copilot · 2025-12-04T17:13:50Z

test/unused-assets.mjs

+
+// Configurable: Regular expressions to exclude files from being checked as potential unused assets
+// By default, we exclude HTML files since they are the primary content, not assets
+// If there are assets in this site which are not referenced by filename


There's an incomplete comment that ends abruptly: "If there are assets in this site which are not referenced by filename". This should be completed or removed to maintain code clarity.

Suggested change

// If there are assets in this site which are not referenced by filename

- Updated canonical link URLs in experiment template source files - Removed trailing slashes from canonical links - Tests now pass successfully

Copilot AI review requested due to automatic review settings December 4, 2025 17:10

Copilot started reviewing on behalf of fulldecent December 4, 2025 17:11 View session

Copilot finished reviewing on behalf of fulldecent December 4, 2025 17:12

Copilot AI reviewed Dec 4, 2025

View reviewed changes

Fix test failures in experiment template files

0b3f05f

- Updated canonical link URLs in experiment template source files - Removed trailing slashes from canonical links - Tests now pass successfully

fulldecent merged commit a996299 into main Dec 4, 2025
3 of 5 checks passed

fulldecent deleted the feature/unused-assets-checker branch December 8, 2025 17:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Add unused assets checker script#180

Add unused assets checker script#180
fulldecent merged 2 commits intomainfrom
feature/unused-assets-checker

fulldecent commented Dec 4, 2025

Uh oh!

fulldecent commented Dec 4, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 4, 2025

Uh oh!

Copilot AI Dec 4, 2025

Uh oh!

Copilot AI Dec 4, 2025

Uh oh!

Copilot AI Dec 4, 2025

Uh oh!

fulldecent Dec 4, 2025

Uh oh!

Copilot AI Dec 4, 2025

Uh oh!

Copilot AI Dec 4, 2025

Uh oh!

Copilot AI Dec 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	"test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\" && yarn test:unused-assets' --",
	"test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\" && yarn test:unused-assets \"$@\"' --",

	if (/\.(png\|jpg\|jpeg\|gif\|ico\|woff\|woff2\|ttf\|eot\|otf)$/i.test(file)) {
	if (/\.(png\|jpg\|jpeg\|gif\|ico\|webp\|pdf\|zip\|mp4\|webm\|mp3\|wav\|woff\|woff2\|ttf\|eot\|otf)$/i.test(file)) {

-    // Check each remaining base name to see if it appears in this file
-    for (const baseName of unreferenced.keys()) {
-      if (content.includes(baseName)) {
-        unreferenced.delete(baseName);
-        // Early exit if we've found all files
-        if (unreferenced.size === 0) {
-          return unreferenced;
-        }
-      }
-    }
+    // Build a regex that matches any of the remaining base names
+    const baseNames = Array.from(unreferenced.keys());
+    if (baseNames.length === 0) {
+      return unreferenced;
+    }
+    // Escape regex special characters in base names
+    const escapedBaseNames = baseNames.map(name => name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
+    const pattern = new RegExp(escapedBaseNames.join('|'), 'g');
+    const matches = new Set();
+    let match;
+    while ((match = pattern.exec(content)) !== null) {
+      matches.add(match[0]);
+    }
+    for (const found of matches) {
+      unreferenced.delete(found);
+    }
+    // Early exit if we've found all files
+    if (unreferenced.size === 0) {
+      return unreferenced;
+    }

Comments

Conversation

fulldecent commented Dec 4, 2025

Uh oh!

fulldecent commented Dec 4, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

fulldecent Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant