Add unused assets checker script #180

Merged
fulldecent merged 2 commits into main from feature/unused-assets-checker
Dec 4, 2025

@fulldecent
Owner

Introduces a new test script to identify potentially unused assets in the build directory.

The script scans all files, creates a mapping of base filenames, and searches for references. Assets whose base filenames are never found are reported as potentially unused.

Features:

  • Finds orphaned image files, CSS, JS, and other assets
  • Configurable exclusion patterns (HTML, robots.txt, favicon.ico, sitemap.xml)
  • Detailed reporting with relative paths
  • Exits with error code when unused assets found (suitable for CI)

Run with: yarn test:unused-assets

Note: This is a naive checker. Always verify manually before deleting files.
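
For readers who want the idea without opening test/unused-assets.mjs, here is a minimal sketch of the approach described above. It is not the script from this PR: it runs against a hypothetical in-memory map of relative path to text content instead of the real build directory, and the names `buildFiles`, `EXCLUDE_PATTERNS`, and `findPotentiallyUnused` are made up for illustration.

```javascript
// Hypothetical stand-in for the build directory: relative path -> text content.
const buildFiles = new Map([
  ["index.html", '<link rel="stylesheet" href="/css/site.css"><img src="/images/logo.png">'],
  ["css/site.css", "body { margin: 0; }"],
  ["images/logo.png", ""], // binary content is irrelevant to the check
  ["images/old-banner.png", ""],
  ["robots.txt", "User-agent: *"],
]);

// Assumed exclusion patterns, mirroring the defaults listed in the description.
const EXCLUDE_PATTERNS = [
  /\.html$/i,
  /(^|\/)robots\.txt$/,
  /(^|\/)favicon\.ico$/,
  /(^|\/)sitemap\.xml$/,
];

function findPotentiallyUnused(files) {
  // Map each base filename to every relative path that shares it.
  const unreferenced = new Map();
  for (const relativePath of files.keys()) {
    if (EXCLUDE_PATTERNS.some((pattern) => pattern.test(relativePath))) continue;
    const baseName = relativePath.split("/").pop();
    if (!unreferenced.has(baseName)) unreferenced.set(baseName, []);
    unreferenced.get(baseName).push(relativePath);
  }

  // Any file that mentions a base name counts as a reference to it.
  for (const content of files.values()) {
    for (const baseName of [...unreferenced.keys()]) {
      if (content.includes(baseName)) unreferenced.delete(baseName);
    }
  }

  // Whatever is left was never mentioned anywhere.
  return [...unreferenced.values()].flat().sort();
}

const unused = findPotentiallyUnused(buildFiles);
console.log(unused);
```

In this toy data only images/old-banner.png is never mentioned, so it is the only path reported; robots.txt is skipped by the exclusion patterns.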

- Scans build directory for files that are never referenced anywhere
- Creates base filename mappings and searches for references
- Excludes HTML files, robots.txt, favicon.ico, and sitemap.xml by default
- Provides detailed reporting of potentially unused assets
- Includes useful test script and warning about manual verification

Features:
- Skips binary files (images, fonts, etc.) to avoid parsing errors
- Case-sensitive filename matching
- Provides relative paths in the build directory
- Exits with error code when unused assets are found

Note: This is a naive checker that may have false positives/negatives
Copilot AI review requested due to automatic review settings December 4, 2025 17:10
@fulldecent
Owner Author

this replaces #152

Copilot AI left a comment

Pull request overview

This PR introduces an automated unused assets detection tool to help maintain a clean build directory by identifying orphaned files that are not referenced anywhere in the codebase.

Key Changes:

  • Adds a new test script (test/unused-assets.mjs) that scans the build directory for potentially unreferenced assets
  • Integrates the unused assets check into the main test suite via package.json
  • Implements a naive string-matching approach that searches for base filenames across all build files

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File | Description
test/unused-assets.mjs | New script that identifies potentially unused assets by searching for base filename references across all build files, with configurable exclusion patterns and detailed reporting
package.json | Adds test:unused-assets script and integrates it into the main test command chain

Comment on lines +121 to +126
function main() {
  console.log("🔍 Scanning for unused assets in the build directory...\n");

  // Get all files
  const allFiles = getAllBuildFiles();
  console.log(`ℹ️ Found ${allFiles.length} total files in build directory`);

Copilot AI Dec 4, 2025

The main() function doesn't check if the BUILD_DIR exists before scanning. If the build directory doesn't exist (e.g., before running yarn build), the script will fail ungracefully. Consider adding a check and providing a helpful error message:

if (!fs.existsSync(BUILD_DIR)) {
  console.error(`❌ Error: Build directory not found at ${BUILD_DIR}`);
  console.error('ℹ️  Please run "yarn build" first.');
  process.exit(1);
}

// Check each remaining base name to see if it appears in this file
for (const baseName of unreferenced.keys()) {
  if (content.includes(baseName)) {

Copilot AI Dec 4, 2025

Using content.includes(baseName) for searching could produce false positives if the base name is a substring of other content (e.g., searching for "app.js" would match "webapp.js" or "my-app.js-backup"). While this is acceptable for the stated "naive" approach and the script already documents this as a limitation, consider documenting this specific case in the limitations section at the top of the file.
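
To make the substring case concrete, here is a small sketch. The page content is hypothetical, and `mentionsWholeName` is an assumed alternative that is not part of this PR: it only accepts a hit when the name is not surrounded by other filename characters.

```javascript
// Hypothetical page content that references webapp.js but never app.js itself.
const content = '<script src="/js/webapp.js"></script>';

// Naive check, as in the script under review: any substring hit counts.
const naiveHit = content.includes("app.js");

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumed alternative: require a non-filename character (or the string edge)
// on both sides of the base name.
function mentionsWholeName(text, baseName) {
  const pattern = new RegExp(
    `(^|[^A-Za-z0-9_.-])${escapeRegExp(baseName)}($|[^A-Za-z0-9_.-])`
  );
  return pattern.test(text);
}

console.log(naiveHit);
console.log(mentionsWholeName(content, "app.js"));
console.log(mentionsWholeName(content, "webapp.js"));
```

With the naive check, app.js is treated as referenced and would not be reported even if nothing uses it. The boundary class here is a guess and has its own trade-off: a mention followed directly by a period or hyphen is no longer counted as a reference.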

Comment on lines +17 to +24
* Limitations (under reporting):
* 1. If the basename of an asset is mentioned anywhere (even if not actually used), it will NOT be reported
*
* Limitations (over reporting):
* 1. If another website or external source references the asset, it will be reported as unused
* 2. If the asset is referenced in a way that does not include the base filename (e.g., dynamically constructed paths), it will be reported as unused
* 3. If the asset is referenced with URL encoding or special characters, it may be reported as unused
*

Copilot AI Dec 4, 2025

The documentation states "Limitations (under reporting)" and "Limitations (over reporting)" but these terms may be confusing. Consider using clearer terminology:

  • "False negatives" instead of "under reporting" (assets that ARE unused but NOT reported)
  • "False positives" instead of "over reporting" (assets that ARE reported but NOT actually unused)

This would make it clearer what each limitation means.

  "build": "bundle exec jekyll build",
  "generate-sitemap": "node scripts/generate-sitemap.mjs",
- "test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\"' --",
+ "test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\" && yarn test:unused-assets' --",

Copilot AI Dec 4, 2025

[nitpick] The test:unused-assets script is not following the same pattern as other test scripts. While test:html-validate and test:dirty-file-paths-checker accept command-line arguments through "$@", this script doesn't pass arguments. For consistency, consider using:

"test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\" && yarn test:unused-assets \"$@\"' --",

This would maintain consistency even if the unused-assets script doesn't currently use the arguments.

Suggested change
- "test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\" && yarn test:unused-assets' --",
+ "test": "sh -c 'yarn test:html-validate \"$@\" && yarn test:dirty-file-paths-checker \"$@\" && yarn test:unused-assets \"$@\"' --",

Owner Author

That is intentional, because you must always consider the whole site to figure out which assets are not being used.


for (const file of allFiles) {
  // Skip binary files that we can't meaningfully search
  if (/\.(png|jpg|jpeg|gif|ico|woff|woff2|ttf|eot|otf)$/i.test(file)) {

Copilot AI Dec 4, 2025

The binary file exclusion pattern is missing several common binary file types that could cause the script to fail or behave unexpectedly when attempting to read them as UTF-8. Consider adding extensions like .webp, .svg, .pdf, .zip, .mp4, .webm, .mp3, .wav, etc.

Also note that .svg files are actually text-based XML and could be searched, so they might want to be excluded from the binary list while other formats should be added.

Suggested change
- if (/\.(png|jpg|jpeg|gif|ico|woff|woff2|ttf|eot|otf)$/i.test(file)) {
+ if (/\.(png|jpg|jpeg|gif|ico|webp|pdf|zip|mp4|webm|mp3|wav|woff|woff2|ttf|eot|otf)$/i.test(file)) {
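
Any extension list will eventually miss a format. As a possible complement, and not something this PR does, the script could sniff file content instead. The sketch below assumes a hypothetical helper `looksBinary` that treats a buffer as binary when a NUL byte appears in its first few thousand bytes; the sample size is an arbitrary choice.

```javascript
// Hypothetical helper: treat a buffer as binary if its leading bytes contain NUL.
function looksBinary(buffer, sampleSize = 8000) {
  const sample = buffer.subarray(0, sampleSize);
  return sample.includes(0);
}

// PNG signature followed by the first byte of the chunk length, which is NUL.
const pngHeader = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00]);

// SVG is text-based XML, so it should stay searchable.
const svgText = Buffer.from('<svg xmlns="http://www.w3.org/2000/svg"></svg>', "utf8");

console.log(looksBinary(pngHeader));
console.log(looksBinary(svgText));
```

The heuristic is not perfect: UTF-16 encoded text contains NUL bytes and would be skipped, so this fits best alongside the extension list and not as a replacement for it.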

Comment on lines +102 to +112
// Check each remaining base name to see if it appears in this file
for (const baseName of unreferenced.keys()) {
  if (content.includes(baseName)) {
    unreferenced.delete(baseName);

    // Early exit if we've found all files
    if (unreferenced.size === 0) {
      return unreferenced;
    }
  }
}

Copilot AI Dec 4, 2025

Iterating over all remaining base names for every file could be inefficient when there are many assets and files. The current implementation has O(n*m) complexity where n is the number of files and m is the number of unreferenced assets. Consider optimizing by:

  1. Building a single search pattern with all base names (e.g., using a regex with alternation)
  2. Or at minimum, converting the unreferenced.keys() to an array once before the outer loop to avoid repeated iterator creation

Suggested change
- // Check each remaining base name to see if it appears in this file
- for (const baseName of unreferenced.keys()) {
-   if (content.includes(baseName)) {
-     unreferenced.delete(baseName);
-     // Early exit if we've found all files
-     if (unreferenced.size === 0) {
-       return unreferenced;
-     }
-   }
- }
+ // Build a regex that matches any of the remaining base names
+ const baseNames = Array.from(unreferenced.keys());
+ if (baseNames.length === 0) {
+   return unreferenced;
+ }
+ // Escape regex special characters in base names
+ const escapedBaseNames = baseNames.map(name => name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
+ const pattern = new RegExp(escapedBaseNames.join('|'), 'g');
+ const matches = new Set();
+ let match;
+ while ((match = pattern.exec(content)) !== null) {
+   matches.add(match[0]);
+ }
+ for (const found of matches) {
+   unreferenced.delete(found);
+ }
+ // Early exit if we've found all files
+ if (unreferenced.size === 0) {
+   return unreferenced;
+ }
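
One caveat before adopting the alternation approach: it is not a drop-in equivalent of the includes() loop when one base name is a prefix or substring of another. The sketch below uses two hypothetical base names and helper functions (`referencedByIncludes`, `referencedByAlternation`) written for illustration only, to compare the two strategies on the same content.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Strategy from the current script: every base name is tested independently.
function referencedByIncludes(content, baseNames) {
  return baseNames.filter((name) => content.includes(name));
}

// Strategy from the suggestion: one regex with all base names as alternatives.
function referencedByAlternation(content, baseNames) {
  const pattern = new RegExp(baseNames.map(escapeRegExp).join("|"), "g");
  const found = new Set();
  for (const match of content.matchAll(pattern)) {
    found.add(match[0]);
  }
  return baseNames.filter((name) => found.has(name));
}

// Hypothetical content and base names where one name is a prefix of the other.
const content = "//# sourceMappingURL=app.js.map";
const baseNames = ["app.js", "app.js.map"];

const viaIncludes = referencedByIncludes(content, baseNames);
const viaAlternation = referencedByAlternation(content, baseNames);

console.log(viaIncludes);
console.log(viaAlternation);
```

Here includes() marks both names as referenced, while the alternation consumes the text with the first alternative that matches and never reports app.js.map. Sorting the names longest-first only moves the gap to the shorter name, so the two strategies can report different sets of unused assets.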

// Configurable: Regular expressions to exclude files from being checked as potential unused assets
// By default, we exclude HTML files since they are the primary content, not assets
// If there are assets in this site which are not referenced by filename

Copilot AI Dec 4, 2025

There's an incomplete comment that ends abruptly: "If there are assets in this site which are not referenced by filename". This should be completed or removed to maintain code clarity.

Suggested change
- // If there are assets in this site which are not referenced by filename

- Updated canonical link URLs in experiment template source files
- Removed trailing slashes from canonical links
- Tests now pass successfully
@fulldecent fulldecent merged commit a996299 into main Dec 4, 2025
3 of 5 checks passed
@fulldecent fulldecent deleted the feature/unused-assets-checker branch December 8, 2025 17:08