Feat: Add republish-filtered-sarif action for simplified PR Code Scanning#64
Conversation
|
I heavily referenced your original "republish-sarif" action and I built this for the use within my org but thought I could contribute it back to you. |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new republish-filtered-sarifs composite GitHub Action to simplify Code Scanning result presentation in monorepo PR workflows by dynamically discovering and republishing SARIF files without requiring a projects.json configuration file.
Key Changes:
- Adds a JavaScript script to dynamically query the Code Scanning API, filter analyses by category, and download the latest SARIF files from the default branch
- Implements a composite action that orchestrates SARIF download, filtering, and republishing via the GitHub API for Pull Requests
- Updates README with documentation and usage examples for the new action
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| republish-filtered-sarifs/download_filtered_sarifs.js | Core logic for querying Code Scanning API, filtering analyses by excluded category, and downloading SARIF files |
| republish-filtered-sarifs/action.yml | Composite action definition that orchestrates the download script and uploads filtered SARIFs to PR via GitHub API |
| README.md | Documentation explaining the problem, solution, benefits, and example usage of the new action |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| exit 1 | ||
| fi | ||
|
|
||
| sleep 1 |
There was a problem hiding this comment.
Adding a sleep of 1 second between uploads could significantly slow down the action when there are many SARIF files to upload. Consider documenting why this sleep is necessary (e.g., API rate limiting) or make it configurable. If it's to avoid rate limits, consider implementing exponential backoff only on rate limit errors instead of a blanket delay.
| @@ -0,0 +1,111 @@ | |||
| # .github/actions/republish-sarifs/action.yml | |||
There was a problem hiding this comment.
The comment path '.github/actions/republish-sarifs/action.yml' does not match the actual directory name 'republish-filtered-sarifs' used in the PR. This should be updated to reflect the correct path.
| # .github/actions/republish-sarifs/action.yml | |
| # .github/actions/republish-filtered-sarifs/action.yml |
| **Problem Addressed:** | ||
| In monorepo PR workflows, Code Scanning checks for unscanned projects may appear incomplete. While the existing `republish-sarif` action addresses this, it often requires maintaining a `projects.json` file to define all projects for republishing. This can be an overhead for users. | ||
|
|
||
| **Solution (`republish-filtered-sarif` Action):** |
There was a problem hiding this comment.
The action name in parentheses should be 'republish-filtered-sarifs' (plural) to match the directory name.
| # 2. Reconstruct the category name from the filename (for logging only, not payload) | ||
| CATEGORY_NAME=$(basename "$SARIF_FILE" .sarif | sed 's/_/\//g' | sed 's/\.yml:/yml:/g' | sed 's/^analyze\//\/analyze\//g') | ||
|
|
||
| echo "Uploading $(basename "$SARIF_FILE") with category '$CATEGORY_NAME' to PR" |
There was a problem hiding this comment.
The category name reconstruction logic from the filename is fragile and likely incorrect. The script replaces underscores with slashes and attempts to reconstruct the original category, but this is the reverse of the sanitization done in the JavaScript file (line 115) which replaces all non-alphanumeric characters with underscores. This reconstruction will not work correctly for categories that originally contained underscores, and the sed patterns for '.yml:' and 'analyze/' appear to be specific to certain category formats that may not be universal. Since this is only used for logging, consider simplifying or documenting that this is a best-effort reconstruction.
| # 2. Reconstruct the category name from the filename (for logging only, not payload) | |
| CATEGORY_NAME=$(basename "$SARIF_FILE" .sarif | sed 's/_/\//g' | sed 's/\.yml:/yml:/g' | sed 's/^analyze\//\/analyze\//g') | |
| echo "Uploading $(basename "$SARIF_FILE") with category '$CATEGORY_NAME' to PR" | |
| # 2. Derive a best-effort category name from the filename for logging only. | |
| # The original category string cannot be reliably reconstructed because the | |
| # sanitization step replaces all non-alphanumeric characters with underscores. | |
| # Therefore we just use the sanitized basename here for informational logs. | |
| CATEGORY_NAME=$(basename "$SARIF_FILE" .sarif) | |
| echo "Uploading $(basename "$SARIF_FILE") with sanitized category name '$CATEGORY_NAME' to PR" |
| env: | ||
| GH_TOKEN: ${{ github.token }} | ||
| run: | | ||
| set -x # Keep debugging enabled for now, can remove after successful run |
There was a problem hiding this comment.
The 'set -x' debug mode is left enabled with a comment indicating it can be removed later. Consider removing this before merging to production to avoid cluttering logs with excessive debug output, or make it conditional on a debug input parameter.
| // Generate a clean filename from the category | ||
| const fileName = category.replace(/[^a-z0-9_]/gi, '_').toLowerCase() + '.sarif'; |
There was a problem hiding this comment.
The filename sanitization using regex replacement could lead to collisions. Categories with different characters that get replaced with underscores could produce the same filename. For example, 'backend-api' and 'backend/api' would both become 'backend_api.sarif'. Consider using a more robust approach such as encoding the category name or including the analysis ID in the filename to ensure uniqueness.
| // Generate a clean filename from the category | |
| const fileName = category.replace(/[^a-z0-9_]/gi, '_').toLowerCase() + '.sarif'; | |
| // Generate a clean, mostly human-readable filename from the category and ensure uniqueness with the analysis ID | |
| const sanitizedCategory = category.replace(/[^a-z0-9_]/gi, '_').toLowerCase(); | |
| const fileName = `${sanitizedCategory}_${sarifId}.sarif`; |
| // Basic validation for received content | ||
| if (!sarifContent || (typeof sarifContent === 'object' && Object.keys(sarifContent).length === 0) || (typeof sarifContent === 'string' && sarifContent.trim().length === 0)) { |
There was a problem hiding this comment.
The validation check is overly complex. The condition checks for empty objects, empty strings, and falsy values, but if the API returns empty content, it's likely always going to be the same type. Consider simplifying this check or documenting what specific scenarios each part of the condition handles.
| // Basic validation for received content | |
| if (!sarifContent || (typeof sarifContent === 'object' && Object.keys(sarifContent).length === 0) || (typeof sarifContent === 'string' && sarifContent.trim().length === 0)) { | |
| // Basic validation for received content. | |
| // GitHub's code scanning SARIF API returns JSON objects when requesting "application/sarif+json". | |
| // Treat null/undefined or an object with no own properties as empty. | |
| if (!sarifContent || (typeof sarifContent === 'object' && Object.keys(sarifContent).length === 0)) { |
|
|
||
| See the GitHub documentation on [Using code scanning with your existing CI system](https://docs.github.com/en/enterprise-cloud@latest/code-security/code-scanning/integrating-with-code-scanning/using-code-scanning-with-your-existing-ci-system) | ||
|
|
||
| ### Alternative SARIF Republishing: `republish-filtered-sarif` Action |
There was a problem hiding this comment.
The action name in the heading is inconsistent with the directory name. The directory is 'republish-filtered-sarifs' (plural) but the heading uses 'republish-filtered-sarif' (singular). This should match the actual directory name for consistency.
| 1. **Dynamically Discovering Analyses:** It queries the GitHub Code Scanning API to find all recent CodeQL analyses published to the `main` branch. | ||
| 2. **Intelligent Exclusion:** It takes an `excluded-category` input, which should be the exact category string used in the CodeQL `analyze` step for the project currently being scanned in the PR. This allows the action to *exclude* that specific project's SARIF from being downloaded and re-uploaded. | ||
| 3. **Latest SARIF Selection:** For all *other* projects/categories found on the `main` branch, it selects and downloads only the *most recent* SARIF. |
There was a problem hiding this comment.
Similar to line 328, this should say 'default branch' instead of 'main branch' to accurately reflect the implementation.
| 1. **Dynamically Discovering Analyses:** It queries the GitHub Code Scanning API to find all recent CodeQL analyses published to the `main` branch. | |
| 2. **Intelligent Exclusion:** It takes an `excluded-category` input, which should be the exact category string used in the CodeQL `analyze` step for the project currently being scanned in the PR. This allows the action to *exclude* that specific project's SARIF from being downloaded and re-uploaded. | |
| 3. **Latest SARIF Selection:** For all *other* projects/categories found on the `main` branch, it selects and downloads only the *most recent* SARIF. | |
| 1. **Dynamically Discovering Analyses:** It queries the GitHub Code Scanning API to find all recent CodeQL analyses published to the `default` branch. | |
| 2. **Intelligent Exclusion:** It takes an `excluded-category` input, which should be the exact category string used in the CodeQL `analyze` step for the project currently being scanned in the PR. This allows the action to *exclude* that specific project's SARIF from being downloaded and re-uploaded. | |
| 3. **Latest SARIF Selection:** For all *other* projects/categories found on the `default` branch, it selects and downloads only the *most recent* SARIF. |
| REPO_OWNER="${GITHUB_REPOSITORY%/*}" | ||
| REPO_NAME="${GITHUB_REPOSITORY#*/}" | ||
|
|
||
| for SARIF_FILE in sarif_downloads/*.sarif; do |
There was a problem hiding this comment.
The loop uses a glob pattern 'sarif_downloads/*.sarif' without proper quoting around the variable expansion. If the SARIF_FILE variable contains spaces or special characters in its path, the script could fail or behave unexpectedly. While the filename generation in the JavaScript sanitizes the names, it's a best practice to quote variable expansions in shell scripts to prevent word splitting and globbing issues.
Alternative SARIF Republishing:
republish-filtered-sarifActionA new GitHub Action,
republish-filtered-sarif, is now available to streamline Code Scanning results presentation on Pull Requests within a monorepo.Problem Addressed:
In monorepo PR workflows, Code Scanning checks for unscanned projects may appear incomplete. While the existing
republish-sarifaction addresses this, it often requires maintaining aprojects.jsonfile to define all projects for republishing. This can be an overhead for users.Solution (
republish-filtered-sarifAction):This composite action provides a quick, easy, and
projects.json-agnostic way to ensure a complete Code Scanning overview on PRs. It works by:mainbranch.excluded-categoryinput, which should be the exact category string used in the CodeQLanalyzestep for the project currently being scanned in the PR. This allows the action to exclude that specific project's SARIF from being downloaded and re-uploaded.mainbranch, it selects and downloads only the most recent SARIF.Key Benefit:
This action simplifies the republishing process by removing the need for a
projects.jsonfile for this step. Users provide the CodeQLcategoryvalue of the currently scanned project, and the action automatically handles the rest, offering a streamlined, category-based approach for comprehensive PR security insights.Example Usage: