Skip to content

Feat: Add republish-filtered-sarif action for simplified PR Code Scanning#64

Open
varunbhardwaj-gmail wants to merge 12 commits intoadvanced-security:mainfrom
varunbhardwaj-gmail:republish-filtered-sarif
Open

Feat: Add republish-filtered-sarif action for simplified PR Code Scanning#64
varunbhardwaj-gmail wants to merge 12 commits intoadvanced-security:mainfrom
varunbhardwaj-gmail:republish-filtered-sarif

Conversation

@varunbhardwaj-gmail
Copy link
Copy Markdown

@varunbhardwaj-gmail varunbhardwaj-gmail commented Jul 23, 2025

Alternative SARIF Republishing: republish-filtered-sarif Action

A new GitHub Action, republish-filtered-sarif, is now available to streamline Code Scanning results presentation on Pull Requests within a monorepo.

Problem Addressed:
In monorepo PR workflows, Code Scanning checks for unscanned projects may appear incomplete. While the existing republish-sarif action addresses this, it often requires maintaining a projects.json file to define all projects for republishing. This can be an overhead for users.

Solution (republish-filtered-sarif Action):
This composite action provides a quick, easy, and projects.json-agnostic way to ensure a complete Code Scanning overview on PRs. It works by:

  1. Dynamically Discovering Analyses: It queries the GitHub Code Scanning API to find all recent CodeQL analyses published to the main branch.
  2. Intelligent Exclusion: It takes an excluded-category input, which should be the exact category string used in the CodeQL analyze step for the project currently being scanned in the PR. This allows the action to exclude that specific project's SARIF from being downloaded and re-uploaded.
  3. Latest SARIF Selection: For all other projects/categories found on the main branch, it selects and downloads only the most recent SARIF.
  4. Direct API Republishing: These filtered SARIFs are then directly uploaded to the current Pull Request's commit SHA via the GitHub Code Scanning API.

Key Benefit:
This action simplifies the republishing process by removing the need for a projects.json file for this step. Users provide the CodeQL category value of the currently scanned project, and the action automatically handles the rest, offering a streamlined, category-based approach for comprehensive PR security insights.

Example Usage:

# In your monorepo workflow (e.g., for a 'backend' project)

jobs:
  analyze:
    steps:
      # ... (checkout, setup, CodeQL init, build steps) ...

      - name: Perform CodeQL Analysis (Backend)
        uses: github/codeql-action/analyze@v3
        with:
          category: 'backend' # <--- This unique category string is key!          

      # ... (other steps, like dependency submission) ...

      - name: Republish other SARIFs for PR
        if: github.event_name == 'pull_request'
        uses: your-org/your-repo/.github/actions/republish-filtered-sarif@main # Update this path
        with:
          excluded-category: 'backend' # Pass the category of the project just scanned
          

@varunbhardwaj-gmail varunbhardwaj-gmail requested a review from a team as a code owner July 23, 2025 17:03
@varunbhardwaj-gmail
Copy link
Copy Markdown
Author

varunbhardwaj-gmail commented Jul 23, 2025

I heavily referenced your original "republish-sarif" action and I built this for the use within my org but thought I could contribute it back to you.

Comment thread republish-filtered-sarifs/action.yml Outdated
Comment thread republish-filtered-sarifs/download_filtered_sarifs.js Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new republish-filtered-sarifs composite GitHub Action to simplify Code Scanning result presentation in monorepo PR workflows by dynamically discovering and republishing SARIF files without requiring a projects.json configuration file.

Key Changes:

  • Adds a JavaScript script to dynamically query the Code Scanning API, filter analyses by category, and download the latest SARIF files from the default branch
  • Implements a composite action that orchestrates SARIF download, filtering, and republishing via the GitHub API for Pull Requests
  • Updates README with documentation and usage examples for the new action

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 16 comments.

File Description
republish-filtered-sarifs/download_filtered_sarifs.js Core logic for querying Code Scanning API, filtering analyses by excluded category, and downloading SARIF files
republish-filtered-sarifs/action.yml Composite action definition that orchestrates the download script and uploads filtered SARIFs to PR via GitHub API
README.md Documentation explaining the problem, solution, benefits, and example usage of the new action

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

exit 1
fi

sleep 1
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a sleep of 1 second between uploads could significantly slow down the action when there are many SARIF files to upload. Consider documenting why this sleep is necessary (e.g., API rate limiting) or make it configurable. If it's to avoid rate limits, consider implementing exponential backoff only on rate limit errors instead of a blanket delay.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,111 @@
# .github/actions/republish-sarifs/action.yml
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment path '.github/actions/republish-sarifs/action.yml' does not match the actual directory name 'republish-filtered-sarifs' used in the PR. This should be updated to reflect the correct path.

Suggested change
# .github/actions/republish-sarifs/action.yml
# .github/actions/republish-filtered-sarifs/action.yml

Copilot uses AI. Check for mistakes.
Comment thread README.md
**Problem Addressed:**
In monorepo PR workflows, Code Scanning checks for unscanned projects may appear incomplete. While the existing `republish-sarif` action addresses this, it often requires maintaining a `projects.json` file to define all projects for republishing. This can be an overhead for users.

**Solution (`republish-filtered-sarif` Action):**
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The action name in parentheses should be 'republish-filtered-sarifs' (plural) to match the directory name.

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +63
# 2. Reconstruct the category name from the filename (for logging only, not payload)
CATEGORY_NAME=$(basename "$SARIF_FILE" .sarif | sed 's/_/\//g' | sed 's/\.yml:/yml:/g' | sed 's/^analyze\//\/analyze\//g')

echo "Uploading $(basename "$SARIF_FILE") with category '$CATEGORY_NAME' to PR"
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The category name reconstruction logic from the filename is fragile and likely incorrect. The script replaces underscores with slashes and attempts to reconstruct the original category, but this is the reverse of the sanitization done in the JavaScript file (line 115) which replaces all non-alphanumeric characters with underscores. This reconstruction will not work correctly for categories that originally contained underscores, and the sed patterns for '.yml:' and 'analyze/' appear to be specific to certain category formats that may not be universal. Since this is only used for logging, consider simplifying or documenting that this is a best-effort reconstruction.

Suggested change
# 2. Reconstruct the category name from the filename (for logging only, not payload)
CATEGORY_NAME=$(basename "$SARIF_FILE" .sarif | sed 's/_/\//g' | sed 's/\.yml:/yml:/g' | sed 's/^analyze\//\/analyze\//g')
echo "Uploading $(basename "$SARIF_FILE") with category '$CATEGORY_NAME' to PR"
# 2. Derive a best-effort category name from the filename for logging only.
# The original category string cannot be reliably reconstructed because the
# sanitization step replaces all non-alphanumeric characters with underscores.
# Therefore we just use the sanitized basename here for informational logs.
CATEGORY_NAME=$(basename "$SARIF_FILE" .sarif)
echo "Uploading $(basename "$SARIF_FILE") with sanitized category name '$CATEGORY_NAME' to PR"

Copilot uses AI. Check for mistakes.
env:
GH_TOKEN: ${{ github.token }}
run: |
set -x # Keep debugging enabled for now, can remove after successful run
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'set -x' debug mode is left enabled with a comment indicating it can be removed later. Consider removing this before merging to production to avoid cluttering logs with excessive debug output, or make it conditional on a debug input parameter.

Copilot uses AI. Check for mistakes.
Comment on lines +114 to +115
// Generate a clean filename from the category
const fileName = category.replace(/[^a-z0-9_]/gi, '_').toLowerCase() + '.sarif';
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filename sanitization using regex replacement could lead to collisions. Categories with different characters that get replaced with underscores could produce the same filename. For example, 'backend-api' and 'backend/api' would both become 'backend_api.sarif'. Consider using a more robust approach such as encoding the category name or including the analysis ID in the filename to ensure uniqueness.

Suggested change
// Generate a clean filename from the category
const fileName = category.replace(/[^a-z0-9_]/gi, '_').toLowerCase() + '.sarif';
// Generate a clean, mostly human-readable filename from the category and ensure uniqueness with the analysis ID
const sanitizedCategory = category.replace(/[^a-z0-9_]/gi, '_').toLowerCase();
const fileName = `${sanitizedCategory}_${sarifId}.sarif`;

Copilot uses AI. Check for mistakes.
Comment on lines +132 to +133
// Basic validation for received content
if (!sarifContent || (typeof sarifContent === 'object' && Object.keys(sarifContent).length === 0) || (typeof sarifContent === 'string' && sarifContent.trim().length === 0)) {
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation check is overly complex. The condition checks for empty objects, empty strings, and falsy values, but if the API returns empty content, it's likely always going to be the same type. Consider simplifying this check or documenting what specific scenarios each part of the condition handles.

Suggested change
// Basic validation for received content
if (!sarifContent || (typeof sarifContent === 'object' && Object.keys(sarifContent).length === 0) || (typeof sarifContent === 'string' && sarifContent.trim().length === 0)) {
// Basic validation for received content.
// GitHub's code scanning SARIF API returns JSON objects when requesting "application/sarif+json".
// Treat null/undefined or an object with no own properties as empty.
if (!sarifContent || (typeof sarifContent === 'object' && Object.keys(sarifContent).length === 0)) {

Copilot uses AI. Check for mistakes.
Comment thread README.md

See the GitHub documentation on [Using code scanning with your existing CI system](https://docs.github.com/en/enterprise-cloud@latest/code-security/code-scanning/integrating-with-code-scanning/using-code-scanning-with-your-existing-ci-system)

### Alternative SARIF Republishing: `republish-filtered-sarif` Action
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The action name in the heading is inconsistent with the directory name. The directory is 'republish-filtered-sarifs' (plural) but the heading uses 'republish-filtered-sarif' (singular). This should match the actual directory name for consistency.

Copilot uses AI. Check for mistakes.
Comment thread README.md
Comment on lines +328 to +330
1. **Dynamically Discovering Analyses:** It queries the GitHub Code Scanning API to find all recent CodeQL analyses published to the `main` branch.
2. **Intelligent Exclusion:** It takes an `excluded-category` input, which should be the exact category string used in the CodeQL `analyze` step for the project currently being scanned in the PR. This allows the action to *exclude* that specific project's SARIF from being downloaded and re-uploaded.
3. **Latest SARIF Selection:** For all *other* projects/categories found on the `main` branch, it selects and downloads only the *most recent* SARIF.
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to line 328, this should say 'default branch' instead of 'main branch' to accurately reflect the implementation.

Suggested change
1. **Dynamically Discovering Analyses:** It queries the GitHub Code Scanning API to find all recent CodeQL analyses published to the `main` branch.
2. **Intelligent Exclusion:** It takes an `excluded-category` input, which should be the exact category string used in the CodeQL `analyze` step for the project currently being scanned in the PR. This allows the action to *exclude* that specific project's SARIF from being downloaded and re-uploaded.
3. **Latest SARIF Selection:** For all *other* projects/categories found on the `main` branch, it selects and downloads only the *most recent* SARIF.
1. **Dynamically Discovering Analyses:** It queries the GitHub Code Scanning API to find all recent CodeQL analyses published to the `default` branch.
2. **Intelligent Exclusion:** It takes an `excluded-category` input, which should be the exact category string used in the CodeQL `analyze` step for the project currently being scanned in the PR. This allows the action to *exclude* that specific project's SARIF from being downloaded and re-uploaded.
3. **Latest SARIF Selection:** For all *other* projects/categories found on the `default` branch, it selects and downloads only the *most recent* SARIF.

Copilot uses AI. Check for mistakes.
REPO_OWNER="${GITHUB_REPOSITORY%/*}"
REPO_NAME="${GITHUB_REPOSITORY#*/}"

for SARIF_FILE in sarif_downloads/*.sarif; do
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop uses a glob pattern 'sarif_downloads/*.sarif' without proper quoting around the variable expansion. If the SARIF_FILE variable contains spaces or special characters in its path, the script could fail or behave unexpectedly. While the filename generation in the JavaScript sanitizes the names, it's a best practice to quote variable expansions in shell scripts to prevent word splitting and globbing issues.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants