feat: Attestation Path Validation for Unexpected Files#18

Open
corepacket wants to merge 1 commit intoSBOMit:masterfrom
corepacket:feature/exclude-flag
@corepacket corepacket commented Apr 4, 2026

Feature: Attestation Path Validation for Unexpected Files

Fixes #17

Problem

When generating SBOMs using sbomit generate, attestations often contain noisy or unexpected artifacts such as:

  • .git/ directories and hooks
  • .log files
  • System cache directories

While it is tempting to exclude these files, removing them can mask potential security anomalies and undermines the integrity of the attestation data. We must process the attestations exactly as they were recorded, but users still need visibility into anomalous files present in the pipeline.


Solution

This PR introduces a validation and warning mechanism rather than a destructive drop-filter.

It identifies unexpected file patterns during the extraction phase and prints a warning to stderr, while leaving the matching files untouched in the generated SBOM output.


Implementation Details

  • Validation Logic:

    • Implemented hardcoded warning patterns in pkg/attestation/filter.go.
    • Targeted patterns: **/*.log, **/.git/**, .git/**.
  • Pattern Matching Mechanism:

    • Replaced Go's standard filepath.Match with github.com/bmatcuk/doublestar/v4.
    • doublestar's ** matches across path separators, so **/.git/** catches deeply nested hooks even in full absolute paths.
  • Architectural Integration:

    • Validation is executed early during extraction in pkg/attestation/extractor.go.
    • Emits a non-blocking warning to stderr via fmt.Fprintf(os.Stderr, ...).
    • Ensures zero missing data for downstream processes like resolving or signature integrity checking.
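
A minimal sketch of the warn-only flow described above, using only the standard library. The helper names isUnexpected and warnUnexpected are hypothetical and are not taken from the PR; the PR itself matches with doublestar patterns, which this segment and suffix check only approximates.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path"
	"strings"
)

// isUnexpected reports whether p contains a ".git" path element or names
// a file ending in ".log". Hypothetical stand-in for the PR's doublestar
// patterns (**/*.log, **/.git/**, .git/**).
func isUnexpected(p string) bool {
	p = strings.ReplaceAll(p, "\\", "/") // normalize Windows-style separators
	for _, seg := range strings.Split(p, "/") {
		if seg == ".git" {
			return true
		}
	}
	return strings.HasSuffix(path.Base(p), ".log")
}

// warnUnexpected writes one warning per unexpected path to w and returns
// the number of warnings. It never modifies or filters paths.
func warnUnexpected(w io.Writer, paths []string) int {
	n := 0
	for _, p := range paths {
		if isUnexpected(p) {
			fmt.Fprintf(w, "WARNING: unexpected file in attestation: %s\n", p)
			n++
		}
	}
	return n
}

func main() {
	paths := []string{".git/config", "src/main.go", "/home/user/build/output.log"}
	warned := warnUnexpected(os.Stderr, paths)
	// Every path is still present after validation; only warnings were emitted.
	fmt.Printf("%d warnings, %d paths kept\n", warned, len(paths))
}
```

Because the validator only reads the path list and writes to the supplied io.Writer, downstream steps see the same data whether or not any warning fires.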

Design Considerations

  • Integrity First: Designed strictly as an observation layer. Zero data loss.
  • Accuracy: Adopted doublestar because * in the standard library's filepath.Match never matches the path separator (/), so patterns such as *.log silently fail to match full paths.

Commands & Verification Plan

Reviewers can verify the changes using the following commands locally:

1. Build and Run Unit Tests

Confirm the extraction modifications pass the test suite:

go mod tidy
go test ./...

(Tests should return ok across all packages)

2. End-to-End Validation Run

Run the generate command passing in the mock attestation file:

go run main.go generate test/sample-attestation.json --output test-baseline.json

3. Expected Result Check

When running the command above, you can verify both outcomes:

  1. Warnings emitted: The CLI prints the .git warnings to stderr:
Parsed attestations (3 total): command-run=1, material=1, product=1
WARNING: unexpected file in attestation: .git/hooks/pre-commit.sample
WARNING: unexpected file in attestation: .git/config
WARNING: unexpected file in attestation: .git/description
  2. Output intact: Open the generated test-baseline.json. The .git files still appear in the SBOM relationships, showing that no data was dropped.

Benefits

  • Provides transparency into noisy or suspicious build environment elements.
  • Protects the absolute integrity and trustworthiness of the source attestations.
  • Eliminates the risk of a user accidentally filtering out critical files.

Files Modified

  • cmd/generate.go → Removed previous CLI flags and struct hooks.
  • pkg/attestation/extractor.go → Injected ValidatePath() into extraction loop.
  • pkg/attestation/filter.go → Refactored internal file evaluation to use doublestar.
  • pkg/attestation/parser.go → Removed string slice argument wrapper.
  • pkg/generator/generator.go → Cleaned initialization options.

@Elvand-Lie

Hey @corepacket, I spotted a bug in filter.go. filepath.Match treats * as matching anything except the path separator /, so patterns like *.log and .git/* won't match against full absolute paths from attestations:

filepath.Match("*.log", "/home/user/build/output.log") // false
filepath.Match(".git/*", "/home/user/project/.git/config") // false

All the patterns advertised in the PR description would silently fail. You'd need to either match against filepath.Base(path) for filename patterns, or check path segments for directory patterns. Or just bring in doublestar now rather than as a follow-up since the standard library can't handle this cleanly.
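
A minimal sketch of the two standard-library workarounds mentioned above: matching filename patterns against filepath.Base(path), and checking path segments for directory names. The helpers matchBase and hasSegment are hypothetical names used for illustration only.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// matchBase applies a filename pattern such as "*.log" to the last path
// element only, so the separator limitation of filepath.Match is avoided.
func matchBase(pattern, p string) bool {
	ok, err := filepath.Match(pattern, filepath.Base(p))
	return err == nil && ok
}

// hasSegment reports whether any element of p is exactly name.
func hasSegment(p, name string) bool {
	for _, seg := range strings.Split(filepath.ToSlash(p), "/") {
		if seg == name {
			return true
		}
	}
	return false
}

func main() {
	logPath := "/home/user/build/output.log"
	gitPath := "/home/user/project/.git/config"

	// Matching the pattern against the full path fails because * stops at "/".
	full, _ := filepath.Match("*.log", logPath)
	fmt.Println(full, matchBase("*.log", logPath)) // on Unix-like systems: false true

	dir, _ := filepath.Match(".git/*", gitPath)
	fmt.Println(dir, hasSegment(gitPath, ".git")) // on Unix-like systems: false true
}
```

This keeps the dependency footprint at zero but needs one helper per pattern shape, which is the trade-off against bringing in doublestar.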

Also why is resolver.go in this PR? Looking at it the only change appears to be a comment edit on the found_by JSON tag which has nothing to do with file exclusion.

And the question from #17 still stands. If .git and .log files are showing up in SBOM output, is that because the attestation itself recorded them at build time? If so this flag treats the symptom rather than the cause. Would be good to confirm this is happening with a real attestation before the approach gets locked into the pipeline.

@corepacket
Author

Thanks for the detailed feedback, @Elvand-Lie — this is really helpful.

You’re absolutely right about the architecture. Since SBOMit consumes in-toto attestations rather than scanning the filesystem, any entries like .git or .log would originate from the attestation generated upstream. Ideally, this should be handled at the attestation generation stage rather than during extraction.

At the moment, this is a precautionary feature rather than one based on a confirmed real-world attestation case. The intention behind the --exclude flag was to provide an optional post-processing mechanism for users who may not control how attestations are produced (e.g., external pipelines or pre-generated artifacts), but I understand the concern that this could mask upstream issues.

@corepacket
Author

Before I proceed with updating the PR, I wanted to confirm the preferred direction.

Instead of filtering entries via --exclude, would it make more sense to shift this toward a validation approach — for example, surfacing warnings when unexpected file patterns appear in attestations, rather than removing them?

I can also remove the unrelated changes and fix the matching logic as discussed.

Just wanted to align on the approach before making the changes — happy to proceed based on your suggestion.

@Elvand-Lie

@corepacket The validation approach sounds more aligned with what SBOMit is actually doing: surfacing unexpected entries rather than hiding them makes more sense for an attestation-based tool. But that's a bigger design question that the maintainers should weigh in on before you rework the PR, so it's worth waiting for their input before proceeding. In general, I would recommend waiting for a signal from the maintainers rather than opening PRs where we're uncertain what to do, since the maintainers haven't filed an issue for this yet. That said, since you've already done the work, I think this approach would be the better final shape for an improvement PR.

@corepacket corepacket force-pushed the feature/exclude-flag branch from 7a863f9 to 90f823f Compare April 5, 2026 06:13
@corepacket corepacket changed the title feat: add --exclude flag to filter files by glob feat: Attestation Path Validation for Unexpected Files Apr 5, 2026
@Elvand-Lie

Hey @corepacket, it looks like you accidentally committed your local test output files (out2.txt, out3.txt, test-baseline.json, etc.). The diff is currently sitting at +53,900 lines. You will need to drop these generated files from the branch so the actual code changes can be reviewed.

As requested during code review:

- The '--exclude' flag is entirely removed from the CLI interface.

- Pattern matching now utilizes 'doublestar' correctly across path boundaries.

- Instead of silently dropping files, anomalous files (e.g. '.git/', '*.log') generate a warning to stderr while preserving output integrity.

Signed-off-by: corepacket <wbn453177@gmail.com>
@corepacket corepacket force-pushed the feature/exclude-flag branch from 90f823f to 747747f Compare April 5, 2026 08:53
@corepacket
Author

Hello @Elvand-Lie, could you please check now?

Development

Successfully merging this pull request may close these issues.

Feature: Prevent SBOM bloat with configurable file exclusion flag (--exclude)

3 participants