feat: Attestation Path Validation for Unexpected Files#18
feat: Attestation Path Validation for Unexpected Files#18corepacket wants to merge 1 commit intoSBOMit:masterfrom
Conversation
|
Hey, @corepacket spotted a bug in filter.go. filepath.Match treats * as matching anything except the path separator /, so patterns like .log and .git/ won't match against full absolute paths from attestations: filepath.Match(".log", "/home/user/build/output.log") // false All the patterns advertised in the PR description would silently fail. You'd need to either match against filepath.Base(path) for filename patterns, or check path segments for directory patterns. Or just bring in doublestar now rather than as a follow-up since the standard library can't handle this cleanly. Also why is resolver.go in this PR? Looking at it the only change appears to be a comment edit on the found_by JSON tag which has nothing to do with file exclusion. And the question from #17 still stands. If .git and .log files are showing up in SBOM output, is that because the attestation itself recorded them at build time? If so this flag treats the symptom rather than the cause. Would be good to confirm this is happening with a real attestation before the approach gets locked into the pipeline. |
|
Thanks for the detailed feedback, @Elvand-Lie — this is really helpful. You’re absolutely right about the architecture. Since SBOMit consumes in-toto attestations rather than scanning the filesystem, any entries like At the moment, this is more of a precautionary feature rather than based on a confirmed real-world attestation case. The intention behind the |
|
Before I proceed with updating the PR, I wanted to confirm the preferred direction. Instead of filtering entries via I can also remove the unrelated changes and fix the matching logic as discussed. Just wanted to align on the approach before making the changes — happy to proceed based on your suggestion. |
|
@corepacket The validation approach sounds more aligned with what SBOMit is actually doing but surfacing unexpected entries rather than hiding them makes more sense for an attestation-based tool. But that's a bigger design question that the maintainers should weigh in on before you rework the PR. Worth waiting for their input before proceeding. I would recommend to just wait for maintainers signal rather than doing PRs where we're uncertain on what to do since there's no issue yet by the maintainers, but since you've done the work already I think this approach would be a better last nail for an improvement PR. |
7a863f9 to
90f823f
Compare
|
Hey @corepacket, it looks like you accidentally committed your local test output files (out2.txt, out3.txt, test-baseline.json, etc.). The diff is currently sitting at +53,900 lines. You will need to drop these generated files from the branch so the actual code changes can be reviewed. |
As requested during code review: - The '--exclude' flag is entirely removed from the CLI interface. - Pattern matching now utilizes 'doublestar' correctly across path boundaries. - Instead of silently dropping files, anomalous files (e.g. '.git/', '*.log') generate a warning to stderr while preserving output integrity. Signed-off-by: corepacket <wbn453177@gmail.com>
90f823f to
747747f
Compare
|
hello @Elvand-Lie can please check now |
Feature: Attestation Path Validation for Unexpected Files
Fixes #17
Problem
When generating SBOMs using
sbomit generate, attestations often contain noisy or unexpected artifacts such as:.git/directories and hooks.logfilesWhile it is tempting to exclude these files, removing them masks potential security anomalies and violates the absolute integrity of the attestation data. We must process the attestations exactly as they were recorded, but users still need visibility into anomalous files present in the pipeline.
Solution
This PR introduces an active validation and warning mechanism rather than a destructive drop-filter.
It securely identifies unexpected file patterns during the extraction phase and alerts the user by printing a warning directly to
stderr, while ensuring the files remain 100% intact in the final generated SBOM output.Implementation Details
Validation Logic:
pkg/attestation/filter.go.**/*.log,**/.git/**,.git/**.Pattern Matching Mechanism:
filepath.Matchwithgithub.com/bmatcuk/doublestar/v4.doublestarhandles nested directory boundaries seamlessly (e.g.**/.git/**reliably catches deep nested hooks across full absolute file strings).Architectural Integration:
pkg/attestation/extractor.go.fmt.Fprintf(os.Stderr)warning constraint.Design Considerations
doublestarspecifically because the standard library'sfilepath.Matchsilent-fails across path boundaries (/).Commands & Verification Plan
Reviewers can verify the changes using the following commands locally:
1. Build and Run Unit Tests
Confirm the extraction modifications pass the test suite:
go mod tidy go test ./...(Tests should return
okacross all packages)2. End-to-End Validation Run
Run the generate command passing in the mock attestation file:
3. Expected Result Check
When running the command above, you can verify the dual-action success:
.githook warnings tostderr:test-baseline.json. You will find that the.gitfiles still exist within the SBOM relationships, proving zero data was dropped.Benefits
Files Modified
cmd/generate.go→ Removed previous CLI flags and struct hooks.pkg/attestation/extractor.go→ InjectedValidatePath()into extraction loop.pkg/attestation/filter.go→ Refactored internal file evaluation to usedoublestar.pkg/attestation/parser.go→ Removed string slice argument wrapper.pkg/generator/generator.go→ Cleaned initialization options.