Skip to content

Conversation

@frjcomp
Copy link
Collaborator

@frjcomp frjcomp commented Jan 20, 2026

No description provided.

…erns

- Detect COPY/ADD patterns that include entire build context (secrets, .git, etc.)
- Support for wildcards: ., ./, *, ./* patterns
- Check for .dockerignore file presence
- Detect multistage builds (multiple FROM statements)
- Fetch container registry metadata (tag, last update)
- Clean output with url, file, content, has_dockerignore, is_multistage, registry info
- Supports --owned, --member, --repo, --namespace, --search filters
- Output message: 'Identified'
- Added pkg/github/container/ with scanner.go, types.go, patterns.go
- Added internal/cmd/github/container/container.go command interface
- Registered container command in internal/cmd/github/github.go
- Mirrors GitLab container scan functionality
- Detects dangerous COPY/ADD patterns that expose secrets
- Checks for .dockerignore and multistage builds
- Fetches container registry metadata when available
GitHub:
- Use Search.Code API to find all Dockerfile/Containerfile files
- Support scanning up to 50 Dockerfiles per repository
- Search across entire repository tree recursively

GitLab:
- Added findDockerfiles function with common search paths
- Checks root and 12+ common subdirectories (docker/, app/, services/, etc)
- Supports up to 50 Dockerfiles per project

Both:
- Log count of discovered Dockerfiles
- Scan all found Dockerfiles instead of just first one
- Improved detection for monorepos and multi-service projects
- Use git.Repositories.ListTree() for true recursive file discovery
- No hardcoded directory paths - scans entire repository tree
- Implements breadth-first search using queue for directory traversal
- Efficiently finds all Dockerfiles at any nesting level
- Successfully found 8 Dockerfiles in omnibus-mirror/distribution repo
- Maintains 50 Dockerfile limit to prevent scanning huge repos

Both GitHub and GitLab now use proper recursive discovery:
- GitHub: Search.Code API with filename matching
- GitLab: Repositories.ListTree with recursive traversal
- New flag to filter for public repositories only
- Adds 'is:public' to GitHub search query when flag is used
- Can be combined with other flags like --search, --repo
- Updated help text to mention --public option
- Tested successfully scanning public repos
- Remove all dockerignore file checks from GitLab and GitHub container scanners
- Remove HasDockerignore field from shared Finding struct
- Optimize registry metadata fetch: use ListRegistryRepositoryTags data directly instead of calling GetRegistryRepositoryTagDetail for each tag
- Add registry_last_update back to logging output
- Implement recursive tree API for GitLab Dockerfile discovery with depth limit (max 2 levels)
- Add timing instrumentation for Dockerfile discovery and registry metadata fetch phases

Performance improvements:
- Dockerfile discovery: ~122s -> 347-817ms (95%+ faster)
- Registry metadata: Skip expensive per-tag detail API calls
- Total container scan time: ~2m47s -> ~2.5s (94% faster on tested repos)

This eliminates unnecessary API calls and improves security scanning throughput significantly.
Copilot AI review requested due to automatic review settings January 20, 2026 12:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements container image scanning functionality for detecting dangerous Dockerfile/Containerfile patterns in both GitLab and GitHub repositories. The feature identifies risky build patterns like COPY . / that may expose sensitive files.

Changes:

  • Added container scanning packages for GitLab and GitHub with pattern detection logic
  • Implemented shared container scanning utilities including pattern definitions and multistage build detection
  • Added comprehensive e2e tests for GitLab container scanning functionality
  • Integrated container scan commands into GitLab and GitHub CLI interfaces

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/e2e/gitlab/container/container_test.go E2e tests for GitLab container scanning with various scenarios
pkg/gitlab/container/types.go ScanOptions struct for GitLab container scanning
pkg/gitlab/container/scanner.go Main scanner implementation for GitLab projects
pkg/gitlab/container/patterns.go Wrapper delegating to shared pattern functions
pkg/github/container/types.go ScanOptions struct for GitHub container scanning
pkg/github/container/scanner.go Main scanner implementation for GitHub repositories
pkg/github/container/patterns.go Wrapper delegating to shared pattern functions
pkg/container/types.go Shared type definitions for container scanning
pkg/container/scanner.go Shared scanning utility functions
pkg/container/patterns.go Default dangerous pattern definitions
internal/cmd/gitlab/gitlab.go Registered container scan command in GitLab CLI
internal/cmd/gitlab/container/container.go GitLab container scan command implementation
internal/cmd/github/github.go Registered container scan command in GitHub CLI
internal/cmd/github/container/container.go GitHub container scan command implementation
containers_github_com Log output file showing error handling

Comment on lines +48 to +50
if strings.Contains(r.URL.Path, "/repository/files") && strings.Contains(r.URL.Path, "Dockerfile") {
// Repository tree endpoint - for finding Dockerfiles
if strings.Contains(r.URL.Path, "/repository/tree") {
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreachable code: line 48 checks for '/repository/files' and 'Dockerfile', but line 50 checks for '/repository/tree'. Since the first condition requires '/repository/files' in the path, the nested check for '/repository/tree' can never be true.

Copilot uses AI. Check for mistakes.
// Find the most recent version
var mostRecentVersion *github.PackageVersion
for _, ver := range versions {
if ver.GetCreatedAt().Time.After(mostRecentVersion.GetCreatedAt().Time) || mostRecentVersion == nil {
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential nil dereference: The condition checks mostRecentVersion.GetCreatedAt() before verifying if mostRecentVersion is nil. The condition should be reordered to mostRecentVersion == nil || ver.GetCreatedAt().Time.After(mostRecentVersion.GetCreatedAt().Time) to avoid calling methods on a nil pointer.

Suggested change
if ver.GetCreatedAt().Time.After(mostRecentVersion.GetCreatedAt().Time) || mostRecentVersion == nil {
if mostRecentVersion == nil || ver.GetCreatedAt().Time.After(mostRecentVersion.GetCreatedAt().Time) {

Copilot uses AI. Check for mistakes.
repoName := repo.GetName()

// List container packages for the repository
packages, _, err := client.Organizations.ListPackages(ctx, owner, &github.PackageListOptions{
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code assumes the repository owner is an organization, but repositories can also be owned by individual users. For user-owned repositories, this API call will fail. Consider using a check to distinguish between user and organization repositories, or use the appropriate API method for user packages.

Copilot uses AI. Check for mistakes.
}

elapsed := time.Since(startTime)
log.Debug().Str("project", project.PathWithNamespace).Int("found", len(dockerfiles)).Dur("elapsed_ms", elapsed).Msg("Dockerfile search complete")
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name 'elapsed_ms' suggests milliseconds, but the value is a time.Duration which is in nanoseconds. Either rename the field to 'elapsed' or convert the duration to milliseconds to match the field name.

Suggested change
log.Debug().Str("project", project.PathWithNamespace).Int("found", len(dockerfiles)).Dur("elapsed_ms", elapsed).Msg("Dockerfile search complete")
log.Debug().Str("project", project.PathWithNamespace).Int("found", len(dockerfiles)).Int("elapsed_ms", int(elapsed.Milliseconds())).Msg("Dockerfile search complete")

Copilot uses AI. Check for mistakes.
Str("repo", repo.Path).
Str("tag", mostRecentTag.Name).
Str("last_update", metadata.LastUpdate).
Dur("elapsed_ms", elapsed).
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name 'elapsed_ms' suggests milliseconds, but the value is a time.Duration which is in nanoseconds. Either rename the field to 'elapsed' or convert the duration to milliseconds to match the field name.

Suggested change
Dur("elapsed_ms", elapsed).
Int64("elapsed_ms", elapsed.Milliseconds()).

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

@frjcomp frjcomp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add tests for the gh scan artipacked command as well.


// ParseCustomPatterns parses a comma-separated string of patterns into a slice of Pattern objects
// The patterns are treated as regex strings
func ParseCustomPatterns(patternsStr string) []Pattern {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove custom patterns

FileName string
MatchedPattern string
LineContent string
PatternSeverity string
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed. also remove all references of severity

return sharedcontainer.DefaultPatterns()
}

// ParseCustomPatterns parses a comma-separated string of patterns by delegating to the shared package
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed anymore

}
}

func fetchRepositories(ctx context.Context, client *github.Client, patterns []sharedcontainer.Pattern, opts ScanOptions) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not implement these flags again, the gh scan command already implemented it and gh renovate enum as well. reuse existing code

lines := strings.Split(content, "\n")

// Check against all patterns
for _, pattern := range patterns {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is shared logic. move it to the pkg/container package and use it here and in the gitlab implementations.

return sharedcontainer.IsMultistage(string(decodedContent))
}

func scanDockerfile(git *gitlab.Client, project *gitlab.Project, file *gitlab.File, fileName string, patterns []sharedcontainer.Pattern, isMultistage bool) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is shared logic mostly, extract and use it from the shared package

@@ -0,0 +1,5 @@
2026-01-20T09:35:57Z info Log level set to info (default)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this file!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants