Summary
For Docker image scans, sbomify-action's generator registry today prefers CdxgenImageGenerator (priority 20) over SyftImageGenerator (priority 35) — sbomify_action/_generation/generator.py:74-88, registry.py:97-98 (lower priority number wins). This is the default path for any image scan that doesn't match the Chainguard or Docker Hub auto-detection branches.
The result: cdxgen -t oci <image> produces strictly worse SBOMs than Syft for interpreted-language ecosystems, because cdxgen's OCI walker emits pkg:generic/<filename> for venv bin shims and shared libraries instead of resolving Python .dist-info, Node node_modules, gem indexes, etc.
Concrete evidence — same image, two scanners
Test image: a FastAPI app built on cgr.dev/chainguard/python:latest with 17 transitive PyPI dependencies (fastapi, httpx, uvicorn, pydantic, starlette, …).
| Scanner | Total components | pkg:pypi/* | pkg:apk/* | pkg:generic/* |
|---|---|---|---|---|
| Lockfile (cdxgen-fs, pyproject.toml/uv.lock) | 17 | 17 | 0 | 0 |
| Image: cdxgen -t oci (current default) | 78 | 2 | 0 | 76 |
| Image: Syft (priority 35, currently fallback) | 1809* | 17 | 25 | 0 |
*The 1809 from Syft includes 42 type=library packages + 1766 type=file filesystem entries (separately reported, not conflated with packages) + 1 OS component.
cdxgen-image emits fastapi, httpx, uvicorn as type=file + pkg:generic/fastapi (resolved from /opt/venv/bin/fastapi entry-point scripts), and bundles 73 shared-library entries (libcrypto.so.3, libc.so.6, …) as pkg:generic/*. The actual PyPI package metadata in the venv .dist-info directories is not surfaced.
Syft walks the venv .dist-info directories correctly and produces the same 17-component PyPI closure as the lockfile mode, plus the Chainguard base image's 25 apk packages — without the false-positive pkg:generic/* noise.
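The per-purl-type columns in the table above can be recomputed from any CycloneDX JSON output with a few lines of Python. This is a minimal sketch: the function name `purl_type_counts` and the three-component sample BOM are illustrative assumptions, not part of sbomify-action, and the versions in the sample purls are placeholders.

```python
from collections import Counter


def purl_type_counts(bom: dict) -> Counter:
    """Count CycloneDX components by purl type (pypi, apk, generic, ...)."""
    counts = Counter()
    for component in bom.get("components", []):
        purl = component.get("purl") or ""
        if purl.startswith("pkg:"):
            # "pkg:pypi/fastapi@0.0.0" -> "pypi"
            purl_type = purl[len("pkg:"):].split("/", 1)[0]
        else:
            # Syft's type=file entries, for example, may carry no purl at all.
            purl_type = "(no purl)"
        counts[purl_type] += 1
    return counts


# Hypothetical sample data, shaped like a CycloneDX "components" array.
sample_bom = {
    "components": [
        {"type": "library", "name": "fastapi", "purl": "pkg:pypi/fastapi@0.0.0"},
        {"type": "file", "name": "libc.so.6", "purl": "pkg:generic/libc.so.6"},
        {"type": "file", "name": "/etc/os-release"},
    ]
}

print(dict(purl_type_counts(sample_bom)))
# {'pypi': 1, 'generic': 1, '(no purl)': 1}
```

To run it against a real scan, load the file with `json.load(open(path))` and pass the resulting dict to the function.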
Why this isn't Python-specific
The pattern affects every interpreted ecosystem where dependencies live in app-readable form (venv, node_modules, gem caches, vendored directories). Cross-referencing the existing test fixtures (tests/test-data/cgr.dev_chainguard_wolfi-base_latest_cdxgen.cdx.json), all 27 components are pkg:generic/* with cdx:bom:componentTypes: "generic" — confirming cdxgen's OCI walker only produces real purls for OS packages (apk/deb/rpm).
| Ecosystem in image | cdxgen -t oci | Syft |
|---|---|---|
| OS packages (apk/deb/rpm) | Good purls | Good purls |
| Python (venv .dist-info) | Generic shims | Proper pypi purls |
| Node.js (node_modules) | Generic shims | Proper npm purls |
| Ruby (gem index) | Likely generic | Proper gem purls |
| Java (JARs) | JAR manifest parsing | Worth testing both |
| Go / Rust (static binaries) | n/a (no in-image data) | n/a (no in-image data) |
For Java specifically, cdxgen has explicit JAR-manifest parsing and may genuinely beat Syft on JAR-heavy images — that case warrants benchmarking before any wholesale removal of CdxgenImageGenerator.
Proposed fix
Three options, in increasing aggressiveness:
Option A — Swap priorities
CdxgenImageGenerator.priority = 40 (or higher), SyftImageGenerator.priority = 20. Syft becomes the default; cdxgen remains a fallback if Syft is unavailable. This is a one-line change in each of _generation/generators/syft.py and _generation/generators/cdxgen.py.
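The effect of the swap can be illustrated with a toy model of "lowest priority number wins" selection. This is only a sketch under stated assumptions: the `Generator` dataclass, the `available` field, and `select_generator` are hypothetical stand-ins, not the actual sbomify-action registry API. Only the priority numbers come from this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generator:
    """Hypothetical stand-in for a registered image generator."""
    name: str
    priority: int
    available: bool = True  # assumption: models "tool is installed and usable"


def select_generator(generators):
    """Return the available generator with the lowest priority number."""
    candidates = [g for g in generators if g.available]
    if not candidates:
        raise LookupError("no image generator available")
    return min(candidates, key=lambda g: g.priority)


# Priorities as they stand today, and as proposed in Option A.
current = [Generator("cdxgen-image", 20), Generator("syft-image", 35)]
proposed = [Generator("cdxgen-image", 40), Generator("syft-image", 20)]

print(select_generator(current).name)   # cdxgen-image
print(select_generator(proposed).name)  # syft-image

# cdxgen still acts as the fallback when Syft is unavailable.
without_syft = [
    Generator("cdxgen-image", 40),
    Generator("syft-image", 20, available=False),
]
print(select_generator(without_syft).name)  # cdxgen-image
```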
Option B — Per-ecosystem priority
If JAR images deserve cdxgen, gate via supports(): cdxgen-image returns True only when the image looks JVM-flavored (detect via docker inspect env vars, or via a --prefer-cdxgen-jvm env flag). Syft becomes the default for everything else.
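One possible shape for that gate is sketched below. It assumes the caller has already read the image's environment as a list of `KEY=value` strings (the form Docker reports under `Config.Env` in `docker inspect` output). The hint list `JVM_ENV_HINTS`, the function names, and the `prefer_cdxgen_jvm` parameter are illustrative assumptions. Which variables reliably indicate a JVM image would need checking against real base images.

```python
# Assumption: env var names that commonly appear in JVM base images.
JVM_ENV_HINTS = ("JAVA_HOME", "JAVA_VERSION", "JAVA_TOOL_OPTIONS")


def looks_jvm_flavored(image_env):
    """Return True if any JVM hint variable is set in the image environment.

    image_env is a list of "KEY=value" strings, as in Config.Env.
    """
    keys = {entry.split("=", 1)[0] for entry in image_env}
    return any(hint in keys for hint in JVM_ENV_HINTS)


def cdxgen_image_supports(image_env, prefer_cdxgen_jvm=False):
    """Hypothetical supports() gate for the cdxgen image generator.

    cdxgen handles the image when the user opts in explicitly or when
    the image looks JVM-flavored. Everything else falls through to Syft.
    """
    return prefer_cdxgen_jvm or looks_jvm_flavored(image_env)


python_image_env = ["PATH=/opt/venv/bin:/usr/bin", "PYTHONUNBUFFERED=1"]
jvm_image_env = ["PATH=/usr/bin", "JAVA_HOME=/usr/lib/jvm/default-jvm"]

print(cdxgen_image_supports(python_image_env))  # False
print(cdxgen_image_supports(jvm_image_env))     # True
print(cdxgen_image_supports(python_image_env, prefer_cdxgen_jvm=True))  # True
```

Environment variables are a weak signal: an image can ship JARs without setting `JAVA_HOME`, so an explicit opt-in alongside the heuristic keeps the behavior predictable.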
Option C — Remove CdxgenImageGenerator entirely
If JAR images turn out not to be a real cdxgen advantage in practice (benchmark TBD), retire the image-mode cdxgen path. Lockfile-mode cdxgen (CdxgenFsGenerator) is unaffected and still the default for filesystem/lockfile scans — that path remains valuable for multi-ecosystem lockfile resolution.
Recommended starting point: Option A (priority swap). It is reversible, touches one line in each generator file, and gives an immediately measurable improvement for the vast majority of users.
Adjacent context
PR #225 already moves toward Syft as the image scanner inside its docker-hub-upstream + syft-overlay merge path (cli/main.py:1258-1267). This issue generalizes that direction to the default image-scan path — every image scan benefits from Syft, not just library/* and dhi.io/* ones.
A complementary issue (filed separately) proposes that the Chainguard auto-detection branch be extended to merge with Syft as well. Today that branch bypasses Syft entirely, dropping app content added with COPY. That's a shape change in cli/main.py:1153-1198. This issue is the improvement to the non-detected path.
Reproducer
git clone https://github.com/nissessenap/sbom-generation-example
cd sbom-generation-example
make image-python # build the Chainguard-based Python image
make sbomify-python # lockfile mode → 17 pypi components
make sbomify-image-python # image mode → 2 pypi + 76 generic
# Then run syft directly for comparison:
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
--entrypoint syft sbomifyhub/sbomify-action:26.2.0 \
sbom-generation-example/python:dev -o cyclonedx-json \
| jq '[.components[] | select(.purl|startswith("pkg:pypi"))] | length'
# → 17