feat(cli): explode archive materials in chainloop att add#3253

Closed
javirln wants to merge 14 commits into
chainloop-dev:mainfrom
javirln:zip-files

@javirln javirln commented Jun 30, 2026

Member

Summary

chainloop attestation add --kind <KIND> --value <archive> now unpacks .zip, .tar, .tar.gz, and .tgz archives and records each contained regular file as its own material of the given kind, instead of requiring one att add invocation per file.

Behavior

  • When --kind is provided and the value resolves to a supported archive, each contained regular file is added as an individual material of that kind.
  • Archive-native kinds (e.g. ZAP_DAST_ZIP), whose value is meant to be the archive itself, are still recorded whole.
  • Per-entry material names are derived from the entry filename (sanitized to DNS-1123, with collision suffixes); --name, when provided, is used as a name prefix.
  • Annotations passed via --annotation are applied to every extracted material.
  • The operation is atomic: either all qualifying entries are added or none.
  • Extraction is guarded against zip bombs and path traversal. --max-extract-entries (default 10000) and --max-extract-size (default 1GiB) are enforced while streaming; directory and symlink entries are skipped, and absolute or path-traversal entries are rejected.
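
As an illustration of the per-entry naming rule above, here is a minimal sketch, not the PR's actual code. The helpers `sanitizeName` and `uniqueName` are hypothetical names, and using only the base name of the entry is an assumption. It lowercases the filename, collapses characters outside the DNS-1123 label alphabet into hyphens, and appends a numeric suffix on collision.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// nonDNS matches every run of characters not allowed in a DNS-1123 label.
var nonDNS = regexp.MustCompile(`[^a-z0-9]+`)

// sanitizeName is a hypothetical helper: it takes the base name of an archive
// entry, lowercases it, and replaces each run of disallowed characters with a
// single hyphen.
func sanitizeName(entry string) string {
	s := nonDNS.ReplaceAllString(strings.ToLower(path.Base(entry)), "-")
	s = strings.Trim(s, "-")
	if len(s) > 63 { // DNS-1123 labels are at most 63 characters long
		s = strings.TrimRight(s[:63], "-")
	}
	return s
}

// uniqueName appends -1, -2, ... until the name is unused, then records it.
// The suffix is not re-checked against the 63-character limit in this sketch.
func uniqueName(base string, seen map[string]bool) string {
	name := base
	for i := 1; seen[name]; i++ {
		name = fmt.Sprintf("%s-%d", base, i)
	}
	seen[name] = true
	return name
}

func main() {
	seen := map[string]bool{}
	for _, e := range []string{"scan.json", "scan_json", "reports/Scan.JSON"} {
		// Prints scan-json, scan-json-1, scan-json-2 (one per line).
		fmt.Println(uniqueName(sanitizeName(e), seen))
	}
}
```

A `--name` prefix could be prepended to the sanitized base before the collision check; how the PR combines the two is not shown here.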

With no --kind, and for non-archive values, behavior is unchanged. There are no server-side, contract, or gRPC changes.

AI disclosure

This contribution was assisted by Claude Code.

javirln added 14 commits June 29, 2026 17:45
Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>
Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
Strengthen path validation in archive extraction to explicitly reject:
- Absolute paths (e.g., /etc/passwd)
- Relative path traversal attempts (e.g., ../ or foo/../..)

Added a direct unit test of safeArchivePath and an integration test that
feeds a tar archive with a traversal entry to verify rejection.
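
For readers unfamiliar with this class of check, the following is one lexical way to reject both cases. It is a sketch under assumptions, not the body of safeArchivePath: `isSafeEntryPath` is a hypothetical name, and entry names are assumed to use forward slashes as in tar and zip headers.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isSafeEntryPath reports whether an archive entry name is relative and stays
// inside the extraction root. Hypothetical helper for illustration only.
func isSafeEntryPath(name string) bool {
	if name == "" || path.IsAbs(name) {
		return false // absolute paths such as /etc/passwd
	}
	// path.Clean resolves "." and ".." lexically; for a relative path, any
	// ".." that would escape the root ends up at the front of the result.
	clean := path.Clean(name)
	return clean != ".." && !strings.HasPrefix(clean, "../")
}

func main() {
	for _, n := range []string{"/etc/passwd", "../x", "foo/../..", "foo..bar.json", "a/b.json"} {
		fmt.Printf("%-15s safe=%v\n", n, isSafeEntryPath(n))
	}
}
```

Note that a name merely containing two dots, such as foo..bar.json, is accepted because the check works on path segments rather than substrings.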

Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
Adds an exported AddMaterialsFromArchive method on the Crafter that walks
every entry in a zip/tar/tar.gz archive, stages each entry as an independent
material via stageMaterial, and commits the in-memory state in a single
stateManager.Write call. If any entry fails, previously staged entries are
rolled back so no partial state is ever persisted.
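
The all-or-nothing behavior can be pictured with a toy model. This sketch does not use the Crafter or stateManager types; `state`, `entry`, `addAll`, and `rejectEmpty` are hypothetical, and it swaps in a copy-then-commit approach rather than the explicit rollback the commit describes. The observable property is the same: a failing entry leaves the existing state untouched.

```go
package main

import (
	"errors"
	"fmt"
)

// entry and state are toy stand-ins for an archive entry and the in-memory
// attestation state; they are not the real Chainloop types.
type entry struct{ name, value string }

type state struct{ materials map[string]string }

// addAll stages every entry on a copy of the material map and swaps the copy
// in only when all entries pass check, so a failure changes nothing.
func (s *state) addAll(entries []entry, check func(entry) error) error {
	staged := make(map[string]string, len(s.materials)+len(entries))
	for k, v := range s.materials {
		staged[k] = v
	}
	for _, e := range entries {
		if err := check(e); err != nil {
			// The staged copy is simply dropped; s.materials is unchanged.
			return fmt.Errorf("entry %q: %w", e.name, err)
		}
		staged[e.name] = e.value
	}
	s.materials = staged // single commit point, analogous to one Write call
	return nil
}

// rejectEmpty is a placeholder validator standing in for real staging errors.
func rejectEmpty(e entry) error {
	if e.value == "" {
		return errors.New("empty entry")
	}
	return nil
}

func main() {
	s := &state{materials: map[string]string{"existing": "x"}}
	err := s.addAll([]entry{{"a", "1"}, {"b", ""}}, rejectEmpty)
	fmt.Println(err, len(s.materials))
}
```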

Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
…ry on archive expand

Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
Add MaxExtractEntries/MaxExtractSize to AttestationAddOpts and wire them
into AttestationAdd with defaults from materials.DefaultArchiveLimits().
Change Run to return ([]*AttestationStatusMaterial, error) and insert an
explode branch before the single-material switch: when --kind is set and
the value is a non-archive-native archive, delegate to
crafter.AddMaterialsFromArchive and return N results; otherwise the
single-material path continues unchanged and its result is wrapped in a
1-element slice. Add shouldExplode helper and TestShouldExplode.
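
To make the two limits concrete, here is a hedged sketch of enforcing an entry cap and a total-size cap while streaming a tar archive. The `limits` struct, `errLimit`, `countWithinLimits`, and `buildTar` are hypothetical names; the real fields are MaxExtractEntries and MaxExtractSize, and their enforcement code may differ.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// limits is a hypothetical stand-in for the archive extraction limits.
type limits struct {
	maxEntries int
	maxBytes   int64
}

var errLimit = errors.New("archive exceeds extraction limits")

// countWithinLimits streams a tar archive and fails as soon as either limit
// is crossed, counting the bytes actually read from each regular file.
func countWithinLimits(r io.Reader, lim limits) (int, error) {
	tr := tar.NewReader(r)
	entries, remaining := 0, lim.maxBytes
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return entries, nil
		}
		if err != nil {
			return entries, err
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // directories, symlinks, and other special entries are skipped
		}
		entries++
		if entries > lim.maxEntries {
			return entries, errLimit
		}
		// Allow one byte past the budget so an overrun is detectable.
		n, err := io.Copy(io.Discard, io.LimitReader(tr, remaining+1))
		if err != nil {
			return entries, err
		}
		remaining -= n
		if remaining < 0 {
			return entries, errLimit
		}
	}
}

// buildTar creates an in-memory tar with one zero-filled file per size.
func buildTar(sizes ...int) *bytes.Buffer {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for i, n := range sizes {
		hdr := &tar.Header{Name: fmt.Sprintf("f%d.json", i), Mode: 0o600, Size: int64(n), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write(make([]byte, n)); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	n, err := countWithinLimits(buildTar(10, 20), limits{maxEntries: 5, maxBytes: 29})
	fmt.Println(n, err)
}
```

For a .tar.gz input, the same function would be fed a gzip reader, so the budget applies to decompressed bytes rather than to the size on disk.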

Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
Add TestAddMaterialsFromArchiveBehavior to crafter_test.go covering five
end-to-end scenarios via AddMaterialsFromArchive: name collision suffixing
(scan-json / scan-json-1), name prefix, skipping dirs and symlinks in tar.gz,
path-traversal rejection with atomic rollback, and tar.gz happy path with two
materials. Fixtures are built programmatically with buildZip/buildTarGz helpers
using t.TempDir(); no binary fixtures committed.

Also regenerate app/cli/documentation/cli-reference.mdx to include the
--max-extract-entries and --max-extract-size flags added in earlier tasks.

Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
… tests

Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
…uards

- detectByMagic now returns (ArchiveNone, nil) when os.Open fails so
  non-file material values (STRING, CONTAINER_IMAGE) never produce a
  spurious "no such file" error through the shouldExplode path.
- safeArchivePath drops the over-broad strings.Contains(name, "..")
  early-return that wrongly rejected legitimate filenames like
  foo..bar.json; traversal detection now relies solely on path.Clean
  against a virtual root (/root/), which is the only reliable check.
- Add a Warn log when --policy-input-from-file is supplied with an
  archive value so users know evidence cross-links are skipped on the
  explode path.
- Name per-entry temp files with the allocated unique material name
  (matName) instead of filepath.Base(name) to eliminate basename
  collisions; remove each temp file immediately after staging to keep
  disk usage bounded.
- Extend TestDetectArchive with .tar and .tgz cases; add writeTar
  helper mirroring writeTarGz without the gzip layer.
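
The first bullet's behavior can be sketched as follows. This is an illustrative reconstruction modeled on the description of detectByMagic, not its actual body: `sniffArchive`, `sniffBytes`, and the `archiveKind` constants are hypothetical names. The magic values are the standard ones: PK\x03\x04 for zip, 0x1f 0x8b for gzip, and "ustar" at offset 257 for tar.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

type archiveKind int

const (
	archiveNone archiveKind = iota
	archiveZip
	archiveGzip
	archiveTar
)

// sniffBytes classifies a file by the magic numbers in its leading bytes.
func sniffBytes(head []byte) archiveKind {
	switch {
	case bytes.HasPrefix(head, []byte("PK\x03\x04")):
		return archiveZip
	case bytes.HasPrefix(head, []byte{0x1f, 0x8b}):
		return archiveGzip
	case len(head) >= 262 && string(head[257:262]) == "ustar":
		return archiveTar
	}
	return archiveNone
}

// sniffArchive treats a value that cannot be opened as "not an archive"
// instead of an error, so non-file values pass through untouched.
func sniffArchive(value string) (archiveKind, error) {
	f, err := os.Open(value)
	if err != nil {
		return archiveNone, nil // e.g. a plain string or an image reference
	}
	defer f.Close()

	head := make([]byte, 262) // long enough to cover the tar magic
	n, err := io.ReadFull(f, head)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return archiveNone, err
	}
	return sniffBytes(head[:n]), nil
}

func main() {
	kind, err := sniffArchive("ghcr.io/example/app:latest")
	fmt.Println(kind, err)
}
```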

Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: ef6d3cdb-5a23-445c-b39a-510b659023e4
Render multi-material explode output as a single JSON array so
--output json stays a single parseable document; only the table renderer
still emits output per material. Drop the unused size parameter from the archive
visit signature, document the intentional zero-length-entry skips and
the zip symlink-detection limitation, and correct the --max-extract-size
default label to 1GiB. Build the crafter archive test fixture in-process
and drop the checked-in binary blob and SDD process artifacts.
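
The reason for the single array is that several JSON objects printed back to back do not form one valid document. A minimal sketch of the idea, assuming a simplified `materialStatus` type and a hypothetical `renderJSON` helper (neither is the CLI's real type or renderer, and the kind value is only illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// materialStatus is a simplified, hypothetical stand-in for the per-material
// result returned by the add command.
type materialStatus struct {
	Name string `json:"name"`
	Kind string `json:"kind"`
}

// renderJSON encodes all results as one JSON array. A nil slice is replaced
// with an empty one so the output is [] rather than null.
func renderJSON(results []materialStatus) (string, error) {
	if results == nil {
		results = []materialStatus{}
	}
	out, err := json.Marshal(results)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := renderJSON([]materialStatus{
		{Name: "scan-json", Kind: "SARIF"},
		{Name: "scan-json-1", Kind: "SARIF"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```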

Assisted-by: Claude Code
Signed-off-by: Javier Rodriguez <javier@chainloop.dev>

Chainloop-Trace-Sessions: da72e107-14e9-4da1-add0-28004f542628, ef6d3cdb-5a23-445c-b39a-510b659023e4
@chainloop-platform

Contributor

AI Session Analysis

Missing AI Coding Sessions

We detected commits in this PR that were AI-assisted, but the matching Chainloop Trace session(s) could not be found in Chainloop.

Please make sure the AI coding session evidence has been sent by the Chainloop CLI, or add the skip-ai-session label to this PR to bypass this check.

Learn more about Chainloop Trace.


Powered by Chainloop and Chainloop Trace

javirln commented Jun 30, 2026

Member Author

Superseded by the same-repo PR (branch pushed directly to chainloop-dev/chainloop so CI runs with full access).

@javirln javirln closed this Jun 30, 2026