fix(checkpoint): redact secrets in committed and temporary metadata#333
fix(checkpoint): redact secrets in committed and temporary metadata#333andreidavid wants to merge 5 commits intoentireio:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens the Entire CLI checkpoint persistence layer by redacting secrets before writing checkpoint artifacts to both committed metadata branches and temporary shadow-branch checkpoints, reducing the risk of leaking secrets into git history.
Changes:
- Redacts committed session metadata JSON (
metadata.json) and summary updates before persisting blobs. - Redacts task and subagent transcripts during temporary task checkpoint writes, with JSONL-aware redaction and plain-text fallback.
- Redacts files copied from the metadata directory during temporary checkpoint writes and adds regression tests for the new redaction paths.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/strategy/phase_postcommit_test.go | Deduplicates repeated transcript test data via a shared constant. |
| cmd/entire/cli/strategy/manual_commit_test.go | Reuses shared transcript test data constant to reduce duplication. |
| cmd/entire/cli/checkpoint/temporary.go | Adds redaction for temporary task transcripts and metadata-dir files written into shadow-branch commits. |
| cmd/entire/cli/checkpoint/committed.go | Redacts committed session metadata JSON and summary updates before blob persistence. |
| cmd/entire/cli/checkpoint/checkpoint_test.go | Adds regression tests covering committed metadata and temporary checkpoint redaction behavior. |
| return fmt.Errorf("failed to get relative path for %s: %w", path, err) | ||
| } | ||
|
|
||
| blobHash, mode, err := createBlobFromFile(repo, path) | ||
| treePath := filepath.Join(dirPathRel, relWithinDir) | ||
| blobHash, mode, err := createRedactedBlobFromFile(repo, path, treePath) |
There was a problem hiding this comment.
addDirectoryToEntriesWithAbsPath will follow symlinks when it calls createRedactedBlobFromFile (which uses os.Stat/os.ReadFile). A malicious or accidental symlink inside the metadata dir could cause temporary checkpoints to read and persist files outside the metadata directory (e.g. ~/.ssh/*). Please skip symlinks (and ideally enforce that the walked path stays within dirPathAbs), similar to the protections used when copying committed metadata.
| if err := os.WriteFile(fullPath, []byte(`{"content":"secret=`+highEntropySecret+`"}`+"\n"), 0644); err != nil { | ||
| t.Fatalf("failed to write full.jsonl: %v", err) | ||
| } | ||
|
|
||
| notePath := filepath.Join(metadataDir, "notes.txt") | ||
| if err := os.WriteFile(notePath, []byte("secret="+highEntropySecret), 0644); err != nil { | ||
| t.Fatalf("failed to write notes.txt: %v", err) | ||
| } | ||
|
|
There was a problem hiding this comment.
Temporary checkpoint metadata copying/redaction is security-sensitive (symlinks inside the metadata dir can point outside the repo). There is already a committed-path regression test for skipping symlinks (TestCopyMetadataDir_SkipsSymlinks), but this new temporary-path test suite doesn’t cover that case. Adding a similar symlink test for WriteTemporary (or addDirectoryToEntriesWithAbsPath) would help prevent regressions once symlink handling is added.
| if err := os.WriteFile(fullPath, []byte(`{"content":"secret=`+highEntropySecret+`"}`+"\n"), 0644); err != nil { | |
| t.Fatalf("failed to write full.jsonl: %v", err) | |
| } | |
| notePath := filepath.Join(metadataDir, "notes.txt") | |
| if err := os.WriteFile(notePath, []byte("secret="+highEntropySecret), 0644); err != nil { | |
| t.Fatalf("failed to write notes.txt: %v", err) | |
| } | |
| if err := os.WriteFile(fullPath, []byte(`{"content":"secret=`+highEntropySecret+`"}`+"\n"), 0o644); err != nil { | |
| t.Fatalf("failed to write full.jsonl: %v", err) | |
| } | |
| notePath := filepath.Join(metadataDir, "notes.txt") | |
| if err := os.WriteFile(notePath, []byte("secret="+highEntropySecret), 0o644); err != nil { | |
| t.Fatalf("failed to write notes.txt: %v", err) | |
| } | |
| // Regression fixture: symlink inside metadata dir pointing outside the repo/metadata tree. | |
| // This mirrors the committed-path symlink test (TestCopyMetadataDir_SkipsSymlinks) for the | |
| // temporary-path flow, ensuring that temporary checkpoint metadata handling is exercised | |
| // in the presence of symlinks that could otherwise escape the repository. | |
| outsideTarget := filepath.Join(tempDir, "outside-metadata-target.txt") | |
| if err := os.WriteFile(outsideTarget, []byte("outside metadata"), 0o644); err != nil { | |
| t.Fatalf("failed to write outside-metadata-target.txt: %v", err) | |
| } | |
| symlinkPath := filepath.Join(metadataDir, "symlink-outside") | |
| if err := os.Symlink(outsideTarget, symlinkPath); err != nil { | |
| t.Fatalf("failed to create symlink in metadata dir: %v", err) | |
| } |
- redact committed session metadata JSON before persistence (WriteCommitted session payloads) - redact summary updates before rewriting committed metadata - redact temporary task transcripts and subagent transcripts before checkpoint write - redact metadata directory files when writing temporary checkpoints - add regression tests for committed summary and temporary metadata redaction paths
07a772d to
f09af6a
Compare
|
Implemented all Copilot review feedback and pushed commit f09af6a. |
| metadataJSON, err := jsonutil.MarshalIndentWithNewline(sessionMetadata, "", " ") | ||
| if err != nil { | ||
| return filePaths, fmt.Errorf("failed to marshal session metadata: %w", err) | ||
| } | ||
| metadataJSON = redact.Bytes(metadataJSON) | ||
| metadataHash, err := CreateBlobFromContent(s.repo, metadataJSON) | ||
| if err != nil { |
There was a problem hiding this comment.
Redacting the entire marshaled metadata.json with redact.Bytes can unintentionally redact high-entropy identifier fields (e.g., session_id, tool_use_id, checkpoint_uuid) because redact.Bytes is context-free and doesn’t apply the JSONL “skip *id fields” logic used elsewhere. That can make persisted metadata inconsistent or harder to correlate back to sessions/tasks. Consider redacting only the free-form fields (e.g., Summary and other user/LLM-generated text) before marshaling, or introduce a JSON-aware redaction helper that traverses the object and skips keys ending in id/ids (similar to redact.JSONLContent).
| metadataJSON, err := jsonutil.MarshalIndentWithNewline(existingMetadata, "", " ") | ||
| if err != nil { | ||
| return fmt.Errorf("failed to marshal metadata: %w", err) | ||
| } | ||
| metadataJSON = redact.Bytes(metadataJSON) | ||
| metadataHash, err := CreateBlobFromContent(s.repo, metadataJSON) |
There was a problem hiding this comment.
Same concern as the initial write path: applying redact.Bytes to the full marshaled session metadata.json during UpdateSummary may over-redact identifier fields (not just the updated summary) because it doesn’t skip *id keys. Prefer redacting only the new summary value (or using a JSON-aware redaction that skips IDs) so the update doesn’t mutate unrelated metadata fields.
|
Implemented all Copilot review feedback and pushed commit 4b8748f. |
| if err != nil { | ||
| return fmt.Errorf("failed to get relative path for %s: %w", path, err) | ||
| } | ||
| if strings.HasPrefix(relWithinDir, "..") { |
There was a problem hiding this comment.
The path traversal check strings.HasPrefix(relWithinDir, "..") will also reject legitimate files within the directory whose relative path begins with .. (e.g. a file named ..notes.txt). This can cause temporary checkpoint writes to fail even when there is no traversal.
Prefer a stricter check like rel == ".." or strings.HasPrefix(rel, ".."+string(filepath.Separator)) (and/or using filepath.Clean) to only reject actual parent-directory escapes.
| if strings.HasPrefix(relWithinDir, "..") { | |
| relClean := filepath.Clean(relWithinDir) | |
| if relClean == ".." || strings.HasPrefix(relClean, ".."+string(filepath.Separator)) { |
redact/redact.go
Outdated
| repls := collectJSONLReplacements(parsed) | ||
| if len(repls) == 0 { | ||
| return content, nil |
There was a problem hiding this comment.
JSONContent now reuses collectJSONLReplacements / shouldSkipJSONLField, but those names are JSONL-specific and become misleading when the same logic is used for non-JSONL content. Consider renaming these helpers (or adding JSON-generic wrappers) to reflect their broader use and reduce future confusion.
redact/redact.go
Outdated
| // JSONContent parses a JSON string and redacts string values while skipping | ||
| // identifier fields (keys ending in "id") and non-user payload fields. | ||
| func JSONContent(content string) (string, error) { | ||
| var parsed any | ||
| if err := json.Unmarshal([]byte(content), &parsed); err != nil { | ||
| return "", err | ||
| } | ||
|
|
||
| repls := collectJSONLReplacements(parsed) | ||
| if len(repls) == 0 { | ||
| return content, nil | ||
| } | ||
|
|
||
| result := content | ||
| for _, r := range repls { | ||
| origJSON, err := jsonEncodeString(r[0]) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| replJSON, err := jsonEncodeString(r[1]) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| result = strings.ReplaceAll(result, origJSON, replJSON) | ||
| } | ||
| return result, nil |
There was a problem hiding this comment.
JSONContent/JSONBytes claim to preserve identifier fields (keys ending in id), but the current implementation collects (orig, redacted) string pairs and then uses strings.ReplaceAll on the entire JSON text. If the same string value appears in both a skipped ID field and a non-skipped field, the replacement will redact the ID field too, breaking the “preserve ID fields” guarantee.
Consider redacting structurally instead (walk the parsed JSON and replace string values in-place while honoring the skip rules, then re-marshal), or apply replacements only to the specific occurrences/offsets that correspond to non-skipped fields rather than global string replacement.
|
Implemented additional Copilot follow-ups for #333 in commit 9633328:
|
| @@ -355,6 +355,10 @@ func (s *GitStore) writeSessionToSubdirectory(opts WriteCommittedOptions, sessio | |||
| if err != nil { | |||
| return filePaths, fmt.Errorf("failed to marshal session metadata: %w", err) | |||
| } | |||
| metadataJSON, err = redact.JSONBytes(metadataJSON) | |||
| if err != nil { | |||
| return filePaths, fmt.Errorf("failed to redact metadata json: %w", err) | |||
| } | |||
There was a problem hiding this comment.
Redacting the entire marshaled CommittedMetadata JSON can unintentionally modify non-secret identifier/path fields that are required for restore and transcript scoping (e.g. transcript_identifier_at_start, transcript_path). redact.JSONBytes currently only skips keys ending in id/ids, so these fields are still subject to high-entropy redaction. Consider redacting only the user/AI content fields (e.g. Summary) before marshaling, or expand the JSON skip rules to explicitly preserve these metadata fields (and add a regression test for them).
| metadataJSON, err = redact.JSONBytes(metadataJSON) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to redact metadata json: %w", err) | ||
| } |
There was a problem hiding this comment.
UpdateSummary re-marshals and re-redacts the whole session metadata JSON. This can retroactively change unrelated metadata fields (including identifier/path fields used by restore) whenever a summary is updated. Prefer redacting only the summary payload (or explicitly skipping required metadata keys) so UpdateSummary cannot clobber other metadata values.
| metadataJSON, err = redact.JSONBytes(metadataJSON) | |
| if err != nil { | |
| return fmt.Errorf("failed to redact metadata json: %w", err) | |
| } |
redact/redact.go
Outdated
| // JSONContent parses a JSON string and redacts string values while skipping | ||
| // identifier fields (keys ending in "id") and non-user payload fields. | ||
| func JSONContent(content string) (string, error) { | ||
| var parsed any | ||
| if err := json.Unmarshal([]byte(content), &parsed); err != nil { | ||
| return "", err | ||
| } | ||
|
|
||
| redacted := redactJSONValue(parsed) | ||
| if reflect.DeepEqual(parsed, redacted) { | ||
| return content, nil | ||
| } | ||
|
|
||
| redactedJSON, err := json.Marshal(redacted) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| return string(redactedJSON), nil |
There was a problem hiding this comment.
JSONContent re-marshals redacted JSON with json.Marshal, which strips indentation and the trailing newline from inputs that were originally formatted (e.g. via jsonutil.MarshalIndentWithNewline). That means files become compact only when redaction occurs, making metadata output inconsistent. Consider marshaling with an Encoder configured with the same indent and newline behavior to keep formatting stable.
| relWithinDir = filepath.Clean(relWithinDir) | ||
| if relWithinDir == ".." || strings.HasPrefix(relWithinDir, ".."+string(filepath.Separator)) { | ||
| return fmt.Errorf("path traversal detected: %s", relWithinDir) | ||
| } | ||
|
|
||
| blobHash, mode, err := createBlobFromFile(repo, path) | ||
| treePath := filepath.Join(dirPathRel, relWithinDir) | ||
| blobHash, mode, err := createRedactedBlobFromFile(repo, path, treePath) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to create blob for %s: %w", path, err) | ||
| } | ||
|
|
||
| treePath := filepath.Join(dirPathRel, relWithinDir) | ||
| entries[treePath] = object.TreeEntry{ | ||
| Name: treePath, | ||
| Mode: mode, |
There was a problem hiding this comment.
treePath := filepath.Join(dirPathRel, relWithinDir) can introduce OS-specific path separators (e.g. \ on Windows). The checkpoint tree builder later splits paths on /, so this will produce incorrect tree structure / missing files on Windows. Use Git-style / paths here (e.g. path.Join(...) or filepath.ToSlash(...) for both parts) before inserting into entries.
|
Implemented additional Copilot feedback in commit
Validation:
|
|
|
||
| blobHash, mode, err := createBlobFromFile(repo, path) | ||
| treePath := filepath.ToSlash(filepath.Join(dirPathRel, relWithinDir)) | ||
| treePath = strings.ReplaceAll(treePath, `\`, "/") |
There was a problem hiding this comment.
The ReplaceAll call appears redundant since filepath.ToSlash already converts all backslashes to forward slashes on the previous line. Consider removing this redundant operation.
| treePath = strings.ReplaceAll(treePath, `\`, "/") |
…l tests
Align committed metadata handling with the safer temporary-path behavior and
add regression coverage for the remaining edge cases from review.
Behavior fixes:
- writeFinalTaskCheckpoint: when JSONL redaction fails for subagent transcript,
now logs a warning and falls back to plain text redaction (redact.Bytes)
instead of silently dropping content.
- copyMetadataDir: normalize rel path with filepath.Clean and use strict
traversal detection (".." or "../...") to avoid false positives like
"..notes.txt" while still blocking escapes.
- redact.marshalJSONPreservingFormatting: correctly treat compact JSON with an
optional trailing newline by checking single-line content after trimming one
trailing newline.
Test coverage improvements:
- Add committed-path regression test for malformed JSONL subagent transcript
fallback redaction.
- Add committed-path regression test proving copyMetadataDir accepts leading-dot
filenames (e.g. "..notes.txt").
- Add strong guardrail test for redactSummary field coverage:
field-count tripwires + scalar autofill + key-set checks + non-string
invariants via normalized deep-equality.
- Add compact-JSON-with-trailing-newline JSONBytes test.
- Add t.Parallel() to safe unit tests in checkpoint/redact test files.
Validation:
- go test ./redact ./cmd/entire/cli/checkpoint
Entire-Checkpoint: c172b811ee56
|
Implemented additional review follow-ups in commit What changed:
New/updated tests:
Validation:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
cmd/entire/cli/checkpoint/committed.go:1181
copyMetadataDirbuilds git tree entry paths usingfilepath.Rel/filepath.Clean, which will yield OS-specific separators on Windows (e.g.\). Those paths are later consumed byBuildTreeFromEntries(splits on/) and by git tree lookups (expects/), so metadata files can end up under a single filename containing backslashes rather than the intended directory structure. NormalizerelPath(and thusfullPath) to forward slashes (e.g. viafilepath.ToSlash) before adding toentriesand before passing intocreateRedactedBlobFromFile.
relPath, err := filepath.Rel(metadataDir, path)
if err != nil {
return fmt.Errorf("failed to get relative path for %s: %w", path, err)
}
relPath = filepath.Clean(relPath)
// Prevent path traversal via symlinks pointing outside the metadata dir
if relPath == ".." || strings.HasPrefix(relPath, ".."+string(filepath.Separator)) {
return fmt.Errorf("path traversal detected: %s", relPath)
}
// Create blob from file with secrets redaction
blobHash, mode, err := createRedactedBlobFromFile(s.repo, path, relPath)
if err != nil {
return fmt.Errorf("failed to create blob for %s: %w", path, err)
}
// Store at checkpoint path
fullPath := basePath + relPath
entries[fullPath] = object.TreeEntry{
Name: fullPath,
|
Following up on the Copilot note about
We intentionally keep both lines: treePath := filepath.ToSlash(filepath.Join(dirPathRel, relWithinDir))
treePath = strings.ReplaceAll(treePath, `\\`, "/")
This behavior is covered by |
Summary
Hardened checkpoint metadata persistence by redacting secrets before storing checkpoint artifacts.
Why
Prevents accidental secret leakage in checkpoint artifacts persisted on metadata shadow/commit branches.
Changes
metadata.json.Validation
mise run lintgo test ./cmd/entire/cli/checkpointgo test ./cmd/entire/cli/strategy -run 'TestPostCommit|TestShadowStrategy_CondenseSession_EphemeralBranchTrailer'Notes