refactor: simplify artifact scanning and cleanup flow#85
Conversation
📝 WalkthroughWalkthroughThe PR replaces Rust-only scanning with ecosystem-filtered Rust, Node, and Java discovery, refactors analysis into an ChangesEcosystem scan, analysis, and cleanup
Sequence Diagram(s)sequenceDiagram
participant CommandsScan as commands/scan.rs
participant ApiScan as api::scan
participant ScannerScan as scanner::scan
participant WalkDirs as walk_dirs
participant Detectors as DETECTORS
CommandsScan->>ApiScan: scan(root, ecosystems)
ApiScan->>ScannerScan: scan(root, ecosystems)
ScannerScan->>WalkDirs: collect directories under root
ScannerScan->>Detectors: select detectors for ecosystems
Detectors-->>ScannerScan: detector list
ScannerScan-->>ApiScan: ScanResult
ApiScan-->>CommandsScan: ScanResult
sequenceDiagram
participant CommandsClean as commands/clean.rs
participant ApiClean as api::clean
participant Analyzer as Analyzer
participant CreateCleanupPlan as cleaner::create_cleanup_plan
participant ExecuteCleanup as cleaner::execute_cleanup
CommandsClean->>ApiClean: build_plan(scan)
ApiClean->>Analyzer: analyze(scan)
Analyzer-->>ApiClean: AnalysisResult
ApiClean->>CreateCleanupPlan: create_cleanup_plan(analysis)
CreateCleanupPlan-->>ApiClean: CleanupPlan
ApiClean-->>CommandsClean: CleanupPlan
CommandsClean->>ExecuteCleanup: execute_cleanup(plan)
ExecuteCleanup-->>CommandsClean: CleanupResult
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90+ minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/dustfril-cli/src/commands/scan.rs`:
- Around line 31-34: The scan command output currently prints only each
artifact’s path, so the new ecosystem information is lost. Update the
user-facing printing in the scan flow in the `scan` command so it includes the
artifact ecosystem alongside the path, using the existing `result.artifacts`
iteration and `artifact` fields. Make the output more readable by formatting
both values together instead of only using `artifact.path`.
In `@crates/dustfril-core/src/analyzer/analyze.rs`:
- Around line 15-24: The artifact ordering in analyze::analyze now comes from
Rayon collection order instead of the previous largest-first behavior. Update
the analyze_artifact pipeline in AnalysisResult construction to sort the
collected artifacts by descending size_bytes before returning, so the CLI output
from analysis_result.artifacts still surfaces the largest cleanup targets first.
- Around line 78-87: The directory traversal in calculate_directory_size and
find_latest_modified is following symlinks via entry.metadata(), which can
recurse into linked directories and count content outside the artifact. Update
the entry handling to use symlink_metadata() in the analyze.rs traversal logic,
detect symlink file types before any recursion, and skip them entirely so only
real files and directories are processed.
In `@crates/dustfril-core/src/analyzer/tests.rs`:
- Around line 47-52: The recommendation assertion in the analyzer test is too
broad and accepts every CleanupRecommendation variant, so it does not verify
behavior. Update the test fixture and/or assertions in analyzer/tests.rs so the
analyzed artifact produced by the relevant test case yields one specific,
deterministic recommendation, and then assert that exact variant using
analysis.artifacts[0].recommendation.
In `@crates/dustfril-core/src/fs/walk.rs`:
- Around line 7-10: The WalkDir iterator in walk_directories is swallowing
traversal errors by converting every Result into an optional entry, which can
hide missing roots or unreadable subtrees. Update the walk logic to propagate
WalkDir failures instead of using a drop-on-error filter, and keep the existing
directory filtering in walk_directories so callers can distinguish an actual
empty scan from an error.
In `@crates/dustfril-core/src/models/cleanup.rs`:
- Around line 49-54: The CleanupCandidate plan is missing the planned filesystem
kind, which causes execute_cleanup to re-detect the live entry instead of
deleting the analyzed artifact. Update CleanupCandidate in cleanup.rs to retain
the expected kind (or equivalent stronger identity data) alongside path,
ecosystem, size_bytes, and age_days, then thread that field through the cleanup
planning and execution flow so execute_cleanup can compare the planned kind
against the current entry and reject mismatches before deleting.
- Around line 29-34: The CleanupRecommendation enum changed the serialized wire
value for the review state, so update its serde behavior to preserve
compatibility with the old "Review" value. In CleanupRecommendation, add a serde
alias or explicit rename handling on the NeedsReview variant so both the new
name and the legacy value are accepted during deserialization while keeping the
intended serialization format stable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: cd10da77-810d-4592-980d-4f531eb9e079
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (54)
apps/dustfril-cli/src/cli.rsapps/dustfril-cli/src/commands/analyze.rsapps/dustfril-cli/src/commands/clean.rsapps/dustfril-cli/src/commands/scan.rscrates/dustfril-core/Cargo.tomlcrates/dustfril-core/src/analyzer/age.rscrates/dustfril-core/src/analyzer/analyze.rscrates/dustfril-core/src/analyzer/metadata.rscrates/dustfril-core/src/analyzer/mod.rscrates/dustfril-core/src/analyzer/recommendation.rscrates/dustfril-core/src/analyzer/size.rscrates/dustfril-core/src/analyzer/tests.rscrates/dustfril-core/src/api/analyze.rscrates/dustfril-core/src/api/clean.rscrates/dustfril-core/src/api/scan.rscrates/dustfril-core/src/cleaner/executor.rscrates/dustfril-core/src/cleaner/plan.rscrates/dustfril-core/src/cleaner/tests.rscrates/dustfril-core/src/detector/java/build.rscrates/dustfril-core/src/detector/java/gradle.rscrates/dustfril-core/src/detector/java/maven.rscrates/dustfril-core/src/detector/java/mod.rscrates/dustfril-core/src/detector/mod.rscrates/dustfril-core/src/detector/node/mod.rscrates/dustfril-core/src/detector/node/node_modules.rscrates/dustfril-core/src/detector/node/package_json.rscrates/dustfril-core/src/detector/project.rscrates/dustfril-core/src/detector/rust/git.rscrates/dustfril-core/src/detector/rust/mod.rscrates/dustfril-core/src/detector/rust/registry.rscrates/dustfril-core/src/detector/rust/target.rscrates/dustfril-core/src/detector/scan.rscrates/dustfril-core/src/detector/tests.rscrates/dustfril-core/src/error.rscrates/dustfril-core/src/fs/mod.rscrates/dustfril-core/src/fs/walk.rscrates/dustfril-core/src/lib.rscrates/dustfril-core/src/models/analysis.rscrates/dustfril-core/src/models/analysis_result.rscrates/dustfril-core/src/models/artifact.rscrates/dustfril-core/src/models/artifact_type.rscrates/dustfril-core/src/models/cleanup.rscrates/dustfril-core/src/models/cleanup_candidate.rscrates/dustfril-core/src/models/cleanup_plan.rscrates/dustfril-core/src/models/cleanup_recommendation.rscrates/dustfril-core/src/models/cleanup_result.rscrates/dustfril-core/src/models/ecosystem.rscrates/dustfril-core/src/models/mod.rscrates/dustfril-core/src/models/scan.rscrates/dustfril-core/src/models/scan_result.rscrates/dustfril-core/src/scanner/detector.rscrates/dustfril-core/src/scanner/mod.rscrates/dustfril-core/src/scanner/scan.rscrates/dustfril-core/src/scanner/tests.rs
💤 Files with no reviewable changes (21)
- crates/dustfril-core/src/detector/project.rs
- crates/dustfril-core/src/detector/mod.rs
- crates/dustfril-core/src/detector/tests.rs
- crates/dustfril-core/src/detector/rust/git.rs
- crates/dustfril-core/src/analyzer/recommendation.rs
- crates/dustfril-core/src/detector/rust/target.rs
- crates/dustfril-core/src/analyzer/size.rs
- crates/dustfril-core/src/analyzer/metadata.rs
- crates/dustfril-core/src/detector/rust/registry.rs
- crates/dustfril-core/src/models/cleanup_recommendation.rs
- crates/dustfril-core/src/models/artifact.rs
- crates/dustfril-core/src/models/artifact_type.rs
- crates/dustfril-core/src/detector/rust/mod.rs
- crates/dustfril-core/src/models/ecosystem.rs
- crates/dustfril-core/src/models/cleanup_candidate.rs
- crates/dustfril-core/src/models/scan_result.rs
- crates/dustfril-core/src/models/analysis_result.rs
- crates/dustfril-core/src/models/cleanup_result.rs
- crates/dustfril-core/src/models/cleanup_plan.rs
- crates/dustfril-core/src/detector/scan.rs
- crates/dustfril-core/src/analyzer/age.rs
| println!("Found {} artifact(s)\n", result.artifacts.len()); | ||
|
|
||
| for artifact in result.artifacts { | ||
| println!("[{}]", artifact.artifact_type); | ||
|
|
||
| println!(" {}\n", artifact.path.display()); | ||
| println!(" {:?}\n", artifact.path); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Show the ecosystem in scan output.
Lines 31-34 print only artifact.path via {:?}, which drops the new ecosystem information and makes the user-facing output harder to read.
Suggested fix
- for artifact in result.artifacts {
- println!(" {:?}\n", artifact.path);
+ for artifact in result.artifacts {
+ println!(" [{}] {}", artifact.ecosystem, artifact.path.display());
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| println!("Found {} artifact(s)\n", result.artifacts.len()); | |
| for artifact in result.artifacts { | |
| println!("[{}]", artifact.artifact_type); | |
| println!(" {}\n", artifact.path.display()); | |
| println!(" {:?}\n", artifact.path); | |
| println!("Found {} artifact(s)\n", result.artifacts.len()); | |
| for artifact in result.artifacts { | |
| println!(" [{}] {}", artifact.ecosystem, artifact.path.display()); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/dustfril-cli/src/commands/scan.rs` around lines 31 - 34, The scan
command output currently prints only each artifact’s path, so the new ecosystem
information is lost. Update the user-facing printing in the scan flow in the
`scan` command so it includes the artifact ecosystem alongside the path, using
the existing `result.artifacts` iteration and `artifact` fields. Make the output
more readable by formatting both values together instead of only using
`artifact.path`.
| let artifacts: Vec<ArtifactAnalysis> = scan_result | ||
| .artifacts | ||
| .into_par_iter() | ||
| .map(Self::analyze_artifact) | ||
| .collect(); | ||
|
|
||
| let total_size_bytes = artifacts.iter().map(|a| a.size_bytes).sum(); | ||
|
|
||
| pub fn analyze(scan_result: ScanResult) -> DustResult<AnalysisResult> { | ||
| let mut result = AnalysisResult::default(); | ||
| Ok(AnalysisResult { | ||
| artifacts, |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Preserve the previous largest-first analysis ordering.
The old analyzer sorted artifacts by descending size, but this now returns Rayon collection order; the CLI prints analysis_result.artifacts directly, so large cleanup targets no longer surface first.
Proposed fix
- let artifacts: Vec<ArtifactAnalysis> = scan_result
+ let mut artifacts: Vec<ArtifactAnalysis> = scan_result
.artifacts
.into_par_iter()
.map(Self::analyze_artifact)
.collect();
+ artifacts.sort_by_key(|artifact| std::cmp::Reverse(artifact.size_bytes));
let total_size_bytes = artifacts.iter().map(|a| a.size_bytes).sum();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let artifacts: Vec<ArtifactAnalysis> = scan_result | |
| .artifacts | |
| .into_par_iter() | |
| .map(Self::analyze_artifact) | |
| .collect(); | |
| let total_size_bytes = artifacts.iter().map(|a| a.size_bytes).sum(); | |
| pub fn analyze(scan_result: ScanResult) -> DustResult<AnalysisResult> { | |
| let mut result = AnalysisResult::default(); | |
| Ok(AnalysisResult { | |
| artifacts, | |
| let mut artifacts: Vec<ArtifactAnalysis> = scan_result | |
| .artifacts | |
| .into_par_iter() | |
| .map(Self::analyze_artifact) | |
| .collect(); | |
| artifacts.sort_by_key(|artifact| std::cmp::Reverse(artifact.size_bytes)); | |
| let total_size_bytes = artifacts.iter().map(|a| a.size_bytes).sum(); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/dustfril-core/src/analyzer/analyze.rs` around lines 15 - 24, The
artifact ordering in analyze::analyze now comes from Rayon collection order
instead of the previous largest-first behavior. Update the analyze_artifact
pipeline in AnalysisResult construction to sort the collected artifacts by
descending size_bytes before returning, so the CLI output from
analysis_result.artifacts still surfaces the largest cleanup targets first.
| for entry in entries.flatten() { | ||
| let Ok(metadata) = entry.metadata() else { | ||
| continue; | ||
| }; | ||
|
|
||
| if metadata.is_file() { | ||
| total_size += metadata.len(); | ||
| } else if metadata.is_dir() { | ||
| total_size += calculate_directory_size(&entry.path()); | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify analyzer recursive traversal currently uses metadata() and not symlink_metadata().
rg -n -C3 'entry\.metadata\(\)|symlink_metadata|follow_links' crates/dustfril-core/src/analyzer crates/dustfril-core/src/fsRepository: FrilLab/dustfril
Length of output: 1122
🏁 Script executed:
#!/bin/bash
sed -n '1,180p' crates/dustfril-core/src/analyzer/analyze.rs | cat -nRepository: FrilLab/dustfril
Length of output: 4062
🏁 Script executed:
#!/bin/bash
python3 - <<'PY'
from pathlib import Path
p = Path("crates/dustfril-core/src/analyzer/analyze.rs")
for i, line in enumerate(p.read_text().splitlines(), 1):
if 70 <= i <= 120:
print(f"{i:4d}: {line}")
PYRepository: FrilLab/dustfril
Length of output: 1638
🏁 Script executed:
#!/bin/bash
sed -n '1,170p' crates/dustfril-core/src/analyzer/analyze.rs | nl -ba | sed -n '70,120p'Repository: FrilLab/dustfril
Length of output: 190
🏁 Script executed:
#!/bin/bash
nl -ba crates/dustfril-core/src/analyzer/analyze.rs | sed -n '1,150p'Repository: FrilLab/dustfril
Length of output: 190
Skip symlinked directories in traversal
entry.metadata() follows symlinks, so both calculate_directory_size and find_latest_modified can descend into symlinked directories, count targets outside the artifact, and recurse through cycles. Use symlink_metadata() here and skip symlinks before recursing. crates/dustfril-core/src/analyzer/analyze.rs:71-120
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/dustfril-core/src/analyzer/analyze.rs` around lines 78 - 87, The
directory traversal in calculate_directory_size and find_latest_modified is
following symlinks via entry.metadata(), which can recurse into linked
directories and count content outside the artifact. Update the entry handling to
use symlink_metadata() in the analyze.rs traversal logic, detect symlink file
types before any recursion, and skip them entirely so only real files and
directories are processed.
| assert!(matches!( | ||
| analysis.artifacts[0].recommendation, | ||
| CleanupRecommendation::Keep | ||
| | CleanupRecommendation::NeedsReview | ||
| | CleanupRecommendation::SafeToClean | ||
| )); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
This recommendation test is currently vacuous.
Lines 47-52 accept every CleanupRecommendation variant, so the test only proves the field is well-typed. Please make the fixture deterministic enough to assert one specific recommendation; otherwise recommendation regressions will still pass.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/dustfril-core/src/analyzer/tests.rs` around lines 47 - 52, The
recommendation assertion in the analyzer test is too broad and accepts every
CleanupRecommendation variant, so it does not verify behavior. Update the test
fixture and/or assertions in analyzer/tests.rs so the analyzed artifact produced
by the relevant test case yields one specific, deterministic recommendation, and
then assert that exact variant using analysis.artifacts[0].recommendation.
| WalkDir::new(root) | ||
| .into_iter() | ||
| .filter_map(|e| e.ok()) | ||
| .filter(|e| e.file_type().is_dir()) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift
Propagate walk failures instead of dropping them.
Lines 7-10 turn every WalkDir error into “no entry,” so a missing root or unreadable subtree can surface as a successful but incomplete scan. That makes Ok(empty) indistinguishable from “nothing was found.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/dustfril-core/src/fs/walk.rs` around lines 7 - 10, The WalkDir
iterator in walk_directories is swallowing traversal errors by converting every
Result into an optional entry, which can hide missing roots or unreadable
subtrees. Update the walk logic to propagate WalkDir failures instead of using a
drop-on-error filter, and keep the existing directory filtering in
walk_directories so callers can distinguish an actual empty scan from an error.
| #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] | ||
| pub enum CleanupRecommendation { | ||
| Keep, | ||
| NeedsReview, | ||
| SafeToClean, | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n --fixed-strings 'CleanupRecommendation::Review' .
rg -n --fixed-strings '"Review"' . -g '*.rs' -g '*.json' -g '*.snap' -g '*.yaml' -g '*.yml'Repository: FrilLab/dustfril
Length of output: 154
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the enum and its consumers/serialization context.
rg -n --context 3 'enum CleanupRecommendation|CleanupRecommendation' crates/dustfril-core/src
printf '\n---\n'
sed -n '1,220p' crates/dustfril-core/src/models/cleanup.rs
printf '\n---\n'
rg -n --context 2 'serde|Serialize|Deserialize|CleanupPlan|CleanupResult|CleanupCandidate' crates/dustfril-core/src/models crates/dustfril-core/src/cleanerRepository: FrilLab/dustfril
Length of output: 18281
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the enum and its consumers/serialization context.
rg -n --context 3 'enum CleanupRecommendation|CleanupRecommendation' crates/dustfril-core/src
printf '\n---\n'
sed -n '1,220p' crates/dustfril-core/src/models/cleanup.rs
printf '\n---\n'
rg -n --context 2 'serde|Serialize|Deserialize|CleanupPlan|CleanupResult|CleanupCandidate' crates/dustfril-core/src/models crates/dustfril-core/src/cleanerRepository: FrilLab/dustfril
Length of output: 18281
Preserve the old serde value for NeedsReview.
This public enum is serialized with the variant name by default, so renaming Review to NeedsReview changes the wire value and will break any persisted JSON or external client still sending "Review". Add a serde alias or explicit rename policy before shipping.
Compatibility fix
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum CleanupRecommendation {
Keep,
+ #[serde(alias = "Review")]
NeedsReview,
SafeToClean,
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] | |
| pub enum CleanupRecommendation { | |
| Keep, | |
| NeedsReview, | |
| SafeToClean, | |
| } | |
| #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] | |
| pub enum CleanupRecommendation { | |
| Keep, | |
| #[serde(alias = "Review")] | |
| NeedsReview, | |
| SafeToClean, | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/dustfril-core/src/models/cleanup.rs` around lines 29 - 34, The
CleanupRecommendation enum changed the serialized wire value for the review
state, so update its serde behavior to preserve compatibility with the old
"Review" value. In CleanupRecommendation, add a serde alias or explicit rename
handling on the NeedsReview variant so both the new name and the legacy value
are accepted during deserialization while keeping the intended serialization
format stable.
| pub struct CleanupCandidate { | ||
| pub path: PathBuf, | ||
| pub ecosystem: Ecosystem, | ||
| pub size_bytes: u64, | ||
| pub age_days: Option<u64>, | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Keep the planned filesystem kind in CleanupCandidate.
Dropping the expected kind from the cleanup plan forces execute_cleanup to re-detect the live entry at delete time. That changes cleanup from “delete the analyzed artifact” to “delete whatever currently exists at this path”, so a concurrent replacement can remove the wrong object. Preserve the planned kind (or stronger identity data) in the plan and reject mismatches during execution.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/dustfril-core/src/models/cleanup.rs` around lines 49 - 54, The
CleanupCandidate plan is missing the planned filesystem kind, which causes
execute_cleanup to re-detect the live entry instead of deleting the analyzed
artifact. Update CleanupCandidate in cleanup.rs to retain the expected kind (or
equivalent stronger identity data) alongside path, ecosystem, size_bytes, and
age_days, then thread that field through the cleanup planning and execution flow
so execute_cleanup can compare the planned kind against the current entry and
reject mismatches before deleting.
What
refactor: simplify scanner and analyzer architecture
Closes #
Checklist
Required
cargo checkpassescargo fmt --checkpassescargo clippy --workspace --all-targets -- -D warningspassescargo testpassesFunctional Validation
Documentation
Safety
Summary by CodeRabbit
New Features
Bug Fixes