A comprehensive disk management suite with beautiful terminal interfaces for finding duplicate files and analyzing disk space usage.
- 3-Stage Detection: Size grouping → Partial hash → Full hash (efficient and accurate; a sketch of this pipeline appears after the feature list)
- Parallel Processing: Multi-threaded hashing for fast scans
- Smart Selection: Keep oldest, newest, or shortest path files
- Multiple Actions: Report, delete, move to backup, or replace with hardlinks
- Advanced Filters: Filter by size, extension, skip existing hardlinks
- Fast Hashing: Optional xxhash support (10x faster than SHA-256)
- Largest Files/Directories: Find what's consuming your disk space
- File Age Analysis: Identify old files for potential cleanup
- Extension Breakdown: See which file types use the most space
- Visual Treemap: Terminal-based block visualization
- AI Analysis: Get cleanup recommendations via Ollama (optional)
- Export: Save results to CSV or JSON
- Find Reserved-Name Artifacts: Scan for `nul` files created by buggy tools
- All Reserved Names: Optionally scan for CON, PRN, AUX, COM1-9, LPT1-9
- Safe Deletion: Uses the `\\?\` extended path prefix to bypass Windows name reservation
- Dry Run: Preview what will be deleted before nuking
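For orientation, the 3-stage detection referenced in the feature list can be sketched in a few lines: group files by size, narrow each group with a hash of the first chunk, then confirm the survivors with a full-file hash. The function names and chunk size below are illustrative, not the project's actual implementation:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_hash(path: Path, partial: bool = False, chunk: int = 1 << 20) -> str:
    """Hash a file; with partial=True only the first chunk is read."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while data := fh.read(chunk):
            digest.update(data)
            if partial:
                break
    return digest.hexdigest()

def find_duplicates(paths):
    # Stage 1: group by size -- files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for p in paths:
        by_size[p.stat().st_size].append(p)
    duplicates = []
    for group in (g for g in by_size.values() if len(g) > 1):
        # Stage 2: within a size group, group by a hash of the first chunk.
        by_partial = defaultdict(list)
        for p in group:
            by_partial[file_hash(p, partial=True)].append(p)
        for candidates in (c for c in by_partial.values() if len(c) > 1):
            # Stage 3: confirm the remaining candidates with a full hash.
            by_full = defaultdict(list)
            for p in candidates:
                by_full[file_hash(p)].append(p)
            duplicates.extend(g for g in by_full.values() if len(g) > 1)
    return duplicates
```

Each stage only reads as much data as it needs, so most non-duplicates are eliminated before any file is hashed in full.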
- Python 3.8 or higher
- uv (recommended) or pip
```bash
# Clone the repository
git clone https://github.com/thebiglaskowski/disk-utils.git
cd disk-utils

# Create and activate a virtual environment
uv venv
.venv\Scripts\activate       # Windows (cmd / PowerShell)
source .venv/bin/activate    # Linux / macOS / WSL

# Install dependencies
uv pip install -r requirements.txt

# Optional: Install faster hashing
uv pip install xxhash

# Optional: Install Ollama integration
uv pip install ollama
```

Or, using pip:

```bash
git clone https://github.com/thebiglaskowski/disk-utils.git
cd disk-utils
pip install -r requirements.txt

# Optional
pip install xxhash ollama
```

| Package | Required | Purpose |
|---|---|---|
| rich | Yes | Beautiful terminal formatting |
| questionary | Yes | Interactive CLI prompts |
| xxhash | No | Faster hashing (recommended) |
| ollama | No | AI-powered analysis |
Launch the unified menu to access all utilities:

```bash
python app.py
```

Or run the duplicate finder directly with `python duplicates.py`:

```bash
# Basic scan (report only)
python duplicates.py /path/to/scan
# Delete duplicates, keep oldest files
python duplicates.py /path/to/scan --action delete --keep oldest --no-dry-run
# Move duplicates to backup folder
python duplicates.py /path/to/scan --action move --backup-dir /path/to/backup
# Replace duplicates with hardlinks (saves space)
python duplicates.py /path/to/scan --action hardlink --no-dry-run
# Scan only large image files
python duplicates.py /path/to/scan --min-size 1MB --ext jpg,png,gif
# Exclude temp files, use fast hashing
python duplicates.py /path/to/scan --exclude-ext tmp,log --fast-hash
```

| Option | Description |
|---|---|
| --action | Action for duplicates: report, delete, move, hardlink |
| --keep | Which file to keep: oldest, newest, shortest_path |
| --dry-run | Simulate without making changes (default) |
| --no-dry-run | Actually perform the action |
| --backup-dir | Directory to move duplicates (required for move) |
| --min-size | Minimum file size (e.g., 1KB, 10MB, 1GB) |
| --max-size | Maximum file size |
| --ext | Only include these extensions (comma-separated) |
| --exclude-ext | Exclude these extensions |
| --fast-hash | Use xxhash for faster hashing (default if available) |
| --secure-hash | Use SHA-256 instead of xxhash |
| --threads | Number of threads for parallel hashing |
| --include-hardlinks | Don't skip files that are already hardlinked |
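The size filters above accept human-readable values such as 1KB, 10MB, or 1GB. As a rough sketch of how such strings can be turned into byte counts (the `parse_size` helper is hypothetical, not part of this project's CLI):

```python
import re

# Hypothetical helper: convert strings like "1KB", "10MB", "1GB" into bytes.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def parse_size(text: str) -> int:
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMGT]?B)?", text.strip().upper())
    if not match:
        raise ValueError(f"Unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit or "B"])

assert parse_size("1KB") == 1024
assert parse_size("10MB") == 10 * 1024**2
```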
```bash
python treesize_cli.py
```

This launches an interactive menu with options to:
- Scan for largest files
- Scan for largest directories (a sketch of this idea follows the list)
- Analyze file ages
- Show disk usage
- View treemap visualization
- Get AI analysis (requires Ollama)
- Export results to CSV/JSON
- Configure settings
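As a rough illustration of the "largest directories" scan mentioned above, the sketch below sums file sizes per directory and prints the top entries; it is illustrative only, not the analyzer's actual code:

```python
import os
from collections import Counter

def directory_sizes(root):
    """Map each directory to the total size of the files directly inside it."""
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes[dirpath] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip unreadable or vanished files
    return sizes

for path, size in directory_sizes(".").most_common(10):
    print(f"{size / 1024**2:10.2f} MB  {path}")
```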
```bash
python nul_scanner.py
```

Or select it from the main menu. Options:

- Scan for `nul` files only
- Scan for all Windows reserved-name files (CON, PRN, AUX, etc.)
- Info panel explaining what `nul` files are and why they exist
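A minimal sketch of the underlying idea, assuming a helper that walks a tree, flags Windows reserved base names, and deletes matches through the `\\?\` extended-length prefix (all names below are illustrative, not the scanner's actual code):

```python
import os

# Windows reserved device names (case-insensitive, regardless of extension).
RESERVED = {"CON", "PRN", "AUX", "NUL",
            *(f"COM{i}" for i in range(1, 10)),
            *(f"LPT{i}" for i in range(1, 10))}

def is_reserved(filename: str) -> bool:
    return filename.split(".")[0].upper() in RESERVED

def delete_reserved(root: str, dry_run: bool = True) -> None:
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not is_reserved(name):
                continue
            # The \\?\ prefix needs an absolute path; it bypasses Win32 name parsing.
            target = "\\\\?\\" + os.path.abspath(os.path.join(dirpath, name))
            if dry_run:
                print(f"Would delete: {target}")
            else:
                os.remove(target)
```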
Main menu:

```
╔══════════════════════════════════════════════════════════════════╗
║              🛠️  DISK UTILS - Management Suite  🛠️                 ║
║              Duplicate Finder  •  Disk Space Analyzer              ║
╚══════════════════════════════════════════════════════════════════╝

? What would you like to do?
❯ 🔍 Duplicate File Finder
  💾 Disk Space Analyzer (TreeSize)
  ℹ️  About
  🚪 Exit
```
Duplicate scan in progress:

```
╭──────────────────────────────────────────────────────────────────╮
│ 🔍 Scanning Directory                                              │
│ 📂 C:\Users\Photos                                                 │
│ Hash: xxhash (fast) | Threads: 32 | Skip hardlinks: True           │
╰──────────────────────────────────────────────────────────────────╯

Phase 1/3: 🔍 Scanning files...
  ✓ Found 15,234 files
Phase 2/3: 🔐 Calculating partial hashes...
  ✓ 1,245 groups remain (3,891 files)
Phase 3/3: 🔐 Calculating full hashes...
  ✓ Confirmed 523 duplicate groups
```
Disk space analyzer:

```
╭──────────────────────────────────────────╮
│         📊 Top 10 Largest Files          │
├────┬──────────────┬──────────────────────┤
│  1 │      4.52 GB │ ████████████████████ │
│  2 │      2.31 GB │ ██████████           │
│  3 │      1.87 GB │ ████████             │
│  4 │    892.45 MB │ ████                 │
│  5 │    654.12 MB │ ███                  │
╰────┴──────────────┴──────────────────────╯
```
Settings are saved to `~/.treesize/settings.json`:

```json
{
  "top_n": 50,
  "min_file_size_bytes": 0,
  "min_dir_size_bytes": 0,
  "follow_symlinks": false,
  "default_path": "C:\\",
  "max_depth": null,
  "quick_mode": false,
  "ollama_model": "llama3.2"
}
```

By default, these directories are skipped during scanning:
- `.git`, `node_modules`, `__pycache__`
- `.venv`, `venv`, `env`
- `$RECYCLE.BIN`, `System Volume Information`
- `Windows`, `ProgramData`, `AppData`
- `.cache`, `.npm`, `.yarn`
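For illustration, a minimal sketch of how such an exclusion list can be applied while walking a directory tree; the constant mirrors the defaults listed above, but the function itself is hypothetical, not the project's API:

```python
import os

# Directory names to prune during a scan (mirrors the defaults above).
EXCLUDED_DIRS = {
    ".git", "node_modules", "__pycache__",
    ".venv", "venv", "env",
    "$RECYCLE.BIN", "System Volume Information",
    "Windows", "ProgramData", "AppData",
    ".cache", ".npm", ".yarn",
}

def iter_files(root):
    """Yield file paths under root, pruning excluded directories in place."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Modifying dirnames in place tells os.walk not to descend into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            yield os.path.join(dirpath, name)
```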
To enable AI-powered cleanup recommendations:
- Install Ollama: https://ollama.ai
- Pull a model: `ollama pull llama3.2`
- Start Ollama: `ollama serve`
- Install the Python package: `pip install ollama`
- Select your model in Settings
The AI will analyze your scan results and provide:
- File categorization (temp files, caches, media, etc.)
- Safe-to-delete recommendations
- Files to keep
- Specific cleanup suggestions
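As an illustration of how such an integration can look with the `ollama` Python package, here is a minimal sketch; the prompt wording and the `scan_summary` variable are made up for the example and are not the project's exact integration:

```python
import ollama  # pip install ollama

# Hypothetical scan summary; in practice this would come from the analyzer.
scan_summary = "backup.iso 4.5 GB, node_modules cache 1.2 GB, old_logs/ 800 MB"

response = ollama.chat(
    model="llama3.2",
    messages=[{
        "role": "user",
        "content": (
            "Here are the largest items from a disk scan:\n"
            f"{scan_summary}\n"
            "Categorize them and suggest which are safe to delete."
        ),
    }],
)
print(response["message"]["content"])
```

Any locally pulled model can be passed as `model`; the one shown matches the default in the settings file above.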
- Use xxhash: Install `xxhash` for 10x faster duplicate detection
- Quick Mode: Enable in TreeSize settings to skip files under 1MB
- Thread Count: Adjust based on your system (default: 4x CPU cores, max 32)
- Size Filters: Use `--min-size` to skip small files when finding duplicates
- Extension Filters: Focus on specific file types with `--ext`
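To illustrate the thread-count default mentioned above, a minimal sketch of hashing candidate files with a worker pool; `hash_file` and the worker cap below are illustrative, not the project's actual code:

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

# Default worker count: 4x CPU cores, capped at 32 (as described above).
DEFAULT_THREADS = min(32, (os.cpu_count() or 1) * 4)

def hash_file(path, chunk_size=1 << 20):
    """Return (path, SHA-256 digest), reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return path, digest.hexdigest()

def hash_all(paths, threads=DEFAULT_THREADS):
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(hash_file, paths))
```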
- Dry Run by Default: Duplicate finder simulates actions without making changes
- Atomic Operations: Hardlink replacement uses temp files to prevent data loss
- Smart Skipping: Automatically skips system directories and symlinks
- Existing Hardlinks: Detects and skips files already hardlinked to each other
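For illustration, a rough sketch of the atomic-replacement idea: the hardlink is created under a temporary name in the same directory and then swapped over the duplicate, so there is no moment where the duplicate is gone without its replacement in place (illustrative only, not the project's exact implementation):

```python
import os

def replace_with_hardlink(keep_path: str, dup_path: str) -> None:
    """Replace dup_path with a hardlink to keep_path without losing data on failure."""
    tmp_path = dup_path + ".hardlink.tmp"
    os.link(keep_path, tmp_path)        # create the hardlink under a temporary name
    try:
        os.replace(tmp_path, dup_path)  # atomic swap on the same filesystem
    except OSError:
        os.remove(tmp_path)             # clean up the temporary link on failure
        raise
```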
MIT License - See LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Rich - Beautiful terminal formatting
- Questionary - Interactive CLI prompts
- xxhash - Extremely fast hashing
- Ollama - Local AI models