Complete command-line reference for FilePrepper v0.4.9
fileprepper <command> [options]
# Get help
fileprepper --help
fileprepper <command> --helpAvailable for all commands:
| Option | Aliases | Description | Default |
|---|---|---|---|
--input |
-i |
Input file path | Required |
--output |
-o |
Output file path | Required |
--has-header |
--header |
Input has header row | true |
--ignore-errors |
- | Continue on errors | false |
--verbose |
-v |
Enable verbose output | false |
--encoding |
- | File encoding (auto, utf-8, cp949, euc-kr) | auto |
--skip-rows |
- | Number of rows to skip before header | 0 |
- CSV (.csv) - Comma-Separated Values
- TSV (.tsv) - Tab-Separated Values
- JSON (.json) - JavaScript Object Notation
- XML (.xml) - Extensible Markup Language
- Excel (.xlsx, .xls) - Microsoft Excel Spreadsheet
Add new columns with constant or calculated values.
Syntax:
fileprepper add-columns -i INPUT -o OUTPUT --columns DEFINITIONS [OPTIONS]Example:
# Add Status and Priority columns with fixed values
fileprepper add-columns -i data.csv -o output.csv \
--columns "Status:Active,Priority:High" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)--columns- Column definitions in formatName:Value(comma-separated, required)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Remove specified columns from the dataset.
Syntax:
fileprepper remove-columns -i INPUT -o OUTPUT -c COLUMNS [OPTIONS]Example:
# Remove temporary and debug columns
fileprepper remove-columns -i data.csv -o cleaned.csv \
-c "Temporary,Notes,DebugInfo" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-c, --columns- Columns to remove (comma-separated, required)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Rename columns using name mappings.
Syntax:
fileprepper rename-columns -i INPUT -o OUTPUT --map MAPPINGS [OPTIONS]Example:
# Rename columns for clarity
fileprepper rename-columns -i data.csv -o renamed.csv \
--map "OldName:NewName,user_age:Age,dept:Department" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)--map- Rename mappings in formatOldName:NewName(comma-separated, required)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Change the order of columns in the output.
Syntax:
fileprepper reorder-columns -i INPUT -o OUTPUT -c COLUMN_ORDER [OPTIONS]Example:
# Reorder columns to put ID first
fileprepper reorder-columns -i data.csv -o reordered.csv \
-c "ID,Name,Email,Age,Department" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-c, --columns- Desired column order (comma-separated, required)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Perform mathematical or string operations between columns.
Syntax:
fileprepper column-interaction -i INPUT -o OUTPUT --source COLUMNS --operation OP --output NAME [OPTIONS]Example:
# Calculate total from price and quantity
fileprepper column-interaction -i sales.csv -o calculated.csv \
--source "Price,Quantity" --operation Multiply --output "Total" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)--source- Source columns (comma-separated, required)--operation- Operation: Add, Subtract, Multiply, Divide, Concatenate (required)--output- Output column name (required)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Convert column data types with optional format specifications.
Syntax:
fileprepper convert-type -i INPUT -o OUTPUT -c CONVERSIONS [OPTIONS]Example:
# Convert multiple column types
fileprepper convert-type -i data.csv -o typed.csv \
-c "Date:DateTime:yyyy-MM-dd,Age:Integer,Price:Decimal,Active:Boolean" \
--culture "en-US" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-c, --conversions- Conversions in formatColumn:Type[:Format](comma-separated, required)--culture- Culture for parsing (default: en-US)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Supported Types: String, Integer, Decimal, DateTime, Boolean
Extract date components (year, month, day, etc.) from date columns.
Syntax:
fileprepper extract-date -i INPUT -o OUTPUT --column COLUMN --components PARTS [OPTIONS]Example:
# Extract year, month, and day of week from order date
fileprepper extract-date -i orders.csv -o extracted.csv \
--column "OrderDate" --components "Year,Month,Day,DayOfWeek" \
--format "yyyy-MM-dd" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)--column- Source date column (required)--components- Components to extract: Year, Month, Day, Hour, Minute, Second, DayOfWeek (comma-separated, required)--format- Custom date format (optional)--culture- Culture for parsing (default: en-US)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Fill missing or empty values in columns using various strategies.
Syntax:
fileprepper fill-missing -i INPUT -o OUTPUT --methods METHODS [OPTIONS]Example:
# Fill Age with mean and City with fixed value
fileprepper fill-missing -i data.csv -o filled.csv \
--methods "Age:Mean,City:FixedValue:Unknown" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)--methods,-m- Fill methods in formatColumn:Method[:Value](comma-separated, required)--append-to-source- Append result to source file (default: false)--output-column- Output column template (optional)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Fill Methods: Mean, Median, Mode, ForwardFill, BackwardFill, FixedValue
Normalize numeric columns to a specific range.
Syntax:
fileprepper normalize -i INPUT -o OUTPUT -c COLUMNS -m METHOD [OPTIONS]Example:
# Normalize age and salary to 0-1 range
fileprepper normalize -i data.csv -o normalized.csv \
-c "Age,Salary,Score" -m MinMax --min 0 --max 1 --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-c, --columns- Columns to normalize (comma-separated, required)-m, --method- Normalization method: MinMax, ZScore (required)--min- Min value for MinMax (default: 0)--max- Max value for MinMax (default: 1)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Scale numeric columns using statistical methods.
Syntax:
fileprepper scale -i INPUT -o OUTPUT -c COLUMNS -m METHOD [OPTIONS]Example:
# Standardize height and weight
fileprepper scale -i data.csv -o scaled.csv \
-c "Height,Weight,BMI" -m Standardization --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-c, --columns- Columns to scale (comma-separated, required)-m, --method- Scaling method: MinMax, Standardization (required)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Convert categorical variables to binary (0/1) columns.
Syntax:
fileprepper one-hot-encoding -i INPUT -o OUTPUT -c COLUMNS [OPTIONS]Example:
# One-hot encode category and type columns
fileprepper one-hot-encoding -i data.csv -o encoded.csv \
-c "Category,Type" --prefix "Cat_,Type_" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-c, --columns- Categorical columns to encode (comma-separated, required)--prefix- Prefixes for encoded columns (comma-separated, optional)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Replace specific values in columns.
Syntax:
fileprepper replace -i INPUT -o OUTPUT --column COLUMN --old VALUES --new VALUE [OPTIONS]Example:
# Replace N/A and Unknown with Pending
fileprepper replace -i data.csv -o replaced.csv \
--column "Status" --old "N/A,Unknown,null" --new "Pending" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)--column- Target column (required)--old- Values to replace (comma-separated, required)--new- Replacement value (required)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Group data and perform aggregations.
Syntax:
fileprepper aggregate -i INPUT -o OUTPUT --group COLUMNS --aggregations FUNCS [OPTIONS]Example:
# Group by category and region, calculate sum and average
fileprepper aggregate -i sales.csv -o aggregated.csv \
--group "Category,Region" \
--aggregations "Sales:Sum,Quantity:Avg,Orders:Count" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)--group- Grouping columns (comma-separated, required)--aggregations- Aggregations in formatColumn:Function(Sum/Avg/Min/Max/Count, comma-separated, required)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Calculate comprehensive statistics on numeric columns.
Syntax:
fileprepper stats -i INPUT -o OUTPUT -c COLUMNS --stats STATISTICS [OPTIONS]Example:
# Calculate mean, min, max for age and salary
fileprepper stats -i employees.csv -o stats.csv \
-c "Age,Salary,Score" --stats "Mean,Min,Max,Median,StandardDeviation" \
--suffix "_stat" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-c, --columns- Columns to analyze (comma-separated, required)-s, --stats- Statistics to calculate (comma-separated, required)--suffix- Suffix for output column names (default: "_stat")--default-value- Default value for calculation errors (optional)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Available Statistics: Mean, StandardDeviation, Min, Max, Median, Q1, Q3, ZScore, RobustZScore, PercentRank, MAD
Create lag features from time series data for machine learning.
Syntax:
fileprepper create-lag-features -i INPUT -o OUTPUT -g GROUP -t TIME -l COLUMNS -p PERIODS [OPTIONS]Example:
# Create lag features for predictive modeling
fileprepper create-lag-features -i timeseries.csv -o features.csv \
--group-by "PartNumber" --time-column "Date" \
--lag-columns "Value,Temperature" --lag-periods "1,2,3" \
--target "FailureStatus" --drop-nulls --verboseOptions:
-i, --input- Input CSV file path (required)-o, --output- Output file path (required)-g, --group-by- Column to group by (e.g., Part Number, Entity ID) (required)-t, --time-column- Column representing time/sequence for sorting (required)-l, --lag-columns- Columns to create lag features from (comma-separated, required)-p, --lag-periods- Lag periods (e.g., 1,2,3) (comma-separated, required)--target- Target column to predict (optional, kept in output)--drop-nulls- Drop rows with null lag values (default: true)-k, --keep-columns- Additional columns to keep in output (comma-separated, optional)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Use Case: Time series forecasting, predictive maintenance, sequential pattern analysis
Remove duplicate rows based on specified columns.
Syntax:
fileprepper drop-duplicates -i INPUT -o OUTPUT -c COLUMNS [OPTIONS]Example:
# Remove duplicates based on email
fileprepper drop-duplicates -i contacts.csv -o unique.csv \
-c "Email" --keep First --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-c, --columns- Columns to check for duplicates (comma-separated, required)--keep- Which duplicate to keep: First or Last (default: First)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Filter rows based on column conditions.
Syntax:
fileprepper filter-rows -i INPUT -o OUTPUT -c CONDITIONS [OPTIONS]Example:
# Filter rows where Age > 30 AND Status is Active
fileprepper filter-rows -i data.csv -o filtered.csv \
-c "Age:GreaterThan:30,Status:Equals:Active" --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-c, --conditions- Filter conditions in formatColumn:Operator:Value(comma-separated, required)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Operators: Equals, NotEquals, GreaterThan, GreaterOrEqual, LessThan, LessOrEqual, Contains, NotContains, StartsWith, EndsWith
Merge multiple files vertically (concatenate) or horizontally (join).
Syntax:
fileprepper merge -i FILE1 FILE2 [FILE3...] -o OUTPUT -t TYPE [OPTIONS]
fileprepper merge -p "data/*.csv" -o OUTPUT -t TYPE [OPTIONS]Examples:
# Vertical merge (stack rows from multiple files)
fileprepper merge -i file1.csv file2.csv file3.csv -o merged.csv \
--type Vertical --verbose
# Horizontal merge (join on ID column)
fileprepper merge -i users.csv salaries.csv -o combined.csv \
--type Horizontal --join-type Inner --key-columns "ID" --verbose
# Merge all CSV files matching a glob pattern
fileprepper merge -p "sensor-data/*.csv" -o merged.csv \
--type Vertical --verboseOptions:
-i, --input- Input file paths (space-separated, minimum 2 files)-p, --input-pattern- Glob pattern for input files (e.g.,data/*.csv)-o, --output- Output file path (required)-t, --type- Merge type: Vertical (concatenate rows) or Horizontal (join columns) (required)-j, --join-type- Join type for horizontal merge: Inner, Left, Right, Full (default: Inner)-k, --key-columns- Key columns for horizontal merge (comma-separated, required for horizontal)--has-header- Input files have headers (default: true)--verbose- Show detailed progress
Note: Use either
--inputor--input-pattern, not both.
Remove constant or near-constant columns from the dataset.
Syntax:
fileprepper remove-constants -i INPUT -o OUTPUT [OPTIONS]Examples:
# Remove columns where all values are identical
fileprepper remove-constants -i data.csv -o cleaned.csv --verbose
# Remove columns with less than 1% unique values
fileprepper remove-constants -i data.csv -o cleaned.csv \
--threshold 0.01 --verbose
# Report-only mode (list constant columns without removing)
fileprepper remove-constants -i data.csv -o data.csv \
--report-only --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)--threshold- Unique ratio threshold (0.0 = exact constants only, 0.01 = below 1% unique ratio) (default: 0.0)--report-only- List constant columns without removing them (default: false)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Sample data using various sampling methods.
Syntax:
fileprepper data-sampling -i INPUT -o OUTPUT -m METHOD [OPTIONS]Example:
# Random sample 1000 rows
fileprepper data-sampling -i large.csv -o sample.csv \
-m Random --size 1000 --verbose
# Stratified sampling by category
fileprepper data-sampling -i data.csv -o stratified.csv \
-m Stratified --column "Category" --ratio 0.3 --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-m, --method- Sampling method: Random, Systematic, Stratified (required)--size- Sample size (number of rows, optional)--ratio- Sample ratio 0.0-1.0 (optional)--column- Stratification column (required for Stratified method)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Convert between different file formats.
Syntax:
fileprepper convert-format -i INPUT -o OUTPUT -f FORMAT [OPTIONS]Examples:
# Convert CSV to JSON
fileprepper convert-format -i data.csv -o data.json \
-f JSON --pretty --verbose
# Convert CSV to XML
fileprepper convert-format -i data.csv -o data.xml \
-f XML --pretty --verbose
# Convert Excel to CSV
fileprepper convert-format -i data.xlsx -o data.csv \
-f CSV --verboseOptions:
-i, --input- Input file path (required)-o, --output- Output file path (required)-f, --format- Target format: CSV, TSV, PSV, JSON, XML (required)--pretty- Pretty-print JSON/XML (default: false)--has-header- Input has headers (default: true)--verbose- Show detailed progress
Most commands support processing multiple columns simultaneously for efficiency:
# Normalize 5 columns at once
fileprepper normalize -i data.csv -o normalized.csv \
-c "Age,Height,Weight,Salary,Score" -m MinMax --verbose
# Convert 4 types simultaneously
fileprepper convert-type -i data.csv -o typed.csv \
-c "Date:DateTime:yyyy-MM-dd,Age:Integer,Active:Boolean,Salary:Decimal"
# Fill missing values in 3 columns with different strategies
fileprepper fill-missing -i data.csv -o filled.csv \
--methods "Age:Mean,City:Mode,Score:Median"# Ignore errors and continue processing
fileprepper normalize -i dirty.csv -o normalized.csv \
-c "Age,Salary" -m MinMax --ignore-errors --verbose
# Use default value for calculation errors
fileprepper stats -i data.csv -o stats.csv \
-c "Value" --stats "Mean" --default-value "0" --ignore-errorsChain multiple commands for complex data transformations:
# Complete data preparation pipeline
# Step 1: Clean missing values
fileprepper fill-missing -i raw.csv -o step1.csv \
--methods "Age:Mean,City:Mode" --verbose
# Step 2: Filter valid data
fileprepper filter-rows -i step1.csv -o step2.csv \
-c "Age:GreaterThan:0,Age:LessThan:120" --verbose
# Step 3: Normalize numeric columns
fileprepper normalize -i step2.csv -o step3.csv \
-c "Age,Salary,Score" -m MinMax --verbose
# Step 4: Create lag features for time series
fileprepper create-lag-features -i step3.csv -o step4.csv \
--group-by "ID" --time-column "Date" \
--lag-columns "Score" --lag-periods "1,2,3" --verbose
# Step 5: Convert to JSON format
fileprepper convert-format -i step4.csv -o final.json \
-f JSON --pretty --verbose
# Cleanup intermediate files
rm step1.csv step2.csv step3.csv step4.csvProcess multiple files with loops:
# Process all CSV files in a directory
for file in data/*.csv; do
output="processed/$(basename "$file")"
fileprepper normalize -i "$file" -o "$output" -c "Age,Salary" -m MinMax
done
# Apply same transformation to multiple files
files="jan.csv feb.csv mar.csv"
for file in $files; do
fileprepper filter-rows -i "$file" -o "filtered_$file" \
-c "Status:Equals:Active"
done- Test with small files first - Verify commands work correctly on sample data before processing large datasets
- Use --help extensively - Every command has detailed built-in help:
fileprepper <command> --help - Check headers - Ensure
--has-headermatches your data (default is true) - Enable verbose mode - Use
--verboseto see detailed progress and validation results - Error handling - Use
--ignore-errorsfor dirty data that may have parsing issues - Backup originals - Always keep copies of original data before transforming
- Watch column names - Column names are case-sensitive and must match exactly
- Quote arguments - Use quotes for arguments with spaces or special characters
- Pipeline incrementally - Build complex workflows step-by-step, validating each stage
# Install as global .NET tool
dotnet tool install -g fileprepper-cli
# Update if already installed
dotnet tool update -g fileprepper-cli
# Uninstall and reinstall
dotnet tool uninstall -g fileprepper-cli
dotnet tool install -g fileprepper-cli
# Run from source for development
cd src/FilePrepper.CLI
dotnet run -- <command> [options]- Ensure .NET global tools path is in your PATH environment variable
- On Windows:
%USERPROFILE%\.dotnet\tools - On Linux/Mac:
~/.dotnet/tools
- Check for typos - column names are case-sensitive
- Use exact names from CSV headers
- Quote column names with spaces:
--columns "First Name,Last Name" - Verify the file has headers if using
--has-header true
- Check file encoding (UTF-8 recommended)
- Verify CSV delimiter matches file (comma for CSV, tab for TSV)
- Look for malformed rows or inconsistent column counts
- Use
--ignore-errorsto skip problematic rows
- Large files (>1GB) may need increased memory
- Consider splitting large files into smaller batches
- Use appropriate data types (Integer vs Decimal) for better performance
- Close other applications to free up system resources
- For very large datasets, consider using a database or big data tools
"Input file not found"
- Verify file path is correct (use absolute paths or correct relative paths)
- Check file exists:
ls <file>(Linux/Mac) ordir <file>(Windows)
"Invalid argument format"
- Check command syntax with
--help - Ensure comma-separated lists have no spaces (or quote the entire argument)
- Verify operators and method names are spelled correctly
"Column not found"
- Verify column name matches exactly (case-sensitive)
- Check file has headers if using
--has-header true - List headers:
head -1 file.csv
"Failed to parse value"
- Data may not match expected type (e.g., text in numeric column)
- Use
--ignore-errorsto skip invalid values - Specify
--default-valuefor error cases
- Built-in help:
fileprepper --helporfileprepper <command> --help - Documentation: https://github.com/iyulab/FilePrepper
- Issues: https://github.com/iyulab/FilePrepper/issues
- Examples: See
docs/Common-Scenarios.mdfor real-world examples
- Auto encoding detection (CP949/EUC-KR) with
--encodingoption - Skip metadata rows with
--skip-rowsoption - Remove constant/near-constant columns with
remove-constantscommand - Merge glob pattern support with
--input-patternoption - Unpivot (wide-to-long) transformation
- Filename metadata extraction during CSV concatenation
- System.CommandLine 2.0.0 stable migration
- 29 commands total, 359 tests passing
- System.CommandLine integration for robust CLI framework
- Spectre.Console rich terminal UI with colors and progress
- Phase 2: datetime, merge-asof, string, conditional, window commands
- 26 commands with consistent syntax
- Initial CLI implementation
- Core data processing commands
- Basic file format support