Add a --stats (alias --profile) flag that computes per-column statistics after loading input and prints them as a formatted table.
sql-pipe sales.csv --stats
# | column | type    | non-null | min  | max     | mean   |
# |--------|---------|----------|------|---------|--------|
# | id     | INTEGER | 1000     | 1    | 1000    | 500.5  |
# | amount | REAL    | 1000     | 0.50 | 9999.99 | 512.34 |
# | region | TEXT    | 1000     | East | West    |        |
Motivation
Profiling a new dataset is usually the first thing an analyst does with it. Currently users must manually write SELECT MIN(x), MAX(x), AVG(x), COUNT(*) FROM t WHERE x IS NOT NULL for each column. A --stats mode automates this common profiling query and gives an immediate view of data shape, completeness, and distribution. It would also make sql-pipe competitive with csvstat (csvkit) and DuckDB's .mode stats.
Acceptance Criteria
--stats flag is parsed in args.zig
After loading tables, compute per-column: type, non-null count, min, max, mean (for numeric), distinct count
For TEXT columns: show min/max as string values, skip mean
For INTEGER/REAL columns: show all stats including mean
Output formatted as a table (reuse existing table formatter)
--stats is mutually exclusive with --columns, --validate, --sample, --schema, --explain, and a query argument
Works with multiple files (show stats per table)
Integration tests cover the new mode
Help text updated
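The mutual-exclusivity rule above could be checked roughly as follows. This is a minimal sketch in Python for illustration only (the real parser lives in args.zig); the function name `check_stats_conflicts` and the representation of flags as a set of strings are assumptions, not part of the existing code.

```python
# Flags that the acceptance criteria list as incompatible with --stats.
EXCLUSIVE_WITH_STATS = ("--columns", "--validate", "--sample", "--schema", "--explain")


def check_stats_conflicts(flags, query=None):
    """Return everything that conflicts with --stats (empty list when valid).

    `flags` is the set of flags seen on the command line and `query` is the
    positional SQL query, or None when none was given. Both are hypothetical
    shapes chosen for this sketch.
    """
    if "--stats" not in flags and "--profile" not in flags:
        return []  # stats mode not requested, nothing to check
    conflicts = [flag for flag in EXCLUSIVE_WITH_STATS if flag in flags]
    if query is not None:
        conflicts.append("query argument")
    return conflicts
```

Returning the full list of conflicts, rather than failing on the first one, would let the error message name every offending flag at once.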
Implementation Notes
Use PRAGMA table_info(t) to get column names and types (function getTableColumns at src/sqlite.zig:277 already does this)
Build one aggregate SELECT per column and combine them with UNION ALL
Add StatsArgs to src/args.zig following the pattern of ColumnsArgs/ValidateArgs
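The query-building idea from the notes above can be sketched as below. It is written in Python with the built-in sqlite3 module purely so the SQL can be run and checked; the actual implementation would be Zig code reusing getTableColumns. The helper names `quote_ident` and `build_stats_query` and the output column names are assumptions. Because COUNT(x), MIN(x), MAX(x), and AVG(x) already ignore NULLs, the sketch needs no WHERE x IS NOT NULL clause.

```python
import sqlite3

# Declared types for which a mean is reported (per the acceptance criteria).
NUMERIC_TYPES = {"INTEGER", "REAL"}


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_stats_query(conn, table):
    """Build one aggregate SELECT per column, joined with UNION ALL."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    if not columns:
        raise ValueError(f"no such table or no columns: {table}")
    parts = []
    for _cid, name, decl_type, *_rest in columns:
        col = quote_ident(name)
        # TEXT (and any other non-numeric) columns get NULL instead of a mean.
        mean = f"AVG({col})" if decl_type.upper() in NUMERIC_TYPES else "NULL"
        name_lit = name.replace("'", "''")
        type_lit = decl_type.replace("'", "''")
        parts.append(
            f"SELECT '{name_lit}' AS column_name, '{type_lit}' AS type, "
            f"COUNT({col}) AS non_null, MIN({col}) AS min_value, "
            f"MAX({col}) AS max_value, {mean} AS mean, "
            f"COUNT(DISTINCT {col}) AS distinct_count "
            f"FROM {quote_ident(table)}"
        )
    return " UNION ALL ".join(parts)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, amount REAL, region TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(1, 0.5, "East"), (2, 1.5, "West"), (3, None, "East")],
    )
    for row in conn.execute(build_stats_query(conn, "t")):
        print(row)
```

A single UNION ALL statement keeps the result in one row-per-column shape that maps directly onto the table formatter. The trade-off is one table scan per column, which may matter for very wide inputs.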