# dprism

> Decompose your data at the speed of light.

htop meets pandas-profiling -- written in Rust.

dprism is a terminal-native tool that lets you explore, profile, and understand datasets instantly -- without leaving your terminal, without spinning up Jupyter, and without writing a single line of code.

Built with Polars for lightning-fast data processing and Ratatui for a beautiful, responsive TUI.
```text
┌─────────────────────────────────────────────────────────────────┐
│ dprism sales.csv  1,000,000 rows x 12 cols | 48 MiB             │
├──────────────────────────┬──────────────────────────────────────┤
│ Columns (12)             │ Column Profile                       │
│                          │                                      │
│  #  Column    Type  Null%│ revenue [f64]                        │
│  1  id        i64      0%│                                      │
│  2  date      date     0%│ ─── Overview ───                     │
│> 3  revenue   f64      2%│ Count .................... 1,000,000 │
│  4  region    str      0%│ Null count .................. 20,000 │
│  5  category  str      1%│ Null % ....................... 2.00% │
│                          │ Unique ..................... 847,231 │
│                          │                                      │
│                          │ ─── Statistics ───                   │
│                          │ Mean ...................... 4,521.87 │
│                          │ Median .................... 3,200.00 │
│                          │ Std Dev ................... 2,876.43 │
│                          │ Q1 (25%) .................. 1,250.00 │
│                          │ Q3 (75%) .................. 6,800.00 │
│                          │ Min ........................... 0.50 │
│                          │ Max ...................... 99,999.99 │
│                          │ (!) Outliers ............. 42 (0.4%) │
│                          │                                      │
│                          │ ─── Distribution ───                 │
│                          │     0.5 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 312          │
│                          │ 10000.5 ▓▓▓▓▓▓▓▓▓▓ 247              │
│                          │ 20000.5 ▓▓▓▓▓▓▓▓ 198                │
│                          │                                      │
│                          │ ─── Top Values ───                   │
│                          │   99.99 ████████████ 1,204           │
│                          │   49.99 ████████ 892                 │
├──────────────────────────┴──────────────────────────────────────┤
│ ↑/k Up  ↓/j Down  / Search  Tab Cycle  c Corr  f Filter  q Quit │
└─────────────────────────────────────────────────────────────────┘
```
## Features

- Multi-format -- CSV, Parquet, and Arrow IPC out of the box
- Instant loading -- powered by Polars, handles multi-GB files
- stdin piping -- `cat data.csv | dprism explore -`
- Progress indicator -- spinner for large files
- Column profiling -- mean, median, std dev, Q1/Q3, min/max, null %, unique count
- Inline histograms -- distribution plots for numeric columns
- Correlation matrix -- pairwise Pearson correlations between numeric columns
- Outlier detection -- IQR-based outlier highlighting
- Top values -- frequency bar chart per column
- Data preview -- toggle between profile and raw data view with `Tab`
- Column search -- press `/` to find columns by name
- Interactive filtering -- press `f` to filter rows (e.g., `age>30`, `name=Alice`)
- Correlation view -- press `c` to see the correlation matrix
- Type detection -- colour-coded data types (int, float, string, bool, date)
- Keyboard-driven -- vim-style navigation (`j`/`k`, `g`/`G`)
- Schema validation -- validate datasets against YAML rules
- Dataset diff -- compare two datasets and see changes
- JSON export -- dump stats for CI/automation
- Config file -- `~/.dprism.toml` for persistent preferences
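The IQR-based outlier rule is the standard Tukey fence: values outside `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]` are flagged. A minimal sketch in plain Rust (an illustration only, not dprism's actual `stats.rs`, which operates on Polars columns):

```rust
/// Quantile by linear interpolation between closest ranks
/// (one common convention; dprism's exact method may differ).
fn quantile(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let (lo, hi) = (pos.floor() as usize, pos.ceil() as usize);
    sorted[lo] + (pos - pos.floor()) * (sorted[hi] - sorted[lo])
}

/// Tukey-fence rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
fn iqr_outliers(values: &[f64]) -> Vec<f64> {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let (q1, q3) = (quantile(&sorted, 0.25), quantile(&sorted, 0.75));
    let iqr = q3 - q1;
    let (lo, hi) = (q1 - 1.5 * iqr, q3 + 1.5 * iqr);
    values.iter().copied().filter(|v| *v < lo || *v > hi).collect()
}

fn main() {
    let data = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 100.0];
    println!("{:?}", iqr_outliers(&data)); // [100.0]
}
```

With 1.5 as the multiplier, only clearly extreme values are flagged, which matches the small outlier counts shown in the profile pane.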
## Installation

From source:

```sh
git clone https://github.com/whispem/dprism.git
cd dprism
cargo install --path .
```

Or from crates.io:

```sh
cargo install dprism
```

## Usage

Explore a dataset:

```sh
dprism explore data.csv
dprism explore warehouse.parquet
dprism explore events.arrow
```

Pipe from stdin:

```sh
cat data.csv | dprism explore -
curl -s https://example.com/data.csv | dprism explore -
```

Filter rows from the command line:

```sh
dprism explore data.csv --filter "age > 30"
dprism explore data.csv --filter "department = Engineering"
dprism explore data.csv --filter "city ~ York"
```

Other options:

```sh
dprism explore data.csv --delimiter ';'             # Custom CSV delimiter
dprism explore data.csv --no-header                 # First row is data
dprism explore data.csv --head 10000                # Load first 10k rows
dprism explore data.csv --export-stats stats.json   # Export & exit
```

Tip: the alias `ex` works too: `dprism ex data.csv`
## Schema Validation

```sh
dprism validate data.csv schema.yaml
```

Example schema (`schema.yaml`):

```yaml
columns:
  age:
    type: Int64
    nullable: false
    min: 0
    max: 150
  department:
    type: String
    values: [Engineering, Marketing, Sales, HR]
  salary:
    type: Float64
    nullable: true
    min: 0
```

Supported schema rules: `type`, `nullable`, `min`, `max`, `unique`, `values`.
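Conceptually, each rule is a per-value check over a column. A hypothetical sketch covering the `nullable`, `min`, and `max` rules (the struct and function names are invented for illustration; this is not dprism's actual `schema.rs`):

```rust
/// Hypothetical rule struct mirroring the documented schema keys
/// (nullable, min, max) for a numeric column.
struct NumericRule {
    nullable: bool,
    min: Option<f64>,
    max: Option<f64>,
}

/// Count values in a column (nulls modeled as None) that violate the rule.
fn violations(column: &[Option<f64>], rule: &NumericRule) -> usize {
    column
        .iter()
        .filter(|v| match v {
            None => !rule.nullable, // null where nulls are forbidden
            Some(x) => rule.min.map_or(false, |m| *x < m)   // below min
                || rule.max.map_or(false, |m| *x > m),      // above max
        })
        .count()
}

fn main() {
    // The `age` rule from the example schema: non-nullable, 0..=150.
    let age_rule = NumericRule { nullable: false, min: Some(0.0), max: Some(150.0) };
    let ages = [Some(30.0), Some(-5.0), None];
    println!("{}", violations(&ages, &age_rule)); // 2
}
```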
## Dataset Diff

```sh
dprism diff old.csv new.csv
dprism diff v1.parquet v2.parquet
```

The output includes schema changes (columns added/removed/type changed), row count deltas, and per-column value summaries.
## Configuration

Create `~/.dprism.toml` for persistent defaults:

```toml
[defaults]
delimiter = ";"
head = 50000
# no_header = true

[ui]
theme = "dark"
histogram_bins = 20
```

## Keybindings

| Key | Action |
|---|---|
| `↑` / `k` | Previous column |
| `↓` / `j` | Next column |
| `/` | Search columns by name |
| `Tab` | Cycle views: Profile -> Preview -> Corr |
| `c` | Jump to correlation matrix |
| `f` | Filter rows (or clear active filter) |
| `g` / `Home` | Jump to first column |
| `G` / `End` | Jump to last column |
| `q` / `Esc` | Quit (or exit search/filter) |
## Filter Expressions

The same expression syntax works in both the `--filter` CLI flag and the interactive `f` mode:

| Expression | Meaning |
|---|---|
| `age > 30` | Numeric greater-than |
| `salary >= 50000` | Numeric greater-or-equal |
| `name = Alice` | String equality |
| `name != Bob` | String inequality |
| `city ~ York` | String contains |
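A filter expression is just a column name, an operator, and a value. A toy parser sketch using only the operators from the table above (an illustration, not dprism's actual `filter.rs`):

```rust
/// Operators from the filter-expression table.
#[derive(Debug, PartialEq)]
enum Op { Gt, Ge, Eq, Ne, Contains }

/// Split "age > 30" into ("age", Gt, "30"). Multi-character operators
/// are tried first so ">=" is not misread as ">" and "!=" not as "=".
fn parse_filter(expr: &str) -> Option<(String, Op, String)> {
    let ops = [(">=", Op::Ge), ("!=", Op::Ne), (">", Op::Gt), ("=", Op::Eq), ("~", Op::Contains)];
    for (tok, op) in ops {
        if let Some((col, val)) = expr.split_once(tok) {
            return Some((col.trim().to_string(), op, val.trim().to_string()));
        }
    }
    None
}

fn main() {
    println!("{:?}", parse_filter("city ~ York")); // Some(("city", Contains, "York"))
}
```

Operator ordering matters: a naive left-to-right scan that tried `=` before `!=` would split `name != Bob` in the wrong place.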
## Project Structure

```text
src/
├── main.rs            # Entry point & command routing
├── lib.rs             # Public API for tests
├── cli.rs             # Clap-based CLI (explore, validate, diff)
├── config.rs          # ~/.dprism.toml config file support
├── error.rs           # Custom error types (thiserror)
├── data/
│   ├── mod.rs
│   ├── loader.rs      # Polars-powered CSV, Parquet & Arrow IPC loading
│   ├── stats.rs       # Per-column statistics (with histograms & outliers)
│   ├── correlation.rs # Pearson correlation matrix
│   ├── filter.rs      # Row filter expression parser
│   ├── schema.rs      # YAML schema validation
│   └── diff.rs        # Dataset comparison
└── ui/
    ├── mod.rs
    └── explorer.rs    # Ratatui TUI (profile, preview, correlation, filter)
```
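The pairwise Pearson coefficient that `correlation.rs` computes follows the standard formula r = cov(x, y) / (σx σy). A minimal sketch over two columns (an illustration only; the actual implementation works on Polars Series):

```rust
/// Pearson correlation coefficient between two equal-length columns.
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let (mx, my) = (x.iter().sum::<f64>() / n, y.iter().sum::<f64>() / n);
    // Covariance numerator and the two variance terms (unnormalized;
    // the shared 1/n factors cancel in the ratio).
    let cov: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    let (vx, vy): (f64, f64) = (
        x.iter().map(|a| (a - mx).powi(2)).sum(),
        y.iter().map(|b| (b - my).powi(2)).sum(),
    );
    cov / (vx.sqrt() * vy.sqrt())
}

fn main() {
    // Perfectly linearly related columns correlate at 1.0.
    println!("{:.2}", pearson(&[1.0, 2.0, 3.0], &[2.0, 4.0, 6.0])); // 1.00
}
```

The full matrix is just this function applied to every pair of numeric columns.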
## Roadmap

- `dprism watch pipeline.yaml` -- real-time file monitoring
- Webhook alerts for data quality
- JSON / NDJSON format support
- Excel (.xlsx) format support
- Plugin system (custom stats, custom views)
- ML model benchmark runner
- Homebrew & apt packages
- WASM playground
## Contributing

Contributions are welcome! Please:

- Fork the repository
- Create a feature branch (`git checkout -b feat/amazing-feature`)
- Write tests for new functionality
- Ensure `cargo clippy` and `cargo fmt` pass
- Open a PR with a clear description
## License

MIT -- see [LICENSE](LICENSE) for details.

Built with love and Rust by [@whispem](https://github.com/whispem)