whispem/dprism

dprism

Decompose your data at the speed of light.

htop meets pandas-profiling -- written in Rust.


What is dprism?

dprism is a terminal-native tool that lets you explore, profile, and understand datasets instantly -- without leaving your terminal, without spinning up Jupyter, without writing a single line of code.

Built with Polars for lightning-fast data processing and Ratatui for a beautiful, responsive TUI.

┌─────────────────────────────────────────────────────────────────┐
│ dprism   sales.csv   1,000,000 rows x 12 cols | 48 MiB     │
├──────────────────────────┬──────────────────────────────────────┤
│  Columns (12)        │  Column Profile                  │
│                          │                                      │
│  #  Column      Type  N  │  revenue  [f64]                     │
│  1  id          i64  0%  │                                      │
│  2  date        date 0%  │  ─── Overview ───                   │
│ >3  revenue     f64  2%  │  Count ............ 1,000,000       │
│  4  region      str  0%  │  Null count ........ 20,000         │
│  5  category    str  1%  │  Null % ............ 2.00%          │
│                          │  Unique ............ 847,231        │
│                          │                                      │
│                          │  ─── Statistics ───                  │
│                          │  Mean ............. 4,521.87        │
│                          │  Median ........... 3,200.00        │
│                          │  Std Dev .......... 2,876.43        │
│                          │  Q1 (25%) ......... 1,250.00       │
│                          │  Q3 (75%) ......... 6,800.00       │
│                          │  Min .............. 0.50            │
│                          │  Max .............. 99,999.99       │
│                          │  (!) Outliers ........ 42 (0.004%)   │
│                          │                                      │
│                          │  ─── Distribution ───               │
│                          │      0.5 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 312       │
│                          │  10000.5 ▓▓▓▓▓▓▓▓▓▓ 247            │
│                          │  20000.5 ▓▓▓▓▓▓▓▓ 198              │
│                          │                                      │
│                          │  ─── Top Values ───                 │
│                          │  99.99        ████████████ 1,204    │
│                          │  49.99        ████████ 892          │
├──────────────────────────┴──────────────────────────────────────┤
│ ↑/k Up  ↓/j Down  / Search  Tab Cycle  c Corr  f Filter  q Q  │
└─────────────────────────────────────────────────────────────────┘

Features

Data Loading

  • Multi-format -- CSV, Parquet, and Arrow IPC out of the box
  • Instant loading -- powered by Polars, handles multi-GB files
  • stdin piping -- cat data.csv | dprism explore -
  • Progress indicator -- spinner for large files

Profiling & Statistics

  • Column profiling -- mean, median, std dev, Q1/Q3, min/max, null %, unique count
  • Inline histograms -- distribution plots for numeric columns
  • Correlation matrix -- pairwise Pearson correlations between numeric columns
  • Outlier detection -- IQR-based outlier highlighting
  • Top values -- frequency bar chart per column
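The README does not spell out how the IQR-based outlier check is computed. A minimal sketch of the usual 1.5 x IQR rule follows, written against plain slices rather than Polars. The function names and the linear-interpolation quantile are assumptions for illustration, not dprism's actual code.

```rust
/// Linear-interpolated quantile of a sorted, non-empty slice, `q` in [0, 1].
/// The interpolation rule is an assumption; other conventions give
/// slightly different quartiles.
fn quantile_sorted(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    sorted[lo] + (sorted[hi] - sorted[lo]) * frac
}

/// Returns (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule,
/// or None when there are no usable values.
fn iqr_outliers(values: &[f64]) -> Option<(f64, f64, Vec<f64>)> {
    // Drop NaNs so the comparison used for sorting never fails.
    let mut sorted: Vec<f64> = values.iter().copied().filter(|v| !v.is_nan()).collect();
    if sorted.is_empty() {
        return None;
    }
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());

    let q1 = quantile_sorted(&sorted, 0.25);
    let q3 = quantile_sorted(&sorted, 0.75);
    let iqr = q3 - q1;
    let lower = q1 - 1.5 * iqr;
    let upper = q3 + 1.5 * iqr;

    // Anything strictly outside the fences counts as an outlier.
    let outliers = sorted
        .iter()
        .copied()
        .filter(|v| *v < lower || *v > upper)
        .collect();
    Some((lower, upper, outliers))
}
```

With the quartiles from the mock-up above (Q1 = 1,250 and Q3 = 6,800), the IQR is 5,550, so this rule would place the fences at -7,075 and 15,125.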

Interactive TUI

  • Data preview -- toggle between profile and raw data view with Tab
  • Column search -- press / to find columns by name
  • Interactive filtering -- press f to filter rows (e.g., age>30, name=Alice)
  • Correlation view -- press c to see the correlation matrix
  • Type detection -- colour-coded data types (int, float, string, bool, date)
  • Keyboard-driven -- vim-style navigation (j/k, g/G)

CLI Tools

  • Schema validation -- validate datasets against YAML rules
  • Dataset diff -- compare two datasets and see changes
  • JSON export -- dump stats for CI/automation
  • Config file -- ~/.dprism.toml for persistent preferences

Installation

From source (requires Rust 1.75+)

git clone https://github.com/whispem/dprism.git
cd dprism
cargo install --path .

From crates.io

cargo install dprism

Usage

Explore a dataset

dprism explore data.csv
dprism explore warehouse.parquet
dprism explore events.arrow

Pipe from stdin

cat data.csv | dprism explore -
curl -s https://example.com/data.csv | dprism explore -

Explore with filtering

dprism explore data.csv --filter "age > 30"
dprism explore data.csv --filter "department = Engineering"
dprism explore data.csv --filter "city ~ York"

Options

dprism explore data.csv --delimiter ';'              # Custom CSV delimiter
dprism explore data.csv --no-header                  # First row is data
dprism explore data.csv --head 10000                 # Load first 10k rows
dprism explore data.csv --export-stats stats.json    # Export & exit

Tip: The alias ex works too: dprism ex data.csv

Validate a dataset against a YAML schema

dprism validate data.csv schema.yaml

Example schema (schema.yaml):

columns:
  age:
    type: Int64
    nullable: false
    min: 0
    max: 150
  department:
    type: String
    values: [Engineering, Marketing, Sales, HR]
  salary:
    type: Float64
    nullable: true
    min: 0

Schema rules: type, nullable, min, max, unique, values.
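To make the nullable, min, and max rules concrete, here is a rough sketch of how such checks could be applied to one numeric column. The `NumericRule` struct, the `validate_numeric` function, and the message wording are hypothetical; the README does not describe dprism's internal types or its output format.

```rust
/// Hypothetical in-memory form of the numeric rules for one column.
struct NumericRule {
    nullable: bool,
    min: Option<f64>,
    max: Option<f64>,
}

/// Checks every cell against the rule and returns one message per violation.
/// `None` stands for a null cell; row numbers are zero-based.
fn validate_numeric(name: &str, rule: &NumericRule, values: &[Option<f64>]) -> Vec<String> {
    let mut errors = Vec::new();
    for (row, value) in values.iter().enumerate() {
        match *value {
            None => {
                if !rule.nullable {
                    errors.push(format!("{name}: row {row} is null"));
                }
            }
            Some(v) => {
                if let Some(min) = rule.min {
                    if v < min {
                        errors.push(format!("{name}: row {row} is below min {min}"));
                    }
                }
                if let Some(max) = rule.max {
                    if v > max {
                        errors.push(format!("{name}: row {row} is above max {max}"));
                    }
                }
            }
        }
    }
    errors
}
```

Applied to the `age` rules from the example schema (not nullable, 0 to 150), a column holding 34, a null, -1, and 200 would yield three violations. The `type`, `unique`, and `values` rules are left out of this sketch.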

Compare two datasets

dprism diff old.csv new.csv
dprism diff v1.parquet v2.parquet

Output includes schema changes (columns added/removed/type changed), row count deltas, and per-column value summaries.
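The schema part of that comparison can be pictured as a diff between two maps from column name to type. The sketch below assumes a hypothetical `SchemaChange` enum and plain string type names; it illustrates the idea only and is not taken from `diff.rs`.

```rust
use std::collections::BTreeMap;

/// Hypothetical representation of one schema-level difference.
#[derive(Debug, PartialEq)]
enum SchemaChange {
    Added { column: String, dtype: String },
    Removed { column: String, dtype: String },
    TypeChanged { column: String, from: String, to: String },
}

/// Compares two schemas given as column name -> type name.
/// Removed and changed columns are reported first, then added ones.
fn diff_schema(
    old: &BTreeMap<String, String>,
    new: &BTreeMap<String, String>,
) -> Vec<SchemaChange> {
    let mut changes = Vec::new();
    for (column, old_type) in old {
        match new.get(column) {
            None => changes.push(SchemaChange::Removed {
                column: column.clone(),
                dtype: old_type.clone(),
            }),
            Some(new_type) if new_type != old_type => changes.push(SchemaChange::TypeChanged {
                column: column.clone(),
                from: old_type.clone(),
                to: new_type.clone(),
            }),
            Some(_) => {} // Same name and type: nothing to report.
        }
    }
    for (column, new_type) in new {
        if !old.contains_key(column) {
            changes.push(SchemaChange::Added {
                column: column.clone(),
                dtype: new_type.clone(),
            });
        }
    }
    changes
}
```

A `BTreeMap` keeps the report in a stable, alphabetical order, which tends to matter when diff output is compared in CI.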

Config file

Create ~/.dprism.toml for persistent defaults:

[defaults]
delimiter = ";"
head = 50000
# no_header = true

[ui]
theme = "dark"
histogram_bins = 20
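As an illustration of what a setting like `histogram_bins` could control, the sketch below splits a numeric column into equal-width bins and counts the values in each. Equal-width binning is an assumption (the bin labels in the mock-up are consistent with it), and the `histogram` function is hypothetical.

```rust
/// Splits `values` into `bins` equal-width buckets between the minimum and
/// maximum, returning (lower_edge, count) per bucket. Non-finite values are
/// ignored. Assumption: equal-width bins; dprism may bin differently.
fn histogram(values: &[f64], bins: usize) -> Vec<(f64, usize)> {
    let finite: Vec<f64> = values.iter().copied().filter(|v| v.is_finite()).collect();
    if finite.is_empty() || bins == 0 {
        return Vec::new();
    }
    let min = finite.iter().copied().fold(f64::INFINITY, f64::min);
    let max = finite.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let width = (max - min) / bins as f64;

    let mut counts = vec![0usize; bins];
    for v in &finite {
        let idx = if width == 0.0 {
            0 // All values identical: everything lands in the first bucket.
        } else {
            // The maximum would index one past the end, so clamp it
            // into the last bucket.
            (((v - min) / width) as usize).min(bins - 1)
        };
        counts[idx] += 1;
    }
    counts
        .iter()
        .enumerate()
        .map(|(i, c)| (min + i as f64 * width, *c))
        .collect()
}
```

Each returned pair maps naturally onto one bar of the Distribution panel: the lower edge becomes the label and the count sets the bar length.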

Keyboard shortcuts

Key        Action
↑ / k      Previous column
↓ / j      Next column
/          Search columns by name
Tab        Cycle views: Profile -> Preview -> Corr
c          Jump to correlation matrix
f          Filter rows (or clear active filter)
g / Home   Jump to first column
G / End    Jump to last column
q / Esc    Quit (or exit search/filter)

Filter expressions

The same expressions work in both the --filter CLI flag and interactive f mode:

Expression        Meaning
age > 30          Numeric greater-than
salary >= 50000   Numeric greater-or-equal
name = Alice      String equality
name != Bob       String inequality
city ~ York       String contains
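A small parser for these five operators might look like the sketch below. The `Op`, `Filter`, `parse_filter`, and `row_matches` names are hypothetical, only the operators documented above are handled, and the real parser in `filter.rs` may behave differently (for example around quoting or case sensitivity).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Gt,
    Ge,
    Eq,
    Ne,
    Contains,
}

#[derive(Debug, PartialEq)]
struct Filter {
    column: String,
    op: Op,
    value: String,
}

/// Splits an expression such as "age > 30" or "age>30" at its operator.
fn parse_filter(expr: &str) -> Option<Filter> {
    // Two-character operators are listed first so that ">=" is not read
    // as ">" and "!=" is not read as "=".
    const OPS: [(&str, Op); 5] = [
        (">=", Op::Ge),
        ("!=", Op::Ne),
        (">", Op::Gt),
        ("=", Op::Eq),
        ("~", Op::Contains),
    ];
    let mut best: Option<(usize, &str, Op)> = None;
    for &(token, op) in OPS.iter() {
        if let Some(pos) = expr.find(token) {
            // Keep the leftmost operator; on a tie the earlier (longer) one wins.
            if best.map_or(true, |(b, _, _)| pos < b) {
                best = Some((pos, token, op));
            }
        }
    }
    let (pos, token, op) = best?;
    let column = expr[..pos].trim();
    let value = expr[pos + token.len()..].trim();
    if column.is_empty() || value.is_empty() {
        return None;
    }
    Some(Filter { column: column.to_string(), op, value: value.to_string() })
}

/// Applies a parsed filter to one cell, given as text.
fn row_matches(filter: &Filter, cell: &str) -> bool {
    match filter.op {
        Op::Gt | Op::Ge => match (cell.trim().parse::<f64>(), filter.value.parse::<f64>()) {
            (Ok(c), Ok(v)) => {
                if filter.op == Op::Gt { c > v } else { c >= v }
            }
            _ => false, // Non-numeric cells never satisfy a numeric comparison.
        },
        Op::Eq => cell == filter.value,
        Op::Ne => cell != filter.value,
        Op::Contains => cell.contains(filter.value.as_str()),
    }
}
```

Under these assumptions, `parse_filter("city ~ York")` yields a `Contains` filter on `city`, and `row_matches` accepts a cell such as `New York`.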

Architecture

src/
├── main.rs          # Entry point & command routing
├── lib.rs           # Public API for tests
├── cli.rs           # Clap-based CLI (explore, validate, diff)
├── config.rs        # ~/.dprism.toml config file support
├── error.rs         # Custom error types (thiserror)
├── data/
│   ├── mod.rs
│   ├── loader.rs    # Polars-powered CSV, Parquet & Arrow IPC loading
│   ├── stats.rs     # Per-column statistics (with histograms & outliers)
│   ├── correlation.rs # Pearson correlation matrix
│   ├── filter.rs    # Row filter expression parser
│   ├── schema.rs    # YAML schema validation
│   └── diff.rs      # Dataset comparison
└── ui/
    ├── mod.rs
    └── explorer.rs  # Ratatui TUI (profile, preview, correlation, filter)
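The tree lists `correlation.rs` as the home of the Pearson correlation matrix. For readers unfamiliar with the statistic, a plain-Rust sketch follows. It works on `Vec<f64>` columns rather than Polars series, and the `pearson` and `correlation_matrix` functions are illustrative, not the module's real API.

```rust
/// Pearson correlation coefficient of two equally long samples.
/// Returns None when the inputs differ in length, have fewer than two
/// points, or when either sample has zero variance.
fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    let (mut cov, mut var_x, mut var_y) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        let dx = a - mean_x;
        let dy = b - mean_y;
        cov += dx * dy;
        var_x += dx * dx;
        var_y += dy * dy;
    }
    if var_x == 0.0 || var_y == 0.0 {
        return None; // A constant column has no defined correlation.
    }
    Some(cov / (var_x.sqrt() * var_y.sqrt()))
}

/// Pairwise correlations between all columns; entry [i][j] compares
/// column i with column j.
fn correlation_matrix(columns: &[Vec<f64>]) -> Vec<Vec<Option<f64>>> {
    columns
        .iter()
        .map(|a| columns.iter().map(|b| pearson(a, b)).collect())
        .collect()
}
```

The matrix is symmetric, so a real implementation could compute only the upper triangle; the full double loop is kept here for readability.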

Roadmap

v1.1 -- Pipeline Mode

  • dprism watch pipeline.yaml -- real-time file monitoring
  • Webhook alerts for data quality
  • JSON / NDJSON format support
  • Excel (.xlsx) format support

v2.0 -- Extensibility

  • Plugin system (custom stats, custom views)
  • ML model benchmark runner
  • Homebrew & apt packages
  • WASM playground

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feat/amazing-feature)
  3. Write tests for new functionality
  4. Ensure cargo clippy and cargo fmt pass
  5. Open a PR with a clear description

License

MIT -- see LICENSE for details.


Built with love and Rust by @whispem
