sofia-willow/dataweaver

🧵 DataWeaver

Python 3.10+ · License: MIT · Code style: black

CLI data transformation and pipeline tool. Query, filter, convert, and inspect CSV, JSON, JSONL, and YAML files from your terminal.

✨ Features

  • 🔍 Query — SQL-like filtering with expressions: age > 30 AND city == 'Madrid'
  • 🔄 Convert — Transform CSV, JSON, JSONL, and YAML input into any of those formats or a Markdown table
  • 📊 Inspect — View schema, data types, statistics, and previews
  • 📦 Pipe-friendly — Reads from stdin, writes to stdout — plays nice with Unix pipes
  • 🎨 Rich output — Beautiful terminal tables powered by Rich

📦 Installation

pip install .

Or install in development mode:

pip install -e ".[dev]"

🚀 Usage

Query — Filter your data

# Filter rows from a CSV
dataweaver query data.csv --where "age > 30 AND city == 'Madrid'"

# Select specific columns
dataweaver query data.json --where "status == 'active'" --select name,email,score

# Sort results
dataweaver query data.csv --where "score >= 80" --sort score --desc

# Output as JSON instead of table
dataweaver query data.csv --where "age > 25" -o json

Filter expressions

Filter expressions combine comparison operators, boolean logic (AND, OR, NOT), a contains() function, and parentheses for grouping:

| Operator | Example |
| --- | --- |
| == | city == 'Madrid' |
| != | status != 'inactive' |
| > < >= <= | age > 30 |
| AND | age > 30 AND city == 'Madrid' |
| OR | role == 'admin' OR role == 'superadmin' |
| NOT | NOT status == 'banned' |
| contains() | contains(email, 'gmail') |
| Parentheses | (age > 20 AND age < 40) OR city == 'Madrid' |
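
To make the operators above concrete, here is a minimal sketch of how a recursive-descent evaluator could handle them. It is illustrative only and is not taken from DataWeaver's query.py: the names tokenize, Parser, and matches are hypothetical, and the precedence (NOT binds tighter than AND, which binds tighter than OR) and the substring meaning of contains() are assumptions.

```python
import operator
import re

# Hypothetical sketch: not DataWeaver's actual query.py.
TOKEN = re.compile(
    r"\s*(?:(\d+(?:\.\d+)?)|'([^']*)'|(==|!=|>=|<=|>|<|\(|\)|,)|(\w+))"
)
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def tokenize(text):
    """Split an expression into (kind, value) tokens."""
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        number, string, symbol, word = m.groups()
        if number is not None:
            tokens.append(("lit", float(number)))
        elif string is not None:
            tokens.append(("lit", string))
        elif symbol is not None or word in ("AND", "OR", "NOT"):
            tokens.append(("sym", symbol or word))
        else:
            tokens.append(("name", word))  # column or function name
        pos = m.end()
    return tokens


class Parser:
    """Evaluates the token stream against one row while parsing it."""

    def __init__(self, tokens, row):
        self.tokens, self.pos, self.row = tokens, 0, row

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def take(self, expected=None):
        kind, value = self.peek()
        if kind is None or (expected and (kind, value) != ("sym", expected)):
            raise ValueError(f"expected {expected!r}, got {value!r}")
        self.pos += 1
        return kind, value

    def parse_or(self):  # lowest precedence
        result = self.parse_and()
        while self.peek() == ("sym", "OR"):
            self.take()
            right = self.parse_and()  # always parse to consume tokens
            result = result or right
        return result

    def parse_and(self):
        result = self.parse_not()
        while self.peek() == ("sym", "AND"):
            self.take()
            right = self.parse_not()
            result = result and right
        return result

    def parse_not(self):
        if self.peek() == ("sym", "NOT"):
            self.take()
            return not self.parse_not()
        return self.parse_atom()

    def parse_atom(self):
        if self.peek() == ("sym", "("):
            self.take()
            result = self.parse_or()
            self.take(")")
            return result
        if self.peek() == ("name", "contains"):
            self.take()
            self.take("(")
            haystack = self.operand()
            self.take(",")
            needle = self.operand()
            self.take(")")
            return str(needle) in str(haystack)
        left = self.operand()
        _, op = self.take()
        return OPS[op](left, self.operand())

    def operand(self):
        kind, value = self.take()
        return self.row.get(value) if kind == "name" else value


def matches(expression, row):
    """Return True if the row (a dict) satisfies the expression."""
    parser = Parser(tokenize(expression), row)
    result = parser.parse_or()
    if parser.pos != len(parser.tokens):
        raise ValueError("unexpected trailing tokens")
    return bool(result)
```

Calling matches("age > 30 AND city == 'Madrid'", {"age": 35, "city": "Madrid"}) evaluates the expression against a single row. The sketch evaluates while parsing to stay short; a real implementation would more likely build a tree once and reuse it for every row.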

Convert — Transform between formats

# CSV to JSON
dataweaver convert data.csv -o json

# JSON to YAML
dataweaver convert data.json -o yaml

# CSV to Markdown table
dataweaver convert data.csv -o markdown

# JSONL to CSV, saving to file
dataweaver convert events.jsonl -o csv --out events.csv
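
For readers curious what a conversion like the ones above involves, the following is a rough sketch using only the Python standard library. The helper convert_csv_text is hypothetical and is not part of DataWeaver. It also leaves every CSV value as a string, whereas the real tool may infer types.

```python
import csv
import io
import json


def convert_csv_text(text, fmt):
    """Convert CSV text to 'json' or 'jsonl' text (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        # One JSON array holding every row as an object.
        return json.dumps(rows, indent=2)
    if fmt == "jsonl":
        # One compact JSON object per line.
        return "\n".join(json.dumps(row) for row in rows)
    raise ValueError(f"unsupported format: {fmt}")


sample = "name,age\nAna,31\nLuis,27\n"
print(convert_csv_text(sample, "jsonl"))
```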

Inspect — Understand your data

# Show schema, types, and statistics
dataweaver inspect data.csv

# Inspect a JSON file
dataweaver inspect users.json

Output includes:

  • Row and column counts
  • Column names, types, and null counts
  • Numeric statistics (min, max, avg, sum)
  • Data preview
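
As an illustration of how the figures listed above could be derived, here is a small standard-library sketch. The function summarize and its output layout are assumptions made for this example, not DataWeaver's actual inspect implementation. It treats empty CSV cells as nulls and calls a column numeric only if every non-empty value parses as a float.

```python
import csv
import io


def summarize(text):
    """Return row/column counts plus per-column nulls and numeric stats."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    summary = {"rows": len(rows), "columns": len(columns), "fields": {}}
    for name in columns:
        values = [row[name] for row in rows]
        present = [v for v in values if v not in ("", None)]
        field = {"nulls": len(values) - len(present)}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            field["type"] = "string"
        else:
            if numbers:
                field.update(
                    type="number",
                    min=min(numbers),
                    max=max(numbers),
                    sum=sum(numbers),
                    avg=sum(numbers) / len(numbers),
                )
            else:
                field["type"] = "unknown"  # column had no values at all
        summary["fields"][name] = field
    return summary


report = summarize("name,score\nAna,90\nLuis,\nEva,70\n")
print(report["fields"]["score"])
```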

Piping — Unix-friendly

# Chain with other tools
cat data.csv | dataweaver query - -f csv --where "age > 30" -o json | jq '.[] | .name'

# Convert API response
curl -s https://api.example.com/users | dataweaver convert - -f json -o csv

# Filter and convert in one pipeline
dataweaver query data.csv --where "status == 'active'" -o jsonl | \
  dataweaver query - -f jsonl --where "score > 80" -o table
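
In the commands above, "-" stands for stdin and is always paired with -f, presumably because a stream has no file extension to infer the format from. A minimal sketch of that convention follows. The helpers open_input and resolve_format are hypothetical and do not come from DataWeaver's cli.py or reader.py.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def open_input(path):
    """Yield a text stream: stdin for '-', otherwise the named file."""
    if path == "-":
        yield sys.stdin  # left open; stdin belongs to the process
    else:
        with open(path, encoding="utf-8", newline="") as handle:
            yield handle


def resolve_format(path, explicit=None):
    """Prefer an explicit format; otherwise fall back to the extension."""
    if explicit:
        return explicit
    if path == "-":
        raise ValueError("reading from stdin requires an explicit format")
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return {"yml": "yaml"}.get(extension, extension)
```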

🏗️ Architecture

dataweaver/
├── __init__.py      # Package metadata
├── __main__.py      # python -m dataweaver entry point
├── cli.py           # Argument parsing and command dispatch
├── models.py        # Dataset dataclass with chainable methods
├── query.py         # Expression parser (recursive descent)
├── reader.py        # File format readers (CSV, JSON, JSONL, YAML)
├── writer.py        # Output writers (CSV, JSON, JSONL, YAML, Markdown, Rich table)
└── transform.py     # Data transformations (select, rename, sort, group, aggregate)
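
The tree describes models.py as a Dataset dataclass with chainable methods. As a rough idea of what that pattern looks like, here is a hypothetical sketch. The method names where, select, and sort and the rows field are assumptions, not the actual API. Each method returns a new Dataset, which is what makes the calls chainable.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for the Dataset in models.py."""

    rows: list[dict] = field(default_factory=list)

    def where(self, predicate):
        # Keep only the rows for which predicate(row) is truthy.
        return Dataset([row for row in self.rows if predicate(row)])

    def select(self, *columns):
        # Project each row onto the requested columns.
        return Dataset([{c: row.get(c) for c in columns} for row in self.rows])

    def sort(self, column, desc=False):
        return Dataset(sorted(self.rows, key=lambda row: row[column], reverse=desc))


data = Dataset([
    {"name": "Ana", "score": 88, "city": "Madrid"},
    {"name": "Luis", "score": 72, "city": "Sevilla"},
    {"name": "Eva", "score": 95, "city": "Madrid"},
])
top = data.where(lambda r: r["score"] >= 80).sort("score", desc=True).select("name", "score")
print(top.rows)
```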

🧪 Testing

pip install -e ".[dev]"
pytest -v

📄 License

MIT © Sofia Willow 2026
