pandas_diff

Generate event logs of row-level changes between two pandas DataFrames.

pandas_diff is not a statistical comparison tool. It tells you what changed: which rows were created, deleted, or modified, and exactly which fields changed.

Installation

pip install pandas_diff

# With Parquet support
pip install "pandas_diff[parquet]"  # quotes keep shells such as zsh from globbing the brackets

Quick start

import pandas as pd
from pandas_diff import get_diffs

before = pd.DataFrame([
    {"hero": "hulk", "power": "strength"},
    {"hero": "black_widow", "power": "spy"},
    {"hero": "thor", "hammers": 0},
])
after = pd.DataFrame([
    {"hero": "hulk", "power": "smart"},
    {"hero": "captain marvel", "power": "strength"},
    {"hero": "thor", "hammers": 2},
])

df = get_diffs(before, after, keys="hero")

operation  object_keys  object_values   attribute_changed  old_value  new_value
create     [hero]       captain marvel
delete     [hero]       black_widow
modify     [hero]       hulk            power              strength   smart
modify     [hero]       thor            hammers            0          2
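
Because the result is an ordinary DataFrame, it can be filtered and summarized with plain pandas. The sketch below does not call pandas_diff. It builds a stand-in frame by hand that mirrors the table above, so the exact cell types (for example, object_keys as a list) are assumptions.

```python
import pandas as pd

# Hand-built stand-in for the get_diffs output shown above (cell types are assumed).
events = pd.DataFrame(
    [
        {"operation": "create", "object_keys": ["hero"], "object_values": "captain marvel"},
        {"operation": "delete", "object_keys": ["hero"], "object_values": "black_widow"},
        {"operation": "modify", "object_keys": ["hero"], "object_values": "hulk",
         "attribute_changed": "power", "old_value": "strength", "new_value": "smart"},
        {"operation": "modify", "object_keys": ["hero"], "object_values": "thor",
         "attribute_changed": "hammers", "old_value": 0, "new_value": 2},
    ]
)

# Count events per operation type.
counts = events["operation"].value_counts().to_dict()

# Keep only field-level modifications and the columns that describe them.
modified = events.loc[
    events["operation"] == "modify",
    ["object_values", "attribute_changed", "old_value", "new_value"],
].reset_index(drop=True)

print(counts)
print(modified)
```

Create and delete rows carry no attribute-level fields, so those columns hold missing values for them.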

CLI

pandas_diff before.csv after.csv --keys id
pandas_diff old.parquet new.parquet --keys name,date --format json
pandas_diff a.csv b.csv --keys id --ignore updated_at -o diff.csv

Supported file formats: CSV, JSON (flat records), Parquet.

Use cases

Batch to event-driven migration: Detect changes between pipeline runs and stream them to Kafka.
Audit event logs: Track how resources change over time.
Data reconciliation: Compare a CMDB against the real state of infrastructure.
Environment sync: Propagate changes between production and disaster recovery.
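
For the streaming use case, each diff row can become one message. The following is a minimal sketch, assuming a diff frame with the columns shown in the quick start. The helper names to_messages and _is_missing are hypothetical and not part of pandas_diff, and the producer call is left out because the README does not say which Kafka client to use.

```python
import json
import math

import pandas as pd


def _is_missing(value) -> bool:
    """True for None and float NaN, the placeholders pandas uses for empty cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def to_messages(events: pd.DataFrame) -> list[str]:
    """Serialize each diff row as one JSON string, dropping empty fields."""
    messages = []
    for record in events.to_dict(orient="records"):
        # Create/delete rows have no attribute-level fields, so skip missing values.
        clean = {k: v for k, v in record.items() if not _is_missing(v)}
        # default=str is a fallback for values json cannot encode natively.
        messages.append(json.dumps(clean, sort_keys=True, default=str))
    return messages


# Hand-built stand-in for a get_diffs result (assumed shape).
events = pd.DataFrame(
    [
        {"operation": "create", "object_values": "captain marvel"},
        {"operation": "modify", "object_values": "hulk",
         "attribute_changed": "power", "old_value": "strength", "new_value": "smart"},
    ]
)

messages = to_messages(events)
for message in messages:
    print(message)  # hand each string to the producer of your choice
```

Dropping missing fields keeps create and delete messages compact and avoids emitting NaN, which is not valid JSON.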

API

get_diffs(
    before: pd.DataFrame,       # Previous state
    after: pd.DataFrame,        # Current state
    keys: list[str] | str,      # Column(s) identifying each row
    ignore_columns: list[str],  # Columns to skip (optional)
) -> pd.DataFrame

Returns a DataFrame with columns: operation, object_keys, object_values, object_json, attribute_changed, old_value, new_value.
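
To make the semantics of keys and ignore_columns concrete, here is a plain-pandas sketch of a row-level diff. It is an illustration only, not the implementation behind get_diffs: the function name sketch_diffs is hypothetical, it omits the object_json column, and it assumes unique keys and scalar cell values.

```python
import pandas as pd


def sketch_diffs(before, after, keys, ignore_columns=None):
    """Illustrative row-level diff; NOT pandas_diff's implementation."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    ignored = set(ignore_columns or [])

    # Map each key (a scalar, or a tuple for composite keys) to its row as a dict.
    b_rows = before.set_index(keys).to_dict(orient="index")
    a_rows = after.set_index(keys).to_dict(orient="index")

    events = []
    # Keys only in `after` are creations; keys only in `before` are deletions.
    for key in a_rows:
        if key not in b_rows:
            events.append({"operation": "create", "object_keys": keys, "object_values": key})
    for key in b_rows:
        if key not in a_rows:
            events.append({"operation": "delete", "object_keys": keys, "object_values": key})

    # Keys present on both sides: compare every shared, non-ignored column.
    for key, old_row in b_rows.items():
        new_row = a_rows.get(key)
        if new_row is None:
            continue
        for col, old in old_row.items():
            if col in ignored or col not in new_row:
                continue
            new = new_row[col]
            if pd.isna(old) and pd.isna(new):
                continue  # missing on both sides counts as unchanged
            if old != new:
                events.append({
                    "operation": "modify", "object_keys": keys, "object_values": key,
                    "attribute_changed": col, "old_value": old, "new_value": new,
                })

    columns = ["operation", "object_keys", "object_values",
               "attribute_changed", "old_value", "new_value"]
    return pd.DataFrame(events, columns=columns)


before = pd.DataFrame([
    {"name": "api", "date": "2024-01-01", "replicas": 2, "updated_at": "09:00"},
    {"name": "db", "date": "2024-01-01", "replicas": 1, "updated_at": "09:00"},
])
after = pd.DataFrame([
    {"name": "api", "date": "2024-01-01", "replicas": 3, "updated_at": "10:00"},
    {"name": "web", "date": "2024-01-01", "replicas": 1, "updated_at": "10:00"},
])

events = sketch_diffs(before, after, keys=["name", "date"], ignore_columns=["updated_at"])
print(events)
```

With a composite key, each row is identified by the tuple of its key values, and changes in an ignored column such as updated_at produce no modify event.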

License

MIT
