harvard-lil/binoc

Binoc: The Missing Changelog for Datasets

Binoc generates changelogs for datasets that don't have them. Given a series of snapshots of a dataset downloaded at different times, it detects what changed, expresses those changes as a minimal structured diff, and produces human-readable summaries that distinguish substantive policy changes from clerical housekeeping. The primary audience is archivists, data scientists, and stewards tracking undocumented changes to published datasets.
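
To make the idea of diffing snapshots concrete, here is a minimal Python sketch of the simplest possible comparison: hash every file in two snapshot directories and report which paths were added, removed, or modified. This is an illustration only, not binoc's implementation or API. The function names `file_digests` and `diff_snapshots` and the shape of the returned dictionary are hypothetical, and a file-level hash comparison says nothing about what changed inside a file.

```python
import hashlib
from pathlib import Path


def file_digests(snapshot_dir):
    """Map each file path (relative to the snapshot root) to its SHA-256 digest."""
    root = Path(snapshot_dir)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(old_dir, new_dir):
    """Hypothetical helper: classify paths as added, removed, or modified.

    This is a byte-level comparison only; it does not interpret file formats.
    """
    old = file_digests(old_dir)
    new = file_digests(new_dir)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(
            p for p in old.keys() & new.keys() if old[p] != new[p]
        ),
    }
```

Calling `diff_snapshots("path/to/snapshot-a", "path/to/snapshot-b")` returns three sorted lists of relative paths. A comparison like this can only say that a file changed; describing how it changed, and whether the change is substantive, is the part the description above attributes to binoc.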

Documentation: https://harvard-lil.github.io/binoc/

Install

pip install binoc

Or run without installing:

uvx binoc diff path/to/snapshot-a path/to/snapshot-b

Plugins extend binoc with domain-specific format support and install the same way:

pip install binoc-sqlite          # semantic SQLite schema + row-count diffing

The documentation site has tutorials, how-to recipes, reference material for the CLI, Python API, Rust SDK, and changeset schema, and explanations of the architecture. Start with the Tutorial if you're new, the "Start here" page for a role-based map of the site, or the Architecture overview if you're evaluating or extending binoc.

Project status

Binoc is in a collaborative design phase. The CLI is ready to use; internals are unstable and expected to change. We welcome feedback, plugin authors, and contributors.

Architectural ground rules

The contract for human and AI contributors lives in AGENTS.md. The long-form record of every major design decision lives in docs/adr/.
