
Olly

Olly is a data observability framework that monitors data warehouses for schema changes, volume anomalies, freshness issues, cross-source integrity, cost spikes, unused tables, and contract violations. Currently supports DuckDB, Postgres, BigQuery, and Snowflake via Ibis.

By default, the schema checks are metadata-only, making it lightweight and low-cost.

🚨 This should be considered experimental and subject to change.

Install

uv add olly-core

Install with an adapter:

uv add "olly-core[duckdb]"    # or postgres, bigquery, snowflake

Quickstart

olly init                 # interactive setup → creates olly.toml
olly snapshot             # capture current warehouse state
olly check                # detect changes since last snapshot

That's it. Olly compares consecutive snapshots and reports schema changes, volume anomalies, and stale tables out of the box. You can add more specific configuration for high-value objects.

What's available

Check            What it detects                                                                 Severity
Schema           Added/removed tables, added/removed/changed columns, nullability changes        error or warning
Volume           Row count anomalies using EWMA (default) or z-score over snapshot history       warning
Freshness        Tables not updated within the configured threshold                              warning
Integrity        Row count or hash mismatches between source and target databases                error
Contracts        Schema violations against declared Python contracts                             error or warning
dbt integration  Test failures, execution-time anomalies, and DDL changes from run_results.json  error or warning
Usage            Tables not queried within a lookback window (BigQuery only)                     error or warning
Cost             Query cost spikes (BigQuery only)                                               warning

Example output

Findings
┏━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Check     ┃ Severity ┃ Table          ┃ Description                                                             ┃
┡━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ schema    │ error    │ main.customers │ Column removed: main.customers.email                                    │
│ schema    │ warning  │ main.orders    │ New column: main.orders.status                                          │
│ freshness │ warning  │ main.payments  │ Stale data: main.payments — last update 30.0h ago (threshold: 24.0h)    │
│ volume    │ warning  │ main.orders    │ Row count anomaly (increase): main.orders (12,345 rows, z-score: +3.20) │
└───────────┴──────────┴────────────────┴─────────────────────────────────────────────────────────────────────────┘

1 error(s), 3 warning(s)

Use --json for machine-readable output.

CLI reference

olly init              Run the interactive setup wizard
olly snapshot          Capture current warehouse state
  --verbose            Print detailed progress
  --connection NAME    Only snapshot this connection
olly check             Run data quality checks
  --json               Machine-readable JSON output
  --verbose            Print detailed progress
  --write-results      Persist findings to ~/.olly/findings.json (default: true)
  --no-write-results   Disable writing findings to disk
  --connection NAME    Only check this connection
olly plan              Show resolved config for each table
  --connection NAME    Only explain this connection
olly unused            Show unused and stale tables
  --json               Machine-readable JSON output
  --verbose            Print detailed progress
  --connection NAME    Only check this connection
olly debug             Test connectivity to the configured warehouse
  --connection NAME    Only test this connection
olly clean             Delete the local state database
  --yes                Skip confirmation prompt
olly create-state      Create warehouse state schema and tables
  --connection NAME    Only create state for this connection
olly serve             Start the web dashboard
  --host HOST          Bind address (default: 127.0.0.1)
  --port PORT          Port (default: 8000)

Configuration

Olly is configured via olly.toml in your project root. Run olly init to generate one interactively, or create it by hand.

Connections

Each named connection under [connections.<name>] specifies a warehouse backend via type plus backend-specific fields. You can monitor multiple warehouses from a single config:

# DuckDB
[connections.primary]
type = "duckdb"
path = "warehouse.duckdb"          # omit for in-memory

# Postgres (second connection)
[connections.analytics]
type = "postgres"
url = "${DATABASE_URL}"

# BigQuery
[connections.warehouse]
type = "bigquery"
project = "my-project"
dataset = "analytics"                       # optional
use_information_schema_row_counts = true     # optional, default true

# Snowflake
[connections.snowflake]
type = "snowflake"
account = "my-account"
database = "my_db"                          # optional
use_account_usage = false                   # optional, default false
user = "my-user"                            # optional, forwarded to Ibis
role = "ANALYST"                            # optional, forwarded to Ibis
warehouse = "COMPUTE_WH"                    # optional, forwarded to Ibis

Any extra keys beyond the standard fields are forwarded as keyword arguments to the underlying Ibis connect() call. This lets you pass adapter-specific options (e.g. Snowflake user, role, warehouse, or DuckDB read_only) without Olly needing to know about them.
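The forwarding behavior can be sketched as follows. This is an illustration of the pattern, not Olly's actual code, and the set of reserved keys shown is an assumption:

```python
# Sketch of the forwarding pattern described above (illustrative only;
# the exact set of keys Olly itself consumes is an assumption here).
KNOWN_KEYS = {"type", "path", "url"}

def split_connect_kwargs(conn_cfg: dict) -> tuple[dict, dict]:
    """Split a connection config into framework-known fields and
    extra keyword arguments destined for the Ibis connect() call."""
    known = {k: v for k, v in conn_cfg.items() if k in KNOWN_KEYS}
    extra = {k: v for k, v in conn_cfg.items() if k not in KNOWN_KEYS}
    return known, extra

known, extra = split_connect_kwargs(
    {"type": "duckdb", "path": "warehouse.duckdb", "read_only": True}
)
# `extra` would then be forwarded, e.g. ibis.duckdb.connect(path, **extra)
```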

For BigQuery, set credentials via:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

Selection

Each connection has its own selection. By default, all schemas are monitored (except information_schema) and all tables are included. Use glob patterns (*) to filter:

[connections.primary.selection]
include_schemas = ["*"]
exclude_schemas = ["information_schema", "scratch", "dev"]
include_tables = ["*.*"]
exclude_tables = ["main.tmp_*", "main.staging_*"]
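As a rough sketch of how a selection like the one above could resolve (Olly's exact matching semantics may differ; this uses Python's fnmatch-style globbing):

```python
from fnmatch import fnmatch

def table_selected(qualified: str, selection: dict) -> bool:
    """A table is monitored if its schema passes the schema filters
    and its schema-qualified name passes the table filters."""
    schema = qualified.split(".", 1)[0]
    if not any(fnmatch(schema, p) for p in selection["include_schemas"]):
        return False
    if any(fnmatch(schema, p) for p in selection["exclude_schemas"]):
        return False
    if not any(fnmatch(qualified, p) for p in selection["include_tables"]):
        return False
    if any(fnmatch(qualified, p) for p in selection["exclude_tables"]):
        return False
    return True

sel = {
    "include_schemas": ["*"],
    "exclude_schemas": ["information_schema", "scratch", "dev"],
    "include_tables": ["*.*"],
    "exclude_tables": ["main.tmp_*", "main.staging_*"],
}
table_selected("main.orders", sel)      # True
table_selected("main.tmp_load", sel)    # False: excluded by main.tmp_*
table_selected("scratch.results", sel)  # False: schema excluded
```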

Settings

Global defaults for check thresholds:

[settings]
freshness_threshold_hours = 24.0    # max age before flagging stale
volume_zscore_threshold = 3.0       # z-score cutoff for volume anomalies
volume_method = "ewma"              # "ewma" (default) or "zscore"
history_depth = 30                  # snapshots to keep for trend analysis
min_history_for_anomaly = 5         # minimum snapshots before volume checks run
write_results = true                # persist findings to ~/.olly/findings.json
state_schema = "olly_state"         # optional: store state in warehouse instead of local SQLite

EWMA (Exponentially Weighted Moving Average) is the default volume method. It weights recent observations more heavily, making it better at handling trending tables without false positives. Use "zscore" for stationary tables where equal weighting of all history is preferred. Both methods can be overridden per table.
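A minimal sketch of the idea, using an exponentially weighted mean and variance with an illustrative smoothing factor (Olly's internal formula and parameters may differ):

```python
def ewma_anomaly(history, current, alpha=0.3, threshold=3.0):
    """Flag `current` if it deviates from an exponentially weighted
    mean of `history` by more than `threshold` weighted std devs.
    `alpha` is an illustrative smoothing factor, not Olly's."""
    mean = float(history[0])
    var = 0.0
    for x in history[1:]:
        diff = x - mean
        # exponentially weighted variance update, then mean update
        var = (1 - alpha) * (var + alpha * diff * diff)
        mean = alpha * x + (1 - alpha) * mean
    std = var ** 0.5
    if std == 0:
        return False, 0.0
    z = (current - mean) / std
    return abs(z) > threshold, z

# A steadily growing table: the EWMA tracks the trend, so the next
# step up is not flagged as an anomaly.
flagged, z = ewma_anomaly([1000, 1100, 1210, 1330, 1460, 1600], 1760)
```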

Overrides

Override settings for specific tables or patterns, scoped per connection. More specific matches win (schema < pattern < exact table):

[[connections.primary.overrides]]
match = "main.orders"
freshness_threshold_hours = 168
volume_zscore_threshold = 5.0

[[connections.primary.overrides]]
match = "main.*"
freshness_column = "updated_at"

Use olly plan to see how overrides resolve for each object.
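The precedence rule can be sketched like this (illustrative only, not Olly's resolver): apply schema-level matches first, then glob patterns, then exact table names, so the most specific match wins for each key.

```python
from fnmatch import fnmatch

def resolve_overrides(table: str, overrides: list[dict]) -> dict:
    """Merge overrides in increasing specificity: schema < pattern < exact."""
    schema = table.split(".", 1)[0]

    def specificity(match: str) -> int:
        if "." not in match:
            return 0  # schema-level
        if "*" in match:
            return 1  # glob pattern
        return 2      # exact table

    def applies(match: str) -> bool:
        if "." not in match:
            return schema == match
        return fnmatch(table, match)

    resolved: dict = {}
    for ov in sorted(overrides, key=lambda ov: specificity(ov["match"])):
        if applies(ov["match"]):
            resolved.update({k: v for k, v in ov.items() if k != "match"})
    return resolved

overrides = [
    {"match": "main.orders", "freshness_threshold_hours": 168,
     "volume_zscore_threshold": 5.0},
    {"match": "main.*", "freshness_column": "updated_at"},
]
resolve_overrides("main.orders", overrides)
# exact match overrides the pattern, pattern-only keys still apply
```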

Cross-connection integrity

Compare data between two connections. Syncs reference named connections from your [connections.*] config:

[connections.warehouse]
type = "duckdb"
path = "warehouse.duckdb"

[connections.replica]
type = "postgres"
url = "postgresql://${PGUSER}:${PGPASSWORD}@replica:5432/db"

[integrity]
module = "integrity_syncs.py"   # file path or dotted module name

# integrity_syncs.py
from olly.models import IntegrityMethod, Sync, WindowOp, WindowSpec

syncs = [
    Sync(
        name="orders_count",
        source="warehouse",        # references [connections.warehouse]
        target="replica",          # references [connections.replica]
        source_table="main.orders",
        target_table="public.orders",
        method=IntegrityMethod.COUNT,
        watermark="updated_at",
        window=WindowSpec(op=WindowOp.GT_NOW, duration="2h"),
    ),
]

Methods:

Method          What it compares
count           Row counts between source and target
count_distinct  Distinct values of a key column
pk              Primary key sets (finds missing/extra rows)
hash            Row-level hash of specified columns

Syncs support time windows, WHERE filters, tolerance thresholds, and watermark columns for incremental checks.
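For instance, the count method amounts to comparing two row counts, optionally within a tolerance. The relative-tolerance semantics shown here are an assumption, for illustration only:

```python
def integrity_count_check(source_count: int, target_count: int,
                          tolerance: float = 0.0) -> bool:
    """Pass when the relative difference between source and target
    row counts is within `tolerance` (illustrative semantics)."""
    if source_count == 0:
        return target_count == 0
    return abs(source_count - target_count) / source_count <= tolerance

integrity_count_check(10_000, 10_000)                   # True: exact match
integrity_count_check(10_000, 9_990, tolerance=0.001)   # True: within 0.1%
integrity_count_check(10_000, 9_000)                    # False: 10% drift
```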

Contracts

Define expected table schemas as Python classes:

# contracts.py
from datetime import datetime
from olly.contracts import TableContract

class Orders(TableContract):
    __table__ = "orders"
    __schema__ = "main"
    __connection__ = "primary"  # only check against this connection (optional)
    __strict__ = True           # flag unexpected columns

    id: int
    amount: float
    created_at: datetime
    customer_name: str | None   # nullable

When __connection__ is set, the contract is only validated against that named connection. When omitted, it runs against all connections.

Point your config at the contracts module:

[contracts]
module = "contracts.py"     # file path or dotted module name

Olly validates the warehouse schema against your contracts on every olly check.

dbt integration

Parse dbt's run_results.json to surface failures:

[dbt]
run_results_path = "target/run_results.json"
performance_threshold = 3.0    # z-score threshold for execution time anomalies
min_history_for_anomaly = 5    # minimum runs before performance checks activate

Table usage monitoring

Detect tables that haven't been queried recently. Currently supported on BigQuery only (uses INFORMATION_SCHEMA.JOBS_BY_PROJECT).

[usage]
enabled = true
lookback_days = 90          # how far back to scan query history
unused_threshold_days = 30  # days without queries before flagging
bigquery_region = "us"

Tables with no queries in the lookback window are flagged as errors. Tables queried but not within the unused threshold are warnings.
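The severity rules above can be sketched as (illustrative, using the configured thresholds):

```python
from datetime import datetime, timedelta

def classify_usage(last_queried, now,
                   lookback_days: int = 90,
                   unused_threshold_days: int = 30):
    """No queries in the lookback window -> error; queried, but not
    within the unused threshold -> warning; otherwise no finding."""
    if last_queried is None or now - last_queried > timedelta(days=lookback_days):
        return "error"
    if now - last_queried > timedelta(days=unused_threshold_days):
        return "warning"
    return None

now = datetime(2024, 6, 1)
classify_usage(datetime(2024, 1, 1), now)   # "error": outside 90-day lookback
classify_usage(datetime(2024, 4, 15), now)  # "warning": 47 days ago
classify_usage(datetime(2024, 5, 25), now)  # None: recently queried
```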

Query cost monitoring

Track BigQuery query costs and detect spending spikes. Uses z-score anomaly detection over cost history from previous snapshots. Cost fields live in the [usage] section:

[usage]
enabled = true
cost_enabled = true
cost_lookback_days = 30      # days of query history to aggregate
price_per_tb_usd = 6.25     # on-demand pricing rate
spike_threshold = 3.0        # z-score threshold for cost spike alerts

A legacy [cost] section is still accepted for backward compatibility, but new configs should use [usage].

When olly check runs, it shows a cost summary with top tables and top users by spend. Cost spikes are flagged as findings when the current period's total exceeds the historical mean by more than spike_threshold standard deviations.
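The spike rule amounts to a z-score test over historical totals; a sketch with illustrative numbers:

```python
from statistics import mean, stdev

def cost_spike(history: list[float], current: float,
               spike_threshold: float = 3.0) -> bool:
    """Flag when the current period's total exceeds the historical
    mean by more than `spike_threshold` standard deviations."""
    if len(history) < 2:
        return False  # not enough history to estimate variance
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > spike_threshold

daily_usd = [40.0, 42.0, 38.0, 41.0, 39.0]
cost_spike(daily_usd, 44.0)   # False: within normal variation
cost_spike(daily_usd, 120.0)  # True: far above the historical mean
```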

Slack notifications

Send findings to a Slack channel via an incoming webhook:

[slack]
webhook_url = "https://hooks.slack.com/services/T00/B00/xxxx"
on_error = true       # send on errors (default: true)
on_warning = false    # send on warnings (default: false)

When olly check produces qualifying findings, a summary is posted to the configured webhook.

Python API

All functionality is available as importable Python modules:

from olly.checker import run_checks
from olly.cli.snapshot import take_snapshot
from olly.config import (
    ConnectionConfig, NamedConnection, OllyConfig, Selection, Settings,
)

nc = NamedConnection(
    name="primary",
    connection=ConnectionConfig(type="duckdb", path="warehouse.duckdb"),
    selection=Selection(include_schemas=["main"]),
)
config = OllyConfig(
    connections={"primary": nc},
    settings=Settings(),
)

take_snapshot(config)
findings, dbt_findings, cost_records = run_checks(config)

for finding in findings:
    print(f"[{finding.connection_name}] {finding.check_type}: {finding.description}")

Web dashboard

Install the dashboard extra and start the server:

uv add "olly-core[dashboard]"
olly serve

The dashboard reads from a state database (written by olly check) and displays findings in a web UI at http://127.0.0.1:8000.

How it works

Olly uses a snapshot-and-diff model:

  1. olly snapshot connects to your warehouse via Ibis, introspects schemas and tables, and stores schema info and row counts in a local SQLite database (~/.olly/state.db).

  2. olly check compares the two most recent snapshots. It runs schema diffs, volume anomaly detection, freshness checks, and any configured integrity/contract/dbt checks.

  3. Findings are printed to the terminal and written to the local SQLite state database (or, if configured, to a schema in your warehouse).

By default, state is fully local — Olly only reads from your warehouse and writes to the ~/.olly/ directory. Optionally, set state_schema in [settings] and run olly create-state to store state in your warehouse instead.
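The schema diff in step 2 can be sketched with a simplified snapshot model; here each snapshot is just a {table: {column: type}} mapping (Olly's real snapshots store more metadata):

```python
def diff_schemas(prev: dict, curr: dict) -> list:
    """Diff two simplified snapshots, reporting added/removed tables
    and added/removed/changed columns."""
    findings = []
    for table in prev.keys() - curr.keys():
        findings.append(f"table removed: {table}")
    for table in curr.keys() - prev.keys():
        findings.append(f"table added: {table}")
    for table in prev.keys() & curr.keys():
        for col in prev[table].keys() - curr[table].keys():
            findings.append(f"column removed: {table}.{col}")
        for col in curr[table].keys() - prev[table].keys():
            findings.append(f"column added: {table}.{col}")
        for col in prev[table].keys() & curr[table].keys():
            if prev[table][col] != curr[table][col]:
                findings.append(f"column changed: {table}.{col}")
    return findings

prev = {"main.customers": {"id": "int64", "email": "string"}}
curr = {"main.customers": {"id": "int64"},
        "main.orders": {"id": "int64", "status": "string"}}
sorted(diff_schemas(prev, curr))
# → ["column removed: main.customers.email", "table added: main.orders"]
```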

Development

uv sync --group dev
uv run pytest tests/
uv run ruff check
uv run ty check
