Add correctness, rule-consistency, and currency metrics by jb3rndt · Pull Request #4 · HPI-Information-Systems/Metis

jb3rndt · 2025-10-19T19:14:22Z

This PR is based on #3, but I'm opening this already to get your feedback :)

Adds three new metrics:

correctness: compares each data point with its ground truth using a distance function (absolute difference for numbers, Levenshtein distance for strings)
currency: given a decline rate per column, the name of the column that holds the assessment date of each value in the tuple, and optionally a simulated assessment date (to avoid relying on "now"), calculates the currency as curr(w, A) = exp(-decline(A) * age(w, A)), where w is the attribute value and A is the column
- the decline rate is currently interpreted per year. It might be useful to make that configurable too?
rule-based consistency: checks whether the given rules per column hold on an attribute. A rule's weight is defined inside the rule itself. The return values of all rules are simply summed to obtain the consistency value.
- since rules are currently defined as Python functions, I allowed metrics to be initialized by passing a config object directly (JSON remains an option, of course)
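
To make the currency formula concrete, here is a minimal sketch of curr(w, A) = exp(-decline(A) * age(w, A)) with the age measured in years. The function name `currency`, its parameters, and the handling of future-dated values are assumptions for illustration and are not taken from the PR's code.

```python
import math
from datetime import datetime, timedelta

# Assumption: one year is a Julian year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def currency(assessed_at, decline_per_year, reference=None):
    """Return exp(-decline * age), where age is the time in years
    between the value's assessment date and the reference date."""
    if reference is None:
        # Fall back to "now" when no simulated assessment date is given.
        reference = datetime.now()
    age_years = (reference - assessed_at).total_seconds() / SECONDS_PER_YEAR
    # Assumption: values dated after the reference date count as age 0.
    age_years = max(age_years, 0.0)
    return math.exp(-decline_per_year * age_years)


# Usage: a value assessed two years before the reference date,
# with a decline rate of 0.5 per year, yields exp(-1).
assessed = datetime(2022, 1, 1)
two_years_later = assessed + timedelta(seconds=2 * SECONDS_PER_YEAR)
score = currency(assessed, 0.5, reference=two_years_later)
```

Passing an explicit `reference` keeps the result reproducible, which is the motivation given above for the optional simulated assessment date.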

Copilot

Pull Request Overview

This PR introduces three new data quality metrics (Correctness, Currency, and Rule-based Consistency), refactors writer implementations to use a shared SQLAlchemy-based DatabaseWriter, and adds a SQLAlchemy ORM model for persisting results.

New metrics: Correctness (distance vs. ground truth), Currency (exponential decay by age), Rule-based Consistency (rule aggregation with certainty).
Refactor: Unify SQLite/Postgres writers via DatabaseWriter and SQLAlchemy ORM models; add DQDimension enum and update DQResult to use it.
Config handling: Add MetricConfig base and load_config utility; allow passing config objects (notably for rule-based consistency).
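
As a rough illustration of the distance-based correctness check summarized above, the following sketch picks a distance by value type: Levenshtein distance for strings and absolute difference for numbers. The helper names `levenshtein` and `distance` are hypothetical, and how the PR turns a distance into a correctness score is not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def distance(value, truth):
    """Distance between a data point and its ground truth.

    Assumption: two strings use Levenshtein distance, anything else is
    treated as numeric and uses the absolute difference.
    """
    if isinstance(value, str) and isinstance(truth, str):
        return levenshtein(value, truth)
    return abs(value - truth)


# Usage: a distance of 0 means the value matches its ground truth exactly.
typo_distance = distance("Berlni", "Berlin")
numeric_distance = distance(3, 5)
```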

Reviewed Changes

Copilot reviewed 18 out of 21 changed files in this pull request and generated 9 comments.


File	Description
metis/writer/sqlite_writer.py	Switch SQLiteWriter to SQLAlchemy via DatabaseWriter; provide engine factory.
metis/writer/postgres_writer.py	Switch PostgresWriter to SQLAlchemy via DatabaseWriter; provide engine factory.
metis/writer/database_writer.py	New base writer using SQLAlchemy ORM models; centralizes table creation and writes.
metis/writer/console_writer.py	Minor typing tweak for optional config.
metis/utils/result.py	DQdimension type changed to DQDimension enum.
metis/utils/dq_dimension.py	New DQDimension StrEnum with dimensions.
metis/models.py	New SQLAlchemy declarative model and dynamic table registration.
metis/metric/metric.py	Add MetricConfig support and config loader.
metis/metric/currency/currency.py	Implement Currency metric with exponential decay by age.
metis/metric/currency/config.py	Config dataclass for Currency.
metis/metric/correctness/correctness.py	Implement Correctness metric (distance-based).
metis/metric/consistency/rule_consistency.py	Implement rule-based consistency with certainty annotation.
metis/metric/consistency/consistency.py	Make Consistency accept JSON config path; switch to DQDimension.
metis/metric/consistency/config.py	Config dataclasses for consistency metrics.
metis/metric/config.py	Base config dataclass helper.
metis/metric/completeness/completeness.py	Switch to DQDimension and updated typing.
metis/metric/__init__.py	Export new metrics.
metis/dq_orchestrator.py	Allow MetricConfig object in orchestrator assess API.


metis/models.py

metis/writer/database_writer.py

metis/metric/consistency/consistency_countFDViolations.py

metis/metric/correctness/correctness.py

metis/utils/result.py

metis/models.py

metis/writer/postgres_writer.py

metis/writer/sqlite_writer.py

lisehr

The three metric implementations look good! Could you please still rename the metrics according to the new naming convention?

jb3rndt · 2025-12-12T14:48:53Z

Metric names and their file names fit the new naming scheme now :)

metis/utils/similarity_measures/levenshtein_distance.py

metis/utils/dq_dimension.py

lisehr · 2025-12-16T19:40:22Z

metis/metric/config.py

+
+
+@dataclass
+class MetricConfig:


Can we also create a naming scheme for metric configs, analogous to the one for the metrics? For example timeliness_config, since we have used this style rather than camel case so far.

For the configs, I have adopted the following naming scheme based on the name of the metric this config is used for:
file name: {metric file name}_config.py
class name: equal to the file name

For the consistency metric, this results in:
metric file name and class name: consistency_ruleBasedHinrichs.py

config file name and class name: consistency_ruleBasedHinrichs_config.py
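
The scheme can be sketched as follows for the rule-based consistency metric: a config class named after its file, holding rules as plain Python functions, with the rule results summed per value. The `rules` field, the `rule_sum` helper, and the convention that a rule returns 0.0 when it holds and its weight when violated are assumptions for illustration, not the PR's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A rule takes an attribute value and returns a (weighted) number.
Rule = Callable[[Any], float]


@dataclass
class consistency_ruleBasedHinrichs_config:  # class name equals the file name
    # Hypothetical field: column name -> rules checked on that column's values.
    rules: Dict[str, List[Rule]] = field(default_factory=dict)


def rule_sum(config: consistency_ruleBasedHinrichs_config, column: str, value: Any) -> float:
    """Sum the return values of all rules configured for a column."""
    return sum(rule(value) for rule in config.rules.get(column, []))


# Usage: the weight lives inside each rule definition.
def not_negative(value: float) -> float:
    return 0.0 if value >= 0 else 2.0  # assumed weight 2.0 when violated


def plausible_maximum(value: float) -> float:
    return 0.0 if value <= 120 else 1.0  # assumed weight 1.0 when violated


config = consistency_ruleBasedHinrichs_config(
    rules={"age": [not_negative, plausible_maximum]}
)
violation_total = rule_sum(config, "age", -5)
```

Because the rules are callables, a config like this cannot be expressed in JSON, which is why passing a config object directly is convenient for this metric.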

metis/metric/consistency/consistency_ruleBasedHinrichs.py

metis/metric/completeness/completeness.py

metis/metric/timeliness/timeliness_heinrich.py

lisehr

Left some comments directly in the code.

jb3rndt · 2025-12-17T11:07:13Z

Thank you, I have adjusted the naming and docstrings accordingly :)

jb3rndt · 2026-02-02T15:47:12Z

@lisehr I've updated the branch :) Here is a short summary of the parts that have changed since your last review:

Metrics

New: completeness_nullRate: Measures the ratio of null values in a dataset (differs from the existing completeness_nullRatio in that the granularity is configurable). Does it make sense to merge both? (If yes, which name should be kept?)
New: completeness_nullAndDMVRate: Extends null rate assessment by detecting Disguised Missing Values (DMVs) using FAHES algorithm
Changed: consistency_ruleBasedPipino: Added certainty calculation
Changed: timeliness_heinrich: Added certainty calculation

Framework Changes

New: Add metric-specific loggers
New: CSVWriter
New: Datetime Utilities: Precision calculation utilities for timeliness metrics

lisehr

Generally, all additions look good. A few minor / structural comments:

nullRate vs. nullRatio: do we really need both files? I suggest combining them into one file that calculates both the rate (count) and the ratio of nulls. Currently it seems that nullRate calculates the ratio anyway.
Please move all configs to Metis/configs/metric
Why is correctness_heinrich in the writer folder?

lisehr

@jb3rndt: sorry, I missed answering your last questions.

completeness_nullRate vs. nullRatio: the new file indeed looks much better. I suggest calling it completeness_nullRatio and stating in the file header that both aspects (count + ratio) are calculated and stored in the DB, at different granularity levels.
completeness_nullAndDMVRate: here too, please cite FAHES with links to the paper and the GitHub repository in the file header. In the future we could also use other DMV detectors such as DisMis; for now, you can leave the filename as it is. Thank you!
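
A minimal sketch of what "count + ratio at different granularity levels" could look like, using plain Python rows instead of the project's data structures. The function name `null_stats`, the `"__table__"` key, and the choice of column and table granularity are assumptions; DMV detection is not covered here.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def null_stats(rows: List[Row], columns: List[str]) -> Dict[str, Tuple[int, float]]:
    """Return (null count, null ratio) per column and for the whole table.

    Assumption: a value counts as null when it is None or missing from the row.
    """
    stats: Dict[str, Tuple[int, float]] = {}
    total_nulls = 0
    for column in columns:
        nulls = sum(1 for row in rows if row.get(column) is None)
        total_nulls += nulls
        stats[column] = (nulls, nulls / len(rows) if rows else 0.0)
    cells = len(rows) * len(columns)
    # Hypothetical key for the table-level granularity.
    stats["__table__"] = (total_nulls, total_nulls / cells if cells else 0.0)
    return stats


# Usage: two rows, two columns, three null cells in total.
rows = [{"a": 1, "b": None}, {"a": None, "b": None}]
stats = null_stats(rows, ["a", "b"])
```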

jb3rndt assigned lisehr and Phimanu Oct 19, 2025

Copilot AI review requested due to automatic review settings October 19, 2025 19:14

jb3rndt force-pushed the feat/correctness-metric branch from 3b1f382 to 9f9bc44 Compare October 19, 2025 19:16

Copilot AI reviewed Oct 19, 2025


jb3rndt force-pushed the feat/correctness-metric branch from 9d0d1ad to 7319535 Compare December 1, 2025 22:05

lisehr requested changes Dec 7, 2025


jb3rndt added 11 commits December 8, 2025 12:19

add correctness metric

4630a57

add rule-based consistency metric

7c131bf

add currency metric

c540fdc

add MetricConfig support in addition to json configuration

89da95b

prototypical assessment of certainty of the rule-consistency metric

379641f

fix correctness metric and adjust docstring

c823e93

add prototypical certainty calculation to correctness and currency

8947834

add postgres docker setup

cf3305c

add basic logging

33ad9bb

move common utilities into utils folder

a77a890

add fallback writing to csv in case the writer errors

bf8e8c9

jb3rndt force-pushed the feat/correctness-metric branch from 7319535 to df48399 Compare December 8, 2025 12:37

rename metrics according to naming scheme

cd9a0a7

jb3rndt force-pushed the feat/correctness-metric branch 2 times, most recently from 5b741b4 to fa03a80 Compare December 8, 2025 13:37

add csv writer

164ae7f

jb3rndt force-pushed the feat/correctness-metric branch 2 times, most recently from e815ab8 to 78a52bb Compare December 12, 2025 14:35

rename metric files

1c3d0a5

jb3rndt force-pushed the feat/correctness-metric branch from 78a52bb to 1c3d0a5 Compare December 12, 2025 14:47

add tuple rules to metric ConsistencyRuleBasedHinrichs

e1362ea

jb3rndt force-pushed the feat/correctness-metric branch from fa08c91 to e1362ea Compare December 15, 2025 19:21

remove certainty from correctness and timeliness for now again

3f22d9d

lisehr self-requested a review December 16, 2025 14:11