Add correctness, rule-consistency, and currency metrics#4
Add correctness, rule-consistency, and currency metrics#4
Conversation
3b1f382 to
9f9bc44
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR introduces three new data quality metrics (Correctness, Currency, and Rule-based Consistency), refactors writer implementations to use a shared SQLAlchemy-based DatabaseWriter, and adds a SQLAlchemy ORM model for persisting results.
- New metrics: Correctness (distance vs. ground truth), Currency (exponential decay by age), Rule-based Consistency (rule aggregation with certainty).
- Refactor: Unify SQLite/Postgres writers via DatabaseWriter and SQLAlchemy ORM models; add DQDimension enum and update DQResult to use it.
- Config handling: Add MetricConfig base and load_config utility; allow passing config objects (notably for rule-based consistency).
Reviewed Changes
Copilot reviewed 18 out of 21 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| metis/writer/sqlite_writer.py | Switch SQLiteWriter to SQLAlchemy via DatabaseWriter; provide engine factory. |
| metis/writer/postgres_writer.py | Switch PostgresWriter to SQLAlchemy via DatabaseWriter; provide engine factory. |
| metis/writer/database_writer.py | New base writer using SQLAlchemy ORM models; centralizes table creation and writes. |
| metis/writer/console_writer.py | Minor typing tweak for optional config. |
| metis/utils/result.py | DQdimension type changed to DQDimension enum. |
| metis/utils/dq_dimension.py | New DQDimension StrEnum with dimensions. |
| metis/models.py | New SQLAlchemy declarative model and dynamic table registration. |
| metis/metric/metric.py | Add MetricConfig support and config loader. |
| metis/metric/currency/currency.py | Implement Currency metric with exponential decay by age. |
| metis/metric/currency/config.py | Config dataclass for Currency. |
| metis/metric/correctness/correctness.py | Implement Correctness metric (distance-based). |
| metis/metric/consistency/rule_consistency.py | Implement rule-based consistency with certainty annotation. |
| metis/metric/consistency/consistency.py | Make Consistency accept JSON config path; switch to DQDimension. |
| metis/metric/consistency/config.py | Config dataclasses for consistency metrics. |
| metis/metric/config.py | Base config dataclass helper. |
| metis/metric/completeness/completeness.py | Switch to DQDimension and updated typing. |
| metis/metric/init.py | Export new metrics. |
| metis/dq_orchestrator.py | Allow MetricConfig object in orchestrator assess API. |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
9d0d1ad to
7319535
Compare
lisehr
left a comment
There was a problem hiding this comment.
The three metric implementations look good! Could you please still rename the metrics according to the new naming convention? _
7319535 to
df48399
Compare
5b741b4 to
fa03a80
Compare
e815ab8 to
78a52bb
Compare
78a52bb to
1c3d0a5
Compare
|
Metric names and their file names fit the new naming scheme now :) |
fa08c91 to
e1362ea
Compare
|
|
||
|
|
||
| @dataclass | ||
| class MetricConfig: |
There was a problem hiding this comment.
can we also create a naming scheme for metric configs in analogy to the metrics? like timeliness_config since we used this instead of camel case so far?
There was a problem hiding this comment.
For the configs, I have adopted the following naming scheme based on the name of the metric this config is used for:
file name: {metric file name}_config.py
class name: equal to the file name
For the consistency metric, this results in:
metric file name and class name: consistency_ruleBasedHinrichs.py
config file name and class name: consistency_ruleBasedHinrichs_config.py
lisehr
left a comment
There was a problem hiding this comment.
Left some comments directly in the code.
|
Thank you, I have adjusted the naming and docstrings accordingly :) |
|
@lisehr I've updated the branch :) Here is a short summary of the parts that have changed since your last review: Metrics
Framework Changes
|
lisehr
left a comment
There was a problem hiding this comment.
Generally, all additions look good. A few minor / structural comments:
- nullRate vs. nullRatio: do we really need both files? I suggest combining them to one file that calculates both, the rate (count) plus the ratio of nulls. Currently it seems that nullRate calculates the ratio too anyway.
- Please move all configs to Metis/configs/metric
- Why is correctness_heinrich in the writer folder?
lisehr
left a comment
There was a problem hiding this comment.
@jb3rndt: sorry, I messed up answering to your last questions.
- completeness_nullRate vs nullRatio: the new file indeed looks much better and I suggest calling it completeness_nullRatio + make clear to add information in the header that both aspects (count + ratio) are calculated and stored to the DB, also on different granularity levels.
- for completeness_nullAndDMVRate: also here, please make sure to declare FAHES with link to paper + github already in the header of the file. For the future, we could also use other DMV detectors like DisMis, for now, you can leave the filename as it is. Thank you!
b23f55b to
68e9684
Compare
This PR is based on #3, but I'm opening this already to get your feedback :)
Adds three new metrics:
curr(w, A) = exp(-decline(A) * age(w,A))(withwthe attribute value andAthe column)