Print warning in summary when comparing data frames with no rows #18

@MoritzPotthoffQC

Description

If you compare two data frames that both have no rows, the summary reports a perfect match. That is technically correct.

In exploratory work, however, I sometimes get a perfect match only because both input data frames are empty (e.g., because of a faulty previous join). I think it would be nice to add a warning to the summary when both data frames have no rows.

Example:

import polars as pl
from diffly import compare_frames

left = pl.DataFrame({"id": [], "value": [], "name": []}).cast(
    {"id": pl.Int64, "value": pl.Float64, "name": pl.Utf8}
)
right = pl.DataFrame({"id": [], "value": [], "name": []}).cast(
    {"id": pl.Int64, "value": pl.Float64, "name": pl.Utf8}
)

comparison = compare_frames(left, right, primary_key="id")
print(comparison.summary())

prints

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                     Diffly Summary                                     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
                            --- Data frames match exactly! ---
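
Until the summary carries such a warning, a caller-side guard could look roughly like the sketch below. The helper names `both_frames_empty` and `warn_if_both_empty` are hypothetical and not part of diffly; the sketch only assumes that the inputs support `len()`, which polars data frames do (it returns the row count).

```python
import warnings


def both_frames_empty(left, right) -> bool:
    """Return True when both inputs have zero rows.

    Uses only len(), so it works for polars DataFrames and any other
    sized container.
    """
    return len(left) == 0 and len(right) == 0


def warn_if_both_empty(left, right) -> bool:
    """Emit a UserWarning if both inputs are empty.

    Hypothetical helper: returns True when the warning was emitted,
    False otherwise.
    """
    if both_frames_empty(left, right):
        warnings.warn(
            "Both data frames have no rows; a perfect match may be trivial.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Calling `warn_if_both_empty(left, right)` just before `compare_frames(left, right, primary_key="id")` would flag the case from the example above without changing the comparison itself.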
