Resolving Functional Dependency (FD) in XLearner #10

@agiannoul

By default, in XLearner (XLearner.py), the parameter resolve_fd in the learn function is set to True.

As a result, the resolve_fd method is called. From my understanding, if the user does not provide any functional dependencies (FDs), resolve_fd infers FD edges from the data. An edge (X, Y) is added when every value of X maps to exactly one value of Y, i.e., for any two rows $i, j$, $x_i = x_j$ implies $y_i = y_j$.
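To make the condition concrete, here is a minimal sketch of how I read the check, written with pandas. The helper name `determines` and the toy columns are my own and are not part of XLearner; missing values are ignored for simplicity.

```python
import pandas as pd


def determines(df: pd.DataFrame, x: str, y: str) -> bool:
    """Hypothetical helper: True if each distinct value of x maps to one value of y."""
    # Group rows by x and count the distinct y values inside each group.
    # x functionally determines y when no group holds more than one y value.
    return bool(df.groupby(x)[y].nunique().le(1).all())


# Toy data (made up for illustration only).
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Rome", "Rome"],
    "country": ["FR", "FR", "IT", "IT"],
    "visits": [1, 2, 3, 4],
})

city_to_country = determines(df, "city", "country")  # each city has one country
city_to_visits = determines(df, "city", "visits")    # "Paris" maps to both 1 and 2
```

Here `city_to_country` is `True` and `city_to_visits` is `False`, since the same city appears with two different visit counts.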

However, this behavior is not explained in the paper (unless I missed something).

Questions

  1. Is there a specific reason behind this logic?
  2. Should this be kept as the default behavior?
  3. Does this apply only to categorical values?

Code from resolve_fd in XLearner.py (under XDA/src):

        if fd_edges is not None: fd_list = set(fd_edges)
        else:
            for cmb in permutations(cols, 2):
                col1 = cmb[0]
                col2 = cmb[1]
                if (col2, col1) in fd_list: continue
                mapper = {}
                has_dup = False
                if count[col2] == 1 or count[col2] > count[col1]:
                    continue
                for index, row in self.df.iterrows():
                    key = row[col1]
                    val = row[col2]
                    if key in mapper and mapper[key] != val:
                        has_dup = True
                        break
                    if key not in mapper:
                        mapper[key] = val
                if not has_dup:
                    fd_list.add(cmb)
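
For anyone who wants to experiment with this outside the class, below is a standalone sketch of the same loop. It is not the repository code: the function name `infer_fd_edges` is hypothetical, and I am assuming that `count` holds the number of distinct values per column, which the excerpt above does not show.

```python
from itertools import permutations

import pandas as pd


def infer_fd_edges(df: pd.DataFrame) -> set:
    """Standalone sketch of the quoted loop (hypothetical name, not the XLearner API)."""
    # Assumption: `count` is the number of distinct values in each column.
    count = {col: df[col].nunique() for col in df.columns}
    fd_list = set()
    for col1, col2 in permutations(df.columns, 2):
        # Skip if the reverse edge was already added.
        if (col2, col1) in fd_list:
            continue
        # Skip constant targets and targets with more distinct values than the source.
        if count[col2] == 1 or count[col2] > count[col1]:
            continue
        mapper = {}
        has_dup = False
        for key, val in zip(df[col1], df[col2]):
            if key in mapper and mapper[key] != val:
                has_dup = True  # the same col1 value maps to two col2 values
                break
            mapper.setdefault(key, val)
        if not has_dup:
            fd_list.add((col1, col2))
    return fd_list


# Toy data (made up for illustration only).
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Rome", "Rome"],
    "country": ["FR", "FR", "IT", "IT"],
    "visits": [1, 2, 3, 4],
})
edges = infer_fd_edges(df)
```

On this toy frame the sketch yields `("city", "country")`, `("visits", "city")`, and `("visits", "country")`. Under the stated assumption about `count`, any column whose values are all distinct (an ID, or a continuous measurement) ends up determining every non-constant column, which is part of why I am asking whether the check is meant for categorical values only.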
