Skip to content

update handling of global option of 'use_inf_as_na'#126

Open
AlrauneZ wants to merge 9 commits intomainfrom
124-update-handling-of-use_inf_as_na
Open

update handling of global option of 'use_inf_as_na'#126
AlrauneZ wants to merge 9 commits intomainfrom
124-update-handling-of-use_inf_as_na

Conversation

@AlrauneZ
Copy link
Contributor

@AlrauneZ AlrauneZ commented Feb 23, 2026

The global option 'use_inf_as_na' is deprecated and will be removed in a future version of pandas (from 3.0 onwards).

Thus, modify the function "filter_values()" in transformation.py to not need the global conversation and handling detection of 'inf' values (similar to NaN) differently.

@AlrauneZ AlrauneZ self-assigned this Feb 23, 2026
@AlrauneZ AlrauneZ added the bug Something isn't working label Feb 23, 2026
@AlrauneZ AlrauneZ linked an issue Feb 23, 2026 that may be closed by this pull request
@AlrauneZ AlrauneZ added this to MIBIREM Feb 23, 2026
@AlrauneZ AlrauneZ marked this pull request as ready for review February 23, 2026 14:16
Copy link
Collaborator

@JaroCamphuijsen JaroCamphuijsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great that you're fixing this, because it is breaking things across other pr's as well. Also nice that we are losing the pandas import and the pd.set_option call in the global scope of this module. It's always a bit tricky to stick things in there and might lead to unexpected behavior, e.g. if you're only using one of the functions in your script and not loading the whole module, it will not execute the global command pd.set_option.
There are some minor issues with the consistency of handling np.inf and its documentation.


# If there are any rows containing NULL cells, the NULL values will be filtered
if len(NaN_rows)>0:
if replace_NaN == 'remove':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that here (in the case replace_NaN == 'remove') we drop rows that contain either an by pandas na identified value (as specified here: https://pandas.pydata.org/docs/reference/api/pandas.isna.html#pandas.isna) OR np.inf (both negative and positive). This is because we use the mask that we produce ourselves in line 66, to remove the rows, it includes the np.inf values so also removes these.

However in the following cases ("zero", float|int value, "average", or "median") we use a pandas function (pd.DataFrame.isna()) which only selects the NaN values as identified by pandas.isna and not the np.inf values.

This makes the behavior of this function somewhat unclear and unpredictable. Perhaps we should completely ignore the inf values, or if we want, we can include them but should do that persistently. If you think it is still useful and a much encountered value, you could consider adding another argument to the function (regard_inf_as_na) and just do the replacement (with a DataFrame.replace({np.inf : np.nan, -np.inf : np.nan}) at the start of the function.

Do you think that would be a good solution?

pd.set_option('mode.use_inf_as_na', True)

def filter_values(data_frame,
replace_NaN = 'remove',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we could use a slight improvement in the wording of what this function does and make it more consistent throughout the function. Currently the way it handles np.inf values is not documented in the docstrings and it is not consistent for all values of the replace_NaN argument.

data_frame : pd.dataframe
Tabular data containing variables to be evaluated with standard
column names and rows of sample data.
replace_NaN : string or float, default "remove"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this all lowercase instead? So replace_nan. It makes the function easier to use.

"""
data_mod = self.data.copy()
data_mod.iloc[2,25]=np.inf
data_filter = filter_values(data_mod,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently only works if you set the replace_NaN argument to 'remove'. In al the other cases it will not produce the expected result. Or it will actually, because the filtering of inf values is not described in the docstrings.

Perhaps add a test for handling inf values with the other cases as well if we decide to keep the np.inf handling.

@sonarqubecloud
Copy link

@AlrauneZ
Copy link
Contributor Author

@jaro: good point. I tried first, but it did not work. Now, I managed to get it working with replacing inf by NaN with the command you suggest.

During adapting of the testing, I realized that it is good to include handling of 'inf' values more generally. I adapted the example data now also including one inf-value. This required some adaptions in the testing and in the functions calculating average concentrations and non-zero value counts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Update handling of use_inf_as_na

2 participants