Description
I'm encountering an issue when using categorical covariates with more than 6 levels in a multinomial model with pygformula. Model fitting produces NaN values, and the bootstrap results then cannot be computed: g.fit() fails with a TypeError ('NoneType' object is not subscriptable) while collecting the bootstrap results.
This does not happen if I:
- Reduce the number of levels in the categorical variable to 6 or fewer.
- Use one-hot encoding and fit a one-vs-rest model instead of multinomial.
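For reference, the one-hot step in the second workaround looks roughly like the sketch below. The column name educ and its seven levels are made up for illustration and are not my real data; each resulting indicator column is then treated as its own binary covariate.

```python
import pandas as pd

# Hypothetical data: "educ" stands in for a categorical covariate with 7 levels.
df = pd.DataFrame({"id": range(7), "educ": list("abcdefg")})

# One 0/1 indicator column per level (one-vs-rest keeps every level).
dummies = pd.get_dummies(df["educ"], prefix="educ", dtype=int)

# Replace the original categorical column with its indicators.
df_encoded = pd.concat([df.drop(columns="educ"), dummies], axis=1)
```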
I would prefer to use the multinomial option directly with categorical covariates if possible.
My questions are:
- What could be causing the multinomial model to fail when categorical variables have many levels?
- Is there a known limitation or workaround in pygformula or statsmodels?
- If this can't be fixed cleanly, would using one-vs-rest with one-hot encoding be a valid alternative?
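One possibility I have not ruled out is that some levels are so rare that certain combinations never occur in the data, which can push coefficients toward infinity. This is only a guess on my part. A minimal check I can run, assuming hypothetical column names lag and cur for the previous and current value of the covariate, is to list the empty cells of a cross-tabulation:

```python
import pandas as pd

def sparse_cells(df, row_col, col_col, min_count=1):
    """Return (row level, column level) pairs observed fewer than min_count times."""
    table = pd.crosstab(df[row_col], df[col_col])
    return [
        (r, c)
        for r in table.index
        for c in table.columns
        if table.loc[r, c] < min_count
    ]

# Toy data with hypothetical column names; level "c" is never followed by "b".
toy = pd.DataFrame({
    "lag": ["a", "a", "b", "b", "c"],
    "cur": ["a", "b", "a", "b", "a"],
})
empty = sparse_cells(toy, "lag", "cur")
```

Note that pd.crosstab only includes levels that appear at least once, so a level that is entirely absent from the data will not show up here.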
Here is the relevant part of the error output; the warnings repeat throughout the run:
/opt/anaconda3/lib/python3.12/site-packages/statsmodels/discrete/discrete_model.py:3028: RuntimeWarning: invalid value encountered in divide
return eXB/eXB.sum(1)[:,None]
Optimization terminated successfully.
Current function value: nan
Iterations 4
Optimization terminated successfully.
Current function value: 0.214036
Iterations 12
Optimization terminated successfully.
Current function value: 0.458476
Iterations 8
/opt/anaconda3/lib/python3.12/site-packages/statsmodels/discrete/discrete_model.py:3027: RuntimeWarning: overflow encountered in exp
eXB = np.column_stack((np.ones(len(X)), np.exp(X)))
/opt/anaconda3/lib/python3.12/site-packages/statsmodels/discrete/discrete_model.py:3028: RuntimeWarning: invalid value encountered in divide
return eXB/eXB.sum(1)[:,None]
Optimization terminated successfully.
Current function value: nan
Iterations 5
/opt/anaconda3/lib/python3.12/site-packages/statsmodels/discrete/discrete_model.py:3027: RuntimeWarning: overflow encountered in exp
eXB = np.column_stack((np.ones(len(X)), np.exp(X)))
/opt/anaconda3/lib/python3.12/site-packages/statsmodels/discrete/discrete_model.py:3028: RuntimeWarning: invalid value encountered in divide
return eXB/eXB.sum(1)[:,None]
Optimization terminated successfully.
Current function value: 0.471235
Iterations 8
Optimization terminated successfully.
Current function value: nan
Iterations 4
Optimization terminated successfully.
Current function value: 0.205321
Iterations 10
Optimization terminated successfully.
Current function value: 0.458697
Iterations 8
/opt/anaconda3/lib/python3.12/site-packages/statsmodels/discrete/discrete_model.py:3027: RuntimeWarning: overflow encountered in exp
eXB = np.column_stack((np.ones(len(X)), np.exp(X)))
/opt/anaconda3/lib/python3.12/site-packages/statsmodels/discrete/discrete_model.py:3028: RuntimeWarning: invalid value encountered in divide
return eXB/eXB.sum(1)[:,None]
Optimization terminated successfully.
Current function value: nan
Iterations 4
/opt/anaconda3/lib/python3.12/site-packages/statsmodels/discrete/discrete_model.py:3027: RuntimeWarning: overflow encountered in exp
eXB = np.column_stack((np.ones(len(X)), np.exp(X)))
/opt/anaconda3/lib/python3.12/site-packages/statsmodels/discrete/discrete_model.py:3028: RuntimeWarning: invalid value encountered in divide
return eXB/eXB.sum(1)[:,None]
Optimization terminated successfully.
Current function value: nan
Iterations 4
Traceback (most recent call last):
File "/Users/juanito/Documents/Paper1-HSU/HSU-PSID-Data/Main_Analysis/prueba_error.py", line 147, in
g.fit()
File "/opt/anaconda3/lib/python3.12/site-packages/pygformula/parametric_gformula/parametric_gformula.py", line 742, in fit
boot_results_dicts[i]['boot_results'][j] for j in range(len(self.int_descript)))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable
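To illustrate how I read the two warnings, here is a minimal NumPy sketch. The function naive_probs copies the two statsmodels lines quoted in the warnings; stable_probs is my own comparison version and is not part of statsmodels or pygformula. The input values are made up: a linear predictor of 800 is simply large enough to overflow np.exp.

```python
import numpy as np

def naive_probs(XB):
    # Same two lines as in the warnings: the reference category has predictor 0.
    with np.errstate(over="ignore", invalid="ignore"):
        eXB = np.column_stack((np.ones(len(XB)), np.exp(XB)))
        return eXB / eXB.sum(1)[:, None]

def stable_probs(XB):
    # Comparison version: subtract the row maximum before exponentiating.
    full = np.column_stack((np.zeros(len(XB)), XB))
    full = full - full.max(axis=1, keepdims=True)
    e = np.exp(full)
    return e / e.sum(axis=1, keepdims=True)

# Row 0 is well behaved; row 1 has a predictor large enough that exp() gives inf.
XB = np.array([[1.0, 2.0], [800.0, 5.0]])
naive = naive_probs(XB)
stable = stable_probs(XB)
```

In the second row the naive version computes inf / inf, which is NaN, while the shifted version still returns valid probabilities. If that is what happens inside the fit, the NaN log-likelihood would follow from very large coefficients, but I have not confirmed where those come from.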
Thanks for all the work on this package!