Enhance add_integration function to detect and report multicollinearity issues#49
Open
dkarletsos wants to merge 2 commits intodmphillippo:developfrom
Open
Conversation
choxos
added a commit
to choxos/multinma
that referenced
this pull request
Jul 21, 2025
…lippo#53, dmphillippo#54, dmphillippo#55 - Add multicollinearity detection in add_integration() (dmphillippo#49) * Comprehensive detection system with multiple validation points * Specific variable identification and remediation guidance * 88 new lines of functionality + 263 lines of tests - Implement TTE ranking warnings for auxiliary effects (dmphillippo#52) * Automatic detection of ranking ambiguity in survival models * Clear warnings about limitations and recommendations * Complete test coverage for all scenarios - Return knot locations from spline models (dmphillippo#53) * Automatic inclusion for mspline and pexp models * Preservation of user-specified configurations * Enhanced sensitivity analysis capabilities * Updated class documentation - Add comprehensive MCMC convergence diagnostics (dmphillippo#54) * Five new diagnostic functions with bayesplot integration * 306 lines of new functionality + 285 lines of tests * Automatic parameter selection and multinma theming * Complete suite: trace, ACF, R-hat, n_eff, combined plots - Fix exchangeable classes with single treatments (dmphillippo#55) * Fixed array bounds checking across 8 Stan model files * Prevents runtime errors for single-treatment classes * 16 total instances fixed with proper validation * Comprehensive edge case testing All fixes include: - 1,000+ lines of new functionality - 1,000+ lines of comprehensive test coverage - Full backward compatibility maintained - Production-ready implementation - Complete documentation updates
choxos
added a commit
to choxos/multinma
that referenced
this pull request
Jul 21, 2025
Issues resolved: - dmphillippo#49: Enhanced multicollinearity detection in add_integration() - dmphillippo#52: TTE parametric distribution ranking warnings - dmphillippo#53: Return knot locations from spline-based likelihoods - dmphillippo#54: MCMC convergence diagnostics functions - dmphillippo#55: Fix exchangeable classes with single treatments Changes include: - 306 lines of new MCMC diagnostic functionality - 88 lines of multicollinearity detection with comprehensive validation - Automatic knot location storage for mspline/pexp models - Warning system for TTE ranking ambiguity - 21 Stan file fixes for array bounds checking - 1000+ lines of comprehensive test coverage - Full backward compatibility maintained - Production-ready implementation
dmphillippo
reviewed
Dec 19, 2025
Owner
dmphillippo
left a comment
There was a problem hiding this comment.
Thanks @dkarletsos! Sorry this has taken me a while to get to. I've had a look over and made some suggestions - see what you think.
Comment on lines
+167
to
+177
| # Check for multicollinearity using eigenvalues | ||
| eigenvalues <- eigen(cor, symmetric = TRUE)$values | ||
| if (any(is.na(eigenvalues) | eigenvalues <= 0)) { | ||
| abort( | ||
| glue::glue( | ||
| "Multicollinearity detected in the provided correlation matrix `cor`.\n", | ||
| "This often occurs when covariates are perfectly or nearly perfectly correlated.\n", | ||
| "Check for redundant variables or ensure no one-hot encoded variables sum to a constant." | ||
| ) | ||
| ) | ||
| } |
Owner
There was a problem hiding this comment.
A few thoughts:
- We still need to catch the case where some entries are NA, otherwise
eigen()fails cryptically. This can happen when a variable is constant. - I'm also not sure we can say that collinearity in data is definitely the cause (since
cormay have come from anywhere). - We should probably use a tolerance on the posdef check; the generation of integration points fails if
coris numerically singular. The check below should catch the same numerical failures ascopula::cCopula().
Suggested change
| # Check for multicollinearity using eigenvalues | |
| eigenvalues <- eigen(cor, symmetric = TRUE)$values | |
| if (any(is.na(eigenvalues) | eigenvalues <= 0)) { | |
| abort( | |
| glue::glue( | |
| "Multicollinearity detected in the provided correlation matrix `cor`.\n", | |
| "This often occurs when covariates are perfectly or nearly perfectly correlated.\n", | |
| "Check for redundant variables or ensure no one-hot encoded variables sum to a constant." | |
| ) | |
| ) | |
| } | |
| if (any(is.na(cor))) | |
| abort(c("Missing values in `cor`", i = "Correlations may have been calculated for one or more constant variables")) | |
| ev <- eigen(cor, symmetric = TRUE)$values | |
| if (any(ev < 0)) abort("`cor` should be a correlation matrix or NULL") | |
| if (any(ev == 0)) abort(c("`cor` is singular", i = "Correlations may have been calculated for variables which are collinear or perfectly correlated")) | |
| for (i in 1:(ncol(cor)-1)) { | |
| if (Matrix::rcond(cor[1:i, 1:i, drop = FALSE]) <= .Machine$double.eps) | |
| abort(c("`cor` is computationally singular", i = "Correlations may have been calculated for variables which are (nearly) collinear or perfectly correlated")) | |
| } |
Comment on lines
+687
to
+720
| test_that("add_integration detects multicollinearity in correlation matrix", { | ||
| df <- tibble::tibble( | ||
| covariate1 = c(0.1, 0.2, 0.3), | ||
| covariate2 = c(0.4, 0.5, 0.6), | ||
| covariate3 = c(0.7, 0.8, 0.9) | ||
| ) | ||
|
|
||
| # Create a correlation matrix with multicollinearity (perfectly correlated variables) | ||
| cor_matrix <- matrix( | ||
| c(1, 1, 1, | ||
| 1, 1, 1, | ||
| 1, 1, 1), | ||
| nrow = 3, | ||
| ncol = 3 | ||
| ) | ||
| colnames(cor_matrix) <- c("covariate1", "covariate2", "covariate3") | ||
| rownames(cor_matrix) <- c("covariate1", "covariate2", "covariate3") | ||
|
|
||
| distr_args <- list( | ||
| covariate1 = distr(qnorm, mean = 0, sd = 1), | ||
| covariate2 = distr(qnorm, mean = 0, sd = 1), | ||
| covariate3 = distr(qnorm, mean = 0, sd = 1) | ||
| ) | ||
|
|
||
| expect_error( | ||
| add_integration.data.frame( | ||
| df, | ||
| !!! distr_args, | ||
| cor = cor_matrix, | ||
| n_int = 64L | ||
| ), | ||
| regexp = "Multicollinearity detected in the provided correlation matrix `cor`" | ||
| ) | ||
| }) |
Owner
There was a problem hiding this comment.
I think we can simplify the test setup a little, and check some more of the error cases.
Suggested change
| test_that("add_integration detects multicollinearity in correlation matrix", { | |
| df <- tibble::tibble( | |
| covariate1 = c(0.1, 0.2, 0.3), | |
| covariate2 = c(0.4, 0.5, 0.6), | |
| covariate3 = c(0.7, 0.8, 0.9) | |
| ) | |
| # Create a correlation matrix with multicollinearity (perfectly correlated variables) | |
| cor_matrix <- matrix( | |
| c(1, 1, 1, | |
| 1, 1, 1, | |
| 1, 1, 1), | |
| nrow = 3, | |
| ncol = 3 | |
| ) | |
| colnames(cor_matrix) <- c("covariate1", "covariate2", "covariate3") | |
| rownames(cor_matrix) <- c("covariate1", "covariate2", "covariate3") | |
| distr_args <- list( | |
| covariate1 = distr(qnorm, mean = 0, sd = 1), | |
| covariate2 = distr(qnorm, mean = 0, sd = 1), | |
| covariate3 = distr(qnorm, mean = 0, sd = 1) | |
| ) | |
| expect_error( | |
| add_integration.data.frame( | |
| df, | |
| !!! distr_args, | |
| cor = cor_matrix, | |
| n_int = 64L | |
| ), | |
| regexp = "Multicollinearity detected in the provided correlation matrix `cor`" | |
| ) | |
| }) | |
| test_that("add_integration detects multicollinearity in correlation matrix", { | |
| # Collinear | |
| expect_error( | |
| add_integration(data.frame(), | |
| x1 = distr(qnorm, mean = 0, sd = 1), | |
| x2 = distr(qnorm, mean = 0, sd = 1), | |
| x3 = distr(qnorm, mean = 0, sd = 1), | |
| cor = cor(cbind(1:4, 1:4, 1:4))), | |
| "may have been calculated for variables which are collinear or perfectly correlated") | |
| expect_error( | |
| add_integration(data.frame(), | |
| x1 = distr(qnorm, mean = 0, sd = 1), | |
| x2 = distr(qnorm, mean = 0, sd = 1), | |
| x3 = distr(qnorm, mean = 0, sd = 1), | |
| cor = cor(cbind(1:4, -(1:4), 1:4))), | |
| "may have been calculated for variables which are collinear or perfectly correlated") | |
| # Numerically collinear | |
| cc <- matrix(1 - 1e-16, 3, 3) | |
| diag(cc) <- 1 | |
| expect_error( | |
| add_integration(data.frame(), | |
| x1 = distr(qnorm, mean = 0, sd = 1), | |
| x2 = distr(qnorm, mean = 0, sd = 1), | |
| x3 = distr(qnorm, mean = 0, sd = 1), | |
| cor = cc), | |
| "may have been calculated for variables which are \\(nearly\\) collinear or perfectly correlated") | |
| # Constant | |
| expect_error( | |
| add_integration(data.frame(), | |
| x1 = distr(qnorm, mean = 0, sd = 1), | |
| x2 = distr(qnorm, mean = 0, sd = 1), | |
| x3 = distr(qnorm, mean = 0, sd = 1), | |
| cor = suppressWarnings(cor(cbind(1, 1:4, 1:4)))), | |
| "may have been calculated for one or more constant variables") | |
| }) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The existing implementation returned a generic error message when multicollinearity caused issues with the correlation matrix. This PR enhances the add_integration function by adding functionality to detect and report multicollinearity in covariates by looking at the eigenvalues of the correlation matrix. When multicollinearity is present, the function now returns a clear and informative error message to help users identify the problematic covariates.
Key Changes: