Using `s()` for smooth continuous interactions in Chapter 7 is problematic #30

In Chapter 7, you describe the different kinds of interactions that we might anticipate needing in a GAM setting. However, you focus on using `s()` for the case of a smooth interaction of two continuous covariates. This is dangerous because `s()` assumes a single smoothness parameter for the entire 2D smooth and hence is isotropic: wiggliness in one variable is assumed to be the same as in the other. Even if the variables are measured in the same units, we could still run into severe problems using `s()` for smooth interactions because of this.

`s()` is not invariant to the relative scaling of the covariates. Suppose that, in a version of the bioluminescence example, one of the variables in the smooth interaction is strongly right-skewed, with some large values. In such circumstances we would suggest transforming the skewed variable to reduce the skew (with a log transform, for example), so that the values are spread out and the tail (of the spline) is not wagging the dog (the bulk of the data). After this transformation, the magnitudes of the values taken by the two covariates in the smooth interaction `s(x, log(z))` can be very different, and you can get a spuriously wiggly fit in one dimension.
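To make the point concrete, here is a minimal sketch on simulated data (not the bioluminescence data; the covariates `x`, `z`, `lz` and the true function are made up for illustration). It fits the same interaction with an isotropic `s()` and with a tensor product `te()`, and shows that the former estimates one smoothing parameter while the latter estimates one per margin.

```r
library(mgcv)

set.seed(1)
n <- 400

# Hypothetical covariates on very different scales
x  <- runif(n, 0, 1)                      # roughly uniform on [0, 1]
z  <- rlnorm(n, meanlog = 3, sdlog = 1)   # strongly right-skewed
lz <- log(z)                              # log transform to reduce the skew

# Made-up truth: wiggly in x, nearly linear in log(z)
y   <- sin(2 * pi * x) * lz / 3 + rnorm(n, sd = 0.3)
dat <- data.frame(y = y, x = x, z = z, lz = lz)

# Isotropic thin plate smooth: a single smoothing parameter for both directions
m_s <- gam(y ~ s(x, lz), data = dat, method = "REML")

# Tensor product smooth: a separate smoothing parameter for each margin
m_te <- gam(y ~ te(x, lz), data = dat, method = "REML")

length(m_s$sp)   # number of smoothing parameters in the s() fit
length(m_te$sp)  # number of smoothing parameters in the te() fit
```

Comparing the two fits with `summary()` and `plot()` is a reasonable next step; how different they look will depend on the data, so I would not read too much into this one simulation.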

I know you go on to mention `te()` and `ti()` in Chapter 10, but you don't warn the reader of any potential problems with the workflow you describe in Chapter 7. Any discussion of interactions in GAMs is incomplete without mentioning `te()`; the setting I encounter most frequently when helping people with GAMs is the one involving covariates measured on different scales. I would recommend that you use `te()` for the bioluminescence example in 7.2.

I would further suggest that the example be extended to fit the `ti(x) + ti(z) + ti(x,z)` model, in place of or to complement the use of AIC to assess model fits. The comparison by AIC is prediction-focused, and this is usually not what a user wants to base their modelling decisions on. The `ti()` decomposition addresses the problem of "how do I test for an interaction?", if that is something that people desperately want to do.**

** I would argue *strongly* that we shouldn't really be doing that, and should instead just estimate the smooth interaction and then decide whether the "interaction" part is biologically relevant using conditional values (using `mgcv::vis.gam()` or `gratia::conditional_values()`, say).
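As a rough sketch of what I have in mind (again on simulated data with made-up covariates `x` and `z`, not the book's example), the decomposition can be fitted as below. The `ti(x, z)` row of the summary table relates to the pure interaction term. For the conditional values I use plain `predict()` over a small grid here, as a stand-in for `mgcv::vis.gam()` or `gratia::conditional_values()`.

```r
library(mgcv)

set.seed(2)
n   <- 400
dat <- data.frame(x = runif(n), z = runif(n, 0, 10))

# Made-up truth: two main effects plus an interaction
dat$y <- sin(2 * pi * dat$x) + 0.2 * dat$z +
  dat$x * sin(dat$z) + rnorm(n, sd = 0.3)

# Main effects and pure interaction as separate smooths
m_ti <- gam(y ~ ti(x) + ti(z) + ti(x, z), data = dat, method = "REML")

# One row per smooth: ti(x), ti(z), ti(x,z)
summary(m_ti)$s.table

# Conditional values: fitted response over x at a few fixed values of z
grid <- expand.grid(x = seq(0, 1, length.out = 50), z = c(1, 5, 9))
grid$fit <- as.numeric(predict(m_ti, newdata = grid, type = "response"))
head(grid)
```

Plotting `fit` against `x` for each value of `z` shows how the effect of `x` changes across `z`, which is usually a more useful basis for judging relevance than the p-value on the `ti(x, z)` term.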
