MaAsLin2: Comparing results with one fixed effect vs many

MaAsLin2 1.22.0

Hi,

I am hoping someone can help to explain the difference between running MaAsLin2 with multiple fixed effects vs running multiple models with just one fixed effect.

I am comparing results from running MaAsLin2 on amplicon data with 3 variables in my metadata. These variables are related and collinear: variable 2 is a continuous value calculated from variable 3, and variable 1 is a categorical group assigned using variable 2. I am interested in which features are associated with one or more of these variables. I ran MaAsLin2 using each variable as a single fixed effect, and then using all three together, for a total of 4 models.

Results are summarized in the table below, where a 1 means that ASV was a significant feature in the single effect or multivariable (All) model:

ASV Multivariable Variable 1 Variable 2 Variable 3
ASV1 1 1 1
ASV2 1 1 1
ASV3 1 1 1
ASV4 1 1 1
ASV5 1 1 1
ASV6 1 1 1
ASV7 1 1 1
ASV8 1 1 1
ASV9 1 1
ASV10 1 1
ASV11 1 1
ASV12 1 1
ASV13 1 1
ASV14 1
ASV15 1
ASV16 1
ASV17 1
ASV18 1
ASV19 1
ASV20 1
ASV21 1
ASV22 1
ASV23 1

I am not surprised to see overlap in the ASVs identified in the single variable models, but I am surprised that there is no overlap in the ASVs identified in the single vs multivariable models. Indeed, when I compare the coefficients and p-values between the significant_results and the all_results files for each ASV and each model, they are very different:

A. Example where a significant ASV in a univariable model is not significant in the multivariable model:

- result from the variable 1 “significant_results” file:

feature metadata value coef stderr N N.not.0 pval qval
ASV_A group Low 1.883638715 0.297951312 15 10 2.65E-05 0.013147049

- result for the same ASV from the multivariable model “all_results” file:

feature metadata value coef stderr N N.not.0 pval qval
ASV_A group Low -0.018613023 1.099851636 15 10 0.986800928 0.998205153

B. Example where a significant ASV in the multivariable model is not significant in the corresponding univariable model:

- result from the multivariable model “significant_results” file:

feature metadata value coef stderr N N.not.0 pval qval
ASV_B group Low 2.061026575 0.412878329 15 2 0.000407739 0.247944036

- result for the same ASV from the variable 1 “all_results” file:

feature metadata value coef stderr N N.not.0 pval qval
ASV_B group Low -0.334319141 0.20370076 15 2 0.124708758 0.505043988

I think this confusion comes from a misunderstanding of what happens in the multivariable models. I thought that a multivariable model was essentially running a set of univariable tests, adjusting the p-values across all of those tests, and reporting associations where the q-value was below the designated threshold. I read this somewhat related post, followed the terminology in this post, and read Box 4 of this review, and together these led me to realize I was wrong.

Could someone please help clarify what is going on here? Thank you!

Hi @bbm,

It is generally not recommended to include highly collinear variables in the same multivariable model (especially variables that were derived from one another). When predictors carry nearly the same information, the model cannot reliably separate the effect of each individual covariate on the value you are trying to predict, so the coefficients and their standard errors become unstable.
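
To make this concrete, here is a minimal sketch in Python using simulated data and plain ordinary least squares. It is not MaAsLin2's actual pipeline, and every name and number in it (`var1`, `var2`, `var3`, `abundance`, the effect size, the noise levels) is an assumption made up for illustration. It mimics the setup described above: `var2` is computed from `var3`, and `var1` is a group assigned from `var2`. The same outcome is then fitted once with `var2` alone and once with all three predictors together.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares; returns coefficients and their standard errors."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    dof = len(y) - X.shape[1]                  # residual degrees of freedom
    sigma2 = resid @ resid / dof               # residual variance estimate
    stderr = np.sqrt(sigma2 * np.diag(xtx_inv))
    return coef, stderr

# Simulated (hypothetical) data: 15 samples, three related predictors.
rng = np.random.default_rng(0)
n = 15
var3 = rng.normal(size=n)
var2 = 2.0 * var3 + rng.normal(scale=0.05, size=n)  # derived from var3
var1 = (var2 > np.median(var2)).astype(float)       # group assigned from var2
abundance = 1.5 * var2 + rng.normal(scale=0.5, size=n)

# Univariable model: abundance ~ var2
coef_uni, se_uni = ols(var2, abundance)

# Multivariable model: abundance ~ var1 + var2 + var3
coef_multi, se_multi = ols(np.column_stack([var1, var2, var3]), abundance)

# Index 0 is the intercept in both fits.
print("var2 alone:       coef=%.3f  stderr=%.3f" % (coef_uni[1], se_uni[1]))
print("var2 with others: coef=%.3f  stderr=%.3f" % (coef_multi[2], se_multi[2]))
```

In the multivariable fit, each coefficient describes the effect of that variable with the other two held fixed. Because `var2` barely varies once `var3` is fixed, its standard error becomes much larger than in the single-variable fit, and the estimate itself can move a long way or change sign. That is the same pattern as the coefficient and stderr changes in examples A and B above.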

I would recommend choosing the single variable that best matches your initial hypothesis (the one you had before looking at any of the results) and sticking with it.

Jacob Nearing