MaAsLin2 1.22.0
Hi,
I am hoping someone can help to explain the difference between running MaAsLin2 with multiple fixed effects vs running multiple models with just one fixed effect.
I am comparing results from running MaAsLin2 on amplicon data with 3 variables in my metadata. These variables are related and collinear: variable 2 is a continuous value calculated from variable 3, and variable 1 is a categorical group assigned using variable 2. I am interested in which features are associated with one or more of these variables. I ran MaAsLin2 using each variable as a single fixed effect, and then using all three together, for a total of 4 models.
Results are summarized in the table below, where a 1 means that the ASV was a significant feature in the corresponding single-effect model or in the multivariable model (all three variables together):
ASV | Multivariable | Variable 1 | Variable 2 | Variable 3 |
---|---|---|---|---|
ASV1 | 1 | 1 | 1 | |
ASV2 | 1 | 1 | 1 | |
ASV3 | 1 | 1 | 1 | |
ASV4 | 1 | 1 | 1 | |
ASV5 | 1 | 1 | 1 | |
ASV6 | 1 | 1 | 1 | |
ASV7 | 1 | 1 | 1 | |
ASV8 | 1 | 1 | 1 | |
ASV9 | 1 | 1 | ||
ASV10 | 1 | 1 | ||
ASV11 | 1 | 1 | ||
ASV12 | 1 | 1 | ||
ASV13 | 1 | 1 | ||
ASV14 | 1 | |||
ASV15 | 1 | |||
ASV16 | 1 | |||
ASV17 | 1 | |||
ASV18 | 1 | |||
ASV19 | 1 | |||
ASV20 | 1 | |||
ASV21 | 1 | |||
ASV22 | 1 | |||
ASV23 | 1 |
I am not surprised to see overlap in the ASVs identified in the single-variable models, but I am surprised that there is no overlap between the ASVs identified in the single-variable models and those identified in the multivariable model. Indeed, when I compare the coefficients and p-values between the significant_results and the all_results files for each ASV and each model, they are very different:
A. Example where a significant ASV in a univariable model is not significant in the multivariable model:
- result from the variable 1 “significant_results” file:
feature | metadata | value | coef | stderr | N | N.not.0 | pval | qval |
---|---|---|---|---|---|---|---|---|
ASV_A | group | Low | 1.883638715 | 0.297951312 | 15 | 10 | 2.65E-05 | 0.013147049 |
- result for the same ASV from the multivariable model “all_results” file:
feature | metadata | value | coef | stderr | N | N.not.0 | pval | qval |
---|---|---|---|---|---|---|---|---|
ASV_A | group | Low | -0.018613023 | 1.099851636 | 15 | 10 | 0.986800928 | 0.998205153 |
B. Example where a significant ASV in the multivariable model is not significant in the corresponding univariable model:
- result from the multivariable model “significant_results” file:
feature | metadata | value | coef | stderr | N | N.not.0 | pval | qval |
---|---|---|---|---|---|---|---|---|
ASV_B | group | Low | 2.061026575 | 0.412878329 | 15 | 2 | 0.000407739 | 0.247944036 |
- result for the same ASV from the variable 1 “all_results” file:
feature | metadata | value | coef | stderr | N | N.not.0 | pval | qval |
---|---|---|---|---|---|---|---|---|
ASV_B | group | Low | -0.334319141 | 0.20370076 | 15 | 2 | 0.124708758 | 0.505043988 |
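One thing that may be relevant to both examples: when predictors are strongly collinear, the coefficient a variable gets on its own can differ a lot, even in sign, from the coefficient it gets when the other variables are in the same model. The toy sketch below uses plain Python with made-up numbers (it is not MaAsLin2 and not the data from this post); `x2`, `x3`, `y`, and the helpers `solve` and `ols` are hypothetical names used only for illustration.

```python
# Toy illustration only: ordinary least squares on made-up, collinear predictors.
# None of these names or numbers come from MaAsLin2 or from the data in this post.

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(columns, y):
    """Return [intercept, coef_1, ...] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(XtX, Xty)

# x3 is a made-up continuous variable; x2 is nearly a copy of it (collinear).
x3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
wiggle = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
x2 = [a + w for a, w in zip(x3, wiggle)]

# Made-up response that depends positively on x2 and negatively on x3.
y = [2.0 * a - 1.0 * b for a, b in zip(x2, x3)]

uni_x2 = ols([x2], y)      # y ~ x2
uni_x3 = ols([x3], y)      # y ~ x3
multi = ols([x2, x3], y)   # y ~ x2 + x3

print("x3 alone:        slope =", round(uni_x3[1], 3))
print("x2 alone:        slope =", round(uni_x2[1], 3))
print("x2 + x3 jointly: coefs =", [round(c, 3) for c in multi[1:]])
```

In this toy data, `x3` gets a positive slope when fitted alone but a negative coefficient when `x2` is in the same model, because the joint fit estimates the effect of each variable while holding the other fixed. Whether the same mechanism fully explains examples A and B is an assumption, not something the sketch can confirm.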
I think my confusion comes from a misunderstanding of what happens in the multivariable models. I thought that a multivariable model was essentially running a set of univariate tests, adjusting the p-values across all of those tests, and reporting associations where the q-value is below the designated threshold. I read this somewhat related post, followed the terminology in this post, and read Box 4 of this review; together these led me to realize I was wrong.
Could someone please help clarify what is going on here? Thank you!
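For anyone following along, here is a minimal sketch of the p-value adjustment step on its own, assuming the Benjamini-Hochberg procedure (which, as far as I know, is the MaAsLin2 default, `correction = "BH"`). The function name `bh_adjust` and the p-values are hypothetical. The adjustment only rescales whatever p-values it is given, so it cannot turn the per-variable p-values of a joint model back into univariate test results.

```python
# Minimal Benjamini-Hochberg sketch (illustrative; not MaAsLin2's own code).

def bh_adjust(pvals):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals

# Made-up p-values, one per (feature, variable) coefficient.
example_pvals = [0.001, 0.01, 0.03, 0.04, 0.5]
example_qvals = bh_adjust(example_pvals)

threshold = 0.25  # assumed to match the MaAsLin2 default max_significance
for p, q in zip(example_pvals, example_qvals):
    print(f"p = {p:<6} q = {q:.3f} significant = {q < threshold}")
```

Assuming that default threshold of 0.25, this would also be consistent with ASV_B appearing in the multivariable “significant_results” file with a q-value of about 0.248.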