I was trying to explore the association of microbial clades with sample metadata using MaAsLin2. Three fixed effects of interest in the metadata, e.g. “effect 1”, “effect 2” and “effect 3”, were run either together (max_significance = 0.25) or separately (max_significance = 0.05).
However, the significant associations I got were totally different. When I ran the three fixed effects together, I got fewer features (fewer than 10 features for each fixed effect, based on the heatmap) than when running them separately (more than 20 features for each one, based on the significant results).
So, which one should I use, together or separately? Which one is more convincing, and why?
Hope to get your help.
Thanks for the question! MaAsLin2 can be run either as a univariate model (your second option, one fixed effect per run) or as a multivariable model, where the tool estimates the association for each variable while accounting for the other variables in the model. This is why you received fewer significant results from the model that included all three variables.
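To make the point above concrete, here is a minimal sketch in plain Python (not MaAsLin2 itself, which also normalizes, transforms and corrects for multiple testing). The data are made up for illustration: the hypothetical `abundance` depends only on `effect1`, while `effect2` is merely correlated with `effect1`. Fitted alone, `effect2` picks up a large slope; fitted jointly with `effect1`, its slope drops to essentially zero.

```python
# Toy, deterministic data (hypothetical values, not from any real study).
effect1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
effect2 = [0.5, 0.8, 2.4, 2.9, 4.2, 4.6]      # correlated with effect1
abundance = [2.0 * v for v in effect1]         # driven by effect1 only


def centered(values):
    """Subtract the mean so the intercept drops out of the fit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def univariate_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def joint_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 fitted together
    (solves the 2x2 normal equations directly)."""
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


alone = univariate_slope(effect2, abundance)
b1, b2 = joint_slopes(effect1, effect2, abundance)
print(f"effect2 alone:  slope = {alone:.3f}")
print(f"joint model:    effect1 = {b1:.3f}, effect2 = {b2:.3f}")
```

This only illustrates effect sizes, not p-values or q-values, but the same mechanism explains why features that look associated with a variable on its own can disappear once correlated variables are in the model.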
Re: whether you should use the univariate or the multivariable model, that depends on what effects 1, 2, and 3 are. Do they confound each other? Are they collinear or correlated? If you have something like age, BMI, etc., I always like to adjust for those in my multivariable models because there is a substantial literature base on how they can impact the composition of the gut microbiome. For multiple disease endpoints that are collinear, on the other hand, you may want to construct a separate model for each. Also, you always want to check the per-result QC plots (box or scatter plots) to make sure that your results make sense and appear to be true trends rather than artifacts of outliers.
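One quick way to answer the "are they correlated?" question before choosing a model is to compute pairwise correlations between the numeric metadata columns. The sketch below is plain Python with made-up values; the column names (`age`, `bmi`, `disease_score`) and the 0.7 cutoff are assumptions for illustration only, not a MaAsLin2 recommendation, and categorical variables would need a different measure.

```python
from itertools import combinations
from math import sqrt

# Hypothetical numeric metadata, one list per column (same sample order).
metadata = {
    "age": [25, 32, 41, 47, 53, 60],
    "bmi": [21.0, 27.5, 23.0, 30.0, 22.0, 26.0],
    "disease_score": [1.0, 1.8, 2.9, 3.1, 4.2, 4.9],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.
    The threshold is an arbitrary illustrative cutoff."""
    flagged = []
    for name_a, name_b in combinations(columns, 2):
        r = pearson(columns[name_a], columns[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


for name_a, name_b, r in correlated_pairs(metadata):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

Pairs flagged this way are the ones where putting both variables in one model may make each look weaker, so they are worth thinking about explicitly (adjust for one, or model them separately).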
I hope this helps. Let us know if you have any additional questions.
Thanks for your useful explanations!
It is very helpful for me!