Global significance test for multilevel factor

Hello MaAsLin2 folks,

Thank you for this super useful tool!

I’ve been searching existing posts for a while and can’t seem to find an answer to this question.

When I include a fixed factor that has >2 levels, is there a way to perform a global significance test? While having the result of every pairwise test is very useful, I would also like to know whether any levels of the factor differ significantly from one another, along with the associated p value. For example, the likelihood ratio test (LRT) method in DESeq2 and the res_global output in ANCOMBC each calculate a p value testing for any differences among the levels of factors with >2 levels. Is there an analogous MaAsLin2 output that I’m missing?

Thanks,
Connor

Hi Connor (@cfitz),

No, you are correct, we do not currently have a way to do a global significance test for a multi-level variable. We have discussed this internally and think that it is a great idea for a future add-on for the tool.

The only workaround I can think of for now, and it certainly suffers from some issues, would be to binarize your variable if possible (e.g. healthy/disease).
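As a rough illustration of the binarizing idea (this is plain Python, not MaAsLin2 code, and the level names "CD" and "UC" and the helper `binarize` are made up for the example), every non-reference level is collapsed into a single "other" group before the variable goes into the metadata table:

```python
def binarize(levels, reference):
    """Map each level to 0 if it belongs to the reference set, else 1."""
    ref = set(reference)
    return [0 if level in ref else 1 for level in levels]


# Hypothetical three-level diagnosis variable, one entry per sample.
diagnosis = ["healthy", "CD", "UC", "healthy", "UC"]

# "CD" and "UC" are both coded 1, so the model can no longer tell them apart.
diagnosis_binary = binarize(diagnosis, reference={"healthy"})
print(diagnosis_binary)
```

The cost is visible in the comment: any difference between the collapsed levels is lost, which is one of the issues with this workaround.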

Thank you for your question!
Best,
Kelsey

Thank you for the reply, Kelsey! I have experiments where an intuitive reference level exists within each factor, and the pairwise comparison output of MaAsLin2 is super convenient! But occasionally some factors, like locality, have no clear reference, and I’m still interested in analyzing them as a fixed effect. I guess another nice feature might be for MaAsLin2 to output p values associated with random effect terms? This way a factor with no clear reference could still be included in the model with an associated p value, much like one would perform a likelihood ratio test with a linear mixed effects model (e.g. function lmer from R package lme4) to calculate a p value for a random effect term.

Regarding p-values for random effect terms:

Getting a calibrated p-value for a random effect term is tricky. Many details, from study design to incorrect distributional assumptions, can throw off p-value calculations based on e.g. likelihood ratio tests. See Figure 3 in this paper for an example. It would be difficult to implement, test, and validate this sort of feature (particularly for the wide range of applications MaAsLin2 gets used in), so we’re not planning to add that at this time.

You might also want to articulate why you want to assess the significance of the random effects term in the first place. Maybe “does my effect show consistent subject-level variation?” is a question you need a statistically quantifiable yes/no answer to, but researchers often know a priori that the answer is “Yes” and don’t need to test it.

Yes, I can see why you do not want to wade in the direction of random effect significance, thanks for the reference.

I think there are several reasons why p values for random effects are appealing, especially in the realm of ecological research. We often sample only a small subset of the possible factor levels (e.g. genotypes, ecotypes, or localities), but we still wish to know whether the factor is significant. One might not want to code it as a fixed effect because the differences between the particular sampled levels aren’t terribly informative, but the underlying population is. Anyway, a global significance test for fixed factors with >2 levels is a reasonable workaround, I think.
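For anyone landing on this thread, here is a minimal sketch of what such a global test computes for a single feature. It is not a MaAsLin2 function. It assumes a plain Gaussian linear model with the factor as the only term (no covariates, no random effects), compares it to an intercept-only model with a likelihood ratio test, and uses the asymptotic chi-square reference distribution. The function names (`chi2_sf`, `global_lrt`), the site labels, and the abundance values are all made up for illustration.

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, h) = exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, h) = erfc(sqrt(h))
    # Recurrence: Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def global_lrt(values, groups):
    """Likelihood ratio test of 'all level means are equal' for one feature.

    Returns (statistic, degrees of freedom, asymptotic p value).
    Assumes Gaussian errors and a nonzero within-level residual sum of squares.
    """
    by_level = defaultdict(list)
    for value, group in zip(values, groups):
        by_level[group].append(value)
    if len(by_level) < 2:
        raise ValueError("need at least two factor levels")

    n = len(values)
    grand_mean = sum(values) / n
    # Null model: one common mean for every sample.
    rss_null = sum((v - grand_mean) ** 2 for v in values)
    # Full model: a separate mean for each factor level.
    rss_full = 0.0
    for level_values in by_level.values():
        level_mean = sum(level_values) / len(level_values)
        rss_full += sum((v - level_mean) ** 2 for v in level_values)

    statistic = n * math.log(rss_null / rss_full)
    df = len(by_level) - 1
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical transformed abundances of one feature at three localities.
abundance = [2.1, 2.4, 1.9, 3.0, 3.3, 2.9, 2.0, 2.2, 2.3]
locality = ["siteA"] * 3 + ["siteB"] * 3 + ["siteC"] * 3

stat, df, p = global_lrt(abundance, locality)
print(f"LRT statistic = {stat:.3f}, df = {df}, p = {p:.2e}")
```

Note that no reference level is needed: the test only asks whether a separate mean per level fits better than one shared mean. With few samples per level the chi-square approximation tends to be anti-conservative, so a one-way ANOVA F-test built from the same two residual sums of squares is the more common small-sample choice. In a real analysis the test would also have to be repeated per feature and corrected for multiple testing.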