I am using maaslin3 v0.99.1 in RStudio and am running into an issue where a feature appears in the significant_results.tsv file with no model errors but does not show up on the summary plot. I have loaded the saved ggplot back into R and can see that the feature is not present in the plot's $data, so it isn't a matter of it falling outside the x-limits or anything like that. I suspect it might be filtered out due to the high stderr; I've included the values below. There are three treatment groups, and the feature's prevalence in each is 4/12, 4/11, and 11/11, so it is clearly more prevalent in the last group. I'm not clear on why the stderr would be so high for the prevalence test. If you could provide any insight into whether you would expect this feature to be filtered from the summary plot, I would appreciate it.
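For reference, here is roughly how I checked the plot object (I had saved it as an RDS; "Feature_X" stands in for the actual feature name, and the column name may differ):

```r
# Load the saved summary plot and check which features ggplot is drawing.
p <- readRDS("summary_plot.RDS")   # placeholder path
class(p)                           # "gg" "ggplot"
"Feature_X" %in% p$data$feature    # FALSE: the feature isn't in the plot data
unique(p$data$feature)
```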
The summary plot only reports the taxa with the most significant associations (25 taxa by default), so it's possible that the p-value and q-value for this association are simply not significant enough relative to the rest of your results.
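As a quick way to see where your association ranks, you can sort the results file yourself. A minimal sketch, assuming the output columns are named feature and qval_individual (adjust to whatever your file's header actually says):

```r
# Rank features by their best q-value; the summary plot keeps roughly
# the top 25 of this ranking by default.
res <- read.delim("significant_results.tsv")
best_q <- aggregate(qval_individual ~ feature, data = res, FUN = min)
best_q <- best_q[order(best_q$qval_individual), ]
head(best_q, 25)  # is your feature in here?
```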
The fact that your coefficient and standard error are both very large suggests linear separability in your data (i.e., the feature is all present or all absent in one group). Indeed, a feature that is present in every sample of one group is a classic example of linear separability. The data augmentation scheme is designed to deal with this (see the MaAsLin 3 preprint or user manual for an explanation of how). Have you changed the augment parameter from its default (TRUE)? Below is an example where a default logistic regression breaks on this kind of data but an augmented model gives a finite, significant effect as intended; the augmentation shown is a simplified illustration of the idea, not MaAsLin 3's exact scheme.
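Here is the sketch (simulated data, with group sizes matching your 12/11/11):

```r
# Toy data with linear separability: the feature is present in every
# sample of group C, as in your 11/11 group.
set.seed(1)
group   <- factor(rep(c("A", "B", "C"), times = c(12, 11, 11)))
present <- c(rbinom(12, 1, 4/12), rbinom(11, 1, 4/11), rep(1, 11))

# Default logistic regression: quasi-complete separation drives the
# group C coefficient and its standard error toward infinity (R may warn
# that fitted probabilities of 0 or 1 occurred).
fit_plain <- glm(present ~ group, family = binomial)
summary(fit_plain)$coefficients

# Simplified augmentation: add one low-weight "present" and one
# low-weight "absent" pseudo-observation per group so that no group is
# perfectly separated.
aug <- data.frame(
  present = c(present, rep(c(0, 1), times = 3)),
  group   = factor(c(as.character(group), rep(c("A", "B", "C"), each = 2))),
  w       = c(rep(1, length(present)), rep(0.5, 6))
)
fit_aug <- glm(present ~ group, family = binomial, data = aug, weights = w)
summary(fit_aug)$coefficients  # finite estimate with a usable std. error
# (R warns about non-integer #successes because of the fractional
# weights; that is expected for this kind of augmentation.)
```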
What is the full model formula you're using? Is it just ~ groups, where groups is the variable with the 3 levels you've described?
I don’t believe it is outside the 25 most significant, as there are only 10 significant associations in total.
I have tried two different models: 1) ~ group and 2) ~ group + (1|CageID). This is from a rat study in which the rats are housed 2-3 per cage, so I included CageID as a random effect - does that seem reasonable? I notice now that the issue only occurs with the second model. I haven't changed the augment parameter.
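For reference, the two calls look roughly like this (table and output directory names are placeholders; everything else was left at the defaults):

```r
library(maaslin3)

# Model 1: treatment group only.
fit1 <- maaslin3(
  input_data     = taxa_table,       # feature table (placeholder name)
  input_metadata = metadata,         # contains 'group' and 'CageID'
  output         = "maaslin3_group",
  formula        = "~ group"
)

# Model 2: adds a random intercept per cage (2-3 rats per cage).
fit2 <- maaslin3(
  input_data     = taxa_table,
  input_metadata = metadata,
  output         = "maaslin3_group_cage",
  formula        = "~ group + (1|CageID)"
)
```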
In our experience, random effects with only 2-3 observations per grouping level can cause issues for logistic mixed-effects regression (see the bottom of the random effects section of the user manual). If you use fixed effects instead, does that work, or does it increase the variance so much as to make everything insignificant? Are the rats in each cage all from the same group? If not, there are more advanced analysis routes you might be able to pursue.
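To illustrate the small-cluster problem outside of MaAsLin 3, here is a sketch with lme4 on simulated data (not your study, but with cage sizes of 2 as in your design):

```r
library(lme4)

# 12 cages of 2 rats each, cages nested within 3 treatment groups.
set.seed(2)
cage    <- factor(rep(1:12, each = 2))
group   <- factor(rep(c("A", "B", "C"), each = 8))
present <- rbinom(24, 1, ifelse(group == "C", 0.9, 0.3))

# Random intercept per cage: with only 2 observations per cage, the cage
# variance is poorly identified and often collapses to zero (a singular
# fit) or triggers convergence warnings.
fit_re <- glmer(present ~ group + (1 | cage), family = binomial)
isSingular(fit_re)

# Fixed-effect alternative. Caveat: if every cage sits entirely within one
# group (as simulated here), the cage dummies are collinear with group, so
# glm() reports NA for the aliased terms and the group effect can't be
# cleanly separated from cage effects; with 2 rats per cage, many cage
# dummies are also perfectly separating. This is why it matters whether
# cages cross treatment groups.
fit_fe <- glm(present ~ group + cage, family = binomial)
```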