Metaphlan3 analysis with Maaslin2

Hello all,

First of all, thanks for this amazing support forum!

I would like to clarify some questions arising when analysing standard Metaphlan output (relative proportions from 0-100, taxonomy table reduced to 1 common level) with Maaslin2:

  • I have a typical distribution of microbiome data, i.e. lots of zeros for a given feature. So I would reckon that a non-LM analysis like ZINB would be appropriate. However, when running ZINB I do not find any significant taxa, whereas with the default LM analysis I find 54 significant taxa. So is the normal LM analysis superior to ZINB in this case?

  • This brings me to my next question: how should I transform the data? The LOG and LOGIT transformations seem to result in the highest numbers of significant taxa, but could this be a sign of overfitting?

In particular, I ran the following model/transformation combinations and got these numbers of significant associations (all other settings left at their defaults):

LM
LOG: 54
NONE: 1
AST: 0
LOGIT: 92

ZINB
LOG: 0

NEGBIN
NONE: 4

CPLM
NONE: 16
AST: 5
LOG: 0
LOGIT: 0

Thank you,
Philipp

Hi @plicht - we usually don’t recommend one model over the others and leave it to the user’s best judgment. All the included models have been carefully validated (as described in our preprint). Together with the alternative normalization/transformation schemes, they form a multi-model system that is appropriate for many different microbial community data types (taxonomic or functional profiles), environments (human or otherwise), and measurements (counts or relative counts). We strongly believe that the best model for a given dataset is highly context-dependent.

In your case, the total number of detected features is only one way to assess performance. I recommend taking a close look at the detected features to see whether they are biologically meaningful with respect to effect size, overall distribution, or prior knowledge. The intersection of the results from a few plausible models is a good starting point if you want to start from a reduced set of features.
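As a rough illustration of the intersection idea, the sketch below collects the features reported as significant by several runs and keeps those that appear in all of them, or in at least two. The run labels and taxon names are placeholders invented for the example, not results from this thread, and in practice the sets would be filled from each run's significant results table.

```python
from collections import Counter

# Hypothetical significant-feature sets, one per model/transformation run.
# The taxon names are placeholders, not results from this thread.
significant = {
    "LM_LOG": {"taxon_A", "taxon_B", "taxon_C"},
    "LM_LOGIT": {"taxon_A", "taxon_B", "taxon_D"},
    "CPLM_NONE": {"taxon_A", "taxon_C"},
}


def features_in_at_least(runs, min_runs):
    """Return the features reported as significant in at least `min_runs` runs."""
    counts = Counter(feature for feats in runs.values() for feature in feats)
    return {feature for feature, n in counts.items() if n >= min_runs}


# Strict intersection: significant in every run.
consensus_all = features_in_at_least(significant, len(significant))
# Looser consensus: significant in at least two runs.
consensus_two = features_in_at_least(significant, 2)

print(sorted(consensus_all))
print(sorted(consensus_two))
```

The threshold is a judgment call: a strict intersection gives a short, conservative list, while "at least two runs" keeps features that a single model may have missed.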

One minor point: for relative abundances, count models such as negative binomial and ZINB are not appropriate, which might explain why you are not seeing any significant results from running those models. Other than that, CPLM is also an appropriate model for a high number of zero counts in the data. I hope this helps in your decision-making to some extent :slight_smile:
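A quick sanity check along these lines can be sketched as follows: count models expect non-negative integers, so a table of proportions (0-100) fails the check even when it is very sparse. The helper names and the example values are made up for illustration and are not part of Maaslin2.

```python
def zero_fraction(values):
    """Fraction of entries that are exactly zero."""
    values = list(values)
    return sum(1 for v in values if v == 0) / len(values)


def looks_like_counts(values):
    """True if every entry is a non-negative whole number."""
    return all(v >= 0 and float(v).is_integer() for v in values)


# Made-up example: relative abundances (0-100) for one feature across samples.
rel_abund = [0.0, 0.0, 12.5, 0.0, 3.25, 84.25]
# Made-up example: read counts for one feature across samples.
read_counts = [0, 0, 12, 5]

print(zero_fraction(rel_abund))        # share of zeros in the feature
print(looks_like_counts(rel_abund))    # proportions: not suitable for NEGBIN/ZINB
print(looks_like_counts(read_counts))  # integer counts: count models can apply
```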

Best,
Himel

Hello @himel.mallick,

thanks a lot for your thoughts. How is the effect size being calculated?

When I use Excel to calculate
(-log(qval)*SIGN(coeff))
in one case (CPLM model) I get an effect value that exceeds the range of the heatmap legend. The taxon shows a value of -26.26, whereas the heatmap only scales to -20. Is it a display problem, or am I calculating it wrong?

Best
Philipp

Hi @plicht - I believe the formula is correct and there are external packages like ComplexHeatmap where you can change the range manually.
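For anyone reproducing the value outside of R, here is a minimal sketch of the signed score from the formula above. Two things are assumptions rather than confirmed behaviour: that the heatmap limits its colour scale to ±20 (which would match the legend described in the question), and the example q-value and coefficient, which are invented. Note also that the base of the logarithm matters: R's `log` is the natural log, while Excel's `LOG` defaults to base 10, so the two give different numbers for the same q-value.

```python
import math


def signed_neg_log_q(qval, coef, base=math.e):
    """Compute -log(qval) * sign(coef); qval must be greater than 0."""
    sign = (coef > 0) - (coef < 0)
    return -math.log(qval, base) * sign


def clip(value, limit=20.0):
    """Limit a value to [-limit, limit]; the limit of 20 is an assumption
    based on the legend range described in the question."""
    return max(-limit, min(limit, value))


# Invented example values, not taken from the thread.
qval, coef = 1e-12, -0.8

natural = signed_neg_log_q(qval, coef)           # natural log, as R's log()
base10 = signed_neg_log_q(qval, coef, base=10)   # base 10, as Excel's LOG()

print(round(natural, 2), round(base10, 2))
print(clip(natural))  # value as it would appear on a scale capped at 20
```

If the plotted values are capped like this, a score beyond -20 is only a display limit and the underlying q-value is unchanged.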