Metaphlan3 analysis with Maaslin2

plicht · February 11, 2021, 10:32am

Hello all,

first of all thanks for this amazing support forum!

I would like to clarify some questions arising when analysing standard Metaphlan output (relative proportions from 0-100, taxonomy table reduced to 1 common level) with Maaslin2:

I have a typical distribution of microbiome data, i.e. lots of zeros for a given feature. So I would reckon a non-LM analysis like ZINB would be appropriate. However, when running ZINB I do not find any significant taxa, whereas with the default LM analysis I get 54 significant taxa. So is normal LM analysis superior to ZINB in this case?
This brings me to my next question: how should I transform the data? The LOG and LOGIT transformations seem to result in the highest number of significant taxa, but might this be overfitting?

In particular, I ran the following analysis model/transformation combinations and got these numbers of significant associations (all other settings left to default):

LM
LOG: 54
NONE: 1
AST: 0
LOGIT: 92

ZINB
LOG: 0

NEGBIN
NONE: 4

CPLM
NONE: 16
AST: 5
LOG: 0
LOGIT: 0

Thank you,
Philipp

himel.mallick · February 11, 2021, 2:44pm

Hi @plicht - we usually don’t recommend one model over the others and leave it to the user’s best judgment. All of the included models have been carefully validated (as described in our preprint), so that together they form a multi-model system appropriate for many different microbial community data types (taxonomic or functional profiles), environments (human or otherwise), and measurements (counts or relative counts). Alternative normalization/transformation schemes and statistical models are implemented because we strongly believe that the best model for a given dataset is highly context-dependent.

In your case, the total number of detected features is only one way to assess performance. I recommend deep-diving into the detected features to check whether they are biologically meaningful with respect to effect size, overall distribution, or prior knowledge. An intersection of a few plausible results is a good starting point if you want to start from a reduced set of features.
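As a rough illustration of the intersection idea, here is a minimal Python sketch that keeps only the features called significant by several model/transformation runs. The column names `feature` and `qval`, the run labels, the 0.25 threshold, and the example rows are all assumptions made up for illustration; adapt them to your own result tables.

```python
from collections import Counter


def significant_features(rows, q_threshold=0.25):
    """Return the set of feature names whose q-value is at or below the threshold."""
    return {row["feature"] for row in rows if float(row["qval"]) <= q_threshold}


def shared_features(results_by_run, min_runs=2, q_threshold=0.25):
    """Return features that are significant in at least `min_runs` runs, sorted by name."""
    counts = Counter()
    for rows in results_by_run.values():
        # Each run contributes at most one vote per feature (set semantics).
        counts.update(significant_features(rows, q_threshold))
    return sorted(name for name, n in counts.items() if n >= min_runs)


# Hypothetical result rows, one list per model/transformation run.
results = {
    "LM_LOG": [
        {"feature": "taxon_A", "qval": "0.01"},
        {"feature": "taxon_B", "qval": "0.20"},
        {"feature": "taxon_C", "qval": "0.60"},
    ],
    "LM_LOGIT": [
        {"feature": "taxon_A", "qval": "0.03"},
        {"feature": "taxon_C", "qval": "0.10"},
    ],
    "CPLM_NONE": [
        {"feature": "taxon_A", "qval": "0.04"},
        {"feature": "taxon_B", "qval": "0.05"},
    ],
}

strict = shared_features(results, min_runs=3)   # significant in every run
relaxed = shared_features(results, min_runs=2)  # significant in at least two runs
print(strict, relaxed)
```

Requiring agreement across all runs gives the most conservative shortlist; lowering `min_runs` trades stringency for a larger set to inspect by hand.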

One minor point: for relative abundances, count models such as negative binomial and ZINB are not appropriate, which might explain why you are not seeing any significant results from running those models. Other than that, CPLM is also an appropriate model for data with a high number of zeros. I hope this helps in your decision-making to some extent.

Best,
Himel

plicht · February 11, 2021, 3:50pm

Hello @himel.mallick,

thanks a lot for your thoughts. How is the effect size being calculated?

When I use Excel to calculate
(-log(qval)*SIGN(coeff))
in one case (CPLM model) I get effect values that exceed the legend of the heatmap. The taxon shows a value of -26.26, whereas the heatmap only scales to -20. Is it a display problem or am I calculating it wrong?

Best
Philipp

himel.mallick · February 11, 2021, 4:26pm

Hi @plicht - I believe the formula is correct and there are external packages like ComplexHeatmap where you can change the range manually.
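For anyone wanting to reproduce the value outside Excel, here is a minimal Python sketch of the same formula. Two things are assumptions here rather than anything confirmed in this thread: the logarithm base (Excel's `LOG` defaults to base 10, while `LN` is the natural log, so the two give different magnitudes), and the optional `cap` argument, which only illustrates how a plot could clamp values to a fixed legend range.

```python
import math


def signed_significance(qval, coef, base=math.e, cap=None):
    """Compute -log(qval) * sign(coef), optionally clamped to [-cap, cap]."""
    if qval <= 0:
        raise ValueError("qval must be positive")
    # sign(coef): +1, 0, or -1
    sign = (coef > 0) - (coef < 0)
    value = -math.log(qval, base) * sign
    if cap is not None:
        # Hypothetical display clamp, e.g. cap=20 for a legend from -20 to 20.
        value = max(-cap, min(cap, value))
    return value


# Hypothetical association: very small q-value, negative coefficient.
natural = signed_significance(1e-12, -0.8)            # natural log
base10 = signed_significance(1e-12, -0.8, base=10)    # Excel-style LOG
clamped = signed_significance(1e-12, -0.8, cap=20)    # clamped for display
print(natural, base10, clamped)
```

If a computed value falls outside the legend while the table value is correct, comparing the unclamped and clamped numbers like this can help tell a display limit apart from a calculation error.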
