How to define the transformation / the normalization to use in Maaslin2

Hi,

Thank you very much for this great tool, which makes multivariate analyses in microbiome studies easy.

I still do not fully understand which characteristics of my dataset should guide the choice of normalization or transformation method.

I used a rarefied dataset with relative abundances from about 150 different samples (unrelated, one time point). I want to compare healthy subjects (n=67) vs patients (n=125). In another analysis, I also want to compare different metadata within patients (with some values present in 5 to 100 patients).

From the paper by Weiss et al. (Microbiome, 2017), I assume that with rarefied data I may not need further normalization. Do you think I am wrong?

I tried several transformation methods, and the results varied from one to another, which puzzled me. Compared with a more linear analysis with LEfSe, the closest match was the AST transformation with no other normalization applied.

Do you have any insights on how to properly choose the settings for my analysis, and which parameters I should take into account to do so?

Thank you very much,

Nicolas

Hi @NicolasB - although we did not include rarefaction in our own evaluation, a recent preprint concluded that MaAsLin 2 (particularly with rarefied data) could also be a reasonable choice for users looking for increased statistical power at the potential cost of more false positives.

Coming back to your question, you are right that rarefied data can be considered normalized, so you don’t need additional normalization before statistical modeling.
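
To illustrate why rarefied data already behave as normalized data, here is a minimal Python sketch (not MaAsLin 2 code; the sample names, counts, and the helper `total_sum_scale` are made up for illustration). Once every sample has been subsampled to the same depth, converting to relative abundance divides every sample by the same constant, so it only changes the units.

```python
# Hypothetical rarefied counts: every sample was subsampled to the same depth (100).
rarefied = {
    "sample_A": [50, 30, 20],
    "sample_B": [10, 60, 30],
}


def total_sum_scale(counts):
    """Convert counts to relative abundances (proportions summing to 1)."""
    depth = sum(counts)
    return [c / depth for c in counts]


depths = {name: sum(counts) for name, counts in rarefied.items()}
relative = {name: total_sum_scale(counts) for name, counts in rarefied.items()}

# All depths are identical, so the scaling factor is the same for every sample.
same_depth = len(set(depths.values())) == 1
print(same_depth, relative["sample_A"])
```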

As for alternative models/transformations, we usually do not recommend a particular combination over another as the choice is usually problem- and data-specific. Apart from trying out various transformations with the LM models, you can also consider other non-LM models without normalization/transformation and see if that supports your hypothesis.

Check out the following discussions for some more insights:

  1. Metaphlan3 analysis with Maaslin2
  2. Choosing analysis method for maaslin2

Best,
Himel

Thank you very much Himel ! This helps a lot!

It looks like the CPLM analysis with AST transformation best fits my hypothesis, but I have a few remaining naive questions:

  • What exactly does the AST transformation do?
  • Are there specific conditions in which it should be used or avoided?

Same questions for LOG and LOGIT…

Thanks a lot,

Nicolas

Hi @NicolasB - most of these variance-stabilizing transformations (LOGIT and AST for proportions and LOG for any positive values) are applied to approximate homoscedasticity when applying linear models. For the non-LM models, you do not need this transformation as most of these GLM-based models (CPLM, NEGBIN, and ZINB) intrinsically apply a log link function by default. In other words, to maintain interpretation, transform should be set to 'NONE' for all the non-LM models.
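
As a rough illustration of the three transformations, here is a minimal Python sketch. It is not MaAsLin 2's own implementation: the function names, the choice of base 2 for the log, and the `check_settings` helper are assumptions made for this example. It shows the textbook formulas and the domain each one requires, plus a small check encoding the advice above that non-LM models use transform 'NONE'.

```python
import math


def ast(p):
    """Arcsine square-root transform; defined for any proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


def logit(p):
    """Log-odds transform; only defined for 0 < p < 1 (fails at exactly 0 or 1)."""
    return math.log(p / (1 - p))


def log2_transform(x):
    """Base-2 log (base is an assumption here); only defined for x > 0."""
    return math.log2(x)


# Non-LM models named in this thread that apply a log link by default.
LOG_LINK_MODELS = {"CPLM", "NEGBIN", "ZINB"}


def check_settings(analysis_method, transform):
    """Return True if the pairing follows the advice: non-LM models need 'NONE'."""
    if analysis_method in LOG_LINK_MODELS:
        return transform == "NONE"
    return True


print(ast(0.0), logit(0.5), log2_transform(8))
print(check_settings("CPLM", "AST"), check_settings("LM", "AST"))
```

One practical difference follows from the domains: AST accepts exact zeros, whereas LOG and LOGIT are undefined at zero, so zeros have to be handled (for example with a pseudocount) before those two can be applied.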

Best,
Himel

Thank you very much Himel! This is very clear to me now.

Best,

Nicolas