Hi,
Thank you very much for this great tool, which makes multivariate analyses in microbiome studies easy.
I still do not fully understand which characteristics of my dataset should guide the choice of normalization or transformation method.
I am using a rarefied dataset of relative abundances, with about 150 different samples (unrelated, one time point). I want to compare healthy subjects (n=67) vs. patients (n=125). In another analysis, I also want to compare different metadata within patients (with some values present in 5 to 100 patients).
Based on the paper by Weiss et al. (Microbiome, 2017), I assume that with rarefied data I may not need further normalization. Do you think I am wrong?
I tried several transformation methods, and the results varied from one to another, which puzzled me. Compared with a more linear analysis with LEfSe, I found that the closest results came from the AST transformation with no other normalization applied.
Do you have any insights on how to properly define the settings for my analysis, and which parameters I should take into account to do so?
Thank you very much,
Nicolas
Hi @NicolasB - although we did not include rarefaction in our own evaluation, a recent preprint concluded that MaAsLin 2 (particularly with rarefied data) could also be a reasonable choice for users looking for increased statistical power at the potential cost of more false positives.
Coming back to your question, you are right that rarefied data can be considered normalized data, so you don’t need additional normalization before statistical modeling.
As for alternative models/transformations, we usually do not recommend a particular combination over another as the choice is usually problem- and data-specific. Apart from trying out various transformations with the LM models, you can also consider other non-LM models without normalization/transformation and see if that supports your hypothesis.
Check out the following discussions for some more insights:
- Metaphlan3 analysis with Maaslin2
- Choosing analysis method for maaslin2
Best,
Himel
Thank you very much, Himel! This helps a lot!
It looks like CPLM analysis with AST transformation fits my hypothesis best, but I have a few remaining naive questions:
- What exactly does the AST transformation do?
- Are there specific conditions in which it should be used or avoided?
Same questions for LOG and LOGIT…
Thanks a lot,
Nicolas
Hi @NicolasB - most of these variance-stabilizing transformations (LOGIT and AST for proportions and LOG for any positive values) are applied to approximate homoscedasticity when applying linear models. For the non-LM models, you do not need this transformation as most of these GLM-based models (CPLM, NEGBIN, and ZINB) intrinsically apply a log link function by default. In other words, to maintain interpretation, transform should be set to 'NONE' for all the non-LM models.
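To make the variance-stabilizing idea concrete, here is a minimal Python sketch using the textbook definitions of AST (arcsine square root) and LOGIT. It is not MaAsLin 2's own code: the function names are made up for illustration, and the handling of zeros or pseudocounts is left out. It assumes a simple binomial model for a proportion, under which the raw variance depends on the proportion itself, while a first-order (delta method) approximation of the variance after AST does not.

```python
import math


def ast(p):
    """Arcsine square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


def logit(p):
    """Logit transform of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def binomial_var(p, n):
    """Variance of an observed proportion under a binomial model (assumption)."""
    return p * (1.0 - p) / n


def ast_var_delta(p, n):
    """Delta-method variance of ast(p_hat): (d ast / dp)^2 * Var(p_hat)."""
    slope = 1.0 / (2.0 * math.sqrt(p * (1.0 - p)))  # derivative of asin(sqrt(p))
    return slope ** 2 * binomial_var(p, n)


n = 1000  # hypothetical number of reads per sample
for p in (0.01, 0.1, 0.5):
    print(p, binomial_var(p, n), ast_var_delta(p, n))
```

On the raw scale the variance changes with the abundance, whereas after AST the approximate variance is 1/(4n) whatever the proportion, which is the homoscedasticity a linear model relies on. Note also that LOGIT is undefined at exactly 0 or 1, so zeros need separate handling before that transform.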
Best,
Himel
Thank you very much, Himel! This is very clear to me now.
Best,
Nicolas