Dear developers, thank you very much for developing such a great and versatile package. Our group works on respiratory microbiota data and I frequently advise colleagues to use MaAsLin2 over other packages.
However, I recently came across two related issues that I think should either be changed or be stated more explicitly in the tutorial.
- Filtering occurs before normalization. With raw count input combined with, for example, TSS normalization, this seems odd, because the relative abundances of features/taxa can differ substantially depending on the `min_abundance`/`min_prevalence` and `min_variance` settings. This also means that p-values shift depending on which other features are passed to the function (and on the amount of filtering).
Generally in our lab, we first convert to relative abundance and then filter (so that a given feature, say ‘Streptococcus_1’, has a relative abundance of 20% in a particular sample, and not 17%/15%/26% depending on the filtering parameters).
- In the tutorial, the selection of an appropriate normalization method is well explained. Yet the example data (`HMP2_taxonomy.tsv`/`HMP2_metadata.tsv`) are already TSS-normalized, while the tutorial uses the default normalization (which is TSS). This implies that the data are TSS-normalized twice (which is inappropriate in combination with the `min_prevalence` filtering step, which is also applied by default). I think this should be changed. In addition, the developers could consider changing the default normalization from TSS to NONE, advising users to use pre-normalized data in principle, or to consider the implications of not normalizing the data themselves.
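
To make both points concrete, here is a minimal Python sketch. It is not MaAsLin2 code: the counts, the taxon names other than ‘Streptococcus_1’, the set of filtered taxa, and the helpers `tss` and `drop` are all made up for illustration, and I am assuming that the two rare taxa fail a prevalence filter across the full dataset.

```python
def tss(sample):
    """Total-sum scaling: divide every value by the sample total."""
    total = sum(sample.values())
    return {taxon: value / total for taxon, value in sample.items()}


def drop(sample, removed):
    """Remove the taxa that were filtered out (hypothetical filter result)."""
    return {taxon: value for taxon, value in sample.items() if taxon not in removed}


# Hypothetical raw counts for a single sample.
counts = {"Streptococcus_1": 20, "Moraxella_1": 50, "Rare_1": 20, "Rare_2": 10}
# Assumption: these taxa fail the prevalence filter across all samples.
removed = {"Rare_1", "Rare_2"}

# Issue 1: the order of filtering and normalization changes the result.
normalize_then_filter = drop(tss(counts), removed)
filter_then_normalize = tss(drop(counts, removed))
print(round(normalize_then_filter["Streptococcus_1"], 3))  # 0.2
print(round(filter_then_normalize["Streptococcus_1"], 3))  # 0.286

# Issue 2: pre-normalized input that is filtered and then TSS-normalized again.
prenormalized = tss(counts)
renormalized = tss(drop(prenormalized, removed))
print(round(renormalized["Streptococcus_1"], 3))  # 0.286, no longer 0.2
```

In this toy sample, ‘Streptococcus_1’ is 20% of the community when normalizing first, but about 28.6% when the filter is applied first. Applying TSS to data that are already TSS-normalized would be harmless on its own; it is the filtering step in between that changes the values.
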
Curious to hear your views on this and happy to discuss further if needed.