Hi there! I’m currently using MaAsLin2 and was wondering about the choice between “CLR” and “TSS” for compositional microbiome data. Quite a few recent studies have discussed the bias of applying TSS to compositional data, e.g., relative abundance. They argue that a change in the abundance of a single taxon can alter the relative abundances of all taxa, and that the FDR of TSS-based analyses could be large.
So my question is: what’s the advantage of using the default TSS over CLR in MaAsLin2? Or is it actually better to use CLR for relative abundance data? Thank you!
As stated before, we usually don’t recommend one model or normalization over the others and leave it to the user’s best judgment. All the options included in MaAsLin 2 have been carefully validated (as described in our paper). Together with the alternative normalization/transformation schemes and statistical models, they form a multi-model system appropriate for many different microbial community data types (taxonomic or functional profiles), environments (human or otherwise), and measurements (counts or relative counts). We strongly believe that the best model for a given dataset is highly context-dependent.
TSS is a normalization technique, and CLR is normalization + transformation.
Normalization removes per-sample technical effects (e.g. library size), and transformation makes skewed data better behaved so that the model fits are valid.
You should always do both. The MaAsLin 2 default is TSS normalization followed by LOG transformation.
In practice, CLR(data) = CLR(TSS(data)) ~ LOG(TSS(data)). So most often you should see similar results whether you use CLR or the MaAsLin 2 default.
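To make the relation above concrete, here is a minimal Python sketch for a single sample. It is an illustration only and not MaAsLin 2's own code: the function names `tss`, `log_transform`, and `clr` are hypothetical, the counts are made up, natural log is used, and the sample is assumed to contain no zeros (real data would need zero handling before taking logs). It shows that CLR is unchanged by TSS, and that CLR and LOG(TSS) differ only by one constant per sample.

```python
import math


def tss(counts):
    """Total sum scaling: divide each count by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]


def log_transform(values):
    """Elementwise natural log (assumes every value is > 0)."""
    return [math.log(v) for v in values]


def clr(values):
    """Centered log-ratio: log of each value minus the mean log of the sample."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical, zero-free counts for four taxa in one sample.
sample = [120.0, 30.0, 840.0, 10.0]

clr_raw = clr(sample)                 # CLR on the raw counts
clr_tss = clr(tss(sample))            # CLR after TSS
log_tss = log_transform(tss(sample))  # LOG after TSS

# Per-taxon difference between CLR(TSS(data)) and LOG(TSS(data)).
offsets = [a - b for a, b in zip(clr_tss, log_tss)]

print("CLR(data)      :", clr_raw)
print("CLR(TSS(data)) :", clr_tss)
print("LOG(TSS(data)) :", log_tss)
print("offsets        :", offsets)
```

Within a sample the offset is the same for every taxon (it is minus the mean of the log relative abundances), but it can differ from sample to sample. Whether that matters for a particular dataset is something the sketch does not address.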