Impact of TSS normalization and LOG transformation on relative abundance data

When performing default TSS normalization and LOG transformation during Maaslin3 analysis, 0 values are coerced into NAs. Does that have an impact on the overall magnitude difference between signals?

Introducing NAs in place of 0s removes the weakest-signal samples from the abundance comparison. Here is a comparison of filtered_data, filtered_data_norm and filtered_data_norm_transformed from a dataset in which one cohort of preterm infants received the probiotic B. longum and another cohort did not (Nguyen et al 2021). Converting 0 values into NAs led to different average magnitudes of *B. longum* relative abundance signals with and without normalization and transformation.

So, here is the question: how does Maaslin3 treat 0/NA values when comparing relative abundances with and without normalization and transformation? Thank you for your time.

In case you want a deeper look at the comparison, please see normalization_transformation_check. The sheets in the document are as follows:

filtered_data: all data, untreated

filtered_data_longum: probiotic species data, untreated

filtered_data_norm: TSS normalized data

my_transformation: data TSS-normalized by the user rather than by Maaslin3

filtered_data_norm_transformed: LOG transformed normalized data

filtered_data_norm_transformed_longum: LOG transformed normalized probiotic data

filtered_data_norm_transformed_longum_clean: LOG transformed normalized probiotic data, no NAs

Metadata: metadata

The magnitude of the difference in average B. longum signal between the no-probiotic and probiotic cohorts varies between filtered_data_longum and filtered_data_norm_transformed_longum_clean.
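To make the effect concrete, here is a small Python sketch with made-up toy values (not taken from the spreadsheet). It compares the between-cohort difference computed on raw means, where 0s count, with the difference computed on log-transformed values after the 0s are dropped. The log base 2 and the helper name `mean_log2_nonzero` are assumptions for illustration only.

```python
import math

# Hypothetical toy relative abundances of one species (NOT from the dataset).
no_probiotic = [0.0, 0.0, 0.02, 0.04]
probiotic = [0.10, 0.20, 0.0, 0.40]

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def mean_log2_nonzero(values):
    """Mean of log2 values after dropping zeros (zeros would become NA)."""
    logs = [math.log2(v) for v in values if v > 0]
    return mean(logs)

# Zeros included: log2 fold change between the raw cohort means.
raw_log2_fc = math.log2(mean(probiotic) / mean(no_probiotic))

# Zeros dropped: difference of mean log2 abundances among non-zero samples.
nonzero_log2_diff = mean_log2_nonzero(probiotic) - mean_log2_nonzero(no_probiotic)

print(f"log2 fold change of raw means (zeros kept): {raw_log2_fc:.3f}")
print(f"difference of mean log2 (zeros dropped):    {nonzero_log2_diff:.3f}")
```

In this toy case the two numbers differ because the cohort with more 0s has its mean pulled down only when the 0s are kept.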

Hi @Baris_Erhan_Ozdinc ,

Thanks for checking out Maaslin3. As you noticed, 0 values are treated differently in maaslin3 than in maaslin2. In maaslin3, 0 values are used in the logistic regression models (prevalence testing) but are then discarded in the linear regression models (abundance). This allows maaslin3 to separate the effect of prevalence from that of abundance.

Because of this two-tiered model system, we are able to log the abundance values without a pseudocount, since the 0s are discarded anyway in the abundance component.
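As a rough illustration of that split, here is a minimal Python sketch. It is not maaslin3's actual code (which is written in R); the function names `tss_normalize` and `split_two_part`, the toy counts, and the choice of log base 2 are assumptions made for the example.

```python
import math

def tss_normalize(counts):
    """Total-sum scaling: divide each count in a sample by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]

def split_two_part(abundances):
    """Split one feature's abundances into the two model inputs.

    prevalence: 1 if the feature is present, 0 if absent (zeros are kept).
    log_abundance: log2 of non-zero values; zeros become None (i.e. NA),
    so no pseudocount is needed.
    """
    prevalence = [1 if a > 0 else 0 for a in abundances]
    log_abundance = [math.log2(a) if a > 0 else None for a in abundances]
    return prevalence, log_abundance

# Hypothetical counts: three samples (rows) by three features (columns).
samples = [[0, 50, 50], [25, 25, 50], [10, 0, 90]]
normalized = [tss_normalize(s) for s in samples]

# Take the first feature across all samples.
feature = [row[0] for row in normalized]
prevalence, log_abundance = split_two_part(feature)

print(prevalence)      # input to the prevalence (logistic) component
print(log_abundance)   # input to the abundance (linear) component; None = NA
```

The sample where the feature is 0 still contributes to the prevalence vector, but only the non-zero samples carry a value into the abundance vector.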

You can check out the tutorial for maaslin3 here:

thanks,
Jacob Nearing