CLR normalization and min_abundance in MaAsLin3

Hello,

Most shotgun metagenomic sequencing pipelines provide relative abundances as their output table, and hence TSS may not be appropriate for normalization and CLR will be a better option.

I have a question on the “min_abundance” argument when “CLR” normalization is used.

The manual says: “Features with abundances more than min_abundance in more than min_prevalence of the samples will be included for analysis. The threshold is applied after normalization and before transformation.”

In the case of CLR, some low-relative-abundance species will have negative values because of the log transformation. What happens to those species? Based on the manual, they should be excluded, because the min_abundance step comes after normalization. However, when I looked at the filtered data in the output folder, the species with negative CLR-normalized abundances had not been filtered out (not set to NA).

This makes me question the language in the manual. I believe the statement is correct for TSS normalization, but perhaps not for CLR. Is that right?

Also, when “zero_threshold” is applied, is it applied to the normalized, transformed, or filtered data? This can also affect CLR-transformed data.

Another question: does the package support a preprocessed and pre-transformed (e.g., rclr, clr) relative abundance table as input? We can set normalization to “NONE”, but then “min_abundance” and “zero_threshold” seem to filter out some negative CLR-transformed relative abundances. Is there a way to disable those arguments when using normalization = “NONE”?

Thank you for this great package!

Hi,

Thanks for using the tool!

First, I’d push back on your assertion that “TSS may not be appropriate for normalization and CLR will be a better option.” CLR is primarily used to deal with compositionality in microbiome analysis: the fact that testing for differences in relative abundance is not the same as testing for differences in absolute abundance. However, (1) MaAsLin 3 uses a median comparison that also handles relative vs. absolute tests but in a way that makes coefficients more interpretable (see here and the last comment here) and (2) sometimes people do actually care about differences in relative abundance, and then using TSS is clearly the right thing to do. In our benchmarking, we compare MaAsLin 3 against other tools such as ALDEx2 that use CLR, and we maintain better performance, even when high degrees of compositionality would cause tools that don’t specifically correct for this (e.g., MaAsLin 2) to have inflated false positives. As stated in that forum post, I’d highly encourage MaAsLin 3 users to use the TSS option with median comparison since the results are more interpretable and that’s what we’ve benchmarked.

To answer your actual questions:
A feature is dropped if no more than min_prevalence of the samples have it at a value above min_abundance after CLR. This doesn’t filter out individual instances of the feature falling below min_abundance, only the entire feature if it is sufficiently rare. With min_prevalence=0 and min_abundance=0, as long as 1 sample has the feature with a CLR-transformed abundance above 0, it will be kept. The idea here is that some features are rare enough that you just don’t care about them, but once you decide you care about a feature, you want to look at all samples that had it. However, if all CLR values are negative for a feature, it should be dropped with a min_abundance threshold of 0. In this toy example, the first column, which is all negative, is dropped:

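# matrix() fills column by column, so column 1 is (-1, -2, -3), i.e. all negative
mat_in <- matrix(c(-1, -2, -3, 4, -5, -6, 7, -8, -9), nrow = 3, ncol = 3)
rownames(mat_in) <- c("a", "b", "c")
# Filter with a min_abundance threshold of 0; 'tmp_out' is the output folder
maaslin3::maaslin_filter(normalized_data = mat_in, 'tmp_out', min_abundance = 0)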
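
For readers without the package installed, the feature-level rule described above can be sketched in base R. This is only an illustration of the rule as stated in this thread, not the code maaslin3 actually runs; keep_features() and the column names f1 to f3 are hypothetical names introduced here.

```r
# Illustrative sketch of the rule described in the thread; NOT maaslin3's own code.
# keep_features() is a hypothetical helper name.
keep_features <- function(mat, min_abundance = 0, min_prevalence = 0) {
  # Fraction of samples (rows) in which each feature (column) exceeds min_abundance
  frac_above <- colMeans(mat > min_abundance, na.rm = TRUE)
  # A feature is kept only if that fraction exceeds min_prevalence
  frac_above > min_prevalence
}

mat_in <- matrix(c(-1, -2, -3, 4, -5, -6, 7, -8, -9), nrow = 3, ncol = 3)
rownames(mat_in) <- c("a", "b", "c")
colnames(mat_in) <- c("f1", "f2", "f3")  # hypothetical feature names

kept <- keep_features(mat_in)
# Whole features are dropped; negative entries inside kept features remain
filtered <- mat_in[, kept, drop = FALSE]
print(kept)
print(filtered)
```

Under this sketch only f1 is removed, while f2 and f3 keep their negative entries, which matches the behavior reported in the question.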

The zero_threshold is applied before CLR on the raw data.

If you set min_abundance to -Inf, nothing should be filtered out. If you set zero_threshold to -Inf, nothing should be turned into a zero (though you’ll then have to figure out what to do with zeros yourself and maybe call the maaslin_transform and maaslin_fit functions manually). Again, I’d still recommend using the defaults since the median comparison test accounts for compositionality like CLR would, and zeros are handled directly in the prevalence model (as opposed to handling them with rclr).
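
As a rough base-R illustration of why a threshold of 0 and a threshold of -Inf behave differently on CLR values, consider a single sample with three taxa. The clr() helper and the abundances below are made up for this sketch and are not part of maaslin3.

```r
# Base-R sketch; clr() is a hypothetical helper, not a maaslin3 function.
clr <- function(x) log(x) - mean(log(x))

# Made-up relative abundances for one sample (no zeros, so log() is finite)
rel_abund <- c(sp1 = 0.70, sp2 = 0.25, sp3 = 0.05)
clr_vals <- clr(rel_abund)

# CLR values sum to zero, so the rarer taxa necessarily come out negative
any_negative <- any(clr_vals < 0)

# Compared against 0, the negative values fall below the threshold
passes_zero <- clr_vals > 0
# Every finite value exceeds -Inf, so a -Inf threshold excludes nothing
passes_neg_inf <- clr_vals > -Inf

print(clr_vals)
print(passes_zero)
print(passes_neg_inf)
```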

Will

Thank you Will for this informative reply.

One more question. Based on what you mentioned, could I use TSS even if my input is already formatted as relative abundance (i.e., percentages/proportions), or would that count as double normalization? Or is it better to use normalization = “NONE” and transform = “LOG” with median comparison?

Thank you again!

If your input is TSS normalized, running TSS normalization won’t change anything, so both should be equivalent.
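
A quick base-R check of this point: total-sum scaling divides each sample by its total, so applying it to data whose samples already sum to 1 leaves the values unchanged. The tss() helper below is a hypothetical stand-in that assumes samples are rows; it is not the package's implementation.

```r
# tss() is a hypothetical helper: divide each sample (row) by its total.
tss <- function(mat) mat / rowSums(mat)

# Made-up table with samples as rows and features as columns
counts <- matrix(c(10, 30, 60,
                    5,  5, 90),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("f1", "f2", "f3")))

once <- tss(counts)   # each row now sums to 1
twice <- tss(once)    # dividing by a row total of 1 changes nothing

print(all.equal(once, twice))
```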

Got it! Thank you very much!