Hi,
Thanks for using the tool!
First, I’d push back on your assertion that “TSS may not be appropriate for normalization and CLR will be a better option.” CLR is primarily used to deal with compositionality in microbiome analysis: the fact that testing for differences in relative abundance is not the same as testing for differences in absolute abundance. However, (1) MaAsLin 3 uses a median comparison that also handles relative vs. absolute tests but in a way that makes coefficients more interpretable (see here and the last comment here) and (2) sometimes people do actually care about differences in relative abundance, and then using TSS is clearly the right thing to do. In our benchmarking, we compare MaAsLin 3 against other tools such as ALDEx2 that use CLR, and we maintain better performance, even when high degrees of compositionality would cause tools that don’t specifically correct for this (e.g., MaAsLin 2) to have inflated false positives. As stated in that forum post, I’d highly encourage MaAsLin 3 users to use the TSS option with median comparison since the results are more interpretable and that’s what we’ve benchmarked.
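As a quick illustration of why relative and absolute tests can disagree, here is a minimal base R sketch. The taxa, samples, and counts are made up for illustration, and this is not how MaAsLin 3 computes anything internally. Only taxon_a changes in absolute abundance, yet the TSS-normalized values of the other two taxa shift as well:

```r
# Hypothetical absolute abundances for three taxa in two samples.
# Only taxon_a differs between the two samples.
absolute <- rbind(
  sample_1 = c(taxon_a = 100, taxon_b = 50, taxon_c = 50),
  sample_2 = c(taxon_a = 400, taxon_b = 50, taxon_c = 50)
)

# TSS: divide each sample (row) by its total so every row sums to 1.
tss <- absolute / rowSums(absolute)

print(tss)
# taxon_b goes from 0.25 to 0.10 in relative abundance
# even though its absolute abundance is unchanged.
```

Whether that drop in taxon_b counts as a real difference depends on whether you care about relative or absolute abundance, which is the distinction the median comparison is meant to address.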
To answer your actual questions:
A feature is kept only if more than min_prevalence of the samples have it above min_abundance after CLR; otherwise the whole feature is dropped. This doesn’t filter out individual samples in which the feature falls below min_abundance, only the entire feature if it is sufficiently rare. With min_prevalence=0 and min_abundance=0, as long as 1 sample has the feature with a CLR-transformed abundance above 0, it will be kept. The idea here is that some features are rare enough you just don’t care about them, but once you decide you care about a feature, you want to look at all samples that had it. However, if all CLR values are negative for a feature, it will be dropped with a min_abundance threshold of 0. In this toy example, the first column, which is all negative, is dropped:
mat_in <- matrix(c(-1, -2, -3, 4, -5, -6, 7, -8, -9), nrow = 3, ncol = 3)
rownames(mat_in) <- c("a", "b", "c")
maaslin3::maaslin_filter(normalized_data = mat_in, 'tmp_out', min_abundance = 0)
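If you want to check the rule without going through the package, it can be sketched in base R. This is only my reading of the behavior described above, not the maaslin3 source: filter_features_sketch and the column names f1 to f3 are hypothetical, and the strict inequalities are an assumption chosen to match the min_prevalence=0 case.

```r
# Sketch of the prevalence/abundance filter described above.
# `filter_features_sketch` is a hypothetical helper, not a maaslin3 function.
filter_features_sketch <- function(mat, min_abundance = 0, min_prevalence = 0) {
  # Fraction of samples (rows) in which each feature (column)
  # exceeds min_abundance.
  prevalence <- colMeans(mat > min_abundance)
  # Assumption: strict inequality, so that with min_prevalence = 0
  # a single sample above the threshold is enough to keep the feature.
  keep <- prevalence > min_prevalence
  # Whole columns are kept or dropped; individual values are untouched.
  mat[, keep, drop = FALSE]
}

mat_in <- matrix(c(-1, -2, -3, 4, -5, -6, 7, -8, -9), nrow = 3, ncol = 3)
rownames(mat_in) <- c("a", "b", "c")
colnames(mat_in) <- c("f1", "f2", "f3")  # hypothetical feature names

filtered <- filter_features_sketch(mat_in)
print(colnames(filtered))
# "f2" "f3"
```

Note that f2 and f3 keep their negative entries: the filter removes whole features, not individual values below min_abundance.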
The zero_threshold is applied before CLR on the raw data.
If you set min_abundance to -Inf, nothing should be filtered out. If you set zero_threshold to -Inf, nothing should be turned into a zero (though you’ll then have to figure out what to do with zeros yourself and maybe call the maaslin_transform and maaslin_fit functions manually). Again, I’d still recommend using the defaults since the median comparison test accounts for compositionality like CLR would, and zeros are handled directly in the prevalence model (as opposed to handling them with rclr).
Will