I am using MaAsLin to analyse the differential abundance of gut-brain modules in my samples. I ran HUMAnN on the metagenomes to generate abundance data for KOs (expressed in copies per million), then used Omixer-RPM to map these to gut-brain modules. I have seen this done in a few published papers.
I previously ran this analysis in MaAsLin 2 with both the CLR and LOG transform options, but I noticed that you now offer the prevalence feature in MaAsLin 3, so I thought I would repeat it.
When I run the CLR analysis in MaAsLin 3, some of my features that have all negative values after the CLR transform are being filtered out. I believe this is because I have set the prevalence and abundance filters to 0. However, why does this occur in MaAsLin 3 and not in MaAsLin 2? I’m asking as it is affecting my results - in MaAsLin 2 these features were showing as weakly significant in my output. I would really appreciate your help explaining this and what it means for the interpretation of my previous results.
I’m also finding differences in my output between MaAsLin 2 and MaAsLin 3 when using the LOG transform.
Many thanks,
Katherine
When using MaAsLin 3, I would recommend using the log transform and median_comparison_abundance since this will account for compositionality (the reason people use CLR in the first place). The flow in MaAsLin 3 is normalization (where CLR applies) → filtering → transformation, so if you still want to use CLR, I would recommend setting the filter thresholds to -Inf.
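To make the ordering issue concrete, here is a minimal Python sketch (not MaAsLin 3's actual code) of what happens when CLR values reach an abundance filter with a threshold of 0. The `clr` and `keep_features` helpers, the 0.5 pseudo-count, and the rule "keep a feature if any sample exceeds the threshold" are all assumptions made for illustration.

```python
import numpy as np

def clr(counts, pseudo=0.5):
    """Centered log-ratio per sample (rows = samples, columns = features).
    The pseudo-count of 0.5 is an arbitrary choice for this sketch."""
    logged = np.log(counts + pseudo)
    return logged - logged.mean(axis=1, keepdims=True)

def keep_features(values, min_abundance):
    """Hypothetical filter: keep a feature if any sample exceeds min_abundance."""
    return (values > min_abundance).any(axis=0)

# Toy table: the third feature is rare in every sample.
counts = np.array([
    [900.0,  90.0, 1.0],
    [800.0, 120.0, 2.0],
    [950.0,  60.0, 1.0],
])

clr_values = clr(counts)

# With a threshold of 0, a feature whose CLR values are all negative is dropped.
kept_zero = keep_features(clr_values, 0.0)

# With a threshold of -inf, every finite value passes, so nothing is dropped.
kept_neg_inf = keep_features(clr_values, -np.inf)
```

In this toy table the rare feature is negative in every sample after CLR, so it is removed at a threshold of 0 and retained at -Inf.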
Regarding the differences within the log transformed results, if you have very high sparsity in your data (lots of 0s), you could end up with different results between MaAsLin 2 and 3 since MaAsLin 3 is splitting out the zeros and handling them differently.
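As a rough illustration of what "splitting out the zeros" can mean, the Python sketch below separates one feature into a presence/absence response and a log-abundance response over the non-zero samples only. It is a conceptual sketch with made-up numbers, not MaAsLin 3's implementation.

```python
import numpy as np

# One feature across six samples; half of the samples are zeros.
abundance = np.array([0.0, 0.0, 0.0, 4.0, 8.0, 16.0])

present = abundance > 0

# Prevalence part: every sample contributes a 0/1 presence indicator.
prevalence_response = present.astype(int)

# Abundance part: only the non-zero samples contribute a log value,
# so this part sees three samples rather than six.
abundance_response = np.log2(abundance[present])
```

With sparse data the abundance part is fitted on far fewer samples than a model that keeps the zeros, which is one way the two versions can disagree.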
I think you raise an interesting point about the differences in how MaAsLin 2 and 3 deal with zeros, which some of my colleagues and I have been thinking about recently. We understand that the difference comes from the new prevalence modeling support in MaAsLin 3 (which is an interesting way of dealing with the zeros).
For people who are interested in running only the abundance model (perhaps by explicitly setting evaluate_only to abundance), is there any way to force MaAsLin 3 to use pseudo-counts for zeros (other than manually pre-filling zeros before passing the data in)? The fact that certain samples get dropped from certain taxa-level comparisons because of the new zero handling was a bit unexpected for us in the context of the abundance-only model (it makes more sense when running both the prevalence and the abundance model at the same time).
Just wanted to raise this as it took my colleague a bit of digging to figure it out for us, and we imagined that other folks might not have caught that the interpretation of the MaAsLin 3 abundance results is now subtly different due to this change. (I think this is also related to @Katb's other issue.)
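For reference, the manual pre-filling workaround mentioned above could look something like the Python sketch below. The `prefill_zeros` helper is hypothetical, and using half of each feature's smallest non-zero value is just one common convention that I am assuming here, not something MaAsLin 3 provides.

```python
import numpy as np

def prefill_zeros(table):
    """Replace zeros in each feature (column) with half of that feature's
    smallest non-zero value. Rows are samples, columns are features."""
    filled = table.astype(float)  # astype returns a copy, so the input is untouched
    for j in range(filled.shape[1]):
        column = filled[:, j]  # view into `filled`, so assignment edits it in place
        positive = column[column > 0]
        if positive.size:  # leave all-zero features as they are
            column[column == 0] = positive.min() / 2
    return filled

table = np.array([
    [0, 10],
    [4,  0],
    [8, 20],
])
filled = prefill_zeros(table)
```

After pre-filling, no sample is zero for these features, so a presence/absence model on the same table would no longer be informative.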