Hello! I am encountering an issue when running MaAsLin2: the “N.not.0” column in all_results.tsv does not match the expected values. I tested this with publicly available data to make sure it wasn’t an issue with my dataset, and got the same problem.
I’ve uploaded the all_results.tsv from this model here. You can see immediately that the “N.not.0” column lists many features as present in fewer than 45 samples, even though I explicitly filtered the dataset to include only taxa present in at least 45 samples.
Does the CLR transformation affect this column? I also tested using input data that was filtered but not CLR transformed and used the CLR function within MaAsLin2, but encountered the same issue with the column numbers. Are the rest of the values in the automatically generated all_results spreadsheet reliable if the input data has already been CLR transformed?
Hi Jacob, thank you for your quick response! I don’t think this issue applies to either of my datasets (the example one or the original one), unless I am interpreting the original post incorrectly. All of my metadata cells have non-NA values, and all of my taxonomy cells have numerical values, so I believe all of my cases are complete cases. I am also getting the reverse of the original poster’s problem: rather than being inflated, the N.not.0 column is much smaller than anticipated. For example, one result reports N.not.0 = 6, meaning the taxon was present in only 6 samples, when I know it should be present in at least 45 samples.
After looking into this more deeply, it does seem there is an issue in the codebase with how this column is calculated when the CLR transformation is used. Unfortunately, my week is a bit busy at the moment, but I can look into pushing a fix to our GitHub repository next week. For now, I would ignore this column when using the CLR transformation.
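To illustrate how such an undercount can arise, here is a minimal Python sketch. It assumes (not confirmed against the MaAsLin2 source in this thread) that the column is obtained by counting values greater than zero after the transformation has been applied. CLR subtracts each sample's mean log abundance, so many values become negative even when every raw count is positive. The pseudocount of 1 and all function names are illustrative only.

```python
import math

def clr(sample, pseudocount=1.0):
    """Centered log-ratio of one sample: log(x + pseudocount) minus the mean log."""
    logs = [math.log(x + pseudocount) for x in sample]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def count_positive(rows, j):
    """Number of samples (rows) whose value for taxon j is greater than zero."""
    return sum(1 for row in rows if row[j] > 0)

# Rows are samples, columns are taxa; every taxon is present in every sample.
counts = [
    [500, 40, 3, 1],
    [300, 60, 5, 2],
    [800, 30, 4, 1],
]
clr_rows = [clr(row) for row in counts]

n_taxa = len(counts[0])
raw_n_not_zero = [count_positive(counts, j) for j in range(n_taxa)]
clr_n_positive = [count_positive(clr_rows, j) for j in range(n_taxa)]

print(raw_n_not_zero)  # [3, 3, 3, 3]
print(clr_n_positive)  # [3, 3, 0, 0]
```

In this toy example the two low-abundance taxa are present in all three samples, yet a "greater than zero" count on the CLR values reports them as present in none, which is the same direction of error as the one described above.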
I don’t think this was fixed. I’m having the same issue, where the values are being underestimated. Weirdly enough, when I run the Maaslin function with my data, I get the problem, but when I copy and paste the contents of the function and run it, the values are calculated correctly.
After some investigation, it looks like the Bioconductor (BioC) version is a bit out of date and is missing this fix. To get the fix, I would suggest installing the latest version from GitHub using devtools.
Hello,
I’m also having the same issue of the N_not_0 column being underestimated. I am using MaAsLin3 version 0.99.8. In the source code, the sample count is calculated after the models are fit, but I am not sure whether the issue lies in that call. I am using CLR normalization and no transformation. I also wonder whether only the N_not_0 samples were included in the models. Could you please advise?
In MaAsLin 3, the zero component is split from the non-zero component so that a prevalence model (presence vs. absence) can be fit on the zeros vs. nonzeros and an abundance model (how much, if it’s there) can be fit on the non-zeros. Therefore, only the non-zero piece of the data is included in the abundance model, but everything is included in the prevalence model.
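As a rough illustration of that split (a conceptual sketch only, not MaAsLin 3's actual code), the Python snippet below separates one feature's values into a presence/absence vector covering every sample and a vector of log abundances covering only the non-zero samples. The function name and the choice of base-2 logarithm are assumptions made for the example.

```python
import math

def split_components(abundances):
    """Split one feature into a prevalence part and an abundance part.

    prevalence: 1/0 indicator for every sample (all samples are kept).
    abundance:  log2 of the non-zero values only (zero samples are dropped).
    """
    prevalence = [1 if x > 0 else 0 for x in abundances]
    abundance = [math.log2(x) for x in abundances if x > 0]
    return prevalence, abundance

# Relative abundances of one taxon across six samples (made-up values).
feature = [0.0, 0.12, 0.0, 0.03, 0.25, 0.0]
prevalence, abundance = split_components(feature)

print(len(prevalence))  # 6 samples enter the prevalence component
print(len(abundance))   # 3 samples enter the abundance component
```

Here the number of values in the abundance part equals the number of non-zero samples, while the prevalence part keeps all six samples.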
Looking back at the code, I realized the N_not_zero calculation implicitly assumed the transformation was not CLR. I’ve updated it now (version 0.99.11) so that it should work. You’ll probably want to set min_abundance to -Inf to make sure the filtering doesn’t remove your CLR transformed samples.
With that said, I’d highly recommend using the default median_comparison_abundance with TSS normalization and LOG transformation rather than CLR. Both are ways of accounting for compositionality, but median_comparison_abundance produces more interpretable results. We have mostly kept CLR as a legacy option, and we have only benchmarked the median_comparison_abundance strategy.
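For readers unfamiliar with the terms, here is a small Python sketch of what TSS normalization followed by a log transformation does to a single sample. It is a conceptual illustration with made-up counts and hypothetical function names. It does not reproduce MaAsLin 3's implementation, and the median comparison step itself is not shown. Representing zeros as missing after the log step is an assumption of this example.

```python
import math

def tss(sample):
    """Total sum scaling: divide each count by the sample total."""
    total = sum(sample)
    return [x / total for x in sample]

def log_nonzero(sample):
    """Log2-transform non-zero proportions; zeros become None (treated as missing)."""
    return [math.log2(x) if x > 0 else None for x in sample]

counts = [30, 0, 10, 0]
props = tss(counts)          # [0.75, 0.0, 0.25, 0.0]
logged = log_nonzero(props)  # zeros stay identifiable as missing values

# Zeros remain exactly zero under TSS, so the non-zero count is unambiguous.
n_not_zero = sum(1 for x in props if x > 0)
print(n_not_zero)  # 2
```

Unlike CLR, TSS leaves zeros at zero, so a count of non-zero samples taken after normalization still matches the raw data.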