N.not.0 column issues with CLR data

chelsea307 · August 28, 2023, 5:19am

Hello! I am encountering an issue when running MaAsLin2, wherein the “n.not.0” column in the all_results.tsv does not match expected results. I tested this with publicly available data to ensure it wasn’t an issue with my dataset, and got the same problem.

My input data is dietswap_filt_clr.csv, as a dataframe in R. This data has been filtered to keep only taxa present in at least 20% of samples (45 of 222 samples) and then CLR transformed. (The original dataset is from here: Fat, fibre and cancer risk in African Americans and rural Africans | Nature Communications)
My input metadata is sample_data.csv, the metadata from the above study, as a dataframe in R.
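For readers who want to reproduce the preprocessing described above, here is a minimal base R sketch of a 20% prevalence filter followed by a CLR transform. The toy counts, the object names, and the pseudocount of 1 are assumptions for illustration; the thread does not say how zeros were handled before taking logs.

```r
# Toy count table (hypothetical): rows are samples, columns are taxa.
counts <- cbind(
  taxonA = c(12, 0, 7, 3, 0, 9, 4, 0, 6, 2),
  taxonB = c(0, 5, 0, 0, 8, 0, 1, 0, 0, 3),
  taxonC = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 4)
)
rownames(counts) <- paste0("S", 1:10)

# Keep taxa observed (count > 0) in at least 20% of samples.
min_samples <- ceiling(0.20 * nrow(counts))
prevalence  <- colSums(counts > 0)
counts_filt <- counts[, prevalence >= min_samples, drop = FALSE]

# CLR per sample: log value minus the mean log value across taxa.
# The pseudocount of 1 is an assumption; zeros have no logarithm.
log_counts <- log(counts_filt + 1)
clr <- log_counts - rowMeans(log_counts)

print(prevalence)
print(round(clr, 3))
```

With 222 samples the same rule gives ceiling(0.20 * 222) = 45, which matches the threshold quoted in the post.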

I ran the below code, including interaction effects and random effects as these are both in the model I’m using for my actual dataset:

library(Maaslin2)

# Interaction terms: timepoint within each non-reference BMI group
sample_data$timepoint_overweight = (sample_data$bmi_group == "overweight") *
  sample_data$timepoint

sample_data$timepoint_obese = (sample_data$bmi_group == "obese") *
  sample_data$timepoint

# The input is already filtered and CLR transformed, so MaAsLin2's own
# prevalence filter, normalization, and transform are switched off
fit_data2 = Maaslin2(input_data     = dietswap_filt_clr_df,
                     input_metadata = sample_data,
                     min_prevalence = 0,
                     normalization  = "NONE",
                     transform      = "NONE",
                     standardize    = FALSE,
                     output         = "test_n.not.0",
                     fixed_effects  = c("timepoint_overweight", "timepoint_obese", "sex", "bmi_group"),
                     random_effects = c("subject"),
                     reference      = c("sex,M", "bmi_group,lean"))

I’ve uploaded the all_results.tsv from this model here. Immediately, you can see that the “N.not.0” column lists many features as being present in fewer than 45 samples, even though I had explicitly filtered the dataset to only include taxa present in at least 45 samples.

Does the CLR transformation affect this column? I also tested using input data that was filtered but not CLR transformed and used the CLR function within MaAsLin2, but encountered the same issue with the column numbers. Are the rest of the values in the automatically generated all_results spreadsheet reliable if the input data has already been CLR transformed?

Thank you for your help!
all_results.tsv (59.3 KB)
dietswap_filt_clr.csv (447.4 KB)
sample_data.csv (10.1 KB)

nearinj · August 28, 2023, 6:59pm

Hello,

I’m linking this previous forum thread, which I suspect describes the same issue that you are encountering.

Let me know if that is the case.

Cheers,
Jacob Nearing

chelsea307 · August 28, 2023, 7:46pm

Hi Jacob, thank you for your quick response! I don’t think this issue applies to my dataset, either the example one or the original one, unless I am interpreting the original post incorrectly. All of my metadata cells have non-NA values, and all of my taxonomy cells have numerical values, so I believe that means all of my cases are complete cases. I am also getting the reverse of the original poster’s problem. Rather than the N.not.0 column being inflated, it is actually much smaller than anticipated. For example, one result says that N.not.0 = 6, meaning that the particular taxon was only present in 6 samples, when I know it should be present in at least 45 samples.
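The two checks described in this reply (complete cases in the metadata, and the expected number of nonzero samples per taxon) can be sketched in base R as follows. The small data frames are hypothetical stand-ins for sample_data and the untransformed count table, and the counting is done by hand here, not by calling anything inside MaAsLin2.

```r
# Hypothetical stand-ins for the metadata and the untransformed counts.
metadata <- data.frame(
  subject   = c("a", "a", "b", "b", "c", "c"),
  sex       = c("F", "F", "M", "M", "F", "F"),
  bmi_group = c("lean", "lean", "obese", "obese", "overweight", "overweight"),
  row.names = paste0("S", 1:6)
)
counts <- data.frame(
  taxonA = c(3, 0, 5, 1, 0, 2),
  taxonB = c(0, 0, 0, 7, 0, 0),
  row.names = paste0("S", 1:6)
)

# TRUE for every sample whose metadata row has no NA values.
complete <- complete.cases(metadata)

# Per-taxon number of complete-case samples with a nonzero count.
shared <- intersect(rownames(metadata)[complete], rownames(counts))
n_not_0_expected <- colSums(counts[shared, , drop = FALSE] != 0)

print(table(complete))
print(n_not_0_expected)
```

Comparing n_not_0_expected against the N.not.0 column of all_results.tsv, feature by feature, shows which rows are undercounted.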

nearinj · August 28, 2023, 8:23pm

Hello,

After looking into this deeper it does seem there is an issue in the codebase for calculating this column value when using CLR transformations. Unfortunately, my week is a bit busy at the moment but I can look into pushing a fix to this to our GitHub repository next week. For now I would ignore this column when using CLR transformation.

Sorry for the issue.
Thanks,
Jacob Nearing

nearinj · September 13, 2023, 12:26am

Just to update this…

We have since pushed an update to the GitHub version of Maaslin2 that solves this issue.

Thanks,
Jacob Nearing

mdanb · December 14, 2024, 2:30am

I don’t think this was fixed. I’m having the same issue, where the values are being underestimated. Weirdly enough, when I run the Maaslin function with my data, I get the problem, but when I copy and paste the contents of the function and run it directly, the values are calculated correctly.

nearinj · December 16, 2024, 3:38pm

Hi @mdanb

Thanks for bringing this to our attention. Could you let me know what version you are running? It’s possible that something was missed.

Cheers,
Jacob

mdanb · December 17, 2024, 8:21pm

@nearinj Yes it’s version 1.16.0

nearinj · December 20, 2024, 7:39pm

Hi @mdanb

After some investigation, it looks like the BioC version is a bit out of date and is missing this fix. If you want to fix this, I would suggest installing the latest version from GitHub using devtools.
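A minimal sketch of that install step is below. The repository path "biobakery/Maaslin2" is an assumption about which GitHub repository is meant, since the reply does not spell it out, and the calls need network access.

```r
# Install devtools if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install the development version of Maaslin2 from GitHub.
# "biobakery/Maaslin2" is assumed to be the repository referred to above.
devtools::install_github("biobakery/Maaslin2")

# Confirm which version is installed afterwards.
packageVersion("Maaslin2")
```

Restarting the R session after installing avoids keeping the older, already loaded copy of the package in memory.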

Cheers,
Jacob

emntsh · March 10, 2025, 10:09am

Hello,
I’m also having the same issue of the N_not_0 column being underestimated. I am using MaAsLin3 version 0.99.8. In the source code, the sample calculation is called after fitting the models, but I am not completely sure whether the issue is with that call or not. I am using CLR normalization and no transformation. I also wonder whether only the n_not_zero samples were included in the models? Could you please advise?

Kind regards,
Natasha

WillNickols · March 10, 2025, 2:30pm

Hi Natasha,

In MaAsLin 3, the zero component is split from the non-zero component so that a prevalence model (presence vs. absence) can be fit on the zeros vs. nonzeros and an abundance model (how much, if it’s there) can be fit on the non-zeros. Therefore, only the non-zero piece of the data is included in the abundance model, but everything is included in the prevalence model.
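To make the split concrete, here is a rough base R illustration of the idea on simulated data: a logistic regression on presence vs. absence using all samples, and a linear model on log abundance using only the nonzero samples. This is a sketch of the general two-part approach under assumed toy data, not MaAsLin 3's actual implementation; the variable names and the log2 scale are illustrative choices.

```r
set.seed(42)

# Simulated data (hypothetical): one binary covariate, one taxon.
n       <- 200
group   <- rep(c(0, 1), each = n / 2)
present <- rbinom(n, 1, ifelse(group == 1, 0.8, 0.4))
abundance <- ifelse(present == 1, exp(rnorm(n, mean = 1 + 0.5 * group)), 0)
df <- data.frame(group = group, abundance = abundance)
df$present <- as.integer(df$abundance > 0)

# Prevalence part: presence vs. absence, fit on every sample.
prev_fit <- glm(present ~ group, data = df, family = binomial())

# Abundance part: how much is there, fit on nonzero samples only.
nonzero  <- df[df$abundance > 0, ]
abun_fit <- lm(log2(abundance) ~ group, data = nonzero)

cat("Samples in prevalence model:", nobs(prev_fit), "\n")
cat("Samples in abundance model: ", nobs(abun_fit), "\n")
```

The abundance fit sees fewer rows than the prevalence fit, which is why a nonzero-sample count smaller than the total sample size is expected for the abundance results.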

Looking back at the code, I realized the N_not_zero calculation implicitly assumed the transformation was not CLR. I’ve updated it now (version 0.99.11) so that it should work. You’ll probably want to set min_abundance to -Inf to make sure the filtering doesn’t remove your CLR transformed samples.
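A small base R illustration of why a zero threshold interacts badly with CLR values: after CLR, a taxon that was observed in a sample can still have a negative value, so counting or filtering on "greater than 0" no longer tracks presence. The exact rule used inside MaAsLin is not shown in this thread, so the comparisons below are assumptions for illustration, as are the toy counts and the pseudocount of 1.

```r
# Hypothetical counts for a rare and an abundant taxon across 8 samples.
counts <- cbind(
  rare     = c(1, 2, 0, 1, 3, 0, 1, 2),
  abundant = c(50, 80, 40, 60, 90, 30, 70, 55)
)

# CLR per sample; the pseudocount of 1 is an assumption.
log_counts <- log(counts + 1)
clr <- log_counts - rowMeans(log_counts)

# Samples in which the rare taxon was actually observed.
observed <- sum(counts[, "rare"] > 0)

# Samples whose CLR value exceeds 0 (an assumed zero-based threshold).
above_zero <- sum(clr[, "rare"] > 0)

# Samples whose CLR value exceeds -Inf: nothing is dropped.
above_neg_inf <- sum(clr[, "rare"] > -Inf)

cat("observed:", observed,
    " CLR > 0:", above_zero,
    " CLR > -Inf:", above_neg_inf, "\n")
```

Here the rare taxon is observed in 6 samples, yet none of its CLR values is positive, while a threshold of -Inf retains all 8 samples.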

With that said, I’d highly recommend using the default median_comparison_abundance with TSS normalization and LOG transformation rather than CLR. Both are ways of accounting for compositionality, but median_comparison_abundance produces more interpretable results. We mostly have kept CLR as a legacy option, and we have only benchmarked the median_comparison_abundance strategy.

Will
