N.not.0 column issues with CLR data

Hello! I am encountering an issue when running MaAsLin2: the “N.not.0” column in all_results.tsv does not match the expected values. I tested this with publicly available data to make sure it wasn’t specific to my dataset, and got the same problem.

I ran the code below, which includes interaction effects and random effects because both are in the model I’m using for my actual dataset:

sample_data$timepoint_overweight = (sample_data$bmi_group == "overweight") *
  sample_data$timepoint

sample_data$timepoint_obese = (sample_data$bmi_group == "obese") *
  sample_data$timepoint

fit_data2 = Maaslin2(input_data     = dietswap_filt_clr_df,
                     input_metadata = sample_data,
                     min_prevalence = 0,
                     normalization  = "NONE",
                     transform      = "NONE",
                     standardize    = FALSE,  # logical, not the string "FALSE"
                     output         = "test_n.not.0",
                     fixed_effects  = c("timepoint_overweight", "timepoint_obese", "sex", "bmi_group"),
                     random_effects = c("subject"),
                     reference      = c("sex,M", "bmi_group,lean"))

I’ve attached the all_results.tsv from this model below. You can see immediately that the “N.not.0” column lists many features as present in fewer than 45 samples, even though I had explicitly filtered the dataset to include only taxa present in at least 45 samples.

Does the CLR transformation affect this column? I also tested input data that was filtered but not CLR transformed, applying the CLR transformation within MaAsLin2 instead, and saw the same incorrect N.not.0 values. Are the rest of the values in the automatically generated all_results.tsv reliable if the input data has already been CLR transformed?

Thank you for your help!
all_results.tsv (59.3 KB)
dietswap_filt_clr.csv (447.4 KB)
sample_data.csv (10.1 KB)

Hello,

I’m linking a previous forum thread that I suspect describes the same issue you are encountering.

Let me know if that is the case.

Cheers,
Jacob Nearing

Hi Jacob, thank you for your quick response! Unless I am misreading the original post, I don’t think that issue applies to either my example dataset or my original one. All of my metadata cells have non-NA values and all of my taxonomy cells have numerical values, so I believe all of my cases are complete cases. I am also seeing the reverse of the original poster’s problem: rather than being inflated, the N.not.0 column is much smaller than anticipated. For example, one result reports N.not.0 = 6, meaning that the taxon was present in only 6 samples, when I know it should be present in at least 45 samples.
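
For anyone who wants to check the reported N.not.0 values independently, here is a minimal sketch of counting non-zero samples per taxon on the untransformed counts. The toy table, the `clr` helper, and the pseudocount of 1 are all assumptions made for illustration; this is not how MaAsLin2 computes the column. It only shows that a count taken after a CLR transformation no longer reflects presence/absence.

```r
# Toy count table: samples in rows, taxa in columns (hypothetical data).
counts <- data.frame(
  taxon_a = c(0, 3, 5, 0, 2),
  taxon_b = c(1, 0, 0, 0, 4),
  taxon_c = c(2, 2, 1, 6, 3)
)

# Prevalence: number of samples in which each taxon has a non-zero count.
n_not_0 <- colSums(counts != 0)

# Hypothetical CLR helper: log of each value minus the mean log for the sample.
clr <- function(x) log(x) - mean(log(x))

# A pseudocount of 1 is an assumption; other zero-handling choices exist.
# apply() returns taxa in rows, so transpose back to samples in rows.
counts_clr <- as.data.frame(t(apply(counts + 1, 1, clr)))

# After CLR, former zeros become ordinary (usually negative) values, so
# neither "!= 0" nor "> 0" recovers the original prevalence.
n_not_0_clr    <- colSums(counts_clr != 0)
n_positive_clr <- colSums(counts_clr > 0)

print(rbind(raw = n_not_0, clr_nonzero = n_not_0_clr, clr_positive = n_positive_clr))
```

Running the same `colSums(x != 0)` on your own filtered, untransformed table gives a reference to compare against the N.not.0 column in all_results.tsv.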

Hello,

After looking into this more deeply, it does seem there is an issue in the codebase with how this column is calculated when a CLR transformation is used. Unfortunately, my week is a bit busy at the moment, but I can look into pushing a fix for this to our GitHub repository next week. For now, I would ignore this column when using the CLR transformation.

Sorry for the issue.
Thanks,
Jacob Nearing

Just to update this…

We have since pushed an update to the GitHub version of Maaslin2 that solves this issue.

Thanks,
Jacob Nearing

I don’t think this was fixed. I’m having the same issue, where the values are underestimated. Weirdly enough, I get the problem when I run the Maaslin2 function on my data, but when I copy and paste the contents of the function and run them directly, the values are calculated correctly.

Hi @mdanb

Thanks for bringing this to our attention. Could you let me know what version you are running? It’s possible that something was missed.

Cheers,
Jacob

@nearinj Yes, it’s version 1.16.0.

Hi @mdanb

After some investigation, it looks like the Bioconductor version is a bit out of date and is missing this fix. To get the fix, I would suggest installing the latest version from GitHub using devtools.

Cheers,
Jacob
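
For reference, the GitHub install suggested above might look like the sketch below. The repository path `biobakery/Maaslin2` is not stated in this thread and is an assumption here, so confirm it against the MaAsLin2 documentation before running.

```r
# Install devtools if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Repository path is an assumption; check it against the MaAsLin2 documentation.
devtools::install_github("biobakery/Maaslin2")

# Confirm which version is installed now.
packageVersion("Maaslin2")
```

If Maaslin2 was already loaded in the session, restart R after installing so the new version is the one in use.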