Hello! I am running into an issue with MaAsLin2 where the “N.not.0” column in all_results.tsv does not match what I expect. I tested this with publicly available data to make sure it wasn’t specific to my dataset, and got the same problem.
My input data is dietswap_filt_clr.csv, loaded as a data frame in R. This data has been filtered to keep only taxa present in at least 20% of samples (45 of 222 samples) and then CLR transformed. (The original dataset is from here: Fat, fibre and cancer risk in African Americans and rural Africans | Nature Communications)
Attachments: dietswap_filt_clr.csv (447.4 KB), sample_data.csv (10.1 KB)
My input metadata is sample_data.csv, the metadata from the above study, as a dataframe in R.
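For reference, the filtering and transformation described above can be sketched roughly as follows. This is a minimal illustration on made-up counts, not the exact code used to produce dietswap_filt_clr.csv: the object names (`counts`, `counts_filt`, `clr`) are hypothetical, and the pseudocount of 1 before taking logs is an assumption, since other zero-handling choices are possible.

```r
# Toy count table (hypothetical): rows are samples, columns are taxa.
counts <- cbind(
  taxon1 = c(5, 0, 3, 8, 2, 0, 7, 1, 4, 6),
  taxon2 = c(0, 0, 9, 0, 4, 0, 0, 2, 0, 1),
  taxon3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 3),   # detected in only 1 of 10 samples
  taxon4 = c(10, 12, 7, 9, 15, 11, 8, 14, 13, 6)
)

# Prevalence filter: keep taxa detected (count > 0) in at least 20% of samples.
# With 222 samples this threshold is ceiling(0.2 * 222) = 45.
min_samples <- ceiling(0.20 * nrow(counts))
prevalence  <- colSums(counts > 0)
counts_filt <- counts[, prevalence >= min_samples, drop = FALSE]

# CLR per sample: log of each value minus the mean log within that sample.
# The pseudocount of 1 is an assumption made for this sketch.
log_counts <- log(counts_filt + 1)
clr <- log_counts - rowMeans(log_counts)
```

Note that after this transformation a taxon that was absent in a sample (count 0) generally no longer has the value 0, and many present taxa have negative values.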
I ran the below code, including interaction effects and random effects as these are both in the model I’m using for my actual dataset:
sample_data$timepoint_overweight = (sample_data$bmi_group == "overweight") * sample_data$timepoint
sample_data$timepoint_obese = (sample_data$bmi_group == "obese") * sample_data$timepoint
fit_data2 = Maaslin2(
    input_data = dietswap_filt_clr_df,
    input_metadata = sample_data,
    min_prevalence = 0,
    normalization = "NONE",
    transform = "NONE",
    standardize = "FALSE",
    output = "test_n.not.0",
    fixed_effects = c("timepoint_overweight", "timepoint_obese", "sex", "bmi_group"),
    random_effects = c("subject"),
    reference = c("sex,M", "bmi_group,lean"))
I’ve uploaded the all_results.tsv from this model here. Immediately, you can see that the “N.not.0” column lists many features as being present in fewer than 45 samples, even though I had explicitly filtered the dataset to include only taxa present in at least 45 samples.
Does the CLR transformation affect this column? I also tested using input data that was filtered but not CLR transformed and used the CLR option within MaAsLin2, but encountered the same issue with the values in this column. Are the rest of the values in the automatically generated all_results spreadsheet reliable if the input data has already been CLR transformed?
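One way to probe the CLR question is to count values per feature under two different rules and compare each against the reported “N.not.0”. The sketch below uses made-up CLR values; `clr_df` and `counts_by_rule` are hypothetical names, and it makes no claim about how MaAsLin2 actually computes the column. It only shows that "non-zero" and "strictly positive" can give very different counts on CLR data, where negative values are common.

```r
# Hypothetical CLR-transformed values: rows are samples, columns are features.
clr_df <- data.frame(
  taxonA = c(-1.2, 0.4, -0.3, 2.1, -0.8, 0.6, 1.5, -2.0),
  taxonB = c(0.9, -0.4, 0.3, -2.1, 0.8, -0.6, -1.5, 2.0)
)

# Two candidate counting rules, computed per feature.
counts_by_rule <- data.frame(
  feature    = colnames(clr_df),
  n_nonzero  = colSums(clr_df != 0),  # values different from zero
  n_positive = colSums(clr_df > 0),   # values strictly greater than zero
  row.names  = NULL
)

print(counts_by_rule)
```

Running the same two counts on dietswap_filt_clr_df and comparing them with the “N.not.0” column in all_results.tsv would show whether either rule reproduces the reported numbers.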