Discrepancy between the number of non-zero samples in the input file and in the MaAsLin2 output

Hi all,

I’m using MaAsLin2 to analyze the output from MetaPhlAn3. Below are the species relative abundance table and the metadata.

Species_relab.txt (95.9 KB)

Metadata.txt (637 Bytes)

Then in R,

library(Maaslin2)

# Species table and metadata; the first column of each file holds the row names.
relab <- read.table("Species_relab.txt", header = TRUE, quote = "", sep = "\t", row.names = 1, stringsAsFactors = FALSE)

metadata <- read.table("Metadata.txt", header = TRUE, sep = "\t", row.names = 1, stringsAsFactors = FALSE)

fit_data <- Maaslin2(
  relab, metadata, "output", transform = "AST",
  fixed_effects = c("Treatment"),
  normalization = "NONE",
  standardize = FALSE)

Output for all the features,
all_results.tsv (94.8 KB)

Taking Faecalibacterium prausnitzii as an example, the output reports only 6 non-zero samples (N.not.0). However, the input Species_relab.txt has 52 non-zero samples for this species.

metadata   feature                       value  coef          stderr       N   N.not.0  pval         qval
Treatment  Faecalibacterium_prausnitzii  B      -0.511171192  0.114657358  52  6        0.021009755  0.802131021
Treatment  Faecalibacterium_prausnitzii  C      -0.380650407  0.14503136   52  6        0.078688933  0.802131021

After checking all 450 species, I found 126 such discrepancies.
Why does the count of non-zero samples differ between the input file and the output file? Any ideas would be highly appreciated.
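
For reference, a check along these lines can be sketched in base R. This is only an illustration: relab_toy and results_toy are made-up stand-ins for Species_relab.txt and all_results.tsv, and the sketch assumes features are in rows and samples in columns.

```r
# Toy stand-in for the abundance table (hypothetical values, percent scale):
# features in rows, samples in columns.
relab_toy <- data.frame(
  S1 = c(12.5, 0.0, 0.4),
  S2 = c(0.0, 0.0, 0.9),
  S3 = c(30.1, 2.2, 0.0),
  row.names = c("sp_A", "sp_B", "sp_C")
)

# Number of samples with a non-zero abundance, per feature, in the input.
input_nonzero <- rowSums(relab_toy > 0)

# Toy stand-in for the results table (hypothetical N.not.0 values).
results_toy <- data.frame(
  feature = c("sp_A", "sp_B", "sp_C"),
  N.not.0 = c(0, 0, 2)
)

# Join the two counts by feature name and flag the rows that disagree.
cmp <- merge(
  data.frame(feature = names(input_nonzero),
             input_nonzero = as.integer(input_nonzero)),
  results_toy,
  by = "feature"
)
cmp$mismatch <- cmp$input_nonzero != cmp$N.not.0

print(cmp[cmp$mismatch, ])
n_mismatch <- sum(cmp$mismatch)
```

With the real files, the same comparison would use the tables read in with read.table in place of the toy data frames, after de-duplicating the results to one row per feature.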

Thank you!

Hi @Claire ,

The issue here is that your data are on the 0-100 scale (percent relative abundance), whereas the AST transformation needs data on the 0-1 scale. As a result, when MaAsLin applies AST, values between 0 and 1 are transformed to non-zero values, but anything above 1 is converted to NaN. You should see a lot of warnings after MaAsLin runs. We are currently working on getting MaAsLin to throw an error instead of letting the AST transformation run when the underlying data aren't on the 0-1 scale. Switching to a log transformation or converting your data frame to the 0-1 scale should solve the issues you are seeing.
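
To make the effect concrete, here is a minimal base R sketch. It assumes AST is the plain arcsine square-root, asin(sqrt(x)), and uses made-up percentages. The helper names ast and n_not_zero are hypothetical and are not MaAsLin functions.

```r
# Arcsine square-root transform, written out by hand for illustration.
ast <- function(x) asin(sqrt(x))

# Hypothetical abundances for one feature on the 0-100 (percent) scale.
pct <- c(0, 0.5, 3.2, 12.7, 45.0)

# On the percent scale, sqrt(x) exceeds 1 for x > 1, so asin() returns NaN.
on_pct <- suppressWarnings(ast(pct))

# After rescaling to 0-1, every value has a finite transform.
on_prop <- ast(pct / 100)

print(on_pct)
print(on_prop)

# Samples that are neither missing nor zero after the transform.
n_not_zero <- function(v) sum(!is.na(v) & v != 0)

n_not_zero(on_pct)   # 1: only the 0.5 entry survives
n_not_zero(on_prop)  # 4: all non-zero entries survive
```

If every column of relab is numeric, dividing the whole data frame by 100 (for example relab / 100) before calling Maaslin2 should put it on the 0-1 scale.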

Sorry for the confusion - I hope this helps!



If I’m not mistaken, relative abundances with very low values (e.g., 0.0003) are not detected by the model and are not counted in the N.not.zero column of the results, even with min_abundance = 0 and min_prevalence = 0 and with AST-transformed input on the 0-1 scale. However, I believe that when counts are supplied instead, the model accounts for all of them correctly.
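
One way to narrow this down is to check the transform in isolation. The sketch below assumes AST is asin(sqrt(x)) and uses made-up small proportions. It only shows what the transform does to such values and says nothing about any filtering or rounding that happens elsewhere in the pipeline.

```r
# Arcsine square-root transform, written out by hand for illustration.
ast <- function(x) asin(sqrt(x))

# Hypothetical very low relative abundances on the 0-1 scale.
small <- c(0.0003, 0.00001, 0.000001)

transformed <- ast(small)
print(transformed)

# TRUE if none of the small values is mapped to exactly zero or to NaN.
all_kept <- all(!is.na(transformed) & transformed > 0)
print(all_kept)
```

If this prints TRUE while such samples are still missing from N.not.zero, the loss is happening somewhere other than in the transform itself, for example when the input file is read or in a filtering step.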

Do you consider it acceptable to run the analysis with counts and then plot the raw data as relative abundance (%), so as not to lose information?

Thank you very much for your work.

All the best,