Taking Faecalibacterium prausnitzii as an example, the number of non-zero samples (N.not.0) reported in the output is only 6. However, when I check the input Species_relab.txt, the actual count of non-zero samples is 52:
metadata    feature                        value  coef          stderr       N   N.not.0  pval         qval
Treatment   Faecalibacterium_prausnitzii   B      -0.511171192  0.114657358  52  6        0.021009755  0.802131021
Treatment   Faecalibacterium_prausnitzii   C      -0.380650407  0.14503136   52  6        0.078688933  0.802131021
After checking all 450 species, there are actually 126 such discrepancies.
I would like to ask why there is a discrepancy in the count of non-zero samples between the input file and the output file. Any ideas would be highly appreciated.
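For reference, here is a minimal sketch of how the non-zero counts can be checked against the input file. It assumes Species_relab.txt is tab-delimited with species as rows and samples as columns; adjust if your file is oriented the other way.

```python
# Count non-zero samples per species in the input relative-abundance table.
# Assumed layout: tab-delimited, species as rows, samples as columns.
import pandas as pd

relab = pd.read_csv("Species_relab.txt", sep="\t", index_col=0)

# For each species, number of samples with a non-zero abundance
nonzero_per_species = (relab > 0).sum(axis=1)
print(nonzero_per_species["Faecalibacterium_prausnitzii"])  # 52 in this case
```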
The issue here is that your data are on the 0-100 (percent relative abundance) scale, while the AST transformation requires data on the 0-1 scale. So currently, when MaAsLin applies the AST, data points in the 0-1 range are transformed to non-zero values, but anything above 1 is converted to NaN. You should have seen a lot of warnings after the MaAsLin run. We are currently working on having MaAsLin throw an error instead of letting the AST transformation run when the underlying data are not on the 0-1 scale. Switching to a log transformation or converting your data frame to the 0-1 scale should solve the issues you are seeing.
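To illustrate with a minimal sketch (made-up example values, shown in Python rather than R): the AST is an arcsine square-root transform, asin(sqrt(x)), which is only defined for x in [0, 1], so any abundance above 1 on the percent scale turns into NaN and no longer counts as a non-zero observation.

```python
# Minimal sketch (made-up values) of why 0-100 scale data breaks the AST
# transform: asin(sqrt(x)) is only defined for x in [0, 1].
import numpy as np

relab_percent = np.array([0.0, 0.4, 2.5, 37.8])   # percent scale (0-100)
relab_fraction = relab_percent / 100.0            # rescaled to 0-1

with np.errstate(invalid="ignore"):               # silence the NaN warnings
    ast_percent = np.arcsin(np.sqrt(relab_percent))
    ast_fraction = np.arcsin(np.sqrt(relab_fraction))

print(ast_percent)   # values above 1 come out as nan
print(ast_fraction)  # all finite once the data are on the 0-1 scale
```

This would also explain why N.not.0 drops to 6: only the samples whose relative abundance happens to fall between 0 and 1 percent survive the transform as non-zero values.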
If I’m not mistaken, relative abundances with very low values (e.g., 0.0003) cannot be detected by the model and counted in the results column N.not.0, even with min_abundance = 0, min_prevalence = 0, and AST-transformed data (input_data on the 0-1 scale). However, I believe that by providing counts instead, the model can account for all of them correctly.
Do you consider it acceptable to run the analysis with counts and then plot the raw data as relative abundance (%), so as not to lose information?