Understanding Parameters (stat_q) for Environmental sample

I used the default parameters for metaphlan3 to analyze trimmed metagenomic reads from a glacial sample. I do not get many hits (wc -l profiled_metagenome.txt == 31), and they are dominated (97%) by cyanobacteria, which I do not think reflects the composition of the sample very well. Other threads suggest lowering --stat_q and --min_mapq_val for environmental samples. I understand --min_mapq_val, and I do not think I want to lower it much. I do not understand --stat_q. Can anyone explain stat_q to me? Does anyone have thoughts on the minimum values of these parameters at which the results would still be trustworthy?

Or maybe the composition of my sample is just not well represented in the marker gene database? I do not think data quality is the problem, because assembly and binning went very well.

Finally, can I use my bowtie2 output from the default run as an input when I change parameters? Or do these parameters affect the creation of the bowtie2 output?

Thank you!


The stat_q value is the quantile used to select which markers enter the relative abundance calculation: with the default of 0.2, only the markers falling between the 20th and 80th percentiles are taken into consideration. The default of 0.2 was chosen as a tradeoff to reduce false positives. If you decrease it, you will be able to detect more species, but you will also increase the risk of false positives.
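To make the effect of the quantile concrete, here is a minimal sketch of a truncated mean over per-marker coverage values. It is only an illustration of the idea described above, not MetaPhlAn's actual code; the function name `truncated_mean` and the coverage numbers are made up for the example.

```python
def truncated_mean(marker_coverages, stat_q):
    """Mean of the values left after trimming a fraction stat_q from each tail.

    Simplified illustration only: the values are sorted, the lowest and
    highest stat_q fraction are dropped, and the rest are averaged.
    """
    ordered = sorted(marker_coverages)
    k = int(stat_q * len(ordered))  # number of markers trimmed on each side
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical clade with 10 markers, only 2 of which recruited any reads.
coverages = [0, 0, 0, 0, 0, 0, 0, 0, 3, 4]

print(truncated_mean(coverages, 0.2))  # both non-zero markers are trimmed away
print(truncated_mean(coverages, 0.1))  # one non-zero marker is kept
```

In this toy case the clade gets an estimate of zero at 0.2 and a small non-zero estimate at 0.1, which is why lowering stat_q reports more species, and also why a few spuriously hit markers are then enough to produce a false positive.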

It is a possibility. We have expanded the genome catalog a lot, but environmental species are still very scarce.

Yes! You can run the first time with default parameters and then use the bowtie2out file as input when the parameters are changed.
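As a rough sketch of what such a rerun could look like, the snippet below builds the command line for a second pass that reads the saved bowtie2out file and changes only --stat_q. The file names and the helper `build_rerun_command` are hypothetical; the options used (`--input_type bowtie2out`, `--stat_q`, `-o`) are standard metaphlan options, but check `metaphlan --help` for your installed version.

```python
import shlex


def build_rerun_command(bowtie2out_path, output_path, stat_q):
    """Build a metaphlan command that reuses an existing bowtie2out file."""
    return [
        "metaphlan", bowtie2out_path,
        "--input_type", "bowtie2out",  # skip the read mapping step
        "--stat_q", str(stat_q),       # quantile to try on this pass
        "-o", output_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace them with the ones from the default run.
    cmd = build_rerun_command(
        "metagenome.bowtie2.bz2", "profiled_metagenome_statq0.1.txt", 0.1
    )
    print(shlex.join(cmd))  # paste into a shell, or run via subprocess.run(cmd)
```

Writing each pass to its own output file keeps the default profile available for comparison.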


Thank you @fbeghini !