Can a user control the strictness of taxonomic profiling when using MetaPhlAn3 for species detection? Specifically, can I make the software identify more species at the cost of lower accuracy? There seems to be a parameter called ‘stat_q’ that determines how many markers are used. Is tuning this parameter the only way to achieve my goal, or should I also adjust the Bowtie2 aligner settings?
Are different pangenome databases used in MetaPhlAn3 and HUMAnN3? I know that MetaPhlAn3 is embedded in the HUMAnN3 pipeline. However, the number of species in the pangenome database (ChocoPhlAn) obtained when installing HUMAnN3 seems to differ from what I’ve read in the MetaPhlAn3 paper.
Internally, Bowtie2 is run using the --very-sensitive preset. If you want to tweak the parameters, you can align your metagenome to the MetaPhlAn markers yourself with the changed parameters and then use the SAM output file as input for MetaPhlAn. Keep in mind that if you use a SAM file as input, you need to specify the size of the metagenome in reads using the --nreads parameter.
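As a rough sketch of that two-step workflow, the Python snippet below only builds and prints the two command lines; it does not run anything. The file names, the index path, the read count, and the extra Bowtie2 options (`-N 1`) are placeholders chosen for illustration, and the helper functions are hypothetical. Please check the flags against the help output of your installed Bowtie2 and MetaPhlAn versions.

```python
import shlex


def bowtie2_command(reads_fastq, index_basename, sam_out, extra_args, threads=4):
    """Build a Bowtie2 call that aligns reads to the marker index.

    extra_args holds whatever alignment settings you want to try in place
    of the default preset (hypothetical example: ["--very-sensitive", "-N", "1"]).
    """
    return [
        "bowtie2",
        "-x", index_basename,   # basename of the Bowtie2 index of the markers
        "-U", reads_fastq,      # unpaired reads
        "-S", sam_out,          # SAM output, later passed to MetaPhlAn
        "--no-unal",            # drop reads that did not align
        "-p", str(threads),
        *extra_args,
    ]


def metaphlan_command(sam_in, n_reads, profile_out, stat_q=None):
    """Build a MetaPhlAn call that profiles from an existing SAM file."""
    cmd = [
        "metaphlan", sam_in,
        "--input_type", "sam",
        "--nreads", str(n_reads),  # total reads in the original metagenome
        "-o", profile_out,
    ]
    if stat_q is not None:
        cmd += ["--stat_q", str(stat_q)]
    return cmd


if __name__ == "__main__":
    # All paths and numbers below are placeholders.
    align = bowtie2_command(
        "sample.fastq", "path/to/marker_index", "sample.sam",
        extra_args=["--very-sensitive", "-N", "1"],
    )
    profile = metaphlan_command("sample.sam", 1000000, "sample_profile.txt")
    print(shlex.join(align))
    print(shlex.join(profile))
```

The value given to `--nreads` should be the number of reads in the original metagenome, not the number of aligned reads in the SAM file.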
You can also control the number of markers considered in the robust average by changing stat_q. Lowering the value means more markers are considered (e.g. a stat_q value of 0.05 keeps the markers between the 5th and 95th percentile), but this can result in more false positives.
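To illustrate why a lower stat_q can let more species through, here is a minimal sketch of a quantile-trimmed average. It is a simplification written for this thread, not MetaPhlAn's actual implementation: the function name, the per-marker values, and the trimming rule (drop `int(stat_q * n)` markers from each end of the sorted list) are assumptions.

```python
def robust_average(marker_values, stat_q):
    """Mean of the per-marker values after trimming both tails.

    Assumption: int(stat_q * n) markers are dropped from each end of the
    sorted list. This is a simplified stand-in for the real calculation.
    """
    ordered = sorted(marker_values)
    n = len(ordered)
    k = int(stat_q * n)          # markers trimmed from each tail
    kept = ordered[k:n - k]
    if not kept:
        return 0.0
    return sum(kept) / len(kept)


# Hypothetical species with 20 markers, only 2 of which attract reads.
markers = [0.0] * 18 + [4.0, 4.0]

strict = robust_average(markers, 0.2)    # trims 4 markers per tail
lenient = robust_average(markers, 0.05)  # trims 1 marker per tail

print(strict, lenient)
```

With the stricter setting both non-zero markers are trimmed away and the average is zero, so the species would not be reported. With the lower setting one of them survives the trimming and the average becomes positive, which is the mechanism by which spurious hits on a few markers can turn into false positives.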
The species sets overlap, but for some species, building the pangenome with the ChocoPhlAn pipeline failed because of annotation problems. Also, viruses are not included in the HUMAnN pangenomes (see "HUMAnN Is Ignoring Viruses", post #6 by franzosa).