Which parameters to tweak to improve abundance calculations?

We recently analysed a reference standard of known composition using MetaPhlAn3 and found that the relative abundance values for many of the taxa deviated substantially from the ground truth. We believe this is a MetaPhlAn3 problem, as the same data analysed by a commercial provider using a different tool yielded abundance values much closer to the ground truth. On the flip side, MetaPhlAn3’s false positive and false negative rates were excellent.

Which MetaPhlAn3 parameters can / should we tweak to improve the software’s abundance estimates? I understand any tweaking may have detrimental effects on false positive and false negative rates.

Decreasing the value of the stat_q parameter will allow more markers to be considered when calculating relative abundances. You can also disable the MAPQ filter by setting min_mapq_val to -1, so that no reads are removed because they map to multiple locations. I explain what stat_q is in another thread: https://forum.biobakery.org/t/understanding-parameters-stat-q-for-environmental-sample/2204
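
As a rough sketch of how the two settings could be passed on the command line, the Python snippet below only assembles and prints a MetaPhlAn call; it does not run it. The helper name `build_metaphlan_cmd`, the file names, and the stat_q value of 0.1 are illustrative assumptions, not recommendations, so pick a value lower than the one you are currently using and check the result against your reference standard.

```python
import shlex


def build_metaphlan_cmd(fastq, output, stat_q=0.1, min_mapq_val=-1):
    """Assemble a MetaPhlAn command line as a list of arguments.

    Hypothetical helper for illustration only. The stat_q default used here
    (0.1) is an arbitrary example value, not a tested recommendation.
    """
    return [
        "metaphlan", fastq,
        "--input_type", "fastq",
        # Lower stat_q so that more markers contribute to the abundance estimate.
        "--stat_q", str(stat_q),
        # -1 switches off the MAPQ filter, so multi-mapping reads are kept.
        "--min_mapq_val", str(min_mapq_val),
        "-o", output,
    ]


# Placeholder file names; replace with your own input and output paths.
cmd = build_metaphlan_cmd("standard.fastq", "standard_profile.txt")
print(shlex.join(cmd))
```

Once the printed command looks right, it can be pasted into a shell or handed to `subprocess.run(cmd, check=True)`. Running it with a few different stat_q values on the same reference standard should show how sensitive the abundance estimates are to that setting.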

