Quality control of MetaPhlAn4 count table

yujeongheo · January 14, 2026, 9:27am

Hi there,

I ran MetaPhlAn4 with the --ignore_eukaryotes --ignore_archaea -t rel_ab_w_read_stats options and got an absolute count table for the human gut microbiome.

For further analysis, I wonder whether there is a well-defined quality control pipeline.

Do you recommend quality control based on raw read counts, such as removing samples or species with low read counts?

I guess the most common quality control for metagenomic data is filtering on minimum relative abundance and prevalence. Do you recommend a specific threshold, for example keeping only species that reach a relative abundance of 0.0001 in more than 5% of the samples? If not, how can we choose one that fits our own data?
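To make the filter in question concrete, here is a minimal sketch, assuming the rule means "keep a species only if it reaches the minimum relative abundance in more than the given share of samples". The function name, the toy species names, and the values are hypothetical, and abundances are assumed to be fractions (0 to 1); MetaPhlAn reports relative abundance as a percentage, so real values would need dividing by 100 first.

```python
def filter_by_prevalence(table, min_abundance=0.0001, min_prevalence=0.05):
    """Keep species reaching min_abundance in more than min_prevalence of samples.

    table maps a species name to a list of relative abundances
    (fractions between 0 and 1, one value per sample).
    """
    kept = {}
    for species, abundances in table.items():
        # Count the samples in which the species reaches the abundance cutoff.
        n_present = sum(1 for a in abundances if a >= min_abundance)
        # Prevalence is the share of samples passing the cutoff.
        if n_present / len(abundances) > min_prevalence:
            kept[species] = abundances
    return kept


# Hypothetical toy data: three species across four samples.
toy = {
    "s__Common_species": [0.20, 0.15, 0.30, 0.25],
    "s__Rare_species_A": [0.00005, 0.0, 0.0, 0.0],  # never reaches the cutoff
    "s__Rare_species_B": [0.002, 0.0, 0.0, 0.0],    # reaches it in 1 of 4 samples
}
filtered = filter_by_prevalence(toy)
print(sorted(filtered))
```

Both cutoffs are arguments, so the same function can be rerun with different values to see how many species each choice removes.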

Finally, is there a golden number for how many species are present in the gut microbiome?

Thanks in advance for your answers.

nickp60 · January 15, 2026, 3:00pm

Hi! I’m just a fellow user, but I’ll weigh in anyway. In our experience, MetaPhlAn’s marker-based approach is pretty conservative. k-mer-based tools tend to have a long tail of rare bugs that may or may not be artifacts needing removal; MetaPhlAn doesn’t. I generally recommend people trust the results out of the box. If your sample has insufficient read depth (you can get some sense of this from the coverage estimate reported when running with -t rel_ab_w_read_stats), you risk missing low-abundance organisms. However, the authors of MaAsLin 3 recommend accounting for this by including read depth as a model feature, rather than making arbitrary filtering decisions beforehand.
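As a rough illustration of getting a per-sample depth value to use as that model feature, here is a minimal sketch that sums the estimated read counts of species-level rows in one profile. It assumes the tab-separated layout written by -t rel_ab_w_read_stats, with the estimated read count in the fifth column; the function name and the toy profile are hypothetical, so check the column order against your own files. How the resulting number is passed to MaAsLin 3 is not shown here.

```python
def species_read_total(profile_text, reads_col=4):
    """Sum estimated read counts over species-level rows of one profile.

    reads_col is the zero-based index of the estimated-reads column
    (assumed to be the fifth column of the rel_ab_w_read_stats output).
    """
    total = 0
    for line in profile_text.splitlines():
        # Skip blank lines and '#'-prefixed header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Use only rows whose deepest rank is species, so reads are not
        # counted again at the kingdom, phylum, ... levels.
        deepest_rank = fields[0].split("|")[-1]
        if deepest_rank.startswith("s__"):
            total += int(float(fields[reads_col]))
    return total


# Hypothetical toy profile with two species under one kingdom.
example = "\n".join([
    "#clade_name\tclade_taxid\trelative_abundance\tcoverage\testimated_number_of_reads_from_the_clade",
    "k__Bacteria\t2\t100.0\t3.0\t1500",
    "k__Bacteria|s__Species_one\t2|11\t60.0\t2.0\t900",
    "k__Bacteria|s__Species_two\t2|12\t40.0\t1.0\t600",
])
depth = species_read_total(example)
print(depth)
```

Computing this once per sample gives one depth value per sample that can sit alongside the other metadata columns.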

There are no golden thresholds, and in our experience such thresholds are more relevant for things like 16S analysis or k-mer-based metagenomic (mgx) profiles, both of which produce that long tail of low-abundance and/or spurious taxa. Everything will depend on the analyses you are trying to perform, but I’d start by taking the results as-is.

Good luck!
