Discrepancy Between Input Reads and Estimated Read Counts in MetaPhlAn 4

Uzma_Basit_Khan · July 2, 2025, 1:25pm

Hi all,

I noticed something while reviewing the data for one of my samples (49915_2_156), which is paired-end and contains no singleton reads. I ran it through MetaPhlAn 4.0.6using the -t rel_ab_w_read_stats option and the updated 2024 MetaPhlAn database(#mpa_vJun23_CHOCOPhlAnSGB_202403).

The MetaPhlAn output reports that 137,492 reads were processed, yet the estimated read counts mapped/unmapped to taxa sum to 1,554,273 for this sample.

I was expecting these numbers to be closer. Does MetaPhlAn count individual reads (rather than read pairs), and is it normal for the estimated read count to significantly exceed the number of processed reads? I’d really appreciate an explanation for this disparity between the input and output read counts.

Thanks in advance!
Uzma

Claudia_Mengoni · July 2, 2025, 2:58pm

Hi @Uzma_Basit_Khan

The number of reads assigned to each species is just an estimation based on the avg coverage of the markers and the median length of the species’ genomes, so it is expected that the number of processed reads does not match the estimated count. However, usually it is closer estimation. Did you perhaps consider different taxonomic levels when summing up to 1,554,273? You should only consider rows from the same taxonomic level, because each level is representing the same information, just at different depth and each level will sum up to the same amount e.g. to count reads assigned to SGBs: cat profile.tsv | grep ‘t__’ | cut -f5
Let me know if this was the issue, if not could you share the profile?
Thanks

Topic		Replies	Views
Metaphlan output question MetaPhlAn	4	910	August 25, 2022
Metaphlan4 estimated_number_of_reads MetaPhlAn	1	343	March 5, 2024
Metaphlan4 discordance between relative_abundance and estimated_number_of_reads MetaPhlAn	6	737	October 26, 2022
Question about the unclassified estiamtion MetaPhlAn	1	8	July 2, 2025
Metaphlan output estimated reads mapped to known clades StrainPhlAn	1	427	August 12, 2022

Discrepancy Between Input Reads and Estimated Read Counts in MetaPhlAn 4

Related topics