Genome size normalization in Metaphlan2

fearry · March 5, 2020, 4:34am

Hello,
This is mentioned in the Nature Methods paper it’s based on but doesn’t seem to be referenced elsewhere.

Can someone provide me more details? E.g. where did the genome size estimates come from, and was it done separately for each clade or at a chosen taxonomic rank? That paper was from 2012, is it still the done thing? Thanks!

fbeghini · April 6, 2020, 4:14pm

Hi,
the genome size estimation is done at each taxonomic rank when the taxonomy tree is built, and it is then used when the relative abundances are computed.
The sizes of all the genomes used for identifying the marker genes are included in the pkl file: each one is the value of an entry under the ‘taxonomy’ key of the dict.
In the current MetaPhlAn implementation, we estimate the average genome length only at the species level in order to avoid the overestimation of the size at upper taxonomic levels.
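
To look at these values yourself, something along these lines should work. It is only a rough sketch, not the actual MetaPhlAn code. It assumes the pkl file is a bz2-compressed pickle and that the keys under ‘taxonomy’ are lineage strings separated by `|` with an `s__` species level. The value layout may differ between releases, so the sketch accepts either a bare length or a tuple whose last element is the length. The helper names and the toy entries are made up for illustration.

```python
import bz2
import pickle
from collections import defaultdict


def load_db(pkl_path):
    """Load a marker database, assumed to be a bz2-compressed pickle."""
    with bz2.open(pkl_path, "rb") as handle:
        return pickle.load(handle)


def genome_length(value):
    # Assumption: the value is either a bare genome length or a tuple/list
    # whose last element is the genome length.
    return value[-1] if isinstance(value, (tuple, list)) else value


def species_mean_genome_length(taxonomy):
    """Average the genome lengths of all entries that share a species."""
    lengths = defaultdict(list)
    for lineage, value in taxonomy.items():
        # Pick the species-level label (prefix 's__') out of the lineage.
        species = next(
            (level for level in lineage.split("|") if level.startswith("s__")),
            None,
        )
        if species is not None:
            lengths[species].append(genome_length(value))
    return {sp: sum(vals) / len(vals) for sp, vals in lengths.items()}


# Toy entries for illustration only, not real database content.
toy_taxonomy = {
    "k__Bacteria|s__Species_A|t__strain_1": 4_000_000,
    "k__Bacteria|s__Species_A|t__strain_2": 5_000_000,
    "k__Bacteria|s__Species_B|t__strain_3": ("12345", 2_000_000),
}
print(species_mean_genome_length(toy_taxonomy))
```

On a real database you would call `species_mean_genome_length(load_db(path)["taxonomy"])`, with `path` pointing at your local pkl file.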

Best
