This is mentioned in the Nature Methods paper it’s based on but doesn’t seem to be referenced elsewhere.
Can someone provide more details? For example, where did the genome size estimates come from, and was the estimation done separately for each clade or at a chosen taxonomic rank? That paper was from 2012; is it still the done thing? Thanks!
The genome size estimation is done at each taxonomic rank when the taxonomy tree is built, and the estimates are then used when the relative abundances are computed.
The sizes of all the genomes used to identify the marker genes are included in the pkl file: each size is the value of an entry under the ‘taxonomy’ key of the dict.
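If you want to inspect those sizes yourself, here is a minimal sketch in Python. It assumes the database pkl is a bz2-compressed pickle (falling back to a plain pickle otherwise), and it assumes each ‘taxonomy’ value is either the genome length itself or a tuple whose last element is the length, since the layout may differ between database versions. The helper names `load_mpa_pkl` and `genome_lengths` are hypothetical, not part of MetaPhlAn.

```python
import bz2
import pickle


def load_mpa_pkl(path):
    """Load a MetaPhlAn marker database .pkl file (hypothetical helper).

    Assumption: the file is a bz2-compressed pickle; if it is not,
    a plain (uncompressed) pickle is tried instead.
    """
    try:
        with bz2.open(path, "rb") as fh:
            return pickle.load(fh)
    except OSError:
        # bz2 raises OSError("Invalid data stream") on non-bz2 input.
        with open(path, "rb") as fh:
            return pickle.load(fh)


def genome_lengths(db):
    """Map each lineage string under db['taxonomy'] to a genome length.

    Assumption: a value is either the length itself or a tuple whose
    last element is the length (e.g. (taxids, length)).
    """
    lengths = {}
    for lineage, value in db["taxonomy"].items():
        length = value[-1] if isinstance(value, tuple) else value
        lengths[lineage] = int(length)
    return lengths
```

For example, `genome_lengths(load_mpa_pkl("path/to/your_database.pkl"))` returns a dict keyed by lineage string, which you can sort or filter to see the genome sizes behind a given clade.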
In the current MetaPhlAn implementation, we estimate the average genome length only at the species level, to avoid overestimating the genome size at higher taxonomic levels.
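To illustrate the idea of averaging genome lengths per clade at one rank, here is a small Python sketch. It is not MetaPhlAn's code; it assumes lineage keys are `|`-separated with rank prefixes such as `g__`, `s__` and `t__`, and the function name `average_length_by_rank` is hypothetical. With the default `s__` prefix it gives one average per species; passing another prefix averages at that rank instead.

```python
from collections import defaultdict
from statistics import mean


def average_length_by_rank(lengths, rank_prefix="s__"):
    """Average genome length per clade at a single taxonomic rank.

    `lengths` maps lineage strings to genome lengths. Assumption:
    lineages look like 'k__...|g__...|s__...|t__...'. Lineages that
    lack the requested rank are skipped.
    """
    per_clade = defaultdict(list)
    for lineage, length in lengths.items():
        ranks = lineage.split("|")
        # Truncate the lineage at the first level carrying the prefix.
        for depth, name in enumerate(ranks):
            if name.startswith(rank_prefix):
                clade = "|".join(ranks[: depth + 1])
                per_clade[clade].append(length)
                break
    # One mean over all genomes pooled under each clade.
    return {clade: mean(vals) for clade, vals in per_clade.items()}
```

Note that at ranks above species this simple version pools every descendant genome, so species with many sequenced genomes weigh more heavily in the result.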