Origin of clade-specific marker genes

Hi, I was wondering whether the clade-specific marker genes are defined at only one taxonomic level (species) or whether there are also marker genes for higher taxonomic levels. I remember that the articles specify that the latter is true.

“Such marker genes are chosen so that essentially all of the strains in a clade (species or otherwise) possess such genes, and at the same time no other clade contains homologs close enough to incorrectly map metagenomic reads.”

“we identified more than 2 million potential markers from which we selected a subset of 400,141 genes most representative of each taxonomic unit (Online Methods). The resulting catalog spans 1,221 species with 231 (s.d. 107) markers per species and >115,000 markers at higher taxonomic levels”

However, in the MetaPhlAn database marker info file 'mpa_v30_CHOCOPhlAn_201901_marker_info' I could not find any marker genes for higher taxonomic levels. (I searched for ['clade': 'g_].)

I am most likely just overlooking them, so I would appreciate any clarification on this.

Kind regards,
Moelong Yu

Hi @Moellie
I understand the confusion. The text you are citing is from the original MetaPhlAn paper (version 1) from 2012. In the current MetaPhlAn version (version 3), only species-level markers are used. Please have a look at this manuscript: Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3 | eLife
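
If you want to confirm this against the marker info file yourself, here is a minimal sketch that tallies markers by the taxonomic-level prefix of their 'clade' field. It assumes each line of the file is a marker name, a tab, and then a Python dict literal containing a 'clade' key such as 's__Some_species'; that layout is an assumption, so please check a few lines of your copy first. The function name `count_clade_levels` and the sample lines are hypothetical and only for illustration.

```python
import ast
from collections import Counter


def count_clade_levels(lines):
    """Tally markers by the prefix of their 'clade' value ('s', 'g', ...)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Assumed layout: "<marker name>\t<dict literal>".
        _, _, info_text = line.partition("\t")
        if not info_text:
            continue  # skip blank lines and lines without a tab
        info = ast.literal_eval(info_text)
        # 's__Some_species' -> 's'
        counts[info["clade"].split("__", 1)[0]] += 1
    return counts


# Illustrative lines only, not real database entries.
example_lines = [
    "marker_1\t{'clade': 's__Species_one', 'len': 900}",
    "marker_2\t{'clade': 's__Species_two', 'len': 1200}",
]
print(count_clade_levels(example_lines))
```

To run it on the real file, open it in text mode (for a .bz2 copy, `bz2.open(path, "rt")` from the standard library) and pass the file handle to the function. If only species-level markers are present, the only key in the result should be 's'.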
