Building a custom marker gene DB for running MetaPhlAn

Hi bioBakery crew,

I’ve been running some tests with MetaPhlAn to profile marine metagenomes, but they return no hits (e.g. profiled_metagenome.txt reports only UNKNOWN -1 100.0). This suggests that the marker gene DB doesn’t adequately cover marine microbiomes. Is that correct?

Also, I recently built a large plankton-specific DB (UniRef90 annotations), which has proven quite effective for profiling plankton communities with HUMAnN 3.0. This DB was clustered beforehand using Linclust. My question now is: how would you extract marker genes from such a custom DB in order to expand the ChocoPhlAn DB used by MetaPhlAn?

Any input would be greatly appreciated!


Hi @jagut
Thanks for getting in touch. Generating a custom marker database will require two main steps:

  1. Select core genes for each of the species covered by your genome database.
  2. Map all the core genes against all the available genomes to define species-specific marker genes (core genes not present in any other species).

You can have a look at the bioBakery 3 paper for more details: Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3 (eLife).
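The two steps above can be sketched in plain Python as set filters. This is a toy illustration under stated assumptions, not the actual MetaPhlAn/ChocoPhlAn pipeline. It assumes you already have (a) a mapping from each genome to its species, (b) the genomes in which each gene cluster occurs (e.g. derived from a Linclust clustering), and (c) the genomes each core gene hits when mapped against all genomes. The function names, the `min_prevalence` cutoff of 0.9 and the toy IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs, not MetaPhlAn's real data model:
#   genome_species: genome ID -> species label
#   gene_presence:  gene cluster ID -> set of genome IDs containing that cluster
#   hits:           gene cluster ID -> set of genome IDs hit when mapping it against all genomes


def core_genes(genome_species, gene_presence, min_prevalence=0.9):
    """Step 1: for each species, keep clusters present in >= min_prevalence of its genomes.

    The 0.9 default is an illustrative assumption, not MetaPhlAn's actual threshold.
    """
    genomes_per_species = defaultdict(set)
    for genome, species in genome_species.items():
        genomes_per_species[species].add(genome)

    core = defaultdict(set)
    for gene, genomes in gene_presence.items():
        for species, members in genomes_per_species.items():
            # Fraction of this species' genomes that carry the cluster
            if len(genomes & members) / len(members) >= min_prevalence:
                core[species].add(gene)
    return dict(core)


def species_markers(core, genome_species, hits):
    """Step 2: keep core genes whose mapping hits fall only in genomes of their own species."""
    return {
        species: {
            gene
            for gene in genes
            if all(genome_species[g] == species for g in hits.get(gene, set()))
        }
        for species, genes in core.items()
    }


# Toy example: species A has genomes g1 and g2, species B has genome g3.
genome_species = {"g1": "A", "g2": "A", "g3": "B"}
gene_presence = {"c1": {"g1", "g2"}, "c2": {"g1", "g2", "g3"}, "c3": {"g3"}, "c4": {"g1"}}
hits = {"c1": {"g1", "g2"}, "c2": {"g1", "g2", "g3"}, "c3": {"g3"}}

core = core_genes(genome_species, gene_presence)
markers = species_markers(core, genome_species, hits)
print(core)     # c2 is core in both species; c4 is too rare in A to count as core
print(markers)  # c2 is dropped because it also hits the other species
```

In practice the hit sets would come from an aligner run with identity and coverage cutoffs, and a strict "no hits outside the species" rule may need relaxing for closely related species; both choices are left out of this sketch.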