The bioBakery help forum

How to create marker sequences from a genome to add to the MetaPhlAn database?

Hi All,
I was wondering how I can add a new genome to the MetaPhlAn database. I am following these steps from the tutorial, but it is not clear how to generate the marker sequences from the query genome that need to be stored in a file called new_marker.fasta.

Customizing the database

In order to add a marker to the database, the user needs the following steps:

  1. Reconstruct the marker sequences (in fasta format) from the MetaPhlAn2 bowtie2 database by:
    #!bash
    bowtie2-inspect metaphlan2/databases/mpa_v20_m200 > metaphlan2/markers.fasta

  2. Add the marker sequence stored in a file new_marker.fasta to the marker set:
    #!bash
    cat new_marker.fasta >> metaphlan2/markers.fasta

  3. Rebuild the bowtie2 database:

Thanks for your support,

Juliana

Hi Juliana,
To add a new genome to the MetaPhlAn database, you need to annotate the reference genome and identify its marker genes. These are usually genes that are both core to the species and unique to it: no other species included in the database should share the same gene.
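To make the "core and unique" criterion concrete, here is a minimal sketch. It is not MetaPhlAn's actual marker-selection pipeline. It assumes a hypothetical tab-separated table with the columns gene_family, genome_id and species (for example, something you derive from your own annotation and gene clustering); the function name candidate_markers and the column layout are made up for illustration. It lists the gene families found in every genome of the target species and in no genome of any other species.

```shell
# Hypothetical input: TSV with columns gene_family, genome_id, species.
# Prints gene families that are present in every genome of the target
# species (core) and in no genome of any other species (unique).
candidate_markers() {
    target=$1
    table=$2
    awk -F '\t' -v target="$target" '
        $3 == target {
            # Count distinct genomes of the target species.
            if (!($2 in genomes)) { genomes[$2] = 1; n_genomes++ }
            # Count, per gene family, the distinct target genomes carrying it.
            if (!(($1, $2) in pair)) { pair[$1, $2] = 1; in_target[$1]++ }
            next
        }
        # Any family seen in another species cannot be a unique marker.
        { elsewhere[$1] = 1 }
        END {
            for (g in in_target)
                if (in_target[g] == n_genomes && !(g in elsewhere)) print g
        }
    ' "$table" | sort
}
```

Usage would look like `candidate_markers "my_species" gene_table.tsv`, where both the species label and the file name are placeholders. The sequences of the families it reports would then be the candidates to put into new_marker.fasta.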
For more details, I’ll refer you to this issue on the GitHub repository: https://github.com/biobakery/MetaPhlAn/issues/103
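Once the marker sequences are in new_marker.fasta, step 2 of the tutorial quoted above simply appends them to markers.fasta. As a precaution, a small sketch like the following could refuse to append when a header ID is already present in the marker set. The helper name append_markers is hypothetical and not part of MetaPhlAn; it only relies on awk and cat.

```shell
# Hypothetical helper: append new markers to the marker set only if none of
# their FASTA header IDs is already present there.
append_markers() {
    new_fasta=$1
    all_fasta=$2
    # First file (existing set): remember each header ID, i.e. the text
    # after '>' up to the first whitespace.
    # Second file (new markers): print every ID that was already seen.
    dup=$(awk '
        /^>/ {
            id = substr($1, 2)
            if (FILENAME == ARGV[1]) seen[id] = 1
            else if (id in seen) print id
        }
    ' "$all_fasta" "$new_fasta")
    if [ -n "$dup" ]; then
        echo "Duplicate marker IDs, nothing appended:" >&2
        echo "$dup" >&2
        return 1
    fi
    cat "$new_fasta" >> "$all_fasta"
}
```

With the file names from the tutorial, the call would be `append_markers new_marker.fasta metaphlan2/markers.fasta`.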

Thanks for the explanation. Another question: it seems that version 3.0 has ~110 eukaryotic reference genomes, but none of them belongs to Pichiaceae, a fungal family. Are there selection criteria that you use to choose the reference genomes for building the database? I am just wondering because this could bias the results of the taxonomic annotation.

Thanks,

The genomes included are the ones that have an annotated reference genome in the UniProt Proteomes portal. As of today I see that 10 genomes are available, but at the time the database was created none was present.