How to extract marker genes from MAGs?

almogangel · June 11, 2020, 1:45pm

Hi, thank you very much for this wonderful tool.

I would like to identify marker genes from clusters of genomes in order to create a customized MetaPhlAn database.

I have looked at Segata’s paper from 2012 and found the following key steps:

Identification of clade-specific core genes:

1. Identify non-redundant (NR) genes from each genome of the species
2. Cluster NR genes based on a 75% nucleotide sequence identity threshold
3. Realign clade-specific gene families against the raw genomes
4. Compute the posterior probability density function (using the beta distribution)

Screening of core genes for unique taxonomic marker genes:

5. Exclude core genes that are not uniquely present in a clade
6. Exclude multi-copy genes if possible (from step 1)
7. Define a “uniqueness index” for core genes
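One way to read the Beta-distribution step (step 4): if a gene family is detected in k of the n genomes of a clade, put a uniform Beta(1, 1) prior on its true prevalence p. The posterior is then Beta(k + 1, n - k + 1), and you can ask how likely it is that p exceeds some "core" threshold. The sketch below assumes that prior and a user-chosen threshold. It is not necessarily the exact formulation used in the 2012 paper, and the function name `prob_core` is hypothetical.

```python
from math import comb


def prob_core(k: int, n: int, threshold: float) -> float:
    """Posterior probability that a gene family's true prevalence exceeds
    `threshold`, given that it was detected in k of n genomes.

    Assumption: uniform Beta(1, 1) prior, so the posterior is
    Beta(k + 1, n - k + 1). For integer Beta parameters the upper tail
    equals a binomial sum:
        P(p > t) = sum_{j=0}^{k} C(n+1, j) * t^j * (1 - t)^(n+1-j)
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    m = n + 1
    return sum(
        comb(m, j) * threshold**j * (1 - threshold) ** (m - j)
        for j in range(k + 1)
    )


# Hypothetical example: a gene family found in 9 of 10 genomes,
# tested against an illustrative 90% prevalence threshold.
print(prob_core(9, 10, 0.9))
```

A gene family could then be kept as core if this probability exceeds a chosen cutoff; both the threshold and the cutoff are illustrative, not values taken from the paper. The binomial identity avoids a SciPy dependency; `scipy.stats.beta.sf(threshold, k + 1, n - k + 1)` should give the same number.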

My questions are:
A) Is there a script you could share to help me run these steps?
B) If not, could you explain the 4th and 7th steps?

Thanks,
Almog.

fbeghini · June 12, 2020, 12:11pm

Hi Almog,
unfortunately, I don’t have any script for extracting markers from MAGs. We have an ad hoc pipeline for generating the MetaPhlAn database, but it can’t easily be adapted to handle MAGs because the data source is different. The pyphlan repo (https://github.com/SegataLab/pyphlan) contains some scripts whose names begin with choco_, but I don’t have any guidance for running or using them.

In the current version of the pipeline, we don’t take step 4 into account. For a species A, the “uniqueness index” is calculated as the number of species other than A in which the marker is also present. It can easily be calculated by mapping all the identified markers to the full set of reference genomes.
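The uniqueness index described above could be computed roughly as follows, assuming the mapping of markers against the reference genomes has already been reduced to (marker, genome) hit pairs and that the species of each genome is known. The input format and the function name `uniqueness_index` are hypothetical; the alignment itself and any identity/coverage filtering of hits are left out.

```python
from collections import defaultdict


def uniqueness_index(hits, genome_to_species, marker_to_species):
    """Count, for each marker, the species other than its own in which it is found.

    hits: iterable of (marker_id, genome_id) pairs, e.g. parsed from an
          alignment of all markers against all reference genomes (hypothetical format).
    genome_to_species: dict mapping genome_id -> species name.
    marker_to_species: dict mapping marker_id -> species the marker was selected for.
    """
    # Collect the set of species in which each marker has at least one hit.
    found_in = defaultdict(set)
    for marker, genome in hits:
        found_in[marker].add(genome_to_species[genome])

    # Uniqueness index = number of species besides the marker's own species.
    return {
        marker: len(found_in[marker] - {species})
        for marker, species in marker_to_species.items()
    }


# Toy example: m1 (species A) also hits a genome of species B.
hits = [("m1", "g1"), ("m1", "g2"), ("m1", "g3"), ("m2", "g3")]
genome_to_species = {"g1": "A", "g2": "A", "g3": "B"}
marker_to_species = {"m1": "A", "m2": "B"}
print(uniqueness_index(hits, genome_to_species, marker_to_species))  # {'m1': 1, 'm2': 0}
```

A marker with an index of 0 is found only in its own species; higher values flag markers that would also be detected in other species.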

nick-youngblut · July 2, 2020, 1:15pm

Can someone be contacted to explain how to use these scripts? The MetaPhlAn 3.0 page of the biobakery/MetaPhlAn wiki on GitHub explains how to add markers, but not how to add them in a way that matches the default database (e.g., how to cluster them).
