Building a MetaPhlAn database from fasta sequences

ecalfapietra · March 9, 2023, 3:52pm

Hello!
I would like to compare the Kraken2, SRA STAT and MetaPhlAn 4.0 tools, and to do this I would like to use the same database. I built my Kraken2 and SRA STAT databases from fasta files (from RefSeq, obtained after the Kraken2 database was created).
I did see the docs on GitHub and discussions like "Customizing Chochophlan panproteome and Metaphlan marker gene databases with new taxa", "How to create marker sequences from a genome to add to metaphlan database?" and "[BUG] database installation error · Issue #103 · biobakery/MetaPhlAn · GitHub".
Is there a pipeline or script to create your own MetaPhlAn 4.0 database from fasta sequences? Or is there something like it?
Thanks in advance!

aitor.blancomiguez · March 13, 2023, 8:46am

Hi @ecalfapietra
Currently, there is no script to generate a new MetaPhlAn 4 database from scratch.
But if you are interested in doing it, the procedure goes as follows:

1. Classify your genomes into species-level genome bins (SGBs) by clustering them at 95% genome identity.
2. For each SGB, annotate the FASTA sequences and define a set of core gene families (clustering the CDS at 90% identity).
3. Map all the core gene families against the initial set of genomes to define a set of unique, SGB-specific marker genes.
For a deeper explanation, you can have a look at the Methods section of the MetaPhlAn 4 paper: Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4 | Nature Biotechnology
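
To make the logic of the three steps concrete, here is a toy sketch in Python. It is not the MetaPhlAn pipeline: it assumes you have already computed pairwise genome identities and assigned each genome's CDS to gene families with external tools, and it uses simple single-linkage clustering and a coreness cutoff that are my own illustrative choices. All function names, variable names and the example data are hypothetical.

```python
from collections import defaultdict


def cluster_genomes(genomes, identity, threshold=0.95):
    """Step 1 (toy): single-linkage clustering with union-find.
    Genomes linked by a pairwise identity >= threshold share a bin."""
    parent = {g: g for g in genomes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for (a, b), ident in identity.items():
        if ident >= threshold:
            parent[find(a)] = find(b)

    bins = defaultdict(set)
    for g in genomes:
        bins[find(g)].add(g)
    return sorted(sorted(members) for members in bins.values())


def core_families(sgb, families_by_genome, coreness=1.0):
    """Step 2 (toy): families present in at least `coreness` (a fraction)
    of the genomes in the bin. The cutoff is an assumption."""
    counts = defaultdict(int)
    for g in sgb:
        for fam in families_by_genome[g]:
            counts[fam] += 1
    needed = coreness * len(sgb)
    return {fam for fam, n in counts.items() if n >= needed}


def marker_families(sgbs, families_by_genome, coreness=1.0):
    """Step 3 (toy): core families of a bin that occur in no genome
    outside that bin."""
    markers = {}
    for i, sgb in enumerate(sgbs):
        outside = set()
        for j, other in enumerate(sgbs):
            if j != i:
                for g in other:
                    outside |= families_by_genome[g]
        core = core_families(sgb, families_by_genome, coreness)
        markers[tuple(sgb)] = core - outside
    return markers


# Made-up example data: three genomes, precomputed identities and families.
genomes = ["g1", "g2", "g3"]
identity = {("g1", "g2"): 0.97, ("g1", "g3"): 0.80, ("g2", "g3"): 0.82}
families_by_genome = {
    "g1": {"famA", "famB", "famC"},
    "g2": {"famA", "famB"},
    "g3": {"famB", "famD"},
}

sgbs = cluster_genomes(genomes, identity)
markers = marker_families(sgbs, families_by_genome)
for sgb, fams in markers.items():
    print(sgb, sorted(fams))
```

In a real build, the identities and gene families would come from dedicated genome-comparison, annotation and sequence-clustering tools, and step 3 would be done by mapping sequences rather than comparing family labels; the paper's Methods section describes the actual criteria.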

Ali_Rahnavard · June 28, 2024, 6:37pm

Is there a script to use MetaPhlAn and regenerate marker genes? Alternatively, is there a place to access the raw DNA data of marker genes?

Thanks!
Ali
