Building a MetaPhlAn database from fasta sequences

aitor.blancomiguez · March 13, 2023, 8:46am

Hi @ecalfapietra
Currently, there is no script to generate a new MetaPhlAn 4 database from scratch.
But if you are interested in doing it, the procedure goes as follows:

1. Classify your genomes into species-level genome bins (SGBs) by clustering them at 95% genome identity.
2. For each SGB, annotate the FASTA sequences and define a set of core gene families by clustering the coding sequences (CDS) at 90% identity.
3. Map all the core gene families against the initial set of genomes to define a unique, SGB-specific set of marker genes.
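The three steps above can be sketched in Python to show the logic of each stage. This is a minimal illustration, not the actual MetaPhlAn 4 pipeline: it assumes that pairwise genome identities (ANI), per-genome gene-family assignments (already clustered at 90% identity), and the genomes hit when mapping each family back against all genomes were computed beforehand with external tools. The function names, the single-linkage grouping, and the `core_fraction=0.8` cutoff for calling a family "core" are hypothetical choices made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def cluster_sgbs(genomes, ani, threshold=95.0):
    """Step 1: group genomes into SGBs.

    Assumption: single-linkage over pairs with ANI >= threshold, using
    union-find. The real pipeline may use a different linkage/distance.
    """
    parent = {g: g for g in genomes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        if ani.get((a, b), ani.get((b, a), 0.0)) >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for g in genomes:
        clusters[find(g)].add(g)
    # Stable labels: order clusters by their smallest member name
    ordered = sorted(clusters.values(), key=min)
    return {f"SGB{i}": members for i, members in enumerate(ordered, 1)}


def core_families(members, families_by_genome, core_fraction=0.8):
    """Step 2: families present in at least core_fraction of the SGB's genomes.

    core_fraction is a hypothetical parameter, not a documented value.
    """
    counts = defaultdict(int)
    for g in members:
        for fam in families_by_genome[g]:
            counts[fam] += 1
    needed = core_fraction * len(members)
    return {fam for fam, n in counts.items() if n >= needed}


def select_markers(sgbs, families_by_genome, hits, core_fraction=0.8):
    """Step 3: keep core families whose mapping hits fall only inside their own SGB.

    hits maps a family ID to the set of genomes its sequence aligns to
    (assumed precomputed by an external mapper).
    """
    markers = {}
    for sgb, members in sgbs.items():
        core = core_families(members, families_by_genome, core_fraction)
        markers[sgb] = {fam for fam in core if fam in hits and hits[fam] <= members}
    return markers


if __name__ == "__main__":
    genomes = ["g1", "g2", "g3"]
    ani = {("g1", "g2"): 98.7, ("g1", "g3"): 80.1, ("g2", "g3"): 79.5}
    families_by_genome = {
        "g1": {"famA", "famB"},
        "g2": {"famA", "famB", "famC"},
        "g3": {"famD"},
    }
    # famB also aligns to g3 (another SGB), so it is not SGB-specific
    hits = {"famA": {"g1", "g2"}, "famB": {"g1", "g2", "g3"}, "famD": {"g3"}}
    sgbs = cluster_sgbs(genomes, ani)
    print(select_markers(sgbs, families_by_genome, hits))
    # {'SGB1': {'famA'}, 'SGB2': {'famD'}}
```

In this toy example `famC` is dropped at step 2 because it occurs in only one of the two genomes of SGB1, and `famB` is dropped at step 3 because it also maps to a genome outside SGB1; only families that are both core and SGB-specific survive as markers.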
For a deeper explanation, you can have a look at the Methods section of the MetaPhlAn 4 paper: Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4 (Nature Biotechnology).
