List of taxa in the MetaPhlAn 4 database

The MetaPhlAn 4 docs (biobakery/MetaPhlAn Wiki on GitHub) don’t seem to link to a list of all taxa in the MetaPhlAn 4 database. Such a list would be helpful when a researcher wants to know whether MetaPhlAn 4 includes their target microbe(s) of interest.

Hi @nick-youngblut
Thanks for pointing this out. We have added the info to the wiki, and you can find the file here: http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt.bz2

Thanks @aitor.blancomiguez for the link! It’s a bummer that the species I’m most interested in is not in that database, but it is a newly characterized species.

Hi @nick-youngblut
Because of how SGBs are defined, the species you are interested in might be part of an SGB in the database without being its main species (i.e. the species with the most reference genomes in the SGB). I will create a file with all the taxonomic species contained in each SGB and upload it to the documentation too. In the meantime, if you want to check, parsing the MetaPhlAn database (pickle file) with Python and inspecting the merged_taxon dictionary will give you all the additional species.
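
A minimal sketch of that check, assuming the database is the bz2-compressed pickle shipped with MetaPhlAn and that it exposes a merged_taxon key. The exact layout of the merged_taxon entries is an assumption here, so the search simply matches on the text representation of each key and value. The function names and the file name in the usage comment are illustrative, not part of MetaPhlAn.

```python
import bz2
import pickle


def load_mpa_db(pkl_path):
    """Load a bz2-compressed database pickle and return its top-level object."""
    with bz2.open(pkl_path, "rb") as handle:
        return pickle.load(handle)


def find_in_merged_taxon(db, species_label):
    """Return (key, value) pairs of db['merged_taxon'] that mention species_label.

    The entry layout is assumed, not documented here, so repr() is used to
    search keys and values regardless of whether they are strings, tuples or lists.
    """
    hits = []
    for key, value in db.get("merged_taxon", {}).items():
        if species_label in repr(key) or species_label in repr(value):
            hits.append((key, value))
    return hits


# Usage (file name is an assumption; point it at your local database pickle):
# db = load_mpa_db("mpa_vJan21_CHOCOPhlAnSGB_202103.pkl")
# for key, value in find_in_merged_taxon(db, "s__Methanobrevibacter_smithii"):
#     print(key, value)
```

An empty result only means the label was not found in merged_taxon; the main species of each SGB may be stored elsewhere in the pickle.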

Hi @nick-youngblut
The full list of the species included in each SGB is now available in the documentation. Here is the file: http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.bz2

Thanks for the file! I care about methanogens in the gut. It’s odd that the most abundant methanogen in the human gut (M. smithii) isn’t in your species file. The Methanobrevibacter species present:

s__Methanobrevibacter_thaueri
s__Methanobrevibacter_sp_NOE
s__Methanobrevibacter_wolinii
s__Methanobrevibacter_millerae
s__Methanobrevibacter_cuticularis
s__Methanobrevibacter_arboriphilus
s__Methanobrevibacter_filiformis
s__Methanobrevibacter_curvatus
s__Methanobrevibacter_olleyae
s__Methanobrevibacter_ruminantium
s__Methanobrevibacter_sp_AbM4
s__Methanobrevibacter_sp_87_7
s__Methanobrevibacter_sp_A54
s__Methanobrevibacter_woesei
s__Methanobrevibacter_sp_A27
s__Methanobrevibacter_sp_YE315
s__Methanobrevibacter_millerae
s__Methanobrevibacter_oralis

Also, my target microbe isn’t there (sad)

M. smithii is present there


bzgrep s__Methanobrevibacter_smithii mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.bz2
SGB714_group k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_smithii,k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_sp_A54
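
For anyone else tripped up by the format: each line appears to hold an SGB ID, then a comma-separated list of full taxonomy strings, one per species in that SGB. Below is a rough sketch of turning the file into a species-to-SGB lookup. It assumes the two fields are separated by whitespace (probably a tab) and that the species label is the last "|"-delimited element of each taxonomy string; the function names are made up for illustration.

```python
import bz2


def parse_species_line(line):
    """Split one line into (sgb_id, [species labels]); return None if malformed."""
    parts = line.strip().split(None, 1)  # SGB ID, then the taxonomy list
    if len(parts) != 2:
        return None
    sgb_id, taxa = parts
    # Each taxonomy is k__...|p__...|...|s__Species; keep only the last level.
    species = [taxon.split("|")[-1] for taxon in taxa.split(",")]
    return sgb_id, species


def species_to_sgbs(path):
    """Build a dict mapping each species label to the SGB IDs that contain it."""
    mapping = {}
    with bz2.open(path, "rt") as handle:
        for line in handle:
            parsed = parse_species_line(line)
            if parsed is None:
                continue
            sgb_id, species = parsed
            for name in species:
                mapping.setdefault(name, []).append(sgb_id)
    return mapping


example = (
    "SGB714_group\t"
    "k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|"
    "f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_smithii,"
    "k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|"
    "f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_sp_A54"
)
print(parse_species_line(example))
# ('SGB714_group', ['s__Methanobrevibacter_smithii', 's__Methanobrevibacter_sp_A54'])
```

Calling species_to_sgbs on the downloaded species file and looking up "s__Methanobrevibacter_smithii" should then return the SGB(s) containing it, whether or not it is listed first on its line.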

Hi @nick-youngblut
M. smithii is actually in the file:
SGB714_group k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_smithii,k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_sp_A54
I’m really sorry your target species is not currently part of the database. For MetaPhlAn 4, we plan to release a new version of the database roughly every 6 months, so if your species currently has a reference genome in NCBI, it will probably be available in the next version.

My bad, I didn’t understand the format of the file

How about the genus Howardella? It was defined in 2007 but appears to be missing from both lists. In 16S sequencing data from the same ecosystem, Howardella is detected in one-quarter of the samples, but it is never detected in the MetaPhlAn metagenomics results. Why would Howardella ureilytica not be in the database yet?

Hi @Dario
The MetaPhlAn 4 database only includes reference genomes and taxonomies from the NCBI database. Currently, NCBI does not contain any Howardella assembly (the NCBI Assembly search returns “No items found”).

Good day!

Is there any way to identify the species name for a given SGB (for example, SGB35831)?

Hi @sasakitomiyano
The assignment between species and SGBs is available in the following files for the releases:

Thank you for this! However, I can’t seem to find the scientific name for the species (e.g., Bacillus anthracis). Or, does it not provide this information (only SGBs will be provided)?

Hi @sasakitomiyano
For B. anthracis, you should look for the string: s__Bacillus_anthracis
E.g., in the Oct22 database it belongs to SGB7703_group.

I see. However, how should I handle the assignment between SGB and species if no species is listed for the SGB (for example, SGB35831)?

Thank you!

SGB35831 is an uncharacterized species defined purely by MAGs (metagenome-assembled genomes). It is more than 30% nucleotide distance away from any reference genome we have in the database, so we can only assign its taxonomy down to the phylum level (Bacteroidetes). For lower taxonomic levels, we can only assign an ID derived from clustering on genomic distances.