List of taxa in the MetaPhlAn 4 database

The docs at the MetaPhlAn 4 wiki page (biobakery/MetaPhlAn on GitHub) don’t seem to link to a list of all taxa in the MetaPhlAn 4 database. Such a list would be helpful when a researcher wants to know whether MetaPhlAn 4 includes their target microbe(s) of interest.

Hi @nick-youngblut
Thanks for noticing it. We have added the info to the wiki, and you can find the file here: http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt.bz2

Thanks @aitor.blancomiguez for the link! It’s a bummer that the species I’m most interested in is not in that database, but it is a newly characterized species.

Hi @nick-youngblut
Because of how SGBs are defined, the species you are interested in might be part of an SGB in the database without being its main species (i.e. the species with the most reference genomes in the SGB). I will create a file with all the taxonomic species contained in each SGB and upload it to the documentation too. In the meantime, if you want to check, parsing the MetaPhlAn database (pickle file) with Python and inspecting the merged_taxon dictionary will give you all the additional species.
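For anyone who wants to try the pickle route described above, here is a minimal sketch. It assumes the database file is a bz2-compressed pickle holding a dictionary with a merged_taxon entry; the function names and the example file name are illustrative only, and the exact layout of the keys and values may differ between database versions, so the search simply compares against their string form. Only unpickle files from a source you trust.

```python
import bz2
import pickle


def load_metaphlan_db(path):
    """Load the database pickle (assumed here to be bz2-compressed)."""
    with bz2.open(path, "rb") as fh:
        return pickle.load(fh)


def find_in_merged_taxon(db, name):
    """Return the merged_taxon entries whose key or value mentions `name`.

    The structure of keys and values is not assumed; both are searched
    through their string representation.
    """
    merged = db.get("merged_taxon", {})
    return {k: v for k, v in merged.items() if name in str(k) or name in str(v)}


# Example usage (the file name is an assumption; use your local database file):
# db = load_metaphlan_db("mpa_vJan21_CHOCOPhlAnSGB_202103.pkl")
# for key, value in find_in_merged_taxon(db, "s__Methanobrevibacter_smithii").items():
#     print(key, value)
```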

Hi @nick-youngblut
The full list of the species included in each SGB is already available in the documentation. Here is the file: http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.bz2

Thanks for the file! I care about methanogens in the gut. It’s odd that the most abundant methanogen in the human gut (M. smithii) isn’t in your species file. The Methanobrevibacter species present:

s__Methanobrevibacter_thaueri
s__Methanobrevibacter_sp_NOE
s__Methanobrevibacter_wolinii
s__Methanobrevibacter_millerae
s__Methanobrevibacter_cuticularis
s__Methanobrevibacter_arboriphilus
s__Methanobrevibacter_filiformis
s__Methanobrevibacter_curvatus
s__Methanobrevibacter_olleyae
s__Methanobrevibacter_ruminantium
s__Methanobrevibacter_sp_AbM4
s__Methanobrevibacter_sp_87_7
s__Methanobrevibacter_sp_A54
s__Methanobrevibacter_woesei
s__Methanobrevibacter_sp_A27
s__Methanobrevibacter_sp_YE315
s__Methanobrevibacter_millerae
s__Methanobrevibacter_oralis

Also, my target microbe isn’t there (sad)

M. smithii is present there


bzgrep s__Methanobrevibacter_smithii mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.bz2
SGB714_group k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_smithii,k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_sp_A54
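Judging from the line above, each record appears to be an SGB identifier followed by a comma-separated list of full taxonomy strings, one per species in that SGB, so a species can be listed without being the first one. A rough Python sketch for looking up a species anywhere in a line, assuming that layout (the function names are made up for illustration, and splitting on whitespace is an assumption that covers either a tab or a space separator):

```python
import bz2


def parse_species_line(line):
    """Split one record into (sgb_id, [species-level names]).

    Returns None for blank or malformed lines.
    """
    parts = line.strip().split(None, 1)  # SGB id, then the taxonomy list
    if len(parts) < 2:
        return None
    sgb_id, taxa = parts
    # Each taxonomy string is k__...|p__...|...|s__...; keep the last level.
    species = [t.split("|")[-1] for t in taxa.split(",")]
    return sgb_id, species


def sgbs_containing(path, species):
    """Return the SGB ids whose species list includes `species`."""
    hits = []
    with bz2.open(path, "rt") as fh:
        for line in fh:
            parsed = parse_species_line(line)
            if parsed is not None and species in parsed[1]:
                hits.append(parsed[0])
    return hits


# Example usage:
# sgbs_containing("mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.bz2",
#                 "s__Methanobrevibacter_smithii")
```

Unlike a plain substring search, this matches the full species name, so a query for one species will not also match longer names that start with the same text.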

Hi @nick-youngblut
M. smithii is actually in the file:
SGB714_group k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_smithii,k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_sp_A54
I’m really sorry your target species is not currently part of the database. For MetaPhlAn 4, we plan to release a new version of the database roughly every 6 months, so if your species currently has a reference genome in NCBI, it will probably be available in the next version.

My bad, I didn’t understand the format of the file