List of taxonomy IDs in the MetaPhlAn database

Is there a file that lists the taxonomy IDs included in the MetaPhlAn database? More specifically, I need the species-level taxonomy IDs that are present in the database. The file [mpa_vJun23_CHOCOPhlAnSGB_202307_species.txt.bz2] contains only scientific names. Is there an easier way than extracting all species-level names from this file and then looking up the corresponding taxonomy IDs (where one exists at all) with the NCBI API?

Hi @dudamate
Yes, it is possible. You can use Python's pickle module to read the MetaPhlAn database (mpa*.pkl), which is bz2-compressed. The database is a dictionary, and under the 'taxonomy' key you can find the information for each SGB present in the database.
E.g.:

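import bz2
import pickle

# The .pkl file is bz2-compressed, so open it through bz2 in binary mode.
with bz2.open('metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.pkl', 'rb') as f:
    db = pickle.load(f)

# Each key is the full lineage of one SGB; element 0 of its value is what we print alongside it.
for taxa, info in db['taxonomy'].items():
    print(taxa, info[0])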
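
Since the question asks for species-level IDs specifically, here is a rough sketch of how the loop above could be narrowed down. It assumes that each key is a '|'-separated lineage with rank prefixes (k__, p__, ..., s__, t__) and that element 0 of each value is a '|'-separated string of taxonomy IDs aligned with those ranks, left empty where no ID exists. Please check this against your own database version. The helper name species_taxids and the toy entries (including the SGB labels and genome lengths) are made up for illustration only.

```python
def species_taxids(taxonomy):
    """Map each species name to its species-level taxonomy ID (None if missing).

    Assumption: keys look like 'k__...|p__...|...|s__...|t__...' and the first
    element of each value is a '|'-separated taxid string aligned rank by rank.
    """
    result = {}
    for lineage, info in taxonomy.items():
        # Some versions may store the taxid string directly instead of a tuple.
        taxid_path = info[0] if isinstance(info, (tuple, list)) else info
        names = lineage.split('|')
        taxids = taxid_path.split('|')
        for name, taxid in zip(names, taxids):
            if name.startswith('s__'):
                species = name[3:]
                # Several SGBs can share a species; keep an ID once one is seen.
                result[species] = taxid or result.get(species)
    return result


# Toy stand-in for db['taxonomy']; SGB labels and genome lengths are invented.
example = {
    'k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales'
    '|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli|t__SGB10068':
        ('2|1224|1236|91347|543|561|562|', 5000000),
    'k__Bacteria|p__Proteobacteria|c__CFGB1|o__OFGB1|f__FGB1|g__GGB1234'
    '|s__GGB1234_SGB5678|t__SGB5678':
        ('2|1224||||||', 3000000),
}

for species, taxid in species_taxids(example).items():
    print(species, taxid)
```

With the real database you would call species_taxids(db['taxonomy']) instead of passing the toy dictionary. Species that come back as None have no taxonomy ID in the taxid string, which is likely for SGBs without a named species.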
