MetaPhlAn 4.1 SGB to NCBI taxonomy ID mapping

Using HUMAnN v3.9 and MetaPhlAn v4.1

I am trying to obtain the complete NCBI taxonomy IDs corresponding to all the SGBs in the MetaPhlAn 4.1 database (I know some will not have corresponding IDs and some will be duplicates). Is there a file with this mapping on GitHub, or is one included with the downloaded software?

Hi @sarahbald,

When you download the MetaPhlAn database, you get a bz2-compressed pickle file (e.g. mpa_vJun23_CHOCOPhlAnSGB_202307.pkl) containing a dictionary that also includes this information. You can read it in Python like this:

import bz2
import pickle

path_db = './mpa_vJun23_CHOCOPhlAnSGB_202307.pkl'
with bz2.open(path_db, 'rb') as fh:
    db = pickle.load(fh)
for taxa in db['taxonomy']:  # taxa is the full taxonomy for each SGB…
    ncbi_tax_id = db['taxonomy'][taxa][0]  # '|'-separated NCBI taxid lineage
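To get an actual mapping file out of this, the loop can be extended into a small script. The sketch below is an illustration, not an official MetaPhlAn utility. It assumes each value in db['taxonomy'] is a (taxid lineage, genome length) tuple whose lineage is '|'-separated, with empty fields for ranks that lack an NCBI ID, and that the last rank of each key is the t__SGB label (worth checking against a few entries in your copy of the database). The helper names sgb_to_ncbi, shared_taxids and write_tsv, and the output file name sgb_to_ncbi.tsv, are hypothetical.

```python
import bz2
import csv
import pickle
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def sgb_to_ncbi(db: dict) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each SGB label to (full NCBI taxid lineage, deepest available taxid).

    Assumption: db['taxonomy'] maps '|'-separated clade strings ending in
    't__SGB...' to a (taxid_lineage, genome_length) tuple. Entries stored
    as a bare int (length only) are treated as having no NCBI ID.
    """
    mapping: Dict[str, Tuple[str, Optional[str]]] = {}
    for clade, value in db['taxonomy'].items():
        sgb = clade.split('|')[-1]  # e.g. 't__SGB4837'
        if sgb.startswith('t__'):
            sgb = sgb[len('t__'):]
        if isinstance(value, tuple):
            lineage = value[0]
            # Keep only ranks that actually carry an NCBI ID.
            ids = [t for t in lineage.split('|') if t]
            leaf = ids[-1] if ids else None
        else:
            lineage, leaf = '', None
        mapping[sgb] = (lineage, leaf)
    return mapping


def shared_taxids(mapping: Dict[str, Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Return taxids that are the deepest ID for more than one SGB."""
    by_taxid: Dict[str, List[str]] = defaultdict(list)
    for sgb, (_, leaf) in mapping.items():
        if leaf is not None:
            by_taxid[leaf].append(sgb)
    return {t: sorted(sgbs) for t, sgbs in by_taxid.items() if len(sgbs) > 1}


def write_tsv(mapping: Dict[str, Tuple[str, Optional[str]]], out_path: str) -> None:
    """Write one row per SGB: label, full lineage, deepest taxid (blank if none)."""
    with open(out_path, 'w', newline='') as fh:
        writer = csv.writer(fh, delimiter='\t')
        writer.writerow(['SGB', 'ncbi_taxid_lineage', 'ncbi_taxid'])
        for sgb, (lineage, leaf) in sorted(mapping.items()):
            writer.writerow([sgb, lineage, leaf or ''])


# Example usage (path as in the snippet above):
#   with bz2.open('./mpa_vJun23_CHOCOPhlAnSGB_202307.pkl', 'rb') as fh:
#       db = pickle.load(fh)
#   mapping = sgb_to_ncbi(db)
#   write_tsv(mapping, 'sgb_to_ncbi.tsv')
#   print(len(shared_taxids(mapping)), 'taxids shared by several SGBs')
```

Note that for SGBs without a species-level NCBI ID, the ncbi_taxid column falls back to the deepest rank that has one (e.g. the genus), so if you only want species-level IDs, filter on the lineage column instead.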