Metaphlan database problem

ZhaoHuiyao · January 3, 2025, 3:50am

Hello

According to the metaphlan.py file, a MetaPhlAn run requires two important database files: a .pkl file and a Bowtie2 index.

I downloaded the file “mpa_vOct22_CHOCOPhlAnSGB_202403.tar” (http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202403.tar) and decompressed it. It includes a .pkl file, _SGB.fna.bz2, _VINFO.csv, and _VSG.fna.bz2.

The _SGB.fna.bz2 file contains 7,339,971 sequences.

However, the information in the .pkl file is inconsistent with the _SGB.fna.bz2 file.

import bz2
import pickle

# Load the bz2-compressed pickle and close the file handle afterwards
with bz2.open('mpa_vOct22_CHOCOPhlAnSGB_202403.pkl', 'rb') as f:
    db = pickle.load(f)
print(db.keys())
# dict_keys(['taxonomy', 'markers', 'merged_taxon'])

# Number of entries in the taxonomy
print(len(db['taxonomy']))  # 30216 species

# Number of marker genes
print(len(db['markers']))  # 5751328 marker genes
# whereas the _SGB.fna.bz2 file contains 7,339,971 sequences

Could you please check this tar file?

Claudia_Mengoni · January 17, 2025, 1:52pm

Hi @ZhaoHuiyao

I downloaded the tar file to check, and the mpa_vOct22_CHOCOPhlAnSGB_202403.fna.bz2 file has 5,843,065 sequences. This corresponds to the 5,751,328 markers present in the pkl file plus the viral sequences, which are not in the pickle (and shouldn’t be). Could you please repeat your download and let me know if the problem persists?
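Going by the numbers quoted in this thread, the FASTA would hold 5,843,065 − 5,751,328 = 91,737 sequences that are not in the pickle. A minimal sketch for checking this on a local copy is below. It assumes that the first whitespace-delimited token of each FASTA header is the same string used as a key in db['markers'], which this thread does not confirm. The helper names fasta_ids and compare_with_markers are made up for illustration and are not part of MetaPhlAn.

```python
import bz2


def fasta_ids(path):
    """Yield the ID of every record in a bz2-compressed FASTA file.

    The ID is taken to be the first whitespace-delimited token after '>'.
    """
    with bz2.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                yield parts[0] if parts else ""


def compare_with_markers(fasta_path, marker_names):
    """Compare unique FASTA record IDs against marker names from the pickle.

    Assumption: FASTA IDs and marker names use the same naming scheme.
    """
    ids = set(fasta_ids(fasta_path))
    markers = set(marker_names)
    return {
        "fasta": len(ids),                    # unique IDs in the FASTA
        "markers": len(markers),              # unique marker names
        "fasta_only": len(ids - markers),     # e.g. sequences absent from the pickle
        "markers_only": len(markers - ids),   # markers with no FASTA record
    }
```

With the pickle loaded as db, calling compare_with_markers('mpa_vOct22_CHOCOPhlAnSGB_202403.fna.bz2', db['markers']) would return the four counts. A non-zero "markers_only" value would point to a truncated or mismatched download rather than to the expected extra viral sequences.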

ZhaoHuiyao · January 23, 2025, 9:25am
