Unable to use MetaPhlAn-4 database

Dear MetaPhlAn developers,
I am having problems downloading and using the MetaPhlAn 4 database.
I installed MetaPhlAn using conda, then tried to install the database using:

metaphlan --install --bowtie2db [my folder]

I then got an error:

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_latest
Downloading file of size: 0.00 MB
0.01 MB 25600.00 %  43.29 MB/sec  0 min -0 sec
Downloading MetaPhlAn database
Please note due to the size this might take a few minutes

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.tar
Downloading file of size: 2623.07 MB
2623.05 MB 100.00 %  261.05 MB/sec  0 min  0 sec
Warning: Unable to download http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.tar

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.md5
Downloading file of size: 0.00 MB
MD5 checksums do not correspond! If this happens again, you should remove the database files and rerun MetaPhlAn so they are re-downloaded
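
The checksum comparison reported in the log above can also be done by hand on manually downloaded files. The following is a minimal sketch, assuming GNU coreutils `md5sum` is available and that the .md5 file begins with the hex digest; `verify_md5` is a hypothetical helper name, not part of MetaPhlAn.

```shell
#!/usr/bin/env bash
# Compare a downloaded tarball against its published .md5 file.
# Assumption: the first whitespace-separated field of the .md5 file is the digest.
verify_md5() {
    local tarball="$1" md5file="$2"
    local expected actual
    expected=$(awk 'NR==1 {print $1}' "$md5file")
    actual=$(md5sum "$tarball" | awk '{print $1}')
    # An empty or truncated .md5 file counts as a mismatch.
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK: $tarball"
    else
        echo "MISMATCH: $tarball (expected '$expected', got '$actual')" >&2
        return 1
    fi
}

# Example usage (run inside the database folder):
# verify_md5 mpa_vJan21_CHOCOPhlAnSGB_202103.tar mpa_vJan21_CHOCOPhlAnSGB_202103.md5
```

A mismatch here would suggest an incomplete or corrupted download rather than a problem with the MetaPhlAn installation itself.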

After several failed attempts, I downloaded the archive manually from
Index of /biobakery4/metaphlan_databases and extracted it, which gave four files:

mpa_vJan21_CHOCOPhlAnSGB_202103.pkl
mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna.bz2
mpa_vJan21_CHOCOPhlAnSGB_202103_VINFO.csv
mpa_vJan21_CHOCOPhlAnSGB_202103_VSG.fna.bz2

I built a Bowtie2 index from the larger one (SGB).
The database folder now contains:

./mpa_vJan21_CHOCOPhlAnSGB_202103.md5
./mpa_vJan21_CHOCOPhlAnSGB_202103.pkl
./mpa_vJan21_CHOCOPhlAnSGB_202103.tar
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.1.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.2.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.3.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.4.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.rev.1.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.rev.2.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_VINFO.csv
./mpa_vJan21_CHOCOPhlAnSGB_202103_VSG.fna
./mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt
./mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt

I then ran this command:

metaphlan example.fq.gz --input_type fastq -o example.txt --index mpa_vJan21_CHOCOPhlAnSGB_202103_SGB --bowtie2db ~/metaphlan4_db/

I got this error:

Error: Unable to find the mpa_pkl file at: mpa_pklExiting...

Maybe the problem is that the name of the pkl file does not match the index name I used (mine has the suffix _SGB), but when I remove the suffix, the database is not recognized at all.
Also, why couldn't I install the database in the first place?
I would appreciate your help.

Thanks in advance!

Hi @Afromm
Exactly: the pkl file and the bt2 index files must share the same base name, so the index files should not carry the extra _SGB suffix.
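
One way to bring the names in line, sketched here under the assumption that the index files were built with the `_SGB` suffix as in the listing above, is to rename them to the basename of the pkl file. `rename_sgb_index` is a hypothetical helper for illustration, not a MetaPhlAn command.

```shell
#!/usr/bin/env bash
# Rename <index>_SGB.*.bt2l to <index>.*.bt2l so the Bowtie2 index
# shares its basename with <index>.pkl. Other files are left untouched.
rename_sgb_index() {
    local db_dir="$1" index="$2"
    local f base
    for f in "$db_dir/${index}_SGB."*.bt2l; do
        [ -e "$f" ] || continue   # the glob matched nothing
        base=$(basename "$f")
        # Drop the "<index>_SGB" prefix and re-attach "<index>" alone.
        mv -- "$f" "$db_dir/${index}${base#"${index}_SGB"}"
    done
}

# Example usage with the names from this thread:
# rename_sgb_index ~/metaphlan4_db mpa_vJan21_CHOCOPhlAnSGB_202103
# metaphlan example.fq.gz --input_type fastq -o example.txt \
#     --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db ~/metaphlan4_db/
```

After renaming, `--index` would be given without the `_SGB` suffix, so that it matches both the pkl file and the index files.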

Hi @Afromm,
I have the same problem. Did you find a solution in the meantime?

Best,
Ilaria

Hi @ilapt
Both the pkl file and the bt2 index files should have the same name.

Hello Aitor, I have downloaded the MetaPhlAn 4 database, which contains the following files:

mpa_vOct22_CHOCOPhlAnSGB_202212.1.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.2.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.3.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.4.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212_marker_info.txt
mpa_vOct22_CHOCOPhlAnSGB_202212.pkl
mpa_vOct22_CHOCOPhlAnSGB_202212.rev.1.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.rev.2.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212_SGB.fna.bz2
mpa_vOct22_CHOCOPhlAnSGB_202212_species.txt
mpa_vOct22_CHOCOPhlAnSGB_202212_VINFO.csv
mpa_vOct22_CHOCOPhlAnSGB_202212_VSG.fna.bz2

I then ran HUMAnN 3 with "humann --input C1.fq --output ./ --threads 10" and received this MetaPhlAn error:

Error message returned from metaphlan :
Error: Unable to find the mpa_pkl file at: mpa_pklExiting…

CRITICAL ERROR: Error executing: metaphlan C1.fq -t rel_ab -o C1_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out C1_metaphlan_bowtie2.txt --nproc 10
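
The metaphlan command in the error above carries neither --bowtie2db nor --index, so MetaPhlAn may be looking for the pkl file in its default database location. One possible approach, which is an assumption and not confirmed in this thread, is to check the database folder first and then pass both options through HUMAnN's `--metaphlan-options`. In the sketch below, `check_mpa_db` is a hypothetical helper and `/path/to/metaphlan4_db` is a placeholder.

```shell
#!/usr/bin/env bash
# Check that a database folder holds <index>.pkl and a Bowtie2 index
# with the same basename (large .bt2l or regular .bt2).
check_mpa_db() {
    local db_dir="$1" index="$2"
    [ -f "$db_dir/$index.pkl" ] || {
        echo "missing $db_dir/$index.pkl" >&2; return 1; }
    [ -f "$db_dir/$index.1.bt2l" ] || [ -f "$db_dir/$index.1.bt2" ] || {
        echo "missing Bowtie2 index $db_dir/$index.1.bt2l" >&2; return 1; }
    echo "database looks complete: $db_dir/$index"
}

# Example usage (placeholder path; adjust to the real database folder):
# DB=/path/to/metaphlan4_db
# IDX=mpa_vOct22_CHOCOPhlAnSGB_202212
# check_mpa_db "$DB" "$IDX" && humann --input C1.fq --output ./ --threads 10 \
#     --metaphlan-options "-t rel_ab --bowtie2db $DB --index $IDX"
```

Because `--metaphlan-options` replaces HUMAnN's default MetaPhlAn options, the sketch repeats `-t rel_ab` from the original command.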