Unable to use MetaPhlAn-4 database

Dear Metaphlan developers,
I am having problems downloading and using the MetaPhlAn 4 database.
I installed MetaPhlAn using conda, then tried to install the database using:

metaphlan --install --bowtie2db [my folder]

I then got an error:

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_latest
Downloading file of size: 0.00 MB
0.01 MB 25600.00 %  43.29 MB/sec  0 min -0 sec
Downloading MetaPhlAn database
Please note due to the size this might take a few minutes

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.tar
Downloading file of size: 2623.07 MB
2623.05 MB 100.00 %  261.05 MB/sec  0 min  0 sec
Warning: Unable to download http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.tar

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.md5
Downloading file of size: 0.00 MB
MD5 checksums do not correspond! If this happens again, you should remove the database files and rerun MetaPhlAn so they are re-downloaded

After trying multiple times, I downloaded the files manually from the server listing
(Index of /biobakery4/metaphlan_databases), extracted them, and got four files:

mpa_vJan21_CHOCOPhlAnSGB_202103.pkl
mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna.bz2
mpa_vJan21_CHOCOPhlAnSGB_202103_VINFO.csv
mpa_vJan21_CHOCOPhlAnSGB_202103_VSG.fna.bz2

I built an index from the larger FASTA file (SGB) with bowtie2.
The database folder now contains:

./mpa_vJan21_CHOCOPhlAnSGB_202103.md5
./mpa_vJan21_CHOCOPhlAnSGB_202103.pkl
./mpa_vJan21_CHOCOPhlAnSGB_202103.tar
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.1.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.2.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.3.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.4.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.rev.1.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.rev.2.bt2l
./mpa_vJan21_CHOCOPhlAnSGB_202103_VINFO.csv
./mpa_vJan21_CHOCOPhlAnSGB_202103_VSG.fna
./mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt
./mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt

I then ran this command:

metaphlan example.fq.gz --input_type fastq -o example.txt --index mpa_vJan21_CHOCOPhlAnSGB_202103_SGB --bowtie2db ~/metaphlan4_db/

I get an error:

Error: Unable to find the mpa_pkl file at: mpa_pklExiting...

Maybe the problem is that the name of the pkl file does not match the index name I used (my index files have an _SGB suffix), but when I remove the suffix, it doesn’t recognize the database at all.
Also, why couldn’t I install the database in the first place?
I’d appreciate your help.

Thanks in advance!


Hi @Afromm
Exactly, the name of the pkl file and the bt2 indexes should be the same.
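
To illustrate the naming requirement, here is a minimal sketch (not an official procedure) of renaming index files that were built with an extra _SGB suffix so that they share the prefix of the pkl file. It works on empty stand-in files in a temporary folder; db_dir, index, and the loop are only illustrative. On a real database you would set db_dir to your own folder and skip the touch lines.

```shell
# Stand-in database folder with empty placeholder files (assumption: demo only).
db_dir=$(mktemp -d)
index=mpa_vJan21_CHOCOPhlAnSGB_202103
touch "$db_dir/$index.pkl"
for part in 1 2 3 4 rev.1 rev.2; do
    touch "$db_dir/${index}_SGB.$part.bt2l"
done

# Rename every "<index>_SGB.<part>.bt2l" file to "<index>.<part>.bt2l".
for f in "$db_dir/${index}_SGB."*.bt2l; do
    base=$(basename "$f")
    suffix=${base#"${index}_SGB."}   # e.g. "1.bt2l" or "rev.2.bt2l"
    mv "$f" "$db_dir/$index.$suffix"
done

# List the result so the shared prefix is visible.
ls "$db_dir"
```

With matching names, the value passed to --index would be mpa_vJan21_CHOCOPhlAnSGB_202103, and --bowtie2db would point at the folder that holds these files.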

Hi @Afromm,
I have the same problem. Did you find a solution in the meantime?

Best,
Ilaria

Hi @ilapt
Both the pkl file and the bt2 index should have the same name.
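
A quick way to check this before running MetaPhlAn could look like the sketch below. It only tests whether a folder contains <index>.pkl plus the six .bt2l files seen in the listings in this thread; check_mpa_db is a hypothetical helper name of my own, not part of MetaPhlAn, and the demo runs on empty stand-in files in a temporary folder.

```shell
# check_mpa_db DIR INDEX
# Returns 0 if DIR holds INDEX.pkl and the six large-index (.bt2l) files,
# otherwise prints each missing file and returns 1.
check_mpa_db() {
    dir=$1
    index=$2
    missing=0
    for f in "$index.pkl" \
             "$index.1.bt2l" "$index.2.bt2l" "$index.3.bt2l" "$index.4.bt2l" \
             "$index.rev.1.bt2l" "$index.rev.2.bt2l"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Demo on stand-in files: one index file is deliberately left out.
demo_dir=$(mktemp -d)
demo_index=mpa_vJan21_CHOCOPhlAnSGB_202103
touch "$demo_dir/$demo_index.pkl"
for part in 1 2 3 4 rev.1; do
    touch "$demo_dir/$demo_index.$part.bt2l"
done

if check_mpa_db "$demo_dir" "$demo_index"; then
    demo_status=complete
else
    demo_status=incomplete
fi
echo "database is $demo_status"
```

On a real setup you would call check_mpa_db with your own database folder and the same name you intend to pass to --index.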

Hello Aitor, I have downloaded the MetaPhlAn 4 database and have the following files:
mpa_vOct22_CHOCOPhlAnSGB_202212.1.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.2.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.3.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.4.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212_marker_info.txt
mpa_vOct22_CHOCOPhlAnSGB_202212.pkl
mpa_vOct22_CHOCOPhlAnSGB_202212.rev.1.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.rev.2.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212_SGB.fna.bz2
mpa_vOct22_CHOCOPhlAnSGB_202212_species.txt
mpa_vOct22_CHOCOPhlAnSGB_202212_VINFO.csv
mpa_vOct22_CHOCOPhlAnSGB_202212_VSG.fna.bz2

I ran HUMAnN 3 with "humann --input C1.fq --output ./ --threads 10" and then received this MetaPhlAn error:

Error message returned from metaphlan :
Error: Unable to find the mpa_pkl file at: mpa_pklExiting…

CRITICAL ERROR: Error executing: metaphlan C1.fq -t rel_ab -o C1_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out C1_metaphlan_bowtie2.txt --nproc 10

Hi,

I am facing a similar issue. I tried metaphlan --install to download the databases, but it did not work properly, so I downloaded them with wget instead: the index files (bt2.tar and bt2.md5) and the mpa_vJun23_CHOCOPhlAnSGB_202307.tar and .md5 files. I saved them outside of miniconda3, in my own folder.

Then I ran HUMAnN 3 with this command:

path/to/humann --input fastq.gz --protein-database path/to/humann3_databases/uniref --threads 16 --search-mode uniref50 --nucleotide-database path/to/humann3_databases/chocophlan --bowtie2 path/to/miniconda3/bin/ --metaphlan-options "--bowtie2db path/to/metaphlan_databases/" --metaphlan-options "--index path/to/metaphlan_databases/mpa_vJun23_CHOCOPhlAnSGB_202307" --output X_uniref50.trimmed_humann3

It stops while running MetaPhlAn, and it deletes all the md5 and tar files from the database folder. Apart from the run failing, why does it delete the files from the database folder? Do I need to re-download them each time I run HUMAnN? And why does it stop? What am I doing wrong? Can someone please help?
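
For anyone comparing notes, below is a sketch of how the MetaPhlAn options might be combined, under two assumptions that are not confirmed in this thread: that --index expects only the database name (the same prefix as the pkl file, without folder or extension), and that a repeated --metaphlan-options flag may keep only its last value, so everything goes into one quoted string. The variable names are illustrative, the paths are the placeholders from the command above, and the other HUMAnN options are left out for brevity. The command is printed rather than executed.

```shell
# Placeholder locations taken from the command above; replace with real paths.
mpa_db_dir=path/to/metaphlan_databases
mpa_index=mpa_vJun23_CHOCOPhlAnSGB_202307

# One option string: "-t rel_ab" mirrors the default visible in the HUMAnN
# error message earlier in the thread; --index gets the name only (assumption).
mpa_opts="-t rel_ab --bowtie2db $mpa_db_dir --index $mpa_index"

# Build the command as text, with plain ASCII quotes and double hyphens.
humann_cmd="humann --input fastq.gz --output X_uniref50.trimmed_humann3 --threads 16 --metaphlan-options \"$mpa_opts\""

# Print for inspection instead of running it.
echo "$humann_cmd"
```

Copying commands out of a web page can turn "--" into a long dash and straight quotes into curly ones, so it may be worth retyping those characters before running the command.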