How to install the latest metaphlan reference DB?

Hi there:

MetaPhlAn is a great tool and has been very helpful to me. I have a question about installing ‘mpa_vJan21_CHOCOPhlAnSGB_202103’. After downloading ‘mpa_vJan21_CHOCOPhlAnSGB_202103.tar’ to the specified location, I used the command from GitHub, "metaphlan --install --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db <The database folder>", to build the MetaPhlAn database. But it does not work:


The MetaPhlAn database is installed in the following location:

So I deleted ‘mpa_vJan21_CHOCOPhlAnSGB_202103.fna’ and ran the metaphlan --install command again:

But MetaPhlAn does not generate the .bt2 files (like mpa_v31_CHOCOPhlAn_201901.1.bt2). How do I build mpa_vJan21_CHOCOPhlAnSGB_202103 (the latest MetaPhlAn reference DB)?

Any help would be appreciated!
Thanks!

Why did it create mpa_v31_CHOCOPhlAn_201901.1.bt2 for you?
For me it is still installing mpa_v30_CHOCOPhlAn_201901
because that is set in: http://cmprod1.cibio.unitn.it/biobakery3/metaphlan_databases/mpa_latest

The version that you mentioned I found here:
http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/
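Since the mpa_latest file decides which database a plain metaphlan --install picks, it can help to look at it before installing. Below is a minimal sketch, assuming the database folder holds the copy of mpa_latest that metaphlan --install downloads (the directory listing further down this thread shows such a file). The function name latest_index is made up for illustration and is not part of MetaPhlAn.

```shell
#!/bin/sh
# Print the index name recorded in a local copy of mpa_latest.
# Assumption: "$1" is the database folder given to --bowtie2db and it
# contains the mpa_latest file fetched by `metaphlan --install`.
latest_index() {
    db_dir=$1
    if [ ! -f "$db_dir/mpa_latest" ]; then
        echo "no mpa_latest found in $db_dir" >&2
        return 1
    fi
    # Keep only the first line and strip whitespace so the name can be
    # compared directly with an --index value.
    head -n 1 "$db_dir/mpa_latest" | tr -d '[:space:]'
    echo
}

# To ask the server instead (needs network access), something like:
#   curl -s http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_latest
```

Calling latest_index with the database folder as its only argument prints the index name, which can then be compared with the version you actually want.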

Hi @shengxinf ,
Thanks for getting in touch. The code to use the mpa_vJan21 version of the MetaPhlAn database (i.e., MetaPhlAn 4) is not available yet; we hope to release it in the coming weeks. In the meantime, the latest stable version of MetaPhlAn is mpa_v3.0. To install this version correctly, just run metaphlan --install

I have installed MetaPhlAn version 4.0.2 (22 Sep 2022) and humann v3.5 in a conda environment. I tried to run the command

humann --input demo.fastq.gz --output demo_fastq --threads 4

but I get the same error “No MetaPhlAn BowTie2 database found (--index option)!” with different MetaPhlAn databases: mpa_v30_CHOCOPhlAn_201901 and mpa_v31_CHOCOPhlAn_201901.

In the case of mpa_vJan21_CHOCOPhlAnSGB_202103, metaphlan --install creates bt2l.tmp files instead of bt2 files.

If I downgrade to metaphlan 3.1 and humann 3.1, the problems disappear.

Hi @imontero
The bowtie2 database for mpa_vJan21 is a large database, so the output format is bt2l instead of bt2. If you are still seeing the .tmp files, it probably means that the database build did not finish correctly.
My suggestion would be to remove the mpa_vJan21_CHOCOPhlAnSGB_202103 files and download the database again with metaphlan --force_download
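To tell a finished index from an interrupted build, one can check for leftover .tmp files and for the six files that bowtie2-build writes (.1, .2, .3, .4, .rev.1, .rev.2, with extension .bt2l for large indexes or .bt2 for small ones). This is only a sketch: index_status is a hypothetical helper, not a MetaPhlAn command, and it looks at file names only, not at file contents.

```shell
#!/bin/sh
# Report whether a Bowtie2 index in a folder looks complete.
# $1 = database folder, $2 = index name, e.g. mpa_vJan21_CHOCOPhlAnSGB_202103
index_status() {
    db_dir=$1
    index=$2
    # Leftover .tmp files indicate that bowtie2-build did not finish.
    for f in "$db_dir/$index".*.tmp; do
        if [ -e "$f" ]; then
            echo "incomplete"
            return 0
        fi
    done
    # A finished index consists of six files with a common extension.
    for ext in bt2l bt2; do
        missing=0
        for part in 1 2 3 4 rev.1 rev.2; do
            [ -f "$db_dir/$index.$part.$ext" ] || missing=1
        done
        if [ "$missing" -eq 0 ]; then
            echo "complete ($ext)"
            return 0
        fi
    done
    echo "missing"
}
```

If the function prints incomplete, removing the partial files and rebuilding, as suggested above, is the likely next step.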

metaphlan --install --force_download --bowtie2db ./ --nproc 4

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_latest
Downloading file of size: 0.00 MB
0.01 MB 25600.00 % 130.03 MB/sec 0 min -0 sec
Downloading MetaPhlAn database
Please note due to the size this might take a few minutes

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.tar
Downloading file of size: 2623.07 MB
2623.07 MB 100.00 % 12.46 MB/sec 0 min -0 sec
Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.md5
Downloading file of size: 0.00 MB
0.01 MB 11702.86 % 64.38 MB/sec 0 min -0 sec

Decompressing ./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna.bz2 into ./mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna

Decompressing ./mpa_vJan21_CHOCOPhlAnSGB_202103_VSG.fna.bz2 into ./mpa_vJan21_CHOCOPhlAnSGB_202103_VSG.fna

Joining FASTA databases

Building Bowtie2 indexes
Fatal error running 'bowtie2-build --quiet --threads 4 -f ./mpa_vJan21_CHOCOPhlAnSGB_202103.fna ./mpa_vJan21_CHOCOPhlAnSGB_202103'
Error message: 'Command '['bowtie2-build', '--quiet', '--threads', '4', '-f', './mpa_vJan21_CHOCOPhlAnSGB_202103.fna', './mpa_vJan21_CHOCOPhlAnSGB_202103']' returned non-zero exit status 247.'

Hi @imontero
Judging by your error, it seems you ran out of resources (probably RAM) when generating the indexes. The MetaPhlAn 4 database is significantly larger than the version 3 one, and building it typically requires around 16-20 GB of RAM.
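A quick check of available memory before starting the build can save a long wait. The sketch below assumes a Linux system, where /proc/meminfo reports MemAvailable in kB; it will not work on macOS. The helper names available_gb and enough_ram are hypothetical, and the threshold is left to the caller because the figure quoted here is only a typical value.

```shell
#!/bin/sh
# Print available memory in whole GB, read from a /proc/meminfo-style file.
# $1 (optional) = path to the file, default /proc/meminfo (Linux only).
available_gb() {
    meminfo=${1:-/proc/meminfo}
    # MemAvailable is reported in kB; convert to GB and truncate.
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo"
}

# Succeed (exit status 0) if at least $1 GB are available.
# $2 (optional) = path to the meminfo file.
enough_ram() {
    needed=$1
    have=$(available_gb "${2:-/proc/meminfo}")
    [ -n "$have" ] && [ "$have" -ge "$needed" ]
}
```

For example, running enough_ram 20 before metaphlan --install would flag a machine that is clearly too small, although passing the check does not guarantee that the build succeeds.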

Thank you for your advice. It is strange because my PC has 32 GB of RAM; however, I will upgrade it to 64 GB soon and try again.

I ran into the same sudden error when executing bowtie2-build for MetaPhlAn 4: the process exited with status 247.
I had around 28 GB of RAM available for a virtual machine, yet the process suddenly terminated.
Bowtie2 did not finish and left me with the following files:

(base) bernhard@system metaphlan_db % ls -lah
total 109544408
drwxr-xr-x  30 bernhard  staff   960B 24 Okt 13:03 .
drwxr-xr-x@ 10 bernhard  staff   320B 22 Okt 22:05 ..
-rw-r--r--@  1 bernhard  staff    10K 22 Okt 23:44 .DS_Store
-rw-r--r--   1 bernhard  staff    32B 22 Okt 22:06 mpa_latest
-rw-r--r--   1 bernhard  staff   4,1G 24 Okt 04:45 mpa_vJan21_CHOCOPhlAnSGB_202103.1.bt2l.tmp
-rw-r--r--   1 bernhard  staff   4,4G 24 Okt 04:32 mpa_vJan21_CHOCOPhlAnSGB_202103.2.bt2l.tmp
-rw-r--r--   1 bernhard  staff    85M 23 Okt 16:46 mpa_vJan21_CHOCOPhlAnSGB_202103.3.bt2l.tmp
-rw-r--r--   1 bernhard  staff   2,2G 23 Okt 16:46 mpa_vJan21_CHOCOPhlAnSGB_202103.4.bt2l.tmp
-rw-r--r--   1 bernhard  staff    10G 23 Okt 13:07 mpa_vJan21_CHOCOPhlAnSGB_202103.fna
-rw-r--r--   1 bernhard  staff    70B 22 Okt 22:22 mpa_vJan21_CHOCOPhlAnSGB_202103.md5
-rw-rw-r--   1 bernhard  staff    53M  1 Apr  2022 mpa_vJan21_CHOCOPhlAnSGB_202103.pkl
-rw-r--r--   1 bernhard  staff   1,7G 24 Okt 13:06 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.1.bt2l.tmp
-rw-r--r--   1 bernhard  staff   2,3G 24 Okt 13:06 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.2.bt2l.tmp
-rw-r--r--   1 bernhard  staff   2,2G 24 Okt 11:33 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.24.sa
-rw-r--r--   1 bernhard  staff   2,2G 24 Okt 12:22 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.25.sa
-rw-r--r--   1 bernhard  staff   1,8G 24 Okt 12:26 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.26.sa
-rw-r--r--   1 bernhard  staff   1,7G 24 Okt 12:38 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.27.sa
-rw-r--r--   1 bernhard  staff   2,0G 24 Okt 13:01 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.28.sa
-rw-r--r--   1 bernhard  staff   1,4G 24 Okt 12:40 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.29.sa
-rw-r--r--   1 bernhard  staff   1,0G 24 Okt 13:06 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.30.sa
-rw-r--r--   1 bernhard  staff   918M 24 Okt 13:05 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.31.sa
-rw-r--r--   1 bernhard  staff   667M 24 Okt 13:06 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.32.sa
-rw-r--r--   1 bernhard  staff   568M 24 Okt 13:06 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.33.sa
-rw-r--r--   1 bernhard  staff   133M 24 Okt 13:06 mpa_vJan21_CHOCOPhlAnSGB_202103.rev.34.sa
-rw-r--r--@  1 bernhard  staff   2,6G 22 Okt 22:22 mpa_vJan21_CHOCOPhlAnSGB_202103.tar
-rw-r--r--   1 bernhard  staff   9,2G 23 Okt 11:33 mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna
-rw-rw-r--   1 bernhard  staff    43K 11 Jun  2021 mpa_vJan21_CHOCOPhlAnSGB_202103_VINFO.csv
-rw-r--r--   1 bernhard  staff   841M 23 Okt 10:39 mpa_vJan21_CHOCOPhlAnSGB_202103_VSG.fna
-rw-r--r--@  1 bernhard  staff    29M 22 Aug 16:51 mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt.bz2
-rw-r--r--@  1 bernhard  staff   380K 25 Aug 10:25 mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.bz2

I guess it is better to stick with the older MetaPhlAn3 version then or move to a system with more RAM.
Best regards, Bernhard

Had the same issue. 32 GB on the remote cluster was not enough. I allocated 64 GB and it worked (you probably don’t need 64 :grinning: ).

Yes, building the new MetaPhlAn 4 database requires a large amount of RAM. We are currently working on making the pre-built database available for download to avoid this inconvenience. We will keep you posted.

I uploaded a precomputed version of the bt2 database here: http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vJan21_CHOCOPhlAnSGB_202103_bt2.tar
Let me know if you have any problems with it
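For anyone who wants to script this, here is a rough sketch of unpacking the downloaded archive into the database folder. The layout inside the tar file is not described in this thread, so it is an assumption that the index files sit at the top level of the archive; check where the .bt2l files end up after extraction. The function name unpack_index and the folder ./metaphlan_db are placeholders.

```shell
#!/bin/sh
# Unpack a previously downloaded index tarball into the database folder.
# $1 = path to the tarball, $2 = database folder (created if missing)
unpack_index() {
    tarball=$1
    db_dir=$2
    mkdir -p "$db_dir" && tar -xf "$tarball" -C "$db_dir"
}

# Download step (run manually; needs network access and several GB of disk):
#   wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vJan21_CHOCOPhlAnSGB_202103_bt2.tar
#   unpack_index mpa_vJan21_CHOCOPhlAnSGB_202103_bt2.tar ./metaphlan_db
```

The same folder would then be passed to --bowtie2db when running metaphlan.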

$ humann --version
humann v3.6.1
$ metaphlan --version
MetaPhlAn version 4.0.6 (1 Mar 2023)
$ biobakery_workflows --version
biobakery_workflows v3.1

When running:

humann --input /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data3/kneaddata/main/HD32R1_subsample.fastq.gz --output /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data3/humann/main --o-log /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data3/humann/main/HD32R1_subsample.log --threads 1 --taxonomic-profile /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data3/metaphlan/main/HD32R1_subsample_taxonomic_profile.tsv

I get:

ERROR: The MetaPhlAn taxonomic profile provided was not generated with the database version v3 or vJan21 . Please update your version of MetaPhlAn to at least v3.0 or if you are using MetaPhlAn v4 please use the database vJan21.

Indeed when downloading the metaphlan DB with:

metaphlan --install

the db installed is: vOct22

this is a consequence of: http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_latest containing:

mpa_vOct22_CHOCOPhlAnSGB_202212

I downloaded the precomputed version of the bt2 database from: http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vJan21_CHOCOPhlAnSGB_202103_bt2.tar (as suggested in post #13 by aitor.blancomiguez), untarred the file, and placed the bt2l files in the metaphlan_databases directory, but I still get the same error. Possibly because I am missing a pkl and a csv file for vJan21?

Question 1: is there a way to force a specific version to be downloaded and computed?

Question 2: what is the command to perform the computation locally after downloading the files from Index of /biobakery4/metaphlan_databases for vJan21?

Question 3: I see that vJan21 is hardcoded in many humann and metaphlan files (e.g., humann/config.py, humann-3.6.1-py3.9.egg-info/SOURCES.txt, MetaPhlAn-4.0.6.dist-info/METADATA, etc.). Shouldn’t mpa_latest in Index of /biobakery4/metaphlan_databases therefore point to vJan21 rather than vOct22, or else the version be updated in humann and metaphlan?

Thanks,

RD

OK, I see that the answer to my Question 1 in post #14 (by rda) is provided in:

By default, the latest MetaPhlAn database is downloaded and built. You can download a specific version with the --index parameter

$ metaphlan --install --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db <database folder>

Update to post #15 (by rda):

Even after downloading the DB with --index, the step:

$ humann --input /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data4/kneaddata/main/HD32R1_subsample.fastq.gz --output /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data4/humann/main --o-log /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data4/humann/main/HD32R1_subsample.log --threads 1 --taxonomic-profile /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data4/metaphlan/main/HD32R1_subsample_taxonomic_profile.tsv

still fails with error:

ERROR: The MetaPhlAn taxonomic profile provided was not generated with the database version v3 or vJan21 . Please update your version of MetaPhlAn to at least v3.0 or if you are using MetaPhlAn v4 please use the database vJan21.

I think this is caused by the fact that, when computing the taxonomic profile, metaphlan chooses the DB version declared in mpa_latest. I am not sure how to correct this step (especially when it is run from within biobakery_workflows).

Please see below the first few lines of the tsv file:

$ head /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data4/metaphlan/main/HD32R1_subsample_taxonomic_profile.tsv
#mpa_vOct22_CHOCOPhlAnSGB_202212
#/u/local/apps/PYTHON-VIRT-ENVS/3.9.6/biobakery/bin/metaphlan /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data4/kneaddata/main/HD32R1_subsample.fastq.gz --input_type fastq --output_file /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data4/metaphlan/main/HD32R1_subsample_taxonomic_profile.tsv --samout /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data4/metaphlan/main/HD32R1_subsample_bowtie2.sam --nproc 1 --no_map --tmp_dir /u/home/staff1/cusgunx/ATS-TEST/biobakery/output_data4/metaphlan/main
#34596 reads processed
#SampleID	Metaphlan_Analysis
#clade_name	NCBI_tax_id	relative_abundance	additional_species
k__Bacteria	2	100.0	
k__Bacteria|p__Bacteroidetes	2|976	76.7975	
k__Bacteria|p__Firmicutes	2|1239	18.81751	
k__Bacteria|p__Actinobacteria	2|201174	4.38499	
k__Bacteria|p__Bacteroidetes|c__Bacteroidia	2|976|200643	76.7975

Any suggestion would be appreciated.

Thanks
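The first line of the profile shown above records the database it was generated with, so the mismatch can be confirmed before running humann. The sketch below reads that header line; the helper names profile_db_version and is_vjan21_profile are hypothetical, and it is an assumption (based on the error message) that HUMAnN looks at this same version string.

```shell
#!/bin/sh
# Print the database name recorded in a MetaPhlAn profile header.
# $1 = path to the taxonomic profile (.tsv)
profile_db_version() {
    # The first line of the profile has the form "#<index name>".
    head -n 1 "$1" | sed 's/^#//'
}

# Succeed only if the profile was generated with a vJan21 database.
is_vjan21_profile() {
    case $(profile_db_version "$1") in
        mpa_vJan21_*) return 0 ;;
        *) return 1 ;;
    esac
}
```

If the header shows vOct22, one option to try is regenerating the profile with metaphlan and an explicit --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db <database folder>, so that the database is not taken from mpa_latest.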

As the installation is done via humann, I think your question will be better addressed if you move it to the humann subforum (HUMAnN - The bioBakery help forum).