The bioBakery help forum

MetaPhlAn Database Issue

Hello!

I installed MetaPhlAn 3.0.7 using conda. However, when I first tried to run MetaPhlAn, I got this error:

No MetaPhlAn BowTie2 database found (--index option)! Expecting location /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901

If I am using the database in my conda folder, do I still need to pass the database location on the command line (--index /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901)?

I also tried to download the database outside of the conda folder, but ultimately could not because of how long the job was taking on the discovery cluster I am using.

Hi Katie,
--index expects the name of the database as its parameter, in this case mpa_v30_CHOCOPhlAn_201901.
To specify its location, you can use the parameter --bowtie2db /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/.
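
To illustrate how the full path from the error message splits across the two options, here is a minimal sketch. The function name split_mpa_db_path is made up for this example and is not part of MetaPhlAn; only --index and --bowtie2db come from the reply above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MetaPhlAn): split a full database prefix
# into the folder for --bowtie2db and the database name for --index.
split_mpa_db_path() {
    full="${1%/}"                  # drop a trailing slash, if any
    db_dir=$(dirname "$full")      # folder that holds the index files
    db_name=$(basename "$full")    # e.g. mpa_v30_CHOCOPhlAn_201901
    printf '%s %s %s %s\n' --bowtie2db "$db_dir" --index "$db_name"
}

# Example with the path from the error message:
split_mpa_db_path /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901
```

The printed line can be appended to the usual metaphlan command. The sketch does not quote its output, so it assumes the path contains no spaces.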

Thank you for the response. Unfortunately, I added those parameters and still get the same error message. I also noticed that the file size of mpa_v30_CHOCOPhlAn_201901.2.bt2 in metaphlan_databases is 0. There is nothing in it.
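
A quick way to spot empty index files like this one is sketched below. The function name and the default DB_DIR location are assumptions for illustration; point DB_DIR at your own database folder. The -maxdepth and -empty tests are GNU/BSD find extensions.

```shell
#!/bin/sh
# Hypothetical helper: list .bt2 index files in a database folder that are
# empty (0 bytes), which suggests the index build did not finish.
list_empty_bt2() {
    find "$1" -maxdepth 1 -type f -name '*.bt2' -empty
}

# Assumed location; replace with your own --bowtie2db folder.
DB_DIR="${DB_DIR:-$HOME/metaphlan_databases}"
if [ -d "$DB_DIR" ]; then
    list_empty_bt2 "$DB_DIR"
fi
```

If this prints any file names, those index files need to be rebuilt or downloaded again.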

It seems something failed while building the database. Have you tried re-running bowtie2-build on the fasta with the sequences?

Hello,
I am getting the same problems and have tried to download and/or build the databases in multiple ways. I installed metaphlan into the metaphlan-3.0 conda environment as described in GitHub issue #109, and also tried to pip install outside of a conda environment… I was able to download and run it successfully on my local machine, but I need to work on the cluster due to the size of the metagenomics data. Here is the error I get when running metaphlan --install with either pip or conda:
Building Bowtie2 indexes

Fatal error running 'bowtie2-build --quiet --threads 4 -f /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901'

Error message: 'Command '['bowtie2-build', '--quiet', '--threads', '4', '-f', '/storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna', '/storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901']' returned non-zero exit status 247.'

I also tried to copy the databases that downloaded successfully on my local machine over to the cluster, but that did not work either. Here is the content of the metaphlan_databases folder:

$ ls /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/
mpa_latest mpa_v30_CHOCOPhlAn_201901.fna
mpa_v30_CHOCOPhlAn_201901.1.bt2 mpa_v30_CHOCOPhlAn_201901.fna.bz2
mpa_v30_CHOCOPhlAn_201901.2.bt2 mpa_v30_CHOCOPhlAn_201901.md5
mpa_v30_CHOCOPhlAn_201901.25.sa mpa_v30_CHOCOPhlAn_201901.pkl
mpa_v30_CHOCOPhlAn_201901.3.bt2 mpa_v30_CHOCOPhlAn_201901.tar
mpa_v30_CHOCOPhlAn_201901.4.bt2

All of the databases seem to be there, but it is still not running properly…

I should add that when I try bowtie2-build it freezes here:

$ bowtie2-build mpa_v30_CHOCOPhlAn_201901.fna mpa_v30_CHOCOPhlAn_201901
Settings:
Output files: "mpa_v30_CHOCOPhlAn_201901.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
mpa_v30_CHOCOPhlAn_201901.fna
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:09
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:08
bmax according to bmaxDivN setting: 299330357
Using parameters --bmax 224497768 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 224497768 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:39
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:09
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:24
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 1.19732e+09 (target: 224497767)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 1197321429 for bucket 1
(Using difference cover)

Can you run the following command and post the output?
bowtie2-build --threads 4 -f /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901

Also,
ls -l /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/

Here is the output:
Error: Encountered internal Bowtie 2 exception (#1)
Command: /storage/work/e/epb5360/miniconda3/envs/metaphlan-3.0/bin/bowtie2-build-s --wrapper basic-0 -threads 4 -f /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901

$ ls -l /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/
total 3.0G
-rw-rw---- 1 epb5360 epb5360_collab 26 Feb 19 11:02 mpa_latest
-rw-rw---- 1 epb5360 epb5360_collab 296M Feb 19 11:11 mpa_v30_CHOCOPhlAn_201901.1.bt2
-rw-rw---- 1 epb5360 epb5360_collab 209M Feb 19 11:11 mpa_v30_CHOCOPhlAn_201901.2.bt2
-rw-rw---- 1 epb5360 epb5360_collab 101M Feb 19 11:11 mpa_v30_CHOCOPhlAn_201901.25.sa
-rw-rw---- 1 epb5360 epb5360_collab 9.9M Feb 19 11:04 mpa_v30_CHOCOPhlAn_201901.3.bt2
-rw-rw---- 1 epb5360 epb5360_collab 286M Feb 19 11:04 mpa_v30_CHOCOPhlAn_201901.4.bt2
-rw-rw---- 1 epb5360 epb5360_collab 1.4G Feb 19 11:04 mpa_v30_CHOCOPhlAn_201901.fna
-rw-rw-r-- 1 epb5360 epb5360_collab 342M Jun 30 2020 mpa_v30_CHOCOPhlAn_201901.fna.bz2
-rw-rw---- 1 epb5360 epb5360_collab 64 Feb 19 11:03 mpa_v30_CHOCOPhlAn_201901.md5
-rw-rw-r-- 1 epb5360 epb5360_collab 25M Jan 20 09:57 mpa_v30_CHOCOPhlAn_201901.pkl
-rw-rw---- 1 epb5360 epb5360_collab 367M Feb 19 11:03 mpa_v30_CHOCOPhlAn_201901.tar

Have you checked the md5 of the downloaded tar (md5sum -c mpa_v30_CHOCOPhlAn_201901.md5, run from the database folder)?
Can you post the version of Bowtie2 you are using? Are you trying to build the index on an HPC cluster?
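
If md5sum -c does not accept the .md5 file, the comparison can also be done by hand. This is only a sketch: check_md5 is a hypothetical helper, and it assumes the first whitespace-separated field of the .md5 file is the expected hash.

```shell
#!/bin/sh
# Hypothetical helper: compare the MD5 of a file with the hash stored in a
# .md5 file. Assumes the first field of the .md5 file is the expected hash.
check_md5() {
    data_file=$1
    md5_file=$2
    expected=$(awk '{print $1; exit}' "$md5_file")
    actual=$(md5sum "$data_file" | awk '{print $1}')
    if [ "$expected" = "$actual" ]; then
        echo "OK: $data_file"
    else
        echo "MISMATCH: $data_file (expected $expected, got $actual)"
        return 1
    fi
}

# Example, run from the database folder:
# check_md5 mpa_v30_CHOCOPhlAn_201901.tar mpa_v30_CHOCOPhlAn_201901.md5
```

A mismatch would point to a corrupted or incomplete download rather than a Bowtie2 problem.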

Yes, I’m building this from an HPC cluster but I had previously tried to copy over the database files from a (successful) local build which doesn’t work either… the cluster is currently down so I’ll get back to you asap

After a bit of struggling, I just managed to build the bowtie index on an HPC cluster. I had to submit it as a job because it didn’t seem to be working from an interactive node.

Submit script (though it didn’t actually need much RAM, ~3 GB):

bsub -e build_bowtie2_index.err -o build_bowtie2_index.out -n 8 -M 20480 -R "rusage [mem=20480] span[hosts=1]" sh build_bowtie2_index.sh

build_bowtie2_index.sh:

source ~/miniconda3/bin/activate metaphlan
metaphlan --install --index mpa_v30_CHOCOPhlAn_201901 --bowtie2db /dir1/dir2/metaphlan_db/

And here’s what the directory looks like after it ran:

mpa_latest
mpa_v30_CHOCOPhlAn_201901.1.bt2
mpa_v30_CHOCOPhlAn_201901.2.bt2
mpa_v30_CHOCOPhlAn_201901.3.bt2
mpa_v30_CHOCOPhlAn_201901.4.bt2
mpa_v30_CHOCOPhlAn_201901.fna.bz2
mpa_v30_CHOCOPhlAn_201901.md5
mpa_v30_CHOCOPhlAn_201901.pkl
mpa_v30_CHOCOPhlAn_201901.rev.1.bt2
mpa_v30_CHOCOPhlAn_201901.rev.2.bt2
mpa_v30_CHOCOPhlAn_201901.tar

Hope this might be helpful!
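
For comparison, the listing above has six .bt2 files, including the two .rev files, while the failed folders earlier in the thread only had .1.bt2 to .4.bt2. A rough sketch for checking this after a build follows; check_bt2_index is a hypothetical helper and assumes a small index (.bt2 rather than .bt2l files).

```shell
#!/bin/sh
# Hypothetical helper: check that the six Bowtie2 files of a small index
# exist and are non-empty for a given database prefix.
check_bt2_index() {
    prefix=$1
    status=0
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -s "$prefix.$suffix" ]; then
            echo "missing or empty: $prefix.$suffix"
            status=1
        fi
    done
    return "$status"
}

# Example (same placeholder path as in the build script):
# check_bt2_index /dir1/dir2/metaphlan_db/mpa_v30_CHOCOPhlAn_201901
```

The function prints nothing and returns 0 when all six files are present, so it can be added at the end of the job script.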