MetaPhlAn Database Issue

Hello!

I installed MetaPhlAn 3.0.7 using conda. However, when I first tried to run MetaPhlAn, I got the error:

No MetaPhlAn BowTie2 database found (--index option)! Expecting location /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901

If I am using the database in my conda folder, do I still need to give the database location on the command line (--index /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901)?

I also tried to download the database outside of the conda folder, but ultimately could not because of how long the job was taking on the discovery cluster I am using.

Hi Katie,
--index expects the name of the database as its parameter, in this case mpa_v30_CHOCOPhlAn_201901.
To specify its location, you can use the parameter --bowtie2db /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/.
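
For example, a full invocation would look something like this (the input file and output name here are just placeholders):

$ metaphlan metagenome.fastq --input_type fastq --index mpa_v30_CHOCOPhlAn_201901 --bowtie2db /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/ -o profiled_metagenome.txt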

Thank you for the response. Unfortunately, I added those parameters and still get the same error message. I also noticed that the file size of mpa_v30_CHOCOPhlAn_201901.2.bt2 in metaphlan_databases is 0. There is nothing in it.
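
For anyone checking the same thing, the empty index file is easy to spot with a directory listing (-lh prints human-readable sizes):

$ ls -lh /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/*.bt2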

It seems something failed while building the database. Have you tried re-running bowtie2-build on the FASTA file with the sequences?
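
Something along these lines should rebuild the indexes in place (same paths as in your error message; adjust --threads to what you have available):

$ bowtie2-build --threads 4 -f /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna /home/vilardi.k/.conda/envs/mpa/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901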

Hello,
I am getting the same problems and have tried to download and/or build the databases in multiple ways. I installed metaphlan into the metaphlan-3.0 conda environment as described in the Github issue #109, and also tried to pip install outside of a conda environment… I was able to download and run it successfully on my local machine but I need to work on the cluster due to the size of the metagenomics data. Here is the error I am getting when trying to run metaphlan --install in either pip or conda:
Building Bowtie2 indexes

Fatal error running 'bowtie2-build --quiet --threads 4 -f /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901'

Error message: 'Command '['bowtie2-build', '--quiet', '--threads', '4', '-f', '/storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna', '/storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901']' returned non-zero exit status 247.'

I also tried copying over the databases that downloaded successfully on my local machine to the cluster, but that did not work either. Here are the contents of the metaphlan_databases folder:

$ ls /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/
mpa_latest mpa_v30_CHOCOPhlAn_201901.fna
mpa_v30_CHOCOPhlAn_201901.1.bt2 mpa_v30_CHOCOPhlAn_201901.fna.bz2
mpa_v30_CHOCOPhlAn_201901.2.bt2 mpa_v30_CHOCOPhlAn_201901.md5
mpa_v30_CHOCOPhlAn_201901.25.sa mpa_v30_CHOCOPhlAn_201901.pkl
mpa_v30_CHOCOPhlAn_201901.3.bt2 mpa_v30_CHOCOPhlAn_201901.tar
mpa_v30_CHOCOPhlAn_201901.4.bt2

All of the databases seem to be there, but it is still not running properly…

I should add that when I run bowtie2-build manually, it freezes here:

$ bowtie2-build mpa_v30_CHOCOPhlAn_201901.fna mpa_v30_CHOCOPhlAn_201901
Settings:
Output files: "mpa_v30_CHOCOPhlAn_201901.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
mpa_v30_CHOCOPhlAn_201901.fna
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:09
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:08
bmax according to bmaxDivN setting: 299330357
Using parameters --bmax 224497768 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 224497768 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:39
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:09
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:24
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 1.19732e+09 (target: 224497767)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 1197321429 for bucket 1
(Using difference cover)

Can you run the following command and post the output?
bowtie2-build --threads 4 -f /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901

Also,
ls -l /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/

Here is the output:
Error: Encountered internal Bowtie 2 exception (#1)
Command: /storage/work/e/epb5360/miniconda3/envs/metaphlan-3.0/bin/bowtie2-build-s --wrapper basic-0 --threads 4 -f /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901

$ ls -l /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/
total 3.0G
-rw-rw---- 1 epb5360 epb5360_collab   26 Feb 19 11:02 mpa_latest
-rw-rw---- 1 epb5360 epb5360_collab 296M Feb 19 11:11 mpa_v30_CHOCOPhlAn_201901.1.bt2
-rw-rw---- 1 epb5360 epb5360_collab 209M Feb 19 11:11 mpa_v30_CHOCOPhlAn_201901.2.bt2
-rw-rw---- 1 epb5360 epb5360_collab 101M Feb 19 11:11 mpa_v30_CHOCOPhlAn_201901.25.sa
-rw-rw---- 1 epb5360 epb5360_collab 9.9M Feb 19 11:04 mpa_v30_CHOCOPhlAn_201901.3.bt2
-rw-rw---- 1 epb5360 epb5360_collab 286M Feb 19 11:04 mpa_v30_CHOCOPhlAn_201901.4.bt2
-rw-rw---- 1 epb5360 epb5360_collab 1.4G Feb 19 11:04 mpa_v30_CHOCOPhlAn_201901.fna
-rw-rw-r-- 1 epb5360 epb5360_collab 342M Jun 30  2020 mpa_v30_CHOCOPhlAn_201901.fna.bz2
-rw-rw---- 1 epb5360 epb5360_collab   64 Feb 19 11:03 mpa_v30_CHOCOPhlAn_201901.md5
-rw-rw-r-- 1 epb5360 epb5360_collab  25M Jan 20 09:57 mpa_v30_CHOCOPhlAn_201901.pkl
-rw-rw---- 1 epb5360 epb5360_collab 367M Feb 19 11:03 mpa_v30_CHOCOPhlAn_201901.tar

Have you checked the md5 of the downloaded tar (md5sum -c mpa_v30_CHOCOPhlAn_201901.md5, run from the directory containing the tar)?
Can you post the version of Bowtie2 you are using? Are you trying to build the index on an HPC cluster?
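
For example (run from the metaphlan_databases directory, assuming the .md5 file is in standard md5sum format and refers to the tar by name):

$ cd /storage/work/epb5360/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/
$ md5sum -c mpa_v30_CHOCOPhlAn_201901.md5
$ bowtie2 --version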

Yes, I’m building this on an HPC cluster, but I had previously tried to copy over the database files from a (successful) local build, which doesn’t work either… The cluster is currently down, so I’ll get back to you ASAP.


After a bit of struggling, I just managed to build the bowtie index on an HPC cluster. I had to submit it as a job because it didn’t seem to be working from an interactive node.

Submit script (though it didn’t actually need much RAM, ~3 GB):

bsub -e build_bowtie2_index.err -o build_bowtie2_index.out -n 8 -M 20480 -R "rusage[mem=20480] span[hosts=1]" sh build_bowtie2_index.sh

build_bowtie2_index.sh:

source ~/miniconda3/bin/activate metaphlan
metaphlan --install --index mpa_v30_CHOCOPhlAn_201901 --bowtie2db /dir1/dir2/metaphlan_db/

And here’s what the directory looks like after it ran:

mpa_latest
mpa_v30_CHOCOPhlAn_201901.1.bt2
mpa_v30_CHOCOPhlAn_201901.2.bt2
mpa_v30_CHOCOPhlAn_201901.3.bt2
mpa_v30_CHOCOPhlAn_201901.4.bt2
mpa_v30_CHOCOPhlAn_201901.fna.bz2
mpa_v30_CHOCOPhlAn_201901.md5
mpa_v30_CHOCOPhlAn_201901.pkl
mpa_v30_CHOCOPhlAn_201901.rev.1.bt2
mpa_v30_CHOCOPhlAn_201901.rev.2.bt2
mpa_v30_CHOCOPhlAn_201901.tar

Hope this might be helpful!

Thanks! I believe this is still missing the .5.bt2 and .6.bt2 files? That’s been my continual issue.

Maybe a silly question: where are you seeing that .5.bt2 and .6.bt2 files are expected? It’s been a little while since I did this, so I’m a bit hazy on the details now.

I am hitting the same error, "non-zero exit status 247". My /metaphlan_databases/ looks exactly like what @levlitichev posted above. @EmilyB did you eventually solve this?

Yes! @levlitichev I believe that was from an earlier version; I remember seeing it on a forum somewhere…

@maxqiu I built the bowtie2 database in a different directory, outside of the default conda install. For some reason that played nicely with the cluster, and I was able to run humann.

I’d like to follow up on both EmilyB’s and levlitichev’s comments. On most HPC clusters, ulimit is configured so that computationally intensive processes cannot run on the login nodes: the login nodes are shared resources, meant only for text editing and small file manipulations. I suspect the exit code 247 failure from bowtie2-build is a symptom of exactly this.
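
An easy way to check is to compare the limits in a login shell with those inside a job (the exact values vary by site):

$ ulimit -a   # run on the login node, then again inside a batch or interactive job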

I had the same error when running metaphlan --install at the Pittsburgh Supercomputing Center (PSC). The problem was that bowtie2-build is invoked with 4 threads (see the error message above), while my default session was set up with a single thread.

At the PSC I normally start an interactive session with:

interact

All I had to do to make bowtie2-build work was to request an interactive session with 4 tasks:

interact -n 4
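
(For anyone on a different scheduler: if your cluster runs SLURM directly, the equivalent would be something like the line below, though the exact flags depend on your site’s configuration.)

$ srun --ntasks=4 --pty bash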

Dear Emily,

I wonder if you could share how you built the bowtie2 database in a different directory? Did you step outside of your conda environment and create it in a completely different folder? Also, did you have to request more CPU / memory for this, and what command did you use, if you remember? I am stuck at this stage and keep getting error 247.

I would be very grateful if anyone could help 🙂 Many thanks.