The bioBakery help forum

Difficulties installing humann3 & metaphlan3

I’m trying to build a Docker container for running humann3. My base container is running Ubuntu 20.04 and I’m using conda version 4.8.3 for the installation. The default Python seems to be v3.8.3, but conda won’t install humann with that version, so I’ve downgraded to Python v3.7.0 (i.e. conda install python=3.7.0).

The channel order is: biobakery, conda-forge, bioconda, defaults. With that set, I run:

conda install humann -c biobakery
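For reference, the channel order described above could be set up roughly like this (a sketch, not taken from the thread; conda config --add prepends, so the highest-priority channel is added last):

```shell
# Hypothetical reconstruction of the channel setup described above.
# `conda config --add channels` prepends, so biobakery (added last)
# ends up with the highest priority.
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --add channels biobakery

# Downgrade Python, then install HUMAnN as in the post:
conda install python=3.7.0
conda install humann -c biobakery
```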

The Humann3 wiki suggests that this should handle all the prerequisites, including metaphlan. But from some helpful posts I found that it was installing an older version of metaphlan, so I explicitly installed what I think (from the post) is the correct version of metaphlan:

conda install metaphlan=3.0=pyh5ca1d4c_2 --no-channel-priority

I then install the DEMO dbs for testing:

humann_databases --download chocophlan DEMO humann_dbs
humann_databases --download uniref DEMO_diamond humann_dbs

and run the tests. humann_test completes successfully, but my installation fails when I try to run:

humann -i demo.fastq -o sample_results

I am getting this error:

Running metaphlan ........

CRITICAL ERROR: Error executing: /usr/local/miniconda3/bin/metaphlan /usr/local/miniconda3/lib/python3.7/site-packages/humann/tests/data/demo.fastq -t rel_ab -o /gscmnt/gc2732/mitrevalab/USERS_Mitreva/jmartin/200806_testing_humann3/sample_results/demo_humann_temp/demo_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /gscmnt/gc2732/mitrevalab/USERS_Mitreva/jmartin/200806_testing_humann3/sample_results/demo_humann_temp/demo_metaphlan_bowtie2.txt

Error message returned from metaphlan :
No MetaPhlAn BowTie2 database found (--index option)!
Expecting location /usr/local/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901
Exiting...

I am new to humann3 & metaphlan3 and I am not sure that I fully understood the installation instructions. Do I need to explicitly download the metaphlan databases, similar to what I did for the humann DBs? I’m not sure if I’m doing something wrong or if I missed a step somewhere.

Yes, in the Dockerfile you have to include RUN metaphlan --install to install the MetaPhlAn database inside the Docker container; otherwise, you can provide a local copy using volume binding.
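As a minimal sketch of the volume-binding alternative (the host path here is a hypothetical example, not from the thread):

```shell
# Option A (build time): add this line to the Dockerfile:
#   RUN metaphlan --install
#
# Option B (run time): mount a local copy of the MetaPhlAn database
# into the container instead. $HOME/metaphlan_db is a hypothetical
# host directory that already holds the database files.
docker run --rm \
  -v "$HOME/metaphlan_db:/opt/metaphlan_db" \
  biobakery/humann \
  metaphlan --help
```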

Have a look here; we also provide a pre-built Docker image.

Ah, I would very much prefer using the biobakery/humann image, but when I tried running humann on demo.fastq I got this error:

Error message returned from metaphlan :
ERROR: Unable to create folder for database install: /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases

The system I am working on has some strict rules about where users can write. I have put in a request for help to our systems group, but I suspect that I am not allowed to write anywhere under /usr/local/lib.

So I think I will need to pre-install the MetaPhlAn DB at an explicit path my user can access. Is there some way I can use metaphlan --install with a forced path for the DB? Assuming I set that up, I think I would be able to run humann using --metaphlan-options "--bowtie2db <path_to_bowtie2_db>", is that correct?

So basically my question boils down to: how can I install the metaphlan DB into an explicit path?

Thanks,
John

The /usr/local/lib path is inside the Docker image, so in theory you should be able to write to it, since it is not the host system.

Yes! You can run metaphlan --install --bowtie2db <path_to_bowtie2_db> to download and build the database in a non-default location, but as you said, keep in mind to use --metaphlan-options "--bowtie2db <path_to_bowtie2_db>" when running HUMAnN.
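Put together as a shell sketch (DB_DIR is a placeholder for whatever writable path you choose):

```shell
# DB_DIR is a hypothetical writable location under the lab volume:
DB_DIR=/my/lab/volume/metaphlan_db

# Download and build the MetaPhlAn database in that custom location:
metaphlan --install --bowtie2db "$DB_DIR"

# Then point HUMAnN at the same location when running:
humann -i demo.fastq -o sample_results \
  --metaphlan-options "--bowtie2db $DB_DIR"
```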

I was able to get a successful run using biobakery/humann after doing the custom install of the metaphlan db, thank you for the help!

I guess the only question I have remaining is: do I have to install any additional DBs to make the biobakery/humann container ready for real data? Assuming I ran metaphlan --install --bowtie2db <my_custom_db_path>, are the base HUMAnN DBs already installed in the container? I ask because, as I mentioned, I had been trying to set up my own container, and in that container I was running:

humann_databases --download chocophlan DEMO humann_dbs
humann_databases --download uniref DEMO_diamond humann_dbs

I know those are specifically DEMO DBs (at the time I was just trying to get demo.fastq to run), but will I need to install those in some custom path as well? Or does the biobakery/humann container already have them, so I only need to install the metaphlan DB?

No, the Docker container comes with only the software. I’d suggest using the humann:3.0.0.a.4.plus.utility.dbs image, which also includes the mapping utilities.

You can specify the path in which the HUMAnN databases will be installed with the last humann_databases parameter. But what I would do is keep both the UniRef and CHOCOPhlAn databases in a local directory, mount it via volume binding, and run HUMAnN with the --nucleotide-database and --protein-database parameters to point to the correct database locations.
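A hedged sketch of that setup (all host paths and the input filename here are hypothetical examples):

```shell
# Host directory holding the downloaded HUMAnN databases
# (hypothetical path on the lab volume):
HOST_DBS=/my/lab/volume/humann_dbs
# Host directory holding the input reads (also hypothetical):
HOST_DATA=/my/lab/volume/data

# Mount both into the container and point HUMAnN at the databases:
docker run --rm \
  -v "$HOST_DBS:/humann_dbs" \
  -v "$HOST_DATA:/data" \
  biobakery/humann \
  humann -i /data/sample.fastq -o /data/sample_results \
    --nucleotide-database /humann_dbs/chocophlan \
    --protein-database /humann_dbs/uniref
```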

I just tried installing the humann databases as you suggested for:

chocophlan full
uniref uniref90_diamond
utility_mapping full

But when I ran the humann_databases command lines to install them, they errored out with the same message in all 3 cases:

Unable to write to the HUMAnN config file.

As I mentioned before, the environment I’m working in does not allow me to write along any path under /usr/local. My guess is that HUMAnN is trying to update some config file sitting near the executable inside the container (/usr/local/bin, I think?), but even though it’s in a container, our local environment will not let me modify any path under /usr/local.

Is there any way I can move the humann config file to a location I can write to, and then specify that alternate location when I run humann_databases (and, I guess, when I run the full humann)?

I am pretty much restricted to the disk space under the volume assigned to my lab; I don’t have write access anywhere outside of that. It doesn’t make sense to me either, since technically the stuff inside the biobakery/humann container is not on our system, but I am told that even though it’s in a container, the infrastructure they have set up (I run the container through an LSF bsub command) prevents me from writing under any forbidden path.

Is there any way to get around this? I guess I might be able to make a branch of the biobakery Docker container and manually update the humann config file, if I can find it and figure out the correct formatting. Do you have any ideas that might help me?

Yes, after humann_databases runs, the configuration file is updated to automatically point to the database locations. But if you specify the two parameters I mentioned in the previous post each time, you don’t have to branch the Dockerfile and maintain a custom one.

Hi @John_Martin and @fbeghini, just jumping in on this thread, as there is a humann option that is not mentioned in the user manual (because it is not used much) but I think might be useful in this case. There is an option when downloading the databases that will allow you to skip writing to the config file: adding --update-config no will download the database but not update the config file. This is useful if, because of permissions, you can’t write to the config file but need to download and install the databases. Then, since the downloads are not in the config file, just specify the locations of the databases you have downloaded when running humann and you should be all set.
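Combining this option with the full-database downloads mentioned earlier, a sketch of a permissions-safe install might look like this (DB_DIR is a placeholder for a writable path):

```shell
# DB_DIR is a hypothetical writable location under the lab volume:
DB_DIR=/my/lab/volume/humann_dbs

# Download the full databases without touching the config file:
humann_databases --download chocophlan full "$DB_DIR" --update-config no
humann_databases --download uniref uniref90_diamond "$DB_DIR" --update-config no
humann_databases --download utility_mapping full "$DB_DIR" --update-config no
```

Since the config file is never updated, each humann run then needs the database locations passed explicitly via --nucleotide-database and --protein-database.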

I will get that option added to the user manual today.

Thank you,
Lauren

It would be comforting to see the humann_databases commands finish without error. But I started a test run with the DBs I downloaded (where I got the message about the config not being updated), on the assumption that the only problem was the failure to update the config. So far the process seems to be working with those DBs.

But are you saying that the DBs I downloaded using humann_databases will not be complete/correct in my case (with the permissions issue resulting in the error message I got)? Sorry for being dense here; I just want to be sure I am using this tool correctly.

Hi John, sorry for the confusion. The databases you downloaded should be okay. The last step of the database tool is to update the config, so an error in writing the config file would not affect the databases.

Thank you,
Lauren

Thanks for the quick reply! And I wanted to add that my test run (using the full dbs) did work. I appreciate all the help!

Hi!
I just wanted to note that this whole problem also makes it difficult to use the container with Singularity, as Singularity containers are read-only. I guess for metaphlan this was solved after this issue was raised:


The current humann3 container also needs the above workarounds when used with Singularity in order to install the databases and get metaphlan to work.

Cheers