I’m trying to build a Docker container for running HUMAnN 3. My base container is running Ubuntu 20.04 and I’m using conda version 4.8.3 for the installation. The default Python seems to be v3.8.3, but conda won’t install humann with that version, so I’ve downgraded to Python v3.7.0 (i.e., conda install python=3.7.0).
The channel order is: biobakery, conda-forge, bioconda, defaults. At that point I run:
conda install humann -c biobakery
The HUMAnN 3 wiki suggests that this should handle all the prerequisites, including MetaPhlAn, but from some helpful posts I found that it was installing an older version of MetaPhlAn. So I explicitly installed what I think (from the post) is the correct version of MetaPhlAn and ran the tests. humann_test completes successfully, but the installation fails when I try to run:
humann -i demo.fastq -o sample_results
I am getting this error:
Running metaphlan ........
CRITICAL ERROR: Error executing: /usr/local/miniconda3/bin/metaphlan /usr/local/miniconda3/lib/python3.7/site-packages/humann/tests/data/demo.fastq -t rel_ab -o /gscmnt/gc2732/mitrevalab/USERS_Mitreva/jmartin/200806_testing_humann3/sample_results/demo_humann_temp/demo_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /gscmnt/gc2732/mitrevalab/USERS_Mitreva/jmartin/200806_testing_humann3/sample_results/demo_humann_temp/demo_metaphlan_bowtie2.txt
Error message returned from metaphlan :
No MetaPhlAn BowTie2 database found (--index option)!
Expecting location /usr/local/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901
Exiting...
I am new to HUMAnN 3 and MetaPhlAn 3, and I am not sure that I fully understood the installation instructions. Do I need to explicitly download the MetaPhlAn databases, similar to what I did for the HUMAnN dbs? I’m not sure if I’m doing something wrong or if I missed a step somewhere.
Yes, in the Dockerfile you have to include RUN metaphlan --install in order to install the MetaPhlAn database inside the Docker container; otherwise you can provide a local copy using a volume binding.
Have a look here; we also provide a pre-built Docker image.
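For reference, a minimal sketch of the relevant Dockerfile lines, assuming Miniconda and the channel order from the top of the thread are already set up in the image (the version pins are illustrative, not prescribed):

# Install HUMAnN with the Python version noted earlier in the thread
RUN conda install -y python=3.7.0 && \
    conda install -y humann -c biobakery
# Download the MetaPhlAn marker database into the image at build time
RUN metaphlan --install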
Ah, I would very much prefer using the biobakery/humann image, but when I tried running humann on the demo.fastq I got this error:
Error message returned from metaphlan :
ERROR: Unable to create folder for database install: /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases
The system I am working on has some strict rules about where users can write. I have put in a request for help to our systems group, but I suspect that I am not allowed to write anywhere under /usr/local/lib.
So I think I will need to pre-install the MetaPhlAn db at an explicit path my user can access. Is there some way I can use metaphlan --install with a forced path for the db? Assuming I set that up, I think I would be able to run humann using --metaphlan-options "--bowtie2db <path_to_bowtie2_db>", is that correct?
So basically my question boils down to: how can I install the MetaPhlAn db into an explicit path?
The /usr/local/lib directory is inside the Docker image, so in theory you should be able to write to it, since it is not on the host system.
Yes! You can run metaphlan --install --bowtie2db <path_to_bowtie2_db> to download and build the database in a non-default location, but as you said, remember to pass --metaphlan-options "--bowtie2db <path_to_bowtie2_db>" when running HUMAnN.
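Putting those two commands together, a sketch (the database path here is a hypothetical example; substitute a directory your user can write to):

# Download and build the MetaPhlAn database in a non-default location
metaphlan --install --bowtie2db /my/writable/path/metaphlan_db
# Pass that location through to MetaPhlAn when running HUMAnN
humann -i demo.fastq -o sample_results \
    --metaphlan-options "--bowtie2db /my/writable/path/metaphlan_db"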
I was able to get a successful run using biobakery/humann after doing the custom install of the MetaPhlAn db. Thank you for the help!
I guess the only question I have remaining is: do I have to install any additional DBs to make the biobakery/humann container ready for real data? Assuming I ran metaphlan --install --bowtie2db <my_custom_db_path>, are the base HUMAnN dbs already installed in the container? I ask because, as I mentioned, I had been trying to set up my own container, and in that container I had been running humann_databases to download the DEMO dbs.
I know those are specifically DEMO dbs (at the time I was just trying to get the demo.fastq to run), but will I need to install those in some custom path as well? Or does the biobakery/humann container already have them, so that I only need to install the MetaPhlAn db?
No, the Docker container comes with only the software. I’d suggest using the humann:3.0.0.a.4.plus.utility.dbs image, which also includes the utility mapping databases.
You can specify the path in which the HUMAnN databases will be installed with the last (positional) humann_databases argument, but what I would do is keep both the UniRef and ChocoPhlAn databases in a local directory, mount it via volume binding, and run HUMAnN with the --nucleotide-database and --protein-database parameters to point to the correct database locations.
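For instance, a rough sketch of that volume-binding approach (the host paths here are hypothetical, and I'm assuming the databases were installed into chocophlan/ and uniref/ subdirectories under the mounted folder):

# Mount the host directory holding the downloaded databases into the container,
# then point HUMAnN at the mounted locations
docker run -v /host/humann_dbs:/databases -v /host/data:/data biobakery/humann \
    humann -i /data/sample.fastq -o /data/results \
    --nucleotide-database /databases/chocophlan \
    --protein-database /databases/uniref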
I just tried installing the humann databases as you suggested for:
chocophlan full
uniref uniref90_diamond
utility_mapping full
But when I ran the humann_databases commands to install them, they errored out with the same message in all three cases:
Unable to write to the HUMAnN config file.
As I mentioned before, the environment I’m working in does not allow me to write to any path underneath /usr/local. So my guess is that HUMAnN is trying to update some config file sitting near the executable inside the container (/usr/local/bin, I think?), but even though it’s in a container, our local environment will not let me modify any path under /usr/local.
Is there any way I can move the humann config file to a location I can write to, and then specify that alternate location when I run humann_databases (and, I guess, when I run the full humann)?
I am pretty much restricted to the disk space underneath the volume assigned to my lab; I don’t have write access anywhere outside of that. It doesn’t make sense to me either, since technically the stuff inside the biobakery/humann container is not on our system, but I am told that even though it’s in a container, the infrastructure they have set up (I run the container through an LSF bsub command) prevents me from writing underneath any forbidden path.
Is there any way to get around this? I guess I might be able to make a branch of the biobakery Docker container and manually update the humann config file, if I can find it and figure out the correct formatting. Do you have any ideas that might help me?
Yes, after humann_databases runs, the configuration file is updated to automatically point to the database locations, but if you specify the two parameters I mentioned in the previous post each time, you don’t have to branch the Dockerfile and maintain a custom one.
Hi @John_Martin and @fbeghini, just jumping in on this thread, as there is a humann option that is not mentioned in the user manual because it is not used much, but I think it might be useful in this case. There is an option when downloading the databases that allows you to skip writing to the config file: adding --update-config no will download the database but not update the config file. This is useful if, because of permissions, you can’t write to the config file but need to download and install the databases. Then, since the downloads are not recorded in the config file, just specify the locations of the databases you downloaded when running humann, and you should be all set.
I will get that option added to the user manual today.
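In other words, for the three databases listed earlier in the thread, something like the following (the install path is hypothetical):

# Download the databases without attempting to write the HUMAnN config file
humann_databases --download chocophlan full /my/db/path --update-config no
humann_databases --download uniref uniref90_diamond /my/db/path --update-config no
humann_databases --download utility_mapping full /my/db/path --update-config no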
It would be comforting to see the humann_databases commands finish without error, but on the assumption that the only problem was the failure to update the config, I started a test run with the dbs I downloaded (the ones where I got the message about the config not being updated). So far the process seems to be working with those dbs.
But are you saying that the dbs I downloaded using humann_databases will not be complete/correct in my case (with the permissions issue that produced the error message)? Sorry for being dense here; I just want to be sure I am using this tool correctly.
Hi John, sorry for the confusion. The databases you downloaded should be okay. The last step for the database tool is to update the config, so an error in writing the config file would not affect the databases.
Hi!
I just wanted to note that this whole problem also makes it difficult to use the container with Singularity, as Singularity containers are read-only. I guess for MetaPhlAn this was solved after this issue was raised:
The current humann3 container also needs the above workarounds when used with Singularity in order to install the databases and get MetaPhlAn to work.
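As a rough sketch of the workaround under Singularity (the image name and host paths are hypothetical; the flags are the ones discussed above):

# Bind-mount the pre-downloaded databases, since the container itself is read-only
singularity exec --bind /host/humann_dbs:/databases humann.sif \
    humann -i sample.fastq -o results \
    --nucleotide-database /databases/chocophlan \
    --protein-database /databases/uniref \
    --metaphlan-options "--bowtie2db /databases/metaphlan_db"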
I installed via conda using the tutorial and am having the same problem; however, I cannot get it to run after downloading the database separately. I went to Zenodo and downloaded the .tar, .md5, and mpa_latest files. MetaPhlAn by default tries to fetch the database from Dropbox and fails. I tried pointing it to another location via:
humann -i demo.fastq -o sample_results --metaphlan-options '--bowtie2db /data/software/Reference_data/metaphlan/mpa_v30_CHOCOPhlAn_201901'
Output files will be written to: /data/butlerr/rosmap/pfc_rnaseq/sample_results
WARNING: Can not call software version for bowtie2
Running metaphlan ........
CRITICAL ERROR: Error executing: /data/butlerr/miniconda3/envs/biobakery3/bin/metaphlan /data/butlerr/rosmap/pfc_rnaseq/demo.fastq --bowtie2db /data/software/Reference_data/metaphlan/mpa_v30_CHOCOPhlAn_201901 -o /data/butlerr/rosmap/pfc_rnaseq/sample_results/demo_humann_temp/demo_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /data/butlerr/rosmap/pfc_rnaseq/sample_results/demo_humann_temp/demo_metaphlan_bowtie2.txt
Error message returned from metaphlan :
Downloading https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAA4XDP85WHon_eHvztxkamTa/file_list.txt?dl=1
Warning: Unable to download https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAA4XDP85WHon_eHvztxkamTa/file_list.txt?dl=1
...
I also tried copying the three files into the miniconda3/envs/biobakery3/lib/python3.7/site-packages/metaphlan/metaphlan_databases directory; it still tries to download from Dropbox.
Hi,
You have to specify the name of the desired database using --metaphlan-options "-x <dbname>".
Only then, if all the required files are present, will MetaPhlAn skip downloading the tar.
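For example, adapting the command from earlier in the thread (this assumes the downloaded files sit in /data/software/Reference_data/metaphlan, with mpa_v30_CHOCOPhlAn_201901 given as the index name rather than as part of the path):

humann -i demo.fastq -o sample_results \
    --metaphlan-options "--bowtie2db /data/software/Reference_data/metaphlan -x mpa_v30_CHOCOPhlAn_201901"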
To clarify, I did get it working with the updated command; as you said, I had forgotten the -x. There was talk before of including Zenodo as a backup if Dropbox failed. Is that still a potential feature? Or hosting it via huttenhower.sph.harvard.edu?
Yes, we have implemented the ability to fetch the database from Zenodo if the download from Dropbox fails. We’ll look into hosting the database on our servers to avoid such problems.
It’d be great if this pattern were easy to find in the humann documentation. Using these options seems to be required when using conda, to prevent MetaPhlAn from installing databases into the conda env. I’m running humann3 via Snakemake and using conda environments.
Hi all, I’m having this issue using Singularity. Although I provide the path to the database (which I keep in a different folder) with --nucleotide-database, it fails with this error:
ERROR: Unable to create folder for database install: /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases
Could it be that MetaPhlAn tries to make an index or something like that?
I am running this on a cluster where I am unable to install HUMAnN and make it work correctly because of MetaPhlAn; that is why we were trying Singularity, but again we have problems.
Do you know why this is happening? I’ve tried the possible solutions proposed here, but without success.
Many thanks in advance.