Difficulties installing humann3 & metaphlan3

I’m trying to build a docker container for running humann3. My base container is running ubuntu 20.04 and I’m using conda version 4.8.3 for the installation. The default Python seems to be v3.8.3, but conda won’t install humann with that version, so I’ve downgraded to Python v3.7.0 (i.e., conda install python=3.7.0).

The repo order is: biobakery, conda-forge, bioconda, defaults. At that point I run:

conda install humann -c biobakery

The Humann3 wiki suggests that this should handle all the pre-requisites including metaphlan. But from some helpful posts I found that it was installing an older version of metaphlan. So I explicitly installed what I think (from the post) is the correct version of metaphlan:

conda install metaphlan=3.0=pyh5ca1d4c_2 --no-channel-priority

I then install the DEMO dbs for testing:

humann_databases --download chocophlan DEMO humann_dbs
humann_databases --download uniref DEMO_diamond humann_dbs

and run the tests. humann_test completes successfully. But then my installation is failing when I try to run:

humann -i demo.fastq -o sample_results

I am getting this error:

Running metaphlan ........

CRITICAL ERROR: Error executing: /usr/local/miniconda3/bin/metaphlan /usr/local/miniconda3/lib/python3.7/site-packages/humann/tests/data/demo.fastq -t rel_ab -o /gscmnt/gc2732/mitrevalab/USERS_Mitreva/jmartin/200806_testing_humann3/sample_results/demo_humann_temp/demo_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /gscmnt/gc2732/mitrevalab/USERS_Mitreva/jmartin/200806_testing_humann3/sample_results/demo_humann_temp/demo_metaphlan_bowtie2.txt

Error message returned from metaphlan :
No MetaPhlAn BowTie2 database found (--index option)!
Expecting location /usr/local/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901
Exiting...

I am new to humann3 & metaphlan3 and I am not sure that I fully understood the installation instructions. Do I need to explicitly download metaphlan databases similar to what I did for the humann dbs? I’m not sure if I’m doing something wrong or if I missed a step somewhere.

Yes, in the Dockerfile you have to include RUN metaphlan --install so that the MetaPhlAn database is installed inside the Docker container. Otherwise, you can provide a local copy using volume binding.

Have a look here; we also provide a pre-built Docker image.
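As a rough illustration of the volume-binding route, the sketch below mounts a host directory that already contains the MetaPhlAn database over the location the container expects. The image name, the host directory, and the /work mount point are hypothetical; the container-side path is the one from the error message earlier in this thread and will differ between images.

```shell
# Hypothetical image name and host directory; adjust both to your setup.
IMAGE="${IMAGE:-my-humann-image}"
HOST_MPA_DB="${HOST_MPA_DB:-$PWD/metaphlan_databases}"
# Location MetaPhlAn reported as "Expecting location ..." in the error above
# (this is an assumption for any other image; check its own error message).
CONTAINER_MPA_DB="${CONTAINER_MPA_DB:-/usr/local/miniconda3/lib/python3.7/site-packages/metaphlan/metaphlan_databases}"

humann_with_bound_db() {
    # $1 = input FASTQ relative to the current directory, $2 = output directory
    docker run --rm \
        -v "$HOST_MPA_DB:$CONTAINER_MPA_DB" \
        -v "$PWD:/work" -w /work \
        "$IMAGE" \
        humann -i "$1" -o "$2"
}
```

Calling humann_with_bound_db demo.fastq sample_results would then run HUMAnN with the host copy of the database visible at the default location.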

Ah, I would very much prefer using the biobakery/humann image, but when I tried running humann on the demo.fastq I got this error:

Error message returned from metaphlan :
ERROR: Unable to create folder for database install: /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases

The system I am working on has some strict rules about where users can write into. I have put in a request for help to our systems group but I suspect that I am not allowed to write anywhere under /usr/local/lib.

So I think I will need to pre-install the metaphlan db at an explicit path my user can access. Is there some way I can use metaphlan --install with a forced path for the db? Assuming I set that up, I think I would be able to run humann using --metaphlan-options "--bowtie2db <path_to_bowtie2_db>", is that correct?

So basically my question boils down to how can I install the metaphlan db into an explicit path.

Thanks,
John

The /usr/local/lib path is inside the Docker image, so in theory you should be able to write to it, since it is not on the host system.

Yes! You can run metaphlan --install --bowtie2db <path_to_bowtie2_db> to download and build the database in a non-default location but, as you said, remember to pass --metaphlan-options "--bowtie2db <path_to_bowtie2_db>" when running HUMAnN.
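Put together, the two steps might look like the sketch below. The directory in MPA_DB_DIR and the function names are hypothetical; the metaphlan and humann options are the ones named in this post. Because the path is embedded in a single option string, this sketch assumes it contains no spaces.

```shell
# Hypothetical writable location for the MetaPhlAn database.
MPA_DB_DIR="${MPA_DB_DIR:-$PWD/metaphlan_db}"

install_metaphlan_db() {
    # Download and build the database in the non-default location.
    mkdir -p "$MPA_DB_DIR"
    metaphlan --install --bowtie2db "$MPA_DB_DIR"
}

humann_with_custom_metaphlan_db() {
    # $1 = input FASTQ, $2 = output directory
    # The database path is forwarded to MetaPhlAn through --metaphlan-options.
    humann -i "$1" -o "$2" --metaphlan-options "--bowtie2db $MPA_DB_DIR"
}
```

Run install_metaphlan_db once, then humann_with_custom_metaphlan_db demo.fastq sample_results for each sample.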

I was able to get a successful run using biobakery/humann after doing the custom install of the metaphlan db, thank you for the help!

I guess the only question I have remaining is: do I have to install any additional DB to make the biobakery/humann container ready for real data? Assuming I ran metaphlan --install --bowtie2db <my_custom_db_path>, are the base Humann dbs already installed in the container? I ask because, as I mentioned, I had been trying to set up my own container and in that container I was running:

humann_databases --download chocophlan DEMO humann_dbs
humann_databases --download uniref DEMO_diamond humann_dbs

I know those are specifically DEMO dbs (at the time I was just trying to get the demo.fastq to run), but will I need to install those in some custom path as well? Or does the biobakery/humann container already have them and I only need to install the metaphlan db?

No, the Docker container comes with only the software. I’d suggest using the humann:3.0.0.a.4.plus.utility.dbs image, which also includes the mapping utilities.

You can specify the path in which the HUMAnN databases will be installed with the last humann_databases parameter. What I would do, though, is keep both the UniRef and CHOCOPhlAn databases in a local directory, mount it via volume binding, and run HUMAnN with the --nucleotide-database and --protein-database parameters so that it points to the correct database locations.
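A minimal sketch of passing both locations explicitly on each run follows. HUMANN_DB_DIR and the function name are hypothetical, and the chocophlan and uniref subdirectory names are an assumption about how humann_databases lays out its downloads; check the directory after downloading.

```shell
# Hypothetical directory that was given as the last humann_databases argument.
HUMANN_DB_DIR="${HUMANN_DB_DIR:-$PWD/humann_dbs}"

humann_with_explicit_dbs() {
    # $1 = input FASTQ, $2 = output directory
    # Subdirectory names below are assumed, not confirmed in this thread.
    humann -i "$1" -o "$2" \
        --nucleotide-database "$HUMANN_DB_DIR/chocophlan" \
        --protein-database "$HUMANN_DB_DIR/uniref"
}
```

Inside a container, HUMANN_DB_DIR would be the mount point of the bound volume rather than the host path.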

I just tried installing the humann databases as you suggested for:

chocophlan full
uniref uniref90_diamond
utility_mapping full

But when I ran the humann_databases command-lines to install them they errored out with the same message in all 3 cases:

Unable to write to the HUMAnN config file.

As I mentioned before, the environment I’m working in does not allow me to write along any path underneath /usr/local. So my guess is that Humann is trying to update some config file sitting near the executable inside the container (/usr/local/bin I think?), but even though it’s in a container, our local environment will not let me modify any path under /usr/local.

Is there any way I can move the humann config file to a location I can write into, and then specify that alternate location when I run humann_databases (and I guess when I run the full humann)?

I am pretty much restricted to the disk space underneath the volume assigned to my lab. I don’t have write access anywhere outside of that. It doesn’t make sense to me either, since technically the stuff inside the biobakery/humann container is not on our system, but I am told that even though it’s in a container, the infrastructure they have set up (I run the container through an LSF bsub command) prevents me from writing anywhere underneath any forbidden path.

Is there any way to get around this? I guess I might be able to make a branch of the biobakery docker container and manually update the humann config file if I can find it, and if I can figure out the correct formatting. Do you have any ideas that might help me?

Yes, after humann_databases runs, the configuration file is updated so that it points automatically to the database locations. But if you specify the two parameters I mentioned in the previous post each time, you don’t have to branch the Dockerfile and maintain a custom one.

Hi @John_Martin and @fbeghini, Just jumping in on this thread as there is a humann option that is not mentioned in the user manual because it is not used much but I think it might be useful in this case. There is an option when downloading the databases that will allow you to not write to the config file. Adding the option --update-config no will download the database but not update the config file. This is useful if because of permissions you can’t write to the config file but need to download and install the databases. Then since the downloads are not in the config file just specify the locations to the databases you have downloaded when running humann and you should be all set.

I will get that option added to the user manual today.
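For reference, a sketch of the three downloads from earlier in this thread with the config update switched off might look like this. The function name and the target directory argument are hypothetical; the database names and the --update-config no option are the ones mentioned in this thread.

```shell
download_humann_dbs() {
    # $1 = writable directory that will receive the databases (hypothetical)
    db_dir="$1"
    mkdir -p "$db_dir"
    # --update-config no: download only, leave the HUMAnN config file untouched.
    humann_databases --download chocophlan full "$db_dir" --update-config no
    humann_databases --download uniref uniref90_diamond "$db_dir" --update-config no
    humann_databases --download utility_mapping full "$db_dir" --update-config no
}
```

Since the config file is not updated, the database locations then have to be given on the humann command line for each run.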

Thank you,
Lauren

It would be comforting to see the humann_databases commands finish without error. But I started a test run with the dbs I downloaded (the ones where I got the message about the config not being updated), on the assumption that the only problem was the failure to update the config. So far the process seems to be working using those dbs.

But are you saying that the dbs I downloaded using humann_databases will not be complete/correct in my case (with the permissions issue resulting in the error message I got)? Sorry for being dense here, I just want to be sure I am using this tool correctly

Hi John, Sorry for the confusion. The databases you downloaded should be okay. The last step for the database tool is to update the config so an error in writing the config file would not affect the databases.

Thank you,
Lauren

Thanks for the quick reply! And I wanted to add that my test run (using the full dbs) did work. I appreciate all the help!

Hi!
I just wanted to note this whole problem also makes it difficult to use the container with Singularity, as Singularity containers are read only. I guess for metaphlan this was solved after this issue was raised:


The current humann3 container also needs the above workarounds when used in Singularity to install the databases and get metaphlan to work.

Cheers

Installed via conda using the tutorial. Having the same problem, however, I cannot get it to run after downloading the database separately. I went to zenodo and downloaded the .tar, .md5, and the mpa_latest files. Metaphlan by default tries to get it from dropbox and fails. I tried pointing it to another location via:

 humann -i demo.fastq -o sample_results --metaphlan-options '--bowtie2db /data/software/Reference_data/metaphlan/mpa_v30_CHOCOPhlAn_201901'
Output files will be written to: /data/butlerr/rosmap/pfc_rnaseq/sample_results
WARNING: Can not call software version for bowtie2


Running metaphlan ........

CRITICAL ERROR: Error executing: /data/butlerr/miniconda3/envs/biobakery3/bin/metaphlan /data/butlerr/rosmap/pfc_rnaseq/demo.fastq --bowtie2db /data/software/Reference_data/metaphlan/mpa_v30_CHOCOPhlAn_201901 -o /data/butlerr/rosmap/pfc_rnaseq/sample_results/demo_humann_temp/demo_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /data/butlerr/rosmap/pfc_rnaseq/sample_results/demo_humann_temp/demo_metaphlan_bowtie2.txt

Error message returned from metaphlan :

Downloading https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAA4XDP85WHon_eHvztxkamTa/file_list.txt?dl=1

Warning: Unable to download https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAA4XDP85WHon_eHvztxkamTa/file_list.txt?dl=1
...

I also tried copying the three files into the miniconda3/envs/biobakery3/lib/python3.7/site-packages/metaphlan/metaphlan_databases directory. Still tries to get dropbox.

conda env:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
bcbio-gff                 0.6.6              pyh864c0ab_1    bioconda
biom-format               2.1.8            py37hc1659b7_0    conda-forge
biopython                 1.78             py37h8f50634_0    conda-forge
blast                     2.10.1          pl526he19e7b1_1    bioconda
boost-cpp                 1.70.0               h7b93d67_3    conda-forge
bowtie2                   2.4.1            py37h8270d21_3    bioconda
brotlipy                  0.7.0           py37h8f50634_1000    conda-forge
bx-python                 0.8.9            py37h73d7ac5_2    bioconda
bzip2                     1.0.8                h516909a_3    conda-forge
c-ares                    1.16.1               h516909a_3    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
capnproto                 0.6.1                hfc679d8_1    conda-forge
certifi                   2020.6.20        py37hc8dfbb8_0    conda-forge
cffi                      1.14.3           py37he30daa8_0
chardet                   3.0.4           py37hc8dfbb8_1007    conda-forge
click                     7.1.2              pyh9f0ad1d_0    conda-forge
cmseq                     1.0.2              pyh7b7c402_0    bioconda
cryptography              3.1.1            py37hb09aad4_0    conda-forge
curl                      7.71.1               he644dc0_6    conda-forge
cycler                    0.10.0                     py_2    conda-forge
dendropy                  4.4.0              pyh864c0ab_2    bioconda
diamond                   2.0.4                h56fc30b_0    bioconda
entrez-direct             13.9            pl526h375a9b1_0    bioconda
expat                     2.2.9                he1b5a44_2    conda-forge
fasttree                  2.1.10               h516909a_4    bioconda
freetype                  2.10.2               he06d7ca_0    conda-forge
future                    0.18.2           py37hc8dfbb8_1    conda-forge
glpk                      4.65              he80fd80_1002    conda-forge
gmp                       6.2.0                he1b5a44_2    conda-forge
gsl                       2.6                  h294904e_0    conda-forge
h5py                      2.10.0          nompi_py37h90cd8ad_104    conda-forge
hdf5                      1.10.6          nompi_h3c11f04_101    conda-forge
htslib                    1.10.2               hd3b49d5_1    bioconda
humann                    3.0.0.alpha.3    py37h83b1523_0    biobakery
icu                       67.1                 he1b5a44_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
iqtree                    2.0.3                h176a8bc_0    bioconda
jpeg                      9d                   h516909a_0    conda-forge
kiwisolver                1.2.0            py37h99015e2_0    conda-forge
krb5                      1.17.1               hfafb76e_3    conda-forge
lcms2                     2.11                 hbd6801e_0    conda-forge
ld_impl_linux-64          2.33.1               h53a641e_7
libblas                   3.8.0               17_openblas    conda-forge
libcblas                  3.8.0               17_openblas    conda-forge
libcurl                   7.71.1               hcdd3856_6    conda-forge
libdeflate                1.6                  h516909a_0    conda-forge
libedit                   3.1.20191231         h14c3975_1
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.3                  he6710b0_2
libgcc-ng                 9.1.0                hdf63c60_0
libgfortran-ng            7.5.0               hdf63c60_16    conda-forge
liblapack                 3.8.0               17_openblas    conda-forge
libnghttp2                1.41.0               h8cfc5f6_2    conda-forge
libopenblas               0.3.10          pthreads_hb3c22a3_4    conda-forge
libpng                    1.6.37               hed695b0_2    conda-forge
libssh2                   1.9.0                hab1572f_5    conda-forge
libstdcxx-ng              9.1.0                hdf63c60_0
libtiff                   4.1.0                hc7e4089_6    conda-forge
libwebp-base              1.1.0                h516909a_3    conda-forge
lz4-c                     1.9.2                he1b5a44_3    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mafft                     7.471                h516909a_0    bioconda
mash                      2.2.2                ha61e061_2    bioconda
matplotlib-base           3.3.2            py37hd478181_0    conda-forge
metaphlan                 3.0.4              pyh7b7c402_0    bioconda
muscle                    3.8.1551             hc9558a2_5    bioconda
ncurses                   6.2                  he6710b0_1
numpy                     1.19.1           py37h7ea13bd_2    conda-forge
olefile                   0.46                       py_0    conda-forge
openssl                   1.1.1h               h516909a_0    conda-forge
pandas                    1.1.2            py37h3340039_0    conda-forge
patsy                     0.5.1                      py_0    conda-forge
pcre                      8.44                 he1b5a44_0    conda-forge
perl                      5.26.2            h516909a_1006    conda-forge
perl-app-cpanminus        1.7044                  pl526_1    bioconda
perl-archive-tar          2.32                    pl526_0    bioconda
perl-base                 2.23                    pl526_1    bioconda
perl-business-isbn        3.004                   pl526_0    bioconda
perl-business-isbn-data   20140910.003            pl526_0    bioconda
perl-carp                 1.38                    pl526_3    bioconda
perl-common-sense         3.74                    pl526_2    bioconda
perl-compress-raw-bzip2   2.087           pl526he1b5a44_0    bioconda
perl-compress-raw-zlib    2.087           pl526hc9558a2_0    bioconda
perl-constant             1.33                    pl526_1    bioconda
perl-data-dumper          2.173                   pl526_0    bioconda
perl-digest-hmac          1.03                    pl526_3    bioconda
perl-digest-md5           2.55                    pl526_0    bioconda
perl-encode               2.88                    pl526_1    bioconda
perl-encode-locale        1.05                    pl526_6    bioconda
perl-exporter             5.72                    pl526_1    bioconda
perl-exporter-tiny        1.002001                pl526_0    bioconda
perl-extutils-makemaker   7.36                    pl526_1    bioconda
perl-file-listing         6.04                    pl526_1    bioconda
perl-file-path            2.16                    pl526_0    bioconda
perl-file-temp            0.2304                  pl526_2    bioconda
perl-html-parser          3.72            pl526h6bb024c_5    bioconda
perl-html-tagset          3.20                    pl526_3    bioconda
perl-html-tree            5.07                    pl526_1    bioconda
perl-http-cookies         6.04                    pl526_0    bioconda
perl-http-daemon          6.01                    pl526_1    bioconda
perl-http-date            6.02                    pl526_3    bioconda
perl-http-message         6.18                    pl526_0    bioconda
perl-http-negotiate       6.01                    pl526_3    bioconda
perl-io-compress          2.087           pl526he1b5a44_0    bioconda
perl-io-html              1.001                   pl526_2    bioconda
perl-io-socket-ssl        2.066                   pl526_0    bioconda
perl-io-zlib              1.10                    pl526_2    bioconda
perl-json                 4.02                    pl526_0    bioconda
perl-json-xs              2.34            pl526h6bb024c_3    bioconda
perl-libwww-perl          6.39                    pl526_0    bioconda
perl-list-moreutils       0.428                   pl526_1    bioconda
perl-list-moreutils-xs    0.428                   pl526_0    bioconda
perl-lwp-mediatypes       6.04                    pl526_0    bioconda
perl-lwp-protocol-https   6.07                    pl526_4    bioconda
perl-mime-base64          3.15                    pl526_1    bioconda
perl-mozilla-ca           20180117                pl526_1    bioconda
perl-net-http             6.19                    pl526_0    bioconda
perl-net-ssleay           1.88            pl526h90d6eec_0    bioconda
perl-ntlm                 1.09                    pl526_4    bioconda
perl-parent               0.236                   pl526_1    bioconda
perl-pathtools            3.75            pl526h14c3975_1    bioconda
perl-scalar-list-utils    1.52            pl526h516909a_0    bioconda
perl-socket               2.027                   pl526_1    bioconda
perl-storable             3.15            pl526h14c3975_0    bioconda
perl-test-requiresinternet 0.05                    pl526_0    bioconda
perl-time-local           1.28                    pl526_1    bioconda
perl-try-tiny             0.30                    pl526_1    bioconda
perl-types-serialiser     1.0                     pl526_2    bioconda
perl-uri                  1.76                    pl526_0    bioconda
perl-www-robotrules       6.02                    pl526_3    bioconda
perl-xml-namespacesupport 1.12                    pl526_0    bioconda
perl-xml-parser           2.44_01         pl526ha1d75be_1002    conda-forge
perl-xml-sax              1.02                    pl526_0    bioconda
perl-xml-sax-base         1.09                    pl526_0    bioconda
perl-xml-sax-expat        0.51                    pl526_3    bioconda
perl-xml-simple           2.25                    pl526_1    bioconda
perl-xsloader             0.24                    pl526_0    bioconda
phylophlan                3.0                        py_7    bioconda
pillow                    7.2.0            py37h718be6c_1    conda-forge
pip                       20.2.2                   py37_0
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pyopenssl                 19.1.0                     py_1    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pysam                     0.16.0.1         py37hc334e0b_1    bioconda
pysocks                   1.7.1            py37hc8dfbb8_1    conda-forge
python                    3.7.9                h7579374_0
python-dateutil           2.8.1                      py_0    conda-forge
python-lzo                1.12            py37h81344f2_1001    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytz                      2020.1             pyh9f0ad1d_0    conda-forge
raxml                     8.2.12               h516909a_2    bioconda
readline                  8.0                  h7b6447c_0
requests                  2.24.0             pyh9f0ad1d_0    conda-forge
samtools                  1.10                 h2e538c0_3    bioconda
scipy                     1.5.2            py37hb14ef9d_0    conda-forge
seaborn                   0.11.0                        0    conda-forge
seaborn-base              0.11.0                     py_0    conda-forge
setuptools                49.6.0                   py37_1
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.33.0               h62c20be_0
statsmodels               0.12.0           py37h8f50634_0    conda-forge
tbb                       2020.2               hc9558a2_0    conda-forge
tk                        8.6.10               hbc83047_0
tornado                   6.0.4            py37h8f50634_1    conda-forge
trimal                    1.4.1                hc9558a2_4    bioconda
urllib3                   1.25.10                    py_0    conda-forge
wheel                     0.35.1                     py_0
xz                        5.2.5                h7b6447c_0
zlib                      1.2.11               h7b6447c_3
zstd                      1.4.5                h6597ccf_2    conda-forge

Update:

here=$(pwd)
condapath=$(conda info --base)
zen='https://zenodo.org/record/3957592/files'
dbpa='envs/biobakery3/lib/python3.7/site-packages/metaphlan/metaphlan_databases'
cd ${condapath}/${dbpa}
wget ${zen}/mpa_latest
wget ${zen}/mpa_v30_CHOCOPhlAn_201901.md5
wget ${zen}/mpa_v30_CHOCOPhlAn_201901.tar
wget ${zen}/mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
tar -xvf mpa_v30_CHOCOPhlAn_201901.tar
bunzip2 *.bz2
bowtie2-build mpa_v30_CHOCOPhlAn_201901.fna mpa_v30_CHOCOPhlAn_201901
cd $here

# then remember -x switch with the bowtie2db switch
humann -i demo.fastq -o sample_results --metaphlan-options \
  "--bowtie2db ${condapath}/${dbpa} -x mpa_v30_CHOCOPhlAn_201901"

Hi,
You have to specify the name of the desired database using --metaphlan-options "-x <dbname>". Only then, and only if all the required files are present, will it skip trying to download the tar.
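A quick way to check the "all the required files are present" condition before running is sketched below. The function name is hypothetical, and the file list (the .pkl file plus six Bowtie2 index files) is an assumption about what MetaPhlAn 3 looks for, not something confirmed in this thread.

```shell
mpa_db_is_complete() {
    # $1 = database directory, $2 = database name, e.g. mpa_v30_CHOCOPhlAn_201901
    # Assumed file set: <name>.pkl plus the six small-index Bowtie2 files.
    for suffix in pkl 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -s "$1/$2.$suffix" ]; then
            echo "missing or empty: $1/$2.$suffix" >&2
            return 1
        fi
    done
    return 0
}
```

For example, mpa_db_is_complete /path/to/metaphlan_databases mpa_v30_CHOCOPhlAn_201901 returns non-zero and names the first missing file if the manual download or bowtie2-build step was incomplete.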

To clarify, I did get it working with the updated command. As you said, I forgot the -x. There was talk before of including zenodo as a backup if dropbox failed. Is that still a potential feature? Or hosting it via huttenhower.sph.harvard.edu?

Yes, we implemented the possibility to fetch the database from Zenodo if the download from Dropbox fails. We’ll look into hosting the database on our servers in order to avoid such problems.

It’d be great if this pattern were easy to find in the humann documentation. Using these options seems to be required when using conda to prevent metaphlan from installing databases to the conda env. I’m running humann3 via Snakemake and using conda environments.

Hi you all, I’m having this issue using singularity. Although I provide the path to the database with --nucleotide-database that I have in a different folder, it fails with this error:
ERROR: Unable to create folder for database install: /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases
Could it be that MetaPhlAn tries to make an index or something like that?
I am running this in a cluster where I am unable to install HUMAnN and make it work correctly because of MetaPhlAn, that is why we were trying with singularity, but again we have problems.
Do you know why this is happening? I’ve tried possible solutions proposed here but without successful results.
Many thanks in advance.