GitHub Issue: "Please install the latest version of the database: 201901b"

nickp60 · April 19, 2022, 1:44pm

I have run into this as well, and it looks like the bot closed this issue before it was addressed:

humann3 --protein-database database naming requirements are too restrictive

opened 07:31AM - 31 Jul 20 UTC

closed 01:10PM - 05 May 21 UTC

humann3 (3.0.0.alpha.3) requires that the `--protein-database` directory only contain the `*.dmnd` named as it is looking for (eg., `uniref90_201901.dmnd`). If anything else is in the same directory, humann3 throws an error:

```
CRITICAL ERROR: The directory provided for the translated database contains files ( XXX ) that are not of the expected version. Please install the latest version of the database: 201901
```

...which seems completely unnecessary. Also the required file naming (eg., `201901`) don't take custom databases into account, which could have other names.

conda info:

```
# Name                   Version        Build                   Channel
_libgcc_mutex            0.1            conda_forge             conda-forge
_openmp_mutex            4.5            1_llvm                  conda-forge
bcbio-gff                0.6.6          py_0                    bioconda
biom-format              2.1.8          py37hc1659b7_0          conda-forge
biopython                1.77           py37h8f50634_0          conda-forge
blast                    2.9.0          h20b68b9_1              bioconda
boost                    1.68.0         py37h8619c78_1001       conda-forge
boost-cpp                1.68.0         h11c811c_1000           conda-forge
bowtie2                  2.4.1          py37h4ef193e_2          bioconda
brotlipy                 0.7.0          py37h8f50634_1000       conda-forge
bx-python                0.8.9          py37h5266303_0          bioconda
bzip2                    1.0.8          h516909a_2              conda-forge
ca-certificates          2020.6.20      hecda079_0              conda-forge
certifi                  2020.6.20      py37hc8dfbb8_0          conda-forge
cffi                     1.14.0         py37hd463f26_0          conda-forge
chardet                  3.0.4          py37hc8dfbb8_1006       conda-forge
click                    7.1.2          pyh9f0ad1d_0            conda-forge
cmseq                    1.0            pyh5ca1d4c_0            bioconda
cryptography             2.9.2          py37hb09aad4_0          conda-forge
cycler                   0.10.0         py_2                    conda-forge
dendropy                 4.4.0          py_1                    bioconda
diamond                  0.9.24         ha888412_1              bioconda
fasttree                 2.1.10         h516909a_4              bioconda
freetype                 2.10.2         he06d7ca_0              conda-forge
future                   0.18.2         py37hc8dfbb8_1          conda-forge
glpk                     4.65           he80fd80_1002           conda-forge
gmp                      6.2.0          he1b5a44_2              conda-forge
gnutls                   3.6.13         h79a8f9a_0              conda-forge
h5py                     2.10.0         nompi_py37h90cd8ad_103  conda-forge
hdf5                     1.10.6         nompi_h3c11f04_100      conda-forge
htslib                   1.10.2         hd3b49d5_1              bioconda
humann                   3.0.0.alpha.3  py37h83b1523_0          biobakery
icu                      58.2           hf484d3e_1000           conda-forge
idna                     2.10           pyh9f0ad1d_0            conda-forge
iqtree                   2.0.3          h176a8bc_0              bioconda
kiwisolver               1.2.0          py37h99015e2_0          conda-forge
krb5                     1.17.1         hfafb76e_1              conda-forge
ld_impl_linux-64         2.34           h53a641e_5              conda-forge
libblas                  3.8.0          17_openblas             conda-forge
libcblas                 3.8.0          17_openblas             conda-forge
libcurl                  7.71.1         hcdd3856_0              conda-forge
libdeflate               1.6            h516909a_0              conda-forge
libedit                  3.1.20191231   h46ee950_0              conda-forge
libffi                   3.2.1          he1b5a44_1007           conda-forge
libgcc-ng                9.2.0          h24d8f2e_2              conda-forge
libgfortran-ng           7.5.0          hdf63c60_6              conda-forge
liblapack                3.8.0          17_openblas             conda-forge
libopenblas              0.3.10         h5ec1e0e_0              conda-forge
libpng                   1.6.37         hed695b0_1              conda-forge
libssh2                  1.9.0          hab1572f_2              conda-forge
libstdcxx-ng             9.2.0          hdf63c60_2              conda-forge
llvm-openmp              10.0.0         hc9558a2_0              conda-forge
lzo                      2.10           h14c3975_1000           conda-forge
mafft                    7.471          h516909a_0              bioconda
matplotlib-base          3.1.1          py37hfd891ef_0          conda-forge
metaphlan                3.0.1          pyh5ca1d4c_0            bioconda
muscle                   3.8.1551       hc9558a2_5              bioconda
ncurses                  6.1            hf484d3e_1002           conda-forge
nettle                   3.4.1          h1bed415_1002           conda-forge
numpy                    1.18.5         py37h8960a57_0          conda-forge
openssl                  1.1.1g         h516909a_0              conda-forge
pandas                   1.0.5          py37h0da4684_0          conda-forge
patsy                    0.5.1          py_0                    conda-forge
pcre                     8.44           he1b5a44_0              conda-forge
perl                     5.26.2         h516909a_1006           conda-forge
perl-archive-tar         2.32           pl526_0                 bioconda
perl-carp                1.38           pl526_3                 bioconda
perl-common-sense        3.74           pl526_2                 bioconda
perl-compress-raw-bzip2  2.087          pl526he1b5a44_0         bioconda
perl-compress-raw-zlib   2.087          pl526hc9558a2_0         bioconda
perl-exporter            5.72           pl526_1                 bioconda
perl-exporter-tiny       1.002001       pl526_0                 bioconda
perl-extutils-makemaker  7.36           pl526_1                 bioconda
perl-io-compress         2.087          pl526he1b5a44_0         bioconda
perl-io-zlib             1.10           pl526_2                 bioconda
perl-json                4.02           pl526_0                 bioconda
perl-json-xs             2.34           pl526h6bb024c_3         bioconda
perl-list-moreutils      0.428          pl526_1                 bioconda
perl-list-moreutils-xs   0.428          pl526_0                 bioconda
perl-pathtools           3.75           pl526h14c3975_1         bioconda
perl-scalar-list-utils   1.52           pl526h516909a_0         bioconda
perl-types-serialiser    1.0            pl526_2                 bioconda
perl-xsloader            0.24           pl526_0                 bioconda
phylophlan               3.0            py_5                    bioconda
pigz                     2.3.4          hed695b0_1              conda-forge
pip                      20.1.1         py_1                    conda-forge
pycparser                2.20           pyh9f0ad1d_2            conda-forge
pyopenssl                19.1.0         py_1                    conda-forge
pyparsing                2.4.7          pyh9f0ad1d_0            conda-forge
pysam                    0.16.0.1       py37hc501bad_0          bioconda
pysocks                  1.7.1          py37hc8dfbb8_1          conda-forge
python                   3.7.6          cpython_h8356626_6      conda-forge
python-dateutil          2.8.1          py_0                    conda-forge
python-lzo               1.12           py37h81344f2_1001       conda-forge
python_abi               3.7            1_cp37m                 conda-forge
pytz                     2020.1         pyh9f0ad1d_0            conda-forge
raxml                    8.2.12         h516909a_2              bioconda
readline                 8.0            hf8c457e_0              conda-forge
requests                 2.24.0         pyh9f0ad1d_0            conda-forge
samtools                 1.10           h9402c20_2              bioconda
scipy                    1.5.0          py37ha3d9a3c_0          conda-forge
seaborn                  0.10.1         1                       conda-forge
seaborn-base             0.10.1         py_1                    conda-forge
seqkit                   0.13.0         0                       bioconda
setuptools               49.1.0         py37hc8dfbb8_0          conda-forge
six                      1.15.0         pyh9f0ad1d_0            conda-forge
sqlite                   3.32.3         hcee41ef_0              conda-forge
statsmodels              0.11.1         py37h8f50634_2          conda-forge
tbb                      2020.1         hc9558a2_0              conda-forge
tk                       8.6.10         hed695b0_0              conda-forge
tornado                  6.0.4          py37h8f50634_1          conda-forge
trimal                   1.4.1          h6bb024c_3              bioconda
urllib3                  1.25.9         py_0                    conda-forge
wheel                    0.34.2         py_1                    conda-forge
xz                       5.2.5          h516909a_0              conda-forge
zlib                     1.2.11         h516909a_1006           conda-forge
```

sagunmaharjann · April 28, 2022, 4:17pm

Hi @nickp60,

(Edited for clarity and additional details)
Thank you for reaching out. The lab handles bugs and feature requests through the bioBakery forum rather than GitHub, so GitHub issues are always auto-closed. You’re welcome (and encouraged) to share them here instead.

HUMAnN intentionally does not read multiple databases from different sources out of the same directory, because it reserves the multi-file layout for another purpose. Specifically, HUMAnN can work with a protein database that has been split into chunks for computational efficiency (reads are mapped to the chunks serially). HUMAnN therefore models a protein database as a folder containing one or more database chunks, all derived from the same input set of protein sequences.

So if you mix unrelated chunks in the same folder, HUMAnN will produce an error. However, two separate DBs, even if each consists of only a single chunk, can be used without problems as long as they are kept in separate folders.
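As a rough illustration of the "separate folders" layout, here is a minimal Python sketch that moves each `.dmnd` file out of a mixed directory into its own folder. The function name `split_into_folders` is hypothetical and not part of HUMAnN, and the sketch assumes every `.dmnd` file is a complete single-chunk database (chunks of the same database should stay together in one folder instead).

```python
import shutil
from pathlib import Path


def split_into_folders(mixed_dir, out_root):
    """Move each .dmnd file in mixed_dir into its own folder under out_root.

    Assumption: each .dmnd file is a complete, single-chunk database.
    Returns a dict mapping the database file stem to its new folder.
    """
    mixed_dir, out_root = Path(mixed_dir), Path(out_root)
    folders = {}
    for dmnd in sorted(mixed_dir.glob("*.dmnd")):
        # One folder per database, named after the file (without extension).
        target = out_root / dmnd.stem
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(dmnd), str(target / dmnd.name))
        folders[dmnd.stem] = target
    return folders
```

Each resulting folder can then be passed on its own to `--protein-database`.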

To avoid confusion for other users in the future, we have now updated the HUMAnN user manual with the same information.

Regards,
Sagun

cbeekman · May 9, 2022, 2:48pm

I noticed the same issue. It is not caused by having multiple incompatible protein databases in the same directory: the same error occurs even when my custom protein DB is moved into a new directory where it is the only file. It appears to be purely a file-naming issue, because when I add “201901b” to the end of the file name of my DIAMOND database, I do not get the error.
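For anyone who wants to script the renaming described above, here is a minimal Python sketch. The helper `add_version_suffix` is hypothetical, and placing the version string after an underscore just before the `.dmnd` extension is an assumption about what "the end of the file name" means. It only reproduces the workaround reported in this post and does not make a custom database officially compatible.

```python
from pathlib import Path


def add_version_suffix(db_path, version="201901b"):
    """Rename e.g. custom.dmnd to custom_201901b.dmnd and return the new path.

    The version string is the one quoted in the error message. Files whose
    stem already ends with it are left untouched.
    """
    db_path = Path(db_path)
    if db_path.stem.endswith(version):
        return db_path
    new_path = db_path.with_name(f"{db_path.stem}_{version}{db_path.suffix}")
    db_path.rename(new_path)
    return new_path
```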

franzosa · June 28, 2022, 8:41pm

This is a corollary to the point above: HUMAnN software releases are tied to particular database versions and check for compatibility via file names at runtime. For databases that manifest as a folder of files (e.g. the pangenome dataset), we check that all the files in the folder have the expected version string.
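To make the idea concrete, a file-name check of this kind could look roughly like the Python sketch below. This is an illustration only, not HUMAnN's actual code: the function name `files_missing_version` and the simple substring test are assumptions.

```python
from pathlib import Path


def files_missing_version(folder, version):
    """Return the sorted names of files in folder that lack the version string."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and version not in p.name
    )
```

Under this sketch, an empty list means every file in the folder carries the expected version string. Any name in the list is a file that a check like this would flag, whether it is an older database or a custom one with an unrelated name.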
