HUMAnN database errors (a short novel)

I am trying to run HUMAnN with a ChocoPhlAn database that is compatible with MetaPhlAn. I've read many forum posts and still haven't found an answer to my issues. Below is the workflow I have tried so far; please let me know if you see any glaring issues that could cause it to fail. Thanks.

humann v3.6
MetaPhlAn version 4.1.1 (11 Mar 2024)

Check which databases are available

humann_databases --available

HUMAnN Databases ( database : build = location )
chocophlan : full = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v201901_v31.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz

(amongst others…)
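If humann_databases cannot fetch an archive itself, the URL it would use can be pulled out of this listing and tried by hand. A small sketch, assuming every entry keeps the "database : build = location" layout shown above; the helper name db_url is made up for illustration:

```shell
# Hypothetical helper: print the download URL for one database/build pair.
# Reads the output of `humann_databases --available` on stdin, where each
# entry looks like "database : build = location".
db_url() {
  awk -v db="$1" -v build="$2" '
    # Fields: $1=database  $2=":"  $3=build  $4="="  $5=location
    $2 == ":" && $4 == "=" && $1 == db && $3 == build { print $5; found = 1 }
    END { exit (found ? 0 : 1) }   # non-zero status if nothing matched
  '
}

# Usage (assumes humann_databases is on PATH):
# humann_databases --available | db_url chocophlan full
```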

Databases acquired using the following:

humann_databases --download chocophlan full /path/to/databases
humann_databases --download uniref uniref90_diamond /path/to/databases
humann_databases --download utility_mapping full /path/to/databases

Notably, the resulting ChocoPhlAn folder contains many gzipped files (12,774), named in the format, e.g.:
g__{misc}.centroids.v201901_v31.ffn.gz
This means full_chocophlan.v201901_v31.tar.gz has been extracted; this happened automatically and was not my decision.
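A rough way to confirm that a folder holds the v201901_v31 ChocoPhlAn pangenomes, and nothing else, is to count the files matching the naming pattern above. This is only a sketch: check_chocophlan_v31 is a hypothetical helper, and the pattern is inferred from the single example file name.

```shell
# Hypothetical check: count files named like
# g__{misc}.centroids.v201901_v31.ffn.gz and report anything else.
check_chocophlan_v31() {
  local dir=$1 n_ok=0 n_other=0 f name
  for f in "$dir"/*; do
    [ -e "$f" ] || continue            # empty folder: the glob did not expand
    name=${f##*/}                      # strip the directory part
    case $name in
      *.centroids.v201901_v31.ffn.gz) n_ok=$((n_ok + 1)) ;;
      *) n_other=$((n_other + 1)); echo "unexpected: $name" >&2 ;;
    esac
  done
  echo "$n_ok pangenome files, $n_other other files"
  # Succeed only if pangenomes are present and nothing else is
  [ "$n_ok" -gt 0 ] && [ "$n_other" -eq 0 ]
}
```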

Updated humann_config, using the following format for the protein, nucleotide and utility_mapping folders:
humann_config --update database_folders protein /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/uniref/

Resulting in:

HUMAnN Configuration:
database_folders : nucleotide = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/chocophlan
database_folders : protein = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/uniref
database_folders : utility_mapping = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/utility/utility_mapping
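The three humann_config --update calls can be wrapped in one function so that the folders always change together. A sketch, assuming the same sub-folder layout as the configuration above; set_humann_folders and the HUMANN_CONFIG override are illustrative, not part of HUMAnN:

```shell
# Sketch: set all three HUMAnN database folders under one root.
# Sub-folder names mirror the configuration shown above (an assumption).
# HUMANN_CONFIG can be overridden, e.g. with "echo", for a dry run.
set_humann_folders() {
  local root=$1
  local cmd=${HUMANN_CONFIG:-humann_config}
  # Unquoted $cmd on purpose so an override such as "echo" is run as a command
  $cmd --update database_folders nucleotide "$root/chocophlan" &&
  $cmd --update database_folders protein "$root/uniref" &&
  $cmd --update database_folders utility_mapping "$root/utility/utility_mapping"
}
```

For example, HUMANN_CONFIG=echo set_humann_folders /path/to/databases only prints the three commands, which makes it easy to check the paths before changing anything.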

humann_test → runs smoothly
humann -i demo.fastq -o test_dir/ → fails

First attempt, using v201901_v31 as the index:

humann \
--input humann/merged_paired_ends/$1.fastq.gz \
--output humann/results/$1/ \
--bowtie-options '--threads 8' \
--metaphlan-options '--bowtie2db databases/chocophlan/ --index v201901_v31'

Resulting error:

Running metaphlan ........


CRITICAL ERROR: Error executing: /home/rb979/micromamba/envs/pip-humann/bin/metaphlan /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/SC03017-777_humann_temp/tmp6q40rbrz/tmpre7ow2hk --bowtie2db databases/chocophlan/ --index v201901_v31 -o /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/SC03017-777_humann_temp/SC03017-777_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/SC03017-777_humann_temp/SC03017-777_metaphlan_bowtie2.txt

Error message returned from metaphlan :
Error: Unable to find the mpa_pkl file at: mpa_pklExiting...
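The mpa_pkl error suggests MetaPhlAn did not find a .pkl file for the requested index in the --bowtie2db folder, which here points at HUMAnN's ChocoPhlAn pangenomes rather than a MetaPhlAn database. A quick pre-flight check, assuming a MetaPhlAn database folder holds <index>.pkl next to <index>.1.bt2l or <index>.1.bt2, as in the mpa_vOct22 listing later in this thread. check_mpa_index is a hypothetical helper and a heuristic, not MetaPhlAn's own lookup logic:

```shell
# Sketch: check that a --bowtie2db folder appears to contain the --index files.
check_mpa_index() {
  local db=$1 index=$2
  if [ ! -f "$db/$index.pkl" ]; then
    echo "missing: $db/$index.pkl" >&2
    return 1
  fi
  # Large indexes use .bt2l, smaller ones .bt2; accept either
  if [ ! -f "$db/$index.1.bt2l" ] && [ ! -f "$db/$index.1.bt2" ]; then
    echo "missing bowtie2 index for $index in $db" >&2
    return 1
  fi
  echo "ok: $index"
}

# Usage:
# check_mpa_index databases/chocophlan v201901_v31
```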

Second attempt, (wishfully) using a different form of the index name, mpa_v31_CHOCOPhlAn_201901:

humann \
--input humann/merged_paired_ends/$1.fastq.gz \
--output humann/results/$1/ \
--bowtie-options '--threads 8' \
--metaphlan-options '--bowtie2db databases/chocophlan/ --index mpa_v31_CHOCOPhlAn_201901'

Resulting error:

CRITICAL ERROR: Error executing: /home/rb979/micromamba/envs/pip-humann/bin/metaphlan /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/v31/SC03017-777_humann_temp/tmpjr961hzp/tmp_tre3q2d --bowtie2db databases/chocophlan/ --index mpa_v31_CHOCOPhlAn_201901 -o /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/v31/SC03017-777_humann_temp/SC03017-777_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/v31/SC03017-777_humann_temp/SC03017-777_metaphlan_bowtie2.txt

Error message returned from metaphlan :

Downloading MetaPhlAn database

Please note due to the size this might take a few minutes

Downloading and uncompressing indexes

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_v31_CHOCOPhlAn_201901_bt2.tar

Warning: Unable to download http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_v31_CHOCOPhlAn_201901_bt2.tar

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_v31_CHOCOPhlAn_201901_bt2.md5

Warning: Unable to download http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_v31_CHOCOPhlAn_201901_bt2.md5

File "databases/chocophlan/mpa_v31_CHOCOPhlAn_201901_bt2.md5" not found!

File "databases/chocophlan/mpa_v31_CHOCOPhlAn_201901_bt2.tar" not found!

MD5 checksums not found, something went wrong!
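From the log above, MetaPhlAn appears to fall back to downloading an index when it cannot find the requested one locally. It can therefore help to list which index names a folder actually provides and compare them with the value passed to --index. A sketch, assuming each index is represented by an <index>.pkl file; list_mpa_indexes is hypothetical:

```shell
# Sketch: print the index names a MetaPhlAn database folder seems to provide,
# derived from its "<index>.pkl" files. --index should match one exactly.
list_mpa_indexes() {
  local f
  for f in "$1"/*.pkl; do
    [ -e "$f" ] || continue   # no .pkl files: the glob did not expand
    f=${f##*/}                # drop the directory part
    echo "${f%.pkl}"          # drop the .pkl extension
  done
}

# Usage:
# list_mpa_indexes databases/chocophlan_mpa_vJun23_CHOCOPhlAnSGB_202403
```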

When I try to run HUMAnN with a database that MetaPhlAn does work with, e.g. mpa_vOct22_CHOCOPhlAnSGB_202403, it errors out:

**CRITICAL ERROR: The directory provided for ChocoPhlAn contains files (mpa_vOct22_CHOCOPhlAnSGB_202403) that are not of the expected version. Please install the latest version of the database: v201901_v31**

The same happens with other databases, e.g. mpa_vJun23_CHOCOPhlAnSGB_202403.

I've also noticed that sometimes, during the failing run, it forces a download of the most recent CHOCOPhlAnSGB (MetaPhlAn) database into my humann environment's library, in a metaphlan_databases folder, e.g.:

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vJun23_CHOCOPhlAnSGB_202403_bt2.tar

This happens despite specifying which databases to use, both in humann_config and in the submission command. Notably, it is a database that HUMAnN has previously rejected as the wrong version.


I would really appreciate any and all advice/guidance on the above errors and what you think I could try next.
Thanks!


There’s a lot going on here. From the first part, it looks like you’re using a very old HUMAnN (v3.6). HUMAnN 3.5-3.9 were released for compatibility with different versions of MetaPhlAn 4, and they have to be paired with the right version. If you want to stick with HUMAnN 3, I would use v3.9 and install alongside MetaPhlAn following these release notes:

The other option is to upgrade to HUMAnN 4.0 alpha:

This is the latest release, and it works with MetaPhlAn 4 using its mpa_vOct22_CHOCOPhlAnSGB_202403 index.

In general, when debugging MetaPhlAn-HUMAnN communication, it's helpful to make sure you can analyze your sample (or a demo file) with MetaPhlAn outside of HUMAnN, as this can sometimes reveal installation or runtime issues that are not related to the HUMAnN run itself.
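A minimal sketch of such a standalone check, reusing only MetaPhlAn flags that appear elsewhere in this thread. run_metaphlan_alone, its argument order and the output file names are hypothetical; setting METAPHLAN=echo prints the command instead of running it:

```shell
# Sketch: run MetaPhlAn by itself with the database settings you intend to
# pass to HUMAnN via --metaphlan-options.
run_metaphlan_alone() {
  local input=$1 db=$2 index=$3 outdir=$4 name
  name=$(basename "$input")
  name=${name%%.*}                 # e.g. demo.fastq.gz -> demo
  mkdir -p "$outdir"
  ${METAPHLAN:-metaphlan} "$input" \
    --input_type fastq \
    --bowtie2db "$db" \
    --index "$index" \
    --bowtie2out "$outdir/$name.bt2.bz2" \
    -o "$outdir/${name}_profile.txt" \
    --nproc 8
}
```

If this succeeds for a given --bowtie2db/--index pair, the same pair can then be passed through --metaphlan-options.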

Thank you for your reply and comments.
Here is the outcome of trying your suggestions.

First, I used pip to install HUMAnN 4.0 alpha:
pip install humann==4.0.0a1 --no-binary :all:

Confirmed the versions
humann --version
humann v4.0.0.alpha.1

metaphlan --version
MetaPhlAn version 4.1.1 (11 Mar 2024)

Databases: humann_config

HUMAnN Configuration ( Section : Name = Value )

database_folders : nucleotide = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/chocophlan_mpa_vOct22_CHOCOPhlAnSGB_202403/

database_folders : protein = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/uniref/

database_folders : utility_mapping = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/utility/utility_mapping/

Notably, the ChocoPhlAn folder looks like this:

-rw-rw-r--+ 1 rb979 rds-XUr6B1Jhndg-managers 3978538203 Aug 29  2024 mpa_vOct22_CHOCOPhlAnSGB_202403.1.bt2l
-rw-rw-r--+ 1 rb979 rds-XUr6B1Jhndg-managers 5311606700 Aug 29  2024 mpa_vOct22_CHOCOPhlAnSGB_202403.2.bt2l
-rw-rw-r--+ 1 rb979 rds-XUr6B1Jhndg-managers  100610041 Aug 29  2024 mpa_vOct22_CHOCOPhlAnSGB_202403.3.bt2l
-rw-rw-r--+ 1 rb979 rds-XUr6B1Jhndg-managers 2655803348 Aug 29  2024 mpa_vOct22_CHOCOPhlAnSGB_202403.4.bt2l
-rw-rw-r--+ 1 rb979 rds-XUr6B1Jhndg-managers 2951357081 Aug 29  2024 mpa_vOct22_CHOCOPhlAnSGB_202403.fna.bz2
-rw-rw-r--+ 1 rb979 rds-XUr6B1Jhndg-managers   65502658 Apr  4  2024 mpa_vOct22_CHOCOPhlAnSGB_202403.pkl
-rw-rw-r--+ 1 rb979 rds-XUr6B1Jhndg-managers 3978538203 Aug 29  2024 mpa_vOct22_CHOCOPhlAnSGB_202403.rev.1.bt2l
-rw-rw-r--+ 1 rb979 rds-XUr6B1Jhndg-managers 5311606700 Aug 29  2024 mpa_vOct22_CHOCOPhlAnSGB_202403.rev.2.bt2l
-rw-rw-r--+ 1 rb979 rds-XUr6B1Jhndg-managers      44092 Feb 22  2023 mpa_vOct22_CHOCOPhlAnSGB_202403_VINFO.csv
-rw-rw----+ 1 rb979 rds-XUr6B1Jhndg-managers  881484571 Feb 12 15:54 mpa_vOct22_CHOCOPhlAnSGB_202403_VSG.fna
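Every file in this listing uses MetaPhlAn database naming (.pkl, .bt2l, _VINFO.csv, _VSG.fna) rather than the ChocoPhlAn pangenome naming seen at the start of the thread (.ffn.gz). A sketch to list such files in whatever folder is configured as nucleotide; the patterns are assumptions drawn from this listing, and find_mpa_files is hypothetical:

```shell
# Sketch: list top-level files that look like MetaPhlAn database files.
find_mpa_files() {
  find "$1" -maxdepth 1 -type f \( \
    -name '*.pkl' -o -name '*.bt2' -o -name '*.bt2l' -o \
    -name '*_VINFO.csv' -o -name '*_VSG.fna' -o -name 'mpa_*.fna.bz2' \
  \) -exec basename {} \; | sort
}

# Usage:
# find_mpa_files /path/to/configured/nucleotide/folder
```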

humann_test works well; however, it created a file called "_3_reactions", which is new.
Ran 176 tests in 75.820s

# Reaction HUMAnN v HUMAnN_test

UNMAPPED 100.0000000
UNGROUPED 1912.1816208
UNGROUPED|g__Bacteroides.s__Bacteroides_thetaiotaomicron 943.6975327
UNGROUPED|g__Bacteroides.s__Bacteroides_stercoris 794.9003455

You suggested running MetaPhlAn on its own, so I did the following:

metaphlan ../sg_metagenomics_Boston24/sg_raw_data_to_deposit/$1_1.fastq.gz,../sg_metagenomics_Boston24/sg_raw_data_to_deposit/$1_2.fastq.gz \
--input_type fastq \
--unclassified_estimation \
--add_viruses \
--index mpa_vOct22_CHOCOPhlAnSGB_202403 \
--bowtie2db databases/chocophlan_mpa_vOct22_CHOCOPhlAnSGB_202403 \
--bowtie2out bowtie_outputs_humann/$1.bt2.bz2 \
-o metaphlan_results/metaphlan_mpa_vOct22_CHOCOPhlAnSGB_202403/$1/april-profiled_metagenome_$1.txt \
--nproc 8

This ran successfully.
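Here MetaPhlAn was given the two mates separately, whereas the HUMAnN run takes a single merged file (humann/merged_paired_ends/$1.fastq.gz). HUMAnN does not use read-pairing information, so paired-end reads are commonly just concatenated, and two gzip streams joined together are themselves a valid .gz file. A sketch of how such a merged file could be produced; merge_pairs and the paths are assumptions:

```shell
# Sketch: concatenate the two mates of a sample into one gzipped FASTQ.
merge_pairs() {
  local r1=$1 r2=$2 out=$3
  mkdir -p "$(dirname "$out")"
  cat "$r1" "$r2" > "$out"     # gzip members can be concatenated as-is
}

# Usage (hypothetical paths):
# merge_pairs raw/$1_1.fastq.gz raw/$1_2.fastq.gz humann/merged_paired_ends/$1.fastq.gz
```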

The problems arise when running HUMAnN:

humann \
--input humann/merged_paired_ends/$1.fastq.gz \
--output humann/results/$1/ \
--bowtie-options '--threads 8' \
--metaphlan-options '--bowtie2db databases/chocophlan_mpa_vOct22_CHOCOPhlAnSGB_202403 --index mpa_vOct22_CHOCOPhlAnSGB_202403'

Which results in the error:

Output files will be written to: /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results
Decompressing gzipped file ...
Removing spaces from identifiers in input file ...

CRITICAL ERROR: The directory provided for ChocoPhlAn does not contain files of the expected format (ie '^SGB').
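The error names the pattern HUMAnN 4 expects, ^SGB, while the configured folder listed earlier only holds mpa_vOct22_CHOCOPhlAnSGB_202403 files. A sketch to count matching files, assuming the pattern applies to file names at the top level of the folder; count_sgb_files is hypothetical:

```shell
# Sketch: count files whose names start with "SGB"; fail if there are none.
count_sgb_files() {
  local n=0 f
  for f in "$1"/SGB*; do
    [ -e "$f" ] && n=$((n + 1))   # skip the literal pattern when nothing matches
  done
  echo "$n"
  [ "$n" -gt 0 ]
}

# Usage:
# count_sgb_files databases/chocophlan_mpa_vOct22_CHOCOPhlAnSGB_202403
```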

Please can you advise on what may be the problem from here?

I wanted to chime in that I am having a similar database issue, working in a conda environment with Python 3.12.9. I would like to use HUMAnN to annotate a genome. Yesterday I followed the Google Docs installation instructions for HUMAnN 4 and got as far as the step for downloading the new databases. As far as I could tell, installation with conda went fine, and the demo databases downloaded correctly.

I suspect the problem is with the download links for these databases.

humann_databases --available

This listed the databases as available, as in the original post. Then I tried:

humann_databases --download chocophlan full /path/to/databases

which fails with: CRITICAL ERROR: Unable to download and extract from URL: http://huttenhower.sph.harvard.edu/humann_data/chocophlan/chocophlan.v4_alpha.tar.gz

To continue diagnosing the problem, I tried:

 wget http://huttenhower.sph.harvard.edu/humann_data/chocophlan/chocophlan.v4_alpha.tar.gz

Which gave me:
--2025-04-11 10:56:16--  http://huttenhower.sph.harvard.edu/humann_data/chocophlan/chocophlan.v4_alpha.tar.gz
Resolving huttenhower.sph.harvard.edu (huttenhower.sph.harvard.edu)... 199.94.60.28
Connecting to huttenhower.sph.harvard.edu (huttenhower.sph.harvard.edu)|199.94.60.28|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://huttenhower.sph.harvard.edu/humann_data/chocophlan/chocophlan.v4_alpha.tar.gz [following]
--2025-04-11 10:56:17--  https://huttenhower.sph.harvard.edu/humann_data/chocophlan/chocophlan.v4_alpha.tar.gz
Connecting to huttenhower.sph.harvard.edu (huttenhower.sph.harvard.edu)|199.94.60.28|:443... connected.
Unable to establish SSL connection.

I also tried curl:

curl -O https://huttenhower.sph.harvard.edu/humann_data/chocophlan/chocophlan.v4_alpha.tar.gz

Which outputted this:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:05:00 --:--:--     0
curl: (28) Connection timed out after 300396 milliseconds
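If the archive is eventually fetched by hand (for example with wget -c, which resumes a partial download), it is worth confirming that the file is a complete, readable .tar.gz before extracting it, since an interrupted transfer leaves a truncated file. A sketch; verify_tarball is hypothetical, and whether HUMAnN accepts a manually extracted folder is not confirmed in this thread:

```shell
# Sketch: succeed only if the file is an intact gzip stream and a readable tar.
verify_tarball() {
  gzip -t "$1" 2>/dev/null && tar -tzf "$1" > /dev/null 2>&1
}

# Hypothetical usage:
# wget -c https://huttenhower.sph.harvard.edu/humann_data/chocophlan/chocophlan.v4_alpha.tar.gz
# verify_tarball chocophlan.v4_alpha.tar.gz && echo "archive looks complete"
```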

I'm not an expert with computers, but is it safe to assume these download links are broken?


I'd like to follow up on the other option you suggested: using HUMAnN 3.9 with MetaPhlAn 4.1.

> pip install humann==3.9
Collecting humann==3.9
  Using cached humann-3.9-py3-none-any.whl
Installing collected packages: humann
  Attempting uninstall: humann
    Found existing installation: humann 4.0.0a1
    Uninstalling humann-4.0.0a1:
      Successfully uninstalled humann-4.0.0a1
Successfully installed humann-3.9

> humann --version
humann v3.9

> metaphlan --version
MetaPhlAn version 4.1.1 (11 Mar 2024)

You mentioned that the June 2023 ChocoPhlAn was the correct database, so I set it in humann_config:

> humann_config --update database_folders nucleotide /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/chocophlan_mpa_vJun23_CHOCOPhlAnSGB_202403/

HUMAnN configuration file updated: database_folders : nucleotide = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/chocophlan_mpa_vJun23_CHOCOPhlAnSGB_202403/

But when I run HUMAnN:

> humann --input humann/merged_paired_ends/sample1.fastq.gz --output humann/results/sample1/ --bowtie-options '--threads 8' --metaphlan-options '--bowtie2db databases/chocophlan_mpa_vJun23_CHOCOPhlAnSGB_202403/ --index vJun23_CHOCOPhlAnSGB_202403'
Output files will be written to: /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/sample1
Decompressing gzipped file ...

Removing spaces from identifiers in input file ...

CRITICAL ERROR: The directory provided for ChocoPhlAn contains files ( mpa_vJun23_CHOCOPhlAnSGB_202403.pkl ) that are not of the expected version. Please install the latest version of the database: v201901_v31

So I have tried HUMAnN v4.0.0a1 with vOct22 and HUMAnN v3.9 with vJun23, as suggested, and neither works. Please advise.

That error looks to be due to the fact that you have a MetaPhlAn file (the PKL file) saved in the ChocoPhlAn folder. The ChocoPhlAn folder should only contain the species pangenomes.
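A sketch of that separation: move anything that looks like a MetaPhlAn database file out of the folder HUMAnN uses as its ChocoPhlAn (nucleotide) database and into a separate folder for --bowtie2db. The file patterns are assumptions based on the listings in this thread, separate_mpa_files is hypothetical, and it is safest to try it on a copy first. Note that the nucleotide folder still has to contain the ChocoPhlAn pangenomes themselves; for HUMAnN 3.9, the error messages in this thread name v201901_v31.

```shell
# Sketch: move MetaPhlAn-style files from the ChocoPhlAn folder to a new folder.
# mv -n never overwrites a file that already exists in the destination.
separate_mpa_files() {
  local choco=$1 mpa=$2 f
  mkdir -p "$mpa"
  for f in "$choco"/*.pkl "$choco"/*.bt2 "$choco"/*.bt2l \
           "$choco"/*_VINFO.csv "$choco"/*_VSG.fna "$choco"/mpa_*.fna.bz2; do
    [ -e "$f" ] || continue   # pattern matched nothing
    mv -n "$f" "$mpa"/
  done
}

# Usage (hypothetical paths):
# separate_mpa_files databases/chocophlan databases/metaphlan_mpa_vJun23
```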

Thank you for your reply.

To confirm the setup:

(biobakery4) [rb979@login-q-4 metaphlan]$ humann --version
humann v3.9
(biobakery4) [rb979@login-q-4 metaphlan]$ metaphlan --version
MetaPhlAn version 4.1.1 (11 Mar 2024)
(biobakery4) [rb979@login-q-4 metaphlan]$ humann_config
HUMAnN Configuration ( Section : Name = Value )
database_folders : nucleotide = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/chocophlan_mpa_vJun23_CHOCOPhlAnSGB_202403/
database_folders : protein = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/uniref/
database_folders : utility_mapping = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/utility_mapping/
run_modes : resume = False
run_modes : verbose = False
run_modes : bypass_prescreen = False
run_modes : bypass_nucleotide_index = False
run_modes : bypass_nucleotide_search = False
run_modes : bypass_translated_search = False
run_modes : threads = 1
alignment_settings : evalue_threshold = 1.0
alignment_settings : prescreen_threshold = 0.01
alignment_settings : translated_subject_coverage_threshold = 50.0
alignment_settings : translated_query_coverage_threshold = 90.0
alignment_settings : nucleotide_subject_coverage_threshold = 50.0
alignment_settings : nucleotide_query_coverage_threshold = 90.0
output_format : output_max_decimals = 10
output_format : remove_stratified_output = False
output_format : remove_column_description_output = False

After editing the database folder to look like this:

(biobakery4) [rb979@login-q-4 metaphlan]$ ll databases/chocophlan_mpa_vJun23_CHOCOPhlAnSGB_202403/
total 23311904
-rw-rw----+ 1 rb979 rds-XUr6B1Jhndg-managers 4389008193 Jan 28 15:48 mpa_vJun23_CHOCOPhlAnSGB_202403.1.bt2l
-rw-rw----+ 1 rb979 rds-XUr6B1Jhndg-managers 5634079508 Jan 28 15:48 mpa_vJun23_CHOCOPhlAnSGB_202403.2.bt2l
-rw-rw----+ 1 rb979 rds-XUr6B1Jhndg-managers 126529023 Jan 28 15:48 mpa_vJun23_CHOCOPhlAnSGB_202403.3.bt2l
-rw-rw----+ 1 rb979 rds-XUr6B1Jhndg-managers 2817039751 Jan 28 15:48 mpa_vJun23_CHOCOPhlAnSGB_202403.4.bt2l
-rw-rw----+ 1 rb979 rds-XUr6B1Jhndg-managers 4389008193 Jan 28 15:48 mpa_vJun23_CHOCOPhlAnSGB_202403.rev.1.bt2l
-rw-rw----+ 1 rb979 rds-XUr6B1Jhndg-managers 5634079508 Jan 28 15:48 mpa_vJun23_CHOCOPhlAnSGB_202403.rev.2.bt2l
-rw-rw----+ 1 rb979 rds-XUr6B1Jhndg-managers 44092 Jan 28 15:48 mpa_vJun23_CHOCOPhlAnSGB_202403_VINFO.csv
-rw-rw----+ 1 rb979 rds-XUr6B1Jhndg-managers 881484571 Jan 28 15:48 mpa_vJun23_CHOCOPhlAnSGB_202403_VSG.fna

I still get an error:

CRITICAL ERROR: The directory provided for ChocoPhlAn contains files ( mpa_vJun23_CHOCOPhlAnSGB_202403.1.bt2l ) that are not of the expected version. Please install the latest version of the database: v201901_v31

Please can you advise on what to try next? Thanks in advance!


I'm having exactly the same error: I can't download the database files at all. I'm using the HUMAnN 4 alpha, installed with pip.