I am trying to run HUMAnN on a ChocoPhlAn database that is compatible with MetaPhlAn. I have read many forum posts but have not found an answer to my issues. Below is a “workflow” of what I have tried so far; please let me know if you can spot any glaring issues that may cause this to fail. Thanks.
humann v3.6
MetaPhlAn version 4.1.1 (11 Mar 2024)
Check which databases are available
humann_databases --available
HUMAnN Databases ( database : build = location )
chocophlan : full = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v201901_v31.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
(amongst others…)
Databases acquired using the following:
humann_databases --download chocophlan full /path/to/databases
humann_databases --download uniref uniref90_diamond /path/to/databases
humann_databases --download utility_mapping full /path/to/databases
Notably, the resulting folder with the chocophlan database contains many .ffn.gz files (12774), named in the format, e.g.:
g__{misc}.centroids.v201901_v31.ffn.gz
This means full_chocophlan.v201901_v31.tar.gz has been extracted. The extraction was forced during the download and was not a decision of mine.
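To double-check the folder contents described above, here is a minimal sketch that counts the .ffn.gz files and how many of them carry the v201901_v31 tag. The function name summarise_chocophlan is hypothetical (it is not part of HUMAnN), and it assumes the files sit directly in the folder rather than in subfolders.

```shell
# Hypothetical helper, not part of HUMAnN.
# Prints "<total .ffn.gz files> <files carrying the expected version tag>".
# Assumes the files sit directly inside the folder (no subfolders).
summarise_chocophlan() {
    local dir="$1"
    local version="${2:-v201901_v31}"
    local total matching
    total=$(find "$dir" -maxdepth 1 -type f -name '*.ffn.gz' | wc -l)
    matching=$(find "$dir" -maxdepth 1 -type f -name "*.${version}.ffn.gz" | wc -l)
    # Arithmetic expansion strips the padding some wc implementations add.
    echo "$((total)) $((matching))"
}

# Usage (path is a placeholder):
#   summarise_chocophlan /path/to/databases/chocophlan
```

If the two numbers differ, some files in the folder come from a different database build.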
Updated the humann_config, using the following format for protein, nucleotide and utility_mapping:
humann_config --update database_folders protein /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/uniref/
Resulting in:
HUMAnN Configuration:
database_folders : nucleotide = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/chocophlan
database_folders : protein = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/uniref
database_folders : utility_mapping = /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/databases/utility/utility_mapping
humann_test → runs smoothly
humann -i demo.fastq -o test_dir/ → fails
First test code, using v201901_v31 as the index:
humann \
--input humann/merged_paired_ends/$1.fastq.gz \
--output humann/results/$1/ \
--bowtie-options '--threads 8' \
--metaphlan-options '--bowtie2db databases/chocophlan/ --index v201901_v31'
Resulting error code
Running metaphlan ........
CRITICAL ERROR: Error executing: /home/rb979/micromamba/envs/pip-humann/bin/metaphlan /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/SC03017-777_humann_temp/tmp6q40rbrz/tmpre7ow2hk --bowtie2db databases/chocophlan/ --index v201901_v31 -o /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/SC03017-777_humann_temp/SC03017-777_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/SC03017-777_humann_temp/SC03017-777_metaphlan_bowtie2.txt
Error message returned from metaphlan :
Error: Unable to find the mpa_pkl file at: mpa_pklExiting...
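Since the error above concerns a missing pkl file, here is a minimal sketch for checking whether a given folder holds a pickle file for a given index name. It assumes MetaPhlAn looks for a file named <index>.pkl inside the --bowtie2db folder; that naming is my assumption and should be checked against the MetaPhlAn documentation. The function name check_mpa_index is hypothetical.

```shell
# Hypothetical helper, not part of MetaPhlAn.
# Assumption: the index pickle is expected at <bowtie2db>/<index>.pkl.
# Prints "found <absolute path>" or "missing <index>.pkl".
check_mpa_index() {
    local dir="$1"
    local index="$2"
    if [ -f "$dir/$index.pkl" ]; then
        # Resolve the folder to an absolute path so a relative
        # --bowtie2db value can be compared with the real location.
        echo "found $(cd "$dir" && pwd)/$index.pkl"
    else
        echo "missing $index.pkl"
        return 1
    fi
}

# Usage (same relative folder and index as in the command above):
#   check_mpa_index databases/chocophlan/ v201901_v31
```

Printing the absolute path also shows what the relative databases/chocophlan/ resolves to from the current working directory.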
Second test code, (wishfully) using a different format of the index, mpa_v31_CHOCOPhlAn_201901:
humann \
--input humann/merged_paired_ends/$1.fastq.gz \
--output humann/results/$1/ \
--bowtie-options '--threads 8' \
--metaphlan-options '--bowtie2db databases/chocophlan/ --index mpa_v31_CHOCOPhlAn_201901'
Resulting error code
CRITICAL ERROR: Error executing: /home/rb979/micromamba/envs/pip-humann/bin/metaphlan /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/v31/SC03017-777_humann_temp/tmpjr961hzp/tmp_tre3q2d --bowtie2db databases/chocophlan/ --index mpa_v31_CHOCOPhlAn_201901 -o /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/v31/SC03017-777_humann_temp/SC03017-777_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /rds/project/rds-XUr6B1Jhndg/rb979_Microbiome/Metagenomics/metaphlan/humann/results/SC03017-777/v31/SC03017-777_humann_temp/SC03017-777_metaphlan_bowtie2.txt
Error message returned from metaphlan :
Downloading MetaPhlAn database
Please note due to the size this might take a few minutes
\Downloading and uncompressing indexes
Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_v31_CHOCOPhlAn_201901_bt2.tar
Warning: Unable to download http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_v31_CHOCOPhlAn_201901_bt2.tar
Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_v31_CHOCOPhlAn_201901_bt2.md5
Warning: Unable to download http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_v31_CHOCOPhlAn_201901_bt2.md5
File "databases/chocophlan/mpa_v31_CHOCOPhlAn_201901_bt2.md5" not found!
File "databases/chocophlan/mpa_v31_CHOCOPhlAn_201901_bt2.tar" not found!
MD5 checksums not found, something went wrong!
When I try to run humann on a database that MetaPhlAn works with, e.g. mpa_vOct22_CHOCOPhlAnSGB_202403, it errors out:
**CRITICAL ERROR: The directory provided for ChocoPhlAn contains files (mpa_vOct22_CHOCOPhlAnSGB_202403) that are not of the expected version. Please install the latest version of the database: v201901_v31**
This happens with other databases too: mpa_vJun23_CHOCOPhlAnSGB_202403
I find that sometimes, according to the error output, it forces the download of the most recent chocophlan database into my humann environment library, in a metaphlan_databases folder, e.g.:
Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vJun23_CHOCOPhlAnSGB_202403_bt2.tar
This is despite specifying which databases to use, both in the humann_config and in the submission code. Notably, that is a database which has previously been rejected as the wrong one.
I would really appreciate any and all advice/guidance on the above errors and what you think I could try next.
Thanks!