Edit: I figured out what was wrong, and I managed to get it running in a clean conda environment. I had initially been trying with a new Singularity image I had built, but Singularity used a cached copy of an older version of the container built from the same definition file. Problem solved!
Leaving the rest of the post here in case it helps someone else in the future. I still find the naming conventions of the HUMAnN databases confusing, but it appears it was not the databases that were the issue this time.
I’m having trouble running HUMAnN3 v3.7 with the latest database versions available.
Hoping someone can help me figure out what I’m doing wrong. I have read all the other threads I could find about HUMAnN3 and database issues, but none of them seems to match my problem directly.
When I run HUMAnN3 v3.7 using what I think are the latest databases:
I get the following error message:
CRITICAL ERROR: The directory provided for ChocoPhlAn contains files (g__candidate_division_Zixibacteria_unclassified.s__candidate_division_Zixibacteria_bacterium_HGW_Zixibacteria_1.centroids.v201901_v31.ffn.gz) that are not of the expected version. Please install the latest version of the database: 201901b
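To see which version tags a local ChocoPhlAn directory actually contains, a minimal Python sketch like the one below can tally them from the file names. It assumes the tag sits between `.centroids.` and `.ffn.gz`, as in the file name quoted in the error message; the function name `count_chocophlan_versions` is hypothetical and not part of HUMAnN.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: the version tag sits between ".centroids." and ".ffn.gz",
# as in the file name quoted in the error message.
VERSION_RE = re.compile(r"\.centroids\.(?P<version>[^.]+)\.ffn\.gz$")


def count_chocophlan_versions(db_dir):
    """Return a Counter mapping each version tag to the number of files carrying it."""
    counts = Counter()
    for path in Path(db_dir).iterdir():
        match = VERSION_RE.search(path.name)
        if match:
            counts[match.group("version")] += 1
    return counts
```

Calling `count_chocophlan_versions("/path/to/chocophlan")` on the directory passed to HUMAnN shows whether it holds a single version tag or a mix of tags. Files that do not follow the assumed naming pattern are ignored.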
I thought chocophlan v201901_v31 was the latest?
The official database download tool
humann_databases lists the following:
HUMAnN Databases ( database : build = location )
chocophlan : full = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v201901_v31.tar.gz
chocophlan : DEMO = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/DEMO_chocophlan.v201901_v31.tar.gz
uniref : uniref50_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref50_annotated_v201901b_full.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
uniref : uniref50_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref50_ec_filtered_201901b_subset.tar.gz
uniref : uniref90_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_201901b_subset.tar.gz
uniref : DEMO_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_DEMO_diamond_v201901b.tar.gz
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
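Since the version tags are buried in the file names, it can help to pull them out of the listing and compare them side by side. The following is a rough Python sketch that assumes the `database : build = location` layout shown in the listing; the regular expressions and the function name `version_tags` are illustrative guesses based on the file names in this post, not anything provided by HUMAnN.

```python
import re

# Matches "database : build = http://..." entries; requiring an http(s) URL
# skips the "( database : build = location )" header.
ENTRY_RE = re.compile(r"(?P<database>\w+) : (?P<build>\w+) = (?P<url>https?://\S+)")

# Assumed tag shape: optional "v", six digits, optional letter, optional "_vNN"
# (covers v201901_v31, v201901b and 201901b as seen in the listing).
TAG_RE = re.compile(r"v?\d{6}[a-z]?(?:_v\d+)?")


def version_tags(listing):
    """Map (database, build) to the version tag found in the download file name."""
    tags = {}
    for entry in ENTRY_RE.finditer(listing):
        filename = entry.group("url").rsplit("/", 1)[-1]
        tag = TAG_RE.search(filename)
        tags[(entry.group("database"), entry.group("build"))] = tag.group(0) if tag else None
    return tags
```

Fed the listing text, this yields `v201901_v31` for the chocophlan builds and `v201901b` or `201901b` for the uniref and utility_mapping builds, which makes the mixed naming explicit.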
Based on what I can see at the URL where the tool downloads its databases from, Index of /humann_data/chocophlan/, the version with the most recent date is actually
full_chocophlan.v201901_v31.tar.gz and not
full_chocophlan.v296_201901b.tar.gz (which is what I think the error message wants me to use?). What have I done wrong? Did I run with the wrong combination of databases?
Is v296_201901b actually the latest version, despite having a date listed at the download URL that is one year older than the other file's?
I see now that the release announcement for HUMAnN 3.6 says:
You DO NOT need to re-download the latest (v3.1) pangenome database
I’m interpreting this to mean that
http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v201901_v31.tar.gz is actually the latest ChocoPhlAn database to use, and not the 201901b variant the error message hinted at. Maybe updating that error message was overlooked in a recent release?
I am not sure how to proceed from here. I find the version numbering of the HUMAnN databases very confusing. Am I really supposed to be combining chocophlan database v201901_v31 with uniref 201901b?
I am currently trying to redownload the latest databases (
uniref90_annotated_v201901b_full.tar.gz) from the URLs listed above to see if there is something wrong with my local copies.
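Before re-downloading several gigabytes, one way to check whether a local `.tar.gz` copy is damaged is to read its gzip stream to the end, which fails on truncated or corrupted files. This is a minimal Python sketch using only the standard library; the function name `gzip_stream_is_intact` is hypothetical, and the check only covers the compression layer, not whether the contents match what HUMAnN expects.

```python
import gzip
import zlib


def gzip_stream_is_intact(path, chunk_size=1024 * 1024):
    """Return True if the gzip file at `path` decompresses cleanly to its end."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read in chunks so multi-gigabyte archives never sit in memory at once.
            while stream.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Truncated downloads typically raise EOFError; corrupted data raises
        # gzip.BadGzipFile (an OSError subclass) or zlib.error.
        return False
```

A `False` result for a local copy such as `uniref90_annotated_v201901b_full.tar.gz` would suggest an interrupted or corrupted download, while `True` points the investigation elsewhere.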