CRITICAL ERROR: The directory provided for ChocoPhlAn contains files that are not of the expected version

Hey all - I’m having some issues getting the databases to cooperate and I’m hoping someone can give me some advice :slight_smile:

I have the following versions of HUMAnN and MetaPhlAn installed in a conda environment:

humann v3.8
MetaPhlAn version 4.0.0 (22 Aug 2022)

I downloaded the databases using the following commands:

$ humann_databases --download uniref uniref90_diamond  /home/groups/C/kr/metaphlan_dbs/uniref90_diamond_db --update-config yes

$ humann_databases --download utility_mapping full /home/groups/C/kr/metaphlan_dbs/utility_mapping --update-config yes

$ humann_databases --download chocophlan full /home/groups/C/kr/metaphlan_dbs/chocophlan_db --update-config yes

And then updated the config file to look like this:

HUMAnN Configuration ( Section : Name = Value )
database_folders : nucleotide = /home/groups/C/kr/metaphlan_dbs/chocophlan_db/chocophlan
database_folders : protein = /home/groups/C/kr/metaphlan_dbs/uniref90_diamond_db/uniref
database_folders : utility_mapping = /home/groups/C/kr/metaphlan_dbs/utility_mapping/utility_mapping
run_modes : resume = False
run_modes : verbose = False
run_modes : bypass_prescreen = False
run_modes : bypass_nucleotide_index = False
run_modes : bypass_nucleotide_search = False
run_modes : bypass_translated_search = False
run_modes : threads = 1
alignment_settings : evalue_threshold = 1.0
alignment_settings : prescreen_threshold = 0.01
alignment_settings : translated_subject_coverage_threshold = 50.0
alignment_settings : translated_query_coverage_threshold = 90.0
alignment_settings : nucleotide_subject_coverage_threshold = 50.0
alignment_settings : nucleotide_query_coverage_threshold = 90.0
output_format : output_max_decimals = 10
output_format : remove_stratified_output = False
output_format : remove_column_description_output = False

I’m running humann with this command:

humann --input fastaseqs/CM00.fasta --output test_Results

And I get this error:

CRITICAL ERROR: The directory provided for ChocoPhlAn contains files ( g__Tyzzerella.s__Tyzzerella_sp.centroids.v296_v201901b.ffn.gz ) that are not of the expected version. Please install the latest version of the database: v201901_v31

I’m at a bit of a loss since I downloaded everything using the “humann_databases” command. I’ve done a ton of work trying to get the right versions and databases of everything to install! Any help would be greatly appreciated. I’m happy to provide additional details if needed!

If you had an older ChocoPhlAn downloaded and then downloaded the new one into the same location (so that pangenomes from both versions are mixed in the same folder), that would cause this error. You can either split the files by version and point HUMAnN at the newest version or, if you don’t need the old ones with v296_v201901b in the name, just delete them.
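To see whether a folder really holds a mix of versions, a rough sketch like the one below might help. It assumes the pangenome files follow the `<taxon>.centroids.<version>.ffn.gz` naming seen in the error message; `count_chocophlan_versions` is a hypothetical helper name, not part of HUMAnN.

```shell
# Count pangenome files per version tag in a ChocoPhlAn folder.
# Assumption: filenames end in ".centroids.<version>.ffn.gz".
count_chocophlan_versions() {
    dir="$1"
    # Extract the version tag from each matching filename, then tally the tags.
    ls "$dir" | sed -n 's/.*\.centroids\.\(.*\)\.ffn\.gz$/\1/p' | sort | uniq -c
}

# Example call (path taken from the config shown earlier in the thread):
# count_chocophlan_versions /home/groups/C/kr/metaphlan_dbs/chocophlan_db/chocophlan
```

One line of output per version tag means a single tag indicates a clean folder, while two or more tags indicate a mix.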

Ok! It appears all the files in my ChocoPhlAn database folder have that .v296_v201901b.ffn.gz suffix. I downloaded them using this command:

$ humann_databases --download chocophlan full /home/groups/C/kr/metaphlan_dbs/chocophlan_db --update-config yes

Is there a change I should make to that command in order to download the correct files?

Thanks for the quick reply :slight_smile:

My read is that the download worked correctly; you just downloaded to a location where you already had ChocoPhlAn files from a previous version. If you make a new destination folder (e.g. chocophlan_v31) and download there instead, you should avoid this error. You can also just make that folder and move the v31 pangenomes you already downloaded into it, then run humann_config to point to the new location. That will avoid needing to download the files again.
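The move-instead-of-redownload route could be sketched roughly as follows. `move_v31_pangenomes` is a hypothetical helper, and the destination path in the comments is only an illustration; the `humann_config --update database_folders nucleotide` form is the one described in the HUMAnN documentation.

```shell
# Move every v31 pangenome from a mixed folder into its own folder.
# Assumption: v31 files end in ".v201901_v31.ffn.gz".
move_v31_pangenomes() {
    src="$1"
    dest="$2"
    mkdir -p "$dest"
    # Only look at files directly inside $src; older versions stay where they are.
    find "$src" -maxdepth 1 -type f -name '*.v201901_v31.ffn.gz' -exec mv {} "$dest"/ \;
}

# Example (illustrative paths), followed by repointing HUMAnN at the new folder:
# move_v31_pangenomes /home/groups/C/kr/metaphlan_dbs/chocophlan_db/chocophlan \
#                     /home/groups/C/kr/metaphlan_dbs/chocophlan_v31
# humann_config --update database_folders nucleotide /home/groups/C/kr/metaphlan_dbs/chocophlan_v31
```

If no v31 files are present, the function moves nothing, which is itself a hint that a fresh download is needed.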

It doesn’t look like the download actually provided any v31 files. I did a grep search over the file names in that folder to be certain, and it looks like they all have “v296”. Is there a way to specify version in the humann_databases command?

Ah ok, it’s possible that you’re seeing that error not because there is a mix of old and new pangenomes in the folder but because they are ALL old. The most recent pangenomes should have names like this:

g__Abditibacterium.s__Abditibacterium_utsteinense.centroids.v201901_v31.ffn.gz

If you run the humann_databases command from your v3.8 installation it should automatically pull those without having to specify a version. Just make sure to point the download at its own folder so you don’t end up with an old + new mix.
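One way to make sure the download lands in its own folder is a small guard like this sketch. `prepare_empty_dir` is a hypothetical helper and the `chocophlan_v31` path is an assumed example; the `humann_databases` call in the comments is the same command used earlier in the thread with only the destination changed.

```shell
# Create the target folder if needed and refuse to proceed if it is not empty,
# so a fresh download cannot mix with older pangenomes.
prepare_empty_dir() {
    dir="$1"
    mkdir -p "$dir" || return 1
    if [ -n "$(ls -A "$dir")" ]; then
        echo "Refusing to use $dir: it already contains files" >&2
        return 1
    fi
}

# Example (illustrative destination path):
# prepare_empty_dir /home/groups/C/kr/metaphlan_dbs/chocophlan_v31 && \
#   humann_databases --download chocophlan full /home/groups/C/kr/metaphlan_dbs/chocophlan_v31 --update-config yes
```

Judging by the config shown earlier, the download appears to create a `chocophlan` subfolder inside the destination, so the nucleotide path in the config would likely end in `chocophlan_v31/chocophlan`.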