CRITICAL ERROR: The directory provided for ChocoPhlAn does not contain files of the expected format (ie '^[g__][s__]')

Hi Franzosa,
I downloaded the newest ChocoPhlAn database and installed HUMAnN 3. When I run humann as follows, an error occurs:

(py37) [wenping@localhost data]$ humann --input /data/liying_metagenome/clean_data_ly/SRR5130527.fastq --output try_humann --bypass-translated-search
Creating output directory: /data/try_humann
Output files will be written to: /data/try_humann
Removing spaces from identifiers in input file …

CRITICAL ERROR: The directory provided for ChocoPhlAn does not contain files of the expected format (ie '^[g__][s__]').

Do you know how to fix this error?

Wenping

Hi Wenping,

It looks like the folder set in your config for the nucleotide database does not contain files of the expected format. Could you run humann_config --print? It prints the folder currently set as the location of the ChocoPhlAn files (the nucleotide database). Those files should have “g__” (genus) and “s__” (species) in their names. If the folder location needs to change, you can update it with the humann_config command, or set the new location when running HUMAnN with --nucleotide-database <NEW_LOCATION>.

# print the current config settings
$ humann_config --print
HUMAnN Configuration ( Section : Name = Value )
output_format : remove_stratified_output = False
output_format : output_max_decimals = 10
alignment_settings : prescreen_threshold = 0.01
alignment_settings : evalue_threshold = 1.0
alignment_settings : identity_threshold = 50.0
database_folders : nucleotide = data/chocophlan_DEMO
# update the config setting
$ humann_config --update database_folders nucleotide $NEW_LOCATION 

Thanks!
Lauren
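
Before changing the config, it can help to confirm what is actually in the folder that humann_config prints. The following is a minimal sketch, not HUMAnN's own check: the function name summarize_database_folder is hypothetical, and the rule it applies (a file name containing both "g__" and "s__") is an assumption taken from the naming described above.

```python
import os


def summarize_database_folder(folder):
    """Split the entries of `folder` into (matching, other) name lists.

    Assumption of this sketch (not HUMAnN's implementation): a name
    "matches" when it contains both "g__" (genus) and "s__" (species).
    """
    matching, other = [], []
    for name in sorted(os.listdir(folder)):
        if "g__" in name and "s__" in name:
            matching.append(name)
        else:
            other.append(name)
    return matching, other


# Example use, with the folder reported by `humann_config --print`:
#   matching, other = summarize_database_folder("data/chocophlan_DEMO")
#   print(len(matching), "matching;", len(other), "other")
```

If the matching list comes back empty, the configured folder is probably a parent directory or an unrelated location rather than the folder that holds the extracted ChocoPhlAn files.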

Hi Lauren,
It works.
Thank you very much!
Wenping

Regarding this error:

CRITICAL ERROR: The directory provided for ChocoPhlAn does not contain files of the expected format (ie '^[g__][s__]').

full_chocophlan.v201901_v31.tar.gz contains the following file: alaS.centroids.v201901_v31.ffn.gz, which is not formatted like the other 12772 genome files in the database.

Should alaS.centroids.v201901_v31.ffn.gz be removed from full_chocophlan.v201901_v31.tar.gz, or is the ^[g__][s__] format not actually needed for all genome files?
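
For context on what the quoted pattern would test, here is a small sketch that reads '^[g__][s__]' literally as a Python regular expression. That reading is an assumption, since the thread does not show how HUMAnN applies the pattern. Under it, [g__] is a character class matching a single "g" or "_", and [s__] matches a single "s" or "_", so only the first two characters of a file name are examined. The helper name matches_quoted_pattern and the second file name are hypothetical.

```python
import re

# The pattern exactly as quoted in the error message, read as a regex.
# Assumption: this mirrors the message text only, not HUMAnN's code.
PATTERN = re.compile(r"^[g__][s__]")


def matches_quoted_pattern(file_name):
    """Return True if the first two characters satisfy the quoted pattern."""
    return PATTERN.search(file_name) is not None


# File named in the post above: starts with "a", so it does not match.
print(matches_quoted_pattern("alaS.centroids.v201901_v31.ffn.gz"))  # False

# Hypothetical genus/species style name: starts with "g_", so it matches.
print(matches_quoted_pattern("g__Genus.s__Genus_species.centroids.ffn.gz"))  # True
```

This only shows which names the quoted pattern would accept. Whether one non-matching file is enough to trigger the error depends on how HUMAnN counts matches, which is not stated here.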