CRITICAL ERROR: The directory provided for the translated database contains files that are not of the expected version. Please install the latest version of the database: 201901b

I’m trying to run this:

INPUT=veba_output/preprocess/S4/output/joined.fastq.gz
OUTPUT=test_output
DMND_DB_DIR=veba_output/misc/diamond_database/
NUM_THREADS=1

humann --input ${INPUT} --output ${OUTPUT} --protein-database ${DMND_DB_DIR} --threads ${NUM_THREADS} --bypass-nucleotide-search --input-format fastq.gz --translated-identity-threshold 50 --translated-query-coverage-threshold 80 --search-mode uniref50 --id-mapping veba_output/misc/humann_uniref_annotations.tsv

but I’m getting this error:

CRITICAL ERROR: The directory provided for the translated database contains files ( all_proteins.faa.dmnd ) that are not of the expected version. Please install the latest version of the database: 201901b

Here are my files:

(VEBA-profile_env) [jespinoz@exp-15-01 TestVEBA]$ ls -lhS veba_output/misc/diamond_database/
total 52M
-rw-rw---- 1 jespinoz jcl110 52M Oct 12 15:04 all_proteins.faa.dmnd
(VEBA-profile_env) [jespinoz@exp-15-01 TestVEBA]$ head veba_output/misc/humann_uniref_annotations.tsv
S1__NODE_166_length_33212_cov_10.244534_32964:33209(+)	UniRef50_Q41093	82	c__Bacillariophyceae;o__Naviculales;f__Phaeodactylaceae;g__Phaeodactylum;s__Phaeodactylum tricornutum [CCAP 1055/1];t__S1__METABAT2__E.1__bin.1
S1__NODE_229_length_28347_cov_10.467447_5060:5751(-)	UniRef50_Q41093	196	c__Bacillariophyceae;o__Naviculales;f__Phaeodactylaceae;g__Phaeodactylum;s__Phaeodactylum tricornutum [CCAP 1055/1];t__S1__METABAT2__E.1__bin.1
S1__NODE_298_length_24477_cov_10.374539_11070:11663(+)	UniRef50_Q41093	198	c__Bacillariophyceae;o__Naviculales;f__Phaeodactylaceae;g__Phaeodactylum;s__Phaeodactylum tricornutum [CCAP 1055/1];t__S1__METABAT2__E.1__bin.1
S1__NODE_56_length_52126_cov_10.433043_49957:50562(+)	UniRef50_Q41093	202	c__Bacillariophyceae;o__Naviculales;f__Phaeodactylaceae;g__Phaeodactylum;s__Phaeodactylum tricornutum [CCAP 1055/1];t__S1__METABAT2__E.1__bin.1
S1__NODE_56_length_52126_cov_10.433043_50725:52123(-)	UniRef50_Q41093	338	c__Bacillariophyceae;o__Naviculales;f__Phaeodactylaceae;g__Phaeodactylum;s__Phaeodactylum tricornutum [CCAP 1055/1];t__S1__METABAT2__E.1__bin.1
S1__NODE_743_length_9370_cov_9.617391_5:550(+)	UniRef50_Q41093	182	c__Bacillariophyceae;o__Naviculales;f__Phaeodactylaceae;g__Phaeodactylum;s__Phaeodactylum tricornutum [CCAP 1055/1];t__S1__METABAT2__E.1__bin.1
S1__NODE_14_length_79184_cov_10.324875_17207:17686(+)	UniRef50_A0A1Z5JXG7	160	c__Bacillariophyceae;o__Naviculales;f__Phaeodactylaceae;g__Phaeodactylum;s__Phaeodactylum tricornutum [CCAP 1055/1];t__S1__METABAT2__E.1__bin.1
S1__NODE_646_length_11435_cov_10.486731_5428:5952(-)	UniRef50_A0A1Z5JXG7	175	c__Bacillariophyceae;o__Naviculales;f__Phaeodactylaceae;g__Phaeodactylum;s__Phaeodactylum tricornutum [CCAP 1055/1];t__S1__METABAT2__E.1__bin.1
S1__NODE_674_length_10805_cov_9.362977_3661:4200(-)	UniRef50_A0A1Z5JXG7	180	c__Bacillariophyceae;o__Naviculales;f__Phaeodactylaceae;g__Phaeodactylum;s__Phaeodactylum tricornutum [CCAP 1055/1];t__S1__METABAT2__E.1__bin.1
S1__NODE_8_length_86326_cov_10.138192_57293:57772(+)	UniRef50_A0A1Z5JXG7	160	c__Bacillariophyceae;o__Naviculales;f__Phaeodactylaceae;g__Phaeodactylum;s__Phaeodactylum tricornutum [CCAP 1055/1];t__S1__METABAT2__E.1__bin.1

Here are my versions:

(VEBA-profile_env) [jespinoz@exp-15-01 TestVEBA]$ conda list | grep -E "humann|diamond"
diamond                   2.1.8                h43eeafb_0    bioconda
humann                    3.8                pyh7cba7a3_0    bioconda

You just have to add the suffix _201901b before the .dmnd extension. In my case, that meant renaming all_proteins.faa.dmnd to all_proteins.faa_201901b.dmnd
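For anyone who wants to script the rename described above, here is a minimal sketch. The helper name add_db_version_suffix is made up for illustration; the version string 201901b is the one printed in the error message, and the example path is the one from this thread.

```shell
# Hypothetical helper: insert "_<version>" before the .dmnd extension.
add_db_version_suffix() {
    local db_file="$1"               # e.g. path/to/all_proteins.faa.dmnd
    local version="${2:-201901b}"    # version string taken from the error message
    local renamed="${db_file%.dmnd}_${version}.dmnd"
    # Leave files alone if they already carry the suffix.
    case "$db_file" in
        *"_${version}.dmnd") echo "$db_file"; return 0 ;;
    esac
    mv -- "$db_file" "$renamed" && echo "$renamed"
}

# Usage (path from the post above):
# add_db_version_suffix veba_output/misc/diamond_database/all_proteins.faa.dmnd
```

The function prints the resulting file name, so it can be called again on an already renamed file without changing anything.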

The issue is resolved, but I don’t know how to update its status.

humann \
--input "$input_fastq_dir" \
--output "$output_dir" \
--metaphlan "$metaphlan_database" \
--nucleotide-database "$chocophlan_database" \
--protein-database "$uniref_database" \
--threads 8 \
--bypass-nucleotide-search \
--input-format fastq \
--search-mode uniref90 \
--log-level INFO \
--output-basename sample_name \
--output-format biom
Output files will be written to: /media/mo/New_Volume/HumanN_tot/output
Removing spaces from identifiers in input file …

sh: 0: getcwd() failed: Input/output error

CRITICAL ERROR: The directory provided for the translated database contains files ( uniref90.fasta.gz ) that are not of the expected version. Please install the latest version of the database: 201901b

The folders that contain extracted databases for use with HUMAnN cannot contain any other files. It looks like in this case you still have the compressed UniRef90 fasta file in the folder, so HUMAnN is complaining about its presence.
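A quick way to spot stray files before running HUMAnN is sketched below. It only looks at file names and assumes the expected version string is 201901b, as in the error message; HUMAnN's own check may work differently. The helper name list_unexpected_db_files is hypothetical.

```shell
# Hypothetical helper: print entries in a database folder whose names do not
# contain the expected version string (assumed here to be 201901b).
list_unexpected_db_files() {
    local db_dir="$1"
    local version="${2:-201901b}"
    local f name
    for f in "$db_dir"/*; do
        [ -e "$f" ] || continue        # empty folder: the glob matched nothing
        name=$(basename "$f")
        case "$name" in
            *"$version"*) ;;           # name carries the version string, keep quiet
            *) echo "$name" ;;         # anything else is reported
        esac
    done
}

# Usage: list_unexpected_db_files "$uniref_database"
```

Anything the function prints is a candidate for moving out of the folder (or renaming, if it is the database file itself).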

Hello, I figured out my error. I downloaded the full UniRef90 from the UniProt website and converted it to DIAMOND format, because it is bigger (86 GB) than the UniRef90 distributed for HUMAnN (36.3 GB). That gave me a lot of errors, so I deleted it and downloaded the HUMAnN version of UniRef90 (36.3 GB) instead, and it works correctly. I don’t know why!

We do some light reformatting on the UniRef database (e.g. simplifying the headers), so that might be why the raw download didn’t immediately work?

It’s not about the downloading process; it’s about the accuracy of the UniRef database itself. I thought that downloading it from the official website (UniProt) would be better, because that version is much larger (86 GB) than the one from the tutorial.

Just downloading the latest UniRef database is not likely to work as 1) it has to be formatted for HUMAnN and 2) the rest of the files in HUMAnN’s ecosystem would need to be updated to understand the IDs in the new sequence database. You can download the latest versions of the database formatted for HUMAnN using the humann_databases utility script.
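As a sketch of what that looks like on the command line: the wrapper below only prints the humann_databases call by default and runs it when RUN=1 is set, since the download is large. The wrapper name download_uniref90 and the RUN switch are made up for illustration; the install folder is whatever you pass in.

```shell
# Hypothetical wrapper: print (or, with RUN=1, execute) the humann_databases
# command that downloads the DIAMOND-formatted UniRef90 database.
download_uniref90() {
    local install_dir="$1"    # destination folder, chosen by the caller
    set -- humann_databases --download uniref uniref90_diamond "$install_dir"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"                  # actually start the download
    else
        echo "$*"             # dry run: show the command only
    fi
}

# Usage:
# humann_databases --available          # list the databases that can be downloaded
# RUN=1 download_uniref90 /path/to/databases
```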

I will explain in more detail. First, I downloaded UniRef90 from UniProt as a FASTA file and then converted it to DIAMOND format using diamond makedb --in uniref90_201901b_fullt.fasta -d uniref90_201901b_full.dmnd. Now I have


and I ran HUMAnN:
humann -i "$input_fastq_dir" -o "$output_dir" --metaphlan "$metaphlan_database" --nucleotide-database "$chocophlan_database" --protein-database "$uniref_database" --search-mode uniref90 -v --metaphlan "$metaphlan_profile" --prescreen-threshold 0.01 --nucleotide-query-coverage-threshold 90.0 --nucleotide-subject-coverage-threshold 50.0 --evalue 1.0 --translated-identity-threshold 50.0 --translated-query-coverage-threshold 90.0 --translated-subject-coverage-threshold 50.0 --minpath on --pathways metacyc --annotation-gene-index 3 --log-level INFO --output-basename demo_m_samples --threads 8
Output files will be written to: /media/mo/New_Volume/HumanN_tot/output

Writing temp files to directory: /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp

File ( /media/mo/New_Volume/HumanN_tot/output/sample_name_humann_temp/tmpbm6io_rr/demo_m_samples.fastq ) is of format: fastq

Running metaphlan …

/home/mo/anaconda3/envs/HUMAnN/bin/metaphlan /media/mo/New_Volume/HumanN_tot/output/sample_name_humann_temp/tmpbm6io_rr/demo_m_samples.fastq -t rel_ab -o /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/demo_m_samples_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/demo_m_samples_metaphlan_bowtie2.txt --nproc 8

TIMESTAMP: Completed prescreen : 1294 seconds

Found t__SGB7967 : 97.10% of mapped reads ( s__Enterococcus_faecium,g__Enterococcus.s__Enterococcus_sp_BSD2780120874b_170522_B6,g__Enterococcus.s__Enterococcus_sp_HMSC035B04,g__Enterococcus.s__Enterococcus_sp_H57,g__Enterococcus.s__Enterococcus_sp_HMSC069A01,g__Enterococcus.s__Enterococcus_sp_HMSC063H10,g__Enterococcus.s__Enterococcus_sp_HMSC035C10,g__Enterococcus.s__Enterococcus_sp_GMD3E,g__Enterococcus.s__Enterococcus_sp_GMD4E,g__Enterococcus.s__Enterococcus_sp_GMD5E,g__Enterococcus.s__Enterococcus_sp_HMSC060D07,g__Enterococcus.s__Enterococcus_sp_GMD2E,g__Enterococcus.s__Enterococcus_sp_HMSC074F07,g__Enterococcus.s__Enterococcus_sp_HMSC34G12,g__Enterococcus.s__Enterococcus_sp_HMSC034B11,g__Enterococcus.s__Enterococcus_sp_GMD1E,g__Enterococcus.s__Enterococcus_sp_HMSC077E07,g__Enterococcus.s__Enterococcus_sp_HMSC070F12,g__Enterococcus.s__Enterococcus_sp_HMSC055G03,g__Enterococcus.s__Enterococcus_sp_HMSC058D07,g__Enterococcus.s__Enterococcus_sp_105332 )
Found t__SGB10115 : 2.24% of mapped reads ( s__Klebsiella_oxytoca,g__Klebsiella.s__Klebsiella_africana,g__Klebsiella.s__Klebsiella_sp_MS_92_3,g__Klebsiella.s__Klebsiella_sp_01_6622,g__Klebsiella.s__Klebsiella_sp_Kps,g__Klebsiella.s__Klebsiella_sp_K1,g__Klebsiella.s__Klebsiella_sp_KGM_IMP216,g__Klebsiella.s__Klebsiella_sp_28,g__Klebsiella.s__Klebsiella_sp_P1927,g__Klebsiella.s__Klebsiella_sp_HMSC16A12,g__Klebsiella.s__Klebsiella_sp_10,g__Klebsiella.s__Klebsiella_aerogenes,g__Klebsiella.s__Klebsiella_sp_4_1_44FAA,g__Klebsiella.s__Klebsiella_sp_AqSCr,g__Klebsiella.s__Klebsiella_sp_01_3681,g__Klebsiella.s__Klebsiella_sp_P1954,g__Klebsiella.s__Klebsiella_sp_K4,g__Klebsiella.s__Klebsiella_sp_KBG1 )
Found t__SGB8002 : 0.64% of mapped reads ( s__Streptococcus_thermophilus )

Total species selected from prescreen: 43

Selected species explain 99.99% of predicted community composition

Creating custom ChocoPhlAn database …

/usr/bin/gunzip -c /media/mo/New_Volume/HumanN_tot/chocophlan/g__Klebsiella.s__Klebsiella_aerogenes.centroids.v201901_v31.ffn.gz /media/mo/New_Volume/HumanN_tot/chocophlan/g__Klebsiella.s__Klebsiella_oxytoca.centroids.v201901_v31.ffn.gz /media/mo/New_Volume/HumanN_tot/chocophlan/g__Klebsiella.s__Klebsiella_pneumoniae.centroids.v201901_v31.ffn.gz /media/mo/New_Volume/HumanN_tot/chocophlan/g__Enterococcus.s__Enterococcus_faecium.centroids.v201901_v31.ffn.gz /media/mo/New_Volume/HumanN_tot/chocophlan/g__Streptococcus.s__Streptococcus_thermophilus.centroids.v201901_v31.ffn.gz

TIMESTAMP: Completed custom database creation : 9 seconds

Running bowtie2-build …

/home/mo/anaconda3/envs/HUMAnN/bin/bowtie2-build -f /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/demo_m_samples_custom_chocophlan_database.ffn /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/demo_m_samples_bowtie2_index

TIMESTAMP: Completed database index : 374 seconds

Running bowtie2 …

/home/mo/anaconda3/envs/HUMAnN/bin/bowtie2 -q -x /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/demo_m_samples_bowtie2_index -U /media/mo/New_Volume/HumanN_tot/output/sample_name_humann_temp/tmpbm6io_rr/demo_m_samples.fastq -S /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/demo_m_samples_bowtie2_aligned.sam -p 8 --very-sensitive

TIMESTAMP: Completed nucleotide alignment : 981 seconds

TIMESTAMP: Completed nucleotide alignment post-processing : 2053 seconds

Total bugs from nucleotide alignment: 5
g__Enterococcus.s__Enterococcus_faecium: 18887240 hits
g__Klebsiella.s__Klebsiella_pneumoniae: 993738 hits
g__Klebsiella.s__Klebsiella_oxytoca: 155895 hits
g__Streptococcus.s__Streptococcus_thermophilus: 97610 hits
g__Klebsiella.s__Klebsiella_aerogenes: 130875 hits

Total gene families from nucleotide alignment: 46095

Unaligned reads after nucleotide alignment: 24.9416395908 %

Running diamond …

Aligning to reference database: uniref90_201901b.fasta.gz.dmnd

/home/mo/anaconda3/envs/HUMAnN/bin/diamond blastx --query /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/demo_m_samples_bowtie2_unaligned.fa --evalue 1.0 --threads 8 --top 1 --outfmt 6 --db /media/mo/New_Volume/HumanN_tot/uniref90/uniref90_201901b.fasta.gz --out /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/tmpel7022_e/diamond_m8_r_p620bo --tmpdir /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/tmpel7022_e

CRITICAL ERROR: Error executing: /home/mo/anaconda3/envs/HUMAnN/bin/diamond blastx --query /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/demo_m_samples_bowtie2_unaligned.fa --evalue 1.0 --threads 8 --top 1 --outfmt 6 --db /media/mo/New_Volume/HumanN_tot/uniref90/uniref90_201901b.fasta.gz --out /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/tmpel7022_e/diamond_m8_r_p620bo --tmpdir /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/tmpel7022_e

Error message returned from diamond :
diamond v2.1.8.162 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: Sensitive protein alignments at tree-of-life scale using DIAMOND | Nature Methods (2021)

#CPU threads: 8
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /media/mo/New_Volume/HumanN_tot/output/demo_m_samples_humann_temp/tmpel7022_e
Percentage range of top alignment score to report hits: 1
Opening the database…
Error opening file /media/mo/New_Volume/HumanN_tot/uniref90/uniref90_201901b.fasta.gz: No such file or directory
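In the log above, the "Aligning to reference database" line names uniref90_201901b.fasta.gz.dmnd, while the diamond command receives --db without the .dmnd extension and reports that path as missing. A minimal sketch for checking which of the two paths actually exists on disk (the helper name check_diamond_db_path is hypothetical, and this is only a diagnostic, not a fix):

```shell
# Hypothetical check: report whether a DIAMOND --db path exists as given
# and with ".dmnd" appended.
check_diamond_db_path() {
    local db="$1"
    local candidate
    for candidate in "$db" "$db.dmnd"; do
        if [ -f "$candidate" ]; then
            echo "found: $candidate"
        else
            echo "missing: $candidate"
        fi
    done
}

# Usage (path from the log above):
# check_diamond_db_path /media/mo/New_Volume/HumanN_tot/uniref90/uniref90_201901b.fasta.gz
```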