UniRef50s in humann2 output when using "--search-mode uniref90"

I am running a number of shotgun metagenomic samples through a custom pipeline that includes humann2. The command for the primary humann2 run is:

humann2 --input ${sample}.fastq.gz --taxonomic-profile ${sample}_profile.tsv --output $humann2_output --threads 8 --remove-temp-output --search-mode uniref90 --output-basename $sample

I have other steps in the workflow to regroup and rename, but in all of my initial ${sample}_genefamilies.tsv outputs, I have quite a few UniRef50 rows, including some that have names (e.g., UniRef50_K1TBF9: Transposase (Fragment) 1059.7162421954).

A typical file has ~400k rows, ~50k of which are UniRef50s, and about half of those UniRef50s have names.

These persist through humann2_renorm_table, and then when I run humann2_rename_table (expecting UniRef90s), they are all converted to, e.g., UniRef50_K1TBF9: NO_NAME.
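
For context, those downstream steps look roughly like this (a sketch; the exact flags in my pipeline may differ slightly):

$ humann2_renorm_table --input ${sample}_genefamilies.tsv --output ${sample}_genefamilies_relab.tsv --units relab
$ humann2_rename_table --input ${sample}_genefamilies_relab.tsv --output ${sample}_genefamilies_relab_named.tsv --names uniref90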

Is there something about my database configuration that might be causing this?

$ humann2_config
HUMAnN2 Configuration ( Section : Name = Value )
output_format : remove_stratified_output = False
output_format : output_max_decimals = 10
output_format : remove_column_description_output = False
alignment_settings : prescreen_threshold = 0.01
alignment_settings : translated_query_coverage_threshold = 90.0
alignment_settings : evalue_threshold = 1.0
alignment_settings : translated_subject_coverage_threshold = 50.0
database_folders : utility_mapping = /pool001/vklepacc/databases/utility_mapping/
database_folders : protein = /pool001/vklepacc/databases/uniref/
database_folders : nucleotide = /pool001/vklepacc/databases/chocophlan/
run_modes : bypass_nucleotide_search = False
run_modes : verbose = False
run_modes : resume = False
run_modes : bypass_translated_search = False
run_modes : bypass_nucleotide_index = False
run_modes : threads = 1
run_modes : bypass_prescreen = False
$ humann2_databases
HUMAnN2 Databases ( database : build = location )
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann2_data/full_mapping_1_1.tar.gz
chocophlan : DEMO = http://huttenhower.sph.harvard.edu/humann2_data/chocophlan/DEMO_chocophlan.v0.1.1.tar.gz
chocophlan : full = http://huttenhower.sph.harvard.edu/humann2_data/chocophlan/full_chocophlan_plus_viral.v0.1.1.tar.gz
uniref : DEMO_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_annotated/uniref90_DEMO_diamond.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_annotated/uniref90_annotated_1_1.tar.gz
uniref : uniref50_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_ec_filtered/uniref50_ec_filtered_1_1.tar.gz
uniref : uniref50_GO_filtered_rapsearch2 = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref50_GO_filtered/uniref50_GO_filtered_rapsearch2.tar.gz
uniref : uniref50_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_annotated/uniref50_annotated_1_1.tar.gz
uniref : uniref90_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_1_1.tar.gz

Thanks!

This thread from the old Google group seems to have the answer:

If pointed at a folder with more than one database, HUMAnN2 will perform separate searches against each one and merge the results (this allows a user to break up a large database, e.g. one that is too big to fit in memory, and search it serially). That is what you are seeing here.
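
You can confirm this by listing the contents of the protein database folder. A hypothetical listing for a folder holding both builds (the file names here are illustrative, not exact):

$ ls /pool001/vklepacc/databases/uniref/
uniref50_annotated.1.1.dmnd
uniref90_annotated.1.1.dmnd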

If you store the UniRef50 and UniRef90 databases in separate folders you will not have this issue. To tell HUMAnN2 which database to use, you can point your individual runs at a specific folder with the --protein-database flag, or you can configure a default translated search database with the humann2_config utility:

https://bitbucket.org/biobakery/humann2/wiki/Home#markdown-header-configuration

For example:

$ humann2_config --update database_folders protein $DIR

This would update the default protein database folder to $DIR.
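
And for the per-run alternative, a sketch of separating the two builds and pointing a run at a UniRef90-only folder (the directory name and file pattern here are illustrative):

$ mkdir -p /pool001/vklepacc/databases/uniref90_only
$ mv /pool001/vklepacc/databases/uniref/uniref90*.dmnd /pool001/vklepacc/databases/uniref90_only/
$ humann2 --input ${sample}.fastq.gz --taxonomic-profile ${sample}_profile.tsv --output $humann2_output --protein-database /pool001/vklepacc/databases/uniref90_only --search-mode uniref90 --output-basename $sample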