HUMAnN: which reference database? Why so many unmapped reads?

luciagg · June 25, 2021, 11:47am

Hello everyone,

I’m starting to use HUMAnN and, as I understand it, it can be used with different reference databases: UniRef90, UniRef50, KEGG.

Which one is recommended?

So far, I have tried these commands:

humann2_databases --download chocophlan full /home/noe/Desktop/Shotgun_Lucia/humann2database/chocophlan

parallel -j 1 'humann2 --metaphlan-options "--bt2_ps very-sensitive-local --min_alignment_len 50" --eta --threads 12 --input {} --output humann2_out/{/.} --memory-use maximum' ::: fastq

I obtained a very high proportion of unmapped reads (68%); I have attached the results.
humann2_pathabundance_relab_unstratified.txt (15.0 KB)
Is it normal or could it be due to an inappropriate reference database?

Thank you in advance,

Lucia

franzosa · July 6, 2021, 5:37pm

I recommend UniRef90 for human-associated communities (or anything else that is comparably well studied) and UniRef50 for everything else.
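As a rough illustration of that rule of thumb, the sketch below picks a UniRef build by community type and prints the corresponding HUMAnN2 commands without running them. The helper `pick_uniref`, the directory `humann2database/uniref`, and the file `sample.fastq` are hypothetical names chosen for this example. The build names `uniref90_diamond` and `uniref50_diamond` and the `--protein-database` option are given as I recall them from the HUMAnN2 documentation, so check them against `humann2_databases --available` and `humann2 --help` for your installed version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HUMAnN): map a community type to a UniRef build.
# UniRef90 for human-associated or similarly well-studied communities, UniRef50 otherwise.
pick_uniref() {
  case "$1" in
    human|well-studied) echo "uniref90_diamond" ;;
    *)                  echo "uniref50_diamond" ;;
  esac
}

db_name=$(pick_uniref human)
db_dir="humann2database/uniref"   # assumed install location, adjust as needed

# Dry run: print the commands instead of executing them.
echo "humann2_databases --download uniref ${db_name} ${db_dir}"
echo "humann2 --input sample.fastq --output humann2_out/sample --protein-database ${db_dir} --threads 12"
```

Once the printed commands look right for your setup, they can be run directly. Note that a ChocoPhlAn download alone only provides the nucleotide database; the translated search needs one of the UniRef protein databases as well.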
