UniRef90 or UniRef100 for building markers with ShortBRED-Identify?

ngocminhpham2601 · June 16, 2021, 6:19pm

Hello,
I am trying to build markers for antibiotic resistance genes using the updated CARD database. I see that there are different UniRef databases (50, 90, or 100), so I wonder what the consequence would be of using one UniRef database over another. The official ShortBRED page mentions UniRef90. Is this the background database you suggest for ShortBRED? If so, what is your reasoning for choosing it over UniRef100 or UniRef50? Would UniRef100 be a possible background reference database for ShortBRED-Identify?

sagunmaharjann · July 9, 2021, 4:41pm

Hi @ngocminhpham2601,

We chose UniRef90 as the default database for ShortBRED because it performs better than UniRef50 or UniRef100. UniRef90 is built by clustering UniRef100 sequences so that each cluster contains sequences with at least 90% sequence identity to, and 80% overlap with, the longest sequence in the cluster (the seed sequence).

Regards,
Sagun

franzosa · July 9, 2021, 7:13pm

And to build on this answer a bit, part of why UniRef90 performs better is that you’re mapping against a smaller sequence set (compared to UniRef100) without losing much resolution, whereas UniRef50 (clustered at 50% AA identity to be even smaller) starts to lose some of the local homology we’re interested in finding with ShortBRED.
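
To make the size-versus-resolution trade-off concrete, here is a toy Python sketch of greedy seed-based clustering at different identity thresholds. It is only an illustration, not how UniRef is actually built: the peptides are made up, `identity` is a hypothetical position-by-position comparison with no alignment, and the 80% overlap criterion is ignored.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions (toy measure: equal lengths, no alignment)."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Each sequence joins the first seed it matches at >= threshold,
    otherwise it becomes the seed of a new cluster."""
    clusters = {}  # seed sequence -> list of member sequences
    for s in seqs:
        for seed in clusters:
            if identity(s, seed) >= threshold:
                clusters[seed].append(s)
                break
        else:
            clusters[s] = [s]
    return clusters


# Hypothetical 10-residue peptides, invented for this example.
seqs = [
    "MKTAYIAKQR",  # first seed
    "MKTAYIAKQR",  # exact duplicate of the seed
    "MKTAYIAKQK",  # 90% identical to the seed
    "MKTAYLVKQK",  # 70% identical to the seed
    "MKTAWLVNQK",  # 50% identical to the seed
    "GSHWPLEDNC",  # unrelated
]

for label, threshold in [("100%", 1.0), ("90%", 0.9), ("50%", 0.5)]:
    clusters = greedy_cluster(seqs, threshold)
    print(f"{label} identity: {len(clusters)} clusters")
```

In this toy set, going from 100% to 90% only merges a near-identical variant into its seed, while 50% collapses the more divergent variants into one representative, so their distinguishing local differences are no longer visible in the reference set.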
