I ran a metagenome analysis using humann3 with the full UniRef90 database (20.7 GB). In the output produced with the UniRef90 database, I found many IDs that look like UniRef100 IDs. Why would UniRef100 IDs be included in the UniRef90 database? Have I made a mistake somewhere?
Do you mean 1) that you saw IDs that looked like UniRef100_XYZ or 2) that you saw IDs that looked like UniRef90_XYZ where XYZ is also a UniRef100 member? Note that UniRef90 is constructed by clustering UniRef100, so every UniRef90 representative will also be a UniRef100 representative.
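One quick way to tell case 1 from case 2 is to tally the prefixes of the feature IDs in your output. Here is a minimal sketch, assuming the IDs have already been read into a list; the function name `tally_prefixes`, the example IDs, and the taxon label are made up for illustration and are not taken from your file.

```python
from collections import Counter

def tally_prefixes(feature_ids):
    """Count feature IDs by the text before the first underscore."""
    counts = Counter()
    for fid in feature_ids:
        # Drop a "|taxon" stratification suffix if one is present.
        base = fid.split("|", 1)[0]
        # IDs without an underscore (e.g. UNMAPPED) are counted as-is.
        prefix = base.split("_", 1)[0] if "_" in base else base
        counts[prefix] += 1
    return counts

# Hypothetical IDs for illustration only.
example_ids = [
    "UNMAPPED",
    "UniRef90_A0A015URD2",
    "UniRef90_C7X9K7",
    "UniRef90_C7X9K7|g__Bacteroides.s__Bacteroides_fragilis",
]

print(tally_prefixes(example_ids))
```

If nothing is counted under UniRef100, then you are in case 2: the IDs are UniRef90 cluster IDs whose suffix simply happens to look familiar.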
Thank you for your comment.
For example, UniRef90_A0A015URD2 looks like a UniRef100 ID. UniRef90_A0A015URD2 is a member of the cluster UniRef90_C7X9K7. So, should UniRef90_C7X9K7 be used instead of UniRef90_A0A015URD2?
Sorry for missing this reply. There is still some confusion though, as UniRef90_X can’t be a member of UniRef90_Y: UniRef90s are non-overlapping clusters. The extension after the “UniRef90_” prefix is just a UniProtKB identifier, and those can vary in form depending on the source proteome from which the corresponding protein derives. For example, “A0A015URD2” and “C7X9K7” both look to me like UniProt (protein) IDs which could (in theory) have been selected as representatives for UniRef90 clusters.
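To illustrate the point about the ID structure, here is a small sketch that splits a cluster ID into its prefix and suffix and checks whether the suffix has the shape of a UniProtKB accession. The regular expression follows the accession pattern published in the UniProt help pages; the function names are my own and are not part of humann3. Note that the check only looks at the form of the suffix and says nothing about which cluster a protein actually belongs to.

```python
import re

# UniProtKB accession pattern (6 or 10 characters), as given in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def split_uniref_id(cluster_id):
    """Split 'UniRef90_C7X9K7' into ('UniRef90', 'C7X9K7')."""
    prefix, _, suffix = cluster_id.partition("_")
    return prefix, suffix

def looks_like_uniprot_accession(suffix):
    """True if the whole suffix has the form of a UniProtKB accession."""
    return UNIPROT_ACCESSION.fullmatch(suffix) is not None

for cluster_id in ["UniRef90_A0A015URD2", "UniRef90_C7X9K7"]:
    prefix, suffix = split_uniref_id(cluster_id)
    print(prefix, suffix, looks_like_uniprot_accession(suffix))
```

Both suffixes fit the accession pattern, one in the 10-character form and one in the 6-character form, which is why the two IDs look so different even though both are ordinary UniRef90 cluster names.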