uniprotID mapping to pangenome genes

Nicolai_Karcher · May 18, 2020, 7:52am

Hey everyone,

thanks for humann2, it’s very easy to use and super useful!

I have a question regarding the species-resolved gene abundance estimates humann2 provides: How do you provide the uniprotID mapping/association for genes in the species’ pangenomes? Is it a simple best hit approach?

Edit: I actually have another question: here, you provide a uniprotID <-> KO mapping for uniref90 and uniref50. How did you obtain those? Did you take that mapping directly from the uniprot metadata?

Thanks a lot for your help!

Cheers,
Nic

franzosa · May 18, 2020, 1:01pm

The mappings are best-hit, subject to the alignment and coverage constraints used by UniRef. That is, a gene's UniRef90 assignment is its best hit to UniRef90 with >=90% identity and >=80% coverage (likewise for UniRef50, but requiring only >=50% identity).
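To make the rule above concrete, here is a minimal Python sketch of a thresholded best-hit assignment. It is not the actual HUMAnN code: the `Hit` record, the `best_hits` function, and the use of bit score to rank hits are all assumptions made for illustration, and how coverage is computed (over the query, the target, or both) is left to whatever produced the alignments.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One alignment of a pangenome gene to a UniRef cluster (hypothetical record)."""
    gene: str        # pangenome gene identifier
    uniref: str      # e.g. "UniRef90_A0A001"
    identity: float  # percent identity, 0-100
    coverage: float  # percent coverage, 0-100 (definition is an assumption)
    bitscore: float  # used here to rank hits; the ranking criterion is an assumption


def best_hits(hits: Iterable[Hit],
              min_identity: float = 90.0,
              min_coverage: float = 80.0) -> Dict[str, str]:
    """Return gene -> UniRef ID, keeping the top-scoring hit that passes both thresholds."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        # Discard hits that fail either the identity or the coverage constraint.
        if hit.identity < min_identity or hit.coverage < min_coverage:
            continue
        current = best.get(hit.gene)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.gene] = hit
    return {gene: hit.uniref for gene, hit in best.items()}
```

For UniRef50-style assignments the same function would be called with `min_identity=50.0`. Genes with no hit passing the thresholds simply do not appear in the result.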

And correct - we take the functional annotations directly from UniProt. If you consider a UniRef like UniRef90_A0A001 from the current release, you can drop the prefix to get a UniProt entry (A0A001) and then view its raw text entry in UniProt with the following link:

https://www.uniprot.org/uniprot/A0A001.txt

We parse functional annotations from the DR (database cross-reference) fields.
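As a rough illustration of the steps described above (drop the UniRef prefix, build the text-entry URL, read the DR lines), a Python sketch could look like the following. The function names are hypothetical and this is not the parser we actually use; it assumes the usual flat-file layout in which a cross-reference line starts with `DR`, followed by the database name and the identifier separated by semicolons.

```python
from collections import defaultdict
from typing import Dict, List


def uniref_to_accession(uniref_id: str) -> str:
    """Drop the 'UniRef90_' / 'UniRef50_' prefix to get the UniProt accession."""
    _prefix, _sep, accession = uniref_id.partition("_")
    return accession


def entry_url(accession: str) -> str:
    """Build the raw text-entry URL, following the pattern quoted in the post."""
    return f"https://www.uniprot.org/uniprot/{accession}.txt"


def parse_dr_fields(entry_text: str) -> Dict[str, List[str]]:
    """Collect database cross-references as {database name: [identifiers]}."""
    xrefs: Dict[str, List[str]] = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.startswith("DR   "):
            continue  # only DR (database cross-reference) lines are of interest
        # Strip the line code and trailing period, then split on semicolons.
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        if len(fields) < 2:
            continue  # malformed line: no identifier present
        database, identifier = fields[0], fields[1]
        xrefs[database].append(identifier)
    return dict(xrefs)
```

Given the downloaded text of an entry, `parse_dr_fields(text).get("KO", [])` would then list any KO identifiers cross-referenced for that entry.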

Nicolai_Karcher · May 18, 2020, 1:09pm

Thanks a lot for the info!
