uniprotID mapping to pangenome genes

Hey everyone,

thanks for humann2, it’s very easy to use and super useful!

I have a question regarding the species-resolved gene abundance estimates humann2 provides: How do you provide the uniprotID mapping/association for genes in the species’ pangenomes? Is it a simple best hit approach?

Edit: I actually have another question: here, you provide a uniprotID <-> KO mapping for uniref90 and uniref50. How did you obtain those? Did you take that mapping directly from the uniprot metadata?

Thanks a lot for your help!

Cheers,
Nic

The mappings are best-hit, subject to the identity and coverage constraints used by UniRef. That is, a UniRef90 assignment is the best hit to UniRef90 with >=90% identity and >=80% coverage (likewise for UniRef50, but requiring only >=50% identity).
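To make the rule concrete, here is a minimal sketch of a thresholded best-hit assignment. It is not the actual HUMAnN2 annotation code: the `Hit` record, its field names, the single `coverage` value, and the use of bit score to rank hits are all assumptions made for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One alignment of a pangenome gene against a UniRef sequence (hypothetical record)."""
    gene: str        # pangenome gene identifier
    uniref: str      # UniRef cluster identifier, e.g. "UniRef90_A0A001"
    identity: float  # percent identity of the alignment
    coverage: float  # percent coverage (which sequence it refers to is an assumption)
    bitscore: float  # alignment score used here to rank hits (an assumption)


def best_hits(hits: Iterable[Hit], min_identity: float,
              min_coverage: float = 80.0) -> Dict[str, str]:
    """Return {gene: uniref} keeping, per gene, the top-scoring hit that passes both thresholds."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        # Discard hits that fail the identity or coverage constraint.
        if hit.identity < min_identity or hit.coverage < min_coverage:
            continue
        current = best.get(hit.gene)
        # Keep the highest-scoring passing hit for each gene.
        if current is None or hit.bitscore > current.bitscore:
            best[hit.gene] = hit
    return {gene: hit.uniref for gene, hit in best.items()}
```

Under these assumptions, `best_hits(hits, min_identity=90.0)` would give UniRef90-style assignments and `best_hits(hits, min_identity=50.0)` UniRef50-style ones. Genes with no passing hit are simply absent from the result.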

And correct - we take the functional annotations directly from UniProt. If you consider a UniRef ID like UniRef90_A0A001 from the current release, you can drop the prefix to get a UniProt accession (A0A001) and then view its raw text entry in UniProt with the following link:

https://www.uniprot.org/uniprot/A0A001.txt

We parse functional annotations from the DR (database cross-reference) fields.
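As a rough sketch of that lookup (not the parser we actually use), the snippet below strips the UniRef prefix, builds the text-entry URL in the form shown above, and collects DR cross-references from an entry's raw text. The function names are hypothetical, and it assumes DR lines follow the usual flat-file layout of `DR   DATABASE; IDENTIFIER; ...`.

```python
from collections import defaultdict
from typing import Dict, List


def uniref_to_accession(uniref_id: str) -> str:
    """Drop the 'UniRef90_' / 'UniRef50_' prefix, e.g. 'UniRef90_A0A001' -> 'A0A001'."""
    prefix, sep, accession = uniref_id.partition("_")
    if not sep or not prefix.startswith("UniRef") or not accession:
        raise ValueError(f"not a UniRef identifier: {uniref_id!r}")
    return accession


def entry_url(uniref_id: str) -> str:
    """Build the raw-text URL for the entry, following the link pattern quoted above."""
    return f"https://www.uniprot.org/uniprot/{uniref_to_accession(uniref_id)}.txt"


def parse_dr_lines(entry_text: str) -> Dict[str, List[str]]:
    """Map each cross-referenced database name to the identifiers listed on DR lines."""
    xrefs: Dict[str, List[str]] = defaultdict(list)
    for line in entry_text.splitlines():
        # Only database cross-reference lines are of interest.
        if not line.startswith("DR   "):
            continue
        # Assumed layout: "DR   DATABASE; IDENTIFIER; further fields."
        fields = [f.strip() for f in line[5:].rstrip().rstrip(".").split(";")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            xrefs[fields[0]].append(fields[1])
    return dict(xrefs)
```

With the entry text downloaded from the URL, something like `parse_dr_lines(text).get("KO", [])` would list the KO identifiers cross-referenced by that entry, if it has any.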

Thanks a lot for the info :slight_smile: