Hey everyone,
thanks for humann2, it’s very easy to use and super useful!
I have a question regarding the species-resolved gene abundance estimates humann2 provides: how do you assign UniProt IDs to the genes in the species' pangenomes? Is it a simple best-hit approach?
Edit: I actually have another question: here, you provide a UniProt ID <-> KO mapping for UniRef90 and UniRef50. How did you obtain those? Did you take that mapping directly from the UniProt metadata?
Thanks a lot for your help!
Cheers,
Nic
The mappings are best hits, subject to the alignment identity and coverage constraints used by UniRef. That is, a UniRef90 assignment is the best hit to UniRef90 with >=90% identity and >=80% coverage (likewise for UniRef50, but requiring only >=50% identity).
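As a rough illustration of that rule (not the actual humann2 code), the filtering could be sketched like this in Python. The Hit record, the best_hit function, and the use of bit score to rank the passing hits are all assumptions made for the example; coverage is treated as a single percentage, without specifying whether it is measured on the query or the subject.

```python
from typing import Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical alignment record; field names are illustrative only."""
    gene: str
    uniref: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent coverage (assumed to be one number)
    bitscore: float  # assumed ranking criterion for "best"


def best_hit(hits: Iterable[Hit],
             min_identity: float,
             min_coverage: float = 80.0) -> Optional[Hit]:
    """Return the top-scoring hit that passes both thresholds, or None."""
    passing = [h for h in hits
               if h.identity >= min_identity and h.coverage >= min_coverage]
    # Assumption: among passing hits, "best" means the highest bit score.
    return max(passing, key=lambda h: h.bitscore, default=None)


# Thresholds from the reply above: 90% identity for UniRef90, 50% for UniRef50.
hits = [
    Hit("gene1", "UniRef90_X", identity=95.0, coverage=70.0, bitscore=500.0),
    Hit("gene1", "UniRef90_Y", identity=92.0, coverage=85.0, bitscore=300.0),
]
uniref90_assignment = best_hit(hits, min_identity=90.0)
```

Here the first hit has the higher score but is rejected for its 70% coverage, so the second hit is the one assigned.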
And correct: we take the functional annotations directly from UniProt. If you consider a UniRef ID like UniRef90_A0A001 from the current release, you can drop the prefix to get a UniProt entry (A0A001) and then view its raw text entry in UniProt with the following link:
https://www.uniprot.org/uniprot/A0A001.txt
We parse functional annotations from the DR (database cross-reference) fields.
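For anyone who wants to reproduce this, here is a minimal Python sketch of the two steps described above: dropping the UniRef prefix and collecting DR lines from a UniProt text entry. It is not the humann2 parser; the function names are hypothetical, and SAMPLE_ENTRY is a made-up fragment in the UniProt flat-file layout, not the real content of A0A001. In practice you would pass in the text downloaded from the link above.

```python
from collections import defaultdict
from typing import Dict, List


def uniref_to_accession(uniref_id: str) -> str:
    """Drop the 'UniRef90_' / 'UniRef50_' prefix, e.g. UniRef90_A0A001 -> A0A001."""
    _prefix, sep, accession = uniref_id.partition("_")
    return accession if sep else uniref_id


def parse_dr_lines(entry_text: str) -> Dict[str, List[str]]:
    """Group DR cross-references by database name (KO, GO, Pfam, ...)."""
    xrefs: Dict[str, List[str]] = defaultdict(list)
    for line in entry_text.splitlines():
        # Flat-file lines start with a two-letter code followed by three spaces.
        if not line.startswith("DR   "):
            continue
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        if len(fields) >= 2:
            # fields[0] is the database name, fields[1] the primary identifier.
            xrefs[fields[0]].append(fields[1])
    return dict(xrefs)


# Made-up fragment for illustration only; not the actual A0A001 entry.
SAMPLE_ENTRY = """\
AC   A0A001;
DR   KO; K00001; -.
DR   GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW.
DR   Pfam; PF00005; ABC_tran; 1.
"""

accession = uniref_to_accession("UniRef90_A0A001")
annotations = parse_dr_lines(SAMPLE_ENTRY)
ko_terms = annotations.get("KO", [])
```

Looking up the "KO" key in the result then gives the KO terms attached to that entry, which is the kind of UniRef-to-KO association asked about in the original question.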
Thanks a lot for the info