What sort of data are you referring to? Sequencing reads? HUMAnN profiles?
Yes, I have some metagenomic sequencing data. I learned that UniRef90 and UniRef50 gene families can be mapped to the following systems:
- MetaCyc Reactions
- KEGG Orthogroups (KOs)
- Pfam domains
- Level-4 enzyme commission (EC) categories
- EggNOG (including COGs)
- Gene Ontology (GO)
- Informative GO
But I don’t know how to map to other databases, such as ARDB.
I see what you mean. It’s true that we don’t have ARDB-UniRef maps among the utility mapping files, and I’m not sure whether UniProt catalogs that relationship internally. In any case, you can build a custom mapping file for the regroup_table script by listing each ARDB family followed by a tab-delimited list of its UniRefs, one family per line, as in:
ARDB1<tab>UniRef1<tab>UniRef2<tab>UniRef3
ARDB2<tab>UniRef4<tab>UniRef5
ARDB3<tab>UniRef6<tab>UniRef7<tab>UniRef8
...
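If you already have the group-to-UniRef relationships in memory, writing that layout is a few lines of Python. This is only a minimal sketch: `write_mapping` is a hypothetical helper name, and the IDs are the placeholders from the example above rather than real accessions.

```python
import csv
import io


def write_mapping(groups, handle):
    """Write one line per group: the group ID, then its UniRef IDs, tab-separated."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for group_id, unirefs in groups.items():
        writer.writerow([group_id, *unirefs])


# Placeholder IDs copied from the example layout, not real ARDB/UniRef accessions.
groups = {
    "ARDB1": ["UniRef1", "UniRef2", "UniRef3"],
    "ARDB2": ["UniRef4", "UniRef5"],
}

# An in-memory buffer keeps the sketch self-contained; in practice you would
# pass an open file handle instead.
buffer = io.StringIO()
write_mapping(groups, buffer)
mapping_text = buffer.getvalue()
print(mapping_text, end="")
```

As far as I know, the resulting file is handed to the regroup script through its custom-mapping option (`--custom`), but please confirm against the script's `--help` output for your HUMAnN version.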
You would need to find the ARDB-UniRef relationships in either the UniProt or ARDB database, or determine them by homology (e.g. by aligning ARDB proteins to the HUMAnN UniRef database with DIAMOND).
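For the homology route, here is a rough sketch of turning DIAMOND hits into those groups. It assumes ARDB proteins were the queries, UniRef sequences were the subjects, and the output is DIAMOND's default tabular format (query ID, subject ID, percent identity in the first three columns). The function name `group_hits`, the 90% identity cutoff, the stripping of anything after a `|` in the subject ID, and the sample hits are all assumptions for illustration, not a recommended protocol.

```python
from collections import defaultdict


def group_hits(lines, min_identity=90.0):
    """Collect UniRef subjects per ARDB query from tabular alignment lines."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        query, subject, identity = fields[0], fields[1], float(fields[2])
        if identity < min_identity:
            continue  # assumed cutoff; tune it for your own data
        # Assumption: drop any annotation appended to the subject ID after "|".
        uniref = subject.split("|")[0]
        if uniref not in groups[query]:
            groups[query].append(uniref)
    return dict(groups)


# Made-up hits for illustration only (query, subject, % identity, length).
sample_hits = [
    "ARDB1\tUniRef90_A\t98.5\t200",
    "ARDB1\tUniRef90_B|250\t91.0\t180",
    "ARDB1\tUniRef90_C\t60.0\t150",
    "ARDB2\tUniRef90_A\t99.0\t210",
    "ARDB1\tUniRef90_A\t97.0\t190",
]

groups = group_hits(sample_hits)
for ardb_id, unirefs in groups.items():
    print("\t".join([ardb_id, *unirefs]))
```

Here the query IDs are used directly as group names. If your ARDB sequences are individual proteins rather than families, you would need an extra protein-to-family lookup before writing the mapping file.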
@wfgui, did you have any luck here? I’m working on something similar.