Help with understanding identifier mappings for HUMAnN?

I’m looking for the identifier mapping tables used in the backend for HUMAnN.

Specifically, I’m after the following mappings, if they are available:

UniRef50 → EC
UniRef50 (or EC) → KEGG KO
EC → MetaCyc pathway
UniRef50 (or EC, or KO) → KEGG pathway

I found some files here: site-packages/humann/data/pathways/
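Here is a minimal sketch for loading these files into Python dictionaries. It assumes each line is tab-delimited, with a pathway or reaction ID in the first field followed by its associated IDs. That layout is an assumption, not something taken from HUMAnN’s documentation, so check a few lines first (e.g. `bzcat metacyc_reactions_level4ec_only.uniref.bz2 | head`). The helpers `open_text`, `read_tab_mapping` and `split_reaction_values` are hypothetical names, and telling ECs apart from UniRef IDs by pattern is a heuristic.

```python
import bz2
import re
from collections import defaultdict

# Assumption: a complete (level-4) EC number looks like "2.5.1.27";
# partial ECs such as "2.5.1.-" deliberately do not match.
EC_PATTERN = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def open_text(path):
    """Open a plain or bz2-compressed file as text."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "rt")


def read_tab_mapping(lines):
    """Parse lines of the assumed form KEY<TAB>VALUE<TAB>VALUE... into {KEY: {VALUE, ...}}."""
    mapping = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, *values = line.rstrip("\r\n").split("\t")
        mapping[key].update(v for v in values if v)  # keys with no values get an empty set
    return dict(mapping)


def split_reaction_values(values):
    """Separate EC-like values from UniRef-like values (heuristic, see EC_PATTERN)."""
    ecs = {v for v in values if EC_PATTERN.match(v)}
    unirefs = {v for v in values if v.startswith("UniRef")}
    return ecs, unirefs
```

If the assumed layout holds, reading `metacyc_pathways` inside a `with open_text(...) as fh:` block and passing `fh` to `read_tab_mapping` gives a `pwy_to_rxns`-style dictionary. Applying `split_reaction_values` to each entry of the reactions file gives the EC half of a `rxn_to_ec` mapping.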

I have a few questions:

  • Is it expected for one UniRef50 ID to map to more than one EC in some cases? For example:
UniRef50_G6EMD2        {5.4.2.6, 2.7.1.41}
UniRef50_Q1J7L4        {5.4.2.6, 2.7.1.41}
UniRef50_T0UKK6        {5.4.2.6, 2.7.1.41}
UniRef50_X5NX36        {5.4.2.6, 2.7.1.41}

How is this handled in the backend? Does a UniRef50 hit for these count towards both ECs or only one?

  • I got the pathway-to-reaction (rxn) mappings from data/pathways/metacyc_pathways. Many of these reactions do not have ECs and do not appear in data/pathways/metacyc_reactions_level4ec_only.uniref.bz2. For example:
list(pwy_to_rxns["PWY-2681"])
# ['RXN-4308',
#  'RXN-4305',
#  'RXN-4317',
#  'RXN-4310',
#  'RXN-4306',
#  'RXN-4314',
#  'RXN-4303',
#  'RXN-4307',
#  'RXN-4313',
#  'RXN-4312',
#  'RXN-4304']

pd.Series(rxn_to_ec)[list(pwy_to_rxns["PWY-2681"])]
# RXN-4308             {}
# RXN-4305    {2.5.1.112}
# RXN-4317             {}
# RXN-4310             {}
# RXN-4306             {}
# RXN-4314             {}
# RXN-4303    {2.5.1.112}
# RXN-4307     {2.5.1.27}
# RXN-4313             {}
# RXN-4312             {}
# RXN-4304             {}

I ran the following to confirm:

grep "RXN-4308" metacyc_reactions_level4ec_only.uniref

Should some of these reactions have ECs associated with them, given that they are part of the pathway?

The complete mapping that HUMAnN understands between UniRef90/50s and other features (e.g. ECs, KOs, Pfams, GO terms) is bundled in the “utility mapping database.” In general, a UniRef90/50 can belong to more than one term within a particular functional annotation and “count for” each of them. In the case of ECs, this may be because one gene family can catalyze two different reactions.
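To make “count for” concrete, here is a toy sketch (not HUMAnN’s code) of regrouping gene-family abundances into ECs. It assumes each UniRef50 adds its full abundance to every EC it maps to, rather than splitting it. The `regroup` function, the `UNGROUPED` bucket, `UniRef50_UNKNOWN` and the abundance values are illustrative assumptions. The two UniRef50 → EC mappings come from the question.

```python
from collections import defaultdict


def regroup(abundances, uniref_to_ecs):
    """Sum gene-family abundances into EC groups.

    Each gene family contributes its full abundance to every EC it maps to
    (no splitting); families with no EC mapping go to "UNGROUPED".
    """
    totals = defaultdict(float)
    for uniref, value in abundances.items():
        ecs = uniref_to_ecs.get(uniref)
        if not ecs:
            totals["UNGROUPED"] += value
            continue
        for ec in ecs:
            totals[ec] += value
    return dict(totals)


# Mappings taken from the question; abundances are made-up numbers.
uniref_to_ecs = {
    "UniRef50_G6EMD2": {"5.4.2.6", "2.7.1.41"},
    "UniRef50_Q1J7L4": {"5.4.2.6", "2.7.1.41"},
}
abundances = {"UniRef50_G6EMD2": 10.0, "UniRef50_Q1J7L4": 5.0, "UniRef50_UNKNOWN": 2.0}
print(regroup(abundances, uniref_to_ecs))
```

With these numbers, both 5.4.2.6 and 2.7.1.41 receive 15.0, so the EC totals (32.0) exceed the gene-family input (17.0). That overlap is the expected consequence of a gene family counting for every term it maps to.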

HUMAnN quantifies MetaCyc pathways by using ECs as a link between UniRef90/50s and MetaCyc reactions. Not all MetaCyc reactions have an associated EC, and therefore some are not quantifiable by HUMAnN.
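The EC bridge can be sketched as follows: invert a UniRef → EC mapping, then look up each reaction’s ECs to find the gene families that could be credited to it. This is an illustrative sketch, not HUMAnN’s implementation. `link_reactions_to_unirefs` and both `UniRef50_HYPOTHETICAL_*` IDs are made up. The reaction-to-EC sets are the PWY-2681 values from the question.

```python
from collections import defaultdict


def link_reactions_to_unirefs(rxn_to_ec, uniref_to_ecs):
    """Map each reaction to the gene families sharing at least one of its ECs.

    Reactions with no EC end up with an empty set: nothing can be linked to them.
    """
    ec_to_unirefs = defaultdict(set)
    for uniref, ecs in uniref_to_ecs.items():
        for ec in ecs:
            ec_to_unirefs[ec].add(uniref)
    return {
        rxn: set().union(*(ec_to_unirefs.get(ec, set()) for ec in ecs))
        for rxn, ecs in rxn_to_ec.items()
    }


# Reaction-to-EC sets for PWY-2681, copied from the output in the question.
rxn_to_ec = {
    "RXN-4308": set(), "RXN-4305": {"2.5.1.112"}, "RXN-4317": set(),
    "RXN-4310": set(), "RXN-4306": set(), "RXN-4314": set(),
    "RXN-4303": {"2.5.1.112"}, "RXN-4307": {"2.5.1.27"}, "RXN-4313": set(),
    "RXN-4312": set(), "RXN-4304": set(),
}
# Hypothetical gene families, for illustration only.
uniref_to_ecs = {
    "UniRef50_HYPOTHETICAL_A": {"2.5.1.112"},
    "UniRef50_HYPOTHETICAL_B": {"2.5.1.27"},
}

linked = link_reactions_to_unirefs(rxn_to_ec, uniref_to_ecs)
quantifiable = sorted(rxn for rxn, unirefs in linked.items() if unirefs)
print(f"{len(quantifiable)} of {len(linked)} reactions are linked: {quantifiable}")
# → 3 of 11 reactions are linked: ['RXN-4303', 'RXN-4305', 'RXN-4307']
```

The 8 reactions with an empty EC set can never be linked to a gene family, however many UniRef50 hits a sample has. How HUMAnN then scores the pathway from the linked reactions is not shown here.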