I’m looking for the identifier mapping tables used in the backend for HUMAnN.
More specifically, the following if they are available:
UniRef50 → EC
UniRef50 (or EC) → KEGG KO
EC → MetaCyc pathway
UniRef50 (or EC or KO) → KEGG pathway
I found some files here: site-packages/humann/data/pathways/
I have a few questions:
- Is it expected for one UniRef50 ID to map to more than one EC in some cases?
UniRef50_G6EMD2 {5.4.2.6, 2.7.1.41}
UniRef50_Q1J7L4 {5.4.2.6, 2.7.1.41}
UniRef50_T0UKK6 {5.4.2.6, 2.7.1.41}
UniRef50_X5NX36 {5.4.2.6, 2.7.1.41}
How is this handled in the backend? Does a UniRef50 hit for these count towards both or only one?
- I got ID mappings between pathways and reactions from data/pathways/metacyc_pathways. Many of these reactions do not have ECs and are not present in data/pathways/metacyc_reactions_level4ec_only.uniref.bz2. For example:
list(pwy_to_rxns["PWY-2681"])
# ['RXN-4308',
# 'RXN-4305',
# 'RXN-4317',
# 'RXN-4310',
# 'RXN-4306',
# 'RXN-4314',
# 'RXN-4303',
# 'RXN-4307',
# 'RXN-4313',
# 'RXN-4312',
# 'RXN-4304']
pd.Series(rxn_to_ec)[list(pwy_to_rxns["PWY-2681"])]
# RXN-4308 {}
# RXN-4305 {2.5.1.112}
# RXN-4317 {}
# RXN-4310 {}
# RXN-4306 {}
# RXN-4314 {}
# RXN-4303 {2.5.1.112}
# RXN-4307 {2.5.1.27}
# RXN-4313 {}
# RXN-4312 {}
# RXN-4304 {}
I used the following to confirm:
grep "RXN-4308" metacyc_reactions_level4ec_only.uniref
Are these reactions supposed to have ECs associated with them, given that they are part of the pathway?
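For reference, both checks above can be reproduced with a minimal sketch like the one below. It hard-codes the values shown in this post rather than reading the HUMAnN data files. The dict name `uniref_to_ec` and the two helper functions are illustrative only and are not part of HUMAnN.

```python
# Values copied from this post; nothing here is read from the HUMAnN data files.
uniref_to_ec = {  # illustrative name, not a HUMAnN object
    "UniRef50_G6EMD2": {"5.4.2.6", "2.7.1.41"},
    "UniRef50_Q1J7L4": {"5.4.2.6", "2.7.1.41"},
    "UniRef50_T0UKK6": {"5.4.2.6", "2.7.1.41"},
    "UniRef50_X5NX36": {"5.4.2.6", "2.7.1.41"},
}

pwy_to_rxns = {
    "PWY-2681": [
        "RXN-4308", "RXN-4305", "RXN-4317", "RXN-4310", "RXN-4306", "RXN-4314",
        "RXN-4303", "RXN-4307", "RXN-4313", "RXN-4312", "RXN-4304",
    ],
}

rxn_to_ec = {
    "RXN-4308": set(),
    "RXN-4305": {"2.5.1.112"},
    "RXN-4317": set(),
    "RXN-4310": set(),
    "RXN-4306": set(),
    "RXN-4314": set(),
    "RXN-4303": {"2.5.1.112"},
    "RXN-4307": {"2.5.1.27"},
    "RXN-4313": set(),
    "RXN-4312": set(),
    "RXN-4304": set(),
}


def multi_ec_ids(mapping):
    """Return the IDs that map to more than one EC number, sorted."""
    return sorted(key for key, ecs in mapping.items() if len(ecs) > 1)


def reactions_without_ec(pathway, pwy_to_rxns, rxn_to_ec):
    """Return the reactions of a pathway that have no EC annotation."""
    return [rxn for rxn in pwy_to_rxns[pathway] if not rxn_to_ec.get(rxn)]


multi = multi_ec_ids(uniref_to_ec)
missing = reactions_without_ec("PWY-2681", pwy_to_rxns, rxn_to_ec)

print(f"{len(multi)} UniRef50 IDs map to more than one EC")
print(f"{len(missing)} of {len(pwy_to_rxns['PWY-2681'])} reactions in PWY-2681 have no EC")
```

With the values above, 8 of the 11 reactions in PWY-2681 have no EC, and all four UniRef50 IDs map to two ECs.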