How to convert MetaCyc pathways (PWY) into UniRef90 identifiers

Hello,

I would like to retrieve the UniRef90 identifiers that belong to each MetaCyc pathway (PWY). However, I'm not sure how to do this.

From reading the HUMAnN code, I see that I need to use these three files:

  1. metacyc_reactions_level4ec_only.uniref.bz2
  2. metacyc_pathways_structured_filtered_v24_subreactions
  3. map_metacyc-pwy_name.txt

Excerpt from metacyc_reactions_level4ec_only.uniref.bz2:

4.2.3.25-RXN    4.2.3.25    UniRef50_A0A2G9FWX1    UniRef50_A0A2G9FXZ9    UniRef50_A0A2G9GJV9    UniRef50_A0A2I0ADH9    UniRef50_A0A2P6QBR7    UniRef50_D8RNZ9    UniRef50_Q29VN2    UniRef50_Q6ZH94    UniRef50_Q84UV0    UniRef50_Q96376    UniRef90_A0A2G9FWX1    UniRef90_A0A2G9FXZ9    UniRef90_A0A2G9G3G9    UniRef90_A0A2G9GJV9    UniRef90_A0A2I0ADH9    UniRef90_A0A2P6QBR7    UniRef90_D8RNZ9    UniRef90_G0R176    UniRef90_G7IK35    UniRef90_Q29VN2    UniRef90_Q6ZH94    UniRef90_Q84UV0    UniRef90_Q96376    UniRef90_V5JYG2    UniRef90_V5JZ68
RXN-19002    2.4.99.6    UniRef50_P72097    UniRef50_Q48211    UniRef50_Q9YN04    UniRef90_P72097    UniRef90_Q11203    UniRef90_Q48211    UniRef90_Q91Y74    UniRef90_Q9YJT5    UniRef90_Q9YN04
RXN-9502    1.14.14.117    UniRef50_O13345    UniRef90_O13345
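As a starting point, here is a minimal Python sketch that reads this file into a reaction-to-UniRef90 lookup. It assumes whitespace-separated columns (reaction ID, EC number, then a mix of UniRef50 and UniRef90 IDs) as in the lines above; the function names are my own and not part of HUMAnN.

```python
import bz2


def parse_reaction_line(line):
    """Split one line into (reaction ID, EC number, set of UniRef90 IDs).

    Assumes whitespace-separated columns: reaction ID, EC number, then
    UniRef50/UniRef90 IDs. UniRef50 IDs are dropped.
    """
    fields = line.split()
    rxn, ec, ids = fields[0], fields[1], fields[2:]
    uniref90 = {i for i in ids if i.startswith("UniRef90_")}
    return rxn, ec, uniref90


def reactions_to_uniref90(lines):
    """Build a reaction ID -> set of UniRef90 IDs mapping from text lines."""
    mapping = {}
    for line in lines:
        if len(line.split()) < 2:  # skip blank or malformed lines
            continue
        rxn, _ec, uniref90 = parse_reaction_line(line)
        mapping.setdefault(rxn, set()).update(uniref90)
    return mapping


def load_reactions(path):
    """Open the bz2-compressed file in text mode and build the mapping."""
    with bz2.open(path, "rt") as handle:
        return reactions_to_uniref90(handle)
```

Usage would be something like `rxn_to_uniref90 = load_reactions("metacyc_reactions_level4ec_only.uniref.bz2")`. Keeping the parsing separate from the file opening makes it easy to try on a few pasted lines first.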

Excerpt from metacyc_pathways_structured_filtered_v24_subreactions:

PWY-7426    2.4.1.101-RXN 3.2.1.114-RXN 2.4.1.143-RXN ( 2.4.1.68-RXN , ( 2.4.1.145-RXN 2.4.1.155-RXN RXN-19001 RXN-19002 RXN-19003 ) , 2.4.1.144-RXN ) 
PWY-7831    2.4.1.38-RXN ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN , 2.4.99.6-RXN , ( RXN-18235 ( ( RXN-18245 RXN-18251 ) , ( RXN-18243 RXN-18250 ) , RXN-18249 ) ) ) RXN-18254 -RXN-18262 
PWY-7833    ( RXN-18259 , ( 2.4.1.38-RXN ( ( 2.4.99.6-RXN RXN-18254 ) , ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN RXN-18263 ) ) ) )  ( RXN-18234 , ( 2.4.1.65-RXN RXN-18264 ) ) 
PWY-7434    ( RXN-15271 ,  2.4.1.38-RXN 2.4.1.149-RXN RXN-15276 2.4.1.150-RXN RXN-15278  , 2.4.1.151-RXN , 2.4.99.6-RXN ) 
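My reading of this format (an assumption, I haven't found it documented): spaces separate consecutive reactions, commas inside parentheses separate alternatives, and a leading "-" as in -RXN-18262 seems to mark an optional reaction. To list every reaction a pathway mentions, the grouping can simply be ignored. A minimal sketch, assuming reaction IDs never contain "(", ")" or ",":

```python
def pathway_reactions(definition):
    """Return the set of reaction IDs mentioned in one pathway definition.

    Parentheses and commas only describe how reactions combine, so they are
    discarded. A leading '-' (assumed to flag an optional reaction) is stripped.
    """
    for symbol in "(),":
        definition = definition.replace(symbol, " ")
    return {token.lstrip("-") for token in definition.split()}


def load_pathways(lines):
    """Build a pathway ID -> set of reaction IDs mapping from text lines."""
    pathways = {}
    for line in lines:
        parts = line.split(None, 1)  # pathway ID, then the rest of the line
        if len(parts) == 2:
            pathways[parts[0]] = pathway_reactions(parts[1])
    return pathways
```

Flattening loses the AND/OR structure, which is fine for "which genes could be involved" but not for deciding whether a pathway is complete.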

I think I need to join these two files on the reaction IDs to get the PWY identifiers, and then look up each pathway's name in map_metacyc-pwy_name.txt. Is that right?
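In code, the join I have in mind would look roughly like this. It assumes the two files are already loaded into dicts (reaction ID to a set of UniRef90 IDs, and pathway ID to a set of reaction IDs), and that map_metacyc-pwy_name.txt has one pathway ID and one name per line, separated by a tab; I haven't verified that last format.

```python
def pathway_to_uniref90(pathway_to_reactions, rxn_to_uniref90):
    """Union the UniRef90 IDs of every reaction listed in each pathway.

    Reactions with no UniRef90 entry simply contribute nothing.
    """
    result = {}
    for pwy, reactions in pathway_to_reactions.items():
        genes = set()
        for rxn in reactions:
            genes |= rxn_to_uniref90.get(rxn, set())
        result[pwy] = genes
    return result


def load_pathway_names(lines):
    """Parse 'PWY-ID<TAB>name' lines (assumed format) into a dict."""
    names = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t", 1)
        if len(fields) == 2:
            names[fields[0]] = fields[1]
    return names
```

The name lookup is kept separate from the join, so a pathway missing from the name file still keeps its UniRef90 set.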

I also wonder how HUMAnN handles the first two columns of metacyc_reactions_level4ec_only.uniref.bz2. Take this line:

RXN-19002    2.4.99.6

The two columns point to different pathways in metacyc_pathways_structured_filtered_v24_subreactions.

RXN-19002 appears in:
PWY-7426    2.4.1.101-RXN 3.2.1.114-RXN 2.4.1.143-RXN ( 2.4.1.68-RXN , ( 2.4.1.145-RXN 2.4.1.155-RXN RXN-19001 RXN-19002 RXN-19003 ) , 2.4.1.144-RXN )

whereas 2.4.99.6 (written 2.4.99.6-RXN in the pathway file) appears in:
PWY-7831    2.4.1.38-RXN ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN , 2.4.99.6-RXN , ( RXN-18235 ( ( RXN-18245 RXN-18251 ) , ( RXN-18243 RXN-18250 ) , RXN-18249 ) ) ) RXN-18254 -RXN-18262 
PWY-7833    ( RXN-18259 , ( 2.4.1.38-RXN ( ( 2.4.99.6-RXN RXN-18254 ) , ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN RXN-18263 ) ) ) )  ( RXN-18234 , ( 2.4.1.65-RXN RXN-18264 ) ) 
PWY-7434    ( RXN-15271 ,  2.4.1.38-RXN 2.4.1.149-RXN RXN-15276 2.4.1.150-RXN RXN-15278  , 2.4.1.151-RXN , 2.4.99.6-RXN )

I have noticed that this discrepancy does not necessarily occur on other lines.
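To see how widespread this is, one could flag the rows where looking up the reaction ID and looking up an EC-derived name (EC number + "-RXN", a pattern I'm only inferring from names like 2.4.99.6-RXN) lead to different sets of pathways. This is a diagnostic sketch, not a description of what HUMAnN actually does:

```python
def pathways_containing(pathway_to_reactions, reaction):
    """Return the pathway IDs whose definition mentions the given reaction."""
    return {pwy for pwy, rxns in pathway_to_reactions.items() if reaction in rxns}


def find_mismatches(reaction_rows, pathway_to_reactions):
    """Flag rows whose reaction ID and EC-derived name hit different pathways.

    reaction_rows: iterable of (reaction ID, EC number) pairs, i.e. the first
    two columns of the reactions file. The '<EC>-RXN' naming is an inferred
    pattern, not a documented rule.
    """
    mismatches = {}
    for rxn, ec in reaction_rows:
        by_id = pathways_containing(pathway_to_reactions, rxn)
        by_ec = pathways_containing(pathway_to_reactions, ec + "-RXN")
        if by_id != by_ec:
            mismatches[rxn] = (by_id, by_ec)
    return mismatches
```

Rows such as 4.2.3.25-RXN, whose ID already equals the EC-derived name, can never be flagged; rows like RXN-9502, whose EC-derived name appears in no pathway, will be flagged whenever the ID itself is used somewhere, so the output needs some manual reading.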

Lastly, there is also a file named map_level4ec_uniref90.txt.gz that contains information similar to metacyc_reactions_level4ec_only.uniref.bz2, but maps EC numbers directly to UniRef90 IDs:

1.1.1.112    UniRef90_A0A084G895    UniRef90_Q04828
1.1.1.114    UniRef90_A2Q9G9
1.1.1.116    UniRef90_A0A0F8BWA0    UniRef90_A3LTU8    UniRef90_F0STM9    UniRef90_K0IVQ5    UniRef90_Q04212    UniRef90_W1QGZ4

I'm not sure whether this file is better suited to my goal or just equivalent.
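One way to check this empirically is to build an EC-to-UniRef90 map from each file and list the EC numbers where they disagree. A minimal sketch, assuming both files are whitespace-separated as shown above and that the lines are read in text mode (for example with gzip.open(path, "rt") and bz2.open(path, "rt")):

```python
def ec_map_from_level4ec(lines):
    """map_level4ec_uniref90 lines: EC number followed by UniRef90 IDs."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if fields:
            mapping.setdefault(fields[0], set()).update(fields[1:])
    return mapping


def ec_map_from_reactions(lines):
    """Reaction-file lines: reaction ID, EC number, UniRef50/UniRef90 IDs."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            uniref90 = {i for i in fields[2:] if i.startswith("UniRef90_")}
            mapping.setdefault(fields[1], set()).update(uniref90)
    return mapping


def compare_ec_maps(a, b):
    """Per EC number, return (IDs only in a, IDs only in b) where they differ."""
    report = {}
    for ec in sorted(set(a) | set(b)):
        only_a = a.get(ec, set()) - b.get(ec, set())
        only_b = b.get(ec, set()) - a.get(ec, set())
        if only_a or only_b:
            report[ec] = (only_a, only_b)
    return report
```

An empty report would suggest the two files carry the same EC-level information; a large one would mean they are built differently, and the reaction-level file is the one that lines up with the pathway definitions.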

Thank you, and have a great day!

Best regards,
Jérémy Tournayre

There is a utility called humann_unpack_pathways that I think will accomplish what you want. It is not widely used, but if you search for its name you will find other posts here discussing it. Otherwise, getting from pathways to genes is a two-step process: 1) load the mapping from RXNs to genes (UniRef90s), and then 2) transitively map genes to pathways based on the RXNs that appear in the pathway definitions. This would tell you broadly which UniRef90s might have contributed to which pathways.
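The two-step approach can be sketched in Python as follows. This is not humann_unpack_pathways itself, only an illustration of the transitive mapping, assuming the RXN-to-UniRef90 and pathway-to-RXN mappings have already been loaded into dicts of sets:

```python
from collections import defaultdict


def uniref90_to_pathways(rxn_to_uniref90, pathway_to_reactions):
    """Combine RXN -> genes and pathway -> RXNs into genes -> pathways.

    A UniRef90 is linked to a pathway whenever any RXN it is annotated to
    appears in that pathway's definition, so this gives the broad "might have
    contributed" picture, not which alternative branch was actually used.
    """
    gene_to_pathways = defaultdict(set)
    for pwy, reactions in pathway_to_reactions.items():
        for rxn in reactions:
            for gene in rxn_to_uniref90.get(rxn, ()):
                gene_to_pathways[gene].add(pwy)
    return dict(gene_to_pathways)
```

Because one UniRef90 can be annotated to several RXNs, and one RXN can appear in several pathways, a single gene will often map to more than one pathway.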