Hello,
I would like to retrieve the UniRef90 identifiers that belong to the MetaCyc (PWY) pathways, but I’m unsure how to do this.
From reading the HUMAnN code, I see that I need to use these three files:
metacyc_reactions_level4ec_only.uniref.bz2
metacyc_pathways_structured_filtered_v24_subreactions
map_metacyc-pwy_name.txt
Content of metacyc_reactions_level4ec_only.uniref.bz2:
4.2.3.25-RXN 4.2.3.25 UniRef50_A0A2G9FWX1 UniRef50_A0A2G9FXZ9 UniRef50_A0A2G9GJV9 UniRef50_A0A2I0ADH9 UniRef50_A0A2P6QBR7 UniRef50_D8RNZ9 UniRef50_Q29VN2 UniRef50_Q6ZH94 UniRef50_Q84UV0 UniRef50_Q96376 UniRef90_A0A2G9FWX1 UniRef90_A0A2G9FXZ9 UniRef90_A0A2G9G3G9 UniRef90_A0A2G9GJV9 UniRef90_A0A2I0ADH9 UniRef90_A0A2P6QBR7 UniRef90_D8RNZ9 UniRef90_G0R176 UniRef90_G7IK35 UniRef90_Q29VN2 UniRef90_Q6ZH94 UniRef90_Q84UV0 UniRef90_Q96376 UniRef90_V5JYG2 UniRef90_V5JZ68
RXN-19002 2.4.99.6 UniRef50_P72097 UniRef50_Q48211 UniRef50_Q9YN04 UniRef90_P72097 UniRef90_Q11203 UniRef90_Q48211 UniRef90_Q91Y74 UniRef90_Q9YJT5 UniRef90_Q9YN04
RXN-9502 1.14.14.117 UniRef50_O13345 UniRef90_O13345
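For reference, here is a minimal sketch of how I read this file. I am assuming each line is whitespace-separated, with the reaction ID first, the EC number second, and the UniRef50/UniRef90 IDs after that. The function name `parse_reactions` is my own and is not part of HUMAnN.

```python
import bz2

# Two lines copied from the file excerpt above.
SAMPLE = """\
RXN-19002 2.4.99.6 UniRef50_P72097 UniRef50_Q48211 UniRef50_Q9YN04 UniRef90_P72097 UniRef90_Q11203 UniRef90_Q48211 UniRef90_Q91Y74 UniRef90_Q9YJT5 UniRef90_Q9YN04
RXN-9502 1.14.14.117 UniRef50_O13345 UniRef90_O13345
"""


def parse_reactions(lines):
    """Map reaction ID -> (EC number, list of UniRef90 IDs).

    Assumption: columns are reaction ID, EC number, then UniRef IDs.
    """
    reactions = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rxn_id, ec = fields[0], fields[1]
        # Keep only the UniRef90 IDs, drop the UniRef50 ones.
        uniref90 = [f for f in fields[2:] if f.startswith("UniRef90_")]
        reactions[rxn_id] = (ec, uniref90)
    return reactions


reactions = parse_reactions(SAMPLE.splitlines())
print(reactions["RXN-9502"])

# For the real file (the path is a placeholder):
# with bz2.open("metacyc_reactions_level4ec_only.uniref.bz2", "rt") as handle:
#     reactions = parse_reactions(handle)
```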
Content of metacyc_pathways_structured_filtered_v24_subreactions:
PWY-7426 2.4.1.101-RXN 3.2.1.114-RXN 2.4.1.143-RXN ( 2.4.1.68-RXN , ( 2.4.1.145-RXN 2.4.1.155-RXN RXN-19001 RXN-19002 RXN-19003 ) , 2.4.1.144-RXN )
PWY-7831 2.4.1.38-RXN ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN , 2.4.99.6-RXN , ( RXN-18235 ( ( RXN-18245 RXN-18251 ) , ( RXN-18243 RXN-18250 ) , RXN-18249 ) ) ) RXN-18254 -RXN-18262
PWY-7833 ( RXN-18259 , ( 2.4.1.38-RXN ( ( 2.4.99.6-RXN RXN-18254 ) , ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN RXN-18263 ) ) ) ) ( RXN-18234 , ( 2.4.1.65-RXN RXN-18264 ) )
PWY-7434 ( RXN-15271 , 2.4.1.38-RXN 2.4.1.149-RXN RXN-15276 2.4.1.150-RXN RXN-15278 , 2.4.1.151-RXN , 2.4.99.6-RXN )
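To get a flat list of reactions per pathway, I would ignore the structure and keep only the reaction tokens. This is a sketch under my own assumptions: tokens are whitespace-separated, the tokens "(", ")" and "," only describe the structure, and a leading "-" (as in -RXN-18262) is a flag on the reaction (I assume it marks an optional reaction) rather than part of its ID. `pathway_reactions` is a name I made up.

```python
STRUCTURE_TOKENS = {"(", ")", ","}


def pathway_reactions(line):
    """Return (pathway ID, list of unique reaction IDs) for one structured line."""
    tokens = line.split()
    pathway_id = tokens[0]
    reactions = []
    for token in tokens[1:]:
        if token in STRUCTURE_TOKENS:
            continue  # structure only, not a reaction
        # Assumption: a leading "-" is a flag, not part of the reaction ID.
        rxn = token[1:] if token.startswith("-") else token
        if rxn not in reactions:
            reactions.append(rxn)
    return pathway_id, reactions


line = (
    "PWY-7831 2.4.1.38-RXN ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN , 2.4.99.6-RXN , "
    "( RXN-18235 ( ( RXN-18245 RXN-18251 ) , ( RXN-18243 RXN-18250 ) , RXN-18249 ) ) ) "
    "RXN-18254 -RXN-18262"
)
pwy, rxns = pathway_reactions(line)
print(pwy, rxns)
```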
I think I need to cross-reference these two files to get the pathway (PWY) ID for each UniRef90 identifier, and then look up the pathway name in the map_metacyc-pwy_name.txt file, right?
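Concretely, this is the join I have in mind. It is only a sketch: the two dictionaries are abridged by hand from the excerpts above, the pathway names are placeholders rather than the real MetaCyc names, and I am assuming that reaction IDs are matched exactly between the two files.

```python
# Reaction ID -> UniRef90 IDs (from the reactions file excerpt).
REACTION_TO_UNIREF90 = {
    "RXN-19002": [
        "UniRef90_P72097", "UniRef90_Q11203", "UniRef90_Q48211",
        "UniRef90_Q91Y74", "UniRef90_Q9YJT5", "UniRef90_Q9YN04",
    ],
    "RXN-9502": ["UniRef90_O13345"],
}

# Pathway ID -> reaction IDs (structure flattened, two pathways only).
PATHWAY_TO_REACTIONS = {
    "PWY-7426": [
        "2.4.1.101-RXN", "3.2.1.114-RXN", "2.4.1.143-RXN", "2.4.1.68-RXN",
        "2.4.1.145-RXN", "2.4.1.155-RXN", "RXN-19001", "RXN-19002",
        "RXN-19003", "2.4.1.144-RXN",
    ],
    "PWY-7434": [
        "RXN-15271", "2.4.1.38-RXN", "2.4.1.149-RXN", "RXN-15276",
        "2.4.1.150-RXN", "RXN-15278", "2.4.1.151-RXN", "2.4.99.6-RXN",
    ],
}

# Placeholder names; the real ones would come from map_metacyc-pwy_name.txt.
PATHWAY_NAMES = {"PWY-7426": "placeholder name A", "PWY-7434": "placeholder name B"}


def pathway_uniref90(pathway_to_reactions, reaction_to_uniref90):
    """Map pathway ID -> set of UniRef90 IDs via exact reaction ID matches."""
    result = {}
    for pwy, rxns in pathway_to_reactions.items():
        members = set()
        for rxn in rxns:
            members.update(reaction_to_uniref90.get(rxn, []))
        result[pwy] = members
    return result


crossed = pathway_uniref90(PATHWAY_TO_REACTIONS, REACTION_TO_UNIREF90)
for pwy, members in crossed.items():
    print(pwy, PATHWAY_NAMES.get(pwy, pwy), sorted(members))
```

With exact matching, PWY-7434 ends up with no UniRef90 IDs in this small sample, because 2.4.99.6-RXN is not a key in the reactions dictionary.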
I wonder how HUMAnN handles the first two columns of the metacyc_reactions_level4ec_only.uniref.bz2 file. For example, in the line that starts with RXN-19002 2.4.99.6, the two columns correspond to different pathways in the metacyc_pathways_structured_filtered_v24_subreactions file:
RXN-19002:
PWY-7426 2.4.1.101-RXN 3.2.1.114-RXN 2.4.1.143-RXN ( 2.4.1.68-RXN , ( 2.4.1.145-RXN 2.4.1.155-RXN RXN-19001 RXN-19002 RXN-19003 ) , 2.4.1.144-RXN )
Whereas 2.4.99.6 (written as 2.4.99.6-RXN):
PWY-7831 2.4.1.38-RXN ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN , 2.4.99.6-RXN , ( RXN-18235 ( ( RXN-18245 RXN-18251 ) , ( RXN-18243 RXN-18250 ) , RXN-18249 ) ) ) RXN-18254 -RXN-18262
PWY-7833 ( RXN-18259 , ( 2.4.1.38-RXN ( ( 2.4.99.6-RXN RXN-18254 ) , ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN RXN-18263 ) ) ) ) ( RXN-18234 , ( 2.4.1.65-RXN RXN-18264 ) )
PWY-7434 ( RXN-15271 , 2.4.1.38-RXN 2.4.1.149-RXN RXN-15276 2.4.1.150-RXN RXN-15278 , 2.4.1.151-RXN , 2.4.99.6-RXN )
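To make the discrepancy concrete, this sketch looks up the same row under both of its keys. Treating the EC number plus "-RXN" as a reaction ID is my own guess, and so is stripping the leading "-" from tokens. It does not tell me which key HUMAnN uses; it only shows that the two lookups disagree.

```python
PATHWAY_LINES = """\
PWY-7426 2.4.1.101-RXN 3.2.1.114-RXN 2.4.1.143-RXN ( 2.4.1.68-RXN , ( 2.4.1.145-RXN 2.4.1.155-RXN RXN-19001 RXN-19002 RXN-19003 ) , 2.4.1.144-RXN )
PWY-7831 2.4.1.38-RXN ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN , 2.4.99.6-RXN , ( RXN-18235 ( ( RXN-18245 RXN-18251 ) , ( RXN-18243 RXN-18250 ) , RXN-18249 ) ) ) RXN-18254 -RXN-18262
PWY-7833 ( RXN-18259 , ( 2.4.1.38-RXN ( ( 2.4.99.6-RXN RXN-18254 ) , ( GALACTOSIDE-3-FUCOSYLTRANSFERASE-RXN RXN-18263 ) ) ) ) ( RXN-18234 , ( 2.4.1.65-RXN RXN-18264 ) )
PWY-7434 ( RXN-15271 , 2.4.1.38-RXN 2.4.1.149-RXN RXN-15276 2.4.1.150-RXN RXN-15278 , 2.4.1.151-RXN , 2.4.99.6-RXN )
"""


def load_pathways(text):
    """Map pathway ID -> set of reaction tokens, ignoring "(", ")" and ","."""
    pathways = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Assumption: a leading "-" is a flag, not part of the reaction ID.
        pathways[tokens[0]] = {
            t.lstrip("-") for t in tokens[1:] if t not in {"(", ")", ","}
        }
    return pathways


def pathways_containing(reaction_id, pathways):
    """Return the sorted pathway IDs whose reaction set contains reaction_id."""
    return sorted(p for p, rxns in pathways.items() if reaction_id in rxns)


pathways = load_pathways(PATHWAY_LINES)
rxn_id, ec = "RXN-19002", "2.4.99.6"  # first two columns of the same row

by_id = pathways_containing(rxn_id, pathways)
by_ec = pathways_containing(ec + "-RXN", pathways)
print("by reaction ID:", by_id)
print("by EC number:  ", by_ec)
```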
I have noticed that this mismatch does not necessarily occur on other lines.
Lastly, there is also a file named map_level4ec_uniref90.txt.gz that contains information similar to that in metacyc_reactions_level4ec_only.uniref.bz2:
1.1.1.112 UniRef90_A0A084G895 UniRef90_Q04828
1.1.1.114 UniRef90_A2Q9G9
1.1.1.116 UniRef90_A0A0F8BWA0 UniRef90_A3LTU8 UniRef90_F0STM9 UniRef90_K0IVQ5 UniRef90_Q04212 UniRef90_W1QGZ4
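To see whether the two files agree, I could compare the UniRef90 sets they give for each EC number they share. The sketch below assumes both files are whitespace-separated as in the excerpts; the function names are my own. The sample lines I pasted have no EC number in common, so on them the comparison is empty and it would need to be run on the full files.

```python
EC_MAP_SAMPLE = """\
1.1.1.112 UniRef90_A0A084G895 UniRef90_Q04828
1.1.1.114 UniRef90_A2Q9G9
"""

REACTIONS_SAMPLE = """\
RXN-9502 1.14.14.117 UniRef50_O13345 UniRef90_O13345
"""


def ec_sets_from_map(lines):
    """EC -> UniRef90 set from map_level4ec_uniref90-style lines (EC first)."""
    result = {}
    for line in lines:
        fields = line.split()
        if fields:
            result.setdefault(fields[0], set()).update(fields[1:])
    return result


def ec_sets_from_reactions(lines):
    """EC -> UniRef90 set from reaction lines (reaction ID, EC, UniRef IDs)."""
    result = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ids = {f for f in fields[2:] if f.startswith("UniRef90_")}
            result.setdefault(fields[1], set()).update(ids)
    return result


def compare(a, b):
    """For each EC present in both, return (IDs only in a, IDs only in b)."""
    return {ec: (a[ec] - b[ec], b[ec] - a[ec]) for ec in a.keys() & b.keys()}


ec_map = ec_sets_from_map(EC_MAP_SAMPLE.splitlines())
rxn_map = ec_sets_from_reactions(REACTIONS_SAMPLE.splitlines())
differences = compare(ec_map, rxn_map)
print(differences)

# The real files would be read with gzip.open(path, "rt") and
# bz2.open(path, "rt") respectively (paths are placeholders).
```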
I’m not sure if this file is better or just similar.
Thank you, and have a great day!
Best regards,
Jérémy Tournayre