Hello!
I hope you can help me. We have run HUMAnN2 on our samples and obtained the MetaCyc pathway information and pathway abundances. For another analysis, we would like to check how the reactions of each pathway are distributed across our contigs, e.g. for Pathway1, the first 4 reactions are on contig1 and the last 3 reactions are on contig2. What would be the best way to do this?
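A minimal Python sketch of the contig grouping described above. It assumes you already have a mapping from each reaction to the contig carrying its gene (built from your own assembly annotations, not from HUMAnN2), that each reaction sits on a single contig, and that reactions are taken in the order they are listed for the pathway. `contig_runs` is a hypothetical name used only for illustration.

```python
from itertools import groupby


def contig_runs(pathway_reactions, reaction_to_contig):
    """Group a pathway's reactions, in the order given, into consecutive runs per contig.

    reaction_to_contig is a hypothetical dict {reaction_id: contig_id}.
    Reactions with no contig assignment are grouped under None.
    """
    return [
        (contig, list(group))
        for contig, group in groupby(pathway_reactions, key=reaction_to_contig.get)
    ]


if __name__ == "__main__":
    reactions = ["R1", "R2", "R3", "R4", "R5", "R6", "R7"]
    mapping = {r: "contig1" for r in reactions[:4]}
    mapping.update({r: "contig2" for r in reactions[4:]})
    for contig, rxns in contig_runs(reactions, mapping):
        print(contig, rxns)
    # contig1 ['R1', 'R2', 'R3', 'R4']
    # contig2 ['R5', 'R6', 'R7']
```

Because `groupby` only merges neighbouring items, a contig that reappears later in the pathway starts a new run, which matches the "first 4 on contig1, last 3 on contig2" style of summary.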
My idea was to parse the metacyc_pathways_structured_filtered file the same way HUMAnN2 does, so that the two analyses stay consistent. Basically, I would like to know: given the metacyc_pathways_structured_filtered file, how does HUMAnN2 decide which combinations of reactions (and hence genes) constitute a complete pathway?
Examples:
METHGLYUT-PWY 1.1.1.283-RXN LACTALDDEHYDROG-RXN -L-LACTDEHYDROGFMN-RXN -RXN0-4281 -RXN-8632 GLYOXIII-RXN GLYOXI-RXN GLYOXII-RXN -DLACTDEHYDROGFAD-RXN
According to the code of the PathwaysDatabase class in store.py, this would be an unstructured pathway, right? So, for unstructured pathways, does a pathway count as complete if all of its non-optional reactions are present?
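Under the interpretation asked about here (a leading '-' marks an optional reaction, and an unstructured pathway is complete when every non-optional reaction is present), the check could look like this minimal sketch. This reading is not confirmed against store.py, and `parse_unstructured` / `is_complete_unstructured` are hypothetical names.

```python
def parse_unstructured(line):
    """Split an unstructured pathway line into (pathway_id, required, optional).

    Assumption: tokens are whitespace-separated, the first token is the
    pathway ID, and a leading '-' marks an optional reaction.
    """
    pathway_id, *reactions = line.split()
    required = [r for r in reactions if not r.startswith("-")]
    optional = [r[1:] for r in reactions if r.startswith("-")]  # strip the '-' marker
    return pathway_id, required, optional


def is_complete_unstructured(required, present):
    """Hypothetical completeness rule: every non-optional reaction is present."""
    return all(r in present for r in required)


if __name__ == "__main__":
    line = ("METHGLYUT-PWY 1.1.1.283-RXN LACTALDDEHYDROG-RXN -L-LACTDEHYDROGFMN-RXN "
            "-RXN0-4281 -RXN-8632 GLYOXIII-RXN GLYOXI-RXN GLYOXII-RXN -DLACTDEHYDROGFAD-RXN")
    pathway_id, required, optional = parse_unstructured(line)
    print(pathway_id, required, optional)
    # All required reactions present, none of the optional ones:
    print(is_complete_unstructured(required, set(required)))  # True
    # One required reaction missing:
    print(is_complete_unstructured(required, set(required) - {"GLYOXI-RXN"}))  # False
```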
PWY-7431 ( -SYNEPHRINE-DEHYDRATASE-RXN , -RXN-15198 , -OCTOPAMINE-DEHYDRATASE-RXN , RXN-5821 ) -1.2.1.53-RXN RXN-8505 RXN6666-4 RXN6666-5
PWY-7433 2.4.1.41-RXN 2.4.1.122-RXN ( 2.4.99.4-RXN , ( 2.4.1.102-RXN 2.4.1.146-RXN ) )
For the structured pathways, I am a little more confused. I understand how the boolean operators are used; I am just wondering how the optional reactions play into this. For PWY-7433, I would think that the pathway counts as complete if either 2.4.1.41-RXN + 2.4.1.122-RXN + 2.4.99.4-RXN are present, or 2.4.1.41-RXN + 2.4.1.122-RXN + 2.4.1.102-RXN + 2.4.1.146-RXN are present. Is that correct? But then, what does completeness mean when reactions inside the parentheses are marked as optional, as in PWY-7431, where three of the four alternatives carry the '-' prefix?
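A minimal Python sketch of this reading of the structured format, assuming: tokens are space-separated as in the examples, a space means AND, a comma means OR, parentheses group, AND binds tighter than OR, and a leading '-' marks an optional reaction. None of this is confirmed against store.py, and all function names are hypothetical. The `optional_satisfied` / `skip_optional` flags switch between treating optional reactions as always satisfied and treating them as ordinary requirements.

```python
from itertools import product


def parse_structure(tokens):
    """Parse tokens into ("AND", [...]), ("OR", [...]) and ("RXN", name, is_optional) nodes."""
    pos = 0

    def parse_or():
        nonlocal pos
        branches = [parse_and()]
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            branches.append(parse_and())
        return branches[0] if len(branches) == 1 else ("OR", branches)

    def parse_and():
        nonlocal pos
        items = []
        while pos < len(tokens) and tokens[pos] not in (",", ")"):
            tok = tokens[pos]
            pos += 1
            if tok == "(":
                items.append(parse_or())
                if pos >= len(tokens) or tokens[pos] != ")":
                    raise ValueError("unbalanced parentheses")
                pos += 1
            elif tok.startswith("-"):
                items.append(("RXN", tok[1:], True))   # optional reaction
            else:
                items.append(("RXN", tok, False))
        if not items:
            raise ValueError("empty group in pathway structure")
        return items[0] if len(items) == 1 else ("AND", items)

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


def parse_pathway_line(line):
    """Split 'PATHWAY-ID <structure>' into (pathway_id, tree)."""
    pathway_id, *tokens = line.split()
    return pathway_id, parse_structure(tokens)


def is_satisfied(node, present, optional_satisfied=True):
    """Evaluate the tree against a set of detected reaction IDs."""
    if node[0] == "RXN":
        _, name, optional = node
        return (optional and optional_satisfied) or name in present
    results = (is_satisfied(c, present, optional_satisfied) for c in node[1])
    return all(results) if node[0] == "AND" else any(results)


def alternatives(node, skip_optional=True):
    """Expand the tree into the distinct reaction sets that satisfy it."""
    if node[0] == "RXN":
        _, name, optional = node
        return [frozenset()] if (optional and skip_optional) else [frozenset([name])]
    child_alts = [alternatives(c, skip_optional) for c in node[1]]
    if node[0] == "OR":
        combos = [alt for alts in child_alts for alt in alts]
    else:
        combos = [frozenset().union(*combo) for combo in product(*child_alts)]
    return list(dict.fromkeys(combos))  # drop duplicates, keep order


if __name__ == "__main__":
    pid, tree = parse_pathway_line(
        "PWY-7433 2.4.1.41-RXN 2.4.1.122-RXN ( 2.4.99.4-RXN , ( 2.4.1.102-RXN 2.4.1.146-RXN ) )")
    for alt in alternatives(tree):
        print(pid, sorted(alt))
    pid, tree = parse_pathway_line(
        "PWY-7431 ( -SYNEPHRINE-DEHYDRATASE-RXN , -RXN-15198 , "
        "-OCTOPAMINE-DEHYDRATASE-RXN , RXN-5821 ) -1.2.1.53-RXN RXN-8505 RXN6666-4 RXN6666-5")
    core = {"RXN-8505", "RXN6666-4", "RXN6666-5"}
    print(is_satisfied(tree, core))                            # True
    print(is_satisfied(tree, core, optional_satisfied=False))  # False
```

For PWY-7433 this yields exactly the two reaction sets listed in the question. For PWY-7431, if optional reactions are treated as always satisfied, the parenthesized OR group is satisfied no matter what, so it stops constraining completeness at all; whether HUMAnN2 actually does that is the part the sketch cannot settle. The examples only use commas inside parentheses, so the AND-over-OR precedence assumption does not affect them.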
Thanks in advance!