Hello,
I’ve created tables for UniRef90 gene families and UniRef Pathways and I’m wondering if there’s a way to link these two. In general I’m looking for some sort of mapping file that says which gene families are part of which pathways.
-Eric
You’d have to do this as a two-step process. Under the HUMAnN pathway data folder, you’ll find a file that maps gene families (UniRefs) to MetaCyc reactions:
metacyc_reactions_level4ec_only.uniref.bz2
And then there is a separate file that defines pathways according to their component reactions and arrangement:
metacyc_structured_pathways_filtered
You can consider that a gene family contributes to a detected pathway if it maps to a RXN that is included in that pathway’s definition.
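A minimal sketch of that two-step join, in case it helps. It assumes the reactions file is tab-separated as reaction ID, EC number, then one or more UniRef IDs, and that each pathway line is the pathway ID, a tab, then a space-separated structure string. Please check a few lines of your own copies before relying on either layout. All IDs in the toy data and all function names are made up for illustration.

```python
from collections import defaultdict

# Structure tokens that are operators rather than reaction IDs.
OPERATORS = {"(", ")", ",", "+", "-"}

def parse_reactions(lines):
    """Map reaction ID -> set of gene families.

    Assumed layout: RXN <tab> EC <tab> UniRef_1 <tab> UniRef_2 ...
    """
    rxn_to_genes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # no gene families listed for this reaction
        rxn_to_genes[fields[0]] = set(fields[2:])
    return rxn_to_genes

def parse_pathways(lines):
    """Map pathway ID -> set of reaction IDs named in its structure.

    Assumed layout: PWY <tab> space-separated structure string.
    """
    pwy_to_rxns = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        tokens = " ".join(fields[1:]).split()
        # Drop operators; strip a leading '-' (optional-item marker).
        pwy_to_rxns[fields[0]] = {
            tok.lstrip("-") for tok in tokens if tok not in OPERATORS
        }
    return pwy_to_rxns

def genes_to_pathways(rxn_to_genes, pwy_to_rxns):
    """Join the two maps: gene family -> set of pathways it can contribute to."""
    mapping = defaultdict(set)
    for pwy, rxns in pwy_to_rxns.items():
        for rxn in rxns:
            for gene in rxn_to_genes.get(rxn, ()):
                mapping[gene].add(pwy)
    return dict(mapping)

# Toy input with hypothetical IDs, standing in for the two HUMAnN files.
reaction_lines = [
    "RXN-A\t1.1.1.1\tUniRef90_X1\tUniRef90_X2",
    "RXN-B\t2.2.2.2\tUniRef90_X2",
    "RXN-C\t3.3.3.3\tUniRef90_X3",
]
pathway_lines = [
    "PWY-1\t( RXN-A , RXN-B ) RXN-C",
    "PWY-2\tRXN-B",
]

gene_map = genes_to_pathways(
    parse_reactions(reaction_lines), parse_pathways(pathway_lines)
)
```

For the real files, the same functions should accept an open file handle in place of the lists, for example `bz2.open(path, "rt")` from the Python standard library for the compressed reactions file.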
The files above can also be found here:
Hi @franzosa!
HUMAnN v3.9 user here! I am basically trying to do the same as @esmith1032 (linking UniRef90 gene families and UniRef Pathways), but I cannot find the files you mention, and your link no longer works. Could you please update it?
Promise to update and close the post if I find the solution on my own
UPDATE 1: I found metacyc_reactions_level4ec_only.uniref.bz2 and metacyc_structured_pathways_filtered (found as metacyc_pathways_structured_filtered; metacyc_pathways_structured_filtered_v24_subreactions is also available) in ~/miniforge3/envs/humann3/lib/python3.7/site-packages/humann/data/pathways/
Updated here
Hi @franzosa!
Could you please explain how to interpret the metacyc_pathways_structured_filtered_v24_subreactions file? I’d like to understand the specific meaning of symbols like +, the comma (,), and ( ) within the file.
This is syntax we borrowed from KEGG’s module definitions (see link below for more information).
Briefly, a comma (,) is an OR relationship, + is an AND relationship, and parentheses ( ) mean to evaluate that part of the pathway/module as a unit. So, for example, ( A , B ) + ( C , D ) means you need at least one of A or B combined with at least one of C or D to satisfy the pathway/module.
In terms of abundances, HUMAnN would (roughly speaking) take a max over any OR relationship and a min over any AND relationship (possibly forcing the module/pathway abundance to 0 if a key reaction was not detected).
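As a worked example of the rough max/min rule, here is the ( A , B ) + ( C , D ) case with made-up reaction abundances. This only illustrates the arithmetic described above and is not HUMAnN's actual implementation, which may add further steps.

```python
# Hypothetical reaction abundances, chosen only for illustration.
abundance = {"A": 5.0, "B": 2.0, "C": 0.0, "D": 3.0}

# OR relationship: take the max over the alternatives.
left_group = max(abundance["A"], abundance["B"])   # ( A , B )
right_group = max(abundance["C"], abundance["D"])  # ( C , D )

# AND relationship: take the min over the required parts.
pathway_abundance = min(left_group, right_group)   # ( A , B ) + ( C , D )

print(pathway_abundance)
```

Here C is undetected, but D covers that OR group, so the pathway abundance is limited by the smaller of the two groups rather than dropping to 0.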
Thank you for your explanation. How about the space ‘ ’ and the minus sign ‘-’? Does a space mean the same thing as the ‘+’ sign?
All explained in the link I provided above. Here’s the relevant quote:
In the logical expression a space or a plus sign, representing a connection in the pathway or the molecular complex, is treated as an AND operator and a comma, used for alternatives, is treated as an OR operator. A minus sign designates an optional item in the complex.
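To make the quoted rules concrete, here is a small sketch that checks whether a definition is satisfied by a set of detected items. It is not KEGG's or HUMAnN's own parser. The assumptions are mine: AND (space or +) binds tighter than OR (comma), parentheses override that, an optional item never blocks satisfaction, and a minus sign only counts as the optional marker at the start of a token, since MetaCyc reaction IDs contain hyphens themselves. The function names are hypothetical.

```python
import re

def tokenize(expr):
    """Split a definition into tokens: ( ) , + - and item names."""
    padded = re.sub(r"([(),+])", r" \1 ", expr)
    tokens = []
    for tok in padded.split():
        if tok.startswith("-") and len(tok) > 1:
            tokens.extend(["-", tok[1:]])  # leading minus marks an optional item
        else:
            tokens.append(tok)
    return tokens

def is_satisfied(expr, detected):
    """Return True if the definition is satisfied by the detected items."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        # Comma-separated alternatives: any one is enough.
        result = parse_and()
        while peek() == ",":
            advance()
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        # Items joined by '+' or by a plain space: all are required.
        result = parse_item()
        while peek() not in (None, ",", ")"):
            if peek() == "+":
                advance()
            rhs = parse_item()
            result = result and rhs
        return result

    def parse_item():
        tok = advance()
        if tok == "-":
            parse_item()   # consume the optional item, ignore its status
            return True
        if tok == "(":
            result = parse_or()
            if advance() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        if tok in (")", ",", "+"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in detected

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result

example = is_satisfied("( A , B ) ( C , D )", {"A", "D"})
```

With this sketch, `( A , B ) ( C , D )` and `( A , B ) + ( C , D )` give the same answer for any detected set, and `A B -C` is satisfied by A and B alone.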