Uniref90 Gene Families to Pathways

Hello,

I’ve created tables for UniRef90 gene families and UniRef Pathways and I’m wondering if there’s a way to link these two. In general I’m looking for some sort of mapping file that says which gene families are part of which pathways.

-Eric


You’d have to do this as a two-step process. Under the HUMAnN pathway data folder, you’ll find a file that maps gene families (UniRefs) to MetaCyc reactions:

metacyc_reactions_level4ec_only.uniref.bz2

And then there is a separate file that defines pathways according to their component reactions and arrangement:

metacyc_structured_pathways_filtered

You can consider that a gene family contributes to a detected pathway if it maps to a RXN that is included in that pathway’s definition.
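As a rough illustration of the two-step lookup described above, here is a minimal Python sketch. It is not HUMAnN code. It assumes each line of the reactions file is tab-separated with the reaction ID first and UniRef IDs among the remaining fields, and that each line of the pathways file is a pathway ID, a tab, and a whitespace-separated structure string. Both formats are assumptions, so check them against your own copies. The function names and the toy data are made up for the example.

```python
from collections import defaultdict

OPERATORS = {"(", ")", ",", "+", "-"}

def reactions_to_unirefs(lines, prefix="UniRef90_"):
    """Map reaction ID -> set of gene families (assumed tab-separated layout)."""
    rxn2genes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        rxn2genes[fields[0]] = {f for f in fields[1:] if f.startswith(prefix)}
    return rxn2genes

def pathways_to_reactions(lines):
    """Map pathway ID -> set of reactions named anywhere in its structure."""
    pwy2rxns = {}
    for line in lines:
        pwy, _, structure = line.rstrip("\n").partition("\t")
        tokens = [t for t in structure.split() if t not in OPERATORS]
        # A leading "-" is assumed to mark an optional reaction, so drop it.
        pwy2rxns[pwy] = {t.lstrip("-") for t in tokens}
    return pwy2rxns

def genes_to_pathways(rxn2genes, pwy2rxns):
    """Join the two maps: gene family -> pathways containing one of its reactions."""
    gene2pwys = defaultdict(set)
    for pwy, rxns in pwy2rxns.items():
        for rxn in rxns:
            for gene in rxn2genes.get(rxn, ()):
                gene2pwys[gene].add(pwy)
    return gene2pwys

# Toy data in the assumed layouts (not real MetaCyc or UniRef entries).
toy_reactions = [
    "RXN-A\t1.1.1.1\tUniRef90_A\tUniRef50_X\n",
    "RXN-B\t2.2.2.2\tUniRef90_B\tUniRef90_A\n",
    "RXN-C\t3.3.3.3\tUniRef90_C\n",
]
toy_pathways = [
    "PWY-TOY1\t( RXN-A , RXN-B ) + RXN-C\n",
    "PWY-TOY2\tRXN-B\n",
]
mapping = genes_to_pathways(
    reactions_to_unirefs(toy_reactions),
    pathways_to_reactions(toy_pathways),
)
```

For the real files, the line iterables could come from `bz2.open(path, "rt")` for the compressed reactions file and a plain `open(path)` for the pathways file.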

The files above can also be found here:

Hi @franzosa!
HUMAnN v3.9 user here! I am trying to do the same thing as @esmith1032 (linking UniRef90 gene families to pathways), but I cannot find the files you mention, and your link no longer works. Could you please update it?

:crossed_fingers: Promise to update and close the post if I find the solution on my own

UPDATE 1: I found metacyc_reactions_level4ec_only.uniref.bz2 and metacyc_structured_pathways_filtered (named metacyc_pathways_structured_filtered in my installation; metacyc_pathways_structured_filtered_v24_subreactions is also available) in ~/miniforge3/envs/humann3/lib/python3.7/site-packages/humann/data/pathways/

Updated here

Hi @franzosa!
Could you please explain how to interpret the metacyc_pathways_structured_filtered_v24_subreactions file? I’d like to understand what symbols such as the plus sign (+), the comma (,), and parentheses ( ) mean in that file.

This is syntax we borrowed from KEGG’s module definitions (see link below for more information).

Briefly, , is an OR relationship, + is an AND relationship, and ( )s mean to evaluate that part of the pathway/module as a unit. So, for example, ( A , B ) + ( C , D ) means you need at least one of A or B combined with at least one of C or D to satisfy the pathway/module.

In terms of abundances, HUMAnN would (roughly speaking) take a max over any OR relationship and a min over any AND relationship (possibly forcing the module/pathway abundance to 0 if a key reaction was not detected).
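The rough max/min rule above can be sketched in a few lines of Python. This is only an illustration of that rule, not HUMAnN's actual algorithm. It assumes tokens are separated by whitespace, as in the `( A , B ) + ( C , D )` example, and it rejects any group that mixes `,` and `+` without parentheses rather than guessing a precedence. The function name `pathway_abundance` is made up for the example.

```python
def pathway_abundance(structure, abundance):
    """Max over OR (,), min over AND (+); missing reactions count as 0."""
    tokens = structure.split()
    pos = 0

    def parse_group():
        nonlocal pos
        values, ops = [parse_item()], set()
        while pos < len(tokens) and tokens[pos] in {",", "+"}:
            ops.add(tokens[pos])
            pos += 1
            values.append(parse_item())
        if len(ops) > 1:
            raise ValueError("mixing ',' and '+' in one group needs parentheses")
        return max(values) if ops == {","} else min(values)

    def parse_item():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of structure")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_group()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in {")", ",", "+"}:
            raise ValueError(f"unexpected token {tok!r}")
        return abundance.get(tok, 0.0)

    result = parse_group()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result

full = pathway_abundance("( A , B ) + ( C , D )", {"A": 5, "B": 2, "C": 1, "D": 3})
partial = pathway_abundance("( A , B ) + ( C , D )", {"A": 5})
```

With these toy abundances, `full` is min(max(5, 2), max(1, 3)) = 3, and `partial` is 0 because neither C nor D was detected.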

Thank you for your explanation. What about the space ' ' and the minus sign '-'? Does a space mean the same thing as the '+' sign?


Link: KEGG MODULE Database (genome.jp)

All explained in the link I provided above. Here’s the relevant quote:

In the logical expression a space or a plus sign, representing a connection in the pathway or the molecular complex, is treated as an AND operator and a comma, used for alternatives, is treated as an OR operator. A minus sign designates an optional item in the complex.
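To make the quoted rules concrete, here is a small presence/absence checker in Python. It is a sketch under assumptions, not KEGG's or HUMAnN's parser. It assumes identifiers are plain alphanumeric tokens such as KEGG K numbers (MetaCyc IDs that contain hyphens would need a different tokenizer), and it refuses groups that mix OR with AND without parentheses instead of modelling operator precedence. The name `is_satisfied` is made up for the example.

```python
import re

def is_satisfied(definition, present):
    """Space or '+' is AND, ',' is OR, and an item after '-' is optional."""
    # Operators absorb surrounding spaces; any whitespace left over sits
    # between two items and is therefore rewritten as an explicit AND.
    text = re.sub(r"\s*([,+-])\s*", r"\1", definition.strip())
    text = re.sub(r"\(\s+", "(", text)
    text = re.sub(r"\s+\)", ")", text)
    text = re.sub(r"\s+", "+", text)
    tokens = re.findall(r"[(),+-]|\w+", text)
    pos = 0

    def group():
        nonlocal pos
        results, kinds = [item()], set()
        while pos < len(tokens) and tokens[pos] in {",", "+", "-"}:
            kinds.add("OR" if tokens[pos] == "," else "AND")
            if tokens[pos] != "-":  # '-' is consumed by item() as a prefix
                pos += 1
            results.append(item())
        if len(kinds) > 1:
            raise ValueError("mixing OR and AND in one group needs parentheses")
        return any(results) if kinds == {"OR"} else all(results)

    def item():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of definition")
        tok = tokens[pos]
        pos += 1
        if tok == "-":
            item()  # parse the optional item, then ignore its result
            return True
        if tok == "(":
            value = group()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in {")", ",", "+"}:
            raise ValueError(f"unexpected token {tok!r}")
        return tok in present

    result = group()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result

with_optional_missing = is_satisfied("(K1,K2) K3+K4-K5", {"K2", "K3", "K4"})
with_required_missing = is_satisfied("(K1,K2) K3+K4-K5", {"K2", "K3"})
```

In the toy definition, K5 is optional, so the first call is satisfied without it, while the second call fails because the required K4 is absent.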