MetaCyc hierarchy to investigate/identify specific pathways

fconstancias · March 22, 2021, 5:00pm

Dear bioBakery forum,

Thanks for the awesome tools you are developing.

I am working on an oral microbiome dataset and I would like to:

decorate some pathways, as in https://msystems.asm.org/content/1/5/e00052-16
focus specifically on pathways belonging to a given category, e.g., Carbohydrate Degradation (134 instances).

I have been using the latest HUMAnN 3 alpha.

Do you have a hierarchy file available for the pathways? And for the genes (UniRef90, GO, EC, …)?

Thanks

Flo

ewissel · March 22, 2021, 6:11pm

Hey Flo,

I’m not one of the developers but was recently looking at this. You can use the HUMAnN 3 utility script humann_regroup_table.py (link) to “regroup” the identified UniRef90 gene families into GO, level-4 EC, MetaCyc, etc., then use the humann_rename_table script to convert the IDs to more readable names.

I believe one of the other utility scripts will also focus on specific pathways (based on files you input), but I’m not sure which one. Maybe humann_split_stratified_table??

fconstancias · March 22, 2021, 7:15pm

Hi Emily,

Thanks for your suggestions.

I am quite familiar with humann_regroup_table and humann_rename_table for mapping gene families to different kinds of functional information. It seems that humann_split_table actually does the opposite of humann_join_tables, so it will not help with what I am trying to achieve. But thanks again for sharing!

fconstancias · April 15, 2021, 2:28pm

Any help would be very much appreciated!

Thanks in advance.

franzosa · April 15, 2021, 2:45pm

Sorry for missing this message. The attached file is not a part of HUMAnN but will associate MetaCyc pathways with higher-level organizational terms. More specifically, each line maps a pathway to a “taxonomy” of less to more specific categories of metabolism. I hope this helps!

map_metacyc-pwy_lineage.tsv (270.7 KB)
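
A minimal sketch of how a lineage file like this could be used to pull out the pathways in one category. The layout assumed here (pathway ID, a tab, then the lineage joined by `|`) and the placeholder rows are assumptions for illustration, not a description of the attached file, so check the real file before relying on it. `pathways_in_category` is a hypothetical helper name.

```python
import csv
import io

# Stand-in for the lineage file. ASSUMPTION: each line is
# "<pathway ID><TAB><level 1>|<level 2>|...". The IDs and most labels are placeholders.
SAMPLE_TSV = (
    "EXAMPLE-PWY-1\tDegradation/Utilization/Assimilation|Carbohydrate Degradation\n"
    "EXAMPLE-PWY-2\tBiosynthesis|Example Subcategory\n"
    "EXAMPLE-PWY-3\tDegradation/Utilization/Assimilation|Carbohydrate Degradation|Example Subcategory\n"
)


def pathways_in_category(handle, category, sep="|"):
    """Return IDs of pathways whose lineage contains `category` at any level."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pathway_id, lineage = row[0], row[1].split(sep)
        # A pathway may appear on several lines (several lineages); report it once.
        if category in lineage and pathway_id not in hits:
            hits.append(pathway_id)
    return hits


carb_pathways = pathways_in_category(io.StringIO(SAMPLE_TSV), "Carbohydrate Degradation")
print(carb_pathways)
```

With a real file, the `io.StringIO(...)` stand-in would be replaced by `open(path, newline="")`, and the resulting ID list could then be used to subset a pathway abundance table.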

Chi · April 16, 2021, 3:11pm

The map file provided by franzosa is what I was looking for a few days ago in my topic “MetaCyc pathway hierarchical structure like KEGG pathway map file?”. Since that topic got no reply at the time, I created such a pathway map file myself for downstream analysis. The difference between his file and mine is that I kept only Superclass1 and Superclass2 and trimmed the longer lineages for convenience in data analysis. What you need for “Carbohydrate Degradation” is in Superclass2. I followed the naming on the MetaCyc website. If it is useful, see the R package file2meco (GitHub - ChiLiubio/file2meco: Tranform files to the microtable object in microeco package) and data/MetaCyc_pathway_map.RData in it.

franzosa · April 19, 2021, 3:42pm

Apologies for missing your thread - I was apparently not following the parent “community profiling” category (just the tools within it). Nice work creating the independent pathway annotation file and thanks for sharing it!

fconstancias · April 22, 2021, 9:34am

Hi Eric, hi ChiLui,

Thanks for your answers. Awesome package, Chi, thanks for sharing.

Best,

Flo

Dhrati_Patangia · August 12, 2021, 4:45pm

Hi @fconstancias
I am using HUMAnN 2 but am trying to do something similar, and I wish to focus on xenobiotic metabolism and carbohydrate metabolism pathways. Can you share exactly how you overcame this problem?
I tried regrouping the gene abundance file to GO and then renaming it, but that’s not exactly what I want.
Is there a mapping file for MetaCyc pathways that could be used for regrouping, followed by renaming with the MetaCyc pathway-name file?
Would that be the approach?

Any help would be appreciated.
Thank you
DP

Marco_Severgnini · February 24, 2022, 9:10am

Dear Eric, I’ve been using map_metacyc-pwy_lineage.tsv with HUMAnN2 and will now do the same with HUMAnN3. I found that the file here is the same one I downloaded some years ago. Do you plan to release an updated version of the MetaCyc pathway lineages? Or, how did you create the first one, so that anyone interested can work on it? Thanks in advance.

Jordan_Poitras · September 26, 2022, 3:48am

Hey all,

I found this thread while searching for an updated hierarchy of the MetaCyc pathways. I can see that the file above does need some updating, but it’s pretty close and good enough for my purpose. I tried using SmartTables to make an updated table, but I don’t have the knowledge/tools to make it happen.

I’m wondering what everyone does with the duplicated hierarchy labels. For example, DENITRIFICATION-PWY is assigned to Degradation/Utilization/Assimilation as well as to Generation of Precursor Metabolites and Energy. It’s not helpful to have both, so I’m going through and selecting the label that is relevant to me. What have others done?

Jordan_Poitras · September 26, 2022, 4:07am

Edit: I see I’m asking the same question as Chi in their other thread.

franzosa · September 26, 2022, 9:32pm

I don’t think we’ve ever updated this file for HUMAnN 3 (it was produced for a specific paper and isn’t necessary for HUMAnN operation). The pathway lineages are based on groupings of MetaCyc pathways into higher-level categories on the MetaCyc website. It might be possible to download these relationships by creating a free account with MetaCyc? The groupings follow a DAG structure, such that a child term can have multiple parent terms. Hence, you would either need to consider all parents or come up with some rule for picking a single parent (e.g. for coloring purposes).
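
As a rough illustration of the two options (keep all parents, or pick one by rule), here is a small sketch. The priority order and the helper names `pick_single_parent` and `all_parents` are hypothetical; the DENITRIFICATION-PWY entry simply mirrors the two labels mentioned earlier in this thread.

```python
# Parent categories per pathway (stand-in mapping; in practice this would be
# built from whatever hierarchy file or download you are using).
PARENTS = {
    "DENITRIFICATION-PWY": [
        "Degradation/Utilization/Assimilation",
        "Generation of Precursor Metabolites and Energy",
    ],
}

# Hypothetical rule: categories listed first win. Reorder to suit your analysis.
PRIORITY = [
    "Generation of Precursor Metabolites and Energy",
    "Degradation/Utilization/Assimilation",
]


def pick_single_parent(parents, priority):
    """Return the highest-priority parent; unranked parents fall back to list order."""
    rank = {name: i for i, name in enumerate(priority)}
    # min() returns the first minimal element, so ties keep the original order.
    return min(parents, key=lambda parent: rank.get(parent, len(priority)))


def all_parents(pathway, mapping):
    """Alternative: keep every parent, so the pathway counts toward each of them."""
    return list(mapping.get(pathway, []))


single = {pwy: pick_single_parent(ps, PRIORITY) for pwy, ps in PARENTS.items()}
print(single)
```

A single parent is convenient for coloring or labeling a figure; keeping all parents is usually the safer choice when summing abundances per category, at the cost of counting a pathway more than once.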

MalbertR · March 13, 2023, 3:44pm

Hi @franzosa,

I’ve also been looking for this for quite a while. Finally came across this thread.

Does a similar file also exist for EC, GO, KEGG?

franzosa · March 30, 2023, 6:36pm

ECs have a built-in hierarchy based on their numbering, e.g. in 1.2.3.4, “1” corresponds to the top level of the hierarchy, “2” to the next level, and so forth. You can find more about that here: https://enzyme.expasy.org/.
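
Because the hierarchy is encoded in the number itself, the parent levels can be derived by truncation. A small sketch, where `ec_lineage` is a hypothetical helper name and unspecified levels are written as `-`, following the usual EC notation:

```python
def ec_lineage(ec_number):
    """Return the EC number truncated to each level, from broadest to most specific."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    levels = []
    for depth in range(1, 5):
        if parts[depth - 1] == "-":
            break  # partially specified EC such as "1.2.-.-" stops here
        # Keep the first `depth` fields and pad the rest with "-".
        levels.append(".".join(parts[:depth] + ["-"] * (4 - depth)))
    return levels


print(ec_lineage("1.2.3.4"))
# ['1.-.-.-', '1.2.-.-', '1.2.3.-', '1.2.3.4']
```

Grouping a level-4 EC table by, say, `ec_lineage(ec)[1]` would then collapse it to the second level of the hierarchy.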

GO is based on a hierarchy which can be manipulated in many ways. The raw information about term relationships is represented in OBO files that you can download from the GO website.
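
For a sense of what working with an OBO file involves, here is a minimal sketch that reads only the `id` and `is_a` tags of `[Term]` stanzas and walks upward to collect ancestors. The sample text uses hypothetical `EX:` identifiers rather than real GO terms, the function names are made up for this example, and other relationship types (such as `part_of`) and obsolete terms are ignored. A dedicated OBO parsing library would be more robust for real analyses.

```python
from collections import defaultdict

# Tiny stand-in for an OBO file. The "EX:" terms are hypothetical placeholders;
# a real file has many more tags per stanza.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: root process

[Term]
id: EX:0000002
name: child process
is_a: EX:0000001 ! root process

[Term]
id: EX:0000003
name: grandchild process
is_a: EX:0000002 ! child process
is_a: EX:0000001 ! root process
"""


def parse_is_a(obo_text):
    """Map each term ID to the list of its direct `is_a` parents."""
    parents = defaultdict(list)
    in_term, term_id = False, None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term, term_id = (line == "[Term]"), None
        elif in_term and line.startswith("id: "):
            term_id = line[len("id: "):]
            parents[term_id]  # register the term even if it has no parents
        elif in_term and term_id and line.startswith("is_a: "):
            # The parent ID is the first token; the rest is a comment or modifier.
            parents[term_id].append(line[len("is_a: "):].split()[0])
    return dict(parents)


def ancestors(term_id, parents):
    """Collect all ancestors of a term by following `is_a` links upward."""
    seen, stack = set(), list(parents.get(term_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


go_parents = parse_is_a(SAMPLE_OBO)
print(sorted(ancestors("EX:0000003", go_parents)))
```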

I believe KEGG also has a hierarchical organization, but I’m the least familiar with that one.

We don’t have any special files representing these hierarchies bundled with HUMAnN, however. Users would need to generate them to meet the needs of a particular analysis / project.

changhu · July 24, 2023, 8:03pm

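Hi, I ran into the same issue and ended up writing my own function to retrieve the pathway hierarchies from MetaCyc (MetaCyc Pathways). Hope it might help somebody!

The resulting pandas DataFrame has one row per recorded branch, indexed by its leaf pathway (“ID: label”), with columns level_1, level_2, and so on for the hierarchy levels.

The code: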

import pandas as pd
import requests
import json

def dfs(current_node_id, branch_visited):
    """
    Depth-First Search (DFS) function to retrieve pathway hierarchies from MetaCyc.
    
    Parameters:
        current_node_id (str): The ID of the current node (pathway) being visited.
        branch_visited (list): List of pathway IDs and labels visited so far in the current branch.
    
    Returns:
        None (The results are stored in the global variable 'recorded_pathways').
    """    
    global recorded_pathways
    
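    # Make a request to get the direct children-pathways of the current node from the MetaCyc website.
    # The timeout and status check make a failed request raise instead of hanging or parsing an error page.
    response = requests.get(f"https://biocyc.org/META/ajax-direct-subs?object={current_node_id}", timeout=60)
    response.raise_for_status()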
    
    # Process the response (JSON) to retrieve child-pathway information.
    for pathway in json.loads(response.text):
        next_node_id = pathway["id"]          # ID of the child pathway to explore.
        next_node_label = pathway["label"]    # Label (name) of the child pathway.
        
        # Update the list of visited pathways in the current branch with information of the new child pathway.
        branch_updated = branch_visited + [f"{next_node_id}: {next_node_label}"]

        # If the child pathway is at the lowest hierarchy (leaf pathway), add it to the recorded pathways.
        if pathway["numInstances"] == 0:
            recorded_pathways.append(branch_updated)
        else:
            # Recursively call the DFS function to explore children pathways of the child pathway.
            dfs(current_node_id = next_node_id, branch_visited = branch_updated)
    
    return


# retrieving the hierarchy by traversing all pathway pages on the biocyc website using DFS
recorded_pathways = []
dfs(current_node_id = "Pathways", branch_visited = ["Pathways: Pathways"])

# Prepare the data for creating the pandas DataFrame with hierarchical annotations.
# Drop the artificial "Pathways" root from every branch.
branches = [pathway[1:] for pathway in recorded_pathways]
max_pathway_hierarchy = max(len(branch) for branch in branches)

# Pad shallower branches with None so that every row has the same number of levels.
padded_branches = [
    branch + [None] * (max_pathway_hierarchy - len(branch)) for branch in branches
]

# Create the DataFrame in one step: one row per recorded branch, indexed by its
# leaf pathway, with columns 'level_1', 'level_2', etc. based on the hierarchy depth.
# A pathway reachable through several parents appears once per branch.
pathway_annotated = pd.DataFrame(
    padded_branches,
    index=[branch[-1] for branch in branches],
    columns=[f"level_{i + 1}" for i in range(max_pathway_hierarchy)],
)

# Rename the index to 'feature' for a more descriptive name.
pathway_annotated = pathway_annotated.rename_axis("feature")
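
One possible way to use a table shaped like the one produced above is to sum an unstratified pathway abundance table per top-level category. The sketch below uses small stand-in tables with placeholder IDs, labels and values. Joining on the ID before the first ": " (rather than on the full string) is an assumption made here because pathway names may be worded differently between sources.

```python
import pandas as pd

# Stand-in for the hierarchy table: the index holds "ID: label" strings and the
# columns are level_1, level_2, ... (all IDs and labels are placeholders).
hierarchy = pd.DataFrame(
    {
        "level_1": ["CLASS-A: Category A", "CLASS-B: Category B"],
        "level_2": ["EXAMPLE-PWY-1: example pathway one", "EXAMPLE-PWY-2: example pathway two"],
    },
    index=pd.Index(
        ["EXAMPLE-PWY-1: example pathway one", "EXAMPLE-PWY-2: example pathway two"],
        name="feature",
    ),
)

# Stand-in for an unstratified pathway abundance table (values are made up).
abundance = pd.DataFrame(
    {"sample_1": [10.0, 5.0], "sample_2": [0.0, 2.5]},
    index=["EXAMPLE-PWY-1: example pathway one", "EXAMPLE-PWY-2: a differently worded name"],
)


def pathway_id(feature):
    """Return the part before the first ': ', i.e. the pathway ID."""
    return feature.split(": ", 1)[0]


# Map each pathway ID to a top-level category. If a pathway sits under several
# parents, this simple rule keeps only the first branch encountered.
top_level = {}
for feature, category in hierarchy["level_1"].items():
    top_level.setdefault(pathway_id(feature), category)

# Label every abundance row with its category and sum per category.
annotated = abundance.assign(
    category=[top_level.get(pathway_id(f), "unmapped") for f in abundance.index]
)
abundance_by_category = annotated.groupby("category").sum()
print(abundance_by_category)
```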

Chi · December 5, 2024, 5:37am

This issue is indeed very troublesome. The MetaCyc database has changed significantly since then, so I recently updated the file2meco package, manually curating the ontology information for all 3000+ MetaCyc metabolic pathways. For pathways with multiple labels at a Superclass level, I joined the labels with “&&”. If you need to filter relevant pathways from the table, use regular expressions to match the label, because direct (exact) filtering can produce incorrect results. The “&&” separator is recognized automatically by the cal_abund function of the microtable class, which splits the labels and calculates abundance for each one separately. So, if a pathway M has Superclass1 “A&&B”, the final RPK or relative abundance of both A and B will include M. The command to view this table is file2meco::MetaCyc_pathway_map
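
To make the “&&” behavior concrete, here is a small pandas sketch of the same idea. It is not the microeco/file2meco implementation, only an illustration with made-up values; `has_label` is a hypothetical helper.

```python
import re

import pandas as pd

# Stand-in data: pathway M carries two Superclass1 labels joined by "&&",
# as in the example above. Abundance values are made up.
table = pd.DataFrame(
    {
        "pathway": ["M", "N"],
        "Superclass1": ["A&&B", "A"],
        "abundance": [4.0, 1.0],
    }
)

# Split multi-label entries into one row per label, then sum per label.
# M contributes its full abundance to both A and B, so the per-label totals
# can add up to more than the original total.
exploded = table.assign(
    Superclass1=table["Superclass1"].str.split("&&")
).explode("Superclass1")
per_class = exploded.groupby("Superclass1")["abundance"].sum()
print(per_class)


def has_label(series, label):
    """True where `label` is one of the "&&"-separated entries (whole-label match)."""
    pattern = rf"(?:^|&&){re.escape(label)}(?:&&|$)"
    return series.str.contains(pattern, regex=True)


# An exact comparison (== "B") would miss "A&&B"; the regular expression does not.
print(table.loc[has_label(table["Superclass1"], "B"), "pathway"].tolist())
```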
