MetaCyc hierarchy to investigate/identify specific pathways

Dear bioBakery forum,

Thanks for the awesome tools you are developing.

I am working on an oral microbiome dataset and I would like to:

I have been using the latest humann3 alpha.

Do you have a hierarchy file available for the pathways? And for the genes (UniRef90, GO, EC, …)?

Thanks

Flo

Hey Flo,

I’m not one of the developers but was recently looking at this. You can use the humann3 utility script humann_regroup_table.py to “regroup” the identified UniRef90 genes into GO, level-4 EC, MetaCyc, etc., then use the humann_rename script to convert the IDs to more readable names.

I believe one of the other utility scripts will also focus on specific pathways (based on files you input), but I’m not sure which one. Maybe humann_split_stratified_table??

Hi Emily,

Thanks for your suggestions.

I am quite familiar with humann_regroup_table and the rename script for mapping gene families to different functional annotations. It seems that humann_split_table actually does the opposite of humann_join_tables and will not help with what I am trying to achieve. But thanks again for sharing!

Any help would be very much appreciated!

Thanks in advance.

Sorry for missing this message. The attached file is not a part of HUMAnN but will associate MetaCyc pathways with higher-level organizational terms. More specifically, each line maps a pathway to a “taxonomy” of less to more specific categories of metabolism. I hope this helps!

map_metacyc-pwy_lineage.tsv (270.7 KB)
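
For anyone wondering how a lineage map like this might be used downstream, here is a minimal Python sketch that sums pathway abundances by higher-level category. It assumes two tab-separated columns (pathway ID, then lineage) with lineage levels separated by "|"; I have not checked that against the attached file, so adjust the separators as needed. The sample rows, pathway IDs, abundances, and function names are all made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt; the real file's columns and separators may differ.
EXAMPLE_MAP = (
    "PWY-A\tDegradation/Utilization/Assimilation|Carbohydrate Degradation\n"
    "PWY-B\tBiosynthesis|Amino Acid Biosynthesis\n"
    "PWY-C\tDegradation/Utilization/Assimilation|Carbohydrate Degradation\n"
)


def load_lineages(handle, level_sep="|"):
    """Return {pathway_id: [level_1, level_2, ...]} from a two-column TSV.

    If a pathway appears on several lines, only the last lineage is kept.
    """
    lineages = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        lineages[row[0]] = row[1].split(level_sep)
    return lineages


def sum_by_level(abundances, lineages, level):
    """Sum pathway abundances by the category at a 0-based lineage depth."""
    totals = defaultdict(float)
    for pathway_id, value in abundances.items():
        levels = lineages.get(pathway_id, [])
        category = levels[level] if level < len(levels) else "unclassified"
        totals[category] += value
    return dict(totals)


lineages = load_lineages(io.StringIO(EXAMPLE_MAP))
abundances = {"PWY-A": 10.0, "PWY-B": 5.0, "PWY-C": 2.5, "PWY-D": 1.0}
by_category = sum_by_level(abundances, lineages, level=1)
```

To read the real file, pass an open file handle to load_lineages instead of the io.StringIO object. Pathways missing from the map, or with a lineage shallower than the requested level, are grouped under "unclassified".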

The map file provided by franzosa is what I was looking for a few days ago (see my topic “MetaCyc pathway hierarchical structure like KEGG pathway map file?”). At that time, having had no reply in my topic, I created such a pathway map file myself for downstream analysis. The difference between this file and mine is that I kept only superclass1 and superclass2 and trimmed the longer lineages for convenience in data analysis. What you need for “Carbohydrate Degradation” is in superclass2. I followed the naming on the MetaCyc website. See the R package file2meco (GitHub - ChiLiubio/file2meco: Tranform files to the microtable object in microeco package) and data/MetaCyc_pathway_map.RData in it, if it is useful.

Apologies for missing your thread - I was apparently not following the parent “community profiling” category (just the tools within it). Nice work creating the independent pathway annotation file and thanks for sharing it!

Hi Eric, hi ChiLui,

Thanks for your answers. Awesome package Chi, thanks for sharing.

Best,

Flo

Hi @fconstancias
I am using humann2 but am trying to do something similar, and I wish to focus on xenobiotic metabolism and carbohydrate metabolism pathways. Can you share exactly how you overcame this problem?
I tried regrouping the gene abundance file to GO and then renaming it, but that’s not exactly what I want.
Is there a mapping file for MetaCyc pathways that could be used for regrouping, followed by renaming with the MetaCyc pathway name file?
Would that be the right approach?

Any help would be appreciated.
Thank you
DP

Dear Eric, I’ve been using map_metacyc-pwy_lineage.tsv with HUMAnN2 and will now do the same for HUMAnN3. I found that the file here is the same one I downloaded some years ago. Do you plan to produce an updated version of the MetaCyc pathway lineages? Or could you describe how you created the first one, so that anyone interested can work on it? Thanks in advance.

Hey all,

I found this thread while searching for an updated hierarchy of the MetaCyc pathways. I can see that the one above does need some updating, but it’s pretty close and good enough for my purpose. I tried using SmartTables to make an updated table, but I don’t have the knowledge/tools to make it happen.

I’m wondering what everyone does with the duplicated hierarchy labels. For example, DENITRIFICATION-PWY is assigned to Degradation/Utilization/Assimilation as well as to Generation of Precursor Metabolites and Energy. It’s not helpful to have both, so I’m going through and selecting the label that is relevant to me. What have others done?

Edit: I see I’m asking the same question as Chi in their other thread.

I don’t think we’ve ever updated this file for HUMAnN 3 (it was produced for a specific paper and isn’t necessary for HUMAnN operation). The pathway lineages are based on groupings of MetaCyc pathways into higher-level categories on the MetaCyc website. It might be possible to download these relationships by creating a free account with MetaCyc? The groupings follow a DAG structure, such that a child term can have multiple parent terms. Hence, you would either need to consider all parents or come up with some rule for picking a single parent (e.g. for coloring purposes).
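
To make the two options concrete, here is a small Python sketch of (a) picking a single parent with a priority rule and (b) keeping all parents and splitting the abundance equally. The DENITRIFICATION-PWY parents are the ones mentioned earlier in this thread; the second pathway, the priority order, and the function names are hypothetical, and the right rule will depend on the analysis.

```python
# Parent categories per pathway; a DAG lets one pathway sit under several.
parents = {
    "DENITRIFICATION-PWY": [
        "Degradation/Utilization/Assimilation",
        "Generation of Precursor Metabolites and Energy",
    ],
    "PWY-X": ["Biosynthesis"],  # hypothetical pathway with a single parent
}


def pick_parent(candidates, priority):
    """Return the candidate ranked earliest in `priority`.

    Falls back to alphabetical order so the choice stays deterministic
    when none of the candidates appear in the priority list.
    """
    ranked = [p for p in priority if p in candidates]
    if ranked:
        return ranked[0]
    return sorted(candidates)[0]


def split_equally(abundance, candidates):
    """Alternative rule: keep every parent and give each an equal share."""
    share = abundance / len(candidates)
    return {parent: share for parent in candidates}


# Hypothetical preference order, e.g. for an energy-metabolism analysis.
priority = ["Generation of Precursor Metabolites and Energy", "Biosynthesis"]
single = {pwy: pick_parent(cands, priority) for pwy, cands in parents.items()}
shares = split_equally(8.0, parents["DENITRIFICATION-PWY"])
```

A single parent is convenient for coloring or stacked bar plots. Splitting equally keeps the per-sample total unchanged when summing over categories.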

Hi @franzosa,

I’ve also been looking for this for quite a while. Finally came across this thread.

Does a similar file also exist for EC, GO, KEGG?

ECs have a built-in hierarchy based on their numbering, e.g. in 1.2.3.4, “1” corresponds to the top level of the hierarchy, “2” to the next level, and so forth. You can find more about that here: https://enzyme.expasy.org/.
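
As a rough illustration, the EC hierarchy can be derived from the number itself, with no mapping file. The Python sketch below truncates a four-field EC number to a chosen level and fills the remaining fields with "-" (the usual way of writing incomplete EC numbers). The function names are my own, not part of HUMAnN.

```python
def ec_at_level(ec_number, level):
    """Truncate a four-field EC number to `level` (1 to 4), padding with '-'."""
    if not 1 <= level <= 4:
        raise ValueError("level must be between 1 and 4")
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec_number!r}")
    return ".".join(parts[:level] + ["-"] * (4 - level))


def ec_lineage(ec_number):
    """Return the EC number at every level, from broadest to most specific."""
    return [ec_at_level(ec_number, level) for level in range(1, 5)]


lineage = ec_lineage("1.2.3.4")
# lineage == ["1.-.-.-", "1.2.-.-", "1.2.3.-", "1.2.3.4"]
```

Summing a level-4 EC table by ec_at_level(ec, 2), for example, would collapse it to the second level of the hierarchy.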

GO is based on a hierarchy which can be manipulated in many ways. The raw information about term relationships is represented in OBO files that you can download from the GO website.
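
If you only need parent/child relationships, a dedicated library is not strictly required. Here is a minimal Python sketch that reads the is_a lines from [Term] stanzas in OBO-formatted text and walks them upward to collect a term's ancestors. The term IDs and names in the example are made up, and the parser deliberately ignores other relationship types (e.g. part_of) and obsolete flags, so treat it as a starting point only.

```python
# Made-up terms in OBO stanza format; real GO files contain many more fields.
EXAMPLE_OBO = """\
[Term]
id: GO:9999901
name: example root process

[Term]
id: GO:9999902
name: example child process
is_a: GO:9999901 ! example root process

[Term]
id: GO:9999903
name: example grandchild process
is_a: GO:9999902 ! example child process
"""


def parse_is_a(obo_text):
    """Return {term_id: [parent_ids]} using only is_a lines in [Term] stanzas."""
    parents = {}
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id: "):
            current = line[4:].strip()
            parents.setdefault(current, [])
        elif in_term and current and line.startswith("is_a: "):
            # The parent ID is the first token; the rest is a comment/modifier.
            parents[current].append(line[6:].split()[0])
    return parents


def ancestors(term_id, parents):
    """Collect every ancestor of a term by following is_a links upward."""
    seen = set()
    stack = list(parents.get(term_id, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


go_parents = parse_is_a(EXAMPLE_OBO)
grandchild_ancestors = ancestors("GO:9999903", go_parents)
```

For a real OBO file, read its contents into a string and pass that to parse_is_a.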

I believe KEGG also has a hierarchical organization, but I’m the least familiar with that one.

We don’t have any special files representing these hierarchies bundled with HUMAnN, however. Users would need to generate them to meet the needs of a particular analysis / project.

Hi, I ran into the same issue and ended up writing my own function to retrieve the pathway hierarchies in MetaCyc (MetaCyc Pathways). Hope it might help somebody!

The resulting pandas DataFrame looks like this:

The code:

import pandas as pd
import requests
import json

def dfs(current_node_id, branch_visited):
    """
    Depth-First Search (DFS) function to retrieve pathway hierarchies from MetaCyc.
    
    Parameters:
        current_node_id (str): The ID of the current node (pathway) being visited.
        branch_visited (list): List of pathway IDs and labels visited so far in the current branch.
    
    Returns:
        None (The results are stored in the global variable 'recorded_pathways').
    """    
    global recorded_pathways
    
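    # Request the direct child pathways of the current node from the BioCyc website.
    # The timeout and status check make a stalled or failed request raise instead of hanging.
    response = requests.get(f"https://biocyc.org/META/ajax-direct-subs?object={current_node_id}", timeout=30)
    response.raise_for_status()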
    
    # Process the response (JSON) to retrieve child-pathway information.
    for pathway in json.loads(response.text):
        next_node_id = pathway["id"]          # ID of the child pathway to explore.
        next_node_label = pathway["label"]    # Label (name) of the child pathway.
        
        # Update the list of visited pathways in the current branch with information of the new child pathway.
        branch_updated = branch_visited + [f"{next_node_id}: {next_node_label}"]

        # If the child pathway is at the lowest hierarchy (leaf pathway), add it to the recorded pathways.
        if pathway["numInstances"] == 0:
            recorded_pathways.append(branch_updated)
        else:
            # Recursively call the DFS function to explore children pathways of the child pathway.
            dfs(current_node_id = next_node_id, branch_visited = branch_updated)
    
    return


# retrieving the hierarchy by traversing all pathway pages on the biocyc website using DFS
recorded_pathways = []
dfs(current_node_id = "Pathways", branch_visited = ["Pathways: Pathways"])

# Prepare the data for creating the pandas DataFrame with hierarchical annotations.
max_pathway_hierarchy = max([len(i)-1 for i in recorded_pathways])
padded_recorded_pathways = []

# Loop through the recorded pathways and pad the hierarchy levels for a consistent DataFrame.
for pathway in recorded_pathways:
    actual_pathway = pathway[1:]
    padded_pathway = actual_pathway
    
    leaf_pathway = pathway[-1]
    
    # Add None to the pathway hierarchy if it is shallower than the maximum depth.
    if len(actual_pathway) < max_pathway_hierarchy:
        padded_pathway = actual_pathway + [None] * (max_pathway_hierarchy - len(actual_pathway))

    # Store the padded pathway along with the leaf pathway in a dictionary.        
    padded_recorded_pathways.append({leaf_pathway:padded_pathway})

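# Create a DataFrame with the padded hierarchical annotations (one row per recorded branch).
# Building it in a single call avoids repeated pd.concat; duplicate leaf IDs stay as separate rows.
pathway_annotated = pd.DataFrame(
    [lineage for entry in padded_recorded_pathways for lineage in entry.values()],
    index=[leaf for entry in padded_recorded_pathways for leaf in entry],
)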

# Rename the index to 'feature' for a more descriptive name.
pathway_annotated.rename_axis('feature', inplace = True)

# Create annotated column names 'level_1', 'level_2', etc. based on the hierarchy depth.
annotated_columns = []
for i, col in enumerate(pathway_annotated.columns):
    annotated_columns.append(f"level_{i+1}")
    
pathway_annotated.columns = annotated_columns    