MetaCyc pathway hierarchical structure like KEGG pathway map file?

Dear bioBakery forum,

First of all, thanks for the awesome tools you are developing.

I want to use the MetaCyc pathway abundance data from HUMAnN for further analysis (e.g. LEfSe) that relies on the pathway hierarchy, for example MetaCyc: DHGLUCONATE-PYR-CAT-PWY: glucose degradation ----> ?, analogous to the KEGG map file (e.g. ko00052:Galactose_metabolism ----> Carbohydrate_metabolism ----> Metabolism). I think enrichment analysis at higher levels is important for quickly capturing the functional information in a data set. However, I cannot find such a file on the MetaCyc website or in other tools. I have integrated KEGG pathway file input based on HUMAnN results into my R package file2meco (GitHub - ChiLiubio/file2meco: Tranform files to the microtable object in microeco package) and microeco to make downstream data analysis easier for more researchers in this field. Do you know whether a similar map file is available? Sorry if I am missing something important.

Dear guys,

Since there was no ready-made MetaCyc pathway mapping file, I built one by collecting the superclasses from the MetaCyc pathway web pages. This mapping file is now available in the R package file2meco (GitHub - ChiLiubio/file2meco: Tranform files to the microtable object in microeco package) under data/MetaCyc_pathway_map.RData. Additionally, MetaCyc and KEGG pathway enrichment analysis of HUMAnN results is supported by the file2meco function humann2meco() and by the downstream analysis in the microeco package (GitHub - ChiLiubio/microeco: An R package for data analysis in microbial community ecology). Currently, only the top two superclasses are used, because 524 of the 2703 pathways map to more than one superclass route, for example FERMENTATION-PWY and ENTNER-DOUDOROFF-PWY. In general, the top two superclasses are relatively constant across routes and are very useful for top-level enrichment analysis, so this approach is conservative. Any suggestions are welcome.
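
To illustrate how a top-level mapping can be applied, here is a minimal base R sketch that sums pathway abundances per first-level superclass. The pathway IDs, class assignments and column names (pathway, Superclass1, Superclass2) are toy assumptions for illustration, not the actual contents of MetaCyc_pathway_map.RData.

```r
# Toy mapping table: one row per pathway with its top two superclasses.
# IDs, class names and column names are illustrative assumptions.
pathway_map <- data.frame(
  pathway = c("PWY-A", "PWY-B", "PWY-C"),
  Superclass1 = c("Biosynthesis", "Biosynthesis", "Degradation"),
  Superclass2 = c("Amino Acid Biosynthesis", "Cofactor Biosynthesis",
                  "Carbohydrate Degradation"),
  stringsAsFactors = FALSE
)

# Toy abundance table: pathways in rows, samples in columns.
abund <- data.frame(
  S1 = c(10, 5, 20),
  S2 = c(0, 15, 30),
  row.names = c("PWY-A", "PWY-B", "PWY-C")
)

# Align the map to the abundance rows, then sum abundances per Superclass1.
idx <- match(rownames(abund), pathway_map$pathway)
superclass1_abund <- rowsum(as.matrix(abund),
                            group = pathway_map$Superclass1[idx])
print(superclass1_abund)
```

The same call with Superclass2 as the grouping column would give the second-level summary.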

Actually, I would also like to split the multiple mapping structures and sum the abundance for each route. For example, GLUTATHIONESYN-PWY has two superclass lineages on its MetaCyc pathway page (MetaCyc glutathione biosynthesis).


The two routes have different numbers of classes, which makes it extremely difficult to assign levels to the superclasses and to calculate abundance for enrichment. If the numbers were the same, the abundance would be relatively easy to calculate with a one-to-many mapping, which is already handled by the cal_abund() function in the microeco package.
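
As a rough illustration of the one-to-many idea (not the cal_abund() implementation), the base R sketch below counts a pathway's abundance under every class it maps to at a given level. The pathway IDs and class names are hypothetical.

```r
# Long-format map: a pathway with several routes gets one row per route.
# Pathway IDs and class names are hypothetical.
route_map <- data.frame(
  pathway = c("PWY-A", "PWY-A", "PWY-B"),
  Superclass2 = c("Class X", "Class Y", "Class X"),
  stringsAsFactors = FALSE
)

# Toy abundance table with two samples.
abund <- data.frame(
  pathway = c("PWY-A", "PWY-B"),
  S1 = c(10, 4),
  S2 = c(6, 8),
  stringsAsFactors = FALSE
)

# merge() repeats PWY-A's abundance once for each of its routes.
long <- merge(route_map, abund, by = "pathway")

# Sum per class; PWY-A contributes to both Class X and Class Y.
class_abund <- aggregate(cbind(S1, S2) ~ Superclass2, data = long, FUN = sum)
print(class_abund)
```

Note that with this approach the class totals can exceed the total pathway abundance, because multi-route pathways are counted more than once.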

A similar topic, with a recent reply, also refers to the mapping file: MetaCyc hierarchy to invetigate/identify specific pathways

Hi Chi, this is brilliant and almost exactly what I needed. I’m just wondering if it’s possible to create these plots based on relative abundance?

Sure. Please give it a try and let me know if you run into any problems.

I think it works. I’m trying to get the percentage output for this data, but I have no idea how to do it. How would I go about this?

How about running the HUMAnN metagenomic example in Chapter 8 file2meco package | Tutorial for R microeco package (v0.7.0)?
I think the cal_abund function may be what you need. Both RPK and relative abundance are supported.
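
If it helps to see what a percentage output means numerically, here is a small base R sketch that converts abundances to per-sample percentages. It is only an illustration with made-up values, not what cal_abund does internally.

```r
# Toy abundance matrix (e.g. RPK): pathways in rows, samples in columns.
# Pathway IDs and values are made up.
abund <- matrix(
  c(10, 30, 60,
    5, 5, 10),
  nrow = 3,
  dimnames = list(c("PWY-A", "PWY-B", "PWY-C"), c("S1", "S2"))
)

# Divide every column by its total, then scale to percent.
percent <- sweep(abund, 2, colSums(abund), "/") * 100
print(percent)
```

Each column of percent sums to 100, so values are comparable across samples with different sequencing depth.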

Hi,
I have a question about the microeco R package: what do the variables below refer to?
I have genefamilies.tsv, pathabundance.tsv and the coverage file, so how can I use them in this tutorial?

sample_file_path <- system.file("extdata", "example_metagenome_sample_info.tsv", package="file2meco")
match_file_path <- system.file("extdata", "example_metagenome_match_table.tsv", package="file2meco")

# MetaCyc pathway database based analysis
# use the raw data files stored inside the package for MetaCyc pathway database based analysis
abund_file_path <- system.file("extdata", "example_HUMAnN_MetaCyc_abund.tsv", package="file2meco")

Hi. It should be “pathabundance.tsv”, which has the pathway abundances.

Yes, I know that. I’m asking which variables in your script, in the screenshot I sent before, correspond to genefamilies.tsv and pathabundance.tsv.

The code in the screenshot just sets the file paths used in the example. You can ignore it and call the humann2meco function directly, like this:

d1 <- humann2meco(feature_table = "your_pathabundance.tsv", db = "MetaCyc")
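
If you want to check what your pathabundance.tsv contains before importing it, the base R sketch below reads a toy table in the usual HUMAnN layout, where stratified rows carry a "|" followed by the taxon. The pathway name, taxon and values are made up, and the inline text only stands in for your real file.

```r
# Toy stand-in for a HUMAnN pathabundance.tsv; all values are made up.
tsv_text <- paste(
  "# Pathway\tsampleA_Abundance",
  "UNMAPPED\t100",
  "UNINTEGRATED\t50",
  "PWY-X: hypothetical pathway\t30",
  "PWY-X: hypothetical pathway|g__GenusA.s__SpeciesA\t20",
  "PWY-X: hypothetical pathway|unclassified\t10",
  sep = "\n"
)

# comment.char = "" keeps the header line, which starts with "#";
# check.names = FALSE keeps the column names exactly as written.
tab <- read.delim(text = tsv_text, check.names = FALSE,
                  comment.char = "", quote = "")

# Rows containing "|" are per-taxon (stratified) contributions.
is_stratified <- grepl("|", tab[[1]], fixed = TRUE)
unstratified <- tab[!is_stratified, ]
stratified <- tab[is_stratified, ]
print(unstratified)
print(stratified)
```

For a real file, replace text = tsv_text with the path to your own pathabundance.tsv.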