First of all, thanks for the awesome tools you are developing.
I want to use the MetaCyc pathway abundance data from HUMAnN for further analyses (e.g. LEfSe) that rely on the pathway hierarchy, for example MetaCyc: DHGLUCONATE-PYR-CAT-PWY: glucose degradation ----> ?, analogous to the KEGG map file (e.g. ko00052:Galactose_metabolism ----> Carbohydrate_metabolism ----> Metabolism). I think enrichment analysis at higher levels is important for quickly capturing the functional information in a data set. However, I cannot find such a file on the MetaCyc website or in other tools. I have integrated KEGG pathway file input based on HUMAnN results into my R package file2meco (GitHub - ChiLiubio/file2meco: Tranform files to the microtable object in microeco package) and into microeco, to make downstream data analysis easier for more researchers in this field. Do you know whether a similar map file is available for MetaCyc? Sorry if I am missing something important.
Because no ready-made MetaCyc pathway mapping file was available, I built one by collecting the superclasses from the MetaCyc pathway web pages. This mapping file is now available in the R package file2meco (GitHub - ChiLiubio/file2meco: Tranform files to the microtable object in microeco package) under data/MetaCyc_pathway_map.RData. In addition, MetaCyc and KEGG pathway enrichment analyses based on HUMAnN results are both supported by the file2meco function humann2meco() and by the downstream analysis in the microeco package (GitHub - ChiLiubio/microeco: An R package for data analysis in microbial community ecology). Currently, only the top two superclasses are used, because 524 pathways (out of 2703 in total) have multiple mapping structures, such as FERMENTATION-PWY and ENTNER-DOUDOROFF-PWY. The top two superclasses are generally stable and are very useful for top-level enrichment analysis, so this approach is conservative. Any suggestions are welcome.
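To illustrate the kind of top-level aggregation described above, here is a minimal base R sketch. It is not the code used inside file2meco or microeco; the pathway IDs, superclass labels, abundance values and the `pathway` column name are all invented for illustration, and only the idea of summing pathway abundance by a top-level superclass comes from this post.

```r
# Toy pathway abundance table (rows = pathways, columns = samples).
# All values and names are invented for illustration.
abund <- matrix(
  c(10, 20,
     5,  0,
     1,  4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PWY_1", "PWY_2", "PWY_3"), c("S1", "S2"))
)

# Toy mapping table with one top-level superclass per pathway (hypothetical labels).
map <- data.frame(
  pathway = c("PWY_1", "PWY_2", "PWY_3"),
  Superclass1 = c("ClassA", "ClassA", "ClassB"),
  stringsAsFactors = FALSE
)

# Align the mapping to the abundance rows, then sum rows sharing a superclass.
idx <- match(rownames(abund), map$pathway)
super_abund <- rowsum(abund, group = map$Superclass1[idx])
print(super_abund)
```

The resulting matrix has one row per superclass and one column per sample, which is the shape a higher-level enrichment or differential abundance test would take as input.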
Actually, I would also like to split the multiple mapping structures and sum the abundance for each of them. For example, GLUTATHIONESYN-PWY has two superclass lines on its MetaCyc pathway page (MetaCyc glutathione biosynthesis).
The two routes have different numbers of classes, which makes it extremely difficult to assign levels to the superclasses and to do the abundance enrichment calculation. If the numbers are the same, it is relatively easy to calculate the abundance as a one-to-many mapping, which is already handled by the cal_abund() function in the microeco package.
Hi,
I have a question about the microeco R package: what do the variables below refer to?
I have genefamilies.tsv, pathabundance.tsv and the coverage file, so how can I use them in this tutorial?
sample_file_path <- system.file("extdata", "example_metagenome_sample_info.tsv", package="file2meco")
match_file_path <- system.file("extdata", "example_metagenome_match_table.tsv", package="file2meco")
# MetaCyc pathway database based analysis
# use the raw data files stored inside the package for MetaCyc pathway database based analysis
abund_file_path <- system.file("extdata", "example_HUMAnN_MetaCyc_abund.tsv", package="file2meco")
I have recently updated the file2meco package and manually curated the ontology information for all of the more than 3000 MetaCyc metabolic pathways. For pathways with multiple labels at the Superclass level, I have used the string “&&” to join them. If you need to filter relevant pathways from the table, please use regular expressions for matching, because direct (exact) filtering can produce incorrect results. The “&&” separator is recognized automatically by the cal_abund function of the microtable class, which calculates abundance: the labels are split and calculated separately. So, if a metabolic pathway M has Superclass1 as A&&B, the final RPK or relative abundance calculated for both A and B will include M.
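The following base R sketch mimics the behaviour described above on a toy table; it is only an illustration under assumed data, not the actual implementation of cal_abund. The pathways M and N, the labels A and B, the abundance values and the `pathway` column name are hypothetical.

```r
# Toy mapping: pathway M has two Superclass1 labels joined by "&&".
map <- data.frame(
  pathway = c("M", "N"),
  Superclass1 = c("A&&B", "B"),
  stringsAsFactors = FALSE
)

# Toy abundance table (rows = pathways, columns = samples); values are invented.
abund <- matrix(c(3, 7,
                  2, 1),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("M", "N"), c("S1", "S2")))

# Split each label on the literal "&&" and repeat the pathway once per label.
labels <- strsplit(map$Superclass1, "&&", fixed = TRUE)
long <- data.frame(
  pathway = rep(map$pathway, lengths(labels)),
  Superclass1 = unlist(labels),
  stringsAsFactors = FALSE
)

# Sum abundance per label; M contributes to both A and B.
split_abund <- rowsum(abund[long$pathway, , drop = FALSE], group = long$Superclass1)
print(split_abund)

# Filtering: exact matching misses "A&&B", a regular expression does not.
exact_hits <- map$pathway[map$Superclass1 == "A"]
regex_hits <- map$pathway[grepl("(^|&&)A($|&&)", map$Superclass1)]
```

Here the exact comparison returns no pathways for label A, while the regular expression returns M, which is why regular expression matching is recommended when filtering this table.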
The command to view this updated table in R is file2meco::MetaCyc_pathway_map