I have noticed that there are 4 MetaCyc pathway files.
What is the difference among these files, and which one should I use to regroup to MetaCyc pathways?
The metacyc_pathways_structured_v24 file is structured, so it can’t be regrouped correctly by the humann_regroup_table command. However, I also noticed that some IDs in “metacyc_pathways” have no matching name in the “map_metacyc-pwy_name.txt.gz” file, whereas every ID in “metacyc_pathways_structured_v24” can be matched against “map_metacyc-pwy_name.txt.gz”. Should I use “metacyc_pathways_structured_v24” to regroup to the MetaCyc pathway level? And for structured files, do you have any suggestions for regrouping to MetaCyc correctly?
Those files are a combination of 1) two versions of the MetaCyc pathways as well as 2) an accounting of the filters we apply (starting from the full set of pathways, then filtering for pathways with at least four quantifiable reactions, and so forth). The default version that your HUMAnN installation points to is almost certainly the only one you want to be using.
In addition, please note that you can’t use these files for regrouping in the sense of the regroup_table utility. That utility can only sum more specific features (e.g. UniRef90s) into broader features (e.g. KOs), but it doesn’t apply any of the metabolic reconstruction that goes into quantifying pathways. For that you need to provide the main HUMAnN driver program (humann) with gene-level input and then specify a set of reactions and pathways, though the defaults there are usually what you want (unless you are supplying your own custom definitions).
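
To make the "sum more specific features into broader features" point concrete, here is a minimal Python sketch of that kind of regrouping. It is only a toy illustration, not the utility's actual code: the function name regroup_by_sum, the feature IDs, the mapping, and the abundance values are all made up for the example, and the UNGROUPED bucket is simply how this sketch handles unmapped features.

```python
from collections import defaultdict


def regroup_by_sum(abundances, feature_to_groups):
    """Sum each feature's abundance into every group it maps to.

    Hypothetical helper for illustration only. Features with no
    mapping are collected under "UNGROUPED".
    """
    totals = defaultdict(float)
    for feature, value in abundances.items():
        groups = feature_to_groups.get(feature)
        if not groups:
            totals["UNGROUPED"] += value
            continue
        for group in groups:
            totals[group] += value
    return dict(totals)


# Made-up gene-level abundances and a made-up UniRef90 -> KO mapping.
gene_families = {"UniRef90_A": 10.0, "UniRef90_B": 5.0, "UniRef90_C": 2.0}
uniref_to_ko = {
    "UniRef90_A": ["K00001"],
    "UniRef90_B": ["K00001", "K00002"],
}

ko_table = regroup_by_sum(gene_families, uniref_to_ko)
print(ko_table)
```

Nothing here knows which reactions make up a pathway or whether enough of them are present, which is why plain summation like this cannot stand in for the pathway quantification that the main humann program performs.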