Reaction to Pathways

Hi,
I ran HUMAnN 3.8 on a metagenomic dataset using default settings, which produced the expected outputs: gene family and pathway abundance tables.

To better understand how HUMAnN derives pathway-level profiles, I attempted to manually reconstruct the pathway abundance from the gene family level. For this, I used the same gene family table output by HUMAnN as input to the humann_regroup_table utility, first regrouping UniRef90 gene families to MetaCyc reactions.

Subsequently, I regrouped the reactions to MetaCyc pathways using the metacyc_pathways mapping file provided with HUMAnN:

/opt/conda/envs/package_env/lib/python3.10/site-packages/humann/data/pathways/metacyc_pathways

However, when I compared the resulting pathway abundance table from this manual regrouping process to the original pathway output produced directly by HUMAnN, I found substantial discrepancies — both quantitative (in abundance values) and qualitative (in the presence or absence of certain pathways).

Could you explain why the two approaches give such different results?

Best wishes,
Sumeet

The process of going from reactions to pathways in HUMAnN is more involved than what humann_regroup_table does, which is probably why you’re seeing so many differences. The regroup utility simply sums feature abundances according to group membership, whereas HUMAnN attempts to reconstruct pathways that are well satisfied by the community’s enzymes (for algorithmic details, please see the original HUMAnN v1 paper in PLoS Comp Biol, which is linked elsewhere in the forum as well).

As an example, imagine that a pathway X contains 10 reactions, but only one of them (Y) is non-zero in your community. Regroup would assign Y’s abundance to X by summing, but HUMAnN proper would not report an abundance for X since it isn’t even close to being complete in the community.
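
To make that example concrete, here is a toy Python sketch. It is not HUMAnN's actual algorithm: the function names, the `min_coverage` threshold, and the use of a simple mean are all assumptions made for illustration. It only shows how a plain sum and a completeness-aware rule can disagree on the same input.

```python
# Toy illustration only. HUMAnN's real pathway reconstruction is more
# sophisticated; min_coverage and the mean below are hypothetical choices.

def regroup_sum(reaction_abundance, pathway_reactions):
    """Sum member reaction abundances, as a membership-based regroup would."""
    return sum(reaction_abundance.get(r, 0.0) for r in pathway_reactions)


def coverage_gated_abundance(reaction_abundance, pathway_reactions,
                             min_coverage=0.5):
    """Report an abundance only if enough of the pathway's reactions are present.

    Coverage is the fraction of member reactions with non-zero abundance.
    Below the (hypothetical) threshold the pathway is treated as absent.
    """
    values = [reaction_abundance.get(r, 0.0) for r in pathway_reactions]
    coverage = sum(v > 0 for v in values) / len(values)
    if coverage < min_coverage:
        return 0.0
    # Placeholder summary statistic for a sufficiently complete pathway.
    return sum(values) / len(values)


# Pathway X has 10 reactions; only reaction Y (here "R1") is detected.
pathway_x = [f"R{i}" for i in range(1, 11)]
community = {"R1": 12.0}

summed = regroup_sum(community, pathway_x)
gated = coverage_gated_abundance(community, pathway_x)

print(f"regroup-style sum:  {summed}")  # 12.0
print(f"coverage-gated:     {gated}")   # 0.0
```

With only one of ten reactions present, the sum passes Y's abundance straight through to X, while the gated version reports nothing for X. That is the same kind of qualitative (presence/absence) and quantitative disagreement described in the question.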