Hi,
I ran HUMAnN 3.8 on a metagenomic dataset using default settings, which produced the expected outputs: gene family and pathway abundance tables.
To better understand how HUMAnN derives pathway-level profiles, I tried to reconstruct the pathway abundances manually from the gene family level. I used the same gene family table produced by HUMAnN as input to the humann_regroup_table utility, first regrouping UniRef90 gene families to MetaCyc reactions.
I then regrouped the reactions to MetaCyc pathways using the metacyc_pathways mapping file provided with HUMAnN:
/opt/conda/envs/package_env/lib/python3.10/site-packages/humann/data/pathways/metacyc_pathways
However, when I compared the pathway abundance table from this manual regrouping with the pathway output produced directly by HUMAnN, I found substantial discrepancies. They were both quantitative (different abundance values) and qualitative (pathways present in one table but absent from the other).
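A minimal Python sketch of sum-based regrouping may help frame the comparison. It assumes that each regrouping step adds up the abundances of all member features of a group (assumed here to match the default sum behavior of humann_regroup_table), and that a feature mapped to several groups contributes its full abundance to each of them. The function name regroup_sum, all reaction and pathway IDs, and all abundance values are hypothetical toy data, not taken from the dataset or from the HUMAnN mapping files.

```python
from collections import defaultdict


def regroup_sum(abundances, mapping):
    """Sum member abundances into groups (sketch of sum-based regrouping).

    abundances: dict mapping feature ID -> abundance
    mapping:    dict mapping group ID -> list of member feature IDs
    Assumption: a feature listed under several groups contributes its
    full abundance to each of them. Groups with no detected member are
    omitted, and features with no group are simply dropped here.
    """
    grouped = defaultdict(float)
    for group, members in mapping.items():
        for member in members:
            if member in abundances:
                grouped[group] += abundances[member]
    return dict(grouped)


# Hypothetical toy data: all IDs and values are invented for illustration.
gene_families = {"UniRef90_A": 10.0, "UniRef90_B": 4.0, "UniRef90_C": 6.0}
reaction_members = {
    "RXN-1": ["UniRef90_A"],
    "RXN-2": ["UniRef90_B", "UniRef90_C"],
    "RXN-3": ["UniRef90_D"],  # never detected in gene_families
}
pathway_members = {
    "PWY-X": ["RXN-1", "RXN-2"],
    "PWY-Y": ["RXN-2", "RXN-3"],  # RXN-2 is shared with PWY-X
}

# Step 1: gene families -> reactions; step 2: reactions -> pathways.
reactions = regroup_sum(gene_families, reaction_members)
pathways = regroup_sum(reactions, pathway_members)

print(reactions)  # {'RXN-1': 10.0, 'RXN-2': 10.0}
print(pathways)   # {'PWY-X': 20.0, 'PWY-Y': 10.0}
```

In this toy case PWY-Y receives a nonzero abundance even though RXN-3 was never detected, and RXN-2 is counted toward both pathways. A plain sum makes no allowance for pathway completeness or shared reactions, which is one place where a two-step regrouping could diverge from a pathway table computed by a different procedure.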
I was wondering why this happened.
Best wishes,
Sumeet