Hello,
I’m currently running HUMAnN, and when I enable the --minpath
option, certain pathways almost completely disappear from the output.
For example, for PWY-7237: myo-, chiro- and scyllo-inositol degradation, I see dozens of strain-level entries when MinPath is OFF, but only a single entry remains when MinPath is ON.
Here’s what I have checked:
- Database integrity
  - metacyc_reactions_level4ec_only.uniref.bz2, metacyc_pathways_structured_filtered_v24_subreactions, map_metacyc-pwy_name.txt.gz, and map_metacyc-rxn_name.txt.gz all have identical md5sum values compared to the other environment.
  - Verified via logs that both runs are using the same --pathways-database file: metacyc_pathways_structured_filtered_v24_subreactions.
- Environment variables / execution settings
  - Setting PYTHONHASHSEED=0 and LC_ALL=C, and changing multi-threading settings, made no difference in results.
- Additional observations
  - When MinPath is OFF, my results are identical to the comparison environment.
  - This means the discrepancy is introduced only during the MinPath step.
  - It seems MinPath is selecting a minimal set of pathways from RXNs, which is filtering out many strain-level pathway entries.
  - I’m wondering if noise filtering or gap-fill steps might be overly aggressive for certain pathways like PWY-7237.
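
To pin down exactly which entries disappear, the two pathway abundance tables can be compared directly. The following is a minimal sketch, assuming the usual HUMAnN tab-separated layout in which stratified rows are written as `PATHWAY: name|stratum`. The function names and the toy tables (including the taxon names and abundance values) are hypothetical and only illustrate the comparison; they are not real output.

```python
def stratified_entries(table_text, pathway_id):
    """Map stratum name -> abundance for one pathway in a HUMAnN-style TSV."""
    entries = {}
    for line in table_text.splitlines():
        # Skip blank lines and the "# Pathway ..." header line.
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        feature, value = parts[0], parts[1]
        # Stratified rows look like "PWY-XXXX: description|stratum".
        name, _, stratum = feature.partition("|")
        if name.split(":", 1)[0] == pathway_id and stratum:
            entries[stratum] = float(value)
    return entries


def lost_strata(off_text, on_text, pathway_id):
    """Strata present with MinPath OFF but missing with MinPath ON."""
    off = stratified_entries(off_text, pathway_id)
    on = stratified_entries(on_text, pathway_id)
    return sorted(set(off) - set(on))


# Toy tables (made-up values), standing in for the two pathabundance files.
PWY = "PWY-7237: myo-, chiro- and scyllo-inositol degradation"
MINPATH_OFF = "\n".join([
    "# Pathway\tsample_Abundance",
    f"{PWY}\t30.0",
    f"{PWY}|g__A.s__A_one\t12.0",
    f"{PWY}|g__B.s__B_two\t10.0",
    f"{PWY}|unclassified\t8.0",
])
MINPATH_ON = "\n".join([
    "# Pathway\tsample_Abundance",
    f"{PWY}\t12.0",
    f"{PWY}|g__A.s__A_one\t12.0",
])

print(lost_strata(MINPATH_OFF, MINPATH_ON, "PWY-7237"))
```

With real data, the two strings would be replaced by the contents of the pathway abundance files from the MinPath OFF and MinPath ON runs.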
Questions:
- Is such a drastic reduction in pathways expected behavior when running MinPath, or could this indicate a database/mapping issue?
- For cases like PWY-7237, what in the MinPath logic could cause this over-filtering?
- Does the gap-fill process (and its related mapping files) have any impact on MinPath results?
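
To frame the question about MinPath logic, here is a toy sketch of the parsimony idea as I understand it: keep the smallest set of pathways that still explains every observed reaction. MinPath itself solves this with integer programming; the greedy set-cover loop below is a swapped-in simplification, and the pathway names, reaction names, and the function `minimal_pathways` are all hypothetical.

```python
def minimal_pathways(pathway_reactions, observed):
    """Greedy approximation of a minimal pathway set covering observed reactions."""
    # Only reactions that belong to at least one pathway can be explained.
    uncovered = {
        r for r in observed
        if any(r in rxns for rxns in pathway_reactions.values())
    }
    chosen = []
    while uncovered:
        # Pick the pathway explaining the most still-uncovered reactions;
        # iterating over sorted names makes ties resolve alphabetically.
        best = max(
            sorted(pathway_reactions),
            key=lambda p: len(pathway_reactions[p] & uncovered),
        )
        chosen.append(best)
        uncovered -= pathway_reactions[best]
    return chosen


# Toy example: every reaction of PWY-C is observed, yet it is not selected,
# because PWY-A and PWY-B together already explain all observed reactions.
toy_pathways = {
    "PWY-A": {"RXN-1", "RXN-2", "RXN-3"},
    "PWY-B": {"RXN-3", "RXN-4"},
    "PWY-C": {"RXN-2", "RXN-3"},
}
toy_observed = {"RXN-1", "RXN-2", "RXN-3", "RXN-4"}

print(minimal_pathways(toy_pathways, toy_observed))
```

If this picture is roughly right, a pathway whose reactions are all shared with other pathways could be dropped for a given organism even though its reactions are present, which might be what is happening to PWY-7237.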
Thank you in advance for your help.