HUMAnN MinPath drastically reducing pathway counts and inconsistent results

Hello,

I’m currently running HUMAnN, and when I enable the --minpath option, certain pathways almost completely disappear from the output.
For example, for PWY-7237: myo-, chiro- and scyllo-inositol degradation, when MinPath is OFF I see many strain-level entries (dozens), but when MinPath is ON, only a single entry remains.

Here’s what I have checked:

  1. Database integrity

    • metacyc_reactions_level4ec_only.uniref.bz2, metacyc_pathways_structured_filtered_v24_subreactions, map_metacyc-pwy_name.txt.gz, and map_metacyc-rxn_name.txt.gz all have the same md5sum values in this environment and in the comparison environment.

    • Verified via logs that both runs are using the same --pathways-database file: metacyc_pathways_structured_filtered_v24_subreactions.

  2. Environment variables / execution settings

    • Setting PYTHONHASHSEED=0, LC_ALL=C, and changing multi-threading settings made no difference in results.

  3. Additional observations

    • When MinPath is OFF, my results are identical to the comparison environment.

    • This means the discrepancy is introduced only during the MinPath step.

    • It seems MinPath is selecting a minimal set of pathways from the detected reactions (RXNs), which filters out many strain-level pathway entries.

    • I’m wondering if noise filtering or gap-fill steps might be overly aggressive for certain pathways like PWY-7237.

Questions:

  • Is such a drastic reduction in pathways expected behavior when running MinPath, or could this indicate a database/mapping issue?

  • For cases like PWY-7237, what in the MinPath logic could cause this over-filtering?

  • Does the gap-fill process (and its related mapping files) have any impact on MinPath results?

Thank you in advance for your help.

MinPath tries to explain the reactions you saw using the smallest number of pathways possible, so if you turn it on you should expect to see fewer pathways in the output. Imagine you have three pathways:

  1. A → B
  2. B → C
  3. A → B → C

If you see all three reactions in your sample (A + B + C) and MinPath is on, HUMAnN will report only pathway #3, since that single pathway is the simplest explanation of the three reactions. If you turn MinPath off, HUMAnN would instead report all three pathways.
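
To make the three-pathway example concrete, here is a minimal Python sketch of the "smallest set of pathways that explains the observations" idea. It is only an illustration: it uses a brute-force search over a toy dictionary, not MinPath's actual implementation, and the function name minimal_pathway_set and the pathway labels are made up for this example.

```python
from itertools import combinations


def minimal_pathway_set(pathways, observed):
    """Return the smallest list of pathway names that covers `observed`.

    `pathways` maps a (hypothetical) pathway name to the set of reactions
    it contains. This is a brute-force toy, not MinPath's own algorithm.
    """
    names = sorted(pathways)
    # Only reactions that belong to at least one pathway can be explained.
    coverable = set().union(*pathways.values()) & set(observed)
    # Try 0 pathways, then 1, then 2, ... and stop at the first full cover.
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(pathways[name] for name in combo))
            if coverable <= covered:
                return list(combo)
    return names


# Toy version of the example: pathways 1-3 and the observations A, B, C.
toy_pathways = {
    "pathway_1": {"A", "B"},
    "pathway_2": {"B", "C"},
    "pathway_3": {"A", "B", "C"},
}
observed = {"A", "B", "C"}

print(minimal_pathway_set(toy_pathways, observed))  # ['pathway_3']
```

Pathways 1 and 2 together would also explain A, B, and C, but that takes two pathways instead of one, so the minimal explanation keeps only pathway 3. With MinPath off there is no such selection step, which is why all three would be reported.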