get_counts_from_humann_logs.py crashes with ValueError when MetaPhlAn 3.0.13 emits [WARNING] lines into the HUMAnN log (biobakery_workflows 3.1)

Environment

Package               Version
biobakery_workflows   3.1
metaphlan             3.0.13 (27 Jul 2021)
humann                3.9
kneaddata             0.12.4
Python                3.10
OS                    Linux x86_64 (TSCC HPC, Rocky Linux 9)

What happens

The wmgx workflow fails at the humann_count_alignments_species task with:

anadama2.util.ShellException: Command `get_counts_from_humann_logs.py \
  --input .../humann/main \
  --output .../humann/counts/humann_read_and_species_count_table.tsv` failed.

Traceback (most recent call last):
  File "get_counts_from_humann_logs.py", line 79, in <module>
    main()
  File "get_counts_from_humann_logs.py", line 59, in main
    data[1]=int(line.split()[7][2:])
ValueError: invalid literal for int() with base 10: '[WARNING]'

Root cause

During its run, MetaPhlAn 3.0.13 writes the following lines to stderr, and they are captured in the HUMAnN log file:

[WARNING] Failed to launch x86-64-v3 version, staying with default
[WARNING] Failed to launch x86-64-v3 version, staying with default
[WARNING] Failed to launch x86-64-v3 version, staying with default
[WARNING] Failed to launch x86-64-v3 version, staying with default
WARNING: The metagenome profile contains clades that represent multiple species
merged into a single representant. An additional column listing the merged
species is added to the MetaPhlAn output.
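To check which HUMAnN logs contain the tag, and whether it starts its own line or is appended to other output on the same line, a scan along these lines may help. This is a minimal sketch: the `*.log` glob is an assumption about how the logs are named, and `find_warning_hits` / `scan_log_dir` are hypothetical helpers, not part of biobakery_workflows.

```python
from pathlib import Path

# Tag written by MetaPhlAn 3.0.13 into the HUMAnN log (from the excerpt above).
WARNING_TAG = "[WARNING]"


def find_warning_hits(text):
    """Return (line_number, column) for every WARNING_TAG occurrence in text.

    Column 0 means the tag starts its own line; a larger column means it was
    appended to other output on the same line.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        col = line.find(WARNING_TAG)
        while col != -1:
            hits.append((lineno, col))
            col = line.find(WARNING_TAG, col + len(WARNING_TAG))
    return hits


def scan_log_dir(log_dir, pattern="*.log"):
    """Map each matching log under log_dir (searched recursively) to its hits.

    The "*.log" pattern is an assumption about how the HUMAnN logs are named.
    """
    results = {}
    for path in sorted(Path(log_dir).rglob(pattern)):
        hits = find_warning_hits(path.read_text(errors="replace"))
        if hits:
            results[str(path)] = hits
    return results
```

For example, `scan_log_dir("path/to/humann/main")` (placeholder path) lists only the logs that contain the tag; any hit with a non-zero column is a line where the warning was glued onto other output.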

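get_counts_from_humann_logs.py parses the HUMAnN log with a hardcoded positional index: line 59 evaluates int(line.split()[7][2:]). According to the traceback, that expression came out as '[WARNING]', meaning the MetaPhlAn warning text landed in the whitespace-separated field where the script expects the count, so int() raises ValueError and the task fails.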
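A minimal, hypothetical guard for line 59 keeps the same positional expression but returns None instead of raising. `parse_count_field` and its defaults (field 7, skipping 2 characters) only mirror the expression shown in the traceback; this is a sketch, not the upstream fix.

```python
def parse_count_field(line, index=7, skip=2):
    """Tolerant version of int(line.split()[index][skip:]) from line 59.

    Returns None instead of raising when the field is missing or is not an
    integer (e.g. when warning text occupies that position).
    """
    fields = line.split()
    if len(fields) <= index:
        return None  # line too short to contain the expected field
    try:
        return int(fields[index][skip:])
    except ValueError:
        return None  # field present but not numeric, e.g. '[WARNING]'
```

Design note: this stops the crash but does not recover the count for the affected sample. Whatever later consumes `data[1]` in the script would still have to handle None (for example, by writing NA); that part is an assumption about the rest of the script.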


Impact

Everything else completes successfully: KneadData, MetaPhlAn taxonomic profiling, HUMAnN functional profiling, and all of the join/regroup/renorm tables are generated. Only humann_read_and_species_count_table.tsv is missing, because of this crash.


Questions

  1. Is this a known issue? Should get_counts_from_humann_logs.py be updated to skip lines containing [WARNING] before doing positional field parsing?
  2. Is there a safe workaround (e.g., manually patching line 59 to skip [WARNING] lines) while waiting for a fix?
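One non-invasive workaround, sketched here under assumptions, is to leave the original logs untouched, write cleaned copies to a separate folder, and rerun get_counts_from_humann_logs.py on that folder. Assumptions: the script only needs the `*.log` files from its --input folder, and removing the exact message quoted in the log excerpt restores the lines the script parses; if the message interrupted a line partway through, that line may still fail to parse. `clean_log_text` and `copy_cleaned_logs` are hypothetical helpers.

```python
from pathlib import Path

# Exact message from the log excerpt; removed wherever it appears, including
# when it is glued onto the end of another line.
WARNING_TEXT = "[WARNING] Failed to launch x86-64-v3 version, staying with default"


def clean_log_text(text):
    """Remove WARNING_TEXT and drop lines that contained nothing else."""
    kept = []
    for line in text.splitlines():
        cleaned = line.replace(WARNING_TEXT, "")
        if line.strip() and not cleaned.strip():
            continue  # the line held only the warning
        kept.append(cleaned)
    return "\n".join(kept) + "\n"


def copy_cleaned_logs(src_dir, dst_dir, pattern="*.log"):
    """Write cleaned copies of every matching log under src_dir into dst_dir.

    Relative paths are preserved and the original logs are not modified.
    Copying only "*.log" files assumes the script needs nothing else.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    written = []
    for path in sorted(src.rglob(pattern)):
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(clean_log_text(path.read_text(errors="replace")))
        written.append(target)
    return written
```

After that, the same get_counts_from_humann_logs.py command can be rerun with --input pointing at the cleaned folder and the original --output path.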

The log and the job script are attached.

anadama4.log.txt (29.9 KB)
biobakery_workflows_demo.sb.txt (1.3 KB)

Thanks for any help.