get_counts_from_humann_logs.py crashes with ValueError when MetaPhlAn 3.0.13 emits [WARNING] lines into HUMAnN log
Environment
| Package | Version |
|---|---|
| biobakery_workflows | 3.1 |
| metaphlan | 3.0.13 (27 Jul 2021) |
| humann | 3.9 |
| kneaddata | 0.12.4 |
| Python | 3.10 |
| OS | Linux x86_64 (TSCC HPC, Rocky Linux 9) |
What happens
The wmgx workflow fails at the humann_count_alignments_species task with:
anadama2.util.ShellException: Command `get_counts_from_humann_logs.py \
--input .../humann/main \
--output .../humann/counts/humann_read_and_species_count_table.tsv` failed.
Traceback (most recent call last):
File "get_counts_from_humann_logs.py", line 79, in <module>
main()
File "get_counts_from_humann_logs.py", line 59, in main
data[1]=int(line.split()[7][2:])
ValueError: invalid literal for int() with base 10: '[WARNING]'
Root cause
MetaPhlAn 3.0.13 writes the following lines to stderr during its run, and they are captured in the HUMAnN log file:
[WARNING] Failed to launch x86-64-v3 version, staying with default
[WARNING] Failed to launch x86-64-v3 version, staying with default
[WARNING] Failed to launch x86-64-v3 version, staying with default
[WARNING] Failed to launch x86-64-v3 version, staying with default
WARNING: The metagenome profile contains clades that represent multiple species
merged into a single representant. An additional column listing the merged
species is added to the MetaPhlAn output.
get_counts_from_humann_logs.py parses the HUMAnN log using a hardcoded positional index (line.split()[7][2:]). When it hits a [WARNING] line, that expression evaluates to '[WARNING]' instead of a numeric string, so int() raises the ValueError shown in the traceback.
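To illustrate the mechanism, here is a minimal reproduction. The sample log line is hypothetical: I am assuming a timestamp/logger prefix of seven whitespace-separated tokens followed by a bytes-repr payload (b'...), which is what the [2:] slice in the traceback suggests. It is not copied from the attached log, and `parse_count` / `failing_token` are names I made up for this sketch.

```python
# Hypothetical HUMAnN log line. The prefix layout is an assumption inferred
# from the traceback, not taken from the real log file.
sample_line = (
    "01/01/2024 12:00:00 PM - humann.utilities - DEBUG: "
    "b'[WARNING] Failed to launch x86-64-v3 version, staying with default"
)


def failing_token(line):
    """Return the token that the positional expression extracts."""
    # Field 7 is the first payload token; [2:] drops the leading b' characters.
    return line.split()[7][2:]


def parse_count(line):
    """Mimic the expression on line 59 of the script (from the traceback)."""
    return int(line.split()[7][2:])


try:
    parse_count(sample_line)
    message = None
except ValueError as err:
    # Same error text as in the workflow traceback.
    message = str(err)

print(failing_token(sample_line))
print(message)
```

If the real log lines follow this layout, any captured MetaPhlAn warning that lands in that field position would trigger the same error.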
Impact
Everything else completes successfully: kneaddata, metaphlan taxonomic profiling, humann functional profiling, and the join/regroup/renorm tables are all generated. Only humann_read_and_species_count_table.tsv is missing because of this crash.
Questions
- Is this a known issue? Should get_counts_from_humann_logs.py be updated to skip lines containing [WARNING] before doing positional field parsing?
- Is there a safe workaround (e.g., manually patching line 59 to filter [WARNING] lines) while waiting for a fix?
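For reference, this is roughly the kind of guard I have in mind for a local patch. It is only a sketch under assumptions: `safe_count` is a hypothetical helper, not part of biobakery_workflows, and the two sample lines are invented to match the field layout implied by the traceback. I have not checked how the real script chooses which lines to parse.

```python
def safe_count(line, index=7, prefix_len=2):
    """Return the integer found at the given field, or None if it is not numeric.

    index and prefix_len mirror the line.split()[7][2:] expression from the
    traceback; everything else here is an assumption for illustration.
    """
    fields = line.split()
    if len(fields) <= index:
        # Line is too short to contain the expected field.
        return None
    token = fields[index][prefix_len:]
    try:
        return int(token)
    except ValueError:
        # Covers tokens such as '[WARNING]' that are not integers.
        return None


# Hypothetical log lines (layout assumed, not copied from the attached log).
warning_line = (
    "01/01/2024 12:00:00 PM - humann.utilities - DEBUG: "
    "b'[WARNING] Failed to launch x86-64-v3 version, staying with default"
)
count_line = "01/01/2024 12:00:00 PM - humann.utilities - DEBUG: b'12345 reads"

print(safe_count(warning_line))
print(safe_count(count_line))
```

With a helper like this, the caller would assign data[1] only when the result is not None. Whether silently skipping such lines gives correct counts depends on why the script selects these lines in the first place, which is part of what I am asking.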
The log and the job script are attached.
anadama4.log.txt (29.9 KB)
biobakery_workflows_demo.sb.txt (1.3 KB)
Thanks for any help.