FIX: metaphlan4 bowtie2-build --threads flag bug #229

metaphlan4 bowtie2-build “--threads” flag bug

Software details

metaphlan4 version: MetaPhlAn version 4.1.1 (11 Mar 2024)
bowtie2-build version: bowtie2-build version 2.2.3 64-bit, compiled with gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)


Issue

Traceback output

bowtie2-build: unrecognized option '--threads'
Bowtie 2 version 2.2.3 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    reference_in            comma-separated list of files with ref sequences
    bt2_index_base          write bt2 data to files with this dir/basename
*** Bowtie 2 indexes work only with v2 (not v1).  Likewise for v1 indexes. ***
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4 index files
    -3/--justref            just build .3/.4 index files
    -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 -q /tmp/tmpm62vnjcr/v_mks.fa /tmp/tmpm62vnjcr/v_mks --threads 4
Traceback (most recent call last):
  File "/usr/local/bin/metaphlan", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/metaphlan/metaphlan.py", line 1529, in main
    VSC_report = vsc_bowtie2(viralTempFolder, pars['nproc'], file_format=pars['input_type'],
  File "/usr/local/lib/python3.10/site-packages/metaphlan/metaphlan.py", line 450, in vsc_bowtie2
    subp.check_call( [bt2build_call, markerfile, dbpath, '-q','--threads', str(nproc)] )
  File "/usr/local/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['bowtie2-build', '/tmp/tmpm62vnjcr/v_mks.fa', '/tmp/tmpm62vnjcr/v_mks', '-q', '--threads', '4']' returned non-zero exit status 1.

On line 445 of “./metaphlan/metaphlan.py” (line 450 in the installed copy shown in the traceback), a subprocess call runs “bowtie2-build” with the “--threads” flag to enable multithreading. The “--threads” flag does not exist in bowtie2-build 2.2.3; the correct switch for multiple cores seems to be “-p”.

It is probably a case of poor documentation, as the current “--help” output does not list a “--threads” option:

Bowtie 2 version 2.2.3 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    reference_in            comma-separated list of files with ref sequences
    bt2_index_base          write bt2 data to files with this dir/basename
*** Bowtie 2 indexes work only with v2 (not v1).  Likewise for v1 indexes. ***
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4 index files
    -3/--justref            just build .3/.4 index files
    -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit
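
Rather than reading the help text by eye, the same check can be done programmatically. The sketch below is illustrative only and is not part of this patch: the helper name `help_lists_option` and the `HELP_2_2_3` constant are hypothetical, and the text it parses is an excerpt of the 2.2.3 help output quoted above. It assumes each option line starts with the flag names, with aliases separated by “/”.

```python
def help_lists_option(help_text: str, option: str) -> bool:
    """Return True if `option` is one of the flag names that start a help line."""
    for line in help_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("-"):
            continue  # skip banners, usage lines and wrapped descriptions
        # The first whitespace-separated token holds the flag names,
        # e.g. "-p/--packed"; aliases are separated by "/".
        names = stripped.split()[0].split("/")
        if option in names:
            return True
    return False


# Excerpt of the bowtie2-build 2.2.3 help text quoted above.
HELP_2_2_3 = """\
Options:
    -f                      reference files are Fasta (default)
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    -q/--quiet              verbose output (for debugging)
"""

print(help_lists_option(HELP_2_2_3, "--threads"))
print(help_lists_option(HELP_2_2_3, "--packed"))
```

For the excerpt above, the first call reports that “--threads” is not listed, which matches the “unrecognized option” error in the traceback. Flags mentioned only inside a description column (such as “--dcv” on the “-a/--noauto” line of the excerpt) are deliberately not counted.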

Command to reproduce

metaphlan \
        --bowtie2db /gpfs/projects/bsc40/current/okhannous/Metaphlan4/db \
        --index mpa_vJun23_CHOCOPhlAnSGB_202307 /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/BAM/corachan.unmapped.fastq.gz \
        --input_type fastq \
        --bowtie2out /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachan.bz2 -s /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachansam.bz2 \
        --profile_vsc -o /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachan_profiled.txt \
        --nproc 4 \
        --vsc_out /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachan.vsc.txt

Submitted fix

Substitute line 445 in ./metaphlan/metaphlan.py FROM:

    subp.check_call( [bt2build_call, markerfile, dbpath, '-q','--threads', str(nproc)] )

TO:

    # Shared part of the command; only the thread flag differs between attempts.
    base_cmd = [bt2build_call, markerfile, dbpath, '-q']
    try:
        subp.check_call(base_cmd + ['--threads', str(nproc)])
    except subp.CalledProcessError as e:
        print(e, file=sys.stderr)
        errored_cmd = " ".join(base_cmd + ['--threads', str(nproc)])
        corrected_cmd = " ".join(base_cmd + ['-p', str(nproc)])
        print(
            f"==> WARNING: '{errored_cmd}' command is incompatible with the "
            "current version of bowtie2-build. "
            f"Re-trying the process with '{corrected_cmd}'",
            file=sys.stderr
        )
        # Retry once with the short flag; a second failure propagates to the caller.
        subp.check_call(base_cmd + ['-p', str(nproc)])

Since the “--threads” flag was used in the first place, it presumably works correctly with other versions of Bowtie2. Trying “--threads” first and falling back to “-p” only on failure keeps compatibility with those versions.
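
The try-then-retry pattern used in this fix can also be written as a small reusable helper. The following is a minimal sketch under stated assumptions, not code from MetaPhlAn: `run_with_fallback` and `fake_runner` are hypothetical names, and `fake_runner` only simulates a bowtie2-build that rejects “--threads” so that the example runs without Bowtie2 installed.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_with_fallback(
    commands: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> Sequence[str]:
    """Run each command in order and return the first one that succeeds.

    Re-raises the last CalledProcessError if every command fails.
    """
    last_error = None
    for cmd in commands:
        try:
            runner(cmd)
            return cmd
        except subprocess.CalledProcessError as err:
            # Report the failure, then move on to the next candidate.
            print(f"==> WARNING: '{' '.join(cmd)}' failed: {err}", file=sys.stderr)
            last_error = err
    if last_error is None:
        raise ValueError("no commands given")
    raise last_error


def fake_runner(cmd):
    # Hypothetical stand-in for a bowtie2-build that rejects --threads.
    if "--threads" in cmd:
        raise subprocess.CalledProcessError(1, cmd)


base = ["bowtie2-build", "v_mks.fa", "v_mks", "-q"]
used = run_with_fallback(
    [base + ["--threads", "4"], base + ["-p", "4"]],
    runner=fake_runner,
)
print(used)
```

With the default `runner`, the helper calls `subprocess.check_call` exactly as the patch does. Passing the candidates as an ordered list keeps the preferred flag first and makes it easy to add or reorder fallbacks later.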