metaphlan4 bowtie2-build “--threads” flag bug
Software details
metaphlan4 version: MetaPhlAn version 4.1.1 (11 Mar 2024)
bowtie2-build version: bowtie2-build version 2.2.3 64-bit, compiled with gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
Issue
Traceback output
bowtie2-build: unrecognized option '--threads'
Bowtie 2 version 2.2.3 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    reference_in            comma-separated list of files with ref sequences
    bt2_index_base          write bt2 data to files with this dir/basename
*** Bowtie 2 indexes work only with v2 (not v1). Likewise for v1 indexes. ***
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4 index files
    -3/--justref            just build .3/.4 index files
    -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 -q /tmp/tmpm62vnjcr/v_mks.fa /tmp/tmpm62vnjcr/v_mks --threads 4
Traceback (most recent call last):
  File "/usr/local/bin/metaphlan", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/metaphlan/metaphlan.py", line 1529, in main
    VSC_report = vsc_bowtie2(viralTempFolder, pars['nproc'], file_format=pars['input_type'],
  File "/usr/local/lib/python3.10/site-packages/metaphlan/metaphlan.py", line 450, in vsc_bowtie2
    subp.check_call( [bt2build_call, markerfile, dbpath, '-q','--threads', str(nproc)] )
  File "/usr/local/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['bowtie2-build', '/tmp/tmpm62vnjcr/v_mks.fa', '/tmp/tmpm62vnjcr/v_mks', '-q', '--threads', '4']' returned non-zero exit status 1.
In the “./metaphlan/metaphlan.py” file, on line 445, the subprocess call invokes “bowtie2-build” with the “--threads” flag to enable multithreading. This version of bowtie2-build doesn’t recognize “--threads”; the correct switch for multiple cores seems to be “-p”.
It’s probably a case of poor documentation, since the current “--help” output lists no threading option and describes “-p” as “--packed”:
Bowtie 2 version 2.2.3 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    reference_in            comma-separated list of files with ref sequences
    bt2_index_base          write bt2 data to files with this dir/basename
*** Bowtie 2 indexes work only with v2 (not v1). Likewise for v1 indexes. ***
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4 index files
    -3/--justref            just build .3/.4 index files
    -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit
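Since the help output is the only place the installed binary says which switches it accepts, one way to avoid guessing is to inspect that text before building the command. The following is a minimal sketch, not MetaPhlAn code: the helper names `help_mentions_option` and `get_help_text` are hypothetical, and it assumes that running the executable with `--help` prints the option list to stdout or stderr.

```python
import re
import subprocess


def help_mentions_option(help_text: str, option: str) -> bool:
    """Return True if `option` appears as a whole token in the help text."""
    # Reject matches that are part of a longer option name,
    # e.g. "-p" inside "--packed" or "--thread" inside "--threads".
    pattern = r"(?<![\w-])" + re.escape(option) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def get_help_text(executable: str = "bowtie2-build") -> str:
    """Capture the help output of the executable (hypothetical helper).

    Assumption: the tool prints its option list when called with --help.
    stderr is merged into stdout because some tools print usage there.
    """
    result = subprocess.run(
        [executable, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return result.stdout
```

With the help text shown above, `help_mentions_option(text, "--threads")` would be false while `help_mentions_option(text, "--packed")` would be true, so the caller could decide up front whether to pass a threading flag at all.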
Command to reproduce
metaphlan \
--bowtie2db /gpfs/projects/bsc40/current/okhannous/Metaphlan4/db \
--index mpa_vJun23_CHOCOPhlAnSGB_202307 /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/BAM/corachan.unmapped.fastq.gz \
--input_type fastq \
--bowtie2out /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachan.bz2 -s /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachansam.bz2 \
--profile_vsc -o /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachan_profiled.txt \
--nproc 4 \
--vsc_out /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachan.vsc.txt
Submitted fix
Substitute line 445 in ./metaphlan/metaphlan.py FROM:
subp.check_call( [bt2build_call, markerfile, dbpath, '-q','--threads', str(nproc)] )
TO:
try:
    subp.check_call([bt2build_call, markerfile, dbpath, '-q','--threads', str(nproc)])
except subp.CalledProcessError as e:
    print(e, file=sys.stderr)
    errored_cmd = " ".join([bt2build_call, markerfile, dbpath, '-q','--threads', str(nproc)])
    corrected_cmd = " ".join([bt2build_call, markerfile, dbpath, '-q','-p', str(nproc)])
    print(
        f"==> WARNING: '{errored_cmd}' command is incompatible with the "
        "current version of bowtie2-build. "
        f"Re-trying the process with '{corrected_cmd}'",
        file=sys.stderr
    )
    subp.check_call([bt2build_call, markerfile, dbpath, '-q','-p', str(nproc)])
Since the “--threads” flag was used in the first place, it presumably worked correctly with other versions of Bowtie2 at some point. Trying it first and falling back to “-p” only when it fails keeps compatibility with those versions.
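The try-then-retry pattern in the fix can also be written without repeating the argument lists. This is a rough sketch under stated assumptions rather than the code submitted upstream: `run_with_fallback` is a hypothetical helper name, and the `runner` parameter exists only to make the function easy to exercise.

```python
import subprocess
import sys


def run_with_fallback(candidates, runner=subprocess.check_call):
    """Run each candidate command in order until one succeeds.

    Returns the command that succeeded. If every candidate fails,
    the last CalledProcessError is re-raised.
    """
    if not candidates:
        raise ValueError("at least one candidate command is required")
    last_error = None
    for cmd in candidates:
        try:
            runner(cmd)
            return cmd
        except subprocess.CalledProcessError as err:
            # Report the failure and move on to the next candidate.
            joined = " ".join(cmd)
            print(f"==> WARNING: '{joined}' failed: {err}", file=sys.stderr)
            last_error = err
    raise last_error
```

In the context of this report, the candidates would be the two argument lists from the fix, with the “--threads” variant first and the fallback variant second.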