I’m trying out HUMAnN 3 with the newest MetaPhlAn, version 3.0.1 (25 Jun 2020), running with bowtie2-align version 2.1.0.
The command is:
/home/wbazant/.local/bin/metaphlan /home/wbazant/humann-nextflow/work/7e/c273ef3d436cddae2cc3c259455373/reads.fastq -t rel_ab -o /home/wbazant/humann-nextflow/work/7e/c273ef3d436cddae2cc3c259455373/reads_humann_temp/reads_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /home/wbazant/humann-nextflow/work/7e/c273ef3d436cddae2cc3c259455373/reads_humann_temp/reads_metaphlan_bowtie2.txt --nproc 4
Most of the time goes to bowtie2-align, apparently loading the index rather than aligning:
bowtie2-align --wrapper basic-0 --very-sensitive -x /home/wbazant/.local/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901 -p 4 --passthrough -U - --quiet
With a tiny input file of 20 mitochondrial reads, this step costs:
CPU time : 1229.94 sec.
Max Memory : 2290 MB
Average Memory : 1699.60 MB
Run time : 741 sec.
Is this unavoidable? What are typical index load times with the standard ChocoPhlAn database in an efficient setup?
I’m running this in a cluster environment, but I haven’t paid any attention to the details of the setup (yet).
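One rough way to tell whether the time above is filesystem I/O rather than Bowtie2 itself is to time a plain sequential read of the index files. The sketch below assumes a POSIX shell; `time_index_read` is a made-up helper name, not part of MetaPhlAn or Bowtie2, and the glob assumes the index files are named `<prefix>.*.bt2` (or `.bt2l` for large indexes).

```shell
#!/bin/sh
# Sketch: time a plain sequential read of the Bowtie2 index files for a prefix.
# time_index_read is a hypothetical helper, not a MetaPhlAn or Bowtie2 command.
time_index_read() {
    prefix=$1
    start=$(date +%s)
    count=0
    for f in "$prefix".*.bt2*; do
        [ -f "$f" ] || continue      # skip the unexpanded pattern when nothing matches
        cat "$f" > /dev/null         # read the whole file and discard it
        count=$((count + 1))
    done
    end=$(date +%s)
    echo "$count files read in $((end - start)) s"
}

# Usage (prefix taken from the bowtie2-align command above):
# time_index_read /home/wbazant/.local/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901
```

Only the first, cold run is informative: a repeat on the same node will probably be served from the page cache and look much faster. If the cold read alone takes minutes, the storage is the likely bottleneck.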
The first potential problem is that I’m using an old Bowtie2 (2.1.0), which predates this change:
bowtie2.2
- Improved way in which index files are loaded for alignment. Should fix efficiency problems on some filesystems.
The second is that the index is stored on a low-end filesystem (alongside my Python modules, where MetaPhlAn installed it by default).
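To test the filesystem hypothesis, the index could be copied to faster storage and MetaPhlAn pointed at the copy. This is only a sketch: `stage_index` is a hypothetical helper, the use of `$TMPDIR` as node-local scratch is an assumption about the cluster, and `--bowtie2db` / `--index` are the MetaPhlAn options I believe control the database folder and index name (worth confirming with `metaphlan --help`).

```shell
#!/bin/sh
# Sketch: copy every file belonging to one index (e.g. *.bt2 and the .pkl)
# from the install directory to a faster location.
# stage_index is a hypothetical helper; args: source dir, index name, destination dir.
stage_index() {
    src_dir=$1
    index_name=$2
    dest_dir=$3
    mkdir -p "$dest_dir" || return 1
    for f in "$src_dir/$index_name".*; do
        [ -f "$f" ] || continue          # skip the unexpanded pattern when nothing matches
        cp -p "$f" "$dest_dir/" || return 1
    done
    echo "$dest_dir"                     # print the new database folder
}

# Usage (assumes $TMPDIR is fast, node-local scratch; adjust for your cluster):
# fast_db=$(stage_index /home/wbazant/.local/lib/python3.7/site-packages/metaphlan/metaphlan_databases mpa_v30_CHOCOPhlAn_201901 "$TMPDIR/metaphlan_db")
# metaphlan reads.fastq --input_type fastq --bowtie2db "$fast_db" --index mpa_v30_CHOCOPhlAn_201901 -o reads_metaphlan_bugs_list.tsv --nproc 4
```

The copy has its own one-off cost, so this only pays off when several samples run on the same node, or when the fast location is shared and persistent.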