Large constant run time cost to metaphlan?

I’m trying out HUMAnN 3 with the newest MetaPhlAn, version 3.0.1 (25 Jun 2020), running with bowtie2-align version 2.1.0.

The command is

/home/wbazant/.local/bin/metaphlan /home/wbazant/humann-nextflow/work/7e/c273ef3d436cddae2cc3c259455373/reads.fastq -t rel_ab -o /home/wbazant/humann-nextflow/work/7e/c273ef3d436cddae2cc3c259455373/reads_humann_temp/reads_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /home/wbazant/humann-nextflow/work/7e/c273ef3d436cddae2cc3c259455373/reads_humann_temp/reads_metaphlan_bowtie2.txt --nproc 4

It spends a long time in bowtie2-align, apparently loading the index:

bowtie2-align --wrapper basic-0 --very-sensitive -x /home/wbazant/.local/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901 -p 4 --passthrough -U - --quiet

Even with a tiny input file of 20 mitochondrial reads, this is the cost of that step:

    CPU time :                                   1229.94 sec.
    Max Memory :                                 2290 MB
    Average Memory :                             1699.60 MB
    Run time :                                   741 sec.

Is this unavoidable? What are typical index load times with the standard ChocoPhlAn database in an efficient setup?

I’m running this in a cluster environment, but haven’t tuned any of the details yet.

The first potential problem is that I’m using an old bowtie2, which predates this change:

bowtie2 2.2

  • Improved way in which index files are loaded for alignment. Should fix efficiency problems on some filesystems.

A second is that I’m storing the index on a slow filesystem (alongside my Python modules, where MetaPhlAn installed it by default).
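If the slow filesystem turns out to be the bottleneck, one common workaround on clusters is to copy the database to node-local scratch space and point MetaPhlAn at it with its `--bowtie2db` option. A rough sketch (the scratch path is an example; adjust for your scheduler):

```shell
# Copy the MetaPhlAn database to fast node-local storage (one-time per node).
# The source path below is where pip placed it by default.
DB_SRC=/home/wbazant/.local/lib/python3.7/site-packages/metaphlan/metaphlan_databases
DB_LOCAL=/scratch/$USER/metaphlan_db   # example scratch location

mkdir -p "$DB_LOCAL"
cp "$DB_SRC"/mpa_v30_CHOCOPhlAn_201901* "$DB_LOCAL"/

# Then tell MetaPhlAn to use the local copy instead of the default location:
metaphlan reads.fastq \
    --input_type fastq \
    --bowtie2db "$DB_LOCAL" \
    -t rel_ab \
    -o reads_metaphlan_bugs_list.tsv \
    --nproc 4
```

This only helps if index I/O is actually the dominant cost, which the timing numbers above don’t yet distinguish from the mapping itself.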

Unfortunately, no: the index needs to be loaded into memory to perform the mapping. On my machine the index loads in 6.61 s ± 216 ms, but the measurement you are referring to covers the full mapping of the input against the index.
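One mitigating detail, since the command above already passes `--bowtie2out`: MetaPhlAn saves the bowtie2 mapping to that file, so re-running the profiling step on the same sample can reuse it and skip the expensive alignment entirely. A sketch using the paths from the original command:

```shell
# Re-profile from the saved intermediate instead of re-mapping the reads.
# MetaPhlAn accepts its own bowtie2out format via --input_type bowtie2out.
metaphlan reads_humann_temp/reads_metaphlan_bowtie2.txt \
    --input_type bowtie2out \
    -t rel_ab \
    -o reads_metaphlan_bugs_list.tsv \
    --nproc 4
```

This doesn’t remove the cost on the first run, but it avoids paying it again when experimenting with downstream settings.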