What is the approximate running time per sample, and how can I reduce it?

I am using HUMAnN 3.9 and MetaPhlAn 4.1 for fastq file processing. Is it normal for a fastq.gz file of about 5 GB to take close to 12 hours to process? Moreover, I have 2,000 such samples; how can I reduce the overall running time of my project? Here are the parameters:
humann --input /public/home/CXZX03/perl5/3_tasks/1_metaphlan/demo/AA0001.fq.gz --threads 24 --search-mode uniref90 --remove-temp-output --nucleotide-database /public/home/CXZX03/perl5/2_data_base/humann/chocophlan --protein-database /public/home/CXZX03/perl5/2_data_base/humann/uniref_90 --output /public/home/CXZX03/perl5/3_tasks/2_humann/output/tmp --metaphlan-options="--bowtie2db /public/home/CXZX03/perl5/2_data_base/Metaphlan4/vJun23"

For comparison, a 10M-read metagenome sample I use for testing takes about 40 CPU hours in pure translated-search mode. The tiered workflow takes 16 CPU hours, and bypassing translated search (so HUMAnN stops after nucleotide search) takes 3 CPU hours.
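If your samples come from a reasonably well-characterized environment, skipping translated search is the single biggest saving. A minimal sketch, reusing the paths from your command and assuming HUMAnN's --bypass-translated-search option (confirm the exact flag name with `humann --help` for your version):

```bash
# Sketch: skip the translated (DIAMOND) search phase entirely, so only reads
# that align to pangenomes of MetaPhlAn-detected species are quantified.
humann \
  --input /public/home/CXZX03/perl5/3_tasks/1_metaphlan/demo/AA0001.fq.gz \
  --threads 8 \
  --bypass-translated-search \
  --remove-temp-output \
  --nucleotide-database /public/home/CXZX03/perl5/2_data_base/humann/chocophlan \
  --output /public/home/CXZX03/perl5/3_tasks/2_humann/output/tmp \
  --metaphlan-options="--bowtie2db /public/home/CXZX03/perl5/2_data_base/Metaphlan4/vJun23"
```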

12 hours on 24 threads is 288 CPU hours, which would only make sense if 1) your sample was much larger than mine and 2) it was highly uncharacterized (such that most of the work is being done in the translated search phase). Are either or both of those statements true?
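To check the first point, you can count the reads in the gzipped FASTQ directly; a quick sketch:

```bash
# Rough read count for a gzipped FASTQ: four lines per read.
echo $(( $(zcat /public/home/CXZX03/perl5/3_tasks/1_metaphlan/demo/AA0001.fq.gz | wc -l) / 4 ))
```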

Incidentally, when multithreading read-mapping tools I tend to max out at 8 threads, since in my experience the performance improvement is highly sublinear in the number of threads used. The stats I cited above were based on a run with 8 threads.
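Given 2,000 samples, one way to use your 24 cores more effectively is to run several 8-thread jobs side by side instead of a single 24-thread job. A hypothetical batching sketch (the directory names are placeholders; a cluster job array, e.g. SLURM, would serve the same purpose and scales better across nodes):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run HUMAnN on many samples, three 8-thread jobs at a
# time (3 x 8 = 24 cores). Adjust MAXJOBS and paths to your setup.
set -euo pipefail

INDIR=/public/home/CXZX03/perl5/3_tasks/1_metaphlan/demo
OUTDIR=/public/home/CXZX03/perl5/3_tasks/2_humann/output
MAXJOBS=3

for fq in "$INDIR"/*.fq.gz; do
  humann --input "$fq" --threads 8 --search-mode uniref90 --remove-temp-output \
    --nucleotide-database /public/home/CXZX03/perl5/2_data_base/humann/chocophlan \
    --protein-database /public/home/CXZX03/perl5/2_data_base/humann/uniref_90 \
    --output "$OUTDIR/$(basename "$fq" .fq.gz)" \
    --metaphlan-options="--bowtie2db /public/home/CXZX03/perl5/2_data_base/Metaphlan4/vJun23" &
  # Keep at most MAXJOBS HUMAnN processes running at once.
  while (( $(jobs -rp | wc -l) >= MAXJOBS )); do
    wait -n
  done
done
wait
```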
