MetaPhlAn4 run time

MetaPhlAn version 4.0.4 (17 Jan 2023)

Hi friends,
I would like to gain some perspective on the normal run time of MetaPhlAn4, as I am concerned it is taking much longer than it should.

For reference, I have shotgun sequencing data in the form of paired-end fastq.gz files. Each fastq.gz file is 3 GB on average.
I am running MetaPhlAn4 on each paired sample, providing 25 GB of memory and 6 cores per sample. This is an example of what my command looks like for one of these samples:

metaphlan path/to/sampleX_R1.fastq.gz,path/to/sampleX_R2.fastq.gz --bowtie2db /path/to/metaphlan4_database --bowtie2out path/to/sampleX_metagenome.bowtie2.bz2 --nproc 6 --input_type fastq -o /path/to/sampleX_profiled_metagenome.txt

I am finding that it takes around 2 hours to process a single sample. This seems like a long time. It’s a concern because I have a couple thousand samples to process on a 32-core, 128 GB machine, so it will take almost 3 weeks for MetaPhlAn to run on all of them.
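For a rough sense of how the total scales, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: the machine and per-job figures are the ones quoted above, while the sample count of 2000 and the function names `concurrent_jobs` and `estimate_hours` are assumptions made for illustration. It also assumes jobs run in equal-length batches, which real runs will not do exactly.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope wall-clock estimate for a batch of MetaPhlAn runs.
# Machine and per-job figures are taken from the post above.
total_cores=32;   cores_per_job=6
total_mem_gb=128; mem_per_job_gb=25
hours_per_sample=2

# Jobs that fit at once: limited by whichever of cores or memory runs out first.
concurrent_jobs() {
  by_cores=$(( total_cores / cores_per_job ))
  by_mem=$(( total_mem_gb / mem_per_job_gb ))
  echo $(( by_cores < by_mem ? by_cores : by_mem ))
}

# Total hours for N samples, assuming jobs run in equal-length batches.
estimate_hours() {
  n_samples=$1
  jobs=$(concurrent_jobs)
  batches=$(( (n_samples + jobs - 1) / jobs ))   # ceiling division
  echo $(( batches * hours_per_sample ))
}

# 2000 is an assumed sample count; the post only says "a couple thousand".
hours=$(estimate_hours 2000)
echo "concurrent jobs: $(concurrent_jobs)"
echo "estimated hours: $hours (about $(( hours / 24 )) days)"
```

The estimate scales linearly with the sample count, so substituting the real number of samples gives a figure to compare against. With these settings both cores and memory cap the machine at the same number of simultaneous jobs, leaving a couple of cores unused.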

If anyone has experience running MetaPhlAn4, I would appreciate your input on whether what I am seeing is normal.

Thank you!

Hi @owright
It depends on the type of data you are analyzing. For a typical human gut metagenome of 50M reads, it should take around a few hours using ~5 cores. More diverse environments such as the rumen, where more reads map, could take up to 10 h.