Speeding up PhyloPhlAn

nicholascdove · June 20, 2022, 2:37pm

I’m running PhyloPhlAn on a rather large genome set (~20k). I’m trying to figure out the best computational resources to give it for this task. From the most recent paper it looks like 100 CPUs at 10 days would do it for 17k genomes. So more cores the better? What about memory? I experimented with a high mem volume, and it didn’t really speed things up. Is there a way for PhyloPhlAn to use more memory? Thanks!

f.asnicar · June 30, 2022, 9:16am

Hi and thanks for the question.

One thing that speeds things up a bit (at least at a later stage) is to be quite aggressive with trimming, for example with the --diversity high --fast combination of parameters.
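As a rough illustration of the trimming suggestion, here is a minimal Python sketch that assembles such a command line. Only --diversity high and --fast come from the advice above. The -i, -d, -f and --nproc options, along with the input folder, database name and config file name, are assumptions for illustration and should be checked against `phylophlan --help` for the installed version.

```python
import shlex


def build_phylophlan_command(input_dir, database, config_file, nproc):
    """Assemble a PhyloPhlAn argument list with aggressive trimming.

    Only "--diversity high" and "--fast" are taken from the thread; the
    remaining options and all values are illustrative assumptions.
    """
    return [
        "phylophlan",
        "-i", input_dir,        # assumed: folder with the input genomes
        "-d", database,         # assumed: marker database name
        "-f", config_file,      # assumed: configuration file
        "--diversity", "high",  # from the thread: aggressive trimming
        "--fast",               # from the thread: fast mode
        "--nproc", str(nproc),  # assumed: number of CPUs to use
    ]


# Hypothetical paths and CPU count, for illustration only.
cmd = build_phylophlan_command("genomes/", "phylophlan", "config.cfg", 100)
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run directly, which avoids shell quoting problems with unusual file names.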

Before the trimming stage, though, the most time-consuming step will be mapping the database markers against your ~20k inputs. PhyloPhlAn parallelizes this mapping according to the number of CPUs. Since the speed-up from multi-threading is never linear, we decided to run single-threaded jobs and parallelize over the inputs instead. So the more CPUs you provide, the more inputs are mapped at the same time. When running many mappings at once, check the RAM usage of each single job so that the total does not exceed the memory available on the machine.
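The CPU versus memory trade-off described here can be sketched as a small back-of-the-envelope calculation. This is not part of PhyloPhlAn; the function name and all numbers are hypothetical, and the per-job RAM figure has to be measured on a small test run first.

```python
def max_parallel_mappings(n_cpus, total_ram_gb, ram_per_job_gb, reserve_gb=0.0):
    """Estimate how many single-threaded mapping jobs fit on one machine.

    The result is limited by whichever runs out first: CPUs or memory.
    A result of 0 means that not even one job fits in the usable RAM.
    """
    if ram_per_job_gb <= 0:
        raise ValueError("ram_per_job_gb must be positive")
    # Keep some memory aside for the OS and other processes.
    usable_ram_gb = max(total_ram_gb - reserve_gb, 0.0)
    jobs_by_memory = int(usable_ram_gb // ram_per_job_gb)
    return min(n_cpus, jobs_by_memory)


# Hypothetical machine: 100 CPUs, 256 GB RAM, 16 GB kept in reserve,
# and an assumed peak of 4 GB per mapping job.
memory_bound = max_parallel_mappings(100, 256, 4, reserve_gb=16)

# Hypothetical smaller machine: 16 CPUs with the same memory.
cpu_bound = max_parallel_mappings(16, 256, 4, reserve_gb=16)

print(memory_bound, cpu_bound)
```

In the first case memory is the limit, so asking for more than 60 CPUs would risk exceeding the available RAM. In the second case the CPUs are the limit, which matches the observation in the question that extra memory alone did not speed things up.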

I hope this helps, but please let me know if something is not clear.

Many thanks,
Francesco
