Phylophlan_write_config_file: threads parameter

nick-youngblut · March 24, 2023, 7:31pm

For phylophlan v3.0.3, it appears that phylophlan_write_config_file does not have a --threads parameter, so the --threads values for the diamond jobs in the generated config file are all set to 1 (--threads 1). For example:

[map_aa]
program_name = /usr/local/bin/diamond
params = blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0

Moreover, it appears that phylophlan uses the --threads value from the config file (i.e., --threads 1) even when phylophlan --nproc is set to a higher number of threads.

If this is indeed the case, it would be helpful to include a --threads parameter for phylophlan_write_config_file which sets the threads for all multi-threaded jobs specified by the config.
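Until such an option exists, one possible workaround is to post-process the generated config file. The sketch below assumes the config is plain text in which every multi-threaded job passes `--threads <n>` on its `params` line, as in the diamond example above; `set_threads` is a hypothetical helper, not part of PhyloPhlAn. Keep in mind that any per-job thread count you set here is multiplied by the number of jobs running concurrently.

```python
import re

def set_threads(config_text: str, threads: int) -> str:
    """Hypothetical helper: replace every '--threads <n>' in the config text
    with '--threads <threads>'. Assumes the flag is always followed by an integer."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    # Keep the '--threads' token and its whitespace; swap only the number.
    return re.sub(r"(--threads\s+)\d+", lambda m: m.group(1) + str(threads), config_text)

example = """[map_aa]
program_name = /usr/local/bin/diamond
params = blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
"""

print(set_threads(example, 4))
```

In practice you would read the file written by phylophlan_write_config_file, pass its contents through the function, and write the result to a new config file.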

f.asnicar · March 28, 2023, 4:48am

Hi there! That’s intended, by design. I believe multi-threading (almost) never gives a speed-up that scales linearly with the number of threads. So, for the sub-jobs within PhyloPhlAn, I prefer running --nproc of them in parallel, each with a single thread, rather than running them sequentially, one after the other, each using --nproc threads. That’s why the config specifies only 1 thread. If you look at the RAxML definition in the config (in either [tree1] or [tree2]), you’ll see that the number of threads is not set there; PhyloPhlAn sets it from the --nproc parameter specified by the user (when the multi-threaded version of RAxML is found on the system).
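The trade-off Francesco describes can be illustrated with a simple cost model. This sketch assumes Amdahl's law (a fraction `p` of each job parallelizes perfectly, the rest is serial), identical jobs, and no I/O or memory contention; the model, the function names, and the numbers are illustrative assumptions, not measurements of PhyloPhlAn or diamond.

```python
import math

def sequential_multithreaded(n_jobs: int, t_single: float, threads: int, p: float) -> float:
    """Wall time when jobs run one after another, each using `threads` threads.
    Per-job time follows Amdahl's law: t_single * ((1 - p) + p / threads)."""
    return n_jobs * t_single * ((1 - p) + p / threads)

def parallel_single_threaded(n_jobs: int, t_single: float, workers: int) -> float:
    """Wall time when up to `workers` single-threaded jobs run at the same time."""
    return math.ceil(n_jobs / workers) * t_single

# Illustrative values: 100 equal jobs, 1 time unit each, 8 cores, 90% parallel fraction.
n_jobs, t, nproc, p = 100, 1.0, 8, 0.9
print(sequential_multithreaded(n_jobs, t, nproc, p))
print(parallel_single_threaded(n_jobs, t, nproc))
```

Under this model, the two strategies only tie when `p` equals 1 and the number of jobs divides evenly by the number of cores; any serial fraction in each job favors running many single-threaded jobs side by side.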

I hope this clarifies the issue.

Many thanks,
Francesco

nick-youngblut · March 29, 2023, 2:23pm

Thanks for helping to clarify! So --nproc is per-genome, correct? I can then see the multiplication problem of --nproc × --threads (e.g., the threads for blastp), which could oversubscribe the compute resources. Assuming one has many genomes, parallelizing at the genome level rather than within each genome (e.g., multi-threaded blastp per genome) is probably best.
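The multiplication is easy to check before launching a run. A minimal sketch with hypothetical helper names; note that `os.cpu_count()` reports the logical CPUs of the machine, which can exceed what a cluster job was actually allocated, so passing the allocated core count explicitly is safer.

```python
import os
from typing import Optional

def total_threads(nproc: int, threads_per_job: int) -> int:
    """Threads in flight when `nproc` jobs each use `threads_per_job` threads."""
    return nproc * threads_per_job

def max_threads_per_job(nproc: int, cores: Optional[int] = None) -> int:
    """Largest per-job thread count keeping nproc * threads <= cores (never below 1)."""
    cores = cores or os.cpu_count() or 1
    return max(1, cores // nproc)

print(total_threads(16, 4))                 # 64
print(max_threads_per_job(16, cores=32))    # 2
```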

f.asnicar · March 29, 2023, 3:36pm

Yes, --nproc is per-genome when mapping, extracting, selecting, aligning, and trimming markers (and also when reconstructing single-gene trees, if someone uses that pipeline). It is then passed on to jobs that cannot be parallelized per genome (like RAxML and IQTREE), so that they can exploit their internal multi-threading.
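The two ways --nproc is used can be sketched as a scheduling pattern. This is a simplified model, not PhyloPhlAn's code: `process_genome`, `build_tree`, and `run_pipeline` are hypothetical placeholders that do no real work and only show where the parallelism sits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def process_genome(genome: str) -> str:
    """Hypothetical stand-in for a per-genome step (map, extract, align, trim),
    which would launch an external tool with a single thread."""
    return f"{genome}: markers"

def build_tree(alignments: List[str], threads: int) -> Dict[str, object]:
    """Hypothetical stand-in for the tree step (e.g. RAxML or IQTREE): one job,
    so it receives all `threads` for its own internal multi-threading."""
    return {"n_alignments": len(alignments), "threads": threads}

def run_pipeline(genomes: List[str], nproc: int) -> Dict[str, object]:
    # Per-genome steps: up to `nproc` single-threaded jobs run concurrently.
    with ThreadPoolExecutor(max_workers=nproc) as pool:
        alignments = list(pool.map(process_genome, genomes))
    # Tree step: a single job, so --nproc becomes its thread count instead.
    return build_tree(alignments, threads=nproc)

print(run_pipeline([f"genome_{i}" for i in range(10)], nproc=4))
```

The same --nproc value never applies at both levels at once, which is what keeps the total thread count bounded.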
