Dear Bakery Team,
I use HUMAnN 3, installed via conda on a Mac.
I run it with the "--threads" option.
bowtie2 and diamond both run multi-threaded.
But bowtie2-build runs single-threaded after the "Creating custom ChocoPhlAn database …" step.
bowtie2-build has a "--threads" option of its own.
Why does bowtie2-build run single-threaded?
Hello, thank you for the post. You are correct that bowtie2-build now has a threads option. When we originally wrote that portion of the code, the build was small, and I am not sure whether bowtie2-build had a threads option at the time. Now that it does, and custom database builds are getting larger, it definitely makes sense to use it in HUMAnN. That is something we have been considering adding for a while and are working on including in the next release.
I noticed this as well this morning. We are detecting thousands of bugs in our sample, and the index build takes a really long time (one sample takes 10h on 48 cores with 250G of memory to build the database and then index it). Is there a way we can manually add that option on the command line when calling humann?
It made me wonder if it would make sense, especially in a project with many samples, to generate a consolidated bugs list for all samples (I’m generating my bugs list using kraken at the moment so merging it is easy with kraken tools), generate one index for all my samples, then provide each separate humann run with the path to that index so as to skip the database generation?
It doesn’t seem like you can add this at the command line, though it’s a pretty simple fix to edit the actual bowtie2-build call. One way is to insert the following lines in search/nucleotide.py, in the index() function, after the args+=opts line, e.g.:

    args+=opts
    #add threads
    if config.threads > 1:
        args+=["--threads",config.threads]
config.py also contains a bowtie2_build_opts parameter, which is probably how the code maintainers will add this permanently in future updates.
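
For anyone who wants to check the logic before editing their install, here is a minimal standalone sketch of the same idea. It is not HUMAnN's actual code: the function name add_threads and the file names in the example are made up for illustration, and the only thing taken from bowtie2-build itself is its --threads flag. The thread count is converted to a string here on the assumption that the argument list might be handed straight to a subprocess call.

```python
def add_threads(args, threads):
    """Return a copy of args with "--threads N" appended when threads > 1.

    Hypothetical helper for illustration; it mirrors the patch described
    above but is not part of HUMAnN.
    """
    extended = list(args)  # copy so the caller's list is not modified
    if threads > 1:
        # stringify the count in case the list goes directly to subprocess
        extended += ["--threads", str(threads)]
    return extended


if __name__ == "__main__":
    # Placeholder arguments: FASTA flag, input FASTA, index basename
    base_args = ["-f", "custom_database.ffn", "custom_database_index"]
    print(add_threads(base_args, 8))
    print(add_threads(base_args, 1))
```

With a thread count of 1 the argument list comes back unchanged, so the single-threaded behaviour stays the default.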