I have just installed and tested the bioBakery workflows with the tutorial files, and they work for me. Installation was a bit difficult, so I created a conda environment file to save my configuration.
Now I am trying to use the bioBakery workflows with my own samples: 32 human microbiota shotgun metagenome samples (64 paired files of 0.7-2 GB each).
So far it has taken 12 hours to complete 60 of 292 tasks (9 samples).
I am running it on this PC:
Intel(R) Core™ i7-9700K CPU @ 3.60GHz
CPU(s): 8
CPU MHz: 3600.000
CPU max MHz: 4900.0000
CPU min MHz: 800.0000
Memory: 32026 MB
Do you know of any way to improve my computation speed? Could it be a problem with the --local-jobs and --threads arguments?
Hello, thank you for the detailed post. I think your settings are spot on for your compute environment. The only thing you could try is running with --local-jobs 2 --threads 4 (2 tasks at once, each with 4 threads) to see if that speeds it up a bit.
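In case it helps, here is a minimal sketch of that command; the wmgx shotgun workflow and the input/output paths are placeholders, so adjust them to your setup:

biobakery_workflows wmgx --input input_fastq_dir --output output_dir --local-jobs 2 --threads 4

With 8 physical cores this runs two tasks concurrently with 4 threads each, rather than one task with all 8 threads, which can help when individual tools do not scale well past a few threads.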
@lauren.j.mciver Speaking of computational speed, is there a way to utilize multiple nodes in SLURM to speed things up further? I am trying to run around 100 samples.
Thank you so much for your prompt reply. I tried the following, but ended up with the following error in the anadama.log:
2024-05-30 00:04:34,911 root submit_job ERROR: Unable to submit job to queue: sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
Sure thing! Thanks for trying it out. The error message indicates that a SLURM job was submitted with a memory request above the maximum memory allowed for any of the partitions on your grid. It works when you run locally because the workflows do not specify the amount of memory needed to the grid; instead, you request the total memory overall in your SBATCH script.
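To make the two modes concrete, here is a rough sketch. The workflow name, paths, partition, and resource numbers are placeholders, and the AnADAMA2 grid options shown (--grid, --grid-jobs, --grid-partition) should be checked against your installed version.

Grid mode, where AnADAMA2 submits one SLURM job per task across multiple nodes (this is the mode where each task carries its own memory request, which is what exceeded your partition limit):

biobakery_workflows wmgx --input input_fastq_dir --output output_dir --grid slurm --grid-jobs 10 --grid-partition your_partition --threads 8

Local mode inside a single SLURM allocation, where you request the total memory yourself in the sbatch script and no per-task memory request is sent to the scheduler:

#!/bin/bash
#SBATCH --job-name=wmgx
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=2-00:00:00
biobakery_workflows wmgx --input input_fastq_dir --output output_dir --local-jobs 2 --threads 4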
I am working on the next release of AnADAMA2 (the workflow management system that acts as the grid meta-scheduler for the bioBakery workflows). This release will include options for the user to specify the time or memory for specific workflow tasks. With this new option you will be able to override the default memory request for a task, which is currently computed from an equation based on the size of the input files.