I have just installed and tested the bioBakery workflows with the tutorial files, and they work for me. Installation was a bit difficult, so I created a conda environment file to save my configuration.
Now I am trying to use the bioBakery workflows with my own samples: 32 human microbiota shotgun metagenome samples (64 paired files of 0.7-2 GB each).
So far it has taken 12 hours to complete 60 of 292 tasks (9 samples).
I am running it on this PC:
Intel(R) Core™ i7-9700K CPU @ 3.60GHz
CPU(s): 8
CPU MHz: 3600.000
CPU max MHz: 4900.0000
CPU min MHz: 800.0000
Memory: 32026 MB
Do you know of any way to improve my computation speed? Could it be a problem with the --local-jobs and --threads arguments?
Hello, thank you for the detailed post. I think your settings are spot on for your compute environment. The only thing you could try is running with --local-jobs 2 --threads 4 (2 tasks at once, each with 4 threads) to see if that speeds it up a bit.
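In case it helps, here is a minimal sketch of that command; the wmgx shotgun workflow and the input/output paths are placeholders, so adjust them to your setup:

biobakery_workflows wmgx --input input_fastq_dir --output output_dir --local-jobs 2 --threads 4

With 8 physical cores this runs two tasks concurrently with 4 threads each, rather than one task with all 8 threads, which can help when individual tools do not scale well past a few threads.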
@lauren.j.mciver Speaking of computational speed, is there a way to utilize multiple nodes in SLURM to speed things up further? I am trying to run around 100 samples.
Thank you so much for your prompt reply. I tried the following, but ended up with the following error in the anadama.log:
2024-05-30 00:04:34,911 root submit_job ERROR: Unable to submit job to queue: sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
Sure thing! Thanks for trying it out. The error message indicates that a SLURM job was submitted with a memory request above the maximum memory allowed for any of the partitions on your grid. It works when you run locally because the workflows do not specify the amount of memory needed to the grid; instead, you request the total memory overall in your SBATCH script.
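To make the two modes concrete, here is a rough sketch. The workflow name, paths, partition, and resource numbers are placeholders, and the AnADAMA2 grid options shown (--grid, --grid-jobs, --grid-partition) should be checked against your installed version.

Grid mode, where AnADAMA2 submits one SLURM job per task across multiple nodes (this is the mode where each task carries its own memory request, which is what exceeded your partition limit):

biobakery_workflows wmgx --input input_fastq_dir --output output_dir --grid slurm --grid-jobs 10 --grid-partition your_partition --threads 8

Local mode inside a single SLURM allocation, where you request the total memory yourself in the sbatch script and no per-task memory request is sent to the scheduler:

#!/bin/bash
#SBATCH --job-name=wmgx
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=2-00:00:00
biobakery_workflows wmgx --input input_fastq_dir --output output_dir --local-jobs 2 --threads 4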
I am working on the next release of AnADAMA2 (the workflow management system that acts as the grid meta-scheduler for the bioBakery workflows). This release will include options for the user to specify the time or memory for specific workflow tasks. With this new option you will be able to override the default memory request for a task, which is currently computed from an equation based on the size of the input files.