I'm testing biobakery_workflows with grid jobs on a computing cluster and have found that benchmarking each step is really slowing down the analysis. Using one of the demo samples, the benchmarking step takes anywhere from 1 to 10 minutes before the next job is submitted. My main concern is that with a large batch of samples, benchmarking would balloon the total analysis time. Is this typical, or could there be an issue on my end?
- I'm currently using biobakery_workflows 3.1 and anadama2 0.10.0.
The command I'm running:
sbatch -c16 biobakery_workflows wmgx \
    --input ./test_files --output ./biobake_output \
    --functional-profiling-option='--bypass-translated-search' \
    --threads 32 \
    --taxonomic-profiling-options "--bowtie2db miniconda3/envs/biobake/lib/python3.10/site-packages/metaphlan/metaphlan_databases/ --index mpa_vJun23_CHOCOPhlAnSGB_202403" \
    --grid slurm \
    --grid-jobs 2 --grid-partition node \
    --bypass-strain-profiling
Here's an example from the log:
2025-10-03 17:50:43,657 root log_grid_output INFO: Grid 16 from task id return code:0
2025-10-03 17:50:43,658 LoggerReporter log_event INFO: task 16, humann_renorm_ecs_relab____HD32R1_subsample.gz : grid job id 4836412 has status Getting benchmarking data
2025-10-03 17:59:43,499 root get_queue_status INFO: Getting latest queue info to refresh job status
2025-10-03 17:59:43,566 root record_benchmark INFO: Benchmark information for job id 16:
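In this excerpt, roughly 9 minutes pass between "Getting benchmarking data" and the next queue refresh. To measure how much time these waits add across a whole run, one option is to scan the log for large gaps between consecutive entries. This is a minimal sketch, assuming every log entry starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the excerpt above and that wrapped continuation lines without a timestamp can be ignored. The function name `log_gaps` and the 60-second threshold are hypothetical choices for illustration, not part of AnADAMA2.

```python
import re
import sys
from datetime import datetime

# Matches the leading timestamp of a log entry, e.g. "2025-10-03 17:50:43,657"
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def log_gaps(lines, min_seconds=60.0):
    """Return (previous_entry, next_entry, gap_seconds) for each pair of
    consecutive timestamped entries that are at least min_seconds apart.
    Lines without a leading timestamp (wrapped continuations) are skipped."""
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        # %f accepts the 3-digit millisecond field after the literal comma
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev_time is not None:
            delta = (current - prev_time).total_seconds()
            if delta >= min_seconds:
                gaps.append((prev_line.rstrip(), line.rstrip(), delta))
        prev_time, prev_line = current, line
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python log_gaps.py path/to/workflow.log
    with open(sys.argv[1]) as handle:
        gaps = log_gaps(handle)
    for before, after, seconds in gaps:
        print(f"{seconds / 60:.1f} min gap\n  after:  {before}\n  before: {after}")
    print(f"Total time in gaps: {sum(g[2] for g in gaps) / 60:.1f} min")
```

Summing the gaps that follow "Getting benchmarking data" entries would give a rough figure for how much wall-clock time the benchmarking waits add per sample, which could then be compared against the total runtime.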