How to parallelize baqlava over many samples?

When I want to quickly parallelize a bioinformatics tool, I write one batch script that loops through my samples and submits a Slurm job for each one, and a second batch script that runs the tool on a single sample. Unfortunately, I run into an AnADAMA2 error when I try to do this with BAQLaVa:

Traceback (most recent call last):
  File "/home/danielsg/miniconda3/envs/baqlava/bin/baqlava", line 7, in <module>
    sys.exit(main())
  File "/home/danielsg/miniconda3/envs/baqlava/lib/python3.10/site-packages/baqlava/baqlava.py", line 422, in main
    workflow.go()
  File "/home/danielsg/miniconda3/envs/baqlava/lib/python3.10/site-packages/anadama2/workflow.py", line 772, in go
    self._backend = backends.default(self.vars.get("output"))
  File "/home/danielsg/miniconda3/envs/baqlava/lib/python3.10/site-packages/anadama2/backends.py", line 22, in default
    return LevelDBBackend(
  File "/home/danielsg/miniconda3/envs/baqlava/lib/python3.10/site-packages/anadama2/backends.py", line 111, in __init__
    self.db = leveldb.LevelDB(self.data_directory,
leveldb.LevelDBError: IO error: lock /mnt/isilon/hvp/baqlava_out/.anadama/db/LOCK: Resource temporarily unavailable

Is this something that could be set with the grid options in baqlava? Or is there a way to use anadama to write some other kind of workflow? Thanks.

For reference, a portion of the launcher script:

for f in /mnt/isilon/hvp/shared_folders/cleaned/*1.fastq.gz
do
  bn=$(basename "$f" .fastq.gz)
  echo "$bn"

  if [[ ! -f "$OUTDIR/${bn}_BAQLaVa_profile.txt" ]]; then
      sbatch baqlava_batch.sh "$f"
  fi
done

and a portion of the child script:

conda activate baqlava

set -x
set -e

baqlava \
    -i "$1" \
    -o "$OUTDIR" \
    --nucdb /mnt/isilon/hvp/databases/BAQLaVa.V0.5.nucleotide \
    --protdb /mnt/isilon/hvp/databases/BAQLaVa.V0.5.protein \
    --quit-early \
    --threads 8

Hi Scott!

I believe this is happening because AnADAMA2 workflows create locks on any directories they have in use; in your traceback, the lock file is .anadama/db/LOCK inside the output directory. When separate Slurm jobs for samples 1 through N all use the same output directory, any job that starts while another one holds that lock will fail. This can be resolved by giving each sample its own output location, e.g. outputdir/sample1… outputdir/sampleN.
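As a rough sketch of what that could look like in the child script: the version below derives a per-sample output directory from the input file name and passes it to -o. The helper names sample_outdir and run_sample are made up for illustration (they are not part of BAQLaVa or AnADAMA2), and the sketch assumes OUTDIR is set in the job's environment as in your original scripts. The baqlava flags are copied unchanged from your post.

```bash
#!/usr/bin/env bash
# Sketch of a child script that gives every sample its own output directory,
# so each run keeps its .anadama/db lock in a separate place.

# Hypothetical helper: build "<base_outdir>/<sample name>" from an input path.
sample_outdir() {
    local base_outdir=$1 input=$2
    local bn
    bn=$(basename "$input" .fastq.gz)
    printf '%s/%s\n' "$base_outdir" "$bn"
}

# Hypothetical helper: run baqlava on one sample with a unique -o directory.
# Assumes OUTDIR is set in the environment, as in the original scripts.
run_sample() {
    local input=$1
    local outdir
    outdir=$(sample_outdir "$OUTDIR" "$input")
    mkdir -p "$outdir"
    baqlava \
        -i "$input" \
        -o "$outdir" \
        --nucdb /mnt/isilon/hvp/databases/BAQLaVa.V0.5.nucleotide \
        --protdb /mnt/isilon/hvp/databases/BAQLaVa.V0.5.protein \
        --quit-early \
        --threads 8
}

# Only run when a sample path is passed, e.g. via: sbatch baqlava_batch.sh "$f"
if [[ $# -gt 0 ]]; then
    run_sample "$1"
fi
```

If you go this route, the check in the launcher that skips finished samples would also need to look inside the per-sample directory rather than directly under $OUTDIR.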
We are currently working on an update to facilitate parallelized runs of multiple samples so that users do not run into this issue in the future. Please let me know if the suggestion above does not resolve your issue!

Thanks, Jordan

Thanks, I will try that