When I want to quickly parallelize a bioinformatics tool, I write a launcher script that loops over my samples and submits one Slurm job per sample, plus a child batch script that runs the tool on a single sample. Unfortunately, when I try this with BAQLaVa I run into an AnADAMA2 error:
Traceback (most recent call last):
  File "/home/danielsg/miniconda3/envs/baqlava/bin/baqlava", line 7, in <module>
    sys.exit(main())
  File "/home/danielsg/miniconda3/envs/baqlava/lib/python3.10/site-packages/baqlava/baqlava.py", line 422, in main
    workflow.go()
  File "/home/danielsg/miniconda3/envs/baqlava/lib/python3.10/site-packages/anadama2/workflow.py", line 772, in go
    self._backend = backends.default(self.vars.get("output"))
  File "/home/danielsg/miniconda3/envs/baqlava/lib/python3.10/site-packages/anadama2/backends.py", line 22, in default
    return LevelDBBackend(
  File "/home/danielsg/miniconda3/envs/baqlava/lib/python3.10/site-packages/anadama2/backends.py", line 111, in __init__
    self.db = leveldb.LevelDB(self.data_directory,
leveldb.LevelDBError: IO error: lock /mnt/isilon/hvp/baqlava_out/.anadama/db/LOCK: Resource temporarily unavailable
Is this something that could be handled with the grid options in BAQLaVa? Or is there a way to use AnADAMA2 to write some other kind of workflow? Thanks.
For reference, a portion of the launcher script:
# Submit one Slurm job per forward-read file that has no profile yet
for f in /mnt/isilon/hvp/shared_folders/cleaned/*1.fastq.gz
do
    bn=$(basename "$f" .fastq.gz)
    echo "$bn"
    if [[ ! -f "$OUTDIR/${bn}_BAQLaVa_profile.txt" ]]; then
        sbatch baqlava_batch.sh "$f"
    fi
done
and a portion of the child script:
conda activate baqlava
set -x
set -e
# $1 is the FASTQ path passed in by the launcher
baqlava \
    -i "$1" \
    -o "$OUTDIR" \
    --nucdb /mnt/isilon/hvp/databases/BAQLaVa.V0.5.nucleotide \
    --protdb /mnt/isilon/hvp/databases/BAQLaVa.V0.5.protein \
    --quit-early \
    --threads 8
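One possible workaround, sketched under assumptions: the traceback shows the lock being taken on .anadama/db/LOCK inside the shared output directory, so concurrent jobs that all write to the same $OUTDIR would presumably contend for it. Giving each sample its own output subdirectory might avoid that. The helper names sample_outdir and run_sample are hypothetical, and whether BAQLaVa behaves as intended with per-sample output directories is untested here.

```shell
# Hypothetical helper: map a FASTQ path to a per-sample output directory
# (<base>/<sample name>), so each job would get its own .anadama database.
sample_outdir() {
    printf '%s/%s\n' "$1" "$(basename "$2" .fastq.gz)"
}

# Hypothetical wrapper around the baqlava call from the child script.
# Flags and database paths are copied unchanged from the original command.
run_sample() {
    sample_out=$(sample_outdir "${OUTDIR:?OUTDIR must be set}" "$1")
    mkdir -p "$sample_out"
    baqlava \
        -i "$1" \
        -o "$sample_out" \
        --nucdb /mnt/isilon/hvp/databases/BAQLaVa.V0.5.nucleotide \
        --protdb /mnt/isilon/hvp/databases/BAQLaVa.V0.5.protein \
        --quit-early \
        --threads 8
}

# Run only when a FASTQ path is passed, as sbatch would do from the launcher.
if [ "$#" -gt 0 ]; then
    run_sample "$1"
fi
```

With this layout the launcher's existence check would also need to look inside the per-sample directory rather than directly under $OUTDIR; the exact location of the profile file there is an assumption to verify.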