Biobakery_workflows using SGE hangs after kneaddata completes, gives error for each task

Keaton_Stagaman · March 31, 2023, 7:10pm

Hello, I’m trying to set up biobakery_workflows on a CentOS 7 cluster.

I’m running the following command:

biobakery_workflows wmgx \
    --input $INPUT \
    --output $OUTPUT \
    --threads 20 \
    --taxonomic-profiling-options '--bowtie2db /home/.local/lib/python3.10/site-packages/metaphlan/metaphlan_databases --index mpa_vJan21_CHOCOPhlAnSGB_202103' \
    --log-level DEBUG \
    --strain-profiling-options '--database /home/.local/lib/python3.10/site-packages/metaphlan/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.pkl' \
    --grid-jobs $NJOBS \
    --grid sge \
    --grid-environment 'source install_workflows/bin/activate' \
    --grid-partition $QNAME

It successfully submits jobs to the queue, and kneaddata completes successfully. For example, one of the output files reads:

Decompressing gzipped file ...

Reformatting file sequence identifiers ...

Initial number of reads ( /home/DIR/Test_output/kneaddata/main/reformatted_identifiers0_06nl5s_decompressed_4fvmccat_SMPL1 ): 2717219.0
Running Trimmomatic ...
Total reads after trimming ( /home/DIR/Test_output/kneaddata/main/SMPL1.trimmed.fastq ): 2274338.0
Running trf ...
Decontaminating ...
Running bowtie2 ...
Total reads after removing those found in reference database ( /home/DIR/Test_output/kneaddata/main/SMPL1_hg37dec_v0.1_bowtie2_clean.fastq ): 2269078.0
Total reads after merging results from multiple databases ( /home/DIR/Test_output/kneaddata/main/SMPL1.fastq ): 2269078.0

Final output file created:
/home/DIR/Test_output/kneaddata/main/SMPL1.fastq

However, after kneaddata each task gives the same type of error to STDOUT:

  File "/r/build_centos7/_admin/build/bld/python/xujvlkcqoh5q/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/.local/lib/python3.10/site-packages/anadama2/grid/grid.py", line 485, in run
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/r/build_centos7/_admin/build/bld/python/xujvlkcqoh5q/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/.local/lib/python3.10/site-packages/anadama2/grid/grid.py", line 485, in run
    return runners.worker_run_loop(self.work_q, self.result_q, self.run_task_by_type,
    return runners.worker_run_loop(self.work_q, self.result_q, self.run_task_by_type,
  File "/home/.local/lib/python3.10/site-packages/anadama2/runners.py", line 184, in worker_run_loop
  File "/home/.local/lib/python3.10/site-packages/anadama2/runners.py", line 184, in worker_run_loop
    result = run_task(task, extra)
  File "/home/.local/lib/python3.10/site-packages/anadama2/grid/grid.py", line 494, in run_task_by_type
    result = run_task(task, extra)
  File "/home/.local/lib/python3.10/site-packages/anadama2/grid/grid.py", line 494, in run_task_by_type
    return cls.run_task_command(task, extra)
  File "/home/.local/lib/python3.10/site-packages/anadama2/grid/grid.py", line 533, in run_task_command
    return cls.run_task_command(task, extra)
  File "/home/.local/lib/python3.10/site-packages/anadama2/grid/grid.py", line 533, in run_task_command
    while ( grid_queue.job_timeout(job_final_status, jobid, time) or grid_queue.job_memkill(job_final_status, jobid, memory) ) and resubmission < 3:
  File "/home/.local/lib/python3.10/site-packages/anadama2/grid/sge.py", line 150, in job_timeout
    while ( grid_queue.job_timeout(job_final_status, jobid, time) or grid_queue.job_memkill(job_final_status, jobid, memory) ) and resubmission < 3:
  File "/home/.local/lib/python3.10/site-packages/anadama2/grid/sge.py", line 150, in job_timeout
    exceed_allocation = True if float(new_time) > float(time) else False
    exceed_allocation = True if float(new_time) > float(time) else False
TypeError: float() argument must be a string or a real number, not 'list'
TypeError: float() argument must be a string or a real number, not 'list'

The .err files in sge_files all contain this:

tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified

I can’t tell whether this is an issue with how anadama2 is submitting jobs or with how I’ve set things up. Any help would be appreciated.

lauren.j.mciver · August 30, 2024, 5:53pm

Hi @Keaton_Stagaman, I am sorry for the very slow response on our end. It looks like the AnADAMA2 SGE meta-scheduler works with your environment to submit and track jobs. I think the error comes from the SGE benchmarking portion, which records the time and memory used by each job. The traceback fails in job_timeout in sge.py, where the recorded time is converted with float(), so I think this portion is not in sync with what your grid environment reports. If you try running without benchmarking, using the option "--grid-benchmark off", I think this should resolve the errors.
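For reference, the TypeError in the traceback can be reproduced in isolation. This is only a minimal sketch, assuming one of the two values reaching the comparison arrives as a list instead of a single string or number; which of the two it is in your run is not known from the traceback. The function exceeds_allocation copies the comparison shown in the traceback, and first_value is a hypothetical helper for illustration, not part of AnADAMA2.

```python
def exceeds_allocation(new_time, time):
    # Same comparison as the line shown in the traceback (sge.py, job_timeout).
    return float(new_time) > float(time)


def first_value(value):
    # Hypothetical helper (not part of AnADAMA2): unwrap a one-element
    # list or tuple so that float() receives a single value.
    if isinstance(value, (list, tuple)):
        if len(value) != 1:
            raise ValueError(f"expected a single value, got {value!r}")
        return value[0]
    return value


# Assumption: the recorded time arrives as a list rather than a scalar.
try:
    exceeds_allocation(["120"], "60")
except TypeError as err:
    print(f"TypeError: {err}")

# Unwrapping the value first lets the comparison run.
print(exceeds_allocation(first_value(["120"]), first_value("60")))
```

The first call raises the same kind of TypeError as in the traceback, and the second prints True. This only illustrates why float() fails on a list. It is not a patch for sge.py.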

Thank you,
Lauren
