The bioBakery help forum

Biobakery workflows strainphlan failed due to makeblastdb error

Hello, first of all I apologize for starting another topic. I am building a Singularity image of bioBakery Workflows that I want to use on our cluster, and I am having a hard time with bugs. The image is based on the Docker image with HUMAnN version 3.0.0.a.7.

After running the workflow on RNAseq data with the following command:

biobakery_workflows wmgx --input ./workflow_test/ --output ./workflow_test_out_sin5/ --input-extension fastq --local-jobs 2 --threads 4 --pair-identifier ".R1"

I get the following error (from AnADAMA2):

2021-11-18 07:49:36,208 LoggerReporter task_failed ERROR: task 82, strainphlan_clade_3 : Failed! Error message : Error executing action 0. Original Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/anadama2/runners.py", line 201, in _run_task_locally
    action_func(task)
  File "/usr/local/lib/python3.6/dist-packages/biobakery_workflows/tasks/shotgun.py", line 762, in strainphlan
    args=[os.path.abspath(os.path.join(os.path.dirname(task.depends[0].name),"…")),os.path.dirname(task.targets[0].name),profile_clade,threads])
  File "/usr/local/lib/python3.6/dist-packages/biobakery_workflows/utilities.py", line 1049, in run_task
    return_code = sh(command)()
  File "/usr/local/lib/python3.6/dist-packages/anadama2/helpers.py", line 89, in actually_sh
    ret = sh(s, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/anadama2/util/__init__.py", line 320, in sh
    raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
anadama2.util.ShellException: [Errno 1] Command `strainphlan --samples /root/tmp/workflow_test_out_sin5/strainphlan//.pkl --output_dir /root/tmp/workflow_test_out_sin5/strainphlan --clade s__Eubacterium_siraeum --nprocs 4 --clade_markers /root/tmp/workflow_test_out_sin5/strainphlan/s__Eubacterium_siraeum.fna > /root/tmp/workflow_test_out_sin5/strainphlan/2_clade.log && touch /root/tmp/workflow_test_out_sin5/strainphlan/2_clade.tree && if [ -f /root/tmp/workflow_test_out_sin5/strainphlan/RAxML_bestTree.s__Eubacterium_siraeum.tree ]; then cp /root/tmp/workflow_test_out_sin5/strainphlan/RAxML_bestTree.s__Eubacterium_siraeum.tree /root/tmp/workflow_test_out_sin5/strainphlan/2_clade.tree; fi' failed.
Out: b''
Err: b"\n[e] Command '['/usr/bin/makeblastdb', '-parse_seqids', '-dbtype', 'nucl', '-in', '/root/tmp/workflow_test_out_sin5/strainphlan/tmphwmn2xmn/s__Eubacterium_siraeum/s__Eubacterium_siraeum.fna', '-out', '/root/tmp/workflow_test_out_sin5/strainphlan/tmphwmn2xmn/s__Eubacterium_siraeum/s__Eubacterium_siraeum']' returned non-zero exit status 1.\n\n[e] cannot execute command\n command_line: /usr/bin/makeblastdb -parse_seqids -dbtype nucl -in /root/tmp/workflow_test_out_sin5/strainphlan/tmphwmn2xmn/s__Eubacterium_siraeum/s__Eubacterium_siraeum.fna -out /root/tmp/workflow_test_out_sin5/strainphlan/tmphwmn2xmn/s__Eubacterium_siraeum/s__Eubacterium_siraeum\n stdin: None\n stdout: None\n env: {'SUDO_GID': '0', 'MAIL': '/var/mail/root', 'USER': 'root', 'LD_LIBRARY_PATH': '/.singularity.d/libs', 'SHLVL': '1', 'HOME': '/root', 'USER_PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin', 'PS1': 'Singularity> ', 'COLORTERM': 'truecolor', 'SUDO_UID': '0', 'SINGULARITY_ENVIRONMENT': '/.singularity.d/env/91-environment.sh', 'LOGNAME': 'root', '_': '/usr/local/bin/biobakery_workflows', 'USERNAME': 'root', 'TERM': 'xterm-256color', 'STRAINPHLAN_DB_MARKERS': '/root/SIN_DB/metaphlan_databases/', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'STRAINPHLAN_DB_REFERENCE': '/root/SIN_DB/metaphlan_databases/', 'DISPLAY': ':10', 'SINGULARITY_COMMAND': 'shell', 'LANG': 'en_US.UTF-8', 'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:', 'XAUTHORITY': '/var/opt/thinlinc/sessions/root/10/Xauthority', 'SUDO_COMMAND': '/usr/local/bin/singularity shell 300a7 --writable', 'SHELL': '/bin/bash', 'SUDO_USER': 'root', 'SINGULARITY_CONTAINER': '/root/tmp/300a7', 'SINGULARITY_BIND': '', 'PWD': '/root/tmp', 'KNEADDATA_DB_HUMAN_GENOME': '/root/SIN_DB/kneaddata_database/human', 'SINGULARITY_NAME': '300a7'}\n\n[e] An error was ocurred executing a external tool, exiting…\nThu Nov 18 07:49:36 2021: Stop StrainPhlAn 3.0 execution.\n"
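The StrainPhlAn error above only reports that makeblastdb exited with status 1, not why. A minimal Python sketch for rerunning just that step by hand, so makeblastdb's own message becomes visible. The executable path and flags are copied from the logged command; the helper names `makeblastdb_argv` and `rerun` are hypothetical. It assumes you point it at a copy of the clade's .fna file, since the temporary directory has a random name (such as tmphwmn2xmn). It avoids `subprocess.run(capture_output=...)` because that argument needs Python 3.7 and the container runs Python 3.6.

```python
"""Rerun StrainPhlAn's failing makeblastdb step by hand (sketch).

Flags and executable path are taken from the StrainPhlAn error log;
the function names are hypothetical. Compatible with Python 3.6.
"""
import os
import subprocess
import sys
from typing import List


def makeblastdb_argv(fna_path: str, exe: str = "/usr/bin/makeblastdb") -> List[str]:
    """Build the same command line StrainPhlAn logged for one clade."""
    # StrainPhlAn used the .fna path minus its extension as the -out prefix;
    # makeblastdb writes the .nin/.nhr/.nsq/... files next to that prefix.
    out_prefix = os.path.splitext(fna_path)[0]
    return [exe, "-parse_seqids", "-dbtype", "nucl",
            "-in", fna_path, "-out", out_prefix]


def rerun(fna_path: str) -> int:
    """Run makeblastdb and echo its stdout/stderr; return its exit code."""
    proc = subprocess.run(
        makeblastdb_argv(fna_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str (3.6-compatible)
    )
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    return proc.returncode


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(rerun(sys.argv[1]))
```

Usage would be along the lines of `python3 rerun_makeblastdb.py /some/scratch/dir/s__Eubacterium_siraeum.fna` (script name and directory hypothetical).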

After some investigation, it turns out that makeblastdb fails with the following error:

volume: /root/tmp/workflow_test_out_sin5/strainphlan/tmp23epv8pv/s__Bacteroides_uniformis/s__Bacteroides_uniformis

file: /root/tmp/workflow_test_out_sin5/strainphlan/tmp23epv8pv/s__Bacteroides_uniformis/s__Bacteroides_uniformis.nin
file: /root/tmp/workflow_test_out_sin5/strainphlan/tmp23epv8pv/s__Bacteroides_uniformis/s__Bacteroides_uniformis.nhr
file: /root/tmp/workflow_test_out_sin5/strainphlan/tmp23epv8pv/s__Bacteroides_uniformis/s__Bacteroides_uniformis.nsq
file: /root/tmp/workflow_test_out_sin5/strainphlan/tmp23epv8pv/s__Bacteroides_uniformis/s__Bacteroides_uniformis.nsi
file: /root/tmp/workflow_test_out_sin5/strainphlan/tmp23epv8pv/s__Bacteroides_uniformis/s__Bacteroides_uniformis.nsd
file: /root/tmp/workflow_test_out_sin5/strainphlan/tmp23epv8pv/s__Bacteroides_uniformis/s__Bacteroides_uniformis.nog

BLAST Database creation error: Error: Duplicate seq_ids are found:
LCL|630968278759

And indeed, the file contents are doubled (each ID appears twice). What could be the reason for this error? Are there any possible workarounds?
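To check a marker file for this problem without going through makeblastdb, a small Python sketch that counts FASTA header IDs and lists those seen more than once. It assumes the ID is the first whitespace-delimited token after `>`; makeblastdb's `-parse_seqids` parsing is more elaborate (it reported the ID as LCL|630968278759), so this is only an approximation. The function names are hypothetical.

```python
"""List sequence IDs that occur more than once in a FASTA file (sketch).

Assumption: the ID is the first whitespace-delimited token after '>'.
"""
import sys
from collections import Counter
from typing import Dict, Iterable, Iterator


def fasta_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the ID of every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            # An empty header yields "" so it is still counted.
            yield tokens[0] if tokens else ""


def duplicate_ids(lines: Iterable[str]) -> Dict[str, int]:
    """Map each ID seen more than once to its number of occurrences."""
    counts = Counter(fasta_ids(lines))
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        dups = duplicate_ids(handle)
    for seq_id, n in sorted(dups.items()):
        print("{}\t{}".format(seq_id, n))
    print("{} duplicated ID(s)".format(len(dups)), file=sys.stderr)
```

If every ID in the file is reported with a count of 2, that matches the "contents doubled" observation rather than a few genuinely clashing marker names.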

The file in question is an output of extract_markers.py. Is it possible that extract_markers.py is somehow misconfigured?
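If a stopgap is needed before the root cause is fixed, one option when running strainphlan by hand is to deduplicate the extract_markers.py output and pass the cleaned file via `--clade_markers`. The Python sketch below is a hypothetical workaround, not a documented StrainPhlAn procedure. It takes the ID as the first whitespace-delimited token after `>`, keeps the first record per ID, and assumes the doubled records are exact copies; any ID whose duplicates differ in sequence is reported instead of silently dropped. All names are hypothetical.

```python
"""Keep only the first FASTA record per ID (hypothetical stopgap sketch)."""
import sys
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', concatenated sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA lines into (header, sequence) pairs."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def dedupe(records: Iterable[Record]) -> Tuple[List[Record], List[str]]:
    """Return (first record per ID, IDs whose duplicates differ in sequence)."""
    seen: Dict[str, str] = {}
    kept: List[Record] = []
    conflicts: List[str] = []
    for header, seq in records:
        tokens = header.split()
        seq_id = tokens[0] if tokens else ""
        if seq_id not in seen:
            seen[seq_id] = seq
            kept.append((header, seq))
        elif seen[seq_id] != seq and seq_id not in conflicts:
            conflicts.append(seq_id)
    return kept, conflicts


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Render records as FASTA text, wrapping sequences at `width`."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src:
        kept, conflicts = dedupe(read_fasta(src))
    with open(sys.argv[2], "w") as dst:
        dst.write(format_fasta(kept))
    for seq_id in conflicts:
        print("WARNING: differing sequences for " + seq_id, file=sys.stderr)
```

Usage would be along the lines of `python3 dedupe_fasta.py s__Eubacterium_siraeum.fna s__Eubacterium_siraeum.dedup.fna` (script and output names hypothetical). Since the doubling likely originates upstream, this only masks the symptom.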

Updating PhyloPhlAn to the latest version solved the issue.