Processing samples... error during StrainPhlAn 4 run

Hi @aitor.blancomiguez,
Question: I am getting the following error when running StrainPhlAn 4. The run fails at step 5, "build the multiple sequence alignment and the phylogenetic tree". Where could the problem be? What should I look for?

Input data: I'm using the sample data (http://cmprod1.cibio.unitn.it/biobakery4/github_strainphlan4/)

Command: strainphlan -s consensus_markers/*.pkl -m db_markers/t__SGB1877.fna -r reference_genomes/G000273725.fna.bz2 -o output -n 8 -c t__SGB1877 --sample_with_n_markers 0 --marker_in_n_samples 0

Mon Apr 17 16:04:33 2023: Start StrainPhlAn 4.0.6 execution
Mon Apr 17 16:04:33 2023: Creating temporary directory…
Mon Apr 17 16:04:33 2023: Done.
Mon Apr 17 16:04:33 2023: Filtering markers and samples…
Mon Apr 17 16:04:33 2023: Getting markers from main samples…
Mon Apr 17 16:04:33 2023: Done.
Mon Apr 17 16:04:33 2023: Getting markers from main references…
Warning: [blastn] Examining 5 or more matches is recommended
Mon Apr 17 16:04:34 2023: Done.
Mon Apr 17 16:04:34 2023: Removing bad markers / samples…
Mon Apr 17 16:04:34 2023: Done.
Mon Apr 17 16:04:34 2023: Getting markers from secondary samples and references…
Mon Apr 17 16:04:34 2023: Done.
Mon Apr 17 16:04:34 2023: Done.
Mon Apr 17 16:04:34 2023: Writing samples as markers’ FASTA files…
Mon Apr 17 16:04:34 2023: Done.
Mon Apr 17 16:04:34 2023: Writing filtered clade markers as FASTA file…
Mon Apr 17 16:04:34 2023: Done.
Mon Apr 17 16:04:34 2023: Calculating polymorphic rates…
Mon Apr 17 16:04:34 2023: Done.
Mon Apr 17 16:04:34 2023: Executing PhyloPhlAn…
Mon Apr 17 16:04:34 2023: Creating PhyloPhlAn database…
Mon Apr 17 16:04:35 2023: Done.
Mon Apr 17 16:04:35 2023: Generating PhyloPhlAn configuration file…
Mon Apr 17 16:04:35 2023: Done.
Mon Apr 17 16:04:35 2023: Processing samples…

[e] Command '['/bin/mafft', '--quiet', '--anysymbol', '--thread', '1', '--auto', 'output/tmpom0svapv/markers/848025373357.fna']' returned non-zero exit status 1.

[e] error while aligning
command_line: /bin/mafft --quiet --anysymbol --thread 1 --auto output/tmpom0svapv/markers/848025373357.fna
stdin: None
stdout: /StrainPhlAn_test/output/tmpom0svapv/msas/848025373357.aln
env: {‘XDG_SESSION_ID’: ‘322385’, ‘HOSTNAME’: ‘research’, ‘HARDWARE_PLATFORM’: ‘x86_64’, ‘TERM’: ‘xterm’, ‘SHELL’: ‘/bin/bash’, ‘HISTSIZE’: ‘1000’, ‘SSH_CLIENT’: ‘192.168.6.3 54521 22’, ‘CONDA_SHLVL’: ‘2’, ‘CONDA_PROMPT_MODIFIER’: '(MetaPhlAn) ', ‘OLDPWD’: ‘/home/xy.zhou’, ‘TIME_STYLE’: ‘+%Y-%m-%d %H:%M:%S’, ‘SSH_TTY’: ‘/dev/pts/2’, ‘USER’: ‘xy.zhou’, ‘LS_COLORS’: ‘rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.Z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.jpg=01;35:.jpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.axv=01;35:.anx=01;35:.ogv=01;35:.ogx=01;35:.aac=01;36:.au=01;36:.flac=01;36:.mid=01;36:.midi=01;36:.mka=01;36:.mp3=01;36:.mpc=01;36:.ogg=01;36:.ra=01;36:.wav=01;36:.axa=01;36:.oga=01;36:.spx=01;36:*.xspf=01;36:’, ‘CONDA_EXE’: ‘/software/Anaconda3/bin/conda’, ‘_CE_CONDA’: ‘’, ‘CONDA_PREFIX_1’: ‘/software/Anaconda3’, ‘MAIL’: ‘/var/spool/mail/xy.zhou’, ‘PATH’: 
‘/bin:/software/Anaconda3/condabin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/lampp/bin:/mngs/bin:/code/pmd/Console:/ncbi-blast-2.10.0+/bin:/mngs/soft/kraken2:/Miniconda3/bin:/home/xy.zhou/.local/bin:/home/xy.zhou/bin’, ‘CONDA_PREFIX’: ‘’, ‘PWD’: ‘/StrainPhlAn_test’, ‘LANG’: ‘zh_CN.UTF-8’, ‘PS1’: ‘(MetaPhlAn) [$USER@$PWD]$’, ‘HISTIGNORE’: ‘ls:ls -lrt:ls -al:clear:pwd’, ‘CE_M’: ‘’, ‘HISTCONTROL’: ‘ignoredups’, ‘SHLVL’: ‘1’, ‘HOME’: ‘/home/xy.zhou’, ‘CONDA_PYTHON_EXE’: ‘/software/Anaconda3/bin/python’, ‘LOGNAME’: ‘xy.zhou’, ‘SSH_CONNECTION’: ‘192.168.6.3 54521 192.168.1.229 22’, ‘CONDA_DEFAULT_ENV’: ‘MetaPhlAn’, ‘LESSOPEN’: ‘||/usr/bin/lesspipe.sh %s’, ‘XDG_RUNTIME_DIR’: ‘/run/user/1006’, ‘HISTTIMEFORMAT’: ‘[%Y.%m.%d %H:%M:%S]’, '’: ‘/bin/strainphlan’, ‘TMPDIR’: ‘/tmp’}

[e] Command '['/bin/mafft', '--quiet', '--anysymbol', '--thread', '1', '--auto', 'output/tmpom0svapv/markers/848025373357.fna']' returned non-zero exit status 1.

[e] error while aligning
{'program_name': '/bin/mafft', 'params': '--quiet --anysymbol --thread 1 --auto', 'version': '--version', 'command_line': '#program_name# #params# #input# > #output#', 'environment': 'TMPDIR=/tmp'}
output/tmpom0svapv/markers/848025373357.fna
/data/analysis/xingyazhou/StrainPhlAn_test/output/tmpom0svapv/msas
848025373357.aln

[e] Command '['/bin/mafft', '--quiet', '--anysymbol', '--thread', '1', '--auto', 'output/tmpom0svapv/markers/848025373357.fna']' returned non-zero exit status 1.

[e] msas crashed
Mon Apr 17 16:04:37 2023: [Error] An error was ocurred executing a external tool, exiting…
Mon Apr 17 16:04:37 2023: Stop StrainPhlAn execution.

Database: mpa_vOct22_CHOCOPhlAnSGB_202212

Best regards,
xingya

Hi @zxyEmily,
I see two problems with the analysis:

  1. To speed up the tutorial, its read files were pre-filtered to contain only reads mapping against the Jan21 markers. In your case you are running against the Oct22 database. Because markers can change between database versions, this can produce slightly different results or fail to work at all.
  2. The --sample_with_n_markers and --marker_in_n_samples parameters are both set to 0%, so you are adding empty markers and empty samples to the MSA, which might be causing the mafft error you are seeing.
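
One way to check the second point, if the temporary directory from the failed run is still present, is to look for empty records in the marker FASTA named in the error (output/tmpom0svapv/markers/848025373357.fna). The snippet below is only a rough sketch in Python, not part of StrainPhlAn: the function name `find_empty_records` is made up for illustration, and treating `-` and `N` as uninformative characters is an assumption about what an "empty" marker looks like in that file.

```python
def find_empty_records(fasta_text, uninformative="-Nn"):
    """Return headers of FASTA records that contain no informative characters.

    Hypothetical helper: a record counts as empty if it has no sequence
    lines, or if its sequence consists only of gaps ('-') and Ns.
    """
    empty = []
    header = None
    has_data = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header is not None and not has_data:
                empty.append(header)
            header = line[1:]
            has_data = False
        elif header is not None and line.strip(uninformative):
            # Something other than gaps/Ns remains on this sequence line.
            has_data = True
    # Close the final record.
    if header is not None and not has_data:
        empty.append(header)
    return empty
```

It could be called as `find_empty_records(open("output/tmpom0svapv/markers/848025373357.fna").read())`. If it returns any headers, that would be consistent with the thresholds of 0 letting empty entries through; rerunning the same command without the two threshold options, so that the tool's defaults apply, would then be the first thing to try.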