StrainPhlAn execution failed due to too many discarded samples

Dear authors,
I ran into a critical problem when running StrainPhlAn. The samples from your tutorial worked, but with my own samples I get the following:
mkdir -p output
strainphlan -s consensus_markers/*.pkl -m db_markers/s__Akkermansia_muciniphila.fna -r Reference_genome/AKK.fna --marker_in_n_samples 3 -o output -n 8 -c s__Akkermansia_muciniphila --mutation_rates
(I had already changed --marker_in_n_samples following a similar post; I have 12 samples in total.)
Sun Jun 19 09:49:02 2022: Start StrainPhlAn 3.0.14 execution
Sun Jun 19 09:49:02 2022: Creating temporary directory…
Sun Jun 19 09:49:02 2022: Done.
Sun Jun 19 09:49:02 2022: Getting markers from main sample files…
Sun Jun 19 09:49:02 2022: Done.
Sun Jun 19 09:49:02 2022: Getting markers from main reference files…Warning: [blastn] Examining 5 or more matches is recommended

Sun Jun 19 09:49:03 2022: Done.
Sun Jun 19 09:49:03 2022: Removing bad markers / samples…
[e] Phylogeny can not be inferred. Too many samples were discarded
Sun Jun 19 09:49:03 2022: Stop StrainPhlAn 3.0 execution.

Please let me know whether this problem can be solved. Thank you very much!

Please ignore my previous post: with --sample_with_n_markers 0 --marker_in_n_samples 0 that issue no longer occurs, but I now get the following error:
mkdir -p output
strainphlan -s consensus_markers/*.pkl -m db_markers/s__Akkermansia_muciniphila.fna -r Reference_genome/AKK.fna --sample_with_n_markers 0 --marker_in_n_samples 0 -o output -n 8 -c s__Akkermansia_muciniphila --mutation_rates
Mon Jun 20 20:05:21 2022: Start StrainPhlAn 3.0.14 execution
Mon Jun 20 20:05:21 2022: Creating temporary directory…
Mon Jun 20 20:05:21 2022: Done.
Mon Jun 20 20:05:21 2022: Getting markers from main sample files…
Mon Jun 20 20:05:21 2022: Done.
Mon Jun 20 20:05:21 2022: Getting markers from main reference files…Warning: [blastn] Examining 5 or more matches is recommended

Mon Jun 20 20:05:21 2022: Done.
Mon Jun 20 20:05:21 2022: Removing bad markers / samples…
Mon Jun 20 20:05:21 2022: Done.
Mon Jun 20 20:05:21 2022: Writing samples as markers’ FASTA files…
Mon Jun 20 20:05:21 2022: Done.
Mon Jun 20 20:05:21 2022: Writing filtered clade markers as FASTA file…
Mon Jun 20 20:05:21 2022: Done.
Mon Jun 20 20:05:21 2022: Calculating polymorphic rates…
Mon Jun 20 20:05:21 2022: Done.
Mon Jun 20 20:05:21 2022: Executing PhyloPhlAn 3.0…
Mon Jun 20 20:05:21 2022: Creating PhyloPhlAn 3.0 database…
Mon Jun 20 20:05:21 2022: Done.
Mon Jun 20 20:05:21 2022: Generating PhyloPhlAn 3.0 configuration file…
Mon Jun 20 20:05:21 2022: Done.
Mon Jun 20 20:05:22 2022: Processing samples…[e] “/home/winger/miniconda3/lib/python3.7/site-packages/phylophlan/phylophlan_configs/” folder does not exists

[e] Command '['/home/winger/miniconda3/bin/raxmlHPC-PTHREADS-SSE3', '-p', '1989', '-m', 'GTRCAT', '-T', '8', '-w', '/home/winger/strains/output', '-s', 'output/./s__Akkermansia_muciniphila.StrainPhlAn3_concatenated.aln', '-n', 's__Akkermansia_muciniphila.StrainPhlAn3.tre']' returned non-zero exit status 255.

[e] error while executing
command_line: /home/winger/miniconda3/bin/raxmlHPC-PTHREADS-SSE3 -p 1989 -m GTRCAT -T 8 -w /home/winger/strains/output -s output/./s__Akkermansia_muciniphila.StrainPhlAn3_concatenated.aln -n s__Akkermansia_muciniphila.StrainPhlAn3.tre
stdin: None
stdout: None
env: {‘LESSOPEN’: ‘| /usr/bin/lesspipe %s’, ‘CONDA_PROMPT_MODIFIER’: '(base) ', ‘LANGUAGE’: ‘zh_CN:zh’, ‘USER’: ‘winger’, ‘XDG_SESSION_TYPE’: ‘wayland’, ‘SHLVL’: ‘1’, ‘HOME’: ‘/home/winger’, ‘CONDA_SHLVL’: ‘1’, ‘DESKTOP_SESSION’: ‘ubuntu’, ‘GNOME_SHELL_SESSION_MODE’: ‘ubuntu’, ‘GTK_MODULES’: ‘gail:atk-bridge’, ‘DBUS_STARTER_BUS_TYPE’: ‘session’, ‘SYSTEMD_EXEC_PID’: ‘2484’, ‘DBUS_SESSION_BUS_ADDRESS’: ‘unix:path=/run/user/1000/bus,guid=f35078b7ba2cfb57be3ed1d262b05eee’, ‘COLORTERM’: ‘truecolor’, ‘CE_M’: ‘’, ‘IM_CONFIG_PHASE’: ‘1’, ‘WAYLAND_DISPLAY’: ‘wayland-0’, ‘LOGNAME’: ‘winger’, '’: ‘/home/winger/miniconda3/bin/strainphlan’, ‘XDG_SESSION_CLASS’: ‘user’, ‘USERNAME’: ‘winger’, ‘TERM’: ‘xterm-256color’, ‘GNOME_DESKTOP_SESSION_ID’: ‘this-is-deprecated’, ‘_CE_CONDA’: ‘’, ‘PATH’: ‘/home/winger/miniconda3/bin:/home/winger/miniconda3/condabin:/home/winger/anaconda3/bin:/home/winger/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin’, ‘SESSION_MANAGER’: ‘local/880729:@/tmp/.ICE-unix/2484,unix/880729:/tmp/.ICE-unix/2484’, ‘XDG_MENU_PREFIX’: ‘gnome-’, ‘GNOME_TERMINAL_SCREEN’: ‘/org/gnome/Terminal/screen/51c60735_8902_412f_a4bf_e767c4052064’, ‘GNOME_SETUP_DISPLAY’: ‘:1’, ‘XDG_RUNTIME_DIR’: ‘/run/user/1000’, ‘DISPLAY’: ‘:0’, ‘LANG’: ‘zh_CN.UTF-8’, ‘XDG_CURRENT_DESKTOP’: ‘ubuntu:GNOME’, ‘XMODIFIERS’: ‘@im=ibus’, ‘XDG_SESSION_DESKTOP’: ‘ubuntu’, ‘XAUTHORITY’: ‘/run/user/1000/.mutter-Xwaylandauth.P7OGO1’, ‘LS_COLORS’: 
‘rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:’, ‘GNOME_TERMINAL_SERVICE’: ‘:1.109’, ‘SSH_AGENT_LAUNCHER’: ‘gnome-keyring’, ‘SSH_AUTH_SOCK’: ‘/run/user/1000/keyring/ssh’, ‘CONDA_PYTHON_EXE’: ‘/home/winger/miniconda3/bin/python’, ‘SHELL’: ‘/bin/bash’, ‘QT_ACCESSIBILITY’: ‘1’, ‘GDMSESSION’: ‘ubuntu’, ‘LESSCLOSE’: ‘/usr/bin/lesspipe %s %s’, ‘CONDA_DEFAULT_ENV’: ‘base’, ‘QT_IM_MODULE’: ‘ibus’, ‘PWD’: ‘/home/winger/strains’, ‘XDG_CONFIG_DIRS’: ‘/etc/xdg/xdg-ubuntu:/etc/xdg’, ‘CONDA_EXE’: ‘/home/winger/miniconda3/bin/conda’, ‘DBUS_STARTER_ADDRESS’: ‘unix:path=/run/user/1000/bus,guid=f35078b7ba2cfb57be3ed1d262b05eee’, ‘XDG_DATA_DIRS’: 
‘/usr/share/ubuntu:/usr/local/share/:/usr/share/:/var/lib/snapd/desktop’, ‘CONDA_PREFIX’: ‘/home/winger/miniconda3’, ‘VTE_VERSION’: ‘6800’}

[e] An error was ocurred executing a external tool, exiting…
Mon Jun 20 20:05:26 2022: Stop StrainPhlAn 3.0 execution.
Can this issue be solved?

Hi @Yi_Xu ,
The problem in your first message is that, with the current parameters, StrainPhlAn cannot run on your samples: too many of them were discarded for lack of markers. By default, StrainPhlAn requires at least 80% of the available markers of a species to be present in at least 80% of the samples. Lowering --sample_with_n_markers and --marker_in_n_samples relaxes these thresholds. However, you cannot lower them all the way to 0: that yields samples without markers and markers present in no samples, and thus an error during execution.

Best,
Aitor
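
To make the percentage threshold concrete, here is a small shell sketch of the arithmetic for the 12 samples mentioned in the first post. The helper name min_samples is hypothetical (it is not part of StrainPhlAn), and rounding up is an assumption; StrainPhlAn's own rounding may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of StrainPhlAn): convert a percentage
# threshold into a minimum number of samples, rounding up (an assumption).
min_samples() {
    local total=$1 percent=$2
    echo $(( (total * percent + 99) / 100 ))
}

min_samples 12 80   # the default quoted above: prints 10
min_samples 12 50   # a relaxed threshold: prints 6
```

So, under these assumptions, a marker would have to be found in 10 of the 12 samples to survive the default filter, which may explain why low-coverage data sets lose most of their markers.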

Dear @aitor.blancomiguez ,
Thank you so much for your kind reply; that’s really helpful. The current problem is that even after I lowered --sample_with_n_markers and --marker_in_n_samples to 1 and changed the species to E. coli, the same error still occurred (too many samples were discarded). Is there an alternative strategy for running StrainPhlAn in this situation? For example, would manually selecting samples with a high abundance of the target species help? Thanks.

Hi @Yi_Xu ,
You can run strainphlan with the option --print_clades_only (without specifying any species) and it will report the species on which StrainPhlAn can run, and in how many of your samples each is found, for the specified --sample_with_n_markers and --marker_in_n_samples parameters. Note that at least 4 samples are needed to build the tree.
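
Given the four-sample minimum mentioned above, one quick sanity check after a failed run is to count the sequences in the concatenated alignment that was handed to RAxML. This is a minimal sketch, assuming the .aln file is FASTA-formatted (one header line starting with ">" per sample); the function name count_fasta_records is made up for illustration, and the path is the one that appears in the log earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records (header lines starting with ">").
count_fasta_records() {
    # grep -c prints 0 and returns non-zero when nothing matches;
    # "|| true" keeps the function from failing in that case.
    grep -c '^>' "$1" || true
}

# Path taken from the RAxML command line in the log above.
aln="output/s__Akkermansia_muciniphila.StrainPhlAn3_concatenated.aln"
if [ -f "$aln" ]; then
    n=$(count_fasta_records "$aln")
    if [ "$n" -lt 4 ]; then
        echo "only $n sequences in $aln; too few to build a tree"
    else
        echo "$n sequences in $aln"
    fi
fi
```

If the count is below four, the tree step is likely to fail regardless of the RAxML settings, and relaxing the thresholds less aggressively or adding samples is the more promising direction.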

Dear @aitor.blancomiguez ,
That’s so cool, you solved my problem! With the --print_clades_only option, I can finally find out which species can be analyzed in my samples before running StrainPhlAn. One last question (so sorry~~): after successfully running StrainPhlAn on the species Muribaculum intestinale, I got a series of files in the “output” folder. Apart from passing them to GraPhlAn to draw the phylogenetic tree, is there a way to know which strain is present in a given sample? Thanks~

Hi @Yi_Xu
Please take a look at this post: Identifying Strain Names

Dear @aitor.blancomiguez ,
Thanks so much, I will follow your suggestions. Thanks again for your patience!

Dear @aitor.blancomiguez ,
I’ve got a follow-up question. Following your instructions, I first used the --print_clades_only option to find out which clades can be analyzed with StrainPhlAn. One of the entries was “Prevotella sp. MGM2”. As this is not a standard species name, I cannot retrieve anything when extracting the database markers:
mkdir -p db_markers
extract_markers.py -c s__Prevotellaceae bacterium -o db_markers/
usage: extract_markers.py [-h] [-d DATABASE] [-c CLADE] [-o OUTPUT_DIR]
extract_markers.py: error: unrecognized arguments: bacterium;

If I use the term “Prevotella” instead, I get the following:
mkdir -p db_markers
extract_markers.py -c s__Prevotella -o db_markers/
Sat Jun 25 17:12:41 2022: Start extract markers execution
Sat Jun 25 17:12:41 2022: Generating DB markers FASTA…
Sat Jun 25 17:13:06 2022: Done.
Sat Jun 25 17:13:06 2022: Loading MetaPhlan 3.0.14 database…
Sat Jun 25 17:13:10 2022: Done.
[e] No markers were found for the clade “s__Prevotella” in the database
Sat Jun 25 17:13:10 2022: Stop StrainPhlAn 3.0 execution.

Is there a standard species name for such unclassified members of a genus that I can use to extract the db_markers from the database?
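
A note on the "unrecognized arguments: bacterium" message above: this is what shell word splitting produces, since an unquoted name containing a space reaches the program as two separate arguments. The sketch below shows the effect with a throwaway function (count_args is hypothetical and only counts its arguments; it does not call extract_markers.py).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments the shell passed in.
count_args() { echo "$#"; }

# Unquoted: the space splits the clade name into two arguments.
count_args -c s__Prevotellaceae bacterium -o db_markers/      # prints 5

# Quoted: the clade name stays a single argument.
count_args -c "s__Prevotellaceae bacterium" -o db_markers/    # prints 4
```

Quoting only fixes the argument parsing. Whether the database contains a clade under that spelling is a separate question; the identifiers shown elsewhere in this thread use underscores rather than spaces (s__Akkermansia_muciniphila), so copying the name exactly as --print_clades_only prints it is probably the safest thing to try.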