PhyloPhlAn database error during StrainPhlAn 4 run

Hi,
I am getting the following error when running StrainPhlAn 4. When running StrainPhlAn 3 on the same data, there are no errors. Where could the problem be? What should I look for?
```
Wed Mar 15 10:03:31 2023: Start StrainPhlAn 4.0.3 execution
Wed Mar 15 10:03:31 2023: Creating temporary directory…
Wed Mar 15 10:03:31 2023: Done.
Wed Mar 15 10:03:31 2023: Filtering markers and samples…
Wed Mar 15 10:03:31 2023: Getting markers from main samples…
Wed Mar 15 10:03:31 2023: Done.
Wed Mar 15 10:03:31 2023: Getting markers from main references…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Removing bad markers / samples…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Getting markers from secondary samples and references…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Writing samples as markers’ FASTA files…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Writing filtered clade markers as FASTA file…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Calculating polymorphic rates…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Executing PhyloPhlAn…
Wed Mar 15 10:03:32 2023: Creating PhyloPhlAn database…[e] no sequences found, make sure the input folder/file provided is not empty
Wed Mar 15 10:03:32 2023: [Error] An error was ocurred executing a external tool, exiting…Wed Mar 15 10:03:32 2023: Stop StrainPhlAn execution.
```

Best regards,
Nadja

Hi @Nadja
I would need some more info to understand the problem.
Can you share the full command you are using?
Which database version did you use for running metaphlan and sample2markers?
Which database do you have currently installed?

Hi @aitor.blancomiguez ,

I am using the new database mpa_vJan21_CHOCOPhlAnSGB_202103.pkl.
I am running the following commands:

sample2markers.py -n 20 -d metaphlan_database/mpa_vJan21_CHOCOPhlAnSGB_202103.pkl -i IN_FOLDER/*.sam.bz2 -o MARKERS -b 50

strainphlan -n 20 -s MARKERS/*.pkl -o . --marker_in_n_samples 50 --sample_with_n_markers 5 --trim_sequences 50 -d metaphlan_database/mpa_vJan21_CHOCOPhlAnSGB_202103.pkl --print_clades_only > Clades_list.txt

extract_markers.py -c t__SGB6014 -o CLADES -d metaphlan_database/mpa_vJan21_CHOCOPhlAnSGB_202103.pkl

strainphlan -n 20 -s MARKERS/*.pkl -o SGB6014 -m CLADES/t__SGB6014.fna -c t__SGB6014 --marker_in_n_samples 50 --sample_with_n_markers 5 --trim_sequences 50 -d metaphlan_database/mpa_vJan21_CHOCOPhlAnSGB_202103.pkl --mutation_rates

The first three commands run without problems.

With the last command, I get the file with the polymorphic sites, but then StrainPhlAn stops with the error message:
“Creating PhyloPhlAn database…[e] no sequences found, make sure the input folder/file provided is not empty”

Best,
Nadja

This is very odd. Could you run the last command with the --debug option and share the full content of the temporary folder it creates?

After running with the --debug option, I don’t see any more output than without it.
The temporary folder contains 3 folders: blastn (empty), t__SGB6014 (empty), and t__SGB6014.StrainPhlAn4 (contains one sample_name.fna file per sample, all of them empty).
There is one more file in the temporary folder, “t__SGB6014.fna”, which is not empty. I will try to attach it here.
t__SGB6014.fna.txt (24.2 KB)

MetaPhlAn is installed as a Singularity container on the server.

Thank you for helping

Mmmm, I see. If you increase the --sample_with_n_markers parameter (the percentage of markers that must be present to keep a sample) to, let’s say, 50, is it possible to build the phylogeny? The current threshold may simply be too low.

Thank you for the answer!

Yes, after increasing --sample_with_n_markers I can get one step further, but the run still does not finish.
Now I have the following error:
Fri Mar 31 15:20:47 2023: Creating PhyloPhlAn database…
Fri Mar 31 15:20:48 2023: Done.
Fri Mar 31 15:20:48 2023: Generating PhyloPhlAn configuration file…
Fri Mar 31 15:20:49 2023: Done.
Fri Mar 31 15:20:50 2023: Processing samples…
[e] Command '['/miniconda/bin/mafft', '--quiet', '--anysymbol', '--thread', '1', '--auto', 'M_Strainphlan4/SGB8163/tmpi4jsm8ny/markers/146775037412.fna']' returned non-zero exit status 1.

[e] Command '['/miniconda/bin/mafft', '--quiet', '--anysymbol', '--thread', '1', '--auto', 'M_Strainphlan4/SGB8163/tmpi4jsm8ny/markers/241714728896.fna']' returned non-zero exit status 1.

[e] error while aligning

The output continues with many more very long lines of the same kind.
Do you know what the problem could be now?

Why is it now a problem to set --sample_with_n_markers lower, when it was not a problem with StrainPhlAn 3? If I understand correctly, PhyloPhlAn itself has not changed yet. Am I missing something? With StrainPhlAn 3 I had to set this parameter lower; otherwise some strains that are important for the oral microbiome were not detected.

Does the tmp folder contain more data now, or is it still empty?

Now it contains more data.

Folder “blastn” is empty
Folder “clean_dna” contains all samples in .fna format
Folder “map_dna” has files .bkp and .bz2
Folder “markers” has several files like 200090307551.fna
Folder “markers_dna” has samples in .fna.bz2 format
Folder “msas” is empty
Folders “t__SGB8163” and “t__SGB8163.StrainPhlAn4” also contain several files.

It stops on the mafft command.
I tried running the command from the error message on its own:
/miniconda/bin/mafft --quiet --anysymbol --thread 1 --auto M_Strainphlan4/SGB8163/tmprn8w9iun/markers/200090307551.fna
This command runs fine.

But StrainPhlAn4 just stops there.
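
Only one marker file was tested by hand here, and it is not one of the two files named in the error messages, so it may be worth running the same mafft call on every file in the markers folder to see which ones actually fail. The following is a minimal sketch, assuming a POSIX shell. The function name `check_mafft` and the `MAFFT_BIN` variable are hypothetical helpers, not part of StrainPhlAn. The options are the ones from the log, except that `--quiet` is dropped so that mafft’s own error message can be shown.

```shell
# Hypothetical helper, not part of StrainPhlAn: run the mafft call from the
# error message on every marker file and report the ones that fail.
# MAFFT_BIN is an assumption; point it at the mafft that StrainPhlAn uses.
MAFFT_BIN="${MAFFT_BIN:-/miniconda/bin/mafft}"

check_mafft() {
    markers_dir="$1"
    failed=0
    for fna in "$markers_dir"/*.fna; do
        [ -e "$fna" ] || continue   # skip if the folder has no .fna files
        # Capture stderr only; the alignment itself (stdout) is discarded.
        if ! err=$("$MAFFT_BIN" --anysymbol --thread 1 --auto "$fna" 2>&1 >/dev/null); then
            echo "FAILED: $fna"
            echo "$err" | tail -n 5   # last lines of mafft's error output
            failed=$((failed + 1))
        fi
    done
    echo "$failed file(s) failed"
}

# Usage (folder taken from the error message above):
# check_mafft M_Strainphlan4/SGB8163/tmpi4jsm8ny/markers
```

Because MetaPhlAn runs inside a Singularity container here, the loop would need to be run inside the same container so that it uses the same mafft binary and sees the same paths.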

Can you check whether the files that failed were empty? e.g. M_Strainphlan4/SGB8163/tmpi4jsm8ny/markers/241714728896.fna
M_Strainphlan4/SGB8163/tmpi4jsm8ny/markers/146775037412.fna
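
One quick way to check this, as a rough sketch: the hypothetical function below (`fasta_summary` is a made-up name) prints, for every .fna file in a folder, its size in bytes and the number of FASTA header lines (lines starting with `>`), so that files that are empty or contain no records stand out.

```shell
# Hypothetical helper: print the size and the number of FASTA records
# for every .fna file in a folder.
fasta_summary() {
    dir="$1"
    for fna in "$dir"/*.fna; do
        [ -e "$fna" ] || continue                        # no .fna files at all
        bytes=$(wc -c < "$fna" | tr -d ' ')              # file size in bytes
        records=$(grep -c '^>' "$fna" || true)           # header lines; 0 if none
        echo "$fna bytes=$bytes records=$records"
    done
}

# Usage (folder taken from the paths above):
# fasta_summary M_Strainphlan4/SGB8163/tmpi4jsm8ny/markers
```

A file with `bytes=0` is empty; a file with a non-zero size but `records=0` has content that is not valid FASTA.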

None of the .fna files in the markers folder are empty.

Dear Aitor,

I have the same problem. Unfortunately, “export TMPDIR=/path/to/my/temp/dir” did not help.
Any further assistance would be greatly appreciated.

Best regards,
-Mike