PhyloPhlAn database error during StrainPhlAn 4 run

Hi,
I am getting the following error when running StrainPhlAn 4. When I run StrainPhlAn 3 on the same data, there are no errors. Where could the problem be? What should I look for?
```
Wed Mar 15 10:03:31 2023: Start StrainPhlAn 4.0.3 execution
Wed Mar 15 10:03:31 2023: Creating temporary directory…
Wed Mar 15 10:03:31 2023: Done.
Wed Mar 15 10:03:31 2023: Filtering markers and samples…
Wed Mar 15 10:03:31 2023: Getting markers from main samples…
Wed Mar 15 10:03:31 2023: Done.
Wed Mar 15 10:03:31 2023: Getting markers from main references…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Removing bad markers / samples…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Getting markers from secondary samples and references…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Writing samples as markers’ FASTA files…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Writing filtered clade markers as FASTA file…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Calculating polymorphic rates…
Wed Mar 15 10:03:32 2023: Done.
Wed Mar 15 10:03:32 2023: Executing PhyloPhlAn…
Wed Mar 15 10:03:32 2023: Creating PhyloPhlAn database…[e] no sequences found, make sure the input folder/file provided is not empty
Wed Mar 15 10:03:32 2023: [Error] An error was ocurred executing a external tool, exiting…
Wed Mar 15 10:03:32 2023: Stop StrainPhlAn execution.
```

Best regards,
Nadja

Hi @Nadja
I would need some more info to understand the problem.
Can you share the full command you are using?
Which database version did you use for running metaphlan and sample2markers?
Which database do you have currently installed?

Hi @aitor.blancomiguez,

I am using the new database mpa_vJan21_CHOCOPhlAnSGB_202103.pkl.
I am running the following commands:

sample2markers.py -n 20 -d metaphlan_database/mpa_vJan21_CHOCOPhlAnSGB_202103.pkl -i IN_FOLDER/*.sam.bz2 -o MARKERS -b 50

strainphlan -n 20 -s MARKERS/*.pkl -o . --marker_in_n_samples 50 --sample_with_n_markers 5 --trim_sequences 50 -d metaphlan_database/mpa_vJan21_CHOCOPhlAnSGB_202103.pkl --print_clades_only > Clades_list.txt

extract_markers.py -c t__SGB6014 -o CLADES -d metaphlan_database/mpa_vJan21_CHOCOPhlAnSGB_202103.pkl

strainphlan -n 20 -s MARKERS/*.pkl -o SGB6014 -m CLADES/t__SGB6014.fna -c t__SGB6014 --marker_in_n_samples 50 --sample_with_n_markers 5 --trim_sequences 50 -d metaphlan_database/mpa_vJan21_CHOCOPhlAnSGB_202103.pkl --mutation_rates

The first three commands run without problems.

With the last command I do get the file with polymorphic sites, but then StrainPhlAn stops with the error message:
“Creating PhyloPhlAn database…[e] no sequences found, make sure the input folder/file provided is not empty”

Best,
Nadja

This is very odd. Could you run the last command with the --debug option and share the full contents of the temporary folder it creates?

After running with the --debug option, I don’t see any more output than without it.
The temporary folder contains three subfolders: blastn (empty), t__SGB6014 (empty), and t__SGB6014.StrainPhlAn4 (contains sample_name.fna files, all of which are empty).
There is one more file in the temporary folder, “t__SGB6014.fna”, which is not empty. I will try to attach it here.
t__SGB6014.fna.txt (24.2 KB)
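
For anyone hitting the same message, it may help to confirm exactly which intermediate FASTA files contain sequences, since PhyloPhlAn reports "no sequences found" when its input is empty. The following is only a minimal sketch: the function names `count_fasta_records` and `summarize_fasta_dir` are hypothetical helpers, not part of StrainPhlAn or PhyloPhlAn, and it assumes the files are plain-text FASTA with the `.fna` extension as described above.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records in a file by counting header lines that start with '>'."""
    with open(path, "r") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_fasta_dir(directory, pattern="*.fna"):
    """Return {file name: record count} for every file in `directory` matching `pattern`."""
    return {
        p.name: count_fasta_records(p)
        for p in sorted(Path(directory).glob(pattern))
    }
```

Calling `summarize_fasta_dir` on the `t__SGB6014.StrainPhlAn4` subfolder of the temporary directory (the exact path depends on where StrainPhlAn created it) would list every per-sample file with its record count; a count of 0 for all files matches the situation described above, where no sample contributed marker sequences.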

MetaPhlAn is installed as a Singularity container on the server.

Thank you for helping