Hello!
I’ve been having issues with StrainPhlAn when I try to print the set of clades that work for my samples through the --print_clades_only option. I am currently running it on the sample set provided in the tutorial as, essentially, a positive control for what I’m hoping to achieve with my own data. This is my command and the resulting output:
strainphlan -s consensus_markers/*.json.bz2 -o clade_names --print_clades_only -d metaphlan_db/mpa_vJun23_CHOCOPhlAnSGB_202307.pkl
Thu Sep 11 13:44:42 2025: Start StrainPhlAn 4.1.0 execution
Thu Sep 11 13:44:42 2025: Loading MetaPhlAn mpa_vJun23_CHOCOPhlAnSGB_202307 database...
Thu Sep 11 13:45:05 2025: Done.
Thu Sep 11 13:45:12 2025: Processing samples...
Thu Sep 11 13:45:12 2025: Constructing the big marker matrix
Thu Sep 11 13:45:12 2025: Checking 1 species
Thu Sep 11 13:45:12 2025: Done.
Thu Sep 11 13:45:13 2025: Detected clades:
Thu Sep 11 13:45:13 2025: Done.
Thu Sep 11 13:45:13 2025: Finish StrainPhlAn 4.1.0 execution (30.89 seconds): Results are stored at "clade_names"
It correctly identifies and checks the 1 species present in that positive control data, but then print_clades_only.tsv is entirely empty apart from the header:
Clade Number_of_samples
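For anyone who wants to check this without opening the file by hand, here is a minimal sketch that reads the contents of print_clades_only.tsv and returns the clades it lists. It assumes the layout shown above (a `Clade` / `Number_of_samples` header followed by one whitespace-separated row per clade); the function name `read_clades` is my own, not part of StrainPhlAn.

```python
def read_clades(tsv_text):
    """Return {clade: number_of_samples} parsed from the text of print_clades_only.tsv.

    Assumption: one header row starting with "Clade", then one row per clade
    with the clade name and a sample count separated by whitespace or a tab.
    """
    clades = {}
    for line in tsv_text.splitlines():
        fields = line.split()
        # Skip blank lines, malformed rows, and the header row.
        if len(fields) < 2 or fields[0] == "Clade":
            continue
        clades[fields[0]] = int(fields[1])
    return clades


# Hypothetical usage; the path assumes the output directory from the command above:
# with open("clade_names/print_clades_only.tsv") as handle:
#     print(read_clades(handle.read()) or "no clades reported")
```

An empty dictionary corresponds to the header-only file I am seeing here.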
I have confirmed that the database I used for MetaPhlAn, the one I pass above, and the one used for all previous steps are the same: I’ve checked every intermediate file that lists a database name and they all match.
Is there a way to get more verbose output from StrainPhlAn to understand why it considered the 1 detected species/clade insufficient to include in print_clades_only.tsv? Are there other arguments I need to provide for StrainPhlAn to detect the species/clade correctly?
If there are other components or outputs I can provide, do let me know. Thank you so much!
Just wanted to add that I’ve run StrainPhlAn on the specific clade used in the tutorial, and it works:
strainphlan -s consensus_markers/*.json.bz2 \
-o clade_pos -c t__SGB1877 \
-d metaphlan_db/mpa_vJun23_CHOCOPhlAnSGB_202307.pkl
Thu Sep 11 15:26:00 2025: Start StrainPhlAn 4.1.0 execution
Thu Sep 11 15:26:00 2025: Loading MetaPhlAn mpa_vJun23_CHOCOPhlAnSGB_202307 database...
Thu Sep 11 15:26:23 2025: Done.
Thu Sep 11 15:26:24 2025: Creating temporary directory...
Thu Sep 11 15:26:24 2025: Done.
Thu Sep 11 15:26:24 2025: Filtering markers and samples...
Thu Sep 11 15:26:24 2025: Getting markers from samples...
Thu Sep 11 15:26:24 2025: Done.
Thu Sep 11 15:26:24 2025: Removing markers / samples...
Thu Sep 11 15:26:24 2025: Done.
Thu Sep 11 15:26:24 2025: Done.
Thu Sep 11 15:26:24 2025: Writing samples as markers' FASTA files...
Thu Sep 11 15:26:24 2025: Done.
Thu Sep 11 15:26:24 2025: Calculating polymorphic rates...
Thu Sep 11 15:26:24 2025: Done.
Thu Sep 11 15:26:24 2025: Computing phylogeny...
Thu Sep 11 15:26:24 2025: Generating PhyloPhlAn configuration file...
Thu Sep 11 15:26:27 2025: Done.
Thu Sep 11 15:26:27 2025: Executing PhyloPhlAn...
Thu Sep 11 15:26:34 2025: Done.
Thu Sep 11 15:26:34 2025: Done.
Thu Sep 11 15:26:34 2025: Writing information file...
Thu Sep 11 15:26:34 2025: Done.
Thu Sep 11 15:26:34 2025: Removing temporary files...
Thu Sep 11 15:26:34 2025: Done.
Thu Sep 11 15:26:34 2025: Finish StrainPhlAn 4.1.0 execution (33.48 seconds): Results are stored at "clade_pos"
This also suggests to me that the database is correctly linked up and that StrainPhlAn works outside of the --print_clades_only option.
Hello!
Just reporting that upgrading StrainPhlAn to version 4.1.1 resolves this!
strainphlan -s consensus_markers/*.json.bz2 \
-o clade_names --print_clades_only \
-d metaphlan_db/mpa_vJun23_CHOCOPhlAnSGB_202307.pkl
Mon Sep 15 14:30:47 2025: Start StrainPhlAn 4.1.1 execution
Mon Sep 15 14:30:47 2025: Loading MetaPhlAn mpa_vJun23_CHOCOPhlAnSGB_202307 database...
Mon Sep 15 14:31:33 2025: Done.
Mon Sep 15 14:31:45 2025: Processing samples...
Mon Sep 15 14:31:45 2025: Constructing the big marker matrix
Mon Sep 15 14:31:45 2025: Checking 1 species
Mon Sep 15 14:31:45 2025: Done.
Mon Sep 15 14:31:45 2025: Detected 1 clades:
Mon Sep 15 14:31:45 2025: t__SGB1877: in 6 samples.
Mon Sep 15 14:31:45 2025: Done.
Mon Sep 15 14:31:45 2025: Finish StrainPhlAn 4.1.1 execution (57.95 seconds): Results are stored at "clade_names"
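In case it helps someone compare runs, here is a rough sketch that pulls the StrainPhlAn version and the detected clades out of captured log text. It relies only on the line formats visible in the logs in this thread ("Start StrainPhlAn X.Y.Z execution" and "t__...: in N samples."), so treat those patterns as assumptions that may not hold for other versions; `summarize_log` is a name I made up.

```python
import re

# Patterns inferred from the log lines posted in this thread (assumptions).
VERSION_RE = re.compile(r"Start StrainPhlAn (\d+\.\d+\.\d+) execution")
CLADE_RE = re.compile(r"(t__\S+): in (\d+) samples")


def summarize_log(log_text):
    """Return (version, {clade: number_of_samples}) found in a StrainPhlAn log."""
    version = None
    clades = {}
    for line in log_text.splitlines():
        version_match = VERSION_RE.search(line)
        if version_match:
            version = version_match.group(1)
        clade_match = CLADE_RE.search(line)
        if clade_match:
            clades[clade_match.group(1)] = int(clade_match.group(2))
    return version, clades
```

On the 4.1.0 log from my first post this gives the version with an empty clade dictionary, and on the 4.1.1 log it gives t__SGB1877 with 6 samples, which matches what I see by eye.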