[e] Phylogeny can not be inferred. Too many markers were discarded

Dear aitor,

I ran the StrainPhlAn 3 example and it worked, but when I run it on my own data I get an error.

I have 200 samples. This is the command and its output:

(metaphlan3) [ckzhu@vm-login02 strainphlan3]$ strainphlan -s consensus_markers/*.pkl -m db_markers/s__Gordonibacter_pamelaeae.fna \
> -r reference_genomes/GordonibacterPamelaeae.txt -o output -n 8 \
> -c s__Gordonibacter_pamelaeae --phylophlan_mode accurate --mutation_rates
Sat May 16 17:24:59 2020: Start StrainPhlAn 3.0 execution
Sat May 16 17:24:59 2020: Creating temporary directory...
Sat May 16 17:24:59 2020: Done.
Sat May 16 17:24:59 2020: Getting markers from main sample files...
Sat May 16 17:25:23 2020: Done.
Sat May 16 17:25:23 2020: Getting markers from main reference files...Warning: [blastn] Examining 5 or more matches is recommended

Sat May 16 17:26:13 2020: Done.
Sat May 16 17:26:13 2020: Removing bad markers / samples...
[e] Phylogeny can not be inferred. Too many markers were discarded
Sat May 16 17:26:13 2020: Stop StrainPhlAn 3.0 execution.


What does "[e] Phylogeny can not be inferred. Too many markers were discarded" mean?

I downloaded the Gordonibacter_pamelaeae reference genomes from NCBI Genome, and I generated s__Gordonibacter_pamelaeae.fna with the following command:
extract_markers.py -c s__Gordonibacter_pamelaeae -o db_markers/

Before phylogenetic reconstruction, StrainPhlAn filters out samples that are not supported by enough of the species' markers, and markers that are not conserved across samples. Your problem is that too many markers are being discarded by this filtering. By default, StrainPhlAn discards every marker that is not present in at least 80% of the samples. With a large number of samples this can be rather restrictive, so you could try lowering this threshold with the parameter "--marker_in_n_samples".
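To illustrate why a large cohort can trip this filter, here is a minimal Python sketch of a marker-prevalence filter of this kind. It is not StrainPhlAn's actual implementation: the `markers_kept` function, the sample-to-markers dictionary layout, and the toy data are all hypothetical. It keeps a marker only if it was detected in at least `min_percent` percent of the samples, which mirrors the 80% default described above.

```python
from collections import Counter


def markers_kept(sample_markers, min_percent):
    """Return the markers detected in at least `min_percent` % of samples.

    `sample_markers` maps a sample name to the set of marker names detected
    in it (a hypothetical layout chosen for illustration only).
    """
    n_samples = len(sample_markers)
    if n_samples == 0:
        return set()
    # Prevalence: number of samples in which each marker was detected.
    prevalence = Counter(m for markers in sample_markers.values() for m in set(markers))
    # Multiplying instead of dividing keeps the comparison exact for integer thresholds.
    return {m for m, n in prevalence.items() if 100 * n >= min_percent * n_samples}


# Made-up toy data: 5 samples, 4 markers with different prevalences.
samples = {
    "s1": {"m1", "m2", "m3", "m4"},
    "s2": {"m1", "m2", "m3"},
    "s3": {"m1", "m2"},
    "s4": {"m1", "m3"},
    "s5": {"m1"},
}

for threshold in (80, 50):
    print(threshold, sorted(markers_kept(samples, threshold)))
# 80 ['m1']
# 50 ['m1', 'm2', 'm3']
```

With 200 samples, an 80% cutoff means a marker must be found in at least 160 of them, so a marker that is missing from more than 40 samples (for example, samples where G. pamelaeae has low coverage) is dropped. Lowering the value passed to "--marker_in_n_samples" (say, to 50, which requires 100 of 200 samples) relaxes this, at the likely cost of building the tree from markers with more missing data.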
I hope this helps!
