How to interpret phylophlan_strain_finder results

Hi @f.asnicar

I have a few MAGs of the same species and wanted to check whether they belong to the same strain or to different strains, so I ran the phylophlan_strain_finder script. This is the terminal output:

phylophlan_strain_finder -i RAxML_bestTree.floor_ecoli_refined.tre -m mutation_rates.tsv
#phylogenetic_threshold 0.05
#mutation_rate_threshold 0.05
#total_branch_length 1.628370432508481
#subtree min_dist mean_dist max_dist min_mut mean_mut max_mut distances mutation_rates
(A9_bin.17:0.04783,(A18_bin.33:0.01085,A74_bin.8:0.00007):0.02194):0.01295; 0.010922493303105412 0.0537953070780006 0.08062510853907626 0.0004495698153051713 0.0023098078785621537 0.003409944001735001 A9_bin.17,A18_bin.33:0.08062510853907626|A9_bin.17,A74_bin.8:0.06983831939182016|A18_bin.33,A74_bin.8:0.010922493303105412 A9_bin.17,A18_bin.33:0.003409944001735001|A9_bin.17,A74_bin.8:0.003069909818646289|A18_bin.33,A74_bin.8:0.0004495698153051713

How do I interpret these results?

Thanks

Hi @saras22,

The output file reports, on each line, a subtree whose leaves can be considered the “same strain” according to the parameters reported in the header:

#phylogenetic_threshold 0.05
#mutation_rate_threshold 0.05

If that’s the complete output, it appears that only this subtree

(A9_bin.17:0.04783,(A18_bin.33:0.01085,A74_bin.8:0.00007):0.02194):0.01295;

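falls within the definition, so its labels (i.e., genomes) can be considered to be the “same strain”.
The other columns report statistics for that subtree: the minimum, mean, and maximum pairwise phylogenetic distances between its leaves, the minimum, mean, and maximum mutation rates, and then the individual pairwise distances and mutation rates for each pair of leaves.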
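
If it helps to read the columns programmatically, here is a minimal Python sketch that splits one result line into the fields named in the `#subtree ...` header. It is not part of PhyloPhlAn: the function name `parse_strain_finder_line` is made up for illustration, and it assumes the fields are separated by whitespace (tabs or spaces) and that genome labels contain no `:`, `,`, or `|` characters.

```python
# Hypothetical helper, not part of PhyloPhlAn: parses one non-header line
# of phylophlan_strain_finder output into a dictionary.
STAT_NAMES = ["min_dist", "mean_dist", "max_dist", "min_mut", "mean_mut", "max_mut"]


def parse_pairs(field):
    """Turn 'a,b:0.1|a,c:0.2' into {('a', 'b'): 0.1, ('a', 'c'): 0.2}."""
    pairs = {}
    for item in field.split("|"):
        names, value = item.rsplit(":", 1)
        first, second = names.split(",")
        pairs[(first, second)] = float(value)
    return pairs


def parse_strain_finder_line(line):
    # Assumption: columns are whitespace-separated and the Newick
    # subtree string itself contains no whitespace.
    fields = line.split()
    record = {"subtree": fields[0]}
    for name, value in zip(STAT_NAMES, fields[1:7]):
        record[name] = float(value)
    record["distances"] = parse_pairs(fields[7])
    record["mutation_rates"] = parse_pairs(fields[8])
    # Collect the genome labels that appear in the pairwise columns.
    record["genomes"] = sorted({g for pair in record["distances"] for g in pair})
    return record


# The result line from the post above, rebuilt column by column.
example_line = "\t".join([
    "(A9_bin.17:0.04783,(A18_bin.33:0.01085,A74_bin.8:0.00007):0.02194):0.01295;",
    "0.010922493303105412", "0.0537953070780006", "0.08062510853907626",
    "0.0004495698153051713", "0.0023098078785621537", "0.003409944001735001",
    "A9_bin.17,A18_bin.33:0.08062510853907626"
    "|A9_bin.17,A74_bin.8:0.06983831939182016"
    "|A18_bin.33,A74_bin.8:0.010922493303105412",
    "A9_bin.17,A18_bin.33:0.003409944001735001"
    "|A9_bin.17,A74_bin.8:0.003069909818646289"
    "|A18_bin.33,A74_bin.8:0.0004495698153051713",
])

parsed = parse_strain_finder_line(example_line)
closest_pair = min(parsed["distances"], key=parsed["distances"].get)
print("genomes in subtree:", parsed["genomes"])
print("closest pair:", closest_pair, parsed["distances"][closest_pair])
```

When reading the real output file, lines starting with `#` (the thresholds and the column header) would need to be skipped before calling the function.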

Hope this helps,
Francesco