Strain Identity Cutoff

cazzlewazzle89 · September 30, 2020, 2:37pm

Hi All,

I have run my samples through StrainPhlAn3 and calculated distance matrices using EMBOSS (with Kimura correction as per the tutorial).

My question is whether there is a consensus on what to use as an identity cutoff when looking to see if the same strain is present in two different samples? Should I look for a distance of zero or should this be relaxed to allow for sequencing errors?

Thanks in advance,
Calum

aitor.blancomiguez · September 30, 2020, 3:09pm

Hi @cazzlewazzle89,
If you are interested in detecting the same strain in different samples, I would suggest using the normalized phylogenetic distances obtained from the branch lengths of the phylogenetic tree (normalizing the pairwise branch distances by the total branch length). As for the cutoff, it will differ slightly depending on the species you are interested in, but 0.01 would be a good approximation (https://www.nature.com/articles/s41467-020-18127-y#Sec4)
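
As a rough illustration of applying such a cutoff once the normalized distances are in hand, here is a minimal sketch. It is not part of StrainPhlAn: the function name `same_strain_pairs`, the sample names, and the distance values are all made up for the example, and treating the cutoff as a strict "less than" is an assumption.

```python
def same_strain_pairs(norm_dist, cutoff=0.01):
    """Return the sample pairs whose normalized distance is below the cutoff.

    norm_dist maps (sample_a, sample_b) tuples to normalized phylogenetic
    distances. The 0.01 default is the approximate value suggested above;
    it may need tuning per species.
    """
    return sorted(pair for pair, dist in norm_dist.items() if dist < cutoff)


# Hypothetical normalized distances for three samples.
norm_dist = {
    ("S1", "S2"): 0.002,
    ("S1", "S3"): 0.150,
    ("S2", "S3"): 0.149,
}

print(same_strain_pairs(norm_dist))
```

With these made-up values only the S1/S2 pair falls under the cutoff, so it would be the only pair called the same strain.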

Best,
Aitor

cazzlewazzle89 · September 30, 2020, 3:38pm

Thanks Aitor

cazzlewazzle89 · October 1, 2020, 8:53am

Hi @aitor.blancomiguez

Thanks again for the reply. As a follow-up question:
By total branch length, do you mean the sum of all edges in the tree or the sum of the branch lengths from all tips to the root?
I have also seen another normalisation method suggested based on dividing the phylogenetic distance between each pair of leaves by the median of these distances.

All the best,
Calum

aitor.blancomiguez · October 2, 2020, 9:56am

Hi @cazzlewazzle89
I meant the sum of all the branch lengths in the tree. If you are using Python, you could use the total_branch_length() method available on the tree objects in Biopython's Bio.Phylo.BaseTree module: https://biopython.org/docs/1.75/api/Bio.Phylo.BaseTree.html#Bio.Phylo.BaseTree.TreeMixin.total_branch_length
As for the other normalisation method, using the median value, that will also work (https://www.sciencedirect.com/science/article/pii/S1931312818303172)
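
In case it helps, both normalisations could be sketched with Biopython roughly as follows. This is only an illustration under stated assumptions: the helper names (`pairwise_distances`, `normalize_by_total_branch_length`, `normalize_by_median`) and the toy Newick tree are invented for the example, and for real data you would read the tree file written by StrainPhlAn instead of the inline string.

```python
from io import StringIO
from itertools import combinations
from statistics import median

from Bio import Phylo


def pairwise_distances(tree):
    """Tip-to-tip (patristic) distance for every pair of leaves."""
    leaves = tree.get_terminals()
    return {
        (a.name, b.name): tree.distance(a, b)
        for a, b in combinations(leaves, 2)
    }


def normalize_by_total_branch_length(tree):
    """Divide each pairwise distance by the sum of all branch lengths."""
    total = tree.total_branch_length()
    return {pair: d / total for pair, d in pairwise_distances(tree).items()}


def normalize_by_median(tree):
    """Divide each pairwise distance by the median pairwise distance."""
    dists = pairwise_distances(tree)
    med = median(dists.values())
    return {pair: d / med for pair, d in dists.items()}


# Toy tree (hypothetical); S1 and S2 are nearly identical.
newick = "((S1:0.001,S2:0.001):0.2,(S3:0.1,S4:0.3):0.1);"
tree = Phylo.read(StringIO(newick), "newick")

by_total = normalize_by_total_branch_length(tree)
by_median = normalize_by_median(tree)

for pair in by_total:
    print(pair, round(by_total[pair], 4), round(by_median[pair], 4))
```

Note that the two normalisations put the distances on different scales, so a cutoff chosen for one should not be assumed to carry over to the other. Both helpers also assume the tree has non-zero branch lengths; a tree whose total or median distance is zero would need separate handling.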

Best,
Aitor
