Query on Strain Identification Methods and Comparisons in StrainPhlAn4 Usage

Dear StrainPhlAn4 Team,

I am currently using StrainPhlAn4 for microbial strain analysis and have a couple of questions about best practices and applications of the tool:

  1. Determining Strain Identity with nGD vs. Mutation Files: I am trying to determine whether two samples share the same strain using StrainPhlAn4, and I am unsure whether to base this decision on the normalized phylogenetic distance (nGD) or on the mutation files. A recent study ("The person-to-person transmission landscape of the gut and oral microbiomes", Nature) suggests that nGD may be the better choice for two reasons: (1) the rather low coverage obtained for species in metagenomic samples, even after passing the sequencing depth threshold, which adds noise especially to SNV rate estimates; and (2) the limited length of the marker gene alignments of some SGBs, which makes SNV rates rather unreliable. Could you help me understand why phylogenetic distance might be less sensitive to low coverage and short marker gene alignments than the SNV rates in the XXX.mutation files produced by StrainPhlAn4?

  2. Applicability of StrainPhlAn4 Across Different Environments: Is StrainPhlAn4 suitable for comparing strain mutation rates across different environments, such as soil, water, and gut microbiomes? Can the same thresholds be used to determine whether, for instance, soil microbial communities have a higher strain turnover rate than gut microbiomes? Are StrainPhlAn4 results biased towards gut samples?
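To make concern (2) in the first question concrete, here is a minimal sketch of my own (not StrainPhlAn4 code, and not how StrainPhlAn4 computes either metric). It assumes an SNV rate is simply mismatches divided by the number of aligned positions with a confident call, treats each position as an independent trial, and uses a Wilson score interval to show how uncertainty grows as the alignment gets shorter. Low coverage acts similarly, since it reduces the number of positions with confident calls. The function name `wilson_interval` and the example counts are hypothetical.

```python
import math


def wilson_interval(mismatches: int, positions: int, z: float = 1.96):
    """Return (rate, lower, upper) for an SNV rate with a Wilson score interval.

    Hypothetical illustration: each aligned position is treated as an
    independent Bernoulli trial, ignoring sequencing errors and linkage.
    """
    if positions <= 0:
        raise ValueError("positions must be positive")
    p = mismatches / positions
    z2 = z * z
    denom = 1 + z2 / positions
    center = (p + z2 / (2 * positions)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / positions + z2 / (4 * positions**2))
    return p, max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Same observed SNV rate (0.5%), very different alignment lengths.
    for mism, n in [(5, 1_000), (50, 10_000), (500, 100_000)]:
        rate, lo, hi = wilson_interval(mism, n)
        print(f"n={n:>7}: rate={rate:.4f}, 95% CI=({lo:.4f}, {hi:.4f})")
```

The point estimate is identical in all three cases, but the interval for the 1,000-position alignment is far wider, so a fixed SNV-rate threshold would classify such pairs much less reliably.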

Your insights and recommendations on these matters would be greatly appreciated, as they will significantly aid in guiding my research using StrainPhlAn4.

Thank you for your time and for developing this valuable tool.