Question About StrainPhlAn3 Polymorphic Output File


Could you please provide details on the columns/information presented in the clade.polymorphic file produced by StrainPhlAn3?

The column names appear to be different than the columns of the previous analogous file produced by StrainPhlAn2.

Thank you!

Hi @raufs
The new polymorphic file contains the following columns per sample:

  • sample: The name of the sample
  • percentage_of_polymorphic_sites: The percentage of polymorphic sites when all markers are concatenated together
  • avg_by_marker: The average percentage of polymorphic sites across all markers
  • median_by_marker: The median (Q2) percentage of polymorphic sites across all markers
  • std_by_marker: The standard deviation of the percentage of polymorphic sites across all markers
  • min_by_marker: The minimum percentage of polymorphic sites across all markers
  • max_by_marker: The maximum percentage of polymorphic sites across all markers
  • q25_by_marker: The first quartile (Q1) percentage of polymorphic sites across all markers
  • q75_by_marker: The third quartile (Q3) percentage of polymorphic sites across all markers
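
To make the difference between the concatenated percentage and the per-marker statistics concrete, here is a minimal sketch in Python. It is not StrainPhlAn's own code: the function name `summarize_polymorphism`, the input layout (marker name mapped to a pair of polymorphic and total site counts), the quartile interpolation method, and the use of the population standard deviation are all assumptions made for illustration.

```python
import statistics


def summarize_polymorphism(markers):
    """Summarize polymorphism for one sample.

    markers: dict mapping a marker name to (polymorphic_sites, total_sites).
    This input layout is hypothetical, chosen only for this example.
    """
    # Per-marker percentages, skipping markers with no covered sites.
    per_marker = [100.0 * p / t for p, t in markers.values() if t > 0]

    # Concatenated view: pool all sites from all markers before dividing.
    total_poly = sum(p for p, _ in markers.values())
    total_sites = sum(t for _, t in markers.values())

    # Assumption: linear-interpolation quartiles; the tool may differ.
    q1, q2, q3 = statistics.quantiles(per_marker, n=4, method="inclusive")

    return {
        "percentage_of_polymorphic_sites": 100.0 * total_poly / total_sites,
        "avg_by_marker": statistics.mean(per_marker),
        "median_by_marker": q2,
        # Assumption: population standard deviation (not sample).
        "std_by_marker": statistics.pstdev(per_marker),
        "min_by_marker": min(per_marker),
        "max_by_marker": max(per_marker),
        "q25_by_marker": q1,
        "q75_by_marker": q3,
    }


# Three made-up markers with 1%, 2% and 3% polymorphic sites.
example = {"m1": (1, 100), "m2": (2, 100), "m3": (9, 300)}
summary = summarize_polymorphism(example)
```

In this made-up example the per-marker average is 2.0, while the concatenated percentage is 2.4, because the longer marker `m3` contributes more sites to the pooled count.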


Thank you for the column descriptions Aitor!

And just to confirm: is a polymorphic site one where the sample differs from the reference marker's sequence, or is it a site at which the base call within the sample is ambiguous, which could arise, for instance, when multiple strains are present in the metagenomic sample?

Kind regards,

Hi @raufs
It is the second case. When reconstructing the marker sequences using CMSeq, a polymorphic site is called if the frequency of the dominant allele is lower than 80%.
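
As a rough illustration of that rule, the sketch below flags a site as polymorphic when the most frequent base accounts for less than 80% of the reads covering it. This is not CMSeq's code: the function name `is_polymorphic` and the per-site dictionary of base counts are hypothetical, and CMSeq may apply further filters (for example on coverage or base quality) that are not modeled here.

```python
def is_polymorphic(base_counts, dominant_threshold=0.8):
    """Return True if the dominant allele frequency is below the threshold.

    base_counts: dict mapping a base ("A", "C", "G", "T") to its read count
    at a single site. This layout is an assumption for illustration.
    """
    total = sum(base_counts.values())
    if total == 0:
        # No coverage: nothing to call.
        return False
    dominant_frequency = max(base_counts.values()) / total
    return dominant_frequency < dominant_threshold


# Dominant allele at 90%: not polymorphic.
clear_site = is_polymorphic({"A": 9, "G": 1})
# Dominant allele at 70%: polymorphic.
mixed_site = is_polymorphic({"A": 7, "G": 3})
# Dominant allele at exactly 80%: not lower than the threshold.
edge_site = is_polymorphic({"A": 8, "G": 2})
```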


Great, thank you again Aitor!

Much appreciated,