Identifying Strain Names

My aim is to identify well-known strains in my own samples. For example, in a recent journal article titled "RNA Landscape of the Emerging Cancer-associated Microbe Fusobacterium nucleatum", the introduction contains:

… we generate high-resolution global RNA maps for five clinically relevant fusobacterial strains— F. nucleatum subspecies nucleatum, animalis, polymorphum and vincentii, as well as F. periodonticum

Can StrainPhlAn do that? I can’t find anything in its user guide that suggests I could identify Fusobacterium nucleatum subsp. nucleatum, for instance. Is StrainPhlAn only for calculating the number of SNPs to a species reference genome?

Step 2. Obtain the sam files from these samples by mapping them against MetaPhlAn database
Step 4. Extract the markers of Bacteroides_caccae from MetaPhlAn database

Isn’t the database called ChocoPhlAn? MetaPhlAn is the name of the software.

Hi @Dario
Thanks for getting in touch. To identify well-known strains in your samples, you first need to include them as reference genomes when calling StrainPhlAn (using the -r / --references parameter). Once StrainPhlAn generates the tree, you can check whether the strains you included are also present in your samples by looking at the phylogenetic distance between each reference and your samples in the tree. Please see this recent post: Strain Identity Cutoff
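
As a rough illustration of the "check the phylogenetic distance in the tree" step, here is a minimal Python sketch that reads a Newick tree and lists the sample tips that sit within a chosen distance of a reference tip. It is not part of StrainPhlAn. It assumes the tree is plain Newick without quoted labels or comments, and the tip names, the toy tree, and the cutoff value are all hypothetical placeholders, not recommended settings.

```python
import re


def parse_newick(text):
    """Parse a plain Newick string (no quoted labels or comments).

    Returns (parent, length, tips):
      parent: node id -> parent node id (None for the root)
      length: node id -> length of the branch above that node
      tips:   tip label -> node id
    """
    parent, length, tips = {}, {}, {}
    stack = []      # internal nodes that are still open
    closed = None   # internal node just closed by ")", awaiting its label
    next_id = 0
    for token in re.findall(r"[(),;]|[^(),;]+", text):
        token = token.strip()
        if not token or token == ";":
            continue
        if token == "(":
            parent[next_id] = stack[-1] if stack else None
            stack.append(next_id)
            next_id += 1
            closed = None
        elif token == ",":
            closed = None
        elif token == ")":
            closed = stack.pop()
        else:
            label, _, branch = token.partition(":")
            if closed is None:
                # A new tip hanging off the innermost open node.
                node = next_id
                next_id += 1
                parent[node] = stack[-1] if stack else None
                tips[label.strip()] = node
            else:
                # Label and branch length of the node that was just closed.
                node = closed
                closed = None
            length[node] = float(branch) if branch.strip() else 0.0
    return parent, length, tips


def tip_distance(parent, length, a, b):
    """Sum of branch lengths on the path between nodes a and b."""
    dist_a = {}
    node, d = a, 0.0
    while node is not None:
        dist_a[node] = d
        d += length.get(node, 0.0)
        node = parent.get(node)
    node, d = b, 0.0
    while node not in dist_a:
        d += length.get(node, 0.0)
        node = parent[node]
    return d + dist_a[node]


def samples_near_reference(newick, reference, cutoff):
    """Return (tip, distance) pairs within `cutoff` of `reference`, closest first."""
    parent, length, tips = parse_newick(newick)
    ref = tips[reference]
    hits = []
    for label, node in tips.items():
        if label == reference:
            continue
        d = tip_distance(parent, length, ref, node)
        if d <= cutoff:
            hits.append((label, d))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy tree: one reference genome and three samples.
TOY_TREE = "((ref_nucleatum:0.01,sample_A:0.02):0.10,(sample_B:0.30,sample_C:0.25):0.05);"

# The cutoff of 0.1 is an arbitrary placeholder for illustration only.
for tip, dist in samples_near_reference(TOY_TREE, "ref_nucleatum", cutoff=0.1):
    print(f"{tip}\t{dist:.3f}")
```

To use it on real output, read the tree file that StrainPhlAn writes into a string and pass the tip label of your reference genome. Which distance counts as "the same strain" depends on the species and on how the distances are normalized, so treat the cutoff as something to choose deliberately, not as a default.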