Hi everyone,
I’m relatively new to metagenomic analysis and currently using MetaPhlAn 4 for taxonomic profiling. While reviewing the output, I noticed that some bacteria are annotated up to the strain level (see example below). For instance: “k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|o__Verrucomicrobiales|f__Akkermansiaceae|g__Akkermansia|s__Akkermansia_muciniphila|t__SGB9228”
This has left me with a couple of questions:
If MetaPhlAn 4 already provides strain-level annotations (as shown in the example), is StrainPhlAn still necessary for strain-level identification?
If so, what additional value does StrainPhlAn offer compared to MetaPhlAn?
I’d greatly appreciate any clarification on the roles of these tools and when to use each one.
Thank you in advance for your help!
Hi @Serna_Blasco
SGB stands for species-level genome bin. It does not refer to a strain but to a group of genomes that share around 95% or higher average nucleotide identity with one another (find more details on the SGB definition here). MetaPhlAn works by mapping reads against SGB-level markers (see here), which gives the relative abundance of each SGB. The abundances of all taxonomic levels above the SGB are then calculated hierarchically, following the NCBI taxonomy structure.
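To make the naming concrete, here is a minimal sketch that splits the clade string from your post into its levels. It is plain string handling, not part of MetaPhlAn itself; the `RANKS` mapping and the `parse_clade` helper are names I made up for illustration.

```python
# Hypothetical helper: maps each MetaPhlAn prefix to a readable level name.
# Note that "t__" is labelled as an SGB here, not as a strain.
RANKS = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species", "t": "SGB",
}

def parse_clade(clade_name):
    """Split a '|'-separated clade string into a {level: name} dict."""
    levels = {}
    for part in clade_name.split("|"):
        prefix, _, name = part.partition("__")
        levels[RANKS.get(prefix, prefix)] = name
    return levels

clade = (
    "k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|"
    "o__Verrucomicrobiales|f__Akkermansiaceae|g__Akkermansia|"
    "s__Akkermansia_muciniphila|t__SGB9228"
)
levels = parse_clade(clade)
print(levels["species"], levels["SGB"])
```

The last field is the SGB identifier, so the deepest level in the profile is still a species-level group rather than an individual strain.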
StrainPhlAn, on the other hand, lets you compare the same SGB across samples by building a multiple sequence alignment of the strains of the SGB of interest from multiple metagenomes. In other words, if you find SGB9228 in other samples as well, you may be able to build an SGB-level phylogenetic tree to determine, for example, whether the same strain is present in different samples.
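As a rough sketch of how you might shortlist SGBs worth taking to StrainPhlAn, the snippet below lists the SGBs that appear in more than one profile. It assumes each profile is tab-separated text with the clade name in the first column and `#` comment lines, and it only reads that first column. The helper names, sample names, the `t__SGB_EXAMPLE` entry, and the abundance values are all made up for illustration.

```python
from collections import defaultdict

def sgbs_in_profile(lines):
    """Collect SGB identifiers (rows whose last level starts with 't__')."""
    sgbs = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and empty lines
        clade = line.split("\t")[0]
        last_level = clade.split("|")[-1]
        if last_level.startswith("t__"):
            sgbs.add(last_level[3:])
    return sgbs

def shared_sgbs(profiles, min_samples=2):
    """Return {sgb: [samples]} for SGBs seen in at least min_samples profiles."""
    seen_in = defaultdict(set)
    for sample, lines in profiles.items():
        for sgb in sgbs_in_profile(lines):
            seen_in[sgb].add(sample)
    return {sgb: sorted(s) for sgb, s in seen_in.items() if len(s) >= min_samples}

# Illustrative input only: the abundances are invented, not real results.
AKK = ("k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|"
       "o__Verrucomicrobiales|f__Akkermansiaceae|g__Akkermansia|"
       "s__Akkermansia_muciniphila|t__SGB9228")
profiles = {
    "sampleA": ["#clade_name\trelative_abundance", AKK + "\t12.0",
                "k__Bacteria|t__SGB_EXAMPLE\t3.0"],
    "sampleB": [AKK + "\t4.5"],
}
result = shared_sgbs(profiles)
print(result)
```

An SGB that shows up in only one sample gives you nothing to compare, which is why the sketch keeps only those found in two or more.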