StrainPhlAn Usage for Bacterial SNP Annotation

Hello everyone,

I have some questions about the analysis outputs. Do the .fna files in the output/tmp/clean_dna directory contain all the reads that mapped to the markers and were assigned to the species? I am specifically looking for files that contain the reads mapped to the reference genome of the species, excluding reads that are assigned with equal likelihood to more than one species. I am also interested in using StrainPhlAn to annotate bacterial SNPs.

Hello

To get the reads that map to the markers, look at the .sam output of MetaPhlAn: it is exactly that, the alignment of reads to the MetaPhlAn markers. By design of the database, no read should be alignable to more than one marker.
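As a rough illustration of how that .sam output could be inspected, here is a minimal Python sketch that groups mapped read names by the marker they align to. It relies only on the standard SAM column layout (read name, flag, reference name in the first three tab-separated columns) and assumes the file has already been decompressed to plain text. The function name `reads_per_marker` and the example records are hypothetical, not part of MetaPhlAn.

```python
from collections import defaultdict


def reads_per_marker(sam_lines):
    """Group mapped read names by the reference (marker) they align to."""
    hits = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid alignment record has at least 11 mandatory columns.
        if len(fields) < 11:
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        # Flag bit 0x4 marks an unmapped read; RNAME "*" means no reference.
        if flag & 0x4 or rname == "*":
            continue
        hits[rname].append(qname)
    return dict(hits)


# Made-up records for illustration only (not real MetaPhlAn output).
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tmarkerA\t10\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t16\tmarkerA\t30\t42\t5M\t*\t0\t0\tTTGCA\tIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGG\tIIIII",
    "read4\t0\tmarkerB\t5\t42\t5M\t*\t0\t0\tCCCCC\tIIIII",
]

for marker, reads in reads_per_marker(example).items():
    print(marker, reads)
```

On a real file you would pass an open file handle instead of the `example` list. To restrict the result to one species you would still need to know which marker names belong to it, which this sketch does not attempt.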

As for the bacterial SNPs, StrainPhlAn produces a consensus strain sequence in which polymorphic positions are masked with *, so you can look for those in the output of sample2markers. For anything more sophisticated, I suggest running samtools mpileup (the successor to the older pileup command) manually on the .sam file, or using a dedicated SNP analysis tool.
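To illustrate looking for the masked positions, here is a minimal Python sketch that reports the 1-based coordinates of every * in a set of consensus marker sequences. It assumes the consensus sequences have first been exported to FASTA-style text, which may not match the native sample2markers output format of your version. The helper names `parse_fasta` and `masked_positions` and the example sequences are hypothetical.

```python
def parse_fasta(lines):
    """Parse FASTA-style lines into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the full header text (minus ">") as the record name.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def masked_positions(sequence, mask="*"):
    """Return 1-based positions at which the sequence carries the mask character."""
    return [i + 1 for i, base in enumerate(sequence) if base == mask]


# Made-up consensus sequences for illustration only.
example = [
    ">markerA",
    "ACG*TA",
    "C*",
    ">markerB",
    "ACGT",
]

for marker, seq in parse_fasta(example).items():
    print(marker, masked_positions(seq))
```

This only tells you where a position was masked as polymorphic, not which alleles were observed there; for the alleles themselves, a pileup on the alignment is the better route.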

Best
Michal