Hello everyone,
I have some questions regarding the analysis outputs. Concerning the .fna files located in the output/tmp/clean_dna directory, do these files encompass all the reads mapped to the markers and classified to the species? I am specifically looking for files that contain the reads mapped to the reference genome of the species and exclude reads assigned with the same likelihood to more than one species. Additionally, I am interested in using StrainPhlAn to annotate bacterial SNPs.
Hello
To get the reads that map to the markers, look at the .sam output of MetaPhlAn. It is exactly that: the alignment of reads to the MetaPhlAn markers. By design of the database, no read should be alignable to more than one marker.
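A minimal sketch of how one could group read names by marker from that SAM output and check whether any read aligns to more than one marker. It relies only on the standard SAM columns (read name, flag, reference name). The function name `reads_per_marker` and the file name in the usage note are hypothetical, and mapping marker names back to species is not covered here.

```python
from collections import defaultdict


def reads_per_marker(sam_lines):
    """Group mapped read names by the reference (marker) they align to.

    Returns (marker_to_reads, multi_marker_reads), where multi_marker_reads
    holds only the reads seen on more than one marker.
    """
    marker_to_reads = defaultdict(set)
    read_to_markers = defaultdict(set)
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory columns
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        # Flag bit 0x4 means the read is unmapped.
        if flag & 0x4 or rname == "*":
            continue
        marker_to_reads[rname].add(qname)
        read_to_markers[qname].add(rname)
    multi = {r: m for r, m in read_to_markers.items() if len(m) > 1}
    return dict(marker_to_reads), multi
```

If the SAM file was written bz2-compressed, it can be streamed with `bz2.open("sample.sam.bz2", "rt")` (hypothetical file name) and passed straight to `reads_per_marker`; an empty second return value is consistent with the one-marker-per-read expectation.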
As for the bacterial SNPs, StrainPhlAn produces a consensus sequence in which polymorphic positions are masked with *, so you can look for those in the output of sample2markers. For anything more sophisticated, I suggest running samtools mpileup manually on the SAM file or using a dedicated SNP analysis tool.
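As a rough illustration of looking for the masked positions: the sketch below assumes a marker's consensus sequence is already available as a plain string. How to load it from the sample2markers output depends on the StrainPhlAn version and is not shown. The function names are hypothetical.

```python
def masked_positions(consensus, mask_char="*"):
    """Return the 0-based positions where the consensus is masked."""
    return [i for i, base in enumerate(consensus) if base == mask_char]


def masked_fraction(consensus, mask_char="*"):
    """Return the fraction of positions in the consensus that are masked."""
    if not consensus:
        return 0.0
    return consensus.count(mask_char) / len(consensus)
```

For example, `masked_positions("AC*GT*")` returns `[2, 5]`. The masked positions only say where a sample is polymorphic; the alleles themselves would still have to come from a pileup of the alignment.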
Best
Michal
Hi Michal,
StrainPhlAn outputs include an MSA file of aligned marker genes for the samples in the study. We have developed a tool, deepBreaks (GitHub - omicsEye/deepbreaks), that prioritizes and ranks the SNVs in the MSA file that are associated with metadata of interest (e.g., health vs. disease status or the niche of the samples). deepBreaks has multiple applications; one example analyzes StrainPhlAn outputs for the HMP project (deepbreaks/examples/discrete_phenotype_HMP.ipynb at master · omicsEye/deepbreaks · GitHub).
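To get a first look at candidate SNV positions before any association analysis, one could list the alignment columns that carry more than one allele. This is a generic sketch, not the deepBreaks API or its method. It assumes the MSA is in aligned FASTA format, and the function names are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def variable_columns(alignment, ignore="-N*"):
    """Return 0-based columns with more than one allele across samples.

    Characters in `ignore` (gaps, unknown or masked bases) are not
    counted as alleles.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned to the same length")
    skipped = set(ignore)
    columns = []
    for i in range(length):
        alleles = {s[i].upper() for s in seqs} - skipped
        if len(alleles) > 1:
            columns.append(i)
    return columns
```

The resulting column indices refer to positions in the alignment, not to genome coordinates.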
Please let us know if you need help running deepBreaks.
Ali