Hello BioBakery community,
I’m currently trying to run a strain-level analysis using StrainPhlAn 4.0 on a large metagenomic dataset of mouse gut samples, previously processed with KneadData and MetaPhlAn (ChocoPhlAn Jun 2023 release).
My primary goal is to assess the genomic variation (strain-level diversity) exclusively among my own samples for a series of specific SGBs, e.g., t__SGB41441. Ideally, I would like to get some kind of diversity measure from the MSA.
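To make the goal more concrete, here is a minimal sketch of the kind of measure I have in mind: the mean pairwise proportion of differing sites across samples, skipping gaps and ambiguous bases. It assumes the MSA is a FASTA file with one aligned sequence per sample; the function names and the file path in the usage comment are hypothetical and not part of StrainPhlAn.

```python
from itertools import combinations


def read_fasta(path):
    """Parse an aligned FASTA file into {sample_id: sequence}."""
    seqs = {}
    name = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the sample ID.
                name = line[1:].split()[0]
                seqs[name] = []
            elif name is not None:
                seqs[name].append(line.upper())
    return {k: "".join(v) for k, v in seqs.items()}


def p_distance(a, b, valid="ACGT"):
    """Proportion of differing sites, counting only columns where both
    sequences carry an unambiguous base (gaps and N are skipped)."""
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                diffs += 1
    return diffs / compared if compared else None


def mean_pairwise_distance(seqs):
    """Average p-distance over all sample pairs with comparable sites."""
    dists = []
    for a, b in combinations(seqs.values(), 2):
        d = p_distance(a, b)
        if d is not None:
            dists.append(d)
    return sum(dists) / len(dists) if dists else None


# Hypothetical usage; the path is a placeholder for the alignment file:
# seqs = read_fasta("output/t__SGB41441.aln")
# print(mean_pairwise_distance(seqs))
```

This is only an illustration of the idea; if StrainPhlAn or a companion tool already reports something equivalent, I would prefer to use that.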
My understanding is that StrainPhlAn can use the internal marker genes (from ChocoPhlAn) to perform the alignment and SNP calling. In that case, is it mandatory to provide external SGB reference genomes using the -r argument?
If it is, where can I obtain these genomes? I’ve found that only 4931 reference genomes linked to SGB IDs are available online. Is there an updated list anywhere, ideally one that links SGB IDs to NCBI accession IDs?
Thank you in advance for your time and assistance.