I recently ran strainphlan3 on shallow shotgun sequencing data from 166 skin samples (sequencing depth of 12M reads). The example below shows the phylogenetic tree of the most abundant species, Cutibacterium acnes, across all my samples.
My first question: can strainphlan3 be used on shallow shotgun sequencing data to examine sub-species diversity within the most abundant species? I suppose it will be more difficult for low-abundance species because of insufficient coverage.
My second question: can strainphlan identify how many strains of the same species are present in a single sample?
My third question concerns the PCoA plot of the MSA: the first component explains more than 100% of the variance, which is not possible. Is this due to correlation between the variables that I am comparing, or to outliers in my data? What is the most correct way to fix this: removing the outliers and removing one of each pair of highly correlated variables?
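For context on the third question, here is a minimal sketch of classical PCoA in Python with NumPy. It is not strainphlan's code and does not use the data above: the 4x4 distance matrix `toy` and the helper `pcoa_eigenvalues` are made up for illustration. It shows one way a first axis can be reported above 100%. If the distances are not Euclidean (here they violate the triangle inequality), the centered matrix has negative eigenvalues. Dividing by the sum of all eigenvalues then shrinks the denominator.

```python
import numpy as np


def pcoa_eigenvalues(dist):
    """Eigenvalues of the double-centered squared distance matrix, largest first."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ d2 @ centering
    # eigvalsh returns eigenvalues in ascending order for a symmetric matrix.
    return np.linalg.eigvalsh(gower)[::-1]


# Hypothetical distances: d(0,1) = 10 but both samples are at distance 1
# from samples 2 and 3, so the triangle inequality is violated.
toy = np.array(
    [
        [0.0, 10.0, 1.0, 1.0],
        [10.0, 0.0, 1.0, 1.0],
        [1.0, 1.0, 0.0, 1.0],
        [1.0, 1.0, 1.0, 0.0],
    ]
)

eigvals = pcoa_eigenvalues(toy)

# Naive percentage: divide by the sum of ALL eigenvalues, negatives included.
naive = eigvals / eigvals.sum()

# One common convention (an assumption here, not necessarily what any
# particular tool does): treat negative eigenvalues as zero before normalizing.
positive = np.clip(eigvals, 0.0, None)
clipped = positive / positive.sum()

print("eigenvalues:", np.round(eigvals, 3))
print("naive first-axis fraction:", round(float(naive[0]), 3))
print("clipped first-axis fraction:", round(float(clipped[0]), 3))
```

If this is what is happening, the cause would be the distance measure rather than outliers or correlated variables, but I cannot confirm that this is how my plot was produced.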
Thank you in advance!