I want to include assembled genomes in my PanPhlAn analysis to see the level of similarity between the metagenomic data and strains isolated from related samples.
I was planning to chop the assembled genome into simulated fastq reads, align these to the pre-made pangenomic database, and include the outputs in the profiling as though it was a metagenomic sample.
Do you think this is a suitable approach or is there a better method?
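Below is a minimal Python sketch of the "chop the genome into reads" idea, as an alternative to wgsim. It tiles every contig of an assembly with overlapping, error-free reads and returns them as FASTQ text. The function names, the 150 bp read length, the 50 bp step and the constant quality character are all assumptions chosen for illustration. Note that this gives perfectly even coverage (about 3x) across the genome, unlike a real metagenome.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict: contig ID (first word of header) -> sequence."""
    contigs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                contigs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        contigs[name] = "".join(chunks)
    return contigs


def chop_genome_to_fastq(contigs, read_len=150, step=50, qual_char="I"):
    """Tile each contig with error-free reads of length read_len every `step` bases.

    Returns FASTQ text. Contigs shorter than read_len yield no reads, and the last
    few bases of a contig are skipped if (len - read_len) is not a multiple of step.
    """
    records = []
    for name, seq in contigs.items():
        seq = seq.upper()
        for start in range(0, len(seq) - read_len + 1, step):
            read = seq[start:start + read_len]
            records.append(f"@{name}_{start}\n{read}\n+\n{qual_char * read_len}\n")
    return "".join(records)


if __name__ == "__main__":
    # Hypothetical toy assembly: one 400 bp contig.
    demo = [">contig1 isolate_A", "ACGT" * 100]
    fastq = chop_genome_to_fastq(read_fasta(demo))
    print(fastq.count("\n") // 4, "reads")
```

For a real isolate you would pass the lines of its assembly file and write the returned text to a .fq file, then run it through the same mapping step as the metagenomic samples.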
It seems like a nice and relevant idea to me. I use MAGs, when available, to validate some PanPhlAn results.
I think the only important thing is to keep in mind the limits of using MAGs compared to the full pangenome; have a look at these papers:
I had considered including MAGs recovered from the same samples but didn’t feel that they would contribute much due to the completeness issues you highlighted.
In this case, I was referring to genome-sequenced strains physically isolated in the lab from the same samples and from related samples (we have recovered isolates from fecal samples for species that were not detected by PanPhlAn3, presumably due to low coverage/abundance).
My first instinct was to use the method mentioned above (making a fq file from the genome using wgsim or similar), but I wanted to double-check that there wasn’t a more appropriate method (e.g., aligning every ORF to the PanPhlAn species database instead).
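For the ORF-based alternative, here is one possible sketch of turning an isolate into something comparable with PanPhlAn's gene-family presence/absence profiles. It assumes the isolate's ORFs were aligned against nucleotide sequences of the species' pangenome genes with BLAST+ using `-outfmt "6 qseqid sseqid pident length qlen slen"`, and that you have a mapping from database gene IDs to gene families (for example built from the species' pangenome file; its exact layout is not assumed here). The 90% identity and 80% coverage cut-offs and the function names are hypothetical.

```python
import csv
import io


def families_from_orf_hits(hits_tsv, gene_to_family, min_pident=90.0, min_cov=0.8):
    """Return the set of gene families hit by at least one isolate ORF.

    hits_tsv: BLAST+ tabular text with columns qseqid sseqid pident length qlen slen
              (assumed layout).
    gene_to_family: dict mapping database gene ID (sseqid) -> gene family.
    A hit counts if identity >= min_pident and the alignment covers at least
    min_cov of the shorter of the two sequences (illustrative thresholds).
    """
    present = set()
    for row in csv.reader(io.StringIO(hits_tsv), delimiter="\t"):
        if not row:
            continue
        _qseqid, sseqid, pident, length, qlen, slen = row[:6]
        cov = int(length) / min(int(qlen), int(slen))
        if float(pident) >= min_pident and cov >= min_cov and sseqid in gene_to_family:
            present.add(gene_to_family[sseqid])
    return present


def presence_vector(present, all_families):
    """Encode the isolate as a 0/1 profile over the pangenome's gene families."""
    return {fam: int(fam in present) for fam in sorted(all_families)}


if __name__ == "__main__":
    # Hypothetical hits and gene-to-family mapping, for illustration only.
    hits = (
        "orf1\tg1\t99.1\t300\t300\t303\n"
        "orf2\tg2\t85.0\t290\t300\t300\n"
        "orf3\tg3\t97.0\t100\t300\t300\n"
    )
    g2f = {"g1": "FAM_A", "g2": "FAM_B", "g3": "FAM_C"}
    present = families_from_orf_hits(hits, g2f)
    print(presence_vector(present, set(g2f.values())))
```

The resulting 0/1 vector could then sit next to the metagenomic samples as one more column of the presence/absence matrix. Counting a family as present from a single good hit is a deliberate simplification; paralogs or split ORFs may need extra handling.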
About the species that were not detected by PanPhlAn3 (presumably due to low coverage/abundance): this may be due to the tuning of the strain-detection parameters. We discussed this kind of issue in this topic:
If that still does not work, the last method you mentioned (“aligning every ORF to the PanPhlAn species database instead”) would probably be the better option.
In the topic linked above, I explain how the coverage threshold works and the kind of profile PanPhlAn expects from a natural microbiota sample. I’m not sure a fq file made from a single genome would be handled properly by PanPhlAn: in a real sample, housekeeping genes that are also present in other species are expected at very high coverage, while parts of the accessory genome should have low coverage compared with the rest of the species genome, whereas reads simulated from one genome cover all of its genes roughly evenly.
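To make the coverage argument concrete, here is a simplified Python sketch of a median-normalized coverage call. It is not PanPhlAn's actual algorithm: the function names and the 0.3 to 3.0 acceptance band are hypothetical. It only illustrates why a sample simulated from one genome looks unusual, since every gene family of the isolate ends up at a normalized coverage of about 1, with none of the high-coverage shared genes or low-coverage accessory genes seen in a natural sample.

```python
from statistics import median


def normalized_coverage(family_cov):
    """Divide each gene family's mean read coverage by the median over all families."""
    med = median(family_cov.values())
    if med == 0:
        return {fam: 0.0 for fam in family_cov}
    return {fam: cov / med for fam, cov in family_cov.items()}


def call_present(family_cov, low=0.3, high=3.0):
    """Illustrative call: keep families whose normalized coverage lies in [low, high].

    Families far above the median (e.g. housekeeping genes also recruiting reads
    from other species) or far below it (e.g. accessory genes of minor strains)
    are left out. The band is a hypothetical choice, not a PanPhlAn default.
    """
    norm = normalized_coverage(family_cov)
    return {fam for fam, ratio in norm.items() if low <= ratio <= high}


if __name__ == "__main__":
    # Reads simulated from one genome: flat coverage, every ratio is 1.0.
    simulated = {"famA": 3.0, "famB": 3.0, "famC": 3.0}
    # Toy metagenome-like profile with a shared housekeeping gene and a rare accessory gene.
    natural = {"housekeeping": 40.0, "core1": 10.0, "core2": 11.0, "core3": 9.0, "accessory": 2.0}
    print(sorted(call_present(simulated)))
    print(sorted(call_present(natural)))
```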