Include Genome Sequences in PanPhlAn3

Hi all,

I want to include assembled genomes in my PanPhlAn analysis to see the level of similarity between the metagenomic data and strains isolated from related samples.
I was planning to chop the assembled genome into simulated fastq reads, align these to the pre-made pangenome database, and include the outputs in the profiling as though they came from a metagenomic sample.
Do you think this is a suitable approach or is there a better method?

Thanks,
Calum

Hello,

This seems like a nice and relevant idea to me. I use MAGs, when available, to validate some PanPhlAn results.
The main thing to keep in mind is the limitations of MAGs compared to the full pangenome; have a look at these papers:

Anyway, it depends on which part of the PanPhlAn profile you focus on: core, extended core, accessory genome, rare genes… It also depends on the quality and completeness of your MAGs.

Best,
Léonard Dubois

Thanks for the reply @leonard.dubois

I had considered including MAGs recovered from the same samples but didn’t feel that they would contribute much due to the completeness issues you highlighted.

In this case, I was referring to genome-sequenced strains physically isolated in the lab from the same samples and from related samples (we have recovered isolates from fecal samples for species which were not detected by PanPhlAn3 [presumably due to low coverage/abundance]).

My first instinct was to use the method mentioned above (making a fq file from the genome using wgsim or similar), but I wanted to double-check that there wasn’t a more appropriate method (e.g. aligning every ORF to the PanPhlAn species database instead).
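
To make the "fq file from the genome" idea concrete, here is a minimal Python sketch of it. This is not wgsim and not part of PanPhlAn: it simply tiles each contig into fixed-length, error-free reads with a constant dummy quality. The function names, the `read_len` and `step` values, and the `sim_read` naming are all assumptions chosen for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {contig_name: sequence} dict."""
    contigs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in contigs.items()}


def simulate_reads(sequence, read_len=100, step=10, prefix="sim_read"):
    """Yield (name, read, quality) tuples by sliding a window over one contig.

    Assumption: reads are error-free and single-end, with a constant
    dummy quality ('I'). A contig shorter than read_len yields one
    read covering the whole contig.
    """
    last_start = max(len(sequence) - read_len, 0)
    for i, start in enumerate(range(0, last_start + 1, step)):
        read = sequence[start:start + read_len]
        yield f"{prefix}_{i}", read, "I" * len(read)


def to_fastq(records):
    """Format (name, read, quality) tuples as FASTQ text, 4 lines per read."""
    return "".join(f"@{n}\n{r}\n+\n{q}\n" for n, r, q in records)


def genome_to_fastq(fasta_text, read_len=100, step=10):
    """Tile every contig separately so that no read spans two contigs."""
    chunks = []
    for contig, seq in parse_fasta(fasta_text).items():
        chunks.append(to_fastq(simulate_reads(seq, read_len, step, prefix=contig)))
    return "".join(chunks)
```

With `read_len=100` and `step=10`, every interior position ends up covered by 10 reads, so the resulting depth is flat along the genome. A real read simulator such as wgsim would add sequencing errors and paired ends on top of this.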

(we have recovered isolates from fecal samples for species which were not detected by PanPhlAn3 [presumably due to low coverage/abundance]).

This may be down to the tuning of the strain detection parameters. We discussed this kind of issue in this topic:

If that still does not work, the last method you mentioned (“Eg. aligning every ORF to the PanPhlAn species database instead”)

In the topic linked above, I explain how the coverage threshold works and the kind of profile PanPhlAn expects from a natural microbiota sample. I’m not sure PanPhlAn will be able to handle a fq file made from the genome: in a natural sample, house-keeping genes that are also present in other species are expected at very high coverage, while parts of the accessory genome should have low coverage compared to the rest of the species genome.
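
As a rough illustration of the coverage concern, the sketch below computes per-gene mean depth when reads are tiled evenly across a genome. It does not reproduce PanPhlAn's actual coverage computation or thresholds; the gene names and coordinates are hypothetical and only serve to show that evenly simulated reads give every gene the same depth, unlike the uneven profile described above.

```python
def tiled_depth(genome_len, read_len, step):
    """Per-base depth when reads of read_len start every `step` bases."""
    depth = [0] * genome_len
    for start in range(0, genome_len - read_len + 1, step):
        for pos in range(start, start + read_len):
            depth[pos] += 1
    return depth


def mean_gene_depth(depth, genes):
    """Mean depth per gene; genes maps name -> (start, end), end exclusive."""
    return {name: sum(depth[s:e]) / (e - s) for name, (s, e) in genes.items()}


# Hypothetical gene coordinates, for illustration only.
genes = {
    "housekeeping_gene": (1000, 2000),
    "core_gene": (5000, 6000),
    "accessory_gene": (8000, 9000),
}

depth = tiled_depth(genome_len=10_000, read_len=100, step=10)
profile = mean_gene_depth(depth, genes)
for name, value in profile.items():
    print(f"{name}\t{value:.1f}")
```

All three genes come out at the same mean depth (read_len / step), so a simulated isolate sample produces a flat profile with none of the differences between house-keeping and accessory genes that a real metagenome would show.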