Include Genome Sequences in PanPhlAn3

cazzlewazzle89 · April 22, 2021, 7:26pm

Hi all,

I want to include assembled genomes in my PanPhlAn analysis to see the level of similarity between the metagenomic data and strains isolated from related samples.
I was planning to chop the assembled genome into simulated FASTQ reads, align these to the pre-made pangenomic database, and include the outputs in the profiling as though they came from a metagenomic sample.
Do you think this is a suitable approach or is there a better method?

Thanks,
Calum

leonard.dubois · April 26, 2021, 9:29am

Hello,

This seems like a nice and relevant idea to me. I use MAGs, when available, to validate some PanPhlAn results.
I think it is important to keep in mind the limits of using MAGs compared to the full pangenome. Have a look at these papers:

The Reliability of Metagenome-Assembled Genomes (MAGs) in Representing Natural Populations: Insights from Comparing MAGs against Isolate Genomes Derived from the Same Fecal Sample
Metagenome-assembled genome binning methods with short reads disproportionately fail for plasmids and genomic Islands

Anyway, it depends on which part of the PanPhlAn profile you focus on: core, extended core, accessory genome, rare genes, etc. It also depends on the quality and completeness of your MAGs.

Best,
Léonard Dubois

cazzlewazzle89 · April 26, 2021, 12:14pm

Thanks for the reply @leonard.dubois

I had considered including MAGs recovered from the same samples but didn’t feel that they would contribute much due to the completeness issues you highlighted.

In this case I was referring to genome-sequenced strains physically isolated in the lab from the same samples and from related samples (we have recovered isolates from fecal samples for species which were not detected by PanPhlAn3 [presumably due to low coverage/abundance]).

My first instinct was to use the method mentioned above (making a FASTQ file from the genome using wgsim or similar), but I wanted to double-check that there wasn’t a more appropriate method (e.g. aligning every ORF to the PanPhlAn species database instead).
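
For readers who want to see what "making a FASTQ file from the genome" amounts to, here is a minimal Python sketch. It is not wgsim and does not reproduce its error model: it samples error-free single-end reads uniformly from one sequence and gives every base the same placeholder quality. The function names (`simulate_reads`, `to_fastq`) and all parameter values are hypothetical, chosen only for illustration. A multi-contig assembly would need this applied to each contig.

```python
import random


def simulate_reads(genome, read_len=100, coverage=10, seed=0):
    """Yield (name, sequence) pairs of error-free reads.

    Assumption: reads are sampled uniformly along a single linear
    sequence, from either strand with equal probability.
    """
    if len(genome) < read_len:
        raise ValueError("sequence is shorter than the read length")
    rng = random.Random(seed)
    # Number of reads needed to reach the requested mean depth.
    n_reads = (len(genome) * coverage) // read_len
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    for i in range(n_reads):
        start = rng.randrange(0, len(genome) - read_len + 1)
        read = genome[start:start + read_len]
        if rng.random() < 0.5:
            # Reverse-complement to mimic reads from the other strand.
            read = read.translate(complement)[::-1]
        yield f"sim_read_{i}", read


def to_fastq(reads, quality_char="I"):
    """Format reads as FASTQ text with a constant placeholder quality."""
    records = []
    for name, seq in reads:
        records.append(f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n")
    return "".join(records)


if __name__ == "__main__":
    # Toy random sequence standing in for an isolate genome.
    rng = random.Random(1)
    toy_genome = "".join(rng.choice("ACGT") for _ in range(1000))
    fastq_text = to_fastq(simulate_reads(toy_genome))
    print(fastq_text.count("\n") // 4, "reads written")
```

Note that reads produced this way cover the genome at a roughly even depth, which matters for how the resulting profile is interpreted.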

leonard.dubois · April 26, 2021, 3:49pm

“(we have recovered isolates from fecal samples for species which were not detected by PanPhlAn3 [presumably due to low coverage/abundance]).”

This may be due to how the strain-detection parameters are tuned. We discussed this kind of issue in this topic:

If that still does not work, the last method you mentioned (“Eg. aligning every ORF to the PanPhlAn species database instead”) could be worth trying.

In the topic linked above, I explain how the coverage thresholds work and the kind of profile PanPhlAn expects from a natural microbiota sample. I’m not sure a FASTQ file made from the genome is something PanPhlAn can handle, as housekeeping genes present in other species are expected at very high coverage, while parts of the accessory genome should have low coverage compared to the rest of the species genome.
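
To make the contrast concrete, here is a toy Python sketch comparing the two kinds of coverage profile. It is not PanPhlAn's code and does not use its thresholds: the function `normalized_coverage_curve`, the gene names, and all coverage values are invented for illustration. It divides each gene's coverage by the median and sorts from high to low, so a metagenome-like sample shows a high head and a low tail, while reads simulated evenly from one genome give a flat line.

```python
from statistics import median


def normalized_coverage_curve(gene_coverage):
    """Return per-gene coverage divided by the median, sorted high to low."""
    values = sorted(gene_coverage.values(), reverse=True)
    med = median(values)
    if med == 0:
        raise ValueError("median coverage is zero; nothing to normalize")
    return [v / med for v in values]


# Made-up numbers: one conserved gene also hit by reads from other
# species, three typical genes, one poorly covered accessory gene.
metagenome_like = {
    "conserved_gene": 60.0,
    "gene_a": 11.0,
    "gene_b": 10.0,
    "gene_c": 9.0,
    "accessory_gene": 2.0,
}

# Reads simulated evenly from a single genome: every gene at the same depth.
simulated_isolate = {gene: 10.0 for gene in metagenome_like}

if __name__ == "__main__":
    print("metagenome-like:  ", normalized_coverage_curve(metagenome_like))
    print("simulated isolate:", normalized_coverage_curve(simulated_isolate))
```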
