Hi - I am analyzing DNA extracted from a stool sample. A portion of the DNA was used for 16S sequencing, and a portion of the same DNA sample was used for metagenomics. The 16S sequencing shows the phylum Firmicutes at ~50% relative abundance, which is reasonable for an intestinal microbiome. However, MetaPhlAn 3 reports only 0.42% relative abundance for Firmicutes. I'm aware there can be differences between 16S and metagenomic abundances, but a 100-fold difference seems extreme, particularly since both were generated from the same DNA. A Firmicutes abundance of 0.4% is also not biologically consistent with stool microbiomes. I ran MetaPhlAn 3 without changing any parameters besides the CPU count.
Anyone have any thoughts?
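A minimal Python sketch for pulling phylum-level numbers out of a MetaPhlAn profile and putting the 16S-vs-metagenome comparison on a common scale. It assumes the default MetaPhlAn 3 tab-separated output (comment lines starting with '#', clade name in the first column, relative abundance in percent in the third); the function names are hypothetical.

```python
import math


def phylum_abundances(profile_text):
    """Phylum-level relative abundances (%) from a MetaPhlAn-style profile.

    Assumes the MetaPhlAn 3 default layout: '#' comment lines, then
    tab-separated rows with the clade name in column 1 and the relative
    abundance in column 3.
    """
    result = {}
    for line in profile_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        ranks = fields[0].split("|")
        # Phylum rows have exactly two ranks, e.g. "k__Bacteria|p__Firmicutes".
        if len(ranks) == 2 and ranks[1].startswith("p__"):
            result[ranks[1][3:]] = float(fields[2])
    return result


def fold_difference(a, b):
    """Ratio of the larger to the smaller abundance (inf if one is zero)."""
    lo, hi = sorted((a, b))
    if lo <= 0:
        return math.inf
    return hi / lo
```

Pass it the text of the profile file (e.g. open(path).read()). With the numbers above, fold_difference(50.0, 0.42) comes out at about 119, i.e. roughly two orders of magnitude.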
Can you provide some basic stats for the sample (e.g. total size, mean/SD read length)?
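If it helps, something like the following (plain Python, standard library only) would produce those numbers for one FASTQ file. It assumes standard four-line FASTQ records, and the function name is hypothetical; for paired-end data you would run it on R1 and R2 separately.

```python
import gzip
import math


def fastq_read_stats(path):
    """Read count, total bases, and mean/SD read length for one FASTQ file.

    Assumes standard four-line records (header, sequence, '+', quality).
    Files ending in '.gz' are opened with gzip. Running sums are kept so
    the reads never have to be held in memory.
    """
    opener = gzip.open if path.endswith(".gz") else open
    n = total = total_sq = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the sequence
                length = len(line.rstrip("\r\n"))
                n += 1
                total += length
                total_sq += length * length
    mean = total / n if n else 0.0
    # Sample variance; the numerator is computed in exact integer arithmetic.
    var = (n * total_sq - total * total) / (n * (n - 1)) if n > 1 else 0.0
    return {"reads": n, "total_bases": total,
            "mean_length": mean, "sd_length": math.sqrt(var)}
```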
I gave a single example above, but I have ~50 paired 16S + metagenome samples representing mouse stool microbiomes. The samples were collected at different times from different groups of mice (i.e., biological sample diversity), and for each stool sample the 16S and metagenome sequencing libraries were built simultaneously from the same DNA extraction, although not all samples were built at the same time.
Sequencing depth for the metagenomes ranges from 35 to 45 million PE reads. Metagenome reads were quality-controlled and run through KneadData against the mouse genome to remove host reads, which accounted for at most ~1 million reads in the most host-contaminated sample. I don't know the 16S sequencing depth off the top of my head, but I identified ~900 OTUs with a cutoff of >9 reads per OTU, so it seemed comprehensive.
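For comparing against MetaPhlAn, it may help to be explicit about how the 16S phylum percentages are derived. A rough sketch, assuming a per-sample OTU count table and an OTU-to-phylum mapping held as plain dicts (both hypothetical structures), with the >9-reads cutoff applied within the sample before normalizing. Whether the cutoff was applied per sample or across the whole table is an assumption here.

```python
from collections import defaultdict


def phylum_relative_abundance(otu_counts, otu_phylum, min_reads=10):
    """Phylum-level relative abundance (%) for one 16S sample.

    otu_counts: OTU ID -> read count; otu_phylum: OTU ID -> phylum name.
    OTUs with fewer than min_reads reads are dropped first (min_reads=10
    mirrors a '>9 reads per OTU' cutoff). OTUs with no phylum assignment
    are pooled as 'Unassigned'.
    """
    kept = {otu: n for otu, n in otu_counts.items() if n >= min_reads}
    total = sum(kept.values())
    if total == 0:
        return {}
    by_phylum = defaultdict(int)
    for otu, n in kept.items():
        by_phylum[otu_phylum.get(otu, "Unassigned")] += n
    return {phylum: 100.0 * n / total for phylum, n in by_phylum.items()}
```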
The Firmicutes discrepancy is consistent across all 50 samples. The 16S relative abundance of Firmicutes ranges from ~20-50% across samples, while the MetaPhlAn 3 relative abundance ranges from basically 0 to 1.2%. I've also run ~5 samples without (i.e., prior to) the KneadData removal of mouse reads and still get nearly identical results.
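To check that the gap is consistent rather than driven by a few samples, the paired values can be put on a log scale per sample. A minimal sketch, assuming each method's Firmicutes percentages are stored in a dict keyed by (hypothetical) sample ID; the default pseudocount of 0.01 is an arbitrary choice to keep near-zero MetaPhlAn values finite.

```python
import math
import statistics


def log10_discrepancy(amplicon, shotgun, pseudocount=0.01):
    """Per-sample log10(16S % / MetaPhlAn %) for samples present in both dicts."""
    shared = sorted(set(amplicon) & set(shotgun))
    return {
        s: math.log10((amplicon[s] + pseudocount) / (shotgun[s] + pseudocount))
        for s in shared
    }


def summarize(ratios):
    """Min, median and max of the per-sample log10 ratios."""
    values = list(ratios.values())
    return min(values), statistics.median(values), max(values)
```

A log10 ratio of 2 corresponds to a 100-fold gap. Samples where MetaPhlAn reports ~0% will give larger ratios, and those values depend strongly on the pseudocount.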
When I searched for others with this issue, I found a paper reporting something similar: MetaPhlAn did not call Firmicutes among the top five phyla, while 16S ranked Firmicutes as the top phylum in the same samples. I don't necessarily need feedback on that paper; I'm just bringing it up in case it's helpful.
Let me know if you have any thoughts!
I see you are trying to analyze mouse microbiomes. MetaPhlAn 3 includes more mouse-associated species than earlier versions, but the discrepancy is probably due to species that are still missing from its database. The paper you mentioned used the first version of MetaPhlAn, which could profile fewer species than MetaPhlAn 2 and MetaPhlAn 3.
You could try relaxing some parameters, such as setting stat_q to 0.1, and allowing more alignments to be considered via min_mapq_val (the default is 5; to ignore it completely, set it to -1).
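For reference, a run with both relaxed settings might be assembled as below. This is a sketch: the read and output file names are placeholders, paired reads are passed as one comma-separated input, and apart from stat_q and min_mapq_val the flags (--input_type, --nproc, --bowtie2out, -o) are standard MetaPhlAn 3 options worth double-checking against metaphlan -h for the installed version. The command is only built and printed, not executed.

```python
import shlex


def metaphlan_command(r1, r2, out, nproc=8, stat_q=0.1, min_mapq_val=-1):
    """Build (but do not run) a MetaPhlAn 3 command with relaxed filtering.

    r1, r2 and out are placeholder paths. stat_q=0.1 and min_mapq_val=-1
    follow the suggestion above (-1 ignores the MAPQ filter completely).
    """
    bowtie2_out = out.rsplit(".", 1)[0] + ".bowtie2.bz2"  # hypothetical naming
    return [
        "metaphlan", f"{r1},{r2}",
        "--input_type", "fastq",
        "--nproc", str(nproc),
        "--stat_q", str(stat_q),
        "--min_mapq_val", str(min_mapq_val),
        "--bowtie2out", bowtie2_out,
        "-o", out,
    ]


if __name__ == "__main__":
    cmd = metaphlan_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_profile.txt")
    # To execute instead of printing: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```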
Thanks for the follow-up. I tried those relaxed settings on one sample as a test, and the Firmicutes relative abundance did not budge. Perhaps I'll test some other metagenomic profiling pipelines to see how well they capture the Firmicutes populations.