Over 80% of reads unclassified in mouse fecal samples using MetaPhlAn 3

Hi, we did shotgun metagenomic sequencing on mouse fecal samples and 2 mock community control samples, each a mixture of 20 known strains. We did quality control using KneadData v0.10.0 (trimming plus decontamination to remove mouse host sequences), and over 85% of reads were retained after QC for each sample.

Then we did taxonomic profiling using MetaPhlAn version 3.0.14 (19 Jan 2022).

We added the --unknown_estimation option when running MetaPhlAn to check the percentage of reads classified. All the mouse fecal samples had over 80% unknown reads, and the classified reads were all assigned to Bacteria. The mock community control samples had only 2% unknown reads.

Is this normal when profiling the mouse gut metagenome with MetaPhlAn? We sequenced these samples quite deeply, and we were surprised to see that the majority of reads were unclassified.

Thank you so much for your help!


I am curious about this as well.

Could it be that the mouse gut is not well represented in the database? Is there any plan to do for the mouse microbiome what was done for the human microbiome in Pasolli et al. (http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html; "Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle", Cell)?

Hi @Fangxi_Xu and @Scott
Indeed, the mouse gut microbiome is underrepresented in MetaPhlAn 3: most of the species present have not been isolated so far and are therefore absent from the reference genome databases. I suggest moving to version 4 (https://www.biorxiv.org/content/10.1101/2022.08.22.504593v1), in which we included information from metagenome-assembled genomes to improve the mappability of poorly characterized environments such as the mouse gut.

Thank you! @aitor.blancomiguez

We tried MetaPhlAn version 4.0.4 (17 Jan 2023) and it classified the reads much better. Only 3% to 20% of reads were unclassified now.
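
For anyone who wants to compare the unclassified fraction across samples, here is a minimal Python sketch that reads it from a profile. It rests on some assumptions: the profile is tab-separated with `#` header lines, the relative abundance is in the third column, and the unclassified row is labeled `UNKNOWN` (MetaPhlAn 3 with --unknown_estimation) or `UNCLASSIFIED` (MetaPhlAn 4). The function name and the example profile are hypothetical and the numbers are made up, so please check the layout against your own output files.

```python
from typing import Optional

# Row labels assumed for the unclassified fraction; adjust if your files differ.
UNCLASSIFIED_LABELS = {"UNKNOWN", "UNCLASSIFIED"}


def unclassified_percent(profile_text: str) -> Optional[float]:
    """Return the relative abundance (%) of the unclassified row, or None if absent."""
    for line in profile_text.splitlines():
        # Skip blank lines and '#' header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumed layout: clade name, taxonomy ID, relative abundance, ...
        if fields[0] in UNCLASSIFIED_LABELS and len(fields) >= 3:
            return float(fields[2])
    return None


# Made-up example profile, for illustration only (not real results).
example = "\n".join([
    "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species",
    "UNKNOWN\t-1\t82.5\t",
    "k__Bacteria\t2\t17.5\t",
])
print(unclassified_percent(example))
```

To use it on a real file, read the file's contents into a string and pass that in. A return value of None means no unclassified row was found, for example because the estimation option was not enabled for that run.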