Including larger taxonomy levels?


I’m not quite sure how to frame this question, but I have paired metagenomic, 16S, and metatranscriptomic mouse stool microbiome samples. I used KneadData to remove mouse genome reads from my meta-omics samples and then analyzed my 16S and metatranscriptomic data using other software. My 16S data contained >1000 unique bacterial taxa (counting all taxonomic levels), and my metatranscriptome had reads mapping to essentially all of those taxa at varying abundances, which validated the diversity of the sample.

I wanted to switch over to HUMAnN 3 so that I have one consistent pipeline for both my metagenomics and my metatranscriptomics. I just ran HUMAnN 3 (fresh conda install, updated databases) on one metagenome sample. The sample had ~40 million reads after KneadData removed the mouse genome; rRNA was physically removed during library building, but I didn’t run KneadData against the rRNA library. The MetaPhlAn results said that 98% of my mapped reads (~14 million mapped, ~26 million unmapped) came from one particular bacterial species, and it found 7 other species that made up the other 2% of mapped reads.
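For anyone who wants to check the same thing on their own output, here is a minimal sketch of how the species-level rows could be pulled out of a MetaPhlAn-style profile. It assumes a tab-separated file with `#` comment lines, a `|`-delimited lineage in the first column, and the relative abundance (in percent) in the third column; the column position may differ between MetaPhlAn versions. The function name `species_rows`, the `abundance_col` parameter, and the toy taxa and values are all hypothetical and are not my real data.

```python
import io


def species_rows(profile_text, abundance_col=2):
    """Return (species_name, relative_abundance) pairs, most abundant first.

    Assumptions (not verified against any specific MetaPhlAn release):
    - rows are tab-separated and comment/header lines start with '#'
    - column 0 holds a '|'-delimited lineage such as 'k__...|p__...|s__...'
    - column `abundance_col` holds the relative abundance in percent
    """
    rows = []
    for line in io.StringIO(profile_text):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # The last lineage element tells us the rank of this row.
        leaf = fields[0].split("|")[-1]
        if leaf.startswith("s__"):
            rows.append((leaf[3:], float(fields[abundance_col])))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Toy profile with made-up taxa and values, only to exercise the function.
TOY_PROFILE = "\n".join([
    "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species",
    "k__Bacteria\t2\t100.0\t",
    "k__Bacteria|p__Phylum_A\t\t98.0\t",
    "k__Bacteria|p__Phylum_A|s__Species_one\t\t98.0\t",
    "k__Bacteria|p__Phylum_B\t\t2.0\t",
    "k__Bacteria|p__Phylum_B|s__Species_two\t\t2.0\t",
])

species = species_rows(TOY_PROFILE)
for name, abundance in species:
    print(f"{name}\t{abundance:.1f}%")
```

Counting only rows whose last lineage element starts with `s__` avoids double-counting the same abundance at the kingdom, phylum, and other higher levels, which all appear as separate rows in the same file.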

This seems inconsistent with the paired 16S and metatranscriptomic data I have, so I’m not sure what to make of it. I’m likely misunderstanding how MetaPhlAn works, or some other part of the process. Does it make sense to continue with this taxonomy file as the input for my metatranscriptomics, given that I already know the diversity of the sample is significantly higher?