Hi,
I am quite confused by the microbial profiling results given by MetaPhlAn 3 and Kraken2.
I used mock community metagenome data (SRR8073716) to check profiling accuracy. The run is deposited at:
https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8073716
The following result was produced by MetaPhlAn 3. It shows the genus Muricauda at 73.2%, while Halomonas is at only 0.035%:
➜ STEP3_METAPHLAN3 grep g_ SRR8073716.metaphlan.txt | grep -v s_ | cut -f1,3
k__Bacteria|p__Bacteroidetes|c__Flavobacteriia|o__Flavobacteriales|f__Flavobacteriaceae|g__Muricauda 73.22683
k__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhizobiales|f__Cohaesibacteraceae|g__Cohaesibacter 19.97918
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micromonosporales|f__Micromonosporaceae|g__Micromonospora 3.88489
k__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhodobacterales|f__Rhodobacteraceae|g__Thioclava 2.87363
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Oceanospirillales|f__Halomonadaceae|g__Halomonas 0.03547
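The `grep g_ | grep -v s_` filter above works on this output, but it relies on substring matches. A slightly stricter sketch keeps only rows whose deepest rank in the lineage is a genus. It assumes the default tab-separated MetaPhlAn 3 layout (clade_name, NCBI_tax_id, relative_abundance, additional_species), where the third column is already a percentage; the function name `metaphlan_genera` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print genus-level rows of a MetaPhlAn 3 profile as
# "genus<TAB>relative_abundance". Assumes the default tab-separated layout
# (clade_name, NCBI_tax_id, relative_abundance, additional_species).
metaphlan_genera() {
  awk -F'\t' '
    /^#/ { next }                      # skip header/comment lines
    {
      n = split($1, ranks, "|")        # split the lineage on "|"
      if (ranks[n] ~ /^g__/) {         # keep rows whose deepest rank is a genus
        sub(/^g__/, "", ranks[n])      # drop the "g__" prefix
        print ranks[n] "\t" $3         # relative abundance, in percent
      }
    }' "$1"
}
```

Usage: `metaphlan_genera SRR8073716.metaphlan.txt`. Species (`s__`) and strain (`t__`) rows are skipped because their deepest rank is not `g__`.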
However, Kraken2 gave the following genus-level profile, using a 0.0005 abundance threshold, in which Halomonas accounts for 47%:
Halomonas 0.472506522079523
Marinobacter 0.227462452145326
Psychrobacter 0.0559655089494823
Acinetobacter 0.00153809705484573
Moraxella 0.000736164818070423
Pseudomonas 0.00158863853633053
Vibrio 0.000601862188834009
Cohaesibacter 0.090182141229469
Thioclava 0.0213003156275142
Muricauda 0.067132217471038
Maribacter 0.00272361195132018
Flavobacterium 0.00219803972160995
Winogradskyella 0.0012892608573708
Cellulophaga 0.0010683676450856
Flagellimonas 0.000789178202723343
Polaribacter 0.000737527138769062
Arenibacter 0.000643674954892236
Aquimarina 0.000534621491183538
Zobellia 0.00052313114827737
Sediminicola 0.00050173469965757
Micromonospora 0.024416103338325
Streptomyces 0.000887906385118827
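For reference, here is one way a thresholded genus table like this could be produced from a standard Kraken2 report. It is a sketch, not necessarily how the numbers above were computed. It assumes the default six-column report (percentage, clade reads, direct reads, rank code, taxid, indented name), counts only rows with rank code `G`, and divides each genus's clade reads by the sum over all genus rows before applying the threshold. The function name `kraken2_genera` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: genus-level fractions from a standard Kraken2 report.
# Assumed columns: percent, clade_reads, direct_reads, rank_code, taxid, name.
# Normalisation (an assumption): clade reads / sum of clade reads over all
# rank-"G" rows; genera below the threshold (default 0.0005) are dropped.
kraken2_genera() {
  local report=$1 threshold=${2:-0.0005}
  awk -F'\t' -v thr="$threshold" '
    $4 == "G" {
      name = $6
      sub(/^ +/, "", name)             # strip the indentation that encodes depth
      reads[name] = $2                 # reads in this genus clade
      order[++n] = name                # remember report order
      total += $2
    }
    END {
      if (total == 0) exit
      for (i = 1; i <= n; i++) {
        frac = reads[order[i]] / total
        if (frac >= thr) printf "%s\t%.6f\n", order[i], frac
      }
    }' "$report"
}
```

Usage: `kraken2_genera SRR8073716.kraken2.report 0.0005` (the report file name is hypothetical). Because the denominator here includes only genus-assigned reads, unclassified reads and reads assigned above genus level are left out, so the choice of denominator alone can shift these fractions.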
Compared with the taxonomy analysis on the SRA page (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8073716), the Kraken2 result seems closer to the NCBI result, although I know the NCBI result is based on the NCBI taxonomy database (which contained 48,180 taxonomy nodes in January 2017) and thus provides a more stable result.
I used MetaPhlAn 2 a lot in gut microbiome research, and I am now updating my profiling workflow by comparing Kraken2, MetaPhlAn 2 and MetaPhlAn 3. The thread "MetaPhlAn 3 versus 2 -- different results" describes what I think is a similar problem, with a high abundance of Flavobacteriia that might be wrong.
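To put a single number on how far apart two genus profiles are (for example MetaPhlAn 3 versus Kraken2), a Bray-Curtis dissimilarity can be computed after rescaling each profile to sum to 1, which also reconciles MetaPhlAn's percentages with Kraken2's fractions. This is a minimal sketch assuming two-column TSV inputs (genus, abundance); `bray_curtis` is a hypothetical helper name, and genera missing from one file are counted as zero there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Bray-Curtis dissimilarity between two genus profiles.
# Each input is a TSV "genus<TAB>abundance"; each profile is rescaled to sum
# to 1 first, so percentages and fractions can be compared directly.
bray_curtis() {
  awk -F'\t' '
    FNR == 1 { f++ }                   # f = 1 for the first file, 2 for the second
    { a[f, $1] = $2; sum[f] += $2; g[$1] = 1 }
    END {
      for (k in g) {                   # union of genera from both files
        x = a[1, k] / sum[1]           # absent genus -> 0
        y = a[2, k] / sum[2]
        num += (x > y ? x - y : y - x)
        den += x + y
      }
      printf "%.4f\n", num / den       # 0 = identical, 1 = no shared genera
    }' "$1" "$2"
}
```

Usage: `bray_curtis metaphlan3_genus.tsv kraken2_genus.tsv` (file names hypothetical). Rescaling over the listed genera only ignores unclassified reads, so the value reflects differences in composition rather than in classification rate.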
Generally speaking, I feel that MetaPhlAn 2 is stable and well suited to gut microbiome research (as many published papers show), while the Kraken2 database contains more entries and therefore reports more species.
Any suggestions would be appreciated!
Thank you in advance!