MetaPhlAn target microbiomes

Hi, all

I’m using HUMAnN/MetaPhlAn version 3.

I wonder whether MetaPhlAn can only detect certain kinds of species (such as human gut species), or whether it is expected to work well for any environment?

This question came up after I ran the pipeline on the mock community BMock12 (article: “Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies”), which contains the species listed below.

However, MetaPhlAn did not find all of the species, and the abundance distribution was drastically off compared to the results in the article’s supplementary material.

image

So the question is: is this community simply not well suited to MetaPhlAn (because MetaPhlAn only works with other communities, such as “human gut”), or did MetaPhlAn itself perform poorly on this example?

Hi @biojack
Thanks for getting in touch. I think the main problem here is that the MetaPhlAn 3 database lacks reference genomes for most of the species in the mock community. This is the main reason why the results look so different from what you would expect.

Hi @aitor.blancomiguez

Thank you for answering; that is probably what I expected most.

But I would like to clarify a couple of things:

  1. Have I understood correctly that Halomonas bacteria are among the MetaPhlAn reference genomes (since I see them in the results)? The problem is that they account for only 0.01%, whereas according to the article’s results the abundance should be much higher (similar to Muricauda, or even more). So why is the percentage so small?

  2. Is the lack of reference genomes related to a particular type of bacteria (for example, missing for “sea” bacteria but present for “gut” bacteria)? Or is it related to something else?

Hi @biojack
Answering your questions:

  1. There are Halomonas reference genomes in the MetaPhlAn database, but not for the species that are in the mock community. This is why the relative abundances do not correspond to the expected values. You can find the species available in the mpa3.0 db here: http://cmprod1.cibio.unitn.it/biobakery3/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
  2. The lack of reference genomes is not associated with the environment, but with taxonomy. During the construction of the database, some species that we defined as low-quality were not taken into consideration. For more details you can check the “Data Retrieval” section of the bioBakery 3 paper: Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3 | eLife
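
As a rough way to check which species of a given genus are covered, here is a minimal Python sketch that scans a locally downloaded copy of the marker info file. It assumes only that species appear in the file as tokens of the form `s__Genus_species`; the exact file layout is not relied upon, and the function names `species_by_genus` and `load_species` are made up for this example.

```python
import bz2
import re
from collections import defaultdict

# Assumption: species show up in the file as tokens such as
# "s__Halomonas_elongata", terminated by whitespace, a quote, "|", "," or "}".
SPECIES_TOKEN = re.compile(r"s__([^\s|'\",}]+)")


def species_by_genus(lines):
    """Collect the species tokens found in the given lines, grouped by genus."""
    found = defaultdict(set)
    for line in lines:
        for name in SPECIES_TOKEN.findall(line):
            # The genus is taken as the part before the first underscore.
            genus = name.split("_", 1)[0]
            found[genus].add(name)
    return found


def load_species(path):
    """Read a bz2-compressed marker info file and group its species by genus."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return species_by_genus(handle)
```

For example, `sorted(load_species("mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2")["Halomonas"])` would list the Halomonas species tokens found in the file, which can then be compared by eye with the strains in the mock community. A genus that is missing from the result (for instance, if `"Muricauda" in result` is false) has no species-level markers in that file under this assumption.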