I recently ran the biobakery_workflows pipeline on my shallow shotgun data. However, the resulting TSV table contained only 274 species and lacked the first line that names the reference marker gene database used by MetaPhlAn. The metaphlan3 tutorial states that ~1.1M unique clade-specific marker genes were identified from ~100k reference genomes (~99,500 bacterial and archaeal and ~500 eukaryotic) (metaphlan3 · biobakery/biobakery Wiki · GitHub), while the metaphlan3 website says that the unique clade-specific marker genes were identified from ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic) (MetaPhlAn3 – The Huttenhower Lab).
How many reference genomes were actually used?
How can I check which reference database was used? Could the missing database line be due to the removal of intermediate files?
What could be the reason that only 274 species were detected? Should I adjust any additional MetaPhlAn parameters?
Thank you in advance for your help!
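For anyone wanting to run the same check on their own table, here is a minimal Python sketch that reports any leading `#` lines (where the database name would normally appear) and counts species-level rows. It assumes a MetaPhlAn-style layout: tab-separated, clade names in the first column, taxonomic levels joined by `|`, and species rows ending in an `s__` level. The function name `summarize_profile` and the example rows are hypothetical and only for illustration.

```python
import io


def summarize_profile(handle):
    """Return ('#' header lines, set of species-level clades) from a MetaPhlAn-style TSV.

    Assumption: clade names sit in the first tab-separated column and a
    species-level row is one whose last '|'-separated level starts with 's__'.
    """
    header_lines = []
    species = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Comment/header lines; the database name is expected here if present.
            header_lines.append(line)
            continue
        clade = line.split("\t", 1)[0]
        last_level = clade.split("|")[-1]
        if last_level.startswith("s__"):
            species.add(last_level)
    return header_lines, species


# Toy table (made-up rows) standing in for a real profile file.
example = io.StringIO(
    "#mpa_v30_CHOCOPhlAn_201901\n"
    "clade_name\tsample_1\n"
    "k__Bacteria\t100.0\n"
    "k__Bacteria|p__Firmicutes\t100.0\n"
    "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|"
    "f__Streptococcaceae|g__Streptococcus|s__Streptococcus_mitis\t100.0\n"
)
headers, species = summarize_profile(example)
print(headers[0] if headers else "no '#' header line found")
print(len(species), "species-level rows")
```

On a real file, replace the `io.StringIO` object with `open("your_table.tsv")`; the file name is a placeholder.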
Sorry for the confusion: the MetaPhlAn marker genes were identified from ~100k reference genomes and span ~17,000 microbial species. We don’t have a list of the GCAs used available, but it can be reconstructed by following the description of the data retrieval in our latest paper: Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3 | eLife
Regarding the species detected, you can get more sensitive profiling by adjusting the --stat_q parameter. Please have a look at this post: Fungal signatures - #2 by aitor.blancomiguez
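As a rough illustration of the --stat_q suggestion above, the following Python sketch assembles a standalone MetaPhlAn command line with that parameter set. The helper `metaphlan_command`, the file names, and the value 0.1 are assumptions chosen for the example; the linked post is the place to look for which value suits a given dataset. The sketch only builds and prints the command, it does not run MetaPhlAn.

```python
import shlex


def metaphlan_command(fastq, output, stat_q=0.1, nproc=4):
    """Build an argument list for a MetaPhlAn run with a custom --stat_q.

    Hypothetical helper: the default values here are illustrative only.
    """
    if not 0.0 <= stat_q <= 1.0:
        raise ValueError("stat_q is a quantile and must lie between 0 and 1")
    return [
        "metaphlan", fastq,
        "--input_type", "fastq",
        "--stat_q", str(stat_q),
        "--nproc", str(nproc),
        "-o", output,
    ]


# Placeholder file names; substitute your own reads and output path.
cmd = metaphlan_command("sample.fastq.gz", "sample_profile.tsv", stat_q=0.1)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once MetaPhlAn is installed and on the PATH.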
Thank you for your answer. This clarifies a lot. However, I have an additional question. I read in the paper you mentioned that only selected genera of micro-eukaryotes were included in the database. When checking the list given in the post you mentioned (http://cmprod1.cibio.unitn.it/biobakery3/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2), I do not see the fungal genus that I’m also interested in, Debaryomyces.
Is it possible to add more micro-eukaryotic species to this database ourselves and use it within the bioBakery 3 workflows? If so, how?
Unfortunately, there is no code or tutorial available for generating a custom MetaPhlAn database. However, the method is well described in our latest manuscript, in case you want to try implementing it: Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3 | eLife
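For readers who want to repeat the genus check on the marker info file mentioned earlier in the thread, here is a minimal Python sketch that counts the lines of a bz2-compressed text file mentioning a given genus. It assumes only that the taxonomy appears somewhere on each line with the genus written as `g__<Genus>`; the exact line layout of the real file is not relied upon. The function `count_markers_for_genus` and the toy entries are hypothetical.

```python
import bz2
import os
import re
import tempfile


def count_markers_for_genus(path, genus):
    """Count lines in a bz2-compressed text file that mention 'g__<genus>'."""
    # The lookahead avoids matching a longer genus name sharing the same prefix.
    pattern = re.compile(r"g__" + re.escape(genus) + r"(?![A-Za-z])")
    hits = 0
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if pattern.search(line):
                hits += 1
    return hits


# Toy file with made-up entries; the real marker info layout may differ.
toy = (
    "toy_marker_1\t{'taxon': 'k__Eukaryota|g__Saccharomyces|s__Saccharomyces_cerevisiae'}\n"
    "toy_marker_2\t{'taxon': 'k__Bacteria|g__Escherichia|s__Escherichia_coli'}\n"
)
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_marker_info.txt.bz2")
    with bz2.open(toy_path, "wt", encoding="utf-8") as out:
        out.write(toy)
    found = count_markers_for_genus(toy_path, "Saccharomyces")
    missing = count_markers_for_genus(toy_path, "Debaryomyces")

print("Saccharomyces lines:", found)
print("Debaryomyces lines:", missing)
```

To use it on the real database, download the marker info file and pass its local path instead of the toy file.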