Are the HUMAnN 3/ChocoPhlAn and MetaPhlAn 3 databases compatible?


I am analysing data in which one known taxon is not included in ChocoPhlAn v201901; luckily, it is included in the newer release v201901b.
I am a bit confused about the compatibility of HUMAnN and MetaPhlAn. HUMAnN also uses MetaPhlAn to analyse the taxonomic composition; however, the taxon that I was missing in v201901 is not in the latest release of the MetaPhlAn suite (mpa_v30_CHOCOPhlAn_201901). How will this affect the analysis? Is it possible to obtain a MetaPhlAn database which is based on v201901b?

MetaPhlAn version 3.0.14 (19 Jan 2022)
humann v3.0.1

Thanks in advance.

Best wishes

I’m unfortunately in the same boat. I’m confused by these databases.

When I run humann_databases --download chocophlan full <database folder>, does it also install the database that MetaPhlAn (which is integrated into HUMAnN 3) needs to run? Or should I still install the MetaPhlAn database separately using metaphlan --install --bowtie2db <database folder> and point to it when running HUMAnN using --metaphlan-options? Both commands download and install a database called ChocoPhlAn, but different versions of it. Did I maybe miss the documentation explaining all of this? It’s all pretty confusing.

@makrez did you find a solution or figure out how this all works?

Replying first to the post above from @makrez which we missed (apologies!): the early releases of bioBakery 3 had some species that could be profiled by MetaPhlAn but which were missing pangenomes for HUMAnN or vice versa (a result of challenges to solve with the new marker/pangenome export process). We continued this refinement and released bioBakery 3.1 as a further improvement to both MetaPhlAn and HUMAnN and their databases. HUMAnN has since been upgraded to HUMAnN 3.5 (which is a critical update for an unrelated reason); you can still use HUMAnN 3.5 with MetaPhlAn 3.1.

To your question @MalbertR: MetaPhlAn will automatically download and index its database the first time it is run. It can be useful to try a demo run of MetaPhlAn first to force this behavior. HUMAnN is architected a little differently and requires you to use the humann_databases script to download the databases you want to use. This is partly because the HUMAnN databases are so large that we would not want to download them automatically, whereas the MetaPhlAn database is smaller.
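As a rough sketch of the two setup steps described above, the commands quoted earlier in this thread could be wrapped as follows. The folder layout under DB_ROOT and the DRY_RUN switch are assumptions made for illustration and are not part of either tool. By default the script only prints the commands, so it can be checked before anything large is downloaded.

```shell
# Sketch only: DB_ROOT and DRY_RUN are hypothetical helpers, not tool options.
set -eu

DB_ROOT="${DB_ROOT:-./biobakery_dbs}"   # assumption: any writable folder
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print the commands only, 0 = run them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# HUMAnN: the pangenome database is large and must be downloaded manually.
run humann_databases --download chocophlan full "$DB_ROOT/humann"

# MetaPhlAn: trigger the otherwise automatic first-run download up front,
# into a folder you choose.
run metaphlan --install --bowtie2db "$DB_ROOT/metaphlan"
```

Running it with DRY_RUN=0 would perform the actual downloads, assuming both tools are installed and on the PATH.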

@franzosa Thank you very much for your reply. Appreciate it.

A question that remains for me: when using humann_databases to install the necessary databases, is it correct to assume that this will also take care of the MetaPhlAn database, since MetaPhlAn can also be run via HUMAnN? Or should I still have MetaPhlAn install its own database separately and refer to it through --metaphlan-options when running the HUMAnN pipeline (so including taxonomic profiling)?

Oh, and just in case (even though it should not matter), I’m trying to run HUMAnN with your Docker container(s).

MetaPhlAn will automatically download an appropriate database the first time it is run, so it is slightly different from HUMAnN’s manual downloads in that way.
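If the MetaPhlAn database was installed into a folder of your own choosing rather than the default location, one way to pass that folder through HUMAnN is the --metaphlan-options route mentioned in the question. The sketch below assumes hypothetical file and folder names (INPUT, OUTPUT, MPA_DB) and a hypothetical DRY_RUN switch; by default it only prints the command.

```shell
# Sketch only: INPUT, OUTPUT, MPA_DB and DRY_RUN are illustrative assumptions.
set -eu

MPA_DB="${MPA_DB:-./biobakery_dbs/metaphlan}"   # folder given to metaphlan --bowtie2db
INPUT="${INPUT:-sample.fastq.gz}"               # hypothetical input reads
OUTPUT="${OUTPUT:-humann_out}"                  # hypothetical output folder
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print only, 0 = run HUMAnN

run_humann() {
    # $1 = input reads, $2 = output folder, $3 = MetaPhlAn database folder.
    # The MetaPhlAn flags travel as ONE quoted argument to --metaphlan-options.
    set -- humann --input "$1" --output "$2" --metaphlan-options "--bowtie2db $3"
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

run_humann "$INPUT" "$OUTPUT" "$MPA_DB"
```

If MetaPhlAn was left to download its database to the default location, the --metaphlan-options argument should not be needed.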