Utterly confused - PhyloPhlAn 2 -> PhyloPhlAn 3 - how to characterise MAGs


First of all, can I just say thank you for providing and supporting amazing tools to the community, we could not function without them. Thank you!

Secondly, we have a workflow MAGpy that uses PhyloPhlAn 2 and I want to start using PhyloPhlAn 3, and I am utterly confused.

We do two tasks:

  1. Ask PhyloPhlAn to guess the taxonomy of each of the MAGs (phylophlan.py -u my_mags --nproc 16). I quite liked the tabular output, is it possible to reproduce this?
  2. Ask PhyloPhlAn to put those MAGs in the tree of life (phylophlan.py -i -t my_mags --nproc 16)

This was pretty simple and I am trying to figure out how to do this in PhyloPhlAn 3. I can’t.

IMPORTANT to note that our MAGs are almost never human MAGs, so using the database from the Segata Cell paper is not useful.

Thirdly, we are installing from CONDA, and when I list databases, it gives me none. Where are the databases? The tutorials mention the phylophlan database (-d phylophlan) but if I try running with this option I get an error. Where is this database? How do I get it? Is there such a thing as an equivalent core protein database that PhyloPhlAn 2 used for PhyloPhlAn 3?

Fourthly, when I run phylophlan_metagenomic it says that mash is not installed. So mash should be added to the conda recipe, no?

Please help!

It might be worth writing some documentation that tells people how to do things in PhyloPhlAn 3 that they used to do in PhyloPhlAn 2. Especially as PhyloPhlAn 2 is hard to install now.

Again, many many thanks for doing what you do. I don’t want to appear ungrateful. I am not ungrateful, but I am lost

Thank you


Hello @BioMickWatson, I’m not sure why the forum says this message was posted ~8 months ago when I only received the notification by email yesterday… I’m really sorry we didn’t answer your message earlier.

Thank you!

I know that MAGpy was using the two features you mentioned from PhyloPhlAn 2. Unfortunately, both were removed and are not available in PhyloPhlAn 3, but that doesn’t mean it can’t be done.
The main reason we removed them is that with PhyloPhlAn 3 we wanted to give users the flexibility to choose their preferred set of tools and to have a larger panel of parameters and options to tune. In PhyloPhlAn 2, the integration into the tree of life relied on a specific configuration of tools and parameters, and on the MSA-merging function of MUSCLE, which during the development of PhyloPhlAn 3 we found was introducing biases and hence not reconstructing the right phylogenies. With PhyloPhlAn 3 we therefore don’t have a single reference tree of life for placing genomes and MAGs, because that would mean fixing one specific setting of external tools and parameters that we would pick for reconstructing it. It also means that every step except the mapping of the phylogenetic markers has to be redone from scratch, since we cannot rely on merging MSAs (which would also tie the user to a specific database).

However, this can be done within PhyloPhlAn 3; it just requires some initial time to set up your tree of life and then use it to place your MAGs. The steps would be to download your input set of reference genomes and reconstruct your tree of life, and then rebuild it with your genomes and/or MAGs added to the input folder (re-using the mapped markers for the reference genomes, provided you force all external tools and parameters to match those used to build the first tree of life). Similarly, when you download the set of reference genomes you can keep their taxonomic labels, and from the new phylogeny in which your genomes and/or MAGs are placed you can easily find the closest reference and propagate its taxonomic label. I know this might sound much more complicated than using PhyloPhlAn 2, but I’ll be happy to help you further if you want to explore this path for integrating PhyloPhlAn 3 within MAGpy.
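To make the label-propagation step concrete, here is a minimal sketch of one simple way to do it: for each MAG in the rebuilt tree, pick the reference leaf with the smallest patristic (along-branch) distance and copy its label. This is not something PhyloPhlAn does for you, and the tree, the leaf names (`ref_ecoli`, `mag_001`, ...) and the taxonomy strings are all made up for illustration. The tiny Newick parser only handles plain trees without quoted labels or comments.

```python
import re


def parse_newick(newick):
    """Parse a plain Newick string into {node_id: {"parent", "length", "name"}}."""
    nodes = {0: {"parent": None, "length": 0.0, "name": ""}}
    current = 0
    for token in re.findall(r"[(),;]|[^(),;]+", newick):
        token = token.strip()
        if not token:
            continue
        if token == ";":
            break
        if token in ("(", ","):
            # "(" opens a child of the current node; "," opens a sibling of it.
            parent = current if token == "(" else nodes[current]["parent"]
            current = len(nodes)
            nodes[current] = {"parent": parent, "length": 0.0, "name": ""}
        elif token == ")":
            current = nodes[current]["parent"]
        else:
            # A label such as "name:0.12", ":0.12" or "name".
            name, _, length = token.partition(":")
            nodes[current]["name"] = name.strip()
            nodes[current]["length"] = float(length) if length.strip() else 0.0
    return nodes


def root_path(nodes, node_id):
    """Return {ancestor_id: branch-length distance from node_id to that ancestor}."""
    path, dist = {}, 0.0
    while node_id is not None:
        path[node_id] = dist
        dist += nodes[node_id]["length"]
        node_id = nodes[node_id]["parent"]
    return path


def patristic(path_a, path_b):
    """Distance between two leaves: the cheapest route through a shared ancestor."""
    return min(path_a[n] + path_b[n] for n in path_a.keys() & path_b.keys())


def propagate_labels(newick, reference_taxonomy):
    """Map each unlabeled leaf to (closest reference, its label, distance)."""
    nodes = parse_newick(newick)
    internal = {n["parent"] for n in nodes.values()}
    leaves = {n["name"]: i for i, n in nodes.items() if i not in internal}
    paths = {name: root_path(nodes, i) for name, i in leaves.items()}
    refs = [name for name in leaves if name in reference_taxonomy]
    if not refs:
        raise ValueError("no reference leaves found in the tree")
    placements = {}
    for name in leaves:
        if name in reference_taxonomy:
            continue
        best = min(refs, key=lambda ref: patristic(paths[name], paths[ref]))
        placements[name] = (
            best,
            reference_taxonomy[best],
            patristic(paths[name], paths[best]),
        )
    return placements


# Hypothetical tree and labels, for illustration only.
tree = "((ref_ecoli:0.1,mag_001:0.2):0.3,(ref_bsub:0.4,mag_002:0.1):0.2);"
taxonomy = {
    "ref_ecoli": "k__Bacteria|p__Proteobacteria|s__Escherichia_coli",
    "ref_bsub": "k__Bacteria|p__Firmicutes|s__Bacillus_subtilis",
}
placements = propagate_labels(tree, taxonomy)
for mag, (ref, label, dist) in sorted(placements.items()):
    print(f"{mag}\t{ref}\t{label}\t{dist:.3f}")
```

This gives back a tab-separated table, one MAG per line. A large distance to the closest reference is a hint that the propagated label should only be trusted at a higher taxonomic rank.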

I just wanted to add that with phylophlan_metagenomic you can assign SGBs to your genomes and/or MAGs. You’re right that the initial database contained mainly genomes and MAGs from the human gut, but we are expanding it to also cover non-human microbial species, so I believe it could be very useful in the near future.

This is very strange. When you run PhyloPhlAn specifying the database -d phylophlan, it should be automatically downloaded if not present locally (either in the folder you specified with --databases_folder or in the default path within the conda environment).
Although this could be an old issue that is now fixed in recent versions? (Over the past year we moved the databases between several locations so that they can be downloaded from everywhere, as not everyone can download from Dropbox.)

This is indeed strange: Mash is in the conda recipe, so it should be installed when installing PhyloPhlAn. If you can provide the versions you are using, I can check whether it was a problem with an old conda package.
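If it helps with collecting that information, a small sketch like the following reports which executables are visible on the PATH of the active conda environment and what version they print. The tool names are taken from this thread; the assumption that each one accepts `--version` is mine, so adjust the flag if a tool complains.

```python
import shutil
import subprocess


def tool_report(tools=("phylophlan", "phylophlan_metagenomic", "mash")):
    """Return {tool: {"path": ..., "version": ...}} for each executable name."""
    report = {}
    for tool in tools:
        path = shutil.which(tool)  # None when the tool is not on PATH
        version = None
        if path is not None:
            try:
                # Assumption: the tool prints its version when given --version.
                proc = subprocess.run(
                    [path, "--version"],
                    capture_output=True,
                    text=True,
                    timeout=60,
                )
                version = (proc.stdout or proc.stderr).strip() or None
            except (OSError, subprocess.TimeoutExpired):
                version = None
        report[tool] = {"path": path, "version": version}
    return report


for tool, info in tool_report().items():
    print(f"{tool}\t{info['path'] or 'NOT FOUND'}\t{info['version'] or '-'}")
```

Run it inside the same conda environment you use for PhyloPhlAn; a `NOT FOUND` next to mash would confirm that the package was not pulled in by the installation.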

Sorry for the very long response, please let me know if anything is not clear.

Many thanks,


Thanks Francesco for this really detailed response :slight_smile: I have only just seen it, but will digest :slight_smile: