One of the things we used PhyloPhlAn 2 for was predicting the genus/species of new MAGs. This came out as a by-product of placing MAGs in the tree of life, using the protein markers in the PhyloPhlAn database.

The new phylophlan_metagenomic script replaces this functionality, but it appears it can only use SGB databases. This gives us terrible resolution for our (rumen) MAGs: assignments often go only to phylum, sometimes to family, and only very rarely to genus or species.

Is there a way of using phylophlan_metagenomic with the old protein biomarker database, so as to replicate the behaviour of PhyloPhlAn 2?

Alternatively, is there a way of asking PhyloPhlAn to output tabular taxonomy predictions when it places genomes in the tree of life?


Hello @BioMickWatson, I’m sorry, but that functionality is not available in PhyloPhlAn 3.0. The main reason we removed it is that it relied on MUSCLE’s ability to merge MSAs. During our tests, we found several cases where the merged MSA was biased, possibly because merging MSAs accurately is a difficult task. In addition, we moved the configuration of PhyloPhlAn’s external tools into a config file, so that more tools can be integrated and their parameters can be set flexibly. As a consequence, we could not easily keep providing this functionality, both because there are several external tools one can choose from and because of the many different configurations a user can set for the analysis.

Having said this, one can set up an analysis that replicates what PhyloPhlAn 2 did when integrating inputs. This requires the following steps:

  1. Retrieve a set of reference genomes to cover the diversity (this can be done using ``).
  2. Use PhyloPhlAn 3.0 to build a phylogeny with the phylophlan database.
  3. Create a new input folder containing links to all the reference genomes used in the tree-of-life phylogeny from the previous step, plus the new genomes and MAGs to be placed.
  4. To save time, create the new output folder in advance (let’s say `output1`) and copy the `map_dna` and `markers_dna` folders into `output1/tmp`. This avoids re-mapping and re-extracting the phylophlan database markers already computed in step 2.
  5. Reconstruct a new tree of life that includes the new inputs. Since some data were copied from the previous run, the very same parameters and configuration file should be used.
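Steps 3 and 4 are plain file operations, so they are easy to script. The Python sketch below shows one way to do it. The helper names `link_inputs` and `reuse_intermediate_folders` are hypothetical. The sketch also assumes (not confirmed in this thread) that the step-2 run left its `map_dna` and `markers_dna` folders under `<output folder>/tmp`, so check where your own run wrote them first.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List, Sequence


def link_inputs(reference_genomes: Iterable, new_genomes: Iterable, input_dir) -> List[Path]:
    """Step 3: build a new input folder of symlinks to the reference genomes
    used in the tree of life plus the new genomes/MAGs to place."""
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for src in list(reference_genomes) + list(new_genomes):
        src = Path(src).resolve()  # absolute target so the link works from anywhere
        dst = input_dir / src.name
        # Two inputs with the same file name would collide in the input folder.
        if dst.exists() or dst.is_symlink():
            raise FileExistsError(f"duplicate input name: {dst.name}")
        dst.symlink_to(src)
        linked.append(dst)
    return linked


def reuse_intermediate_folders(previous_output, new_output,
                               folders: Sequence[str] = ("map_dna", "markers_dna")) -> List[Path]:
    """Step 4: copy already-computed intermediate folders into <new_output>/tmp.

    ASSUMPTION: the previous run stored them under <previous_output>/tmp.
    Only valid if the new run uses the same parameters and config file (step 5).
    """
    prev_tmp = Path(previous_output) / "tmp"
    new_tmp = Path(new_output) / "tmp"
    new_tmp.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in folders:
        src = prev_tmp / name
        if not src.is_dir():
            raise FileNotFoundError(f"expected intermediate folder not found: {src}")
        # dirs_exist_ok requires Python 3.8+
        copied.append(Path(shutil.copytree(src, new_tmp / name, dirs_exist_ok=True)))
    return copied


if __name__ == "__main__":
    # Toy layout in a throwaway directory; replace with your real paths.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        ref = root / "ref_genome.fna"
        mag = root / "rumen_mag.fna"
        ref.write_text(">ref\nACGT\n")
        mag.write_text(">mag\nACGT\n")
        for name in ("map_dna", "markers_dna"):
            (root / "output_tol" / "tmp" / name).mkdir(parents=True)
        print(link_inputs([ref], [mag], root / "input1"))
        print(reuse_intermediate_folders(root / "output_tol", root / "output1"))
```

Symlinks keep the new input folder small. If your environment does not follow symlinks, swapping `symlink_to` for `shutil.copy2` would work too. Either way, the copied folders only make sense if the new run uses the same parameters and configuration file as the step-2 run.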

I’ll be happy to further help with this if something is not clear.

