Hello bioBakers!
I am using PhyloPhlAn v3.0.67 (24 August 2022) with a dataset of MAGs from enriched samples, and I am trying to place the retrieved genomes in a phylogeny against references. As far as I understand, the phylophlan_get_reference command (as shown in example 04) selects reference genomes at random. The resulting phylogeny places only a small fraction (~10%) of my test MAGs among the references; the others form two distinct clades entirely separate from the reference genome collection. Do you think this is because:
- The downloaded reference genomes might be biased in some way (geographically, lab strains vs. clinical isolates, etc.)?
- The input MAGs have too many gaps to accurately infer their phylogenetic position (or does PhyloPhlAn account for this, and if so, how)?
If it is the first case, is there some way I can specify which references to download, even if that means manually downloading reference genomes and pointing the program at them somehow?
Any advice would be greatly appreciated!
Archie