Issue with getting reference genomes


I have 17 MAGs/bins and have annotated them taxonomically; they belong to different classes. I want to pull genomes from GenBank that are closely related to my MAGs. Here is my command for the class Actinomycetia:

phylophlan_get_reference -g c__Actinomycetia -o genomes

But I get the following error:

[e] no reference genomes found for “c__Actinomycetia”, please check the taxonomic label provided

However, if I search GenBank manually, I do find many Actinomycetia genomes.

Could you please help me pull genomes for a specific class, family, or genus? I want to build a phylogenetic tree to see how similar or dissimilar my MAGs are to the genomes already in GenBank.

Many thanks,

Hi, and thanks for reporting this.
The problem is with the label. The taxonomy mapping in PhyloPhlAn is a couple of years old, and the class name “Actinomycetia” was introduced after it was built. I think you will find all the genomes you need by changing the taxonomic label to the old “c__Actinobacteria”.

Many thanks,

OK, that one works.

However, I am also having trouble pulling genomes for recently renamed taxa, for example c__Andersenbacteria and p__Patescibacteria.

Those labels also changed quite recently, and in the past they were a bit messy.
I did this:

$ bzgrep -i patesci taxa2genomes_cpa201901_up201901.txt.bz2
2052139 k__Bacteria|p__Bacteria_unclassified|c__Bacteria_unclassified|o__Bacteria_unclassified|f__Bacteria_unclassified|g__Bacteria_unclassified|s__Patescibacteria_group_bacterium GCA_003142075.1;[..];GCA_008012035.1

and then

$ bzgrep -i anderse taxa2genomes_cpa201901_up201901.txt.bz2
1797278 k__Bacteria|p__Candidatus_Andersenbacteria|c__Candidatus_Andersenbacteria_unclassified|o__Candidatus_Andersenbacteria_unclassified|f__Candidatus_Andersenbacteria_unclassified|g__Candidatus_Andersenbacteria_unclassified|s__Candidatus_Andersenbacteria_bacterium_RIFCSPHIGHO2_01_FULL_46_36 GCA_001817035.1

The taxa2genomes_cpa201901_up201901.txt.bz2 file is what PhyloPhlAn uses to download genomes. In your case, try specifying a different label with the -g parameter: for the two labels you are not finding, try s__Patescibacteria_group_bacterium and then p__Candidatus_Andersenbacteria.
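
If you need to do this lookup for several taxa, the bzgrep search can be sketched in Python so that it prints only the labels you could pass to -g, together with how many genomes each would cover. This is a rough sketch and not part of PhyloPhlAn: the function name candidate_labels is made up for illustration, and it assumes every row of the mapping file has three whitespace-separated columns (taxon ID, taxonomy joined by "|", genome accessions joined by ";"), as the output above suggests.

```python
import bz2
from collections import Counter


def candidate_labels(mapping_path, query):
    """Map each taxonomic label containing `query` to its genome count.

    Assumption (not checked against PhyloPhlAn's code): each row has three
    whitespace-separated columns: taxon ID, '|'-separated taxonomy, and
    ';'-separated genome accessions.
    """
    needle = query.lower()
    counts = Counter()
    with bz2.open(mapping_path, "rt") as handle:
        for line in handle:
            fields = line.split()
            # Skip comments and rows that do not fit the assumed layout.
            if len(fields) < 3 or line.startswith("#"):
                continue
            taxonomy, genomes = fields[1], fields[2]
            n_genomes = sum(1 for accession in genomes.split(";") if accession)
            # Record every taxonomic level whose label matches the query.
            for label in taxonomy.split("|"):
                if needle in label.lower():
                    counts[label] += n_genomes
    return dict(counts)
```

For example, candidate_labels("taxa2genomes_cpa201901_up201901.txt.bz2", "anderse") should return the labels that contain "anderse" at any taxonomic level, and any of them can then be tried with -g. The match is case-insensitive, like bzgrep -i.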

Thank you, that was helpful.