I’m attempting to use the Newick-formatted tree provided with MetaPhlAn3 to compute Generalised UniFrac and phylogenetic diversity metrics in R, but I’m having trouble removing the assembly ID prefix from the leaf names (the leaf names need to match the row names of the species relative abundance data frame).
For example I would like the leaves to be named
Does anybody have a copy of this tree available without these IDs?
Alternatively, is there a file listing the accession numbers of all genomes in the v30 database along with their associated taxonomy, which I could use for a batch string replace?
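For reference, here is a minimal sketch of the kind of rename I have in mind, in base R. It assumes each leaf name has the form "<assembly ID>|<taxon name>"; that layout, the example labels and the helper name strip_prefix are made up for illustration and may not match the actual tree.

```r
# Hypothetical tip labels; the "<assembly ID>|<taxon name>" layout is an assumption.
tip_labels <- c("GCA_000000001|s__Species_a",
                "GCA_000000002|s__Species_b",
                "s__Species_c")  # no prefix: should be left unchanged

# Remove everything up to and including the first "|" in each label.
strip_prefix <- function(labels) {
  sub("^[^|]*\\|", "", labels)
}

clean_labels <- strip_prefix(tip_labels)
print(clean_labels)

# With a tree read via the ape package, the same helper could be applied as:
#   tree <- ape::read.tree("tree.nwk")   # hypothetical file name
#   tree$tip.label <- strip_prefix(tree$tip.label)
# and the result checked against the abundance table with:
#   setdiff(tree$tip.label, rownames(abundance))   # 'abundance' is hypothetical
```

If the real prefix uses a different separator, only the regular expression would need to change.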
All the best,