Hi,
In the MetaPhlAn4 species name file cmprod1.cibio.unitn.it, I have noticed that some SGB IDs have multiple corresponding species, such as SGB10068, which is associated with dozens of species. However, in my profiling results, the annotation for SGB10068 is “k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli|t__SGB10068”. This raises a few questions:
When annotating an SGB, does MetaPhlAn4 select the most likely correct species while disregarding other species? If so, what criteria does it use for selecting the annotation?
When an SGB corresponds to multiple species, does the species annotation vary among different samples?
No, it will produce the same annotation regardless of the sample.
Unfortunately, the MetaPhlAn phylogenetic tree does not include microeukaryotic species, because we built the tree using 200 universal bacterial and archaeal conserved genes. As for the additional nodes, could you share with me the full list of species that are in the tree but not in the database?
Thank you for your response.
While annotating the nodes of the phylogenetic tree, I mapped the species annotations from the database onto the corresponding tree nodes by removing the “SGB” prefix from each ID. However, I noticed that 140 nodes were left without an annotation. I have attached the list of these nodes: node.txt (956 Bytes)
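For reference, the mapping step I describe can be sketched roughly as follows. This is only a minimal illustration, not the actual script or the real file formats: the function names, the dictionary layout, the second database entry, and the tree label `99999` are all made up for the example, and I am assuming the tree labels are the SGB IDs without the “SGB” prefix.

```python
def strip_sgb_prefix(sgb_id):
    """Turn a database ID such as 'SGB10068' into the bare label '10068'."""
    return sgb_id[3:] if sgb_id.startswith("SGB") else sgb_id


def annotate_nodes(tree_labels, sgb_to_taxonomy):
    """Map tree labels to taxonomy strings and report labels with no match."""
    # Re-key the database table by the bare label assumed to be used in the tree.
    by_label = {strip_sgb_prefix(k): v for k, v in sgb_to_taxonomy.items()}
    annotated = {n: by_label[n] for n in tree_labels if n in by_label}
    # Nodes present in the tree but absent from the database table.
    missing = sorted(n for n in tree_labels if n not in by_label)
    return annotated, missing


# Toy data: the first entry is the annotation quoted above, the rest is invented.
sgb_to_taxonomy = {
    "SGB10068": (
        "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|"
        "o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|"
        "s__Escherichia_coli"
    ),
    "SGB2": "k__Bacteria|s__Hypothetical_species",  # invented placeholder
}
tree_labels = ["10068", "2", "99999"]  # '99999' is an invented unmatched node

annotated, missing = annotate_nodes(tree_labels, sgb_to_taxonomy)
print(len(annotated), "nodes annotated; unannotated:", missing)
```

The `missing` list is what I collected into the attached file; in the toy example it contains only the invented label `99999`.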