In the MetaPhlAn4 species name file (hosted on cmprod1.cibio.unitn.it), I noticed that some SGB IDs map to multiple species. SGB10068, for example, is associated with dozens of species. However, in my profiling results SGB10068 is annotated as “k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli|t__SGB10068”. This raises a few questions:
- When annotating an SGB, does MetaPhlAn4 pick the single most likely species and disregard the others? If so, what criteria does it use to choose that species?
- When an SGB corresponds to multiple species, does the species annotation vary among different samples?
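
One way to check the second question empirically is to collect the species label reported for each SGB across several profiles and see whether it ever differs. Below is a minimal sketch, assuming only the clade string layout shown above (fields separated by `|`, each with a `k__`/`s__`/`t__` style prefix). The function name `species_labels_by_sgb` and the two-sample input are made up for illustration and are not part of MetaPhlAn.

```python
from collections import defaultdict


def species_labels_by_sgb(clade_names):
    """Map each SGB (t__ field) to the set of species labels (s__ field) seen with it."""
    labels = defaultdict(set)
    for clade in clade_names:
        # Turn "k__Bacteria|...|s__X|t__SGB1" into {"k": "Bacteria", ..., "s": "X", "t": "SGB1"}
        fields = dict(part.split("__", 1) for part in clade.split("|") if "__" in part)
        if "t" in fields and "s" in fields:
            labels[fields["t"]].add(fields["s"])
    return dict(labels)


# Toy input: the same clade string, as if read from two hypothetical samples.
clade = (
    "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|"
    "f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli|t__SGB10068"
)
clades = [clade, clade]

labels = species_labels_by_sgb(clades)
# SGBs whose species label differs between the collected clade strings
varying = {sgb: names for sgb, names in labels.items() if len(names) > 1}
print(labels)
print(varying)
```

In practice the clade strings would come from the first column of each sample's profile; any SGB that ends up in `varying` has been given more than one species label.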
In addition, I noticed that some entries in the species name file, such as EUK5661, are missing from the MetaPhlAn4 phylogenetic tree (mpa_vOct22_CHOCOPhlAnSGB_202212.nwk, on the master branch of the biobakery/MetaPhlAn GitHub repository). Conversely, the tree contains leaf nodes, such as 45766:0.0056838754, that do not exist in the species name file. Does this mismatch indicate an error?
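
For reference, this is roughly how I compared the two files. It is a sketch under assumptions, not a description of how MetaPhlAn works internally: I assume each line of the species name file is `ID<TAB>taxonomy[,taxonomy...]`, and that tree leaves are the bare numeric part of the ID (so SGB10068 would appear as 10068). The toy rows and toy tree below use placeholder taxonomy and are not real database content.

```python
import re


def newick_leaf_labels(newick):
    """Return leaf labels: names that directly follow '(' or ',' in a Newick string."""
    return set(re.findall(r"(?<=[(,])\s*([^(),:;\s]+)", newick))


def read_species_map(lines):
    """Parse 'ID<TAB>tax1,tax2,...' lines (assumed layout) into {ID: [taxonomies]}."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        sgb_id, _, taxa = line.partition("\t")
        mapping[sgb_id] = taxa.split(",")
    return mapping


def numeric_part(sgb_id):
    """Strip an 'SGB' or 'EUK' prefix (assumption: tree leaves use bare numbers)."""
    return re.sub(r"^(SGB|EUK)", "", sgb_id)


# Toy data with placeholder taxonomy, only to exercise the functions.
species_lines = [
    "SGB10068\tk__Bacteria|s__Escherichia_coli,k__Bacteria|s__placeholder_species",
    "EUK5661\tk__placeholder|s__placeholder_species",
]
toy_tree = "(45766:0.0056838754,10068:0.01);"

species_map = read_species_map(species_lines)
file_ids = {numeric_part(i) for i in species_map}
tree_ids = newick_leaf_labels(toy_tree)

multi_species = sorted(i for i, taxa in species_map.items() if len(taxa) > 1)
only_in_file = file_ids - tree_ids   # IDs listed in the name file but absent from the tree
only_in_tree = tree_ids - file_ids   # tree leaves with no entry in the name file
print(multi_species, only_in_file, only_in_tree)
```

If the prefix-stripping assumption is wrong, for example if EUK entries are never meant to be in the tree, that alone could explain part of the mismatch.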