How can I match the species from the abundance file with those in mpa_species.txt?

I processed samples from a project using MetaPhlAn 4.0.6 and obtained an abundance file (see the attached “PRJNA389927_unknown_m4.csv”).

Now, I am using mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt to obtain detailed information about the species in the abundance file.

However, out of the 1008 species in the abundance file, 384 cannot be found in mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.

Of these 384 species, 261 have ‘GGB’ in their names, which appears to be a genus-level code. For example, for s__GGB10692_SGB17347, I found the following line for SGB17347 in mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt: ‘SGB17347 k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridia_unclassified|f__Clostridia_unclassified|g__Clostridia_unclassified|s__Clostridia_bacterium’. Does this mean that ‘GGB10692’ corresponds to ‘Clostridia’ and ‘SGB17347’ to ‘bacterium’? Alternatively, is there another way to match these 300+ species against mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt?
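For reference, here is a minimal sketch of the lookup I am attempting, keyed on the SGB ID rather than on the full species name. It assumes that each line of the species file holds an SGB ID followed by whitespace and a lineage string, as in the line quoted above. The function names are my own and are not part of MetaPhlAn.

```python
import re

# An SGB ID is "SGB" followed by digits, e.g. "SGB17347".
SGB_RE = re.compile(r"SGB\d+")

def load_species_table(lines):
    """Build {SGB ID: lineage} from lines like 'SGB17347<whitespace>k__Bacteria|...'.

    Assumption: one record per line, ID first, lineage second.
    """
    table = {}
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and SGB_RE.fullmatch(parts[0]):
            table[parts[0]] = parts[1]
    return table

def sgb_id(species_label):
    """Return the SGB ID embedded in a label such as 's__GGB10692_SGB17347', or None."""
    match = SGB_RE.search(species_label)
    return match.group(0) if match else None

def annotate(species_labels, table):
    """Map each abundance-file label to its lineage, or None if no SGB ID matches."""
    return {label: table.get(sgb_id(label)) for label in species_labels}
```

With the real file, I would call `load_species_table(open(path))`, where `path` points to the species file, and pass the species column of the abundance table to `annotate`. Labels that map to `None` are the ones that still need to be resolved some other way.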

Additionally, does ‘SGB17347’ represent the ‘17347’ in mpa_vJan21_CHOCOPhlAnSGB_202103.nwk?

PRJNA389927_unknown_m4.csv (763.9 KB)

In the abundance file, there is a species ‘s__Bacteroides_SGB14754’, and upon searching ‘SGB14754’ in mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt, the corresponding species information found is ‘SGB14754 k__Bacteria|p__Actinobacteria|c__Coriobacteriia|o__Coriobacteriales|f__Coriobacteriaceae|g__Collinsella|s__Collinsella_SGB14754’. Are ‘s__Bacteroides_SGB14754’ and ‘s__Collinsella_SGB14754’ the same? What exactly does 14754 represent in the phylogenetic tree “mpa_vJan21_CHOCOPhlAnSGB_202103.nwk”?
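To find other cases like this one, I sketched a small check that compares the genus in the abundance-file label with the `g__` field of the lineage found for the same SGB ID. It is only a heuristic with helper names I made up: it assumes the genus is the text between `s__` and the first underscore of the label. A disagreement might simply mean that the species file and the database used for profiling come from different releases.

```python
def genus_from_label(label):
    """Genus part of a label like 's__Bacteroides_SGB14754' (heuristic)."""
    name = label[3:] if label.startswith("s__") else label
    return name.split("_", 1)[0]

def genus_from_lineage(lineage):
    """Value of the g__ rank in a '|'-separated lineage, or None if absent."""
    for rank in lineage.split("|"):
        if rank.startswith("g__"):
            return rank[3:]
    return None

def genus_agrees(label, lineage):
    """True when the label and the lineage name the same genus."""
    return genus_from_label(label) == genus_from_lineage(lineage)
```

For the example above, `genus_agrees` returns `False`, because the label says Bacteroides while the lineage says Collinsella.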

I’m really sorry! I made a mistake with the database version; I just realized that it is from 2022.