I processed the samples from a project with MetaPhlAn 4.0.6 and obtained an abundance file (see the attached “PRJNA389927_unknown_m4.csv”).
I am now using mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt to look up the taxonomic details of each species in the abundance file.
However, 384 of the 1008 species in the abundance file cannot be found in mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.
Of these 384 species, 261 have ‘GGB’ in their names, which appears to be a genus-level code. For example, for s__GGB10692_SGB17347, the corresponding line for SGB17347 in mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt is:
‘SGB17347 k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridia_unclassified|f__Clostridia_unclassified|g__Clostridia_unclassified|s__Clostridia_bacterium’
Does this mean that ‘GGB10692’ corresponds to ‘Clostridia’ and ‘SGB17347’ to ‘bacterium’? Alternatively, is there another way to match these 384 unmatched species to entries in mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt?
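For reference, this is roughly the kind of lookup I have in mind: a minimal Python sketch that keys on the SGB number (e.g. SGB17347) rather than on the full species name, and falls back to an exact name match for species without an SGB tag. It assumes each line of the species file is an SGB ID followed by whitespace and a ‘|’-separated lineage, as in the SGB17347 line above, and that the SGB number in a name like s__GGB10692_SGB17347 is the same identifier as the first column of that file. The helper names load_sgb_taxonomy and match_species are hypothetical.

```python
import re

# Matches an SGB identifier such as "SGB17347" inside a MetaPhlAn species name.
# "GGB10692" does not match because it lacks the leading "S".
SGB_RE = re.compile(r"SGB\d+")


def load_sgb_taxonomy(lines):
    """Hypothetical helper: map SGB ID -> lineage string.

    Assumes each non-empty line looks like "SGB17347<whitespace>k__...|s__...".
    """
    table = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def match_species(species_names, sgb_table):
    """Hypothetical helper: return (matched, unmatched).

    First tries the SGB ID embedded in the name; if there is none (or it is
    not in the table), falls back to an exact match on the last rank
    (e.g. "s__Clostridia_bacterium"). Several SGBs can share one species
    label, so the fallback keeps only the first lineage seen for a label.
    """
    by_name = {}
    for lineage in sgb_table.values():
        by_name.setdefault(lineage.split("|")[-1], lineage)

    matched, unmatched = {}, []
    for name in species_names:
        m = SGB_RE.search(name)
        if m and m.group(0) in sgb_table:
            matched[name] = sgb_table[m.group(0)]
        elif name in by_name:
            matched[name] = by_name[name]
        else:
            unmatched.append(name)
    return matched, unmatched
```

In practice load_sgb_taxonomy would be given an open file handle for mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt, and match_species the species column of the abundance file; whatever ends up in unmatched is what still needs another explanation.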
Additionally, does ‘SGB17347’ correspond to ‘17347’ in mpa_vJan21_CHOCOPhlAnSGB_202103.nwk?
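One way I could check this empirically is to list the tip labels of the tree and see whether the numbers of my SGBs appear there. A minimal sketch using only the standard library, assuming the tree uses plain (unquoted) Newick labels and that tips are labeled either with bare numbers such as 17347 or with the full SGB ID; newick_tip_labels and sgb_in_tree are hypothetical helpers.

```python
import re

# In Newick, a tip label directly follows "(" or ","; it ends at ":", ",",
# ")", ";" or whitespace. Internal-node labels follow ")" and are skipped.
# Quoted labels (e.g. 'a b') are not handled by this simple pattern.
TIP_RE = re.compile(r"[(,]\s*([^(),:;\s]+)")


def newick_tip_labels(newick_text):
    """Return the set of leaf labels in a Newick string."""
    return set(TIP_RE.findall(newick_text))


def sgb_in_tree(sgb_id, tips):
    """Check whether e.g. 'SGB17347' appears as a tip, with or without the 'SGB' prefix."""
    number = sgb_id[3:] if sgb_id.startswith("SGB") else sgb_id
    return sgb_id in tips or number in tips
```

Reading the .nwk file into a string and passing it to newick_tip_labels would give the set to test against. Biopython's Bio.Phylo.read(path, "newick") followed by tree.get_terminals() is a more robust alternative if the labels turn out to be quoted.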
PRJNA389927_unknown_m4.csv (763.9 KB)