Scientific names not found in NCBI Taxonomy Browser

I’m using MetaPhlAn 4.0 and cannot work out what the labels in the lineages below mean:

k__Bacteria|p__Firmicutes|c__CFGB1534|o__OFGB1534|f__FGB1534|g__GGB3886|s__GGB3886_SGB5269
k__Bacteria|p__Firmicutes|c__CFGB1762|o__OFGB1762|f__FGB1762|g__GGB4533|s__GGB4533_SGB6246
k__Bacteria|p__Bacteroidetes|c__CFGB570|o__OFGB570|f__FGB570|g__GGB1203|s__GGB1203_SGB1568
k__Bacteria|p__Proteobacteria|c__CFGB4196|o__OFGB4196|f__FGB4196|g__GGB12441|s__GGB12441_SGB19290
k__Bacteria|p__Firmicutes|c__CFGB1230|o__OFGB1230|f__FGB1230|g__GGB3008|s__GGB3008_SGB3999
k__Bacteria|p__Firmicutes|c__CFGB1395|o__OFGB1395|f__FGB1395|g__GGB3388|s__GGB3388_SGB4476

I’ve searched NCBI extensively and cannot find where these labels come from or what they mean (for example, s__GGB3008_SGB3999). I’m having trouble justifying them in my research.

Hi @Anderson
The taxonomies you show above correspond to several unknown SGBs (uSGBs), i.e. species-level genome bins (SGBs) defined purely by metagenome-assembled genomes (MAGs). Because uSGBs by definition contain no reference genome from NCBI, parts of their taxonomy are unknown, so we assign them numeric identifiers instead. The six cases you show above are uSGBs whose taxonomy is unknown at every rank below the phylum: we did not find any reference genome sharing at least 70% identity with them, so we are only confident assigning taxonomy down to the phylum.
If you are interested in more details about the SGBs and the MetaPhlAn 4 database, please have a look at the following:
https://doi.org/10.1016/j.cell.2019.01.001
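
To make the naming scheme above concrete, here is a minimal Python sketch that splits a MetaPhlAn lineage string into ranks and reports which ranks carry a numeric placeholder rather than a named taxon. The placeholder prefixes (CFGB, OFGB, FGB, GGB, SGB) are taken only from the labels posted in this thread, and the function names are hypothetical helpers, not part of MetaPhlAn.

```python
import re

# Rank prefixes as they appear in the lineage strings in this thread.
RANKS = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species", "t": "strain"}

# Assumption: a placeholder label is one of these prefixes plus a number,
# optionally followed by "_SGB<number>" (as in "GGB3886_SGB5269").
PLACEHOLDER = re.compile(r"(?:CFGB|OFGB|FGB|GGB|SGB)\d+(?:_SGB\d+)?")


def parse_lineage(lineage):
    """Return a list of (rank_name, label) pairs for one lineage string."""
    pairs = []
    for field in lineage.strip().split("|"):
        prefix, _, label = field.partition("__")
        pairs.append((RANKS.get(prefix, prefix), label))
    return pairs


def placeholder_ranks(lineage):
    """Names of the ranks whose label is a numeric placeholder."""
    return [rank for rank, label in parse_lineage(lineage)
            if PLACEHOLDER.fullmatch(label)]


def deepest_named_rank(lineage):
    """Last (rank, label) pair before the first placeholder, or None."""
    named = None
    for rank, label in parse_lineage(lineage):
        if PLACEHOLDER.fullmatch(label):
            break
        named = (rank, label)
    return named


example = ("k__Bacteria|p__Firmicutes|c__CFGB1534|o__OFGB1534|"
           "f__FGB1534|g__GGB3886|s__GGB3886_SGB5269")
print(deepest_named_rank(example))  # ('phylum', 'Firmicutes')
print(placeholder_ranks(example))
```

For the first lineage in the question, this reports the phylum as the deepest named rank, which matches the explanation that these uSGBs are only confidently placed down to the phylum.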


Hello,
Thanks for the answer, which also explains my case :).
But how can I access the genomes that were used? I would like to look at the genomes the genes come from, because I have interesting results.

Hi @aitor.blancomiguez,

If I understand the meaning of u[F|G|S]GBs correctly, they represent, for example, family-level MAG clusters with no reference genome. However, in the taxonomy we also find OFGBs and CFGBs. What do these represent exactly? I am asking because, when comparing various taxonomic profiling tools, MetaPhlAn identifies ~150 different orders in my dataset, whereas the other methods (Kraken, Sourmash, mOTUs) each find only about 20. Is it simply because any FGB that cannot be assigned to a known order gets a unique OFGB identifier? This would explain why, even if many of these FGBs could in theory share the same order, they are assigned different order identifiers.

Below is a figure of the number of unique taxa found in two separate metagenome datasets (each with ~30 samples) that illustrates what I’m referring to. For example, in the NAFLD dataset, 122 of the 157 unique order labels are OFGBs. I’m just trying to figure out why.
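
The count described above can be reproduced with a short Python sketch. It counts the unique order labels in a set of lineage strings, how many of them are OFGB placeholders, and how many placeholders carry the same number as the family label in the same lineage. In the lineages posted earlier in this thread the class, order and family placeholders do share one number (e.g. CFGB1534, OFGB1534, FGB1534), which is consistent with the reading that each unplaced FGB gets its own order label, although nothing in the thread confirms that this is how the database is built. The function names are hypothetical, and the last lineage in the example is illustrative rather than taken from the thread.

```python
import re


def rank_labels(lineage):
    """Map each rank prefix ('o', 'f', ...) to its label for one lineage."""
    fields = (field.partition("__") for field in lineage.strip().split("|"))
    return {prefix: label for prefix, _, label in fields}


def summarize_orders(lineages):
    """Return (unique orders, OFGB placeholders, placeholders matching f__)."""
    orders, placeholders, matching_family = set(), set(), set()
    for lineage in lineages:
        labels = rank_labels(lineage)
        order = labels.get("o")
        if order is None:
            continue  # lineage reported above the order rank
        orders.add(order)
        match = re.fullmatch(r"OFGB(\d+)", order)
        if match:
            placeholders.add(order)
            # Does the order placeholder reuse the family's number?
            if labels.get("f") == "FGB" + match.group(1):
                matching_family.add(order)
    return len(orders), len(placeholders), len(matching_family)


lineages = [
    # Two lineages copied from the original question.
    "k__Bacteria|p__Firmicutes|c__CFGB1534|o__OFGB1534|f__FGB1534|g__GGB3886|s__GGB3886_SGB5269",
    "k__Bacteria|p__Firmicutes|c__CFGB1762|o__OFGB1762|f__FGB1762|g__GGB4533|s__GGB4533_SGB6246",
    # Illustrative named order, not from the thread.
    "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales",
]
print(summarize_orders(lineages))  # (3, 2, 2)
```

If nearly every OFGB label in a real profile matches the number of its family label, that would support the one-order-per-unplaced-family explanation for the inflated order count, and comparing tools at the phylum level or among named orders only may be more informative.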