How to deal with GGB and SGB species?

wangyang1749 · September 1, 2025, 2:30am

database version: mpa_vJun23_CHOCOPhlAnSGB_202403
software version: MetaPhlAn version 4.1.1 (11 Mar 2024)

I am a beginner.
When I run MetaPhlAn with the command below and draw a stacked plot of the species composition, many of the top species are GGBs and SGBs (species-level genome bins). These species have no corresponding NCBI IDs. How should I handle them in subsequent analyses?

    metaphlan \
        --input_type fastq \
        ${reads[0]},${reads[1]} \
        ${bowtie2out} \
        --biom ${meta.id}.biom \
        --output_file ${meta.id}_profile.txt \
        --bowtie2db $metaphlan_db_latest  \
        --index $metaphlan_db_index

I found that the --ignore_usgbs parameter removes the GGB and SGB entries.

This has the same effect as directly deleting the species whose names contain SGB or GGB and recalculating the relative abundances:

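    # Drop rows whose taxonomy label contains a GGB or SGB identifier (.copy() avoids SettingWithCopyWarning)
    abundance_rank = abundance_rank[~abundance_rank["taxonomy"].str.contains("GGB|SGB", regex=True)].copy()
    abundance_cols = abundance_rank.columns.drop("taxonomy")
    # Rescale each sample column so the remaining abundances sum to 100
    abundance_rank[abundance_cols] = abundance_rank[abundance_cols].div(abundance_rank[abundance_cols].sum(axis=0), axis=1) * 100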
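
Before dropping these rows, it may help to check how much of each sample they account for. The following is a minimal sketch, assuming a species-level table with one "taxonomy" column and one column per sample, as in the snippet above. The function name `unnamed_bin_share` and the toy table (made-up taxa IDs and numbers) are illustrative only.

```python
import pandas as pd


def unnamed_bin_share(profile: pd.DataFrame, taxonomy_col: str = "taxonomy") -> pd.Series:
    """Return, per sample, the percentage of total abundance carried by GGB/SGB rows."""
    is_unnamed = profile[taxonomy_col].str.contains("GGB|SGB", regex=True)
    samples = profile.columns.drop(taxonomy_col)
    totals = profile[samples].sum(axis=0)
    unnamed = profile.loc[is_unnamed, samples].sum(axis=0)
    return unnamed / totals * 100


# Toy species-level table: hypothetical IDs and made-up abundances.
toy = pd.DataFrame({
    "taxonomy": ["s__Escherichia_coli", "s__GGB1234_SGB5678", "s__Bacteroides_uniformis"],
    "sample_A": [50.0, 30.0, 20.0],
    "sample_B": [10.0, 0.0, 90.0],
})

share = unnamed_bin_share(toy)
print(share)
```

If the share is large in many samples, removing the rows and renormalizing will noticeably inflate the remaining named species, which is worth keeping in mind when interpreting the stacked plot.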

How should I use information on GGB and SGB species in my specific research?

Vikas · September 5, 2025, 10:00am

Hello, I am also facing the same issue.
Some of these species are emerging as among the most abundant and prevalent, yet I have been unable to find any literature on them or predict their functional role. Is there any way to obtain these genomes, so that one can at least explain the presence of these SGBs based on their metabolic potential?
Any suggestions are welcome.
Thank you

imontero · September 5, 2025, 4:33pm

As far as I know, SGB, GGB and FGB denote unnamed species, genera and families. They are not entirely unknown, since they are defined as groups, but they cannot be assigned to a known, named species, genus or family, respectively.

I usually include them in the analysis because I hope they will be taxonomically named in the future.

Claudia_Mengoni · September 11, 2025, 10:08am

Hi,

as you correctly stated, these are unknown SGBs for which no isolate exists yet, only MAGs, so there is no corresponding NCBI taxonomy for them. If you look at the full profile, you can see the higher taxonomic levels for these SGBs, which gives at least some information on their taxonomy.
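
As a rough illustration of reading the higher levels from the full profile, the sketch below walks a pipe-separated clade name and returns the deepest rank that carries a real name rather than a placeholder bin. It assumes placeholder labels follow a pattern like GGB1234, FGB42 or GGB1234_SGB5678; that pattern, the function name `lowest_named_rank` and the example lineage (made-up IDs) are assumptions, not taken from the MetaPhlAn documentation.

```python
import re

# Assumed placeholder pattern: one or two capital letters, "GB", digits,
# optionally chained with underscores (e.g. GGB1234_SGB5678).
_PLACEHOLDER = re.compile(r"^[A-Z]{1,2}GB\d+(?:_[A-Z]{1,2}GB\d+)*$")


def lowest_named_rank(clade: str):
    """Return (rank_prefix, name) for the deepest rank that is not a placeholder bin."""
    named = None
    for level in clade.split("|"):
        prefix, _, name = level.partition("__")
        if prefix == "t":
            # The t__ level holds the SGB identifier itself, so skip it.
            continue
        if name and not _PLACEHOLDER.match(name):
            named = (prefix, name)
    return named


# Hypothetical lineage with made-up bin IDs.
unnamed = ("k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|"
           "f__Oscillospiraceae|g__GGB9999|s__GGB9999_SGB15000")
named = "k__Bacteria|p__Proteobacteria|g__Escherichia|s__Escherichia_coli"

print(lowest_named_rank(unnamed))
print(lowest_named_rank(named))
```

For plots, one option is to relabel an unnamed bin with this rank, for example "Oscillospiraceae (SGB15000)", instead of discarding it.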

To get more information, you can assemble your metagenomic samples and assign the bins to SGBs using the PhyloPhlAn routine phylophlan_assign_sgbs.py (see tutorial). If you manage to reconstruct the genome you are interested in, it will be assigned to the corresponding uSGB and you can then study it further.

Vikas · September 11, 2025, 11:36am

Hi @imontero and @Claudia_Mengoni ,

Thank you for your replies.
I did assemble genomes but did not retrieve those SGBs from the assembly, even though some SGBs are abundant in the read-based analysis (perhaps the depth wasn’t enough to assemble these genomes). So I was curious about the function of these SGBs. Is there any way I can get the SGB genomes from the PhyloPhlAn or ChocoPhlAn databases?

wangyang1749 · September 22, 2025, 8:05am

Hi, I have previously tried to obtain the gene sequences of MetaPhlAn markers. You can refer to metaphlan_dev.ipynb or MetaPhlAn 4 · biobakery/MetaPhlAn Wiki
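
A hedged sketch of one way to list the markers belonging to an SGB. It assumes the database .pkl that ships alongside the Bowtie2 index is a bz2-compressed pickle holding a dict with a "markers" entry, where each marker maps to a dict with a "clade" field. Check the wiki for the actual layout of your database version; the toy database, marker names and clade labels below are hypothetical. Only unpickle files from a source you trust.

```python
import bz2
import os
import pickle
import tempfile


def load_mpa_pkl(path):
    """Load a database .pkl, assumed to be a bz2-compressed pickle."""
    with bz2.open(path, "rb") as handle:
        return pickle.load(handle)


def markers_for_clade(db, clade):
    """Return the sorted marker names whose 'clade' field equals `clade`."""
    return sorted(
        name for name, info in db["markers"].items() if info.get("clade") == clade
    )


# Toy database mimicking the assumed layout (hypothetical names and IDs).
toy_db = {
    "markers": {
        "marker_a": {"clade": "t__SGB15000", "len": 900},
        "marker_b": {"clade": "t__SGB10068", "len": 1200},
        "marker_c": {"clade": "t__SGB15000", "len": 450},
    }
}

# Round-trip through a temporary file to exercise the loader.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_db.pkl")
    with bz2.open(toy_path, "wb") as handle:
        pickle.dump(toy_db, handle)
    loaded = load_mpa_pkl(toy_path)

sgb_markers = markers_for_clade(loaded, "t__SGB15000")
print(sgb_markers)
```

With the marker names in hand, the corresponding sequences could then be pulled from the marker FASTA of the same database version. Note that these are marker genes only, not the full SGB genomes.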

Vikas · September 24, 2025, 4:43am

Hi, I have made my peace with GGBs and SGBs. Thank you so much for this. You helped me with one of the major issues I was facing, described in this post: Make custom metaphlan database by adding some more genomes

I am undertaking this project with minimal experience in building databases. The MetaPhlAn GitHub page does not mention anything about making SGB-specific marker sets, apart from the explanation in the methods of the main paper. My question is: when defining the markers for an SGB, what identity and coverage thresholds should I use to exclude gene sequences already present in the MetaPhlAn marker database?

ursamiklavcic · September 25, 2025, 1:07pm

Hi, I hope I’m commenting under the correct post. I have been trying to find information on this, and I think I can finally say with confidence that I could not find the answer myself.

I understand that for some species there is no connection between SGBs and NCBI taxonomy, but I’m interested in the MAGs that form the SGBs. I have found the FASTA files for all SGBs, but they are not named SGB___; instead, they carry names from the original metastudies, binners, etc. Is there a file, or some other way, to determine which SGB each .fasta file belongs to?
Thank you for the help in advance!
