database version: mpa_vJun23_CHOCOPhlAnSGB_202403
software version: MetaPhlAn version 4.1.1 (11 Mar 2024)
I am a beginner.
I ran MetaPhlAn using the following code and drew a stacked plot to see the species composition. Many of the top species are GGB and SGB (species-level genome bin) entries. These species do not have corresponding NCBI IDs. How should I deal with them in subsequent analyses?
Hello, I am also facing the same issue.
Some of these species are emerging as among the most abundant and prevalent, but I have been unable to find any literature on them or predict their functional role. Is there any way to get these genomes, so that one can at least explain the presence of these SGBs based on their metabolic potential?
Any suggestions are welcome.
Thank you
To my knowledge, SGBs, GGBs and FGBs are unnamed species, genera and families. They are not unknown in the sense of being unclassified, since each one is a defined group, but they cannot be assigned to a known and named species, genus or family, respectively.
I usually include them in the analysis because I hope they will be taxonomically named in the future.
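If you keep them in, it can help to know how much of each sample they account for. Below is a minimal sketch, not an official MetaPhlAn utility. It assumes the default tab-separated profile layout (clade name in the first column, relative abundance in the third) and that unnamed species carry a name ending in `_SGB<number>` or consisting only of `SGB<number>`. The function name `unnamed_species_fraction` and the demo rows are made up for illustration.

```python
import re

# Assumption: unnamed species end in an SGB identifier, e.g. s__GGB9758_SGB15368.
UNNAMED = re.compile(r"(?:^|_)SGB\d+$")

def unnamed_species_fraction(profile_text):
    """Fraction of species-level relative abundance assigned to unnamed SGBs."""
    named = unnamed = 0.0
    for line in profile_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        clade, abundance = fields[0], float(fields[2])
        last = clade.split("|")[-1]
        if not last.startswith("s__"):
            continue  # keep only species-level rows
        if UNNAMED.search(last[3:]):
            unnamed += abundance
        else:
            named += abundance
    total = named + unnamed
    return unnamed / total if total else 0.0

# Hypothetical three-row profile, for illustration only.
demo = "\n".join([
    "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species",
    "k__Bacteria\t2\t100.0\t",
    "k__Bacteria|g__Examplea|s__Examplea_named\t\t75.0\t",
    "k__Bacteria|g__GGB1|s__GGB1_SGB2\t\t25.0\t",
])
print(unnamed_species_fraction(demo))
```

To use it on a real sample, read the profile file into a string and pass it to the function. A high fraction is a reason to keep these SGBs in the analysis rather than filter them out.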
As you correctly stated, these are unknown SGBs for which no isolate exists yet, only MAGs, and therefore we cannot provide a corresponding NCBI taxonomy. If you look at the full profile, you can see the higher taxonomic levels for these SGBs, which gives you at least some information on their taxonomy.
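As a rough illustration of reading the higher levels from a profile row, here is a minimal sketch that walks a clade name from the SGB upwards and returns the first rank with a real name. It assumes placeholder clades carry an SGB, GGB or FGB prefix followed by digits (the labels mentioned in this thread). If your profile contains other placeholder prefixes, the pattern needs extending. The function name `deepest_named_rank` and the example lineages are hypothetical.

```python
import re

# Assumption: placeholder clades look like GGB9758, FGB123, SGB15368,
# or end in an SGB identifier such as Collinsella_SGB14861.
PLACEHOLDER = re.compile(r"(?:^|_)(?:SGB|GGB|FGB)\d+")

def deepest_named_rank(clade_name):
    """Return (rank_prefix, name) for the deepest rank with a real name, else None."""
    for level in reversed(clade_name.split("|")):
        prefix, _, name = level.partition("__")
        if name and not PLACEHOLDER.search(name):
            return prefix, name
    return None

# Hypothetical lineage, for illustration only.
example = (
    "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|"
    "f__Oscillospiraceae|g__GGB9758|s__GGB9758_SGB15368|t__SGB15368"
)
print(deepest_named_rank(example))  # ('f', 'Oscillospiraceae')
```

Here the SGB has no named genus, so the most specific thing that can be said is that it belongs to the family Oscillospiraceae.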
To get more information, you can assemble your metagenomic samples and assign the bins to SGBs using the PhyloPhlAn routine phylophlan_assign_sgbs.py (see tutorial). If you manage to reconstruct the genome you are interested in, it will be assigned to the corresponding uSGB and you can study that genome further.
Thank you for your replies.
I did assemble genomes, but I did not retrieve those SGBs from the assembly, even though some of them are abundant in the read-based analysis (perhaps the depth wasn’t enough to assemble these genomes). So I was curious about the function of these SGBs. Is there any way I can get the SGB genomes from the PhyloPhlAn or ChocoPhlAn databases?
I am undertaking this project with minimal experience in database building. The MetaPhlAn GitHub page does not mention anything about making SGB-specific marker sets, apart from the explanation given in the methods of the main paper. My question is: when defining the markers for an SGB, what identity and coverage thresholds should I use to eliminate gene sequences already present in the MetaPhlAn marker database?
Hi, I hope I’m commenting under the correct post. I have been trying to find information on this, and I think I can finally say with confidence that I could not find the answer myself.
I understand that for some species there is no connection between SGBs and the NCBI taxonomy, but I’m interested in the MAGs that form the SGBs. I have found the fasta files for all SGBs, but they are not named SGB___. Instead, they have names from metastudies, binners, etc. Is there a file or some other way to determine which SGB each .fasta file belongs to?
Thank you for the help in advance!