Best practices for visualizing MetaPhlAn 4 MAG identifiers (SGB, FGB, etc.)?

Hi everyone,

I’m analyzing my shotgun metagenomics data using MetaPhlAn 4 and have a question about the best way to handle the MAG/SGB identifiers in my taxonomic abundance plots.

When I create bar plots for different taxonomic levels (Class, Order, Family, Genus), I get many abundant taxa with identifiers like CFGB..., OFGB..., FGB..., and SGB.... I understand these represent valid, uncharacterized genome bins in the MetaPhlAn 4 database, but they make the plot legends hard to interpret because they are just codes.

My question is: What is the standard practice for labeling these identifiers in a way that is informative?

So far, I’ve tried to rename them by taking the last known classified taxon and appending a suffix like “MAG” or “unclassified.” For example, if SGB_10023 belongs to the Family Lachnospiraceae but has no classified Genus, I would group it and label it as “Lachnospiraceae (MAG)” in my genus-level plot.
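
For reference, here is a minimal sketch of that renaming in Python. It assumes the lineage is a MetaPhlAn-style string with `|`-separated levels and `k__`/`p__`/.../`s__` prefixes; the function name `label_at_rank`, the placeholder pattern, and the example lineage are all hypothetical, so adjust them to whatever your table actually contains.

```python
import re

# Assumption: placeholder clades look like CFGB123, OFGB123, FGB123, GGB123
# or SGB123, optionally with an underscore before the digits.
PLACEHOLDER = re.compile(r"^(?:CFGB|OFGB|FGB|GGB|SGB)_?\d+")


def label_at_rank(lineage, rank, suffix="MAG"):
    """Return a plot label for `lineage` at `rank` (e.g. "g" for genus).

    If the clade at that rank is a placeholder ID, fall back to the
    closest named ancestor and append the suffix in parentheses.
    """
    levels = [part.split("__", 1) for part in lineage.split("|")]
    prefixes = [prefix for prefix, _ in levels]
    if rank not in prefixes:
        raise ValueError(f"rank {rank!r} not present in lineage")

    idx = prefixes.index(rank)
    name = levels[idx][1]
    if not PLACEHOLDER.match(name):
        return name  # already a named taxon, keep it as-is

    # Walk upward until a non-placeholder ancestor is found.
    for _, parent in reversed(levels[:idx]):
        if not PLACEHOLDER.match(parent):
            return f"{parent} ({suffix})"
    return name  # no named ancestor at all, keep the code


# Hypothetical lineage, for illustration only.
example = (
    "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|"
    "f__Lachnospiraceae|g__GGB9999|s__GGB9999_SGB10023"
)
print(label_at_rank(example, "g"))
```
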

Is this a sound approach, or is there a better, more accepted method? How do you typically handle these abundant but unclassified taxa to make your visualizations clear without discarding important data?

Thanks for any advice!

Hi @Leticia_gomez

I would keep the SGB labels and add an annotation bar below the plot showing which family, genus, etc. each SGB belongs to. If you rename them all as, e.g., "Lachnospiraceae SGBs", be careful: you may see a high abundance for that group without knowing how many SGBs contribute to it or how much each one contributes.
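
To make that concrete, here is a rough Python sketch of the bookkeeping behind such an annotation: it keeps each SGB separate and records, per parent taxon, how many SGBs there are and how much each contributes. The function name `summarize_by_parent`, the SGB IDs, and the abundance values are made up for illustration; in practice the two mappings would come from your MetaPhlAn table.

```python
from collections import defaultdict


def summarize_by_parent(sgb_abundance, sgb_parent):
    """Group SGBs under their parent taxon without merging them.

    sgb_abundance: {sgb_id: relative abundance}
    sgb_parent:    {sgb_id: name of the parent family/genus}
    """
    groups = defaultdict(dict)
    for sgb, abundance in sgb_abundance.items():
        groups[sgb_parent[sgb]][sgb] = abundance

    summary = {}
    for parent, members in groups.items():
        # Sort members from most to least abundant for plotting/legends.
        ordered = dict(
            sorted(members.items(), key=lambda kv: kv[1], reverse=True)
        )
        summary[parent] = {
            "n_sgbs": len(ordered),
            "total": sum(ordered.values()),
            "members": ordered,
        }
    return summary


# Hypothetical values, for illustration only.
abundance = {"SGB10023": 7.5, "SGB4933": 12.5, "SGB15286": 3.0}
parent = {
    "SGB10023": "Lachnospiraceae",
    "SGB4933": "Lachnospiraceae",
    "SGB15286": "Oscillospiraceae",
}
for taxon, info in summarize_by_parent(abundance, parent).items():
    print(taxon, info["n_sgbs"], info["total"], list(info["members"]))
```

The per-SGB values can then be drawn as separate bar segments, with the parent name used only for the annotation bar or for grouping the legend.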