Hi everyone,
I’m analyzing my shotgun metagenomics data using MetaPhlAn 4 and have a question about the best way to handle the MAG/SGB identifiers in my taxonomic abundance plots.
When I create bar plots for different taxonomic levels (Class, Order, Family, Genus), I get many abundant taxa with identifiers like CFGB..., OFGB..., FGB..., and SGB.... While I understand these represent valid uncultured genome bins in the MetaPhlAn database, they make the plot legends difficult to interpret because they are just codes.
My question is: What is the standard practice for labeling these identifiers in a way that is informative?
So far, I’ve tried to rename them by taking the last known classified taxon and appending a suffix like “MAG” or “unclassified.” For example, if SGB_10023 belongs to the Family Lachnospiraceae but has no classified Genus, I would group it and label it as “Lachnospiraceae (MAG)” in my genus-level plot.
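A minimal sketch of the renaming described above, in Python. The lineage strings, the abundances, and the regular expression used to detect placeholder names are illustrative assumptions rather than anything taken from MetaPhlAn's documentation, and the function names readable_label and collapse are hypothetical helpers:

```python
import re
from collections import defaultdict

# Assumption: placeholder names are optional capital letters, then "GB" and
# digits (e.g. SGB10023, GGB1234, FGB567, OFGB89, CFGB12). Adjust as needed.
PLACEHOLDER = re.compile(r"^[A-Z]*GB_?\d+")


def readable_label(lineage, suffix="unclassified"):
    """Return the deepest name, or the last classified ancestor plus a suffix."""
    # Strip the rank prefix ("f__", "g__", ...) from every level of the lineage.
    names = [level.split("__", 1)[-1] for level in lineage.split("|")]
    if not PLACEHOLDER.match(names[-1]):
        return names[-1]
    # Walk up the lineage until a name that is not a placeholder code is found.
    for name in reversed(names[:-1]):
        if not PLACEHOLDER.match(name):
            return f"{name} ({suffix})"
    return f"Unknown ({suffix})"


def collapse(profile):
    """Sum abundances of all rows that end up with the same label."""
    totals = defaultdict(float)
    for lineage, abundance in profile.items():
        totals[readable_label(lineage)] += abundance
    return dict(totals)


# Hypothetical genus-level rows (lineage -> relative abundance in %).
prefix = "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae"
example = {
    prefix + "|g__Blautia": 10.0,
    prefix + "|g__GGB1234": 1.5,
    prefix + "|g__GGB5678": 2.5,
}
print(collapse(example))
```

Here the two placeholder genera are merged into a single “Lachnospiraceae (unclassified)” entry, so the legend stays readable while the abundance is kept in the plot.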
Is this a sound approach, or is there a better, more accepted method? How do you typically handle these abundant but unclassified taxa to make your visualizations clear without discarding important data?
Thanks for any advice!