Statistical Approach for Comparing Functional Potential of Selected SGBs vs. Community Background

I have a set of metagenomic samples that have been processed using BioBakery tools.

Let’s say I’ve identified several SGBs of interest. I’d like to compare their genetic potential (pathways/ECs) to that of the rest of the microbial community (i.e., all other SGBs). What would be an appropriate statistical approach to test this?

I assume it’s important to account for differences in the SGBs’ relative abundances: if the SGBs of interest are less abundant overall, I would naturally expect them to contribute less to a given functional feature.

To complicate things a bit more, the groups are unbalanced: I’m comparing ~40 SGBs of interest against >1000 other SGBs. What would be a good way to address both the abundance weighting and the imbalance in group size?
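
To make the setup concrete, here is a minimal sketch of the kind of test I have in mind. I am not confident it is the right approach. It assumes I have, for one functional feature, each SGB's contribution and each SGB's relative abundance. The function names (`normalize_by_abundance`, `permutation_test`) are hypothetical, the data are simulated, and nothing here is a bioBakery API. Dividing the contribution by relative abundance is my own assumption about how to do the abundance weighting, and the label-permutation test is one option that does not require equal group sizes.

```python
import random


def normalize_by_abundance(feature_abundance, sgb_abundance):
    """Divide each SGB's contribution to a feature by that SGB's relative abundance."""
    return [f / a for f, a in zip(feature_abundance, sgb_abundance)]


def permutation_test(values, is_target, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference in group means.

    values:    one abundance-normalized feature value per SGB
    is_target: parallel list of booleans, True for SGBs of interest
    """
    rng = random.Random(seed)

    def mean_diff(labels):
        target = [v for v, t in zip(values, labels) if t]
        rest = [v for v, t in zip(values, labels) if not t]
        return sum(target) / len(target) - sum(rest) / len(rest)

    observed = mean_diff(is_target)
    labels = list(is_target)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling labels keeps the group sizes fixed (e.g. 40 vs. 1000).
        rng.shuffle(labels)
        if abs(mean_diff(labels)) >= abs(observed):
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Simulated toy data (assumption): 40 SGBs of interest vs. 200 others.
rng = random.Random(42)
n_target, n_other = 40, 200
sgb_abundance = [rng.uniform(0.001, 0.05) for _ in range(n_target + n_other)]
feature_abundance = [a * rng.uniform(0.5, 1.5) for a in sgb_abundance]
is_target = [True] * n_target + [False] * n_other

per_unit = normalize_by_abundance(feature_abundance, sgb_abundance)
diff, p = permutation_test(per_unit, is_target)
print(f"difference in means: {diff:.3f}, permutation p-value: {p:.3f}")
```

This would have to be repeated per feature with multiple-testing correction, and it ignores that the same SGBs appear across samples, so I am unsure whether it is adequate.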

Thanks!