I have a set of metagenomic samples that have been processed using BioBakery tools.
Let’s say I’ve identified several SGBs of interest. I’d like to compare their genetic potential (pathways/ECs) to that of the rest of the microbial community (i.e., all other SGBs). What would be an appropriate statistical approach to test this?
I assume it’s important to account for differences in the SGBs’ relative abundances: if the SGBs of interest are less abundant overall, I would naturally expect them to contribute less to a given functional feature.
To complicate things a bit more, the groups are unbalanced: I’m comparing ~40 SGBs of interest against >1000 other SGBs. What would be a good way to address both the abundance weighting and the imbalance in group size?
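To make the question concrete, here is a minimal sketch of the kind of test I have in mind. I am not sure it is the right approach. It assumes I can build a per-SGB table of feature contributions (for example, from stratified functional profiles) and that dividing each SGB's contribution by that SGB's own abundance is a reasonable way to remove the abundance effect. The label-permutation test keeps the 40 vs. 1000 group sizes fixed when building the null distribution, which is how I would hope to handle the imbalance. All data below are synthetic, `permutation_test` is just a helper name I made up, and the sketch treats SGBs as independent units and ignores per-sample structure.

```python
import numpy as np


def permutation_test(values, is_target, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Group labels are shuffled while the group sizes stay fixed, so the
    null distribution reflects the same imbalance as the real data.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)

    observed = values[is_target].mean() - values[~is_target].mean()

    n_extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(is_target)  # shuffled copy of the labels
        diff = values[shuffled].mean() - values[~shuffled].mean()
        if abs(diff) >= abs(observed):
            n_extreme += 1

    # Add-one correction so the p-value is never exactly zero.
    p_value = (n_extreme + 1) / (n_perm + 1)
    return observed, p_value


# --- Synthetic toy data (NOT real results) ---
rng = np.random.default_rng(42)
n_sgb = 1040
is_target = np.zeros(n_sgb, dtype=bool)
is_target[:40] = True  # 40 SGBs of interest vs. 1000 others

# Hypothetical per-SGB abundance (e.g. mean relative abundance across samples).
sgb_abundance = rng.lognormal(mean=0.0, sigma=1.0, size=n_sgb)

# Hypothetical per-unit-abundance rate at which each SGB carries one feature;
# the toy SGBs of interest are given a 3x higher rate on purpose.
rate = rng.gamma(shape=2.0, scale=1.0, size=n_sgb)
rate[is_target] *= 3.0

# Raw contribution of each SGB to the feature, which scales with abundance.
feature_abundance = rate * sgb_abundance

# Abundance normalization: contribution per unit of SGB abundance.
normalized = feature_abundance / sgb_abundance

observed, p_value = permutation_test(normalized, is_target)
print(f"difference in means: {observed:.3f}, permutation p-value: {p_value:.4f}")
```

In practice this would be repeated for every pathway/EC, followed by a multiple-testing correction. Does something like this make sense, or is there a more standard way to do it?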
Thanks!