Single copy gene normalization

danpal · November 12, 2025, 7:24pm

In a metagenome, you can calculate the estimated number of copies of a gene per genome (let’s call it CPG). One way to estimate this value is to first calculate, for each sample, the geometric mean of the abundances of a set of universal single-copy genes (let’s call it the SCG geometric mean), and then normalize the abundance of the genes by this SCG geometric mean. I would like to use MaAsLin3 to compare the CPG of my genes between groups. To do this, I would input the abundance table normalized by the SCG geometric mean, with normalization = NONE and median_comparison_abundance = FALSE. Do you think this is a valid use of MaAsLin3?
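To make the calculation concrete, here is a minimal sketch of the per-sample normalization described above. It assumes the abundances are already comparable across genes (for example, length-normalized coverage) and that every SCG has a strictly positive abundance in the sample. The function names, gene names, and numbers are hypothetical and are not part of MaAsLin3.

```python
import math

def scg_geometric_mean(scg_abundances):
    """Geometric mean of the single-copy gene abundances for one sample."""
    # Assumption of this sketch: zeros are rejected rather than pseudocounted.
    if not scg_abundances or any(a <= 0 for a in scg_abundances):
        raise ValueError("SCG abundances must be non-empty and strictly positive")
    mean_log = sum(math.log(a) for a in scg_abundances) / len(scg_abundances)
    return math.exp(mean_log)

def copies_per_genome(gene_abundances, scg_abundances):
    """Divide each gene abundance by the sample's SCG geometric mean (CPG)."""
    gm = scg_geometric_mean(scg_abundances)
    return {gene: abundance / gm for gene, abundance in gene_abundances.items()}

# Hypothetical sample: three SCGs and two genes of interest.
sample_scg = [10.0, 20.0, 40.0]
sample_genes = {"geneA": 40.0, "geneB": 5.0}
cpg = copies_per_genome(sample_genes, sample_scg)
```

With these made-up numbers the SCG geometric mean is 20, so geneA comes out at about 2 copies per genome and geneB at about 0.25. The resulting per-sample CPG values would form the table passed to MaAsLin3 with normalization = NONE.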

nearinj · November 13, 2025, 6:27pm

Hi @danpal,

I’m not particularly familiar with this type of normalization, but it sounds somewhat similar to CLR (without the log, and using a different reference frame, etc.). CLR worked fine in MaAsLin3, although note that in our testing the MaAsLin3 defaults worked better than CLR.

As such, I cannot say whether this normalization will lead to any change in performance. If you plan to do this analysis, I would at the very least look at the diagnostic plots and plot the relationships you’re interested in to make sure they look reasonable. Moreover, I would encourage you to think deeply about your normalization and the biological question you are asking by using it. I think what you described is reasonable, but it’s important to realize that by normalizing the data in this way you are changing the type of question you are asking compared to using something like TSS.

Cheers,
Jacob Nearing
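
The CLR comparison above can be illustrated with a short sketch. CLR subtracts the mean log abundance of all features in the sample, while the log of an SCG-normalized value subtracts the mean log abundance of the SCGs only, so the two differ by a per-sample constant (the "different reference frame"). This is only an illustration with hypothetical function names and made-up numbers, not how MaAsLin3 implements CLR.

```python
import math

def clr(abundances):
    """Centered log-ratio: log of each value minus the mean log of all values."""
    logs = [math.log(a) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def log_scg_ratio(abundances, scg_abundances):
    """Log of each value minus the mean log of the SCG abundances only."""
    mean_log_scg = sum(math.log(a) for a in scg_abundances) / len(scg_abundances)
    return [math.log(a) - mean_log_scg for a in abundances]

# Hypothetical sample: three genes of interest and three SCGs.
genes = [40.0, 5.0, 20.0]
scgs = [10.0, 20.0, 40.0]

clr_values = clr(genes)
log_cpg = log_scg_ratio(genes, scgs)

# Per-feature gap between the two transforms; it is the same for every feature.
offsets = [c - l for c, l in zip(clr_values, log_cpg)]
```

Because the gap is constant within a sample but can vary between samples, the two transforms can lead to different between-group comparisons, which is one way to see why the choice of reference changes the question being asked.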

danpal · November 13, 2025, 7:02pm

Thanks for the response. In my case, I mainly use it for the normalization of antimicrobial resistance genes. If I have a CPG value of 2, it means that, on average, each bacterium has two copies of that gene. I think this measure is much more interpretable for genes than simply expressing that a gene has a certain relative abundance within the pool of all the genes being considered.
