RPKM or other gene-length normalized feature values for MMUPHin

ijmiller2 · June 3, 2022, 7:52pm

Hi bioBakery team,

I understand that MMUPHin's adjust_batch routine only accepts proportions or read counts (i.e., integer values). However, for some of our features it is important to normalize for gene length as well as sequencing depth. For these we have RPKM values, which are floats / doubles that aren’t proportions (i.e., they don’t sum to 1 within each sample). Do you have any thoughts or suggestions on whether, and how, we could use gene-length-normalized values such as RPKM in the feature_abd matrix passed to adjust_batch?

Thank you for your support,
Ian

andrewGhazi · July 6, 2022, 2:18pm

Sorry that we took so long to respond to this. Your RPKM values are already normalized for gene length; that’s what the “per kilobase” part of RPKM means. To convert the RPKM values to proportions, you can simply divide each value by the sum of all values in its sample, so that every sample sums to 1.
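As a rough illustration of that division, here is a minimal sketch in plain Python. It assumes the table is laid out as features in rows and samples in columns; the function name `rpkm_to_proportions` and the toy numbers are made up for this example and are not part of MMUPHin.

```python
def rpkm_to_proportions(rpkm):
    """Divide every value by its sample (column) total.

    `rpkm` is a list of rows, one row per feature and one column per
    sample (an assumed layout for this sketch).
    """
    n_samples = len(rpkm[0])
    # Total RPKM per sample, i.e. the sum down each column.
    col_sums = [sum(row[j] for row in rpkm) for j in range(n_samples)]
    if any(total <= 0 for total in col_sums):
        raise ValueError("every sample needs a positive total to be rescaled")
    return [[row[j] / col_sums[j] for j in range(n_samples)] for row in rpkm]


# Hypothetical toy data: 3 features (rows) x 2 samples (columns).
rpkm = [
    [10.0, 0.0],
    [30.0, 5.0],
    [60.0, 15.0],
]
props = rpkm_to_proportions(rpkm)

# Each sample (column) of the rescaled table should now sum to 1.
sample_totals = [sum(row[j] for row in props) for j in range(len(props[0]))]
```

The rescaled table keeps the gene-length normalization that RPKM already carries; only the per-sample scale changes, so it should be treated as proportions rather than counts.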
