RPKM or other gene-length-normalized feature values for MMUPHin

Hi bioBakery team,

I understand that MMUPHin’s adjust_batch routine only accepts proportions and read counts (i.e., integers). However, for some of our features it is important to normalize for gene length in addition to sequencing depth. For these we have RPKM values, which are floating-point numbers and are not proportions (i.e., they don’t sum to 1 within each sample). Do you have any suggestions on whether, and how, we could include gene-length normalization (for example, by using RPKM values in the feature_abd matrix) with the adjust_batch function?

Thank you for your support,
Ian

Sorry that we took so long to respond to this. Your RPKM values are already normalized for gene length (that’s what the “per kilobase” part of RPKM means) and for sequencing depth (the “per million” part). To convert them to proportions, simply divide each value by the sum of all values in its sample, so that every sample sums to 1.
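
In case it helps, here is a minimal sketch of that conversion in plain Python. It is only an illustration: the function name rpkm_to_proportions, the dictionary layout (features as rows, samples as columns), and the numbers are all made up for this example and are not part of MMUPHin.

```python
def rpkm_to_proportions(rpkm):
    """Convert a features-by-samples table of RPKM values to proportions.

    `rpkm` maps each feature name to a list of values, one per sample
    (a hypothetical layout chosen for this sketch).
    """
    n_samples = len(next(iter(rpkm.values())))
    # Per-sample totals: the sum over all features within each sample.
    totals = [sum(values[j] for values in rpkm.values()) for j in range(n_samples)]
    if any(t <= 0 for t in totals):
        raise ValueError("every sample needs a positive RPKM total")
    # Divide each value by the total of the sample it belongs to.
    return {
        feature: [v / t for v, t in zip(values, totals)]
        for feature, values in rpkm.items()
    }


# Made-up RPKM values for three features in two samples.
rpkm = {
    "geneA": [10.0, 0.0],
    "geneB": [30.0, 5.0],
    "geneC": [60.0, 15.0],
}
proportions = rpkm_to_proportions(rpkm)
```

After this step each sample (column) sums to 1, which is the proportion format described above. If your data is already in R as a features-by-samples matrix, the same division can be written with base R as sweep(rpkm, 2, colSums(rpkm), "/").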