Normalize output?

Several questions related to normalization:

(1) Do I need to normalize the output?
(2) Are the outputs in proportions?
(3) Could I apply a centered log-ratio or additive log-ratio transformation to the output?

You mentioned in the manuscript:

…quantile-transform the input features (species or gene family abundances) to the quantiles of a standard normal distribution in order to improve the detection power of the elastic net model.

(4) Could I perform the same normalization to the input features? This “sounds” like a good idea since the prediction model was trained on this type of normalization.

Hi @aimirza - the output is designed to be on a compositional (proportional) scale, so a log-ratio transformation can be applied for follow-up analysis. In principle, the predictions can be analyzed like real TSS-normalized metabolite data.

You could potentially do that, but the transformation is already applied to the input features by default when you run MelonnPan, and we currently don’t have an option to turn it off.
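For readers unfamiliar with the transformation quoted from the manuscript, here is a minimal Python sketch of a rank-based mapping to standard normal quantiles. It is an illustration only and not MelonnPan's own code: the function name `quantile_to_normal`, the `(rank - 0.5) / n` offset, and the example values are all assumptions.

```python
from statistics import NormalDist


def quantile_to_normal(values):
    """Map values to standard-normal quantiles by rank (ties share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # The 0.5 offset keeps every probability strictly inside (0, 1).
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical abundances of one feature across four samples.
abundances = [0.02, 0.50, 0.10, 0.38]
transformed = quantile_to_normal(abundances)
```

The result preserves the ordering of the samples but discards the original spacing between values, which is why such a transform cannot be undone from its output alone.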

Hi @himel.mallick. Thank you for the prompt response. Normally, log-ratio transformations for compositional data analysis are performed on counts, not proportions. For example, ALDEx2 requires counts as input, not proportions. Would there be any issue with converting the proportions back to counts by multiplying the metabolite proportions by the original UniRef90 total counts for each sample, essentially reversing the TSS normalization?

Hi @aimirza - sorry for the delay. Let me make sure I understand the question.

Are you trying to use ALDEx2 on the predicted compounds? If yes, I am not sure why you would want to use a compositional approach for analyzing metabolites given the limited literature support (let me know if otherwise).

In general, I would not recommend trying to reconstruct predicted metabolite counts from UniRef90 total counts especially given that they have gone through several non-reversible intermediate steps during the prediction process.

I am trying to use compositional data analysis (CoDA), not necessarily ALDEx2. Because the data are compositional (they have a sum constraint), I would like to use log-ratio transformations to move the data from the simplex to a Euclidean vector space. After this transformation, standard statistical methods would be appropriate to use.
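As a concrete illustration of the log-ratio idea, here is a minimal Python sketch of the centered log-ratio (CLR) transformation for a single sample. The function name `clr`, the optional `pseudocount` argument, and the example proportions are assumptions made for illustration; this is not code from MelonnPan or ALDEx2.

```python
import math


def clr(composition, pseudocount=0.0):
    """Centered log-ratio: log of each part minus the mean log over all parts."""
    shifted = [x + pseudocount for x in composition]
    if any(x <= 0 for x in shifted):
        raise ValueError("CLR needs strictly positive parts; consider a pseudocount")
    logs = [math.log(x) for x in shifted]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


# Hypothetical metabolite proportions for one sample (they sum to 1).
proportions = [0.2, 0.3, 0.5]
clr_props = clr(proportions)

# Multiplying every part by the same constant leaves the CLR values unchanged
# when no pseudocount is added.
clr_scaled = clr([p * 1000 for p in proportions])
```

The CLR values of a sample sum to zero. Without a pseudocount, rescaling all parts of a sample by a common factor does not change them; zeros must be handled before taking logs, and adding a pseudocount breaks that invariance.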

That’s what I feared… I guess CoDA approaches won’t work then. Thanks for the info!