Normalize output?

aimirza · February 20, 2021, 1:47am

Several questions related to normalization:

(1) Do I need to normalize the output?
(2) Are the outputs in proportions?
(3) Could I perform centered-log ratio transformation or additive-log ratio on the output?

You mentioned in the manuscript:

…quantile-transform the input features (species or gene family abundances) to the quantiles of a standard normal distribution in order to improve the detection power of the elastic net model.

(4) Could I perform the same normalization to the input features? This “sounds” like a good idea since the prediction model was trained on this type of normalization.
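For readers unfamiliar with the transformation quoted above, here is a minimal sketch of a rank-based mapping to standard normal quantiles. It is an illustration only, not MelonnPan's actual code: the function name `rank_to_normal`, the `(rank - 0.5) / n` offset, and the average-rank tie handling are all assumptions made for this example.

```python
from statistics import NormalDist

def rank_to_normal(values):
    """Map values to standard-normal quantiles by rank (ties share an average rank).

    Hypothetical helper for illustration; not taken from MelonnPan.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Assumed offset: (rank - 0.5) / n keeps probabilities strictly inside (0, 1).
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]

# One feature measured across five samples (made-up relative abundances).
abundances = [0.10, 0.40, 0.05, 0.30, 0.15]
transformed = rank_to_normal(abundances)
```

The transform keeps the ordering of the samples and discards the original scale, so it would be applied per feature across samples.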

himel.mallick · February 20, 2021, 6:51pm

Hi @aimirza - the output is designed to be on a compositional (proportional) scale, so a log-ratio transformation can be applied for follow-up analysis. In principle, the predictions can be analyzed like real TSS-normalized (total-sum-scaled) metabolite data.

As for (4), you could potentially do that, but the quantile transformation is already applied to the input features by default when you run MelonnPan, and we currently don't have an option to turn it off.
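As a rough illustration of the log-ratio follow-up mentioned in this reply, the sketch below applies the centered log-ratio (CLR) and additive log-ratio (ALR) to one sample of proportions. The function names, the pseudocount of `1e-6` used to avoid `log(0)`, and the choice of the last component as the ALR reference are assumptions for this example, not something MelonnPan prescribes.

```python
import math

def clr(proportions, pseudocount=1e-6):
    """Centered log-ratio: log of each part minus the mean log across parts."""
    # The pseudocount is an assumed zero-handling choice for illustration.
    logs = [math.log(p + pseudocount) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def alr(proportions, ref_index=-1, pseudocount=1e-6):
    """Additive log-ratio: log of each part over a chosen reference part."""
    shifted = [p + pseudocount for p in proportions]
    ref_pos = ref_index % len(shifted)  # normalize negative indices
    ref = shifted[ref_pos]
    # The reference part itself is dropped, leaving D - 1 coordinates.
    return [math.log(x / ref) for i, x in enumerate(shifted) if i != ref_pos]

# One sample of predicted metabolite proportions (made-up values, one zero).
predicted = [0.5, 0.0, 0.3, 0.2]
clr_values = clr(predicted)
alr_values = alr(predicted)
```

CLR returns one value per metabolite and the values sum to zero, while ALR returns one fewer value because the reference metabolite is used as the denominator.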

aimirza · February 22, 2021, 8:14pm

Hi @himel.mallick. Thank you for the prompt response. Normally, log-ratio transformations for compositional data analysis are performed on counts, not proportions. For example, ALDEx2 requires counts as input, not proportions. Would there be any issue with converting the proportions back to counts by multiplying the metabolite proportions by the original UniRef90 total counts for each sample? Essentially, this would reverse the TSS normalization.

himel.mallick · February 23, 2021, 11:51pm

Hi @aimirza - sorry for the delay. Let me make sure I understand the question.

Are you trying to use ALDEx2 on the predicted compounds? If yes, I am not sure why you would want to use a compositional approach for analyzing metabolites given the limited literature support (let me know if otherwise).

In general, I would not recommend trying to reconstruct predicted metabolite counts from UniRef90 total counts, especially given that the predictions have gone through several non-reversible intermediate steps during the prediction process.

aimirza · February 24, 2021, 12:03am

I am trying to use compositional data analysis (CoDA), not necessarily ALDEx2. Because the data are compositional (they have a sum constraint), I would like to use log-ratio transformations to move the data from the simplex to a Euclidean vector space. After this transformation, standard statistical methods would be appropriate to use.

That's what I feared… I guess CoDA approaches won’t work then. Thanks for the info!
