Sign of coef when variables are binary

Daniel · April 4, 2022, 3:31pm

Thank you for creating and maintaining such a useful tool!

I have a question regarding a MaAsLin2 analysis of humann3 data from metagenomic sequencing performed in two different bird populations. I ran the following association test using the variables “Phase” (factor w/ 2 levels, “breeding” or “staging”) and “Mihat” (numeric):

fit <- Maaslin2( analysis_method = "NEGBIN", normalization = "CSS", transform = "NONE", input_data = data_df, input_metadata = map2, output = file.path(Save_fp, "Maaslin2_path_abundance"), fixed_effects = c("Phase", "Mihat"), min_prevalence = 0.5, )

But there is a bit of confusion among colleagues about interpreting the results, especially in light of Himel’s response to this question.

The significant_results.tsv file looks like this:

feature	metadata	value	coef	stderr	N	N.not.0	pval	qval
PWY0.1586	Mihat	Mihat	-0.359878416	0.04065671	20	15	8.62E-19	2.78E-16
PWY.6151	Mihat	Mihat	0.341375963	0.043378182	20	17	3.55E-15	5.72E-13
PWY.6270	Mihat	Mihat	0.335625435	0.054723826	20	15	8.62E-10	3.08E-08
GLUCOSE1PMETAB.PWY	Phase	Breeding	-0.873864422	0.144191751	20	13	1.36E-09	4.37E-08
PWY.5103	Mihat	Mihat	0.289748826	0.048018476	20	15	1.60E-09	4.68E-08

Is it correct to say that:

(1) In metadata = Mihat rows, features that have a negative coef are negatively correlated with Mihat?

The first pathway, PWY0.1586, has a negative coef, while the second (PWY.6151) has a positive coef, yet when you examine the graphs that were generated, both slopes are negative. Is the sign of the coef related to the direction of the association? If not, how should we be interpreting this?

(2) In metadata = Phase rows, features that have a negative coef are negatively correlated with the factor in the “value” column of that row.

For example, there is a negative coef in the GLUCOSE1PMETAB.PWY row. Does this mean that GLUCOSE1PMETAB.PWY is negatively associated with the “breeding” phase? Or does it mean that “breeding” is the reference factor level, and therefore a negative coef means the pathway is negatively associated with the opposite factor level, i.e. staging?

Similar to above, if you examine the plots, the pathway abundance all seem to be elevated in the “staging” group regardless of the sign on the coef.

Could you help us better understand how to interpret and draw conclusions from these outputs?

Thanks so much!

Kelsey_Thompson · April 5, 2022, 1:41pm

Hi @Daniel,

Thanks for the questions. I believe your second point is easier to address. The association MaAsLin is reporting for categorical variables is with the listed variable. So in this case MaAsLin identified a negative association between GLUCOSE… and the “breeding” category. It reports it this way because it is often that people use multiple level categories. If I read your question correctly, this follows the correct trend.

For the first question - I’m not exactly sure what is going on with the underlying data. Your interpretation seems to be correct - e.g. the sign of the coef should relate to the direction of the effect, but it is a little concerning that for both you are observing a negative slope. My question without seeing the data would be if there are any outliers that could be influencing the overall trend in one of those plots? Since MaAsLin plots the raw data and not the transformed data that the model uses sometimes it can be hard to translate these directly. Without playing with the data, I’m not sure I can provide a better explanation. Feel free to either send the plots or a minimally reproducible example of the data, if you still have questions.

I hope this helped,
Kelsey

Daniel · April 5, 2022, 2:47pm

Hi Kelsey,

Thank you so much for your reply!

I’m attaching a minimally reproducible example, with selected pathways that have both negative and positive coef with both the “Phase” and “Mihat” variables.

data.txt (951 Bytes)
metadata.txt (517 Bytes)

data_df <- read.table(file.path(Save_fp, "data.txt"))
metadata_df <- read.table(file.path(Save_fp, "metadata.txt"))

fit <- Maaslin2( analysis_method = "NEGBIN", normalization = "CSS", transform = "NONE", input_data = data_df, input_metadata = metadata_df, output = file.path(Save_fp, "out"), fixed_effects = c("Phase", "Mihat"), min_prevalence = 0.5, correction = "BH" )

The output of significant_results.tsv is:

feature	metadata	value	coef	stderr	N	N.not.0	pval	qval
GLUCOSE1PMETAB.PWY	Phase	Staging	1.237952129	0.026297008	20	13	0	0
PWY.6151	Mihat	Mihat	0.397912439	0.007030367	20	17	0	0
PWY0.1586	Mihat	Mihat	-0.318368495	0.007106026	20	15	0	0
PWY0.1586	Phase	Staging	0.25547133	0.012150081	20	15	3.77E-98	7.54E-98
PWY.6151	Phase	Staging	-0.271048109	0.017040735	20	17	5.77E-57	9.23E-57
GLUCOSE1PMETAB.PWY	Mihat	Mihat	0.020244803	0.008522274	20	13	0.017524411	0.023365881
PANTO.PWY	Phase	Staging	-0.26252161	0.181873555	20	20	0.148899588	0.170170958

Phase = Staging (note: factor orders switched from above), both pathways are more abundant in the staging category. Although I’m realizing now looking at them more closely that this may be driven by a large number of zeros in the Breeding category. Does that sound right?

GLUCOSE1PMETAB.PWY (coef = +1.237952129)

PANTO.PWY (coef = -0.26252161)

For the continuous variable Mihat, both slopes are negative:

PWY.6151 (coef = +0.397912439)

PWY0.1586 (coef = -0.318368495)

And while I’m at it, I had another unrelated question. Is this the correct way to convert the coef value to log2 fold change?

res <- fit$results
fc <- exp(res$coef)
res$log2_fc <- log2(fc)

Thanks so much!

Topic		Replies	Views
MaasLin2: Heatmap Vs Significance MaAsLin	8	3585	January 5, 2022
Coefficient in maaslin2 output MaAsLin	2	3302	May 13, 2021
How to interprete Maaslin output? MaAsLin	3	2005	May 23, 2021
Why do coefs differ from raw glm/lmer, and why do they depend on number of input features? MaAsLin	9	2053	May 31, 2020
Number of metadata variables impact stats? MaAsLin	1	490	July 17, 2020

Sign of coef when variables are binary

Related topics