Sign of coef when variables are binary

Thank you for creating and maintaining such a useful tool!

I have a question regarding a MaAsLin2 analysis of humann3 data from metagenomic sequencing performed in two different bird populations. I ran the following association test using the variables “Phase” (factor w/ 2 levels, “breeding” or “staging”) and “Mihat” (numeric):

fit <- Maaslin2( analysis_method = "NEGBIN", normalization = "CSS", transform = "NONE", input_data = data_df, input_metadata = map2, output = file.path(Save_fp, "Maaslin2_path_abundance"), fixed_effects = c("Phase", "Mihat"), min_prevalence = 0.5, )

But there is a bit of confusion among colleagues about interpreting the results, especially in light of Himel’s response to this question.

The significant_results.tsv file looks like this:

feature metadata value coef stderr N N.not.0 pval qval
PWY0.1586 Mihat Mihat -0.359878416 0.04065671 20 15 8.62E-19 2.78E-16
PWY.6151 Mihat Mihat 0.341375963 0.043378182 20 17 3.55E-15 5.72E-13
PWY.6270 Mihat Mihat 0.335625435 0.054723826 20 15 8.62E-10 3.08E-08
GLUCOSE1PMETAB.PWY Phase Breeding -0.873864422 0.144191751 20 13 1.36E-09 4.37E-08
PWY.5103 Mihat Mihat 0.289748826 0.048018476 20 15 1.60E-09 4.68E-08

Is it correct to say that:

(1) In metadata = Mihat rows, features that have a negative coef are negatively correlated with Mihat?

The first pathway, PWY0.1586, has a negative coef, while the second (PWY.6151) has a positive coef, yet when you examine the graphs that were generated, both slopes are negative. Is the sign of the coef related to the direction of the association? If not, how should we be interpreting this?

(2) In metadata = Phase rows, features that have a negative coef are negatively correlated with the factor in the “value” column of that row.

For example, there is a negative coef in the GLUCOSE1PMETAB.PWY row. Does this mean that GLUCOSE1PMETAB.PWY is negatively associated with the “breeding” phase? Or does it mean that “breeding” is the reference factor level, and therefore a negative coef means the pathway is negatively associated with the opposite factor level, i.e. staging?

Similar to above, if you examine the plots, the pathway abundance all seem to be elevated in the “staging” group regardless of the sign on the coef.

Could you help us better understand how to interpret and draw conclusions from these outputs?

Thanks so much!

Hi @Daniel,

Thanks for the questions. I believe your second point is easier to address. The association MaAsLin is reporting for categorical variables is with the listed variable. So in this case MaAsLin identified a negative association between GLUCOSE… and the “breeding” category. It reports it this way because it is often that people use multiple level categories. If I read your question correctly, this follows the correct trend.

For the first question - I’m not exactly sure what is going on with the underlying data. Your interpretation seems to be correct - e.g. the sign of the coef should relate to the direction of the effect, but it is a little concerning that for both you are observing a negative slope. My question without seeing the data would be if there are any outliers that could be influencing the overall trend in one of those plots? Since MaAsLin plots the raw data and not the transformed data that the model uses sometimes it can be hard to translate these directly. Without playing with the data, I’m not sure I can provide a better explanation. Feel free to either send the plots or a minimally reproducible example of the data, if you still have questions.

I hope this helped,
Kelsey

Hi Kelsey,

Thank you so much for your reply!

I’m attaching a minimally reproducible example, with selected pathways that have both negative and positive coef with both the “Phase” and “Mihat” variables.

data.txt (951 Bytes)
metadata.txt (517 Bytes)

data_df <- read.table(file.path(Save_fp, "data.txt"))
metadata_df <- read.table(file.path(Save_fp, "metadata.txt"))

fit <- Maaslin2( analysis_method = "NEGBIN", normalization = "CSS", transform = "NONE", input_data = data_df, input_metadata = metadata_df, output = file.path(Save_fp, "out"), fixed_effects = c("Phase", "Mihat"), min_prevalence = 0.5, correction = "BH" )

The output of significant_results.tsv is:

feature metadata value coef stderr N N.not.0 pval qval
GLUCOSE1PMETAB.PWY Phase Staging 1.237952129 0.026297008 20 13 0 0
PWY.6151 Mihat Mihat 0.397912439 0.007030367 20 17 0 0
PWY0.1586 Mihat Mihat -0.318368495 0.007106026 20 15 0 0
PWY0.1586 Phase Staging 0.25547133 0.012150081 20 15 3.77E-98 7.54E-98
PWY.6151 Phase Staging -0.271048109 0.017040735 20 17 5.77E-57 9.23E-57
GLUCOSE1PMETAB.PWY Mihat Mihat 0.020244803 0.008522274 20 13 0.017524411 0.023365881
PANTO.PWY Phase Staging -0.26252161 0.181873555 20 20 0.148899588 0.170170958

Phase = Staging (note: factor orders switched from above), both pathways are more abundant in the staging category. Although I’m realizing now looking at them more closely that this may be driven by a large number of zeros in the Breeding category. Does that sound right?

GLUCOSE1PMETAB.PWY (coef = +1.237952129)

PANTO.PWY (coef = -0.26252161)

For the continuous variable Mihat, both slopes are negative:

PWY.6151 (coef = +0.397912439)

PWY0.1586 (coef = -0.318368495)

And while I’m at it, I had another unrelated question. Is this the correct way to convert the coef value to log2 fold change?

res <- fit$results
fc <- exp(res$coef)
res$log2_fc <- log2(fc)

Thanks so much!