Handling 3-way interactions

Hi there,

I am new to MaAsLin and have been really enjoying this tool so far! A big thank you for the existence of this forum which has answered a lot of my questions.

Currently, I am in the process of trying to include interaction terms and understand that these have to be created manually - which is fine. The example given is very helpful but I am not sure how MaAsLin handles more complicated interactions. In my case, I want to include a 3-way interaction with the following variables:

  • Mental Health (MH) status - “MH” vs “No MH”
  • Irritable bowel syndrome (IBS) status - “IBS” vs “No IBS”
  • A continuous diet quality score

My question is whether it is enough to create a single 3-way interaction column e.g.

input_metadata$Diet_IBS_MH = (input_metadata$IBS_Status == "IBS") *
  (input_metadata$MH_Status == "MH") *
  input_metadata$Diet_score

or whether I need to also manually create the other 2-way interactions that would be part of the 3-way interaction e.g.

input_metadata$Diet_MH = (input_metadata$MH_Status == "MH") *
  input_metadata$Diet_score

input_metadata$Diet_IBS = (input_metadata$IBS_Status == "IBS") *
  input_metadata$Diet_score

input_metadata$IBS_MH = (input_metadata$IBS_Status == "IBS") *
  (input_metadata$MH_Status == "MH")

Any help would be really appreciated :slight_smile:

Djamila

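I recommend using model.matrix() to set up interaction variables, e.g. model.matrix(~disp*am*vs, data = mtcars). The * operator in the formula expands to every main effect, all the two-way interactions, and the three-way interaction, so the lower-order terms are created alongside the three-way column.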
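As a rough sketch of what that looks like with the variables from the question: the toy data frame below is made up purely for illustration, and the column names (IBS_Status, MH_Status, Diet_score) are assumed to match the original post. It only uses base R and does not call MaAsLin itself.

```r
# Hypothetical toy metadata; the values are invented for illustration only.
input_metadata <- data.frame(
  IBS_Status = factor(rep(c("No IBS", "IBS"), times = 4),
                      levels = c("No IBS", "IBS")),   # first level = reference
  MH_Status  = factor(rep(c("No MH", "MH"), each = 4),
                      levels = c("No MH", "MH")),
  Diet_score = c(1.5, 2.0, 3.5, 4.0, 2.5, 3.0, 4.5, 5.0)
)

# '*' expands to main effects, all 2-way terms, and the 3-way term.
mm <- model.matrix(~ Diet_score * IBS_Status * MH_Status, data = input_metadata)

# Drop the intercept column and swap ':' for a separator that is safe in column names.
design <- mm[, -1, drop = FALSE]
colnames(design) <- gsub(":", "_x_", colnames(design), fixed = TRUE)
print(colnames(design))

# The 3-way column is the same product the question built by hand.
manual <- (input_metadata$IBS_Status == "IBS") *
  (input_metadata$MH_Status == "MH") *
  input_metadata$Diet_score

# Attach the 2-way and 3-way columns to the metadata for use as model terms.
interaction_cols <- grep("_x_", colnames(design), value = TRUE)
input_metadata <- cbind(input_metadata, design[, interaction_cols, drop = FALSE])
```

The generated interaction columns can then be listed as model terms next to the original variables, instead of building each product by hand.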

By the way, you will likely need an exorbitant quantity of data to get good inference on a three-way interaction term. Detecting interaction effects is much harder than detecting main effects; see “You need 16 times the sample size to estimate an interaction than to estimate a main effect” on the Statistical Modeling, Causal Inference, and Social Science blog.

Hi Andrew,

Thank you very much for your swift response, I really appreciate it!

As for the first point, using the model.matrix() approach worked well - thanks for the recommendation.

As for the second point, that’s a reasonable concern and something I need to discuss further with a statistician. While I am interested in being able to detect the interactions, these are not central to my research question - the primary aim is to investigate the main effect of diet and how that is associated with gut microbiota. However, I do still want to see whether the associations between diet and microbiota (if any) are moderated by whether participants have a mental health diagnosis or an IBS diagnosis.

In that sense, when running the model (and assuming the sample size is not a concern), the output showed that there was indeed a diet x IBS x MH interaction (as well as several Diet x IBS and Diet x MH interactions). But I am unsure what to do with this information. Is there a way to do a post-hoc test to see where the interactions lie? And am I still able to meaningfully interpret any associations with diet alone if those interactions are being controlled for?

I also have one more - slightly unrelated - question:

  • Is it necessary to manually add a pseudo count before using the in-built CLR transformation in MaAsLin, or is that automatically taken care of?

Thank you again for your generous time!

Djamila

The outputs should identify specific bugs alongside each reported model term. If you don’t see specific bugs in your outputs take a closer look. You’ll want to look at the distribution of the abundances for the bugs of interest identified by your three-way interaction term. Given that two of your variables are binary, the three-way interaction term estimate will be driven entirely by observations where those two binary variables are both “on”.

So you’ll probably want to do some visualizations of the data for the identified bugs and ask yourself “Does variation in diet in MH+IBS subjects explain variation in abundance of bug X above and beyond the component explained by all the other lower order regression terms?” As you can see it’s pretty tricky to verbalize/think about/visualize, which comes with the territory when using interaction terms.
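One possible way to start that kind of look, sketched on simulated data: fit a separate diet slope within each IBS x MH group for a single bug and compare them. Everything here (the data, the bug_x column, the effect sizes) is invented for illustration; this is not a MaAsLin feature and it is not a formal post-hoc test.

```r
# Simulated stand-in data; all names and effect sizes are hypothetical.
set.seed(1)
n <- 200
meta <- data.frame(
  IBS_Status = sample(c("IBS", "No IBS"), n, replace = TRUE),
  MH_Status  = sample(c("MH", "No MH"), n, replace = TRUE),
  Diet_score = rnorm(n)
)

# Give the simulated bug a steeper diet slope only when both flags are "on".
both <- meta$IBS_Status == "IBS" & meta$MH_Status == "MH"
meta$bug_x <- 0.2 * meta$Diet_score +
  0.8 * meta$Diet_score * both +
  rnorm(n, sd = 0.5)

# One group label per IBS x MH combination.
meta$group <- interaction(meta$IBS_Status, meta$MH_Status, sep = " / ")

# Diet slope for bug_x within each of the four groups.
slopes <- sapply(split(meta, meta$group), function(d) {
  coef(lm(bug_x ~ Diet_score, data = d))[["Diet_score"]]
})
print(round(slopes, 2))
```

Plotting bug_x against Diet_score separately for each group gives the visual counterpart of these slopes.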

MaAsLin already adds its own pseudocount when using CLR, as you can see in utility_scripts.R on the master branch of the biobakery/Maaslin2 GitHub repository.
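For intuition only, here is a minimal base R sketch of a CLR transform with a pseudocount. The function name clr_with_pseudocount and the default pseudocount of 1 are assumptions for illustration; the linked utility_scripts.R is the authority on what MaAsLin actually does.

```r
# Hypothetical helper, not MaAsLin's own code.
# Assumes samples are in rows and features in columns.
clr_with_pseudocount <- function(counts, pseudocount = 1) {
  logged <- log(as.matrix(counts) + pseudocount)  # the pseudocount avoids log(0)
  sweep(logged, 1, rowMeans(logged), "-")         # centre each sample on its mean log value
}

# Toy count table with zeros, invented for illustration.
toy_counts <- matrix(c(0, 5, 10,
                       3, 0, 7),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("s1", "s2"), c("bugA", "bugB", "bugC")))
toy_clr <- clr_with_pseudocount(toy_counts)
print(round(toy_clr, 3))
```

Each row of the result sums to zero, which is the defining property of the centred log-ratio.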

Hi Andrew,

Thank you again for your swift and helpful response, I am immensely grateful! I will definitely try these out but I can already see what you mean by it being difficult to interpret…

Fingers crossed, and thanks again!


If we want an interaction term between 2 factors that each have multiple levels, do we include each of the many variables that are created by model.matrix()?

Hi Matt,

Sorry I missed this earlier, but yes, you would include each of the variables that were created by model.matrix().
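To make that concrete, here is a small sketch with two made-up three-level factors (site and diet are hypothetical names). With the default treatment coding, model.matrix() produces (3 - 1) x (3 - 1) = 4 interaction columns, and all of them together represent the interaction.

```r
# Hypothetical toy factors, one row per site/diet combination.
toy <- data.frame(
  site = factor(rep(c("A", "B", "C"), times = 3)),
  diet = factor(rep(c("low", "mid", "high"), each = 3),
                levels = c("low", "mid", "high"))
)

mm <- model.matrix(~ site * diet, data = toy)

# Columns whose names contain ':' are the interaction terms.
int_cols <- grep(":", colnames(mm), value = TRUE)
print(int_cols)
```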

thanks,
Jacob