Question about 'baselines' for effects in MaAsLin3

Hello

Apologies for a potentially novice question.

I am using MaAsLin3 with 16S metagenomic data from cattle. I have found that, alongside my variable of interest (levels A, B, C), I have three other variables that are significant confounders:

  • Days the samples were taken (1, 2, 3, etc.). These are not repeated samples from the same animal, just the dates on which samples were collected from different sites.
  • Sites (farm A, farm B, farm C, etc.)
  • Sex (M/F)

My questions are:

A. Should I treat these confounders as fixed or random effects?

B. My variable of interest has a fixed ‘baseline’ (A = control), but the other variables do not (e.g. neither male nor female is a ‘baseline’). Is there a way to remove or discount their influence, and what should I put in the MaAsLin3 formula to do so?

Thank you in advance

Will

Hi Will,

A. If each site has at least 5 samples taken from it, you have no other random effects in the model, and you think there would be per-site similarity, I’d use random effects. If any of those aren’t the case, I’d use fixed effects because the random effects are going to be hard for the model to estimate well. Using fixed effects for this is strictly a more general option; you just lose some power if the random effects model is actually the “right” one. For sex, I’d use fixed effects. For days, it depends why you think the day matters to the metagenomic data. If there’s some seasonal effect or feeding schedule effect, you’re probably better off making a covariate out of that and using it. Simply including day in the model probably doesn’t make a lot of sense unless you expect a linear trend with time (or something like that).
B. As long as you don’t have interaction terms between your main variable and any of the other variables, it doesn’t matter what the baseline is. Without interaction terms, the model implies that the effect of your main variable on the outcome is the same regardless of the sample’s day/site/sex. Note that this isn’t saying the outcome itself is the same regardless of day/site/sex, just that the effect of your main variable on the outcome is the same.
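
To make point B concrete, here is a minimal Python sketch. It is not MaAsLin3 code, and the coefficient values and the `treated` indicator are made up purely for illustration. It shows that, in a model without interaction terms, switching the reference level of sex only moves the intercept and flips the sign of the sex coefficient; the coefficient of the main variable and every prediction stay the same.

```python
# Additive model on an arbitrary abundance scale:
#   abundance = intercept + b_treat * treated + b_sex * sex_indicator
# All numbers below are hypothetical and chosen only for illustration.

def predict(intercept, b_treat, b_sex, treated, sex_indicator):
    return intercept + b_treat * treated + b_sex * sex_indicator

# Parameterisation 1: female is the reference level (sex_indicator = 1 for males).
ref_female = {"intercept": 2.0, "b_treat": 1.5, "b_sex": -0.5}

# Parameterisation 2: male is the reference level (sex_indicator = 1 for females).
# Same model, re-expressed: the intercept absorbs the sex effect and the sex
# coefficient changes sign; the treatment coefficient is untouched.
ref_male = {
    "intercept": ref_female["intercept"] + ref_female["b_sex"],
    "b_treat": ref_female["b_treat"],
    "b_sex": -ref_female["b_sex"],
}

predictions = {}
for treated in (0, 1):
    for sex in ("F", "M"):
        p1 = predict(**ref_female, treated=treated, sex_indicator=int(sex == "M"))
        p2 = predict(**ref_male, treated=treated, sex_indicator=int(sex == "F"))
        predictions[(treated, sex)] = (p1, p2)
        print(treated, sex, p1, p2)
```

Both parameterisations give identical predictions for every combination of treatment and sex, which is why the choice of baseline for a nuisance variable does not affect the estimate for the variable of interest.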

Will

Hi Will (snap)

Thank you for such a comprehensive reply!

That makes a lot of sense. I have one other quick query, but otherwise I will try your suggestions. You mention interaction terms; I’m not quite sure what you mean by that. Do you mean I should not include those variables in the model formula? Or do you mean interaction in the sense of how those variables interact with the variable under investigation?

Kindest regards

Will

If you specify a model like a + b + a:b your outputs show the association with a, the association with b, and the association with the product a*b (this is the interaction term). If you think that e.g. the association between the outcome and a is different for different levels of b, your a:b term will tell you that. Otherwise, if you don’t include a:b, you’re implicitly assuming (and this is typically what people do) that the association between the outcome and a is the same regardless of what a sample’s value of b is.
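
As a rough illustration of what the a:b column is, here is a small Python sketch. It is not MaAsLin3 code; the two 0/1 indicators and all coefficient values are hypothetical. It builds the design-matrix rows for `a + b + a:b` and shows that the a:b coefficient is exactly how much the effect of a changes between the two levels of b.

```python
# Hypothetical samples: a is a 0/1 indicator (e.g. diseased), b is a 0/1
# indicator (e.g. farm B rather than farm A).
samples = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Design-matrix rows for `~ a + b + a:b`: intercept, a, b, and the product a*b.
design = [(1, a, b, a * b) for a, b in samples]

# Made-up coefficients, in the order: intercept, a, b, a:b.
coefs = (2.0, 1.0, 0.5, -0.75)

def fitted(row, coefs):
    """Linear predictor: sum of column value times coefficient."""
    return sum(x * c for x, c in zip(row, coefs))

values = {(a, b): fitted(row, coefs) for (a, b), row in zip(samples, design)}

# Effect of a within each level of b.
effect_a_when_b0 = values[(1, 0)] - values[(0, 0)]
effect_a_when_b1 = values[(1, 1)] - values[(0, 1)]
print(effect_a_when_b0, effect_a_when_b1)
```

With these made-up numbers the effect of a is 1.0 when b = 0 and 0.25 when b = 1; the gap between them is the a:b coefficient. Setting that coefficient to 0 (i.e. leaving a:b out) makes the two effects equal.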

Hi Nick

Ah, thank you for clarifying. So if I believe that factor B is affecting factor A (as a confounder), I would include the interaction term (A:B), and if I believe they are unrelated and want to see their correlated taxa independently, I would just use A + B?

So in my case (and again, apologies if this is completely wrong; I am very new to lme4-style formulas), my variable of interest is health and I want to see the taxa correlated with it, but farm location might also be affecting this:

‘~ Health + Farmlocation + Health:Farmlocation’

kindest regards

Will

Just to be more precise, you can have B be a confounder of the A → abundance relationship but still include it as A+B rather than A+B+A:B. In particular, you should use A:B if you think the effect of A on abundance is different for different levels of B.

In your case, if you had ~ Health + Farmlocation, you’d be saying that abundance varies with health and abundance also varies with Farmlocation, but how much abundance varies by health doesn’t depend on which farm you’re on. By contrast, if you thought the effect of health on abundance was different on the different farms, it would make sense to include the interaction term. For example, if some farms had healthier animals on average but health consistently determines abundance, you’d just use ~ Health + Farmlocation since Farmlocation is a classic confounder. However, if each farm was providing only their healthy animals a unique antibiotic, an interaction term would make more sense.

Will

Hi Will

Thank you again for clarifying. I just realised I might have misread your initial message. I believe my farm location variable does meet all the criteria for a random effect.

So would that be ‘~Health + (1|FarmLocation)’ rather than adding it with + or including the interaction term?

thank you for all your help!

Right - that model would say there’s an effect of health and then there are some baseline differences in abundance across the different farms.
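
A small Python sketch of what that random-intercept structure implies (again not MaAsLin3 code; the farm names, effect sizes, and spread are invented, and Health is assumed to be coded 0/1): each farm's baseline is shifted by a draw from one common distribution, while the health effect is shared across farms.

```python
import math
import random

random.seed(0)  # reproducible illustration

# Hypothetical values on an arbitrary abundance scale.
intercept = 2.0   # average baseline across farms
b_health = 1.5    # shared effect of Health = 1 versus Health = 0
farm_sd = 0.8     # spread of the farm-specific baselines

farms = ["farm_A", "farm_B", "farm_C", "farm_D", "farm_E"]
# (1 | FarmLocation): each farm gets its own baseline shift, drawn from one
# common normal distribution rather than treated as a separate fixed level.
farm_shift = {farm: random.gauss(0.0, farm_sd) for farm in farms}

def expected_abundance(health, farm):
    return intercept + b_health * health + farm_shift[farm]

# Difference between Health = 1 and Health = 0 within each farm.
health_gap = {
    farm: expected_abundance(1, farm) - expected_abundance(0, farm)
    for farm in farms
}
for farm in farms:
    print(farm, round(expected_abundance(0, farm), 3), round(health_gap[farm], 3))
```

The baselines differ from farm to farm, but the within-farm health difference is the same everywhere, which matches the description above.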

perfect thank you!

-Will