Hello
Apologies for a potentially novice question.
I am using MaAsLin3 with 16S metagenomic data from cattle. I have found that, alongside my variable of interest (A, B, C), I have 3 other variables that are significant confounders:
- days the samples were taken (1, 2, 3, etc.). These are not multiple samples taken from the same animal; they are just the dates the samples were collected from different sites
- sites (farm A, farm B, farm C, etc.)
- sex (M/F)
My questions are:
A. Should I be considering these confounders as fixed or random effects?
B. Unlike my variable of interest, which has a fixed "baseline" (A = control), is there a way to remove/discount the influence of the other variables? As they don't have a fixed "baseline" (e.g. neither male nor female is a "baseline"), what should I put in the formula for MaAsLin3 to remove their effect?
Thank you in advance
Will
Hi Will,
A. If each site has at least 5 samples taken from it, you have no other random effects in the model, and you think there would be per-site similarity, I'd use random effects. If any of those aren't the case, I'd use fixed effects because the random effects are going to be hard for the model to estimate well. Using fixed effects for this is strictly a more general option; you just lose some power if the random effects model is actually the "right" one. For sex, I'd use fixed effects. For days, it depends why you think the day matters to the metagenomic data. If there's some seasonal effect or feeding schedule effect, you're probably better off making a covariate out of that and using it. Simply including day in the model probably doesn't make a lot of sense unless you expect a linear trend with time (or something like that).
B. As long as you don't have interaction terms between your main variable and all the others, it doesn't matter what the baseline is. Without interaction terms, the model implies that the effect of your main variable on the outcome is the same regardless of what day/site/sex the sample has. Note this isn't saying your outcome is the same regardless of day/site/sex, just that the effect of your main variable on the outcome is the same.
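To illustrate the baseline point in B, here is a minimal Python sketch (plain least squares, not MaAsLin3 code) that fits an additive model twice, once with F as the reference level for sex and once with M. The data values, the `ols` helper, and the variable names are all made up for illustration. The health coefficient should come out the same either way; only the intercept and the sign of the sex coefficient change.

```python
# Toy illustration: in an additive model (no interaction terms), the choice of
# reference level for a covariate does not change the main variable's coefficient.
# All numbers are invented; this is plain least squares, not MaAsLin3.

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]

# (health, sex, abundance): health is 0 = control, 1 = treated
samples = [
    (0, "F", 1.0), (0, "F", 1.2), (0, "M", 2.1), (0, "M", 1.9),
    (1, "F", 2.0), (1, "F", 2.3), (1, "M", 3.2), (1, "M", 2.9),
]
y = [s[2] for s in samples]

def design(reference_sex):
    # Columns: intercept, health, indicator for the non-reference sex
    return [[1.0, float(h), 0.0 if sex == reference_sex else 1.0]
            for h, sex, _ in samples]

coef_ref_f = ols(design("F"), y)  # F is the baseline
coef_ref_m = ols(design("M"), y)  # M is the baseline

print("reference F:", [round(b, 3) for b in coef_ref_f])
print("reference M:", [round(b, 3) for b in coef_ref_m])
```

The second entry of each coefficient list is the health effect, and it is the one to compare across the two fits.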
Will
Hi Will (snap)
Thank you for such a comprehensive reply!
That makes a lot of sense, thank you. I have one other quick query, but otherwise I will try your suggestions. You mention interaction terms? I'm not quite sure what you mean by that. Do you mean I should not provide a formula in the model for those variables? Or do you mean interaction as in how those variables interact with the investigated one?
Kindest regards
Will
If you specify a model like a + b + a:b your outputs show the association with a, the association with b, and the association with the product a*b (this is the interaction term). If you think that e.g. the association between the outcome and a is different for different levels of b, your a:b term will tell you that. Otherwise, if you don't include a:b, you're implicitly assuming (and this is typically what people do) that the association between the outcome and a is the same regardless of what a sample's value of b is.
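As a small worked example of what the a:b term captures, here is a Python sketch with two binary variables and invented cell means (none of this is MaAsLin3 output). With 0/1 coding, the a:b coefficient is how much the effect of a changes when b goes from 0 to 1.

```python
# Toy illustration of an interaction term with two binary variables a and b.
# With the model a + b + a:b and 0/1 coding, the four coefficients reproduce
# the four cell means exactly. The cell means below are invented.

# Mean outcome for each (a, b) combination
cell_mean = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 1.5, (1, 1): 4.0}

coef = {
    "intercept": cell_mean[(0, 0)],
    "a": cell_mean[(1, 0)] - cell_mean[(0, 0)],  # effect of a when b = 0
    "b": cell_mean[(0, 1)] - cell_mean[(0, 0)],  # effect of b when a = 0
}
# a:b is the extra effect of a that only appears when b = 1
coef["a:b"] = (cell_mean[(1, 1)] - cell_mean[(0, 1)]) - coef["a"]

def predict(a, b):
    # The interaction column is literally the product a * b
    return coef["intercept"] + coef["a"] * a + coef["b"] * b + coef["a:b"] * a * b

effect_of_a_at_b0 = predict(1, 0) - predict(0, 0)
effect_of_a_at_b1 = predict(1, 1) - predict(0, 1)

print(coef)
print("effect of a when b = 0:", effect_of_a_at_b0)
print("effect of a when b = 1:", effect_of_a_at_b1)
```

If the a:b coefficient were zero, the two printed effects would be equal, which is exactly the assumption made when the term is left out of the formula.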
Hi Nick
Ah, thank you for clarifying. So if I believe that factor B is impacting factor A (as a confounder), I would include the interaction term (A:B), and if I believe they are unrelated and want to see their correlated taxa independently, I would just use A + B?
So in my case (and again, apologies if this is completely wrong, I am very new to lme4-style formulas), as my investigative variable is health and I want to see the taxa correlated with it (but farm location might also be affecting this):
"~ Health + Farmlocation + Health:Farmlocation"
kindest regards
Will
Just to be more precise, you can have B be a confounder of the A → abundance relationship but still include it as A+B rather than A+B+A:B. In particular, you should use A:B if you think the effect of A on abundance is different for different levels of B.
In your case, if you had ~ Health + Farmlocation, you'd be saying that abundance varies with health and abundance also varies with Farmlocation, but how much abundance varies by health doesn't depend on which farm you're on. By contrast, if you thought the effect of health on abundance was different on the different farms, it would make sense to include the interaction term. For example, if some farms had healthier animals on average but health consistently determines abundance, you'd just use ~ Health + Farmlocation since Farmlocation is a classic confounder. However, if each farm was providing only their healthy animals a unique antibiotic, an interaction term would make more sense.
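The "classic confounder" case can be sketched with a toy Python example. The numbers are invented and noise-free: the health effect is the same on both farms, but the farms differ in baseline abundance and in how many healthy animals they have. Ignoring farm gives a distorted health difference, while comparing within each farm recovers the common effect.

```python
# Toy illustration of farm as a confounder with NO interaction: the health
# effect is the same (+1.0) on both farms, but the farms differ in baseline
# abundance and in their mix of healthy and sick animals. Numbers are invented.

# (farm, health, abundance)
samples = (
    [("A", "healthy", 1.0)] * 4 + [("A", "sick", 2.0)] * 1
    + [("B", "healthy", 3.0)] * 1 + [("B", "sick", 4.0)] * 4
)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def health_difference(rows):
    # Mean abundance of sick animals minus mean abundance of healthy animals
    sick = mean(y for _, h, y in rows if h == "sick")
    healthy = mean(y for _, h, y in rows if h == "healthy")
    return sick - healthy

# Ignoring farm: mixes the health effect with the farm baseline difference
naive = health_difference(samples)

# Comparing within each farm, which is what adding Farmlocation to the
# formula amounts to in this noise-free example
within = {farm: health_difference([r for r in samples if r[0] == farm])
          for farm in ("A", "B")}

print("ignoring farm:", round(naive, 3))
print("within farm:", within)
```

Because the within-farm differences agree with each other here, there is nothing for a Health:Farmlocation term to pick up; it would only matter if those differences diverged between farms.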
Will
Hi Will
Thank you again for clarifying. I just realised I might've misread your initial message. I believe my farm location variable does meet all the criteria for a random effect.
So would that be "~ Health + (1|FarmLocation)" rather than using + or the interaction term?
thank you for all your help!
Right - that model would say there's an effect of health and then there are some baseline differences in abundance across the different farms.
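Written out, a random-intercept model of this form looks roughly like the following for the abundance part of the fit. This is a sketch in illustrative notation, not taken from the MaAsLin3 documentation: $y_{ij}$ stands for the (transformed) abundance of a feature in sample $i$ from farm $j$, and Health is treated as a single 0/1 indicator for simplicity.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{Health}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $\beta_1$ is the health effect shared by all farms, and $u_j$ is the baseline shift for farm $j$, drawn from a common distribution rather than estimated as a separate fixed coefficient per farm.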