Hello
Apologies for a potentially novice question.
I am using Maaslin3 with 16S metagenomic data from cattle. I have found that, alongside my variable of interest (A, B, C), I have 3 other variables that are significant confounders:
- Days the samples were taken (1, 2, 3, etc.). These are not multiple samples taken from the same animal; they are just the dates the samples were collected from different sites.
- Sites (farm A, farm B, farm C, etc.)
- Sex (M/F)
My questions are:
A. Should I be considering these confounders as fixed or random effects?
B. My variable of interest has a fixed ‘baseline’ (A = control), but the other variables don’t (e.g. neither male nor female is a ‘baseline’). Is there a way to remove/discount their influence? What should I put in the formula for Maaslin3 to remove their effect?
Thank you in advance
Will
Hi Will,
A. If each site has at least 5 samples taken from it, you have no other random effects in the model, and you think there would be per-site similarity, I’d use random effects. If any of those aren’t the case, I’d use fixed effects because the random effects are going to be hard for the model to estimate well. Using fixed effects for this is strictly a more general option; you just lose some power if the random effects model is actually the “right” one. For sex, I’d use fixed effects. For days, it depends why you think the day matters to the metagenomic data. If there’s some seasonal effect or feeding schedule effect, you’re probably better off making a covariate out of that and using it. Simply including day in the model probably doesn’t make a lot of sense unless you expect a linear trend with time (or something like that).
B. As long as you don’t have interaction terms between your main variable and any of the others, it doesn’t matter what the baseline is. Without interaction terms, the model implies that the effect of your main variable on the outcome is the same regardless of which day/site/sex the sample has. Note this isn’t saying your outcome is the same regardless of day/site/sex, just that the effect of your main variable on the outcome is the same.
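To make the baseline point concrete, here is a minimal sketch in Python with NumPy. It is not Maaslin3 itself, just ordinary least squares on simulated data, and the variable names, sample size, and effect sizes are all made up for illustration. It fits the same additive model twice, once with F as the sex baseline and once with M, and compares the estimates for the main variable.

```python
import numpy as np

# Simulated data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
n = 120
group = rng.integers(0, 3, size=n)  # 0 = A (control), 1 = B, 2 = C
sex = rng.integers(0, 2, size=n)    # 0 = F, 1 = M
y = (1.0 + 0.5 * (group == 1) - 0.8 * (group == 2)
     + 0.3 * sex + rng.normal(0, 0.2, size=n))

def fit(sex_dummy):
    """Least-squares fit of y ~ group + sex with no interaction term."""
    X = np.column_stack(
        [np.ones(n), group == 1, group == 2, sex_dummy]
    ).astype(float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [intercept, B vs A, C vs A, sex]

coef_f_base = fit(sex == 1)  # F is the baseline, dummy marks M
coef_m_base = fit(sex == 0)  # M is the baseline, dummy marks F

group_effects_f = coef_f_base[1:3]
group_effects_m = coef_m_base[1:3]
print("B vs A, C vs A (F baseline):", group_effects_f)
print("B vs A, C vs A (M baseline):", group_effects_m)
print("sex coefficient:", coef_f_base[3], "vs", coef_m_base[3])
```

The two fits give the same estimates for B vs A and C vs A. Only the intercept and the sign of the sex coefficient change, which is why the choice of baseline for a covariate does not affect the result for the main variable when there are no interaction terms.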
Will
Hi Will (snap)
Thank you for such a comprehensive reply!
That makes a lot of sense. I have one other quick query, but otherwise I will try your suggestions. You mention interaction terms? I’m not quite sure what you mean by that. Do you mean I should not provide a formula in the model for those variables? Or do you mean interaction as in how those variables interact with the investigated one?
Kindest regards
Will
If you specify a model like a + b + a:b your outputs show the association with a, the association with b, and the association with the product a*b (this is the interaction term). If you think that e.g. the association between the outcome and a is different for different levels of b, your a:b term will tell you that. Otherwise, if you don’t include a:b, you’re implicitly assuming (and this is typically what people do) that the association between the outcome and a is the same regardless of what a sample’s value of b is.
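As a rough illustration of what the a:b term does, here is a small sketch in Python with NumPy. Again this is plain least squares rather than Maaslin3, and the toy data and coefficient values are assumptions chosen so the numbers are easy to follow. The interaction term is simply an extra column equal to the product of a and b.

```python
import numpy as np

# Balanced toy design: every combination of a and b appears 5 times
a = np.array([0, 0, 1, 1] * 5, dtype=float)
b = np.array([0, 1, 0, 1] * 5, dtype=float)

# Noise-free outcome (hypothetical): the effect of a is 1 when b = 0
# and 1 + 3 = 4 when b = 1
y = 2.0 + 1.0 * a + 0.5 * b + 3.0 * a * b

# Model a + b + a:b, where the a:b column is the product a * b
X_inter = np.column_stack([np.ones_like(a), a, b, a * b])
coef_inter, *_ = np.linalg.lstsq(X_inter, y, rcond=None)

# Model a + b, which assumes one effect of a for every value of b
X_add = np.column_stack([np.ones_like(a), a, b])
coef_add, *_ = np.linalg.lstsq(X_add, y, rcond=None)

print("with a:b    [intercept, a, b, a:b]:", coef_inter)
print("without a:b [intercept, a, b]:     ", coef_add)
```

With the a:b column, the fit recovers the separate pieces: the coefficient on a is its effect when b = 0, and the a:b coefficient is how much that effect changes when b = 1. Without it, the model reports a single effect for a that sits between the two.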