I am trying to create a nested random effect in maaslin3. I have looked up how to do this in lme4. I have tried the following in my formula:
(1|donor:patient)
which yields: Error in [.data.frame(metadata, , names(1)) :
undefined columns selected
(1|donor/patient)
which yields: Error in donor/patient : non-numeric argument to binary operator
I am putting these in my formula. donor and patient are columns in my metadata table.
I am able to get the model to run when I put it as (1|donor) + (1|patient) + and other fixed effects. Per lme4 this is technically a way to write a crossed random effect (if each patient may have not gotten the same donor). In my study each patient got the same donor.
Is there a way to make this nested in maaslin3?
Thank you for your help! Maaslin3 is awesome and I love the graphical output!
Hi,
There are a number of pre-processing steps that happen internally in order to check the metadata and make sure the results table displays the right information, so thatâs probably whatâs throwing the error with your first two formulas.
The final formula (crossed effects) is what we used in the MaAsLin 2 paper even though that was actually a nested case (subjects within a recruitment center), so if you want to use that, there is some precedent.
Iâll note that, unless you have an enormous amount of data, adding in multiple random effects (especially if theyâre nested) is probably going to cause quite a few poor model fits and not improve your validity much. If you look at the math under the hood with random effects, theyâre the same as per-individual fixed effects with a little bit of shrinkage towards a normal distribution. You might think that adding in a second layer would control for differences between groups of individuals, but if the individuals are completely nested, those effects would be naturally reabsorbed in the individual effect without the second layer. (Imagine you were using fixed effects rather than random effects throughout; these second layer intercepts would be unidentifiable/linearly dependent with the individual effects.) To me, assuming that (1) each individual has a specific effect and a group-specific effect such that both of these follow a normal distribution seems no more likely than (2) each individual has a specific effect (that already sums both the individual and supposed group effect) and that these effects follow a normal distribution.
In short, Iâd use the most nested level (probably patient in your case) as the only random intercept.
Will
Dear @WillNickols ,
Thank you for the very good explanation and quick response. I understand your explanation, and I will give it a try with just the âpatientâ as a random intercept.
One more quick question, in the figure maaslin3 generates (summary.plot), how does it decide which fixed effects are the two to choose to display? Are they the two fixed effects that are the most significantly associated? Thank you for your continued help with troubleshooting.
Yes - they are chosen based on which variables have the most significant associations. You can change them with coef_plot_vars
.
Will
1 Like
Oh, amazing! Thank you for that trick!
Hi @WillNickols
I am trying to use coef_plot_vars to specify which two fixed effects I want to include in my plot (with abundance and prevalence). This is my structure for the command: coef_plot_vars = c(âvariable1 yesâ, âvariable2 yesâ)
Does this go in the code with the forumul under âfit_out = âŚâ or is this run only under this command: maaslin_plot_results_from_output(âfoldernameâ,
coef_plot_vars = c(âvariable1 yesâ, âvariable2 yesâ),
metadata = metadata)?
Iâve tried getting it to work both ways, but when I get the maaslin_plot_results_from_output to run, I see a bunch of graphs generated in R, but no summary.plot. Any thoughts on how to structure this correctly? Iâd also like to specify which fixed effects end up under the heatmap for heatmap_vars in the summary.plot figure. If you have any code available that youâve used before so I can see how to structure my code, that would be extremely helpful and valuable! Thank you!
Does the example in the Replotting section of the vignette have what youâre looking for? The coef_plot_vars
and heatmap_vars
can go in either maaslin3
or maaslin_plot_results_from_output
.
Will
Dear @WillNickols
Thank you for pointing me towards that resource. I tried to model my code similarly, but I am unable to get it to plot the summary_plot when I specify heatmap_vars and coef_plot vars. When I specify these, I do get the association_plots but not the summary_plot.
Here is my code:
fit_out â maaslin3(
input_data = species,
input_metadata = metadata,
output = ânew_outputâ,
formula = â~ variable1 + variable2 + variable3 + variable4 + variable5 + variable6 + variable7 + variable8 + readsâ,
plot_summary_plot = TRUE,
plot_associations = TRUE,
heatmap_vars = c(âvariable1â,âvariable2â,âvariable4â,âvariable6â,âvariable7â,âvariable8â),
coef_plot_vars = c(âvariable1 yesâ,âvariable2 yesâ),
max_pngs = 30)
Iâve tried blocking out either the heatmap_vars and coef_plot_vars one at a time. The above only generates the summary_plot when both the heatmap_vars and coef_plot_vars are simultaneously blocked out.
Any suggestions?
Iâve also tried running it like this:
scatter_plots â maaslin_plot_results_from_output(
âŚ
Are all your variables continuous/binary variables? Based on the coef_plot_vars
, it looks like they are categorial with levels no
and yes
, so youâd need to make heatmap_vars
match the coef_plot_vars
style you have.
Are at least some of your associations significant? If not, it might be that there are no significant results to plot.
If neither of these solve it, could you post a minimal example (taxonomic table, metadata, and maaslin3
command) or send me an email at willnickols@g.harvard.edu? You can change the metadata and taxa names if youâd prefer to anonymize it.
Will