Hello, and thank you for developing MaAsLin 3—it’s an incredibly valuable tool for our work!
I have three questions regarding its usage. We are analyzing microbiome data obtained via 16S amplicon sequencing. Our study includes two groups: 30 participants who received a placebo formulation and 28 participants who received the active formulation. Microbiome samples were collected at baseline (Week 0) and after four weeks (Week 4). To reflect this design, we prepared two categorical variables: Formula = c(Ctrl, Active) and Week = c(0w, 4w).
- In the feature table, several taxa could not be assigned and are labeled as “unidentified.” In some samples, these “unidentified” features account for roughly 30% of total reads. Should these unidentified features be retained when inputting data into MaAsLin 3, or is it more appropriate to remove them beforehand?
- Our goal is to evaluate whether the temporal changes in bacterial abundance observed in the Active group differ from those in the Control group. Would the following model specification be appropriate for this purpose?
formula = "~ Formula*Week + Reads + (1|Participant_ID)"
My concern is that the main effect of Formula under this specification may implicitly compare {(Ctrl, 0w) + (Ctrl, 4w)} with {(Active, 0w) + (Active, 4w)}, which may not reflect the specific contrast of interest. What model structure would be most appropriate for analyzing differential week-to-week changes between groups?
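To make the contrast in question concrete, here is a sketch of how the fixed-effect terms would read under R's default treatment (dummy) coding with Ctrl and 0w as the reference levels. That coding is an assumption on my part; I have not confirmed how MaAsLin 3 codes these factors internally. The symbols are illustrative: y is the modeled abundance of one feature, Reads is held fixed, and the participant random intercept is omitted.

```latex
\begin{aligned}
\mathbb{E}[y \mid \text{Ctrl},\ \text{0w}]   &= \beta_0 \\
\mathbb{E}[y \mid \text{Active},\ \text{0w}] &= \beta_0 + \beta_{\text{Formula}} \\
\mathbb{E}[y \mid \text{Ctrl},\ \text{4w}]   &= \beta_0 + \beta_{\text{Week}} \\
\mathbb{E}[y \mid \text{Active},\ \text{4w}] &= \beta_0 + \beta_{\text{Formula}} + \beta_{\text{Week}} + \beta_{\text{Formula:Week}} \\[4pt]
\beta_{\text{Formula:Week}} &=
  \bigl(\mathbb{E}[y \mid \text{Active},\ \text{4w}] - \mathbb{E}[y \mid \text{Active},\ \text{0w}]\bigr)
  - \bigl(\mathbb{E}[y \mid \text{Ctrl},\ \text{4w}] - \mathbb{E}[y \mid \text{Ctrl},\ \text{0w}]\bigr)
\end{aligned}
```

If this coding applies, the Formula main effect would be the Active vs. Ctrl difference at baseline (0w) rather than a pooled comparison across both weeks, and the Formula:Week interaction would be the difference in week-to-week change between groups. I would appreciate confirmation of whether this reading is correct for MaAsLin 3.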
- We have a total of 58 participants, but approximately 100 potential covariates. I assume it is necessary to restrict the number of fixed effects to around six or fewer. If this is the case, would it be reasonable to always include key covariates such as Formula, Week, and Reads, and then run multiple models while rotating the remaining covariate candidates?
Please let me know if any part of my description requires clarification. Thank you very much for your time and assistance!
Best regards,
Sho