I’m working on a microbiome study aiming to identify microbial taxa significantly associated with clinical outcomes (success vs. failure). My dataset includes three timepoints, three treatment methods, other covariates, and repeated measures across individual patients. Given the repeated measures and covariates in my dataset, is MaAsLin2 appropriate for identifying differentially abundant microbes across timepoints and treatment methods?
If so, how should I define the formula in MaAsLin2 to account for the following?
Fixed effects: outcome (success/failure), timepoint, treatment method, and other covariates
Random effect: repeated measures per PatientID
I would appreciate it if anyone could share an example formula setup or best practices for configuring MaAsLin2 in this kind of scenario.
There’s a fair amount of existing discussion about this type of design (including a lot of older threads), but in short:
If you actually care about differences over time, the model you’ve specified will give one effect per time point, which assumes the effect of time is the same for all treatment groups. If you want different time effects per treatment group, add a time point by treatment interaction term (though you might need a lot of data for this to fit well).
If you don’t care about differences over time because nothing is changing consistently over time for these patients, leave time point out of the model.
Also, MaAsLin 3 is now available alongside MaAsLin 2; it helps distinguish between prevalence and abundance effects and adds many ease-of-use improvements.
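To make the additive versus interaction point concrete, here is a small sketch in plain Python (this is not MaAsLin code). All coefficient values and the names `additive`, `interaction`, and `change_from_baseline` are hypothetical and chosen only for illustration. It shows that without an interaction term the estimated change between time points is identical in every treatment group, while the interaction model lets that change differ by group.

```python
# Hypothetical coefficients for one taxon's transformed abundance.
# None of these numbers come from a real model fit.
INTERCEPT = 2.0
TREATMENT = {"A": 0.0, "B": 0.5, "C": -0.3}   # "A" is the reference level
TIME = {"T1": 0.0, "T2": 0.4, "T3": 0.9}      # "T1" is the reference level
# Extra terms that exist only in the interaction model;
# combinations involving a reference level are implicitly zero.
TREATMENT_BY_TIME = {
    ("B", "T2"): 0.2, ("B", "T3"): -0.6,
    ("C", "T2"): 0.0, ("C", "T3"): 0.7,
}


def additive(treatment, time):
    """Prediction from a model of the form: abundance ~ treatment + time."""
    return INTERCEPT + TREATMENT[treatment] + TIME[time]


def interaction(treatment, time):
    """Prediction from a model of the form: abundance ~ treatment * time."""
    return additive(treatment, time) + TREATMENT_BY_TIME.get((treatment, time), 0.0)


def change_from_baseline(model, time="T3"):
    """Change from T1 to `time` within each treatment group."""
    return {t: round(model(t, time) - model(t, "T1"), 6) for t in TREATMENT}


additive_change = change_from_baseline(additive)
interaction_change = change_from_baseline(interaction)
print("additive:   ", additive_change)
print("interaction:", interaction_change)
```

The interaction model here needs four extra coefficients for a 3 x 3 design, which is one way to see why it tends to need more data to fit well.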
I’m a Master’s student in Epidemiology and would greatly appreciate your guidance on a few questions regarding my analysis.
My project is a longitudinal metagenomics study with an intervention group (n=130) and a control group (n=50). Samples were collected at two time points: baseline and at the end of follow-up.
My primary goal is to use MaAsLin2 to identify taxa that are associated with the intervention, specifically by testing for a significant time * group interaction. However, after running the analysis, none of the taxa have a significant FDR-adjusted q-value. I suspect this might be due to my relatively small sample size and sparse data.
My questions are:
Do you have any suggestions on how to improve this situation? Are there any specific filtering strategies, normalization methods, or model adjustments that might help increase statistical power?
Separately, I also want to find microbes that are associated with my clinical outcomes. Is it a valid approach to use MaAsLin2 for a model structured like this: clinical_outcome ~ microbe * time + covariates? Or would a different tool be more appropriate for this type of model where the microbe is the predictor?
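On the filtering question, it may help to see how a Benjamini-Hochberg (BH) correction scales with the number of taxa tested. The sketch below is plain Python, not MaAsLin code, and every p-value in it is made up for illustration. It assumes, optimistically, that a stricter prevalence filter removes only uninformative taxa. The function name `benjamini_hochberg` and the variables are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values: three taxa with some signal, plus many null-like taxa.
signal = [0.001, 0.004, 0.012]
noise_all = [0.2 + 0.8 * i / 499 for i in range(500)]  # 500 sparse, uninformative taxa
noise_kept = noise_all[:50]                             # what a stricter filter might leave

q_unfiltered = benjamini_hochberg(signal + noise_all)[:3]
q_filtered = benjamini_hochberg(signal + noise_kept)[:3]
print("q-values with 503 taxa tested:", q_unfiltered)
print("q-values with 53 taxa tested: ", q_filtered)
```

The same three p-values get smaller q-values when fewer taxa are tested. This only helps if the filter is chosen before looking at the results and does not depend on the outcome; tuning the threshold until something becomes significant inflates false positives.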