I’m attempting to use MaAsLin 2 for a longitudinal analysis of the gut (samples from different locations, from the duodenum to the descending colon). I have samples from multiple animals. I’d like to identify microbes that are strongly associated with the different sites of the GI tract and how they relate to levels of metabolites of interest at each location. For the longitudinal information (site), I have it in both categorical (duodenum, jejunum…) and numerical (meters from the beginning of the GI tract) form.
- Should I use “site” for my fixed_effects? Is there a difference between using categorical and numerical values? I’m assuming I need to analyze one site at a time (reference = c("site,duodenum"))? Is this the correct way to do it, or is there a better way to do it all at once (e.g., using the numerical site values)? I ask because I have a lot of sites and metabolites.
fit_data_random = Maaslin2(input_data = df_input_data,
                           input_metadata = df_input_metadata,
                           min_prevalence = 0,
                           normalization = "NONE",
                           output = "demo_output_random",
                           fixed_effects = c("site", "concentration_of_metabolite_1"),
                           random_effects = c("subject"),
                           reference = c("site,duodenum"))
- How should I set up the random effects? I am assuming the random effects should be set to “subject”. I am a bit confused because in the tutorial you mentioned “If you are interested in testing the effect of time in a longitudinal study, then the time point variable should be included in fixed_effects during your MaAsLin 2 call.” If that’s the case since I am interested in the effect of the “site” (which is the longitudinal information), should I also put “site” in random effects?
- It depends on how you would like to compare your final results. When using a categorical variable, as you suggested, you need to set a reference. That way you can look at how microbes differ between sites relative to the reference level (i.e., which microbes are significantly associated with each site compared to your reference). If you would like to get at whether there is an effect of how far along the gut a sample was taken, it might make sense to measure that in numeric units (e.g., 10 cm, 20 cm, etc.).
You may also be interested in looking into ordered monotonic predictors, although at this time MaAsLin 2 does not support this type of input by default.
- In your case, you most likely want a random intercept for each subject. In the tutorial we were referring to the case where you have repeated sampling: there you may be interested in including time as a fixed effect to see how microbes change in association with time.
Hope that helps
Thank you for your quick response!
That’s right, I am interested in how microbes change with distance along the gut. However, instead of discrete comparisons, I want to find a way to do a continuous analysis based on distance. Just to make sure I understand you correctly, by ordered monotonic predictors do you mean a list of increasing integer values, such as the distance data I have (e.g., 10, 20, 30,… in cm)? And MaAsLin 2 does not support this type of input data? If that’s the case, could you recommend any other bioBakery tools or other algorithms that might be useful for my purpose?
I apologize if I have too many questions, I am very new to this. Thank you!
Ordered monotonic predictors are categorical variables whose order is retained during modeling, so the ordering relationship between their levels is preserved.
I think in your case, if you want to compare microbial abundance as a function of distance, I would suggest coding distance numerically in the units that you measured and including it as a fixed effect. This type of analysis is supported by MaAsLin 2.
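As a rough sketch of that suggestion, the call from the original question could be adapted along these lines. The column name `distance_cm` and the output folder name are hypothetical, and `df_input_data` and `df_input_metadata` are assumed to be the same data frames used in the question, so adjust them to match your own tables.

```r
library(Maaslin2)

# Assumption: the metadata has a column `distance_cm` (hypothetical name)
# holding the distance from the start of the GI tract for each sample.
# Coerce it to numeric so it is modeled as a continuous variable
# rather than as a set of categories.
df_input_metadata$distance_cm <-
  as.numeric(as.character(df_input_metadata$distance_cm))

fit_data_distance <- Maaslin2(
  input_data     = df_input_data,
  input_metadata = df_input_metadata,
  min_prevalence = 0,
  normalization  = "NONE",
  output         = "demo_output_distance",  # hypothetical folder name
  fixed_effects  = c("distance_cm", "concentration_of_metabolite_1"),
  random_effects = c("subject")             # random intercept per animal
)
```

Because distance is numeric here, no `reference` argument is needed, and each microbe gets a single coefficient for distance instead of one per site. As far as I know, continuous metadata are z-scored by default (`standardize = TRUE`), so check that setting if you want the coefficient expressed per measured unit.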