I’m attempting to use MaAsLin 2 for a longitudinal analysis of the gut (samples from different locations, from the duodenum to the descending colon). I have samples from multiple animals. I’d like to identify microbes that are strongly associated with the different sites of the GI tract and how they relate to the levels of metabolites of interest at each location. For the longitudinal information (site), I have it in both categorical (duodenum, jejunum…) and numerical (meters from the beginning of the GI tract) form.
- Should I use “site” in fixed_effects? Is there a difference between using categorical and numerical values? I’m assuming I need to analyze one site at a time (reference = c("site,duodenum"))? Is this the correct way to do it, or is there a better way to do it all at the same time (e.g., using the numerical site values)? I ask because I have a lot of sites and metabolites.
fit_data_random = Maaslin2(
    input_data = df_input_data,
    input_metadata = df_input_metadata,
    min_prevalence = 0,
    normalization = "NONE",
    output = "demo_output_random",
    fixed_effects = c("site", "concentration_of_metabolite_1"),
    random_effects = c("subject"),
    reference = c("site,duodenum"))
- How should I set up the random effects? I am assuming the random effects should be set to “subject”. I am a bit confused because in the tutorial you mentioned “If you are interested in testing the effect of time in a longitudinal study, then the time point variable should be included in fixed_effects during your MaAsLin 2 call.” If that’s the case since I am interested in the effect of the “site” (which is the longitudinal information), should I also put “site” in random effects?
- It depends on how you would like to compare your final results. When using a categorical variable, as you suggested, you need to set a reference. That way you can look at how microbes differ between sites relative to the reference level (i.e., which microbes are significantly associated with each site as compared to your reference). If you would like to test whether there is an effect of how far along the colon you are, it might make sense to measure that in numeric units (e.g., 10 cm, 20 cm, etc.).
You may also be interested in looking into ordered monotonic predictors, although at this time MaAsLin 2 does not support this type of input by default.
- In your case, you would most likely want a random intercept for each subject. In the tutorial we were referring to the case where you have repeated sampling and may want to include time as a fixed effect to see how microbes change in association with time.
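To make the categorical-versus-numeric distinction concrete, here is a minimal base-R sketch of how the two codings enter a linear model. The site names beyond duodenum and jejunum, the column name `distance_cm`, and all values are made up for illustration; this uses `model.matrix` from base R, not MaAsLin 2 itself.

```r
# Toy metadata; the values are hypothetical and only illustrate the coding.
meta <- data.frame(
  site = factor(c("duodenum", "jejunum", "ileum", "colon"),
                levels = c("duodenum", "jejunum", "ileum", "colon")),
  distance_cm = c(10, 120, 300, 450)
)

# Categorical coding: the first level (duodenum) is the reference, and every
# other site gets its own indicator column, i.e. its own coefficient.
X_cat <- model.matrix(~ site, data = meta)
print(colnames(X_cat))

# Numeric coding: a single column, i.e. one slope describing the trend
# along the gut.
X_num <- model.matrix(~ distance_cm, data = meta)
print(colnames(X_num))
```

With the categorical coding you get one comparison against the reference per site; with the numeric coding you get one trend coefficient per feature, which is easier to manage when there are many sites.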
Hope that helps
Thank you for your quick response!
That’s right, I am interested in the effect on microbes based on how far along the colon. However, instead of discrete comparison, I want to find a way to do continuous analysis based on distance. Just to make sure I understand you correctly, by ordered monotonic predictors you mean a list of increasing integer values, such as the distance data I have (e.g., 10, 20, 30,… unit cm)? And MaAsLin 2 does not support this type of input data? If that’s the case, could you recommend any other bioBakery tools or other algorithms that might be useful for my purpose?
I apologize if I have too many questions; I am very new to this. Thank you!
Ordered monotonic predictors are categorical variables whose levels retain their order during modeling, so the ordering relationship between levels is preserved.
I think in your case, if you want to compare microbial abundance as a function of distance, I would suggest coding distance numerically in the units that you measured and including it as a fixed effect. This type of analysis is supported by MaAsLin 2.
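As a rough sketch of that suggestion, the call from the first post could be adapted as below. The column name `distance_cm`, the wrapper function `run_distance_model`, and the output folder name are assumptions for illustration; the other names are taken from the original call. Calling the function requires the Maaslin2 package and your own data frames.

```r
# Sketch only: wraps a MaAsLin 2 call that uses numeric distance instead of
# the categorical site. `distance_cm` is a hypothetical metadata column.
run_distance_model <- function(df_input_data, df_input_metadata,
                               output_dir = "demo_output_distance") {
  # Make sure distance is numeric so it is not treated as a category.
  df_input_metadata$distance_cm <- as.numeric(df_input_metadata$distance_cm)

  Maaslin2::Maaslin2(
    input_data = df_input_data,
    input_metadata = df_input_metadata,
    output = output_dir,
    # Numeric distance replaces the categorical site; no reference is needed.
    fixed_effects = c("distance_cm", "concentration_of_metabolite_1"),
    # Random intercept per animal for the repeated sampling along the gut.
    random_effects = c("subject")
  )
}
```

Each feature then gets a single coefficient for distance rather than one comparison per site.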
I have a question along similar lines. I have 2 groups followed over time in my study. As mentioned in the tutorial, I used both Group and Timepoint as reference, with Timepoint being numeric and not categorical.
My formula is as follows:
fit_data = Maaslin2(
    input_data = gen_maas,
    input_metadata = meta,
    output = "maaslin_groups_run2",
    normalization = "NONE",
    min_prevalence = 0.1,
    min_abundance = 0.001,
    transform = "NONE",
    correction = "BH",
    analysis_method = "LM",
    plot_heatmap = TRUE,
    reference = c("Timepoint,1", "Group,Paper"),
    fixed_effects = c("Group", "Timepoint"),
    random_effects = c("Subject"))
I tried running this both with and without a reference for Group, and I get the same results in both cases.
The significant results file has results for 2 metadata variables (Group and Timepoint).
My main question is:
- The timepoint plots give an increasing or decreasing trend, but they do not indicate for which group it is increasing or decreasing. Does this mean that all results are with respect to one group only? Say, here, does it show an increasing/decreasing trend of taxa for my Paper group?
- The group comparison shows everything with respect to one group, say Paper when it is used as the reference, or the first group in alphabetical order if no reference is given. So in this case, what can we say about the taxa differentially associated with group2?
Any help would be appreciated
Thank you for your response. So you mean I should make a fixed-effect column with values such as group1_time1, group1_time2, etc., and use that? I tried this, and even so it doesn’t help me understand the trend of a taxon within a group.
My aim is to 1. understand the differentially abundant taxa between the two groups and 2. check for the increase or decrease of taxa in a group over time.
I will try the pairwise testing you suggested.
Thank you very much, that helps.
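For the second aim stated above (increase or decrease of taxa within a group over time), one possible approach is to fit the time trend separately in each group. This is a sketch under assumptions, not necessarily what was suggested in the thread: the helper name `run_time_by_group` and the output prefix are hypothetical, samples are assumed to be rows with matching row names in both tables, and the column names Group, Timepoint and Subject are taken from the call above. Calling it requires the Maaslin2 package.

```r
# Sketch only: one MaAsLin 2 run per group, with Timepoint as the only
# fixed effect, so each run reports the time trend within that group.
run_time_by_group <- function(features, metadata,
                              out_prefix = "maaslin_time") {
  # Assumes samples are rows in both tables, with matching row names.
  groups <- unique(na.omit(as.character(metadata$Group)))

  fits <- lapply(groups, function(g) {
    keep <- rownames(metadata)[which(metadata$Group == g)]
    Maaslin2::Maaslin2(
      input_data = features[keep, , drop = FALSE],
      input_metadata = metadata[keep, , drop = FALSE],
      output = paste0(out_prefix, "_", g),
      normalization = "NONE",  # mirrors the settings in the call above
      transform = "NONE",
      fixed_effects = c("Timepoint"),
      random_effects = c("Subject")
    )
  })
  names(fits) <- groups
  fits
}
```

In each per-group output, the sign of the Timepoint coefficient for a taxon indicates whether it tends to increase or decrease over time in that group. Note that separate runs do not test whether the trends differ between groups.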