MaAsLin 2 for longitudinal analysis with different individuals


I’m attempting to use MaAsLin 2 for gut longitudinal analysis (samples from different locations in the gut, from the duodenum to the descending colon). I have samples from multiple animals. I’d like to identify microbes that are strongly associated with the different sites of the GI tract and how they relate to the levels of metabolites of interest at each location. I have the longitudinal information (site) in both categorical (duodenum, jejunum…) and numerical (meters from the beginning of the GI tract) form.

  1. Should I use “site” for my fixed_effects? Is there a difference between using categorical and numerical values? I’m assuming I need to analyze one site at a time (reference = c("site,duodenum"))? Is this the correct way to do it, or is there a better way to do it all at once (e.g., using the numerical site values)? I ask because I have a lot of sites and metabolites.

fit_data_random = Maaslin2(
    input_data = df_input_data,
    input_metadata = df_input_metadata,
    min_prevalence = 0,
    normalization = "NONE",
    output = "demo_output_random",
    fixed_effects = c("site", "concentration_of_metabolite_1"),
    random_effects = c("subject"),
    reference = c("site,duodenum"))

  2. How should I set up the random effects? I am assuming the random effects should be set to “subject”. I am a bit confused because the tutorial says, “If you are interested in testing the effect of time in a longitudinal study, then the time point variable should be included in fixed_effects during your MaAsLin 2 call.” If that’s the case, since I am interested in the effect of “site” (which is the longitudinal information), should I also put “site” in random effects?

Thank you!


Hi there,

  1. It depends on how you would like to compare your final results. When using a categorical variable, as you suggested, you need to set a reference. That way you can look at how microbes differ between each site and the reference site (i.e., which microbes are significantly associated with each site relative to your reference). If you would instead like to test whether microbial abundance changes with how far along the GI tract a sample was taken, it might make sense to code site in numeric units (e.g., 10 cm, 20 cm, etc.).

You may also be interested in looking into ordered monotonic predictors, although at this time MaAsLin 2 does not support this type of input by default.

  2. In your case, it’s most likely that you would want a random intercept for each subject. In the tutorial we were referring to the fact that, if you have repeated sampling, you may be interested in including time as a fixed effect to see how microbes change in association with time.
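As a rough illustration of the setup described in this reply, the sketch below keeps `site` as a categorical fixed effect with `duodenum` as the reference and gives each animal a random intercept through `random_effects`. The tables are simulated stand-ins: the site list, animal IDs, and `taxon1`…`taxon5` columns are made up for illustration, and the call is only attempted if the Maaslin2 package is installed.

```r
# Simulated stand-ins for the real tables (all values here are made up).
set.seed(1)
sites    <- c("duodenum", "jejunum", "ileum", "descending_colon")
subjects <- paste0("animal", 1:6)

# One row per sample: every animal is sampled at every site.
df_input_metadata <- expand.grid(site = sites, subject = subjects,
                                 stringsAsFactors = FALSE)
rownames(df_input_metadata) <- paste0("sample", seq_len(nrow(df_input_metadata)))

# Samples in rows, features in columns, with matching row names.
df_input_data <- as.data.frame(
  matrix(runif(nrow(df_input_metadata) * 5, min = 0.01, max = 1),
         ncol = 5,
         dimnames = list(rownames(df_input_metadata), paste0("taxon", 1:5)))
)

# Run only when Maaslin2 is available; output goes to a temporary folder.
if (requireNamespace("Maaslin2", quietly = TRUE)) {
  library(Maaslin2)
  fit_data_random <- Maaslin2(
    input_data     = df_input_data,
    input_metadata = df_input_metadata,
    output         = file.path(tempdir(), "demo_output_random"),
    normalization  = "NONE",
    fixed_effects  = c("site"),           # effect of interest
    random_effects = c("subject"),        # random intercept per animal
    reference      = c("site,duodenum"),  # baseline for the categorical site
    plot_heatmap   = FALSE,
    plot_scatter   = FALSE
  )
}
```

In this sketch `site` appears only in `fixed_effects` because it is the effect being tested, while `subject` accounts for repeated sampling of the same animal.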

Hope that helps
Jacob Nearing

Hi Jacob,

Thank you for your quick response!

That’s right, I am interested in the effect on microbes based on how far along the colon. However, instead of discrete comparison, I want to find a way to do continuous analysis based on distance. Just to make sure I understand you correctly, by ordered monotonic predictors you mean a list of increasing integer values, such as the distance data I have (e.g., 10, 20, 30,… unit cm)? And MaAsLin 2 does not support this type of input data? If that’s the case, could you recommend any other bioBakery tools or other algorithms that might be useful for my purpose?

I apologize if I have too many questions, I am very new to this :laughing:. Thank you!


Hi Jiangshan,

Ordered monotonic predictors are categorical values whose order is retained during modeling, so the ordering relationship between categories is preserved.

I think in your case, if you want to compare microbial abundance as a function of distance, I would suggest coding distance numerically in the units that you measured and including it as a fixed effect. This type of analysis is supported by MaAsLin 2.
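To make the difference between the two codings concrete, here is a small base R sketch (it does not call MaAsLin 2 and is not its internal code) showing what a linear model sees in each case. The column name `distance_cm` and the distance values are hypothetical.

```r
# Hypothetical metadata: the same four samples described two ways.
meta <- data.frame(
  site        = c("duodenum", "jejunum", "ileum", "descending_colon"),
  distance_cm = c(10, 20, 30, 40)
)

# Categorical coding: one indicator column per non-reference site,
# so each coefficient is a contrast against the reference (duodenum).
meta$site <- relevel(factor(meta$site), ref = "duodenum")
X_categorical <- model.matrix(~ site, data = meta)

# Numeric coding: a single column, so a single slope per unit of distance.
X_numeric <- model.matrix(~ distance_cm, data = meta)

print(colnames(X_categorical))
print(colnames(X_numeric))
```

With many sites, the categorical coding yields one coefficient per non-reference site, whereas the numeric coding yields one slope describing a linear trend along the tract.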

Jacob Nearing

Hi @nearinj
I have a question along similar lines. I have two groups sampled over time in my study. As mentioned in the tutorial, I used both Group and Timepoint as references, with Timepoint being numeric rather than categorical.
My call is as follows:
fit_data = Maaslin2(
    input_data = gen_maas,
    input_metadata = meta,
    output = "maaslin_groups_run2",
    normalization = "NONE",
    min_prevalence = 0.1,
    min_abundance = 0.001,
    transform = "NONE",
    correction = "BH",
    analysis_method = "LM",
    plot_heatmap = TRUE,
    reference = c("Timepoint,1", "Group,Paper"),
    fixed_effects = c("Group", "Timepoint"),
    random_effects = c("Subject"))

I tried running this both with and without a reference for Group, and I get the same results in both cases.
The significant results file contains results for two metadata variables (Group and Timepoint).
My main questions are:

  1. The timepoint plots show an increasing or decreasing trend, but they do not indicate for which group it is increasing or decreasing. Does this mean that all results are with respect to one group only? For example, does it show the increasing/decreasing trend of taxa for my Paper group?
  2. The group comparison shows everything with respect to one group, say Paper when it is used as the reference, or the first group in alphabetical order if no reference is given. In this case, what can we say about the taxa differentially associated with group 2?

Any help would be appreciated

Hi @Dhrati_Patangia ,

  1. It sounds like you are interested in determining the effect of both time and group in this instance. In this case you probably would want to model the interaction between these two effects. This should help you answer the question of what impact both time and group have on my microbial features. You can check out the Maaslin2 tutorial for the best way to code interaction effects.

  2. Since Maaslin2 uses linear models to compute potential differences between metadata variables of interest, you must select a reference group that all other values within that metadata category are compared against. All of the coefficients output by Maaslin2 therefore represent the difference between your reference group and the group listed in the results table. To test for differences between two non-reference groups, you would need to run further tests using standard pairwise testing.
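One way to picture the interaction idea from point 1 is to add an artificial interaction column to the metadata and pass it as an extra fixed effect; as far as I understand, this is the approach the tutorial describes. The sketch below uses simulated data, a hypothetical second group named `Other`, and a plain `lm` that ignores the random `Subject` effect, so it only illustrates how the coefficients would be read.

```r
# Simulated metadata: 10 subjects, 2 groups, 4 timepoints each (made up).
set.seed(42)
meta <- data.frame(
  Subject   = rep(paste0("s", 1:10), each = 4),
  Group     = rep(c("Paper", "Other"), each = 20),
  Timepoint = rep(1:4, times = 10)
)

# Interaction column: equals Timepoint in the non-reference group, 0 otherwise.
meta$Other_x_Timepoint <- (meta$Group == "Other") * meta$Timepoint

# Simulated feature: flat in "Paper", rising by 0.5 per timepoint in "Other".
abundance <- 1 + 0.5 * meta$Other_x_Timepoint + rnorm(nrow(meta), sd = 0.1)

# Plain linear model for illustration only (no random effect for Subject).
meta$Group <- relevel(factor(meta$Group), ref = "Paper")
fit   <- lm(abundance ~ Group + Timepoint + Other_x_Timepoint, data = meta)
coefs <- coef(fit)

# Timepoint coefficient = trend in the reference group;
# Timepoint + interaction coefficient = trend in the other group.
slope_paper <- coefs[["Timepoint"]]
slope_other <- coefs[["Timepoint"]] + coefs[["Other_x_Timepoint"]]
print(c(Paper = slope_paper, Other = slope_other))

# In a MaAsLin 2 call, the analogous setup would be
# fixed_effects = c("Group", "Timepoint", "Other_x_Timepoint").
```

Read this way, the `Timepoint` coefficient describes the trend within the reference group, and a significant interaction term indicates that the trend differs in the other group.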

Hope that’s helpful.

Jacob Nearing

Hi @nearinj
Thank you for your response. So you mean I should make a fixed-effect column with values such as group1_time1, group1_time2, etc., and use that? I tried this, and even then it doesn’t help me understand the trend of a taxon within a group.
My aims are to (1) identify the differentially abundant taxa between the two groups and (2) check whether taxa increase or decrease within a group over time.

I will try the pairwise testing you suggested.
Thank you very much, that helps.