Does MaAsLin2 support longitudinal data analysis?

Chloe · July 27, 2021, 7:35pm

Hi developer,

May I ask whether MaAsLin2 supports longitudinal microbiome data analysis?
I have attached my demo metadata. I have 200 samples (treatment: 100 case vs. 100 control) belonging to 40 subjects, and each subject has 5 timepoints. I want to test the effects of treatment and time.

Much appreciated!
demo.csv (8.5 KB)

himel.mallick · July 28, 2021, 1:40pm

Hi @Chloe - it does. You need to specify subject as a random effect (to account for repeated measures) and treatment and time as fixed effects.
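A minimal sketch of that setup, assuming metadata columns named Subject, Treatment and Timepoint (hypothetical names, since the attached demo.csv is not shown here). The wrapper function `run_longitudinal` is also hypothetical and is only defined, not called, because it needs the Maaslin2 package and a real feature table.

```r
# Toy metadata mirroring the design in the question:
# 40 subjects x 5 timepoints = 200 samples, 20 case and 20 control subjects.
subjects <- sprintf("S%02d", 1:40)
metadata <- expand.grid(Timepoint = 1:5, Subject = subjects,
                        stringsAsFactors = FALSE)
metadata$Treatment <- ifelse(metadata$Subject %in% subjects[1:20],
                             "case", "control")
rownames(metadata) <- paste(metadata$Subject, metadata$Timepoint, sep = "_T")

# Hypothetical wrapper: `features` is assumed to be a samples x taxa table
# whose row names match rownames(metadata).
run_longitudinal <- function(features, metadata, out_dir) {
  Maaslin2::Maaslin2(
    input_data     = features,
    input_metadata = metadata,
    output         = out_dir,
    fixed_effects  = c("Treatment", "Timepoint"),  # effects to test
    random_effects = c("Subject")                  # repeated measures
  )
}
```

With Timepoint stored as a number, time enters the model as a linear trend; stored as a factor, each timepoint would be treated as its own level.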

All the best,
Himel

Krithika_Srinivasan · August 5, 2022, 2:45pm

Hi @himel.mallick - I have a microbiome study with 1200 samples belonging to around 300 subjects. Body weight is measured at 4 timepoints. I want to understand the top biomarkers responsible for weight gain. What should my design formula be?
Can I treat bodyweight (a continuous variable) and time point (V1, V2, V3, and V4) as fixed effects? Should I also include subject as a random effect in this case?

Please advise.

himel.mallick · August 12, 2022, 8:11pm

Hi @Krithika_Srinivasan - I am afraid you may need to run the analysis in a pairwise fashion, two time points at a time, to extract the features responsible for weight gain. More details about this specific modeling problem can be found here. After subsetting, you need to create an interaction term between bodyweight and time point (the effect of interest) and include it as a fixed effect, along with the main effects (bodyweight and time point), with subject as a random effect in your pairwise MaAsLin 2 runs. To correctly adjust for multiple testing for the interaction term, you will need to subset the MaAsLin2 results to this specific contrast of interest and re-calculate the q-values. Hope this helps!
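A rough sketch of the two manual steps for one pairwise run (V1 vs V2), using base R only. All column names (`time_num`, `bw_x_time`) are hypothetical, and the small `res` table is a made-up stand-in for the MaAsLin2 results table, which is assumed here to carry `metadata` and `pval` columns.

```r
# Toy metadata for one pairwise run (V1 vs V2); all names are hypothetical.
meta <- data.frame(
  Subject    = rep(c("A", "B", "C"), each = 2),
  Timepoint  = rep(c("V1", "V2"), times = 3),
  bodyweight = c(60, 62, 71, 70, 55, 59)
)
# Product term: bodyweight times an indicator for the later visit.
meta$time_num  <- as.numeric(meta$Timepoint == "V2")
meta$bw_x_time <- meta$bodyweight * meta$time_num
# fixed_effects would then be c("bodyweight", "time_num", "bw_x_time"),
# with random_effects = c("Subject").

# Made-up stand-in for the results table of that run.
res <- data.frame(
  feature  = rep(c("taxon1", "taxon2"), each = 3),
  metadata = rep(c("bodyweight", "time_num", "bw_x_time"), times = 2),
  pval     = c(0.20, 0.03, 0.01, 0.50, 0.60, 0.04)
)
# Keep only the interaction rows and recompute BH q-values over them.
inter <- res[res$metadata == "bw_x_time", ]
inter$qval <- p.adjust(inter$pval, method = "BH")
print(inter)
```

The point of the last step is that the q-values are corrected over the interaction tests only, rather than over every term in the model.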

Mak0130 · September 2, 2022, 4:26pm

Hi @himel.mallick - I have a longitudinal microbiome study in which we sampled 200 mice at 4 different timepoints (0 months, 6 months, 16 months, and 21 months). Additionally, the mice were randomized into 4 treatment groups (A, B, C, D), with A being the reference group.

In the first iteration of my model I ran the Maaslin2 function with the following parameters:

normalization = "TSS",
analysis_method = "LM",
transform = "LOG",
correction = "BH",
fixed_effects = c("Group", "Timepoint"),
random_effects = c("Sample.ID"),
reference = c("Group, A", "Timepoint,0 month")

This worked great, but I wanted to model the interaction between Time and Treatment/Group. However, because both terms are categorical, I’m unsure how to do this. Could you provide some guidance?

Mak0130 · September 6, 2022, 2:39pm

@himel.mallick I ended up creating an Interaction variable “Time*Group” which resulted in the following levels:

Time_0_Group_A
Time_0_Group_B
Time_0_Group_C
Time_0_Group_D
Time_6_Group_A
Time_6_Group_B
Time_6_Group_C
Time_6_Group_D
Time_16_Group_A
Time_16_Group_B
Time_16_Group_C
Time_16_Group_D
Time_21_Group_A
Time_21_Group_B
Time_21_Group_C
Time_21_Group_D

Thus I updated my model as follows:

normalization = "TSS",
analysis_method = "LM",
transform = "LOG",
correction = "BH",
fixed_effects = c("Group", "Timepoint", "TimeGroup"),
random_effects = c("Sample.ID"),
reference = c("Group, A", "Timepoint,0 month", "TimeGroup, Time_0_Group_A")

This worked fine. However, I’m unsure if making the Time_0_Group_A level the reference group is what I would want. I want comparisons within each Group/Timepoint, not just everything against Time_0_Group_A. However, as it is currently set up, all levels are compared against the Time_0_Group_A level.

Please advise.

andrewGhazi · September 26, 2022, 9:40pm

@Mak0130 A couple points to note:

If you want to include time as a linear trend, you’ll need to convert it to a numeric variable so that the model gets a sense of the order and time between the time points. Otherwise Maaslin just treats each time point as a separate, independent group of observations, which isn’t very “longitudinal”.
You could in theory use the time:group interaction workaround you’ve already implemented to estimate variation in the trends by group, but as you’ve noticed you’ll need to specify a reference trend. It would be more appropriate to use a random slopes model (something like (time | group) using lme4 syntax), but Maaslin2 doesn’t include that functionality.
Leaving aside the model setup, I’m a bit worried that you won’t have enough data – there’s uncertainty in estimating the trend, and there’s uncertainty in estimating the differences between groups, so there is necessarily even more uncertainty when estimating differences in trends by group. This may or may not be an issue, it depends on how noisy your data are.
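As a sketch of the first two points, here is one way to turn the timepoint labels into numbers and build group-by-time product columns by hand, using base R only. The toy metadata and the column names (`Mouse`, `time_months`, `GroupB_x_time_months`, ...) are hypothetical, and this is a workaround under those assumptions, not a built-in Maaslin2 feature.

```r
# Toy metadata: 2 mice per group, 4 timepoints each (names are hypothetical).
meta <- expand.grid(
  Timepoint = c("0 months", "6 months", "16 months", "21 months"),
  Mouse     = sprintf("M%d", 1:8),
  stringsAsFactors = FALSE
)
meta$Group <- factor(rep(c("A", "B", "C", "D"), each = 8),
                     levels = c("A", "B", "C", "D"))

# Numeric time keeps the order and spacing of the visits.
meta$time_months <- as.numeric(sub(" months?$", "", meta$Timepoint))

# Expand Group * time into dummy and product columns, with A as reference.
X <- model.matrix(~ Group * time_months, data = meta)

# Keep the product columns and give them names usable as metadata columns.
inter <- X[, grepl(":", colnames(X), fixed = TRUE), drop = FALSE]
colnames(inter) <- sub(":", "_x_", colnames(inter), fixed = TRUE)
meta <- cbind(meta, inter)
print(colnames(meta))
```

The fixed effects would then be Group, time_months and the three product columns. Each product coefficient estimates how that group's time trend differs from the trend in reference group A, which is the "reference trend" mentioned above.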

Mak0130 · September 26, 2022, 10:43pm

@andrewGhazi Hi Andrew,

Thanks for the reply. I’ll convert the time variable from categorical/ordinal to numeric to see how the observed associations change.
Are there any good R packages you would recommend that could better model longitudinal data with interaction terms?
I was able to find some significant time*diet interactions, but at the final time point my N number decreased and the data became more variable, which resulted in seeing no significant interactions at the last time point, in line with your expectations.

andrewGhazi · September 27, 2022, 7:48pm

lme4 is the most widely used package for mixed effects models, but I tend to trust the results from rstanarm (you’d want rstanarm::stan_glmer() I think) or brms more.
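For orientation, a minimal sketch of what a single-taxon model could look like in lme4 syntax. The data are simulated noise and every column name is hypothetical; the fit only runs if lme4 is installed, and with pure noise it may report a singular fit.

```r
set.seed(1)
# Simulated data: 16 mice, 4 per group, each measured at 4 timepoints.
toy <- expand.grid(time_months = c(0, 6, 16, 21),
                   Mouse = sprintf("M%02d", 1:16))
toy$Group <- factor(rep(c("A", "B", "C", "D"), each = 16))
toy$log_abund <- rnorm(nrow(toy))  # placeholder response, no real signal

# Fixed effects: time, group, and their interaction.
# Random effect: one intercept per mouse for the repeated measures.
# A random slope would use the same "(slope | grouping)" syntax.
f <- log_abund ~ time_months * Group + (1 | Mouse)

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(f, data = toy)
  print(lme4::fixef(fit))
}
```

Unlike Maaslin2, this fits one response at a time, so looping over taxa and correcting for multiple testing would be up to you.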

Dhrati_Patangia · February 7, 2023, 4:56pm

Hi, I have a similar question: Is it okay to use Maaslin2 by removing the grouping variable and examining only the effect of time point? I have 3 groups and 5 time points, and I wish to see the increase or decrease per taxa over time.
Would it be okay to use this setup, run once per group?
fit_data = Maaslin2(
  input_data = spe_grp1_alltp,
  input_metadata = meta_grp1_Alltp,
  output = "output_alltp_grp1_species",
  normalization = "NONE",
  min_prevalence = 0.1,
  min_abundance = 0.001,
  transform = "NONE",
  correction = "BH",
  analysis_method = "LM",
  plot_heatmap = TRUE,
  fixed_effects = c("Timepoint"),
  random_effects = c("Subject Number"))

Q2: Why does changing the time point format make a difference in the results? For instance using 1,2,3 vs using actual time points like 1,4,24?

Any help would be appreciated, thank you in advance.

andrewGhazi · February 7, 2023, 6:29pm

Is it okay to use Maaslin2 by removing the grouping variable and examining only the effect of time point? I have 3 groups and 5 time points, and I wish to see the increase or decrease per taxa over time.

That depends on whether the grouping variable you mention affects the abundance of taxa. You’ll have to use your scientific understanding of your experiment to implement an analysis that accounts for relevant sources of variation.

As to your second question, if your Timepoint variable is numeric, Maaslin2 uses a linear trend with time. So the change from 1 to 2 should be the same as from 2 to 3. If you change the time points to 1, 4, 24, the change from 1 to 4 should be about (4 - 1) / (24 - 4) = 3/20ths of the change from 4 to 24.
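A small worked example of that arithmetic with an ordinary linear model, on made-up numbers:

```r
# A taxon whose (transformed) abundance rises exactly linearly with time.
time  <- c(1, 4, 24)
abund <- 0.5 + 0.2 * time
fit   <- lm(abund ~ time)
pred  <- predict(fit, newdata = data.frame(time = time))

step_1_to_4  <- unname(pred[2] - pred[1])  # spans 3 time units
step_4_to_24 <- unname(pred[3] - pred[2])  # spans 20 time units
print(step_1_to_4 / step_4_to_24)          # approximately 3/20 = 0.15

# Recoding the same visits as 1, 2, 3 forces equal steps between visits.
visit  <- c(1, 2, 3)
fit2   <- lm(abund ~ visit)
steps2 <- unname(diff(predict(fit2, newdata = data.frame(visit = visit))))
print(steps2)
```

The two codings describe different straight lines through the same data, which is why the estimated coefficients and p-values change.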

Dhrati_Patangia · February 8, 2023, 8:56am

Hi, yes, our grouping variable does affect the abundance of taxa. In that case, how would you suggest handling longitudinal analysis in Maaslin2?

For the second question: yes, my time point variable is numeric. But should I enter the actual time points? For example, my time points are in weeks: 1, 8, 24, 40. Should I use these week numbers or simply 1, 2, 3, 4? The two options give different results.

Thank you

andrewGhazi · February 8, 2023, 3:02pm

If your grouping variable affects taxa abundance, then you should include it as a fixed or random effect as appropriate. If you don’t know which is appropriate, you can find background information on mixed effects models in the Fitting Linear Mixed-Effects Models using lme4 vignette of the lme4 package (and many other places via web search).

If you expect the longitudinal trends to be linear with real time, then you should use the real time points.

Caffery_Yang · September 27, 2023, 11:48am

Hi @Chloe ,

MaAsLin2 is a powerful tool for microbiome data analysis, but it primarily focuses on cross-sectional data analysis. For longitudinal microbiome data analysis, especially with repeated measurements over time, MicrobiomeStat is an excellent choice. It’s specifically designed to handle time series data, making it easier to test the effects of treatment, time, and other variables in your study.

You can easily adapt your data and perform longitudinal analysis using MicrobiomeStat. It offers a wide range of functionalities for this purpose and can efficiently handle your dataset with multiple timepoints.

Give MicrobiomeStat a try for your longitudinal microbiome data analysis needs. It should be a valuable tool for your research project!

Best regards,
