Hi, my dataset comes from a longitudinal study with 5 time points (t1, t2, t3, t4, t5) and 3 groups.
I want to use MaAsLin2, and I tried the following:
```r
library(Maaslin2)

fit_data = Maaslin2(
  input_data = gen_maas,
  input_metadata = meta,
  output = "output_csab_gen3",
  normalization = "NONE",
  min_prevalence = 0.1,
  min_abundance = 0.001,
  transform = "NONE",
  correction = "BH",
  analysis_method = "LM",
  plot_heatmap = TRUE,
  reference = c("Group,CS_ab", "Timepoint,t1"),
  fixed_effects = c("Group", "Timepoint"),
  random_effects = c("Subject Number"))
```

But when I set the reference to t1, each of the other 4 time points is compared against t1. My concern is that t1 is not really a reference time point; it is just one of the time points. To be precise, this is a microbiome study, and t1 is not a baseline, so it is not a natural reference. Can I still use t1 as the reference in the MaAsLin2 call and interpret the results?

If the answer is yes, then to perform a longitudinal regression analysis for each group, should I subset the data to 2 groups at a time, run MaAsLin2 5 times with a different time point as the reference each time, and then compile the results? Or is there a better way to do this?

Hi @Dhrati_Patangia, it sounds like time should be encoded as a continuous variable rather than a categorical one (e.g. c(1, 2, 3, 4, 5) rather than c("t1", "t2", "t3", "t4", "t5")). You won't need reference levels for continuous data.
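A minimal sketch of that re-encoding in base R, assuming the time labels are stored as "t1" to "t5" in a metadata column called Timepoint. The toy data frame and the new column name Timepoint_num are hypothetical and only for illustration; like the c(1, 2, 3, 4, 5) suggestion, this treats the time points as equally spaced.

```r
# Hypothetical metadata: two subjects, each sampled at the five labelled time points
meta <- data.frame(
  Subject = rep(c("S1", "S2"), each = 5),
  Timepoint = rep(c("t1", "t2", "t3", "t4", "t5"), times = 2)
)

# Strip the leading "t" so the labels "t1".."t5" become the numbers 1..5
meta$Timepoint_num <- as.numeric(sub("^t", "", meta$Timepoint))

# A numeric column needs no reference level; it would be listed in
# fixed_effects (e.g. c("Group", "Timepoint_num")) in the Maaslin2() call
str(meta)
```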

Hi @nickp60, thank you very much for your response. I have two follow-up questions about your suggestion:

1. In this case the time points span a long range, from t1 (1), which is week 1, up to t5, which is 2 years. Would it still be okay to keep Timepoint as continuous?

2. If the answer to 1 is yes, is the resulting scatter plot for a particular group, or for all study subjects overall?
In the example below, would the scatter plot be for the reference group or for all groups?
```r
fit_data = Maaslin2(
  input_data = gen_maas,
  input_metadata = meta,
  output = "outgen_csab_tp_cont",
  normalization = "NONE",
  min_prevalence = 0.1,
  min_abundance = 0.001,
  transform = "NONE",
  correction = "BH",
  analysis_method = "LM",
  plot_heatmap = TRUE,
  # Timepoint is now numeric, so only Group needs a reference level
  reference = c("Group,CS_ab"),
  fixed_effects = c("Group", "Timepoint"),
  random_effects = c("Subject Number"))
```

Hi @Dhrati_Patangia,
There are a few considerations here, but the short answer is that it depends on the question. MaAsLin2's default is to apply z-score standardization to continuous metadata (standardize = TRUE), which should help make the wide range workable for the modelling. However, in some cases it may be more informative to encode time as ordinal, for instance if the response would be nonsensical at the extremes.
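To make the two options concrete, here is a small base R sketch. Only t1 = week 1 and t5 = about 2 years (taken as 104 weeks) come from this thread; the week values for t2 to t4 are placeholders. The scale() call is meant to be analogous to MaAsLin2's default standardization, not a copy of its internals, and whether MaAsLin2 treats an ordered factor differently from a plain factor is not verified here.

```r
# Hypothetical sampling times in weeks: t1 and t5 follow the thread,
# t2-t4 are placeholder values for illustration only
weeks <- c(t1 = 1, t2 = 4, t3 = 12, t4 = 52, t5 = 104)

# Z-score standardization (mean 0, sd 1), analogous to standardize = TRUE
z <- as.numeric(scale(weeks))
print(round(z, 2))

# Z-scoring is a linear rescaling: every gap shrinks by the same factor,
# 1 / sd(weeks), so the long t4-t5 gap still dominates the short t1-t2 gap
print(diff(z) / diff(weeks))

# Ordinal alternative: keeps the order of the time points but not their spacing
tp_ord <- factor(names(weeks), levels = names(weeks), ordered = TRUE)
print(tp_ord)
```

In other words, standardization puts time on a unit scale but does not change the relative spacing of the samples; if that spacing is the concern, the ordinal encoding is the option that discards it.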

The resulting scatter plot shows abundance by time regardless of group, which is simply the nature of visualizing multivariate data. It can be helpful to confirm what the model shows with your own visualizations that highlight trends by group, and to look at the per-subject residuals to identify any trends that might be interesting.
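As a rough sketch of those checks for a single feature, the block below uses entirely made-up toy data (groups "A" to "C", subjects "S1" to "S6") and base R only. It fits a per-group slope of abundance on time, and then averages residuals per subject from a plain fixed-effects lm(), which is a simplification and not the mixed model MaAsLin2 fits.

```r
set.seed(1)

# Hypothetical toy data: 3 groups x 2 subjects x 5 time points, one feature
d <- data.frame(
  Group = rep(c("A", "B", "C"), each = 10),
  Subject = rep(paste0("S", 1:6), each = 5),
  Time = rep(1:5, times = 6)
)
d$Abundance <- 0.01 * d$Time + rnorm(nrow(d), sd = 0.005)

# Trend by group: slope of abundance on time, fitted separately in each group
slopes <- sapply(split(d, d$Group), function(g) {
  coef(lm(Abundance ~ Time, data = g))[["Time"]]
})
print(slopes)

# Per-subject residuals from a simple fixed-effects fit (no random effect),
# so any subject-specific offset shows up as a non-zero mean residual
fit <- lm(Abundance ~ Group + Time, data = d)
d$Residual <- resid(fit)
per_subject <- aggregate(Residual ~ Subject, data = d, FUN = mean)
print(per_subject)
```

For the visual version, something like plot(Abundance ~ Time, data = d, col = as.integer(factor(d$Group)), pch = 19) colours the same points by group.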

Hi @nickp60, sorry for the delay in responding, and thank you very much.
This does make sense. To get the abundance by time for each group, I guess I will subset the dataset by group and go from there.
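One way to do that subsetting is sketched below, assuming samples are the row names of both the feature table and the metadata. The toy tables, the group names other than CS_ab, and the columns Subject and Timepoint_num are hypothetical. The wrapper run_group() is only defined, not run, and simply reuses the Maaslin2() arguments from this thread with Group dropped from fixed_effects.

```r
# Hypothetical toy inputs: 12 samples, 3 groups, samples as row names
ids <- paste0("sample", 1:12)
meta <- data.frame(
  Group = rep(c("CS_ab", "G2", "G3"), each = 4),
  Subject = rep(paste0("S", 1:6), each = 2),
  Timepoint_num = rep(c(1, 5), times = 6),
  row.names = ids
)
gen_maas <- data.frame(featA = runif(12), featB = runif(12), row.names = ids)

# Split sample IDs by group and keep matching rows of data and metadata
per_group <- lapply(split(rownames(meta), meta$Group), function(s) {
  list(data = gen_maas[s, , drop = FALSE], meta = meta[s, , drop = FALSE])
})

# Hypothetical wrapper; assumes the Maaslin2 package is installed when called
run_group <- function(x, group_name) {
  Maaslin2::Maaslin2(
    input_data = x$data,
    input_metadata = x$meta,
    output = paste0("output_", group_name),
    normalization = "NONE",
    transform = "NONE",
    min_prevalence = 0.1,
    min_abundance = 0.001,
    analysis_method = "LM",
    correction = "BH",
    fixed_effects = c("Timepoint_num"),
    random_effects = c("Subject")
  )
}
```

Calling Map(run_group, per_group, names(per_group)) would then produce one output folder per group, each with its own abundance-by-time scatter plots.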
Thank you once again.
Best
DP