Paired samples by ID

Hi, I want to use MaAsLin2 for a cohort with a dietary treatment. I have one sample before and one after treatment for almost all subjects, but some subjects are missing one sample. The subject IDs (“PatientGroup”) in my metadata file are as follows, one line per time point:
1 3 4 5 6 8 10 11 12 14 15 18 19 21 22 23 27 28 31 32 35 36 37 38 39 41
2 3 4 5 6 8 10 11 12 18 19 22 23 25 27 28 32 33 34 35 36 37 38 39 41 42
So, I want to use only sample pairs, and I want MaAsLin2 to take the paired nature into account. Thus subjects 1, 14, 15, etc. should be ignored. However, when I run
fit_data2 = Maaslin2(
    input_data = df_input_data,
    input_metadata = df_input_metadata_noAB,
    output = "output_TimePoint_no_AB_re_group",
    fixed_effects = c("TimePoint"),
    random_effects = c("PatientGroup"))
The output refers to n=52 (26 for each time point) which is the total number of samples I have, including those missing their pair. What am I doing wrong?

Hi @Stef - MaAsLin 2’s random effect model by default considers all available observations. If you want to restrict the analysis to paired samples only, you need to remove the unpaired samples from your metadata before running the MaAsLin 2 model. Does that make sense?
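In case it helps, here is a minimal base-R sketch of that filtering step. It rebuilds a toy metadata table from the subject IDs posted above; the `TimePoint` labels `"T1"`/`"T2"`, the sample names, and the object names `meta` and `meta_paired` are hypothetical and only there for illustration, so adapt them to your real metadata.

```r
# Subject IDs per time point, copied from the post above.
ids_tp1 <- c(1, 3, 4, 5, 6, 8, 10, 11, 12, 14, 15, 18, 19, 21, 22, 23,
             27, 28, 31, 32, 35, 36, 37, 38, 39, 41)
ids_tp2 <- c(2, 3, 4, 5, 6, 8, 10, 11, 12, 18, 19, 22, 23, 25, 27, 28,
             32, 33, 34, 35, 36, 37, 38, 39, 41, 42)

# Toy metadata: one row per sample ("T1"/"T2" and "S1", "S2", ... are made up).
meta <- data.frame(
  PatientGroup = c(ids_tp1, ids_tp2),
  TimePoint    = rep(c("T1", "T2"), times = c(length(ids_tp1), length(ids_tp2)))
)
rownames(meta) <- paste0("S", seq_len(nrow(meta)))

# Count samples per subject and time point, then keep subjects seen at both.
counts     <- table(meta$PatientGroup, meta$TimePoint)
paired_ids <- rownames(counts)[rowSums(counts > 0) == 2]
meta_paired <- meta[as.character(meta$PatientGroup) %in% paired_ids, , drop = FALSE]

nrow(meta_paired)                         # 42 samples remain
length(unique(meta_paired$PatientGroup))  # 21 subjects, each with both time points
```

The filtered table would then be passed as `input_metadata` in place of the full one, with the same `fixed_effects` and `random_effects` as in your call. Row order should not matter for the pairing, since the subject ID column carries that information.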

Thanks for your reply! So if MaAsLin2 does not pair my samples according to the value in PatientGroup, how will it pair them after I remove the unpaired samples? Do they have to be in a specific order?

Apologies for the confusion. My original comment was based on the specific need of your analysis since you wanted to exclude single-sample subjects.

In general, as long as the random effect variable (in your case PatientGroup) has non-unique values (in your case at most 2 samples per subject), the subject-sample correspondence is established before the random effect model is called. That way, regardless of whether you exclude single-sample subjects or not, MaAsLin 2 will always use the subject-sample information (i.e. which samples belong to which subjects) in the modeling step.

Please let me know if anything is unclear or if you have any questions.

Dear Himel,
So if I understand correctly, the analysis is performed in a paired manner for those samples where the PatientGroup is not unique and in an unpaired manner for those where PatientGroup is unique? How is this put together for the final results? I just want to know exactly what I am doing :slight_smile:

Hi @Stef - I suggest the following paper which serves as an excellent resource on mixed effect models:

MaAsLin 2’s approach is a random intercept model: it allows per-subject means to vary but assumes that all subjects share a common slope for each fitted covariate, which is the fixed effect you are interested in.
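Written out for a single feature, a random intercept model of this kind looks roughly as follows. The symbols are my own notation for illustration: $y_{ij}$ is the (transformed) abundance for subject $i$ at time point $j$, $x_{ij}$ is the TimePoint indicator, $b_i$ is the subject-specific intercept, and $\beta_1$ is the common slope shared by all subjects.

```latex
y_{ij} = \beta_0 + b_i + \beta_1 x_{ij} + \varepsilon_{ij},
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

A subject with a single sample still contributes one equation of this form, so it informs $\beta_0$, $\sigma_b^2$, and $\sigma^2$ even though it carries no within-subject contrast.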

The random effect takes care of the non-independence of the observations (due to the repeated measures): per-subject intercepts are estimated and accounted for in your final estimates. Likewise, even when you have only one observation for some subjects, you will still get an estimate of the corresponding subject-level intercept. The flexibility of this approach comes from the fact that your final parameter estimates are based on a joint analysis of all samples across time points, which can increase power. I hope this makes it a little clearer, but let me know if not.
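To see this behaviour outside of MaAsLin 2, here is a small sketch that fits a random intercept model to simulated data with the same subject layout as in your post. It uses `lme()` from the `nlme` package purely as a stand-in, so it is not MaAsLin 2's own fitting code. The `abundance` values, the effect size, and the `"T1"`/`"T2"` labels are all made up for illustration.

```r
library(nlme)  # recommended package bundled with standard R installations

set.seed(1)

# Subject IDs per time point, copied from the original post.
ids_tp1 <- c(1, 3, 4, 5, 6, 8, 10, 11, 12, 14, 15, 18, 19, 21, 22, 23,
             27, 28, 31, 32, 35, 36, 37, 38, 39, 41)
ids_tp2 <- c(2, 3, 4, 5, 6, 8, 10, 11, 12, 18, 19, 22, 23, 25, 27, 28,
             32, 33, 34, 35, 36, 37, 38, 39, 41, 42)

df <- data.frame(
  PatientGroup = factor(c(ids_tp1, ids_tp2)),
  TimePoint    = factor(rep(c("T1", "T2"),
                            times = c(length(ids_tp1), length(ids_tp2))))
)

# Simulated outcome: subject-specific baseline + common time effect + noise.
baseline <- rnorm(nlevels(df$PatientGroup), sd = 1)
df$abundance <- baseline[as.integer(df$PatientGroup)] +
  0.5 * (df$TimePoint == "T2") +
  rnorm(nrow(df), sd = 0.5)

# Random intercept per subject, one common TimePoint slope.
fit <- lme(abundance ~ TimePoint, random = ~ 1 | PatientGroup, data = df)

length(fitted(fit))  # 52: every sample is used, paired or not
nrow(ranef(fit))     # 31: one intercept per subject, including single-sample ones
fixef(fit)           # overall intercept and the common TimePoint effect
```

So there are not two separate analyses that get merged afterwards: it is one model fitted to all 52 samples, in which the paired subjects supply the within-subject information about TimePoint and the single-sample subjects mainly help estimate the overall level and the variance components.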