MaAsLin3 setup for multiple small and large random effects

Hi all,

I am working with a dataset that has longitudinal microbiome samples from several timepoints and several treatments. Unfortunately, my samples come from different sequencing batches, and the batch effect is clearly visible and not negligible.

I want to see how species abundances change over time depending on treatment. So far, the best model I have come up with is:

~ time:treatment + sex + age + country + n_reads + (1 | patient_ID) + (1 | sequencing_batch)

where

  • sex, age and country are included as fixed effects, since they are known to influence microbiome composition
  • n_reads is included as a covariate, as suggested on the GitHub page (“Because MaAsLin 3 identifies prevalence (presence/absence) associations, sample read depth (number of reads) should be included as a covariate if available. Deeper sequencing will likely increase feature detection in a way that could spuriously correlate with metadata of interest when read depth is not included in the model.”)
  • patient_ID and sequencing_batch are included as random effects

My problem is that I would consider patient_ID a small random effect, since I never have more than 4 observations per patient, whereas sequencing_batch should be a large random effect (far more than 4 samples per group).
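To check which side of the "4 observations per group" line each grouping factor falls on, it helps to tabulate the number of samples per level. Below is a minimal Python sketch (the same tabulation is a one-liner in R); the toy metadata and the helper name `group_sizes` are hypothetical, and the column names simply mirror the ones in the formula above.

```python
from collections import Counter


def group_sizes(samples, key):
    """Return {level: number of samples} for one grouping column."""
    return dict(Counter(s[key] for s in samples))


# Hypothetical toy metadata (one dict per sample); replace with your own table.
samples = [
    {"sample": "S1", "patient_ID": "P1", "sequencing_batch": "B1"},
    {"sample": "S2", "patient_ID": "P1", "sequencing_batch": "B1"},
    {"sample": "S3", "patient_ID": "P2", "sequencing_batch": "B1"},
    {"sample": "S4", "patient_ID": "P2", "sequencing_batch": "B2"},
    {"sample": "S5", "patient_ID": "P3", "sequencing_batch": "B2"},
]

for key in ("patient_ID", "sequencing_batch"):
    sizes = group_sizes(samples, key)
    # Report how many levels the factor has and the range of samples per level.
    print(f"{key}: {len(sizes)} groups, "
          f"{min(sizes.values())}-{max(sizes.values())} samples each")
```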

As far as I understand, MaAsLin 3 does not yet allow one small and one large random effect in the same model, so for the model above I used the default value (FALSE) for SMALL_RANDOM_EFFECT.
I also tried two models with a single random effect: one with only patient_ID as a small random effect, and one with only sequencing_batch as a large random effect:

~ time:treatment + sex + age + country + n_reads + (1 | patient_ID)
SMALL_RANDOM_EFFECT = TRUE

or

~ time:treatment + sex + age + country + n_reads + (1 | sequencing_batch)
SMALL_RANDOM_EFFECT = FALSE

The results of these three models are completely different.

Which model would you suggest for this setting?

Thanks a lot

Are all the samples for each patient_ID within a single sequencing_batch? (For example, patients 1, 2, and 3 had all their samples sequenced in batch 1, and patients 4, 5, and 6 had theirs in batch 2.) If so, you can simply drop sequencing_batch and use small_random_effects on a model with only patient_ID, since sequencing batch is then collinear with subject ID, just coarser.
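Whether batch is nested within patient can be checked directly from the metadata: collect the set of batches each patient's samples were sequenced in and see whether any patient spans more than one. Here is a small Python sketch of that check; the helper names and the two toy datasets (built to match the example in the question above) are hypothetical.

```python
from collections import defaultdict


def batches_per_subject(samples, subject_key="patient_ID", batch_key="sequencing_batch"):
    """Map each subject to the set of batches its samples were sequenced in."""
    batches = defaultdict(set)
    for s in samples:
        batches[s[subject_key]].add(s[batch_key])
    return dict(batches)


def subjects_spanning_batches(samples, **keys):
    """Sorted list of subjects whose samples are split across more than one batch."""
    return sorted(subj for subj, b in batches_per_subject(samples, **keys).items() if len(b) > 1)


def batch_nested_in_subject(samples, **keys):
    """True if every subject's samples all come from a single batch."""
    return not subjects_spanning_batches(samples, **keys)


# Hypothetical data: patients 1-3 entirely in batch B1, patients 4-6 in B2 (2 samples each).
nested = [
    {"patient_ID": p, "sequencing_batch": "B1" if p in ("1", "2", "3") else "B2"}
    for p in ("1", "2", "3", "4", "5", "6")
    for _ in range(2)
]
# Same data plus one extra sample of patient 2 sequenced in B2.
crossed = nested + [{"patient_ID": "2", "sequencing_batch": "B2"}]

print(batch_nested_in_subject(nested))
print(subjects_spanning_batches(crossed))
```

If the first check is true for your data, batch adds no information beyond patient_ID; if it lists patients, those are the ones that make the two random effects partially crossed.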

If not, the safest option is to set small_random_effects so that it applies to both random effects. The rule of thumb of 4 samples per group is really more of a “samples to model complexity” measure, so I’d still use small_random_effects: you’ll have a model parameter for each patient plus a parameter for each sequencing batch (and for sex, age, country, etc.).
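The "samples to model complexity" argument can be made concrete with a back-of-the-envelope count: one parameter per patient, one per batch, plus the fixed-effect columns. The Python sketch below is only that rough heuristic, not anything MaAsLin 3 computes internally, and the study sizes in it are made up for illustration.

```python
def samples_per_parameter(n_samples, n_subjects, n_batches, n_fixed):
    """Rough samples-to-parameters ratio.

    Counts one parameter per subject, one per batch, and n_fixed fixed-effect
    columns (intercept, sex, age, country, n_reads, time:treatment, ... after
    dummy coding). This is a heuristic, not MaAsLin 3's internal bookkeeping.
    """
    n_params = n_subjects + n_batches + n_fixed
    return n_samples / n_params


# Hypothetical study: 120 samples from 40 patients (<= 4 each) in 6 batches,
# with 10 fixed-effect columns after dummy coding.
both = samples_per_parameter(120, n_subjects=40, n_batches=6, n_fixed=10)
patient_only = samples_per_parameter(120, n_subjects=40, n_batches=0, n_fixed=10)
print(f"both random effects: {both:.2f} samples per parameter")
print(f"patient_ID only:     {patient_only:.2f} samples per parameter")
```

In this made-up example the per-patient parameters dominate the count, so adding a few large batches barely changes the ratio; the model as a whole stays in the low-samples-per-parameter regime either way.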

Will