Hi all,
I am working with a dataset that has longitudinal microbiome samples from different timepoints and different treatments. Unfortunately, my data come from different sequencing batches, and the batch effect is clearly visible and not negligible.
I want to see how the abundance of each species changes over time with respect to treatment. So far, the best model I have come up with is:
~ time:treatment + sex + age + country + n_reads + (1 | patient_ID) + (1 | sequencing_batch)
where
- sex, age and country are included as fixed effects, since they are known to potentially influence microbiome composition
- n_reads is included as a covariate, as suggested on the GitHub page (“Because MaAsLin 3 identifies prevalence (presence/absence) associations, sample read depth (number of reads) should be included as a covariate if available. Deeper sequencing will likely increase feature detection in a way that could spuriously correlate with metadata of interest when read depth is not included in the model.”)
- patient_ID and sequencing_batch are included as random effects
My problem is that I would consider patient_ID a small random effect (I never have more than 4 observations per patient), whereas sequencing_batch should be a large random effect (>> 4 samples per group).
From my experience and understanding, it is not yet possible to specify one small and one large random effect in the same MaAsLin model, so for the model reported above I used the default value (FALSE) for SMALL_RANDOM_EFFECT.
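To make the group sizes concrete, here is a minimal Python sketch (not MaAsLin code) of how the number of samples per patient and per batch could be counted. The rows are made up for illustration and are not my real data; the `max_group_size` helper is a hypothetical name.

```python
from collections import Counter

# Hypothetical sample table: one (patient_ID, sequencing_batch) pair per sample.
# These rows are invented for illustration only.
samples = [
    ("P1", "B1"), ("P1", "B1"), ("P1", "B2"),
    ("P2", "B1"), ("P2", "B2"), ("P2", "B2"), ("P2", "B2"),
    ("P3", "B1"), ("P3", "B2"),
    ("P4", "B1"), ("P4", "B1"), ("P4", "B2"),
]


def max_group_size(rows, column):
    """Return the largest number of samples sharing one value in `column`."""
    counts = Counter(row[column] for row in rows)
    return max(counts.values())


# Column 0 is patient_ID, column 1 is sequencing_batch.
max_per_patient = max_group_size(samples, 0)
max_per_batch = max_group_size(samples, 1)

print("largest patient group:", max_per_patient)
print("largest batch group:", max_per_batch)
```

In my real data the first number never exceeds 4, while the batch groups are much larger.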
I also tried models with only patient_ID as a small random effect, or only sequencing_batch as a large random effect:
~ time:treatment + sex + age + country + n_reads + (1 | patient_ID)
SMALL_RANDOM_EFFECT = TRUE
or
~ time:treatment + sex + age + country + n_reads + (1 | sequencing_batch)
SMALL_RANDOM_EFFECT = FALSE
The results from these three models differ completely.
Which model would you suggest for this setting?
Thanks a lot