Group comparison per timepoint in a longitudinal study

Hi all,

I have a longitudinal metagenomic dataset, and since I am still new to bioinformatics, I would appreciate some help and guidance. I used MetaPhlAn 4 for taxonomic profiling.

About the dataset. I have a cohort of 40 participants, with saliva and plaque samples taken from each participant at three timepoints throughout the trial: at baseline, after 2 weeks with no oral hygiene (week 2) and again after 2 weeks with resumed oral hygiene (week 4). At each timepoint, we have also recorded the amount of bleeding from the gums. We have used the difference in bleeding between baseline and week 2 to divide the participants into three groups: fast responders, moderate responders and slow responders.

My idea was to use MaAsLin3 to identify taxa that differ significantly between the three responder groups at each timepoint and for each sample type, e.g. saliva samples at baseline. However, when I subset my dataset this way, MaAsLin3 reports that all associations had errors or were insignificant (even though I set max_significance = 1).

I have added my code underneath. ps_up is the name of my phyloseq object.

ps_tmp <- ps_up %>%
  subset_samples(Sample_Type == "Saliva" & Period == "Baseline") %>%
  subset_taxa(Class != "UNCLASSIFIED") %>%
  transform_sample_counts(function(x) x/sum(x) * 1)

tax_table(ps_tmp) <- tax_table(ps_tmp)[ , "Species", drop = FALSE]

fit_taxa <- maaslin3(
  input_data = ps_tmp %>% otu_table() %>% t() %>% data.frame(),
  input_metadata = ps_tmp %>% sample_data() %>% data.frame(),
  output = "/Users/lnp524/Desktop/maaslin3/tax_saliva",
  fixed_effects = c("Inflammation_response"),
  random_effects = c("Subject"),
  max_significance = 1,
  reference = "Slow",
  normalization = "NONE",
  transform = "LOG"
)

Is there anything I could change in my code to get the result I want? Otherwise, I would appreciate some insights into a better way of analyzing my dataset, since I am probably not using MaAsLin3 in the best way.

Many thanks in advance!

Caroline

Hi Caroline,

Everything you’ve described seems reasonable, and I don’t see any obvious issues in the parameters you’re specifying. Would you mind posting here/emailing me (willnickols@g.harvard.edu) the log file and/or a chunk of the data that reproduces the problem so I can look at what’s actually going wrong?

Will

Hi Will,

I have attached the maaslin3.log file and the all_results.tsv file here. When I look into the all_results.tsv file there is this error code: number of levels of each grouping factor must be < number of observations (problems: Subject).

When I subset my dataset as in the code above (only saliva samples, from baseline) I end up with only one sample per subject – is this the issue?

Thank you for your help!

Yep - that’s probably the issue. There need to be more observations than random effect levels. With one sample per subject, each subject’s random effect completely determines its single observation, so the mixed model can’t be fit and every association errors out.
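A quick way to check this condition before fitting is to count samples per subject in the metadata. The sketch below uses a small made-up data frame in place of the real sample_data(ps_tmp); the subject IDs and the "Fast" and "Moderate" labels are invented for illustration, and only the column names Subject and Inflammation_response come from the code earlier in the thread.

```r
# Toy metadata standing in for sample_data(ps_tmp) after subsetting to
# saliva at baseline. All values here are invented for illustration.
meta <- data.frame(
  Subject = c("S01", "S02", "S03", "S04"),
  Inflammation_response = c("Slow", "Fast", "Moderate", "Slow")
)

# Number of samples contributed by each subject
samples_per_subject <- table(meta$Subject)

# A random intercept per subject needs fewer levels than observations
n_levels <- length(samples_per_subject)
n_obs <- nrow(meta)
random_effect_ok <- n_levels < n_obs

print(samples_per_subject)
print(random_effect_ok)
```

With one row per subject, as in this toy example, the number of levels equals the number of observations, so the flag comes out FALSE. That is the situation the error message describes.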

Hi Will,

Thank you for your help! If you have an idea of how I can use MaAsLin3 for the analysis I want, then please let me know. I was considering pooling the saliva and plaque samples for each timepoint so that I end up with two samples per subject. However, I fear the compositions of the two sample types are too distinct for my aim.

Caroline

For the baseline analysis you were describing above, the fix should be to just drop the random effect. If you don’t have repeated sampling (i.e. the same person multiple times), the random effect isn’t necessary at all.
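As a rough sketch of that fix, the call from earlier in the thread could be rerun without random_effects. This assumes ps_tmp is the saliva/baseline subset built above and that "Slow" is one of the values in Inflammation_response. Setting the reference through factor levels instead of the reference argument is a choice made for this example, and the output folder name is a placeholder.

```r
library(maaslin3)
library(phyloseq)
library(magrittr)  # provides %>%

# ps_tmp: the saliva/baseline subset created earlier in the thread
meta <- ps_tmp %>% sample_data() %>% data.frame()

# Make "Slow" the reference group via factor levels
# (assumes "Slow" is one of the values in this column)
meta$Inflammation_response <- relevel(
  factor(meta$Inflammation_response), ref = "Slow"
)

fit_baseline <- maaslin3(
  input_data = ps_tmp %>% otu_table() %>% t() %>% data.frame(),
  input_metadata = meta,
  output = "maaslin3/tax_saliva_baseline",  # placeholder output folder
  fixed_effects = c("Inflammation_response"),
  # no random_effects: there is only one sample per subject in this subset
  normalization = "NONE",
  transform = "LOG"
)
```

The same pattern would apply to the other timepoint and sample-type subsets, since each of them also holds one sample per subject.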

More generally, in this case I’d fit 2 models: one for saliva and one for plaque. For each, I’d use the formula ~ Inflammation_response + Timepoint + (1|Subject). This will give a coefficient for each response group relative to the reference group, as well as coefficients for the change from baseline to week 2 and from baseline to week 4. You could maybe also use the interaction term Inflammation_response:Timepoint if you were interested in how taxa associated with the change over time differed depending on the inflammation response.
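A minimal sketch of the saliva model might look like the following. It assumes the timepoint column is the one called Period in the code earlier in the thread, that its baseline value is "Baseline", and that "Slow" is a value of Inflammation_response. The output folder name is a placeholder. The plaque model would be the same with the sample-type filter changed to whatever label the plaque samples carry.

```r
library(maaslin3)
library(phyloseq)
library(magrittr)  # provides %>%

# Keep all three timepoints for one sample type
ps_saliva <- ps_up %>%
  subset_samples(Sample_Type == "Saliva") %>%
  subset_taxa(Class != "UNCLASSIFIED") %>%
  transform_sample_counts(function(x) x / sum(x))

meta <- ps_saliva %>% sample_data() %>% data.frame()

# Reference levels set through factor ordering (assumed level names)
meta$Inflammation_response <- relevel(
  factor(meta$Inflammation_response), ref = "Slow"
)
meta$Period <- relevel(factor(meta$Period), ref = "Baseline")

fit_saliva <- maaslin3(
  input_data = ps_saliva %>% otu_table() %>% t() %>% data.frame(),
  input_metadata = meta,
  output = "maaslin3/tax_saliva_all_timepoints",  # placeholder output folder
  # Period stands in for Timepoint in the formula above
  formula = "~ Inflammation_response + Period + (1|Subject)",
  normalization = "NONE",
  transform = "LOG"
)
```

Here the random intercept is usable because each subject contributes up to three saliva samples. To look at the interaction, the formula string could be changed to "~ Inflammation_response * Period + (1|Subject)", which in standard R formula syntax expands to both main effects plus Inflammation_response:Period.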

Will