MaAsLin2 with non-clinical data and small samples

Aira · May 13, 2025, 11:07am

Hello,

I’m a student with a basic background in statistics and bioinformatics. Currently, I’m working on microbiome and metabolite data for my final project. I would like to ask advice on how to best set up MaAsLin for my analysis.

This is the experimental design:
We collected fecal samples from two donors and performed in vitro fermentation. Each samples received two different treatment. We measure microbiome composition, pathway abundance, and gene families at two time points (baseline and after treatment) and from two colon regions. There are no technical replicates, so we have a total of 16 samples.

I want to analyze the effect of treatment over time on the microbiome composition. We also want to perform correlation analysis between changes in microbial features (including pathway and gene families) and metabolite profiles later.

I would like to ask a very simple question: do you think using MaAsLin2 (or MaAsLin3) makes sense for this kind of paired, small-sample design? If so, what would you recommend using as fixed effects and random effects?

I’ve read some discussions on this forum but am still unsure if this is the right tool or if I should another tool instead, because this is what my supervisor suggest.

Thank you for the help.

WillNickols · May 13, 2025, 12:36pm

Hi,

MaAsLin 3 can definitely help analyze the differences between donors, treatments, time points, and colon regions. 16 samples with 4 covariates might be too few samples to get statistically significant results, but you should at least be able to get a sense of effect sizes and community shifts. I’d recommend the model ~ donor + treatment + time_point + colon_region (all fixed effects) since all your covariates have only 2 levels and you don’t have any clustering variables besides donor. This same model would work for metabolites, gene families, and pathways.

If you want to correlate two high-dimensional datasets (e.g. taxa vs. metabolites), HAllA might be the way to go.

Will

Aira · May 13, 2025, 1:23pm

Hi Will,

Thank you very much for your quick response and clear explanation.

I have one more question: I want to analyze whether the two treatments have different effects on microbial composition after 10 days of fermentation.

Do you think I should include an interaction term for time × treatment in the model? Or, given the small number of samples, would it be better to keep the model simple and avoid complicating the MaAsLin computation?

For context, my variables will be as followe:

time0_treatment1 (reference)
time10_treatment1
time0_treatment2
time10_treatment2

Thank you for your recommendation to use HAIIA, I will check it out.

WillNickols · May 13, 2025, 2:54pm

If you’re interested in different effects of the treatments after 10 days, that’s an interaction question, so you would need the interaction term. Still, given the sample size, I’d doubt there would be significant effects.

Aira · May 13, 2025, 5:19pm

True. As you suggest, I’ll be focused on the size effects and community shifts then.

Thank you for your advice. It’s really helpful!

Aira · May 14, 2025, 11:09am

Hi Will,

I have a couple of follow-up questions and would really appreciate your input.

Could you advise on how to set only the first variable as the reference level in MaAsLin3?

Currently, I’m doing the following to set reference levels:
metadata$time ← factor(metadata$time, levels = c(“0”, “10”))
metadata$treatment ← factor(metadata$treatment, levels = c(“treatment1”, “treatment2”))

However, when I include the formula time*treatment in the MaAsLin3 model, the results table only gives a value for “10_treatment2”, and not for “0_treatment1” or “10_treatment1”.
Should I create a new combined column for time + treatment in the metadata (as in MaAsLin2) to get full results? Or is there a more straightforward way to resolve it?

Also, in your first answer, you specifically mentioned that MaAsLin3 is definitely suitable for my experimental design. But what do you think about MaAsLin2—would it also be appropriate? I’ve been using MaAsLin2 for a while and am still finding MaAsLin3 a bit challenging to set up. So I’m wondering if it’s okay to continue with MaAsLin2 for this analysis, given our study design.

Thank you very much for your time. I really appreciate your help.

WillNickols · May 14, 2025, 11:58am

Regarding the reference level, you could do the following:

metadata$time_treatment <- paste0(metadata$time, "_", metadata$treatment)

(you should get 4 levels out) and then set the reference level accordingly.

MaAsLin 2 could also be used for any of this.

Will

Aira · May 14, 2025, 3:27pm

Hi Will,

Thank you very much for your input.
I’m now able to generate results for each variable.

I still have some follow-up questions:

How can I test for individual main effects if I manually add the interaction terms as you suggested?
I tried running both of these models:

Model 1 ~ time_treatment + time + treatment + colon + donor
Model 2 ~ time_treatment + colon + donor

But they yield the same results. I don’t see any output for the individual main effects of time or treatment, even when I include them in the full model.

I tried to run both maaslin2 and maaslin3 with same set up of normalisation = “TSS” and transformation = “LOG” . However, I get a lot of warning messages in MaAsLin3 similar to this:

WARNING::Fitting problem for feature 334 returning NA

But when I run the maaslin2, I don’t get any warning messages at all. Is it normal?

Lastly, may I know how does MaAsLin3 handle zero values when normalization = "NONE" and transformation = “LOG”?
Should I add a pseudocount manually before running the analysis, or will MaAsLin3 automatically handle or skip zeros in log transformation?

Many thanks for your help.

WillNickols · May 15, 2025, 3:14am

Because time + treatment are fully collinear with time_treatment (i.e., knowing time_treatment determines time and treatment), R will automatically drop the redundant columns when fitting the linear model. The “main effects” of time and treatment would be the time[baseline]_treatment[non-baseline] and time[non-baseline]_treatment[baseline].
For MaAsLin 2, check that you’re actually getting no errors in the log since they’re not displayed in the results table. Assuming you’re actually getting no errors, it’s probably because of how the models function differently. When we split the abundance and prevalence in MaAsLin 3, we get more precise association types, but the abundance models will have fewer data points since they only use non-zeros (possibly resulting in too few data points for the model to fit), and the logistic prevalence models can have a harder time converging. With that said, it’s good to check the features that errored, but usually the errors are from there being very few samples with the feature present or too many terms in the model. In either of these cases, I wouldn’t trust a model that fit without errors too much anyways.
MaAsLin 3 automatically splits the zeros from the non-zeros and uses the non-zeros in the abundance model and the zeros vs. non-zeros in the prevalence model.

Aira · May 21, 2025, 11:59am

Hi Will,

Thank you very much for your clear explanation.
it’s been really helpful for moving forward with my analysis.

I have another follow-up question regarding the interpretation of MaAsLin2 results.

In the end, I tested the effects of time and treatment independently. Here’s the script I used:

maaslin2_species_pc ← Maaslin2(
input_data = species_pc_matrix,
input_metadata = meta_pc,
output = “species_pc_main”,
min_prevalence = 0,
normalization = “NONE”,
transform = “LOG”,
fixed_effects = c(“time”, “treatment”),
reference = c(“time,0”, "treatment,diet1),
analysis_method = “LM”,
max_significance = 0.05,
standardize = FALSE,
correction = “BH”,
save_models = TRUE
)

I then subset the results by time and treatment to create separate volcano plots. However, I’m still unsure how to interpret the results accurately.

For example, here’s a snapshot of my results:

Does this mean that Klebsiella and Enterobacter were enriched at time point 10 compared to time point 0, regardless of treatment? Or does the result also take into account the diet 2 vs. diet 1 comparison specified in the MaAsLin2 setup—meaning that Klebsiella and Enterobacter were relatively enriched at time 10 compared to time 0 specifically within the diet 2 group?

I’m sorry, I’m having a bit of trouble phrasing the question clearly. Please let me know if anything is unclear.

Thank you very much!

WillNickols · May 21, 2025, 12:30pm

Based on the model you’ve used with fixed effects for time and treatment but no interaction between the two, you’ve assumed that the change with time is the same for the two treatment groups and that the change between treated and non-treated is the same at the two time points. (It’s not “wrong” to make this assumption - I’m just making sure we’re clear on the model.) If you wanted to look for features that were enriched at a time specifically in one group, you would need an interaction term.

Aira · May 27, 2025, 9:28am

Hi Will,

Thank you for your explanation, that makes things clearer!
We’re indeed assuming that the two treatments do not have significantly different effects on microbiome composition.

However, I’m still a bit unsure about one thing. I want to test only the effect of time, so I ran MaAsLin2 with time as the only fixed effect. But the results are different compared to when I include both time and treatment as fixed effects (without interaction) and then filter the results for time only.

Based on your explanation, I expected the results to be the same in both cases. Would you mind explaining how MaAsLin2 processes the data in these two scenarios?

WillNickols · May 28, 2025, 1:20pm

This is just the difference between controlling for something and not controlling for something. When you only have time as a fixed effect, you’re comparing all the samples at one time to all the samples at the other time. When you have both time and treatment, the time coefficient you get is the difference based on time after controlling for treatment. Even if time and treatment were independent in the way you set up the experiment, if there’s any non-zero correlation (which there almost certainly will be, just by chance), you’ll end up with different coefficients between the models.

Topic		Replies	Views
Paired Samples with Two Groups? MaAsLin	13	1768	February 17, 2023
MaAsLin 2 for longitudinal analysis with different individuals MaAsLin	6	670	October 25, 2023
MaAsLin2 for other type of data MaAsLin	1	538	September 6, 2022
How to interprete Maaslin output? MaAsLin	3	2005	May 23, 2021
Using MaAsLin2 for testing microbiome predictive value MaAsLin	5	799	January 29, 2021

MaAsLin2 with non-clinical data and small samples

Related topics