Different results with random effect from Maaslin2 vs Maaslin3

We seem to get very different results when using a random effect in Maaslin2 vs Maaslin3. In Maaslin2, adding it had absolutely no effect on the results. In Maaslin3, it made most of the differences we had seen for the fixed effects disappear - not just a loss of significance for some taxa, but essentially no differences at all on the Maaslin3 summary plot (compared to what we see without the random effect). What might be the reason? Please see the code for both below - this is our first time using Maaslin, so we might be doing something wrong.

In this data, we analyze piglet microbiotas and try to use their dams (sows) as a random effect.

ps_analy$exp_group <- factor(ps_analy$exp_group, levels = c('C', 'P', 'T'))
ps_analy$sex <- factor(ps_analy$sex, levels = c('M', 'F'))
ps_analy$sow <- as.factor(ps_analy$sow)

mas_1 <- Maaslin2(input_data = data.frame(otu_table(ps.taxa.rel)),
                  input_metadata = data.frame(sample_data(ps_analy)),
                  output = "maaslin2_piglets",
                  min_abundance = 0.0001,
                  min_prevalence = 0.1,
                  normalization = "TSS",
                  transform = "LOG",
                  analysis_method = "LM",
                  max_significance = 0.1,
                  fixed_effects = c("exp_group","sex","reads"),
                  random_effects = c("ps_analy$sow"),
                  reference = c("exp_group,C"),
                  correction = "BH",
                  standardize = FALSE,
                  cores = 1)

fit_out <- maaslin3(input_data = data.frame(otu_table(ps.taxa.rel)),
                    input_metadata = data.frame(sample_data(ps_analy)),
                    output = 'maaslin3_piglets',
                    min_abundance = 0.0001,
                    min_prevalence = 0.1,
                    formula = '~exp_group+sex+reads+(1|sow)',
                    normalization = 'TSS',
                    transform = 'LOG',
                    augment = TRUE,
                    standardize = TRUE,
                    max_significance = 0.1,
                    median_comparison_abundance = TRUE,
                    median_comparison_prevalence = FALSE,
                    max_pngs = 100,
                    cores = 1,
                    save_models = TRUE)

Hi!

In our evaluations of MaAsLin 3 vs. MaAsLin 2, we’ve found that MaAsLin 3 often improves precision (the proportion of significant associations that are correct), particularly with random-effects models. However, this means that not everything that was called significant before will still be called significant, so part of the effect might be attributable to this.

Besides that, a few things to look at:

  1. We recommend running MaAsLin 3 without abundance and prevalence filtering, since the prevalence (zero vs. non-zero) associations can be most noticeable for low-prevalence features. This might have an effect - see the sketch after this list.
  2. Would you be able to provide the diagnostic plots from MaAsLin 2 or MaAsLin 3 for the associations that were significant but become insignificant? Also, if you could provide a snippet of the results for some of those associations, that might help us diagnose the problem.
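For the first point, something like this (an untested sketch based on your call - everything other than the filters and the example output folder name kept as you had it) would drop the filtering:

fit_unfilt <- maaslin3(input_data = data.frame(otu_table(ps.taxa.rel)),
                       input_metadata = data.frame(sample_data(ps_analy)),
                       output = 'maaslin3_piglets_nofilter',
                       # no abundance/prevalence filtering, so low-prevalence
                       # features are kept for the prevalence (zero vs. non-zero) model
                       min_abundance = 0,
                       min_prevalence = 0,
                       formula = '~exp_group+sex+reads+(1|sow)',
                       normalization = 'TSS',
                       transform = 'LOG',
                       max_significance = 0.1,
                       cores = 1)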

Will

Thanks Will! By diagnostic plots, do you mean the association plots for individual bacterial taxa? I’ll need to re-run the analyses with a higher (more lenient) q-value threshold, since the associations disappear completely with the random effect, so right now I don’t get the plots to compare (in MaAsLin 3).

I realized one potential problem in our analysis: there were only 4-6 piglets per sow (i.e., per level of the categorical random-effect variable). The MaAsLin 3 manual recommends at least 10 samples per continuous variable AND per level of each categorical variable. If this holds for the random effect as well, then we will unavoidably have too few samples, right?

I still don’t understand why the same random effect doesn’t seem to do ANYTHING in MaAsLin 2. I mean, the p values etc. are identical with and without it. What might be the reason for that?

Yes - for the diagnostic plots, I mean for the individual taxa. The MaAsLin 2 ones would work if you have them, but setting the q-value threshold high would be necessary to get them for MaAsLin 3.

Ideally, we recommend at least 10 samples per fixed effect and at least 5 samples per level of the random effect. In our testing, we found that with only 2-3 samples per level of the random effect, the logistic piece that’s new to MaAsLin 3 can start to give results that look very significant but are actually highly constrained model fits (essentially fitting failures). Still, we recognize that these sample sizes are often hard to acquire, and the models should work with fewer (do check your diagnostic plots though).
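As a quick sanity check (a small sketch assuming your sow column lives in sample_data(ps_analy), as in your code), you can count the samples per level of the random effect:

meta <- data.frame(sample_data(ps_analy))
# number of piglets per sow - ideally each level has at least ~5 samples
sow_counts <- table(meta$sow)
sow_counts
summary(as.integer(sow_counts))  # min/median/max piglets per sow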

When you use the random effects in MaAsLin 2, are you calling them with ps_analy$sow? If you haven’t already, it might be worth creating a stand-alone metadata table and then just putting in the column name as the random effect. It’s possible that something has gone wrong in trying to parse the $ sign.
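Something along these lines (an untested sketch reusing the objects from your post) keeps the metadata as a plain data frame and refers to sow by column name only:

# build a stand-alone metadata data frame once, then refer to columns by name
meta <- data.frame(sample_data(ps_analy))

mas_1 <- Maaslin2(input_data = data.frame(otu_table(ps.taxa.rel)),
                  input_metadata = meta,
                  output = "maaslin2_piglets",
                  min_abundance = 0.0001,
                  min_prevalence = 0.1,
                  normalization = "TSS",
                  transform = "LOG",
                  analysis_method = "LM",
                  max_significance = 0.1,
                  fixed_effects = c("exp_group","sex","reads"),
                  random_effects = c("sow"),  # column name only, no ps_analy$
                  reference = c("exp_group,C"),
                  correction = "BH",
                  standardize = FALSE,
                  cores = 1)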

Thanks again Will!

Removing the $ (by using just the column name) indeed fixed the Maaslin2 code, and now the impact of the random effect is more similar in the two versions.

Question: in our case, the random effect does not generate erroneous significant results but, on the contrary, most of the significant differences between experimental groups disappear. Can this also be because there are too many categorical levels relative to the number of samples? The run also prints several warnings about fitting failures. Or could it be because, in our case, the random effect is linked to the experimental groups (in the sense that each sow belonged to one of the experimental groups, so the two are not independent of each other)?

Glad to hear that fix worked!

Relative to a model without the random effect, a model with the random effect is expected to reduce the significance since the random effect is accounting for correlated measurements. (But this is the right thing to do!) Since the experimental groups are perfectly collinear with the random effect in your case, you’ll need to include the grouping as a random effect rather than e.g. a fixed effect for the model to even fit. (This is the same situation as in longitudinal data when a diagnosis does not change over time.) However, this does reduce power compared to if you had the same sample size and no grouping. More experimental groups with the same total sample size will also tend to reduce power.
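One quick way to see this structure in your own data (a small sketch assuming the same metadata as in your calls) is to cross-tabulate sow against exp_group; each sow should have non-zero counts under exactly one group:

meta <- data.frame(sample_data(ps_analy))
# each row (sow) should appear under only one exp_group column, which is why
# group and sow cannot both be estimated as fixed effects
table(meta$sow, meta$exp_group)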

Thanks again! I have one (?) more question related to the analysis parameters:
I don’t fully understand how to decide whether to set median_comparison_abundance to TRUE or FALSE. The manual says it should be FALSE when “testing for significant relative associations”. In most cases, we are dealing with essentially relative data (relative abundances) because the actual absolute abundances are not really known. So should we set it to FALSE, or does “relative” refer to something else here?

“Relative” refers to the same thing you’re describing - you have that right. In the last few years, there’s been quite a bit of statistical work about trying to infer absolute abundance changes from relative abundance data given a variety of assumptions. Here, if you assume that fewer than half of the features are changing in absolute abundance and you have a large enough sample size and number of features (>100 samples and >200 features should probably be enough), the test against the median is equivalent to a test on the absolute scale. We hope to get a pre-print out soon that should explain this in more detail.

If you don’t want to worry about any of this and just test relative abundances, you can just set median_comparison_abundance to FALSE.
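Concretely, that’s just flipping the one argument in your existing call (sketch, other arguments as before):

fit_out <- maaslin3(input_data = data.frame(otu_table(ps.taxa.rel)),
                    input_metadata = data.frame(sample_data(ps_analy)),
                    output = 'maaslin3_piglets',
                    formula = '~exp_group+sex+reads+(1|sow)',
                    normalization = 'TSS',
                    transform = 'LOG',
                    # test purely relative associations
                    median_comparison_abundance = FALSE,
                    median_comparison_prevalence = FALSE,
                    max_significance = 0.1,
                    cores = 1)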

Thanks for the clear explanation (and sorry for the double posting; I thought it was a bit off this topic). I’ll be looking forward to reading your preprint - thinking about the practical wet-lab side, I’m still not completely sure what “absolute abundance” actually means :slight_smile: But for now, it’s quite sufficient to know that it’s OK to set this parameter to FALSE.