Formula to test multiple variables

Hi, I’m new to MaAsLin2 and am looking for some advice on setting up the formula for assessing multiple variables and also ensuring I’m reading my results correctly.

I’m working with normalized (relative abundance) microbiome data and a series of environmental variables related to water chemistry. I’m looking to see if any taxa are significantly correlated with the net change (end minus start of experiment concentration) of any of the chemistry variables. There are two treatments (Control and Sponge) and 4 independent samples of each treatment taken from the same time point (so not longitudinal).

I’ve started by assessing each variable in an independent MaAsLin2 test because, although they’re taken from the same water sample, I don’t know that they can be considered dependent; if you have other thoughts I would love to hear them. So the chemistry variable net change is a fixed effect (some are continuous and others only have 2 discrete values, so MaAsLin2 considers them categorical even though they are numerical), and I’ve also made treatment a fixed effect with ‘Control’ as the reference value. Does this seem reasonable?

Anthranilate_fit_data = Maaslin2(input_data = input_data,
                    input_metadata = df_input_metadata,
                    min_prevalence = 0,
                    normalization  = "NONE",  # data are already relative abundances
                    transform      = "NONE",  # documented options are upper case
                    analysis_method = "LM",
                    output         = "anthranilate_netchange_Output",
                    fixed_effects  = c("anthranilatenetchange", "Treatment"),
                    reference      = c("Treatment,Control"),  # "variable,reference" with no space
                    standardize    = FALSE,
                    correction     = "BH",
                    max_significance = 0.25,  # 0.25 is the package default
                    plot_heatmap   = TRUE)    # full name; `heatmap` would partially match heatmap_first_n

I’m then obtaining output (abbreviated version) that looks like this:

feature	metadata	value	coef	stderr	N	N.not.0	pval	qval
ASV20	Treatment	SpongeExometabolome	19.7456116	2.255907585	8	5	0.000322598	0.034840569
ASV42	anthranilatenetchange	anthranilatenetchange	233502.8118	31318.6195	8	4	0.000684843	0.036981521

Am I correct in interpreting this as ASV20 being significantly positively associated with treatment, which in this case means it’s more abundant in the sponge treatment? And that ASV42 is significantly associated with anthranilate net change, in this case also positively, based on the coef?

Lastly, another question regarding interpretation of these results. When I use this same code, but leave out treatment as a fixed effect, there are always many more significant ASVs, but my thinking was that I need to account for group effects (i.e., treatment). So in the above results ASV42 is correlated with Anthranilate net change AFTER accounting for the effect of treatment? Is that accurate?


Hi Alicia,

Thanks for using our software!

To answer your questions:

If you want Maaslin2 to treat a variable as numeric, you need to make sure that it’s of that type in your R code.
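As a minimal sketch of how that check might look in R: the data frame name `df_input_metadata` and the column `anthranilatenetchange` are taken from the call above, but the values below are made up for illustration, and the column is deliberately stored as a factor to show the conversion.

```r
# Toy metadata standing in for df_input_metadata (values are illustrative only).
df_input_metadata <- data.frame(
  Treatment = rep(c("Control", "Sponge"), each = 4),
  anthranilatenetchange = c("0.0", "0.0", "0.0", "0.0", "1.5", "1.5", "1.5", "1.5"),
  stringsAsFactors = TRUE
)

# Inspect the class of every metadata column.
col_classes <- sapply(df_input_metadata, class)
print(col_classes)

# Convert to numeric. Going through as.character() first matters for factors:
# as.numeric() on a factor returns the internal level codes, not the labels.
df_input_metadata$anthranilatenetchange <-
  as.numeric(as.character(df_input_metadata$anthranilatenetchange))

# Confirm the column is now numeric before passing the data frame to Maaslin2.
is_num <- is.numeric(df_input_metadata$anthranilatenetchange)
print(is_num)
```

If the column is already numeric, the conversion leaves the values unchanged, so it is safe to run either way.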

Your interpretation of the resulting Maaslin2 table is correct!

As you mentioned in your question, and based on my quick understanding of your experiment, it is important to include treatment as a fixed effect so that it is controlled for when examining the other variables you are interested in. Your interpretation is correct: the association is significant after accounting for the effect of treatment on your samples.

Hope that helps,
Jacob Nearing

Hi @nearinj,

Thanks for your quick response. As far as I can tell, R is reading the variables as numeric: when I look at the data frame, they are not listed as character. But since there are only two values across the treatments, MaAsLin2 seems to be reading them as categorical rather than continuous and is making box plots instead of line plots. Is that normal?

Great! I’m glad the interpretation of the results was correct, and thanks for the quick reassurance on the formula. I thought I was doing that correctly, but wanted to be sure.


Hello @Alicia,

If you only have two numeric values, MaAsLin2 will produce a boxplot for those values. The coefficients of the models fitted for those variables should, however, still take their numeric nature into account.
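To illustrate what that means for the coefficient, here is a small base R sketch using `lm()` directly. It is not MaAsLin2's internal code, and the data are invented. With a two-valued predictor, the numeric model reports a slope per unit of the variable, while a categorical coding of the same variable reports the difference between the two groups.

```r
# Invented example: a predictor with only two distinct numeric values.
x <- c(0, 0, 0, 0, 2, 2, 2, 2)
y <- c(1, 2, 3, 2, 5, 6, 7, 6)

# Predictor treated as numeric: the coefficient is a slope (change in y per unit of x).
fit_numeric <- lm(y ~ x)
slope <- unname(coef(fit_numeric)["x"])

# Same predictor treated as categorical: the coefficient is the difference in group means.
x_factor <- factor(x)
fit_factor <- lm(y ~ x_factor)
group_diff <- unname(coef(fit_factor)[2])

# With two values the fits are equivalent; the slope is the group difference
# divided by the distance between the two predictor values.
value_gap <- diff(range(x))
print(c(slope = slope, group_diff = group_diff, value_gap = value_gap))
```

So the boxplot is only a plotting choice; the size of the reported coefficient still depends on the numeric scale of the variable.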