Why do coefs differ from raw glm/lmer, and why do they depend on number of input features?

Hi All
If I got it right, Maaslin2 builds a model per each microbial feature, independently of other microbial features (that is also why BH correction can be applied). But I notice that for the SAME bacterial feature, different coefficients are derived, if using a smaller input table with fewer features (subsetting only features, not samples). Taking a simple model as an example, and showing the coeffcieints for Escherichia:

Maaslin2(dat2,key2,min_abundance=0.005,fixed_effects = c(“Diagnosis”),paste0(path,“temp”))
coef for Escherichia: -0.0296

minidat2<-dat2[ ,grep(“Proteobacteria|Clostridia”,colnames(dat2))]

Maaslin2(minidat2,key2,min_abundance=0.005,fixed_effects = c(“Diagnosis”),paste0(path,“temp1”))
coef for Escherichia: -0.041

std. error and p value changes as well. Note that there is no change in the input data for Escherichia itself, only in teh number of features supplied along with it.

And a question perhaps related - all the coefficients reported by Maaslin2 are very different from those reported from glm (in this case) or lmer (for random effects models). Why? Example , again on Escherichia (log10 transformed):

gives: -2.5237 as coeffcient

Thanks !


MaAsLin2 by default normalizes input data to relative abundance before running the regressions (normalization = "TSS"), which would explain what you’re seeing. Could you try setting normalization = "none"and see if results are as expected?


Absolutely right, thank you!
Why are teh coefficients, std errors and p values Masslin outputs so different from wahts outputted by glm/lmer?

I suspect it’s the same reason? When using MaAsLin2 the abundance of that feature is being normalized whereas with glm/lmer it’s the raw values.

I also normalize and log-transform the data before regressing. But now that you mention it, if I use the arcsine sqrt transform, or no transform at all, the values from Maaslin2 are simlar to my regression. It seems as if somehow I am not managing to get MAsslin2 to actually run the log transformation. I will look into it next week more throughly, Thanks for teh quick answers!

OK -I am officially stumped. Whether I run Masslin2 with transform=“LOG”, or with no transform specified (default should be log) - the output I get - coefs, errors, pvals, boxplots - is very similar to what I get if specifying transform=“NONE” ( small differences between these 2 are probably a just due to zero imputation before log transform). Example, using Blautia, just for variety (but its the same for all features):
gives, for Blautia, a coefficient of 0.064
0.064 is the actual difference between Blautia relative abundance in the Healthy group relative to the non-healthy group, and therefore is also the coeffcient for Diagnosis:Healthy when regressing the non-transformed data, with:
summary(glm(Blautia~Diagnosis ,data=key2))

If regressing log data, I would except the coefficent to be the difference in the log-transformed abundances - for log10, in this example, this would be 1.12. The boxplots outputted by MAsslin are also of the raw, not log-transformed , data.

BTW, if specifying transform=“AST”, Maaslin output does look transformed - both in the boxplots, and in the tabular output (coefficients etc. are identical to what you get when transforming the data with arcsin sqrt and then regressing with glm).

Hi -

Could I ask which ver. of Maaslin2 and Biocondcutor you’re using? In the older Bioc3.10 ver., I think log transformation was actually set to log(x + 1). The 1 pseudo count was to avoid taking log of zeros. This makes sense for count data but would leave small relative abundance data barely changed (log(1 + x) ~ x when x is close to zero).

In the new Maaslin2 1.2.0 which got released with Bioc3.11 only recently (https://bioconductor.org/packages/release/bioc/html/Maaslin2.html), I believe the developers changed this behavior to log(x + half minimum non-zero value), which should be closer to the log transformation you’re experimenting with.

Sure. I am running Maaslin2_1.0.0 , installed with 3.10 (and congrats on teh new release;-))
But are you sure about the pseudocount? Firstly, it does not seem a reasonable behaviour, considering a need for input absolute abundances is not specified anywhere - even the tutorial demo files are in relative abundance. Secondly, even if log(x+1) was used for transformation, that should still affect some change in the coefficient estimate and errors - I should not be getting the exact same values as for no transformation, down to the third decimal point.

Hi -
Did you make any progress on this? Sorry the log(x+1) difference is the only reason I can think of. FWIW, you can check your Maaslin2’s log transformation behavior by inspecting Maaslin2:::LOG. I’ll forward this to the developers as well.

Hi Siyuan
Thanks for all your tips :-). Yes , you were right - my version does run log(x+1) as the transformation. And you are right that this does explain what I am seeing - that running Maaslin2 with the default parameters, or with “LOG” transform specified, is practically equal to running it with no transformation at all (I didnt expect it to be exactly teh same, in my previous post - but was forgetting that R by default takes log on base 2, not base 10).
But…it seems to me this should be remedied in all the versions, not just the newest one. Maaslin2 has been used for years, and most people will not upgrade immediatly - especially when they are not aware of the need, and when the new version seems require R-4.0.0. Besides, the new version will perform differently then the old one and people should know why.