Different results running GLM and Maaslin2 using same methods/transformations

rcuadrat · November 23, 2020, 7:05pm

Hi,
I’m running Maaslin2 for my crossectional study using CLR normalization, filtering prevalence by 0.2, and using confounders and batch/run_ID as a fixed effect in the LM method. I have nice results and so on, but when I try to reproduce the same analysis out of Maaslin2, the results are in the same direction but p-values are less optimistic (and of course, also different coefficients and std. errors). I looked at Maaslin2 code and tried to follow the same methods, filtering by prevalence, and then doing the CLR transformation, and using the same formula generated by maaslin2 and even the same R packages/methods. Nevertheless, the results are different, quite a bit. So I’m afraid I’m missing something. Also, the results of my “out of maaslin2” analysis are quite more similar to ANCON2 for example. So my question is: am I missing some step maaslin2 is taking, for example, dealing with zeros even using the standard LM method?
Thanks in advance for any help on this,
Rafael

himel.mallick · November 30, 2020, 11:23pm

Hi @rcuadrat - thanks for flagging. I am not 100% sure what might be different in your out of MaAsLin 2 analysis steps but it looks like it would be easy to debug with a side-by-side comparison if you are willing to share a snippet of your code.

rcuadrat · December 1, 2020, 3:19pm

Hi,
first Im importing my data, and filtering:

library(tidyverse)
library(“phyloseq”)
ps ← readRDS(“myRDS.rds”)
GP = prune_samples(sample_sums(ps)>=1000, ps)
GP = filter_taxa(GP, function(x) sum(x > 1) > (0.2*length(x)), TRUE)

After, Im doing the CLR transformation on the already filtered-by-prevalence (20%) object:

ps_clr ← microbiome::transform(GP, “clr”)

Then I generate a data frame for the model merging with metadata, etc

otu<-t(data.frame(otu_table(ps_clr)))

ASVs<-colnames(otu)

df_for_analysis<-merge(otu,metadata,by=0)
df_for_analysis <-df_for_analysis %>% remove_rownames %>% column_to_rownames(var=“Row.names”)

So I wanted first to run out of maaslin2 because I wanted to have my outcome on Y, so I did different of maaslin2, a 2 step approach: first I ajusted my CLR transformed OTUs by batch and RunID and took the residuals:

allModels = lapply(ASVs, function(x){
lm(formula= paste0(“", x, " ~ batch + Run_ID”),
data= df_for_analysis ,na.action = na.omit)

})

#Name the list of models to the column name
names(allModels) = ASVs

Recuperate the columns of interest from the original dataframe

allResiduals = df_for_analysis %>% select(ASVs)

#Replace all values with residuals (ignore NA)
allResiduals[,-1] = sapply(2:ncol(allResiduals), function(x){
residuals = allResiduals[,x]
residuals[!is.na(residuals)] = allModels[[x-1]]$residuals
residuals
})

allResiduals<-scale(allResiduals)

With the residuals, I ran the model for each OTU/ASV:

#Loop over them and create model for each
allModels_clr = lapply(ASVs, function(x){
glm(formula= paste0(“PWV_cf ~”,“", x,"”, “+ age + SEX + MAP + bmi”),
data= df_for_analysis %>% mutate_at(c(“age”, “MAP”,“bmi”), ~(scale(.) %>% as.vector)) ,family = ‘gaussian’)

})

#Name the list of models to the column name
names(allModels_clr) = ASVs
res<-data.frame(estimate=sapply(allModels_clr,function(x){tidy(x)$estimate[[2]]}),
std.error=sapply(allModels_clr,function(x){tidy(x)$std.error[[2]]}),
p.value=sapply(allModels_clr,function(x){tidy(x)$p.value[[2]]}),
statistic=sapply(allModels_clr,function(x){tidy(x)$statistic[[2]]})
)
res$adj <-lapply(res,function(x) {p.adjust(as.vector(x), method = “BH”)})$p.value

I got very different results, so I check weather those differences were not due to the different modeling, so I did again the models but taking directly the CLR transformed ASVs:

#Loop over them and create model for each
allModels_clr = lapply(ASVs, function(x){
glm(formula= paste0(“", x,"”,“~ PWV_cf + age + SEX + MAP + bmi + batch + Run_ID”),
data= df_for_analysis %>% mutate_at(c(“age”, “MAP”,“bmi”), ~(scale(.) %>% as.vector)) ,family = ‘gaussian’)

})

And I have very similar results with my previous modeling.

On Maaslin2 I ran like this:

fit_data_stif_MAP = Maaslin2(
input_data = data.frame(otu_table(ps)),
input_metadata = metadata,
output = “stif_clr_cf_MAP”,
fixed_effects = c(“PWV_cf”,“age”,“SEX”,“batch”,“Run_ID”,“MAP”,“bmi”),
min_prevalence = 0.2,
normalization = “CLR”,
transform = “NONE”
)
I know that the method used to CLR transform in Maaslin is not the same I used, but I did not expect to lead to a difference of one order of magnitude on the p-values and estimates/coefficients.

Anything I did is fundamentally different than what Maaslin2 is doing with the options I used?
Thanks for your support,
Rafael

himel.mallick · December 1, 2020, 5:05pm

Hi @rcuadrat - thanks for sharing the detailed code. I am not 100% sure both CLR transformations are doing the exact same thing. Would it be possible to do a quick check to ensure that these two CLR-transformed matrices are identical? Thanks!

rcuadrat · December 17, 2020, 12:57pm

Hi, sorry for the late response.
I don’t know how to get the CLR transformed matrix from Maaslin2. So I could not compare.
I will try again to use the same method/function of Maaslin2 to CLR transform in my out-of-maaslin2 model.
I just wanted to confirm one thing: if I understood the code correctly, maaslin2 is first filtering the OTUs by prevalence, and only after CLR transforming, right? Because of course, if it is doing the CLR transformed before the filtering, results would diverge, the geometric mean would be different.
Thanks for all your support,
Rafael

rcuadrat · July 12, 2022, 11:33am

Hi, I’m back here after sometime.
I still wonder if Maaslin2 is doing the filtering (removing ASVs/OTUs) before or after CLR transformation. Can anyone clarify this?
Thank you in advance!

couldbewoese · December 12, 2024, 6:30pm

Revisiting this forum because I similarly wanted to compare clr methods (microbiome::transform, zCompositions, and the Maaslin2 built-in CLRnorm).

Were you able to find a way to access the Maaslin2-processed counts table @rcuadrat ?

Thanks!

Topic		Replies	Views
Why do coefs differ from raw glm/lmer, and why do they depend on number of input features? MaAsLin	9	2054	May 31, 2020
Using Maaslin2 with centered log ratio transformation MaAsLin	2	1356	July 12, 2021
MaAsLin2 CLR transformation differing results MaAsLin	1	93	December 18, 2024
Recommanded parameters when using Maaslin2 for Metagenomics data ranging from 0 to 1 MaAsLin	0	341	May 8, 2022
MaAsLin2 working with TSS but having factor issue with CLR MaAsLin	1	597	October 19, 2022

Different results running GLM and Maaslin2 using same methods/transformations

Recuperate the columns of interest from the original dataframe

Related topics