Hi,
I’m running Maaslin2 for my crossectional study using CLR normalization, filtering prevalence by 0.2, and using confounders and batch/run_ID as a fixed effect in the LM method. I have nice results and so on, but when I try to reproduce the same analysis out of Maaslin2, the results are in the same direction but p-values are less optimistic (and of course, also different coefficients and std. errors). I looked at Maaslin2 code and tried to follow the same methods, filtering by prevalence, and then doing the CLR transformation, and using the same formula generated by maaslin2 and even the same R packages/methods. Nevertheless, the results are different, quite a bit. So I’m afraid I’m missing something. Also, the results of my “out of maaslin2” analysis are quite more similar to ANCON2 for example. So my question is: am I missing some step maaslin2 is taking, for example, dealing with zeros even using the standard LM method?
Thanks in advance for any help on this,
Rafael
Hi @rcuadrat - thanks for flagging. I am not 100% sure what might be different in your out of MaAsLin 2
analysis steps but it looks like it would be easy to debug with a side-by-side comparison if you are willing to share a snippet of your code.
Hi,
first Im importing my data, and filtering:
library(tidyverse)
library(“phyloseq”)
ps ← readRDS(“myRDS.rds”)
GP = prune_samples(sample_sums(ps)>=1000, ps)
GP = filter_taxa(GP, function(x) sum(x > 1) > (0.2*length(x)), TRUE)
After, Im doing the CLR transformation on the already filtered-by-prevalence (20%) object:
ps_clr ← microbiome::transform(GP, “clr”)
Then I generate a data frame for the model merging with metadata, etc
otu<-t(data.frame(otu_table(ps_clr)))
ASVs<-colnames(otu)
df_for_analysis<-merge(otu,metadata,by=0)
df_for_analysis <-df_for_analysis %>% remove_rownames %>% column_to_rownames(var=“Row.names”)
So I wanted first to run out of maaslin2 because I wanted to have my outcome on Y, so I did different of maaslin2, a 2 step approach: first I ajusted my CLR transformed OTUs by batch and RunID and took the residuals:
allModels = lapply(ASVs, function(x){
lm(formula= paste0(“", x, "
~ batch + Run_ID”),
data= df_for_analysis ,na.action = na.omit)})
#Name the list of models to the column name
names(allModels) = ASVsRecuperate the columns of interest from the original dataframe
allResiduals = df_for_analysis %>% select(ASVs)
#Replace all values with residuals (ignore NA)
allResiduals[,-1] = sapply(2:ncol(allResiduals), function(x){
residuals = allResiduals[,x]
residuals[!is.na(residuals)] = allModels[[x-1]]$residuals
residuals
})allResiduals<-scale(allResiduals)
With the residuals, I ran the model for each OTU/ASV:
#Loop over them and create model for each
allModels_clr = lapply(ASVs, function(x){
glm(formula= paste0(“PWV_cf ~”,“", x,"
”, “+ age + SEX + MAP + bmi”),
data= df_for_analysis %>% mutate_at(c(“age”, “MAP”,“bmi”), ~(scale(.) %>% as.vector)) ,family = ‘gaussian’)})
#Name the list of models to the column name
names(allModels_clr) = ASVs
res<-data.frame(estimate=sapply(allModels_clr,function(x){tidy(x)$estimate[[2]]}),
std.error=sapply(allModels_clr,function(x){tidy(x)$std.error[[2]]}),
p.value=sapply(allModels_clr,function(x){tidy(x)$p.value[[2]]}),
statistic=sapply(allModels_clr,function(x){tidy(x)$statistic[[2]]})
)
res$adj <-lapply(res,function(x) {p.adjust(as.vector(x), method = “BH”)})$p.value
I got very different results, so I check weather those differences were not due to the different modeling, so I did again the models but taking directly the CLR transformed ASVs:
#Loop over them and create model for each
allModels_clr = lapply(ASVs, function(x){
glm(formula= paste0(“", x,"
”,“~ PWV_cf + age + SEX + MAP + bmi + batch + Run_ID”),
data= df_for_analysis %>% mutate_at(c(“age”, “MAP”,“bmi”), ~(scale(.) %>% as.vector)) ,family = ‘gaussian’)})
And I have very similar results with my previous modeling.
On Maaslin2 I ran like this:
fit_data_stif_MAP = Maaslin2(
input_data = data.frame(otu_table(ps)),
input_metadata = metadata,
output = “stif_clr_cf_MAP”,
fixed_effects = c(“PWV_cf”,“age”,“SEX”,“batch”,“Run_ID”,“MAP”,“bmi”),
min_prevalence = 0.2,
normalization = “CLR”,
transform = “NONE”
)
I know that the method used to CLR transform in Maaslin is not the same I used, but I did not expect to lead to a difference of one order of magnitude on the p-values and estimates/coefficients.
Anything I did is fundamentally different than what Maaslin2 is doing with the options I used?
Thanks for your support,
Rafael
Hi @rcuadrat - thanks for sharing the detailed code. I am not 100% sure both CLR transformations are doing the exact same thing. Would it be possible to do a quick check to ensure that these two CLR-transformed matrices are identical? Thanks!
Hi, sorry for the late response.
I don’t know how to get the CLR transformed matrix from Maaslin2. So I could not compare.
I will try again to use the same method/function of Maaslin2 to CLR transform in my out-of-maaslin2 model.
I just wanted to confirm one thing: if I understood the code correctly, maaslin2 is first filtering the OTUs by prevalence, and only after CLR transforming, right? Because of course, if it is doing the CLR transformed before the filtering, results would diverge, the geometric mean would be different.
Thanks for all your support,
Rafael
Hi, I’m back here after sometime.
I still wonder if Maaslin2 is doing the filtering (removing ASVs/OTUs) before or after CLR transformation. Can anyone clarify this?
Thank you in advance!