Hi,

I’m running Maaslin2 for my crossectional study using CLR normalization, filtering prevalence by 0.2, and using confounders and batch/run_ID as a fixed effect in the LM method. I have nice results and so on, but when I try to reproduce the same analysis out of Maaslin2, the results are in the same direction but p-values are less optimistic (and of course, also different coefficients and std. errors). I looked at Maaslin2 code and tried to follow the same methods, filtering by prevalence, and then doing the CLR transformation, and using the same formula generated by maaslin2 and even the same R packages/methods. Nevertheless, the results are different, quite a bit. So I’m afraid I’m missing something. Also, the results of my “out of maaslin2” analysis are quite more similar to ANCON2 for example. So my question is: am I missing some step maaslin2 is taking, for example, dealing with zeros even using the standard LM method?

Thanks in advance for any help on this,

Rafael

Hi @rcuadrat - thanks for flagging. I am not 100% sure what might be different in your `out of MaAsLin 2`

analysis steps but it looks like it would be easy to debug with a side-by-side comparison if you are willing to share a snippet of your code.

Hi,

first Im importing my data, and filtering:

library(tidyverse)

library(“phyloseq”)

ps <- readRDS(“myRDS.rds”)

GP = prune_samples(sample_sums(ps)>=1000, ps)

GP = filter_taxa(GP, function(x) sum(x > 1) > (0.2*length(x)), TRUE)

After, Im doing the CLR transformation on the already filtered-by-prevalence (20%) object:

ps_clr <- microbiome::transform(GP, “clr”)

Then I generate a data frame for the model merging with metadata, etc

otu<-t(data.frame(otu_table(ps_clr)))

ASVs<-colnames(otu)

df_for_analysis<-merge(otu,metadata,by=0)

df_for_analysis <-df_for_analysis %>% remove_rownames %>% column_to_rownames(var=“Row.names”)

So I wanted first to run out of maaslin2 because I wanted to have my outcome on Y, so I did different of maaslin2, a 2 step approach: first I ajusted my CLR transformed OTUs by batch and RunID and took the residuals:

allModels = lapply(ASVs, function(x){

lm(formula= paste0("`", x, "`

~ batch + Run_ID"),

data= df_for_analysis ,na.action = na.omit)})

#Name the list of models to the column name

names(allModels) = ASVs## Recuperate the columns of interest from the original dataframe

allResiduals = df_for_analysis %>% select(ASVs)

#Replace all values with residuals (ignore NA)

allResiduals[,-1] = sapply(2:ncol(allResiduals), function(x){

residuals = allResiduals[,x]

residuals[!is.na(residuals)] = allModels[[x-1]]$residuals

residuals

})allResiduals<-scale(allResiduals)

With the residuals, I ran the model for each OTU/ASV:

#Loop over them and create model for each

allModels_clr = lapply(ASVs, function(x){

glm(formula= paste0(“PWV_cf ~”,"`", x,"`

", “+ age + SEX + MAP + bmi”),

data= df_for_analysis %>% mutate_at(c(“age”, “MAP”,“bmi”), ~(scale(.) %>% as.vector)) ,family = ‘gaussian’)})

#Name the list of models to the column name

names(allModels_clr) = ASVs

res<-data.frame(estimate=sapply(allModels_clr,function(x){tidy(x)$estimate[[2]]}),

std.error=sapply(allModels_clr,function(x){tidy(x)$std.error[[2]]}),

p.value=sapply(allModels_clr,function(x){tidy(x)$p.value[[2]]}),

statistic=sapply(allModels_clr,function(x){tidy(x)$statistic[[2]]})

)

res$adj <-lapply(res,function(x) {p.adjust(as.vector(x), method = “BH”)})$p.value

I got very different results, so I check weather those differences were not due to the different modeling, so I did again the models but taking directly the CLR transformed ASVs:

#Loop over them and create model for each

allModels_clr = lapply(ASVs, function(x){

glm(formula= paste0("`", x,"`

","~ PWV_cf + age + SEX + MAP + bmi + batch + Run_ID"),

data= df_for_analysis %>% mutate_at(c(“age”, “MAP”,“bmi”), ~(scale(.) %>% as.vector)) ,family = ‘gaussian’)})

And I have very similar results with my previous modeling.

On Maaslin2 I ran like this:

fit_data_stif_MAP = Maaslin2(

input_data = data.frame(otu_table(ps)),

input_metadata = metadata,

output = “stif_clr_cf_MAP”,

fixed_effects = c(“PWV_cf”,“age”,“SEX”,“batch”,“Run_ID”,“MAP”,“bmi”),

min_prevalence = 0.2,

normalization = “CLR”,

transform = “NONE”

)

I know that the method used to CLR transform in Maaslin is not the same I used, but I did not expect to lead to a difference of one order of magnitude on the p-values and estimates/coefficients.

Anything I did is fundamentally different than what Maaslin2 is doing with the options I used?

Thanks for your support,

Rafael

Hi @rcuadrat - thanks for sharing the detailed code. I am not 100% sure both CLR transformations are doing the exact same thing. Would it be possible to do a quick check to ensure that these two CLR-transformed matrices are identical? Thanks!

Hi, sorry for the late response.

I don’t know how to get the CLR transformed matrix from Maaslin2. So I could not compare.

I will try again to use the same method/function of Maaslin2 to CLR transform in my out-of-maaslin2 model.

I just wanted to confirm one thing: if I understood the code correctly, maaslin2 is first filtering the OTUs by prevalence, and only after CLR transforming, right? Because of course, if it is doing the CLR transformed before the filtering, results would diverge, the geometric mean would be different.

Thanks for all your support,

Rafael