Guide for normalization/transformation of Maaslin2 input

Hi all - I have some taxa counts I want to run through Maaslin2. Is there a guide to best practices for normalizing and transforming my data?

Specifically, should I be inputting the raw counts matrix into Maaslin2, or is it better to do some sort of relative-abundance (TSS) normalization beforehand? Or should I just use the “CSS” normalization option in the Maaslin2 call? Is CSS or CLR recommended? And when should I use a log transformation?

I’d appreciate any sort of information on which options to choose. Thank you!

Here is the code I am currently playing with:

library(Maaslin2)

fit.data <- Maaslin2(input_data = df,                   # raw taxa counts
                     input_metadata = annotations,      # sample metadata
                     fixed_effects = c("condition"),    # the variable you are trying to learn about
                     random_effects = c("age", "sex"),  # variables you are correcting for
                     plot_heatmap = TRUE,
                     plot_scatter = TRUE,
                     cores = 6,
                     output = "./maaslin_test",
                     min_abundance = 0,
                     min_prevalence = 0,
                     min_variance = 0,
                     transform = "NONE",
                     analysis_method = "LM",
                     normalization = "CSS",
                     standardize = TRUE)


Hi @kreigema,

I would suggest taking a look at the Maaslin2 paper, "Multivariable association discovery in population-scale meta-omics studies", where the team compared the performance of a number of different configurations.

In general, if you pass the raw counts into Maaslin2, the defaults of TSS normalization (normalization = "TSS") and log transformation (transform = "LOG") should work well for both taxonomic and functional data.
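
As a rough sketch of that suggestion, the call from the question could be reduced to something like the following. It assumes `df` (raw counts) and `annotations` (sample metadata) are the objects from the original post and that "condition" is a column in the metadata; the output folder name is made up for illustration. Only the arguments relevant to normalization and transformation are shown, and the default values are written out explicitly just to make them visible.

```r
library(Maaslin2)

# `df` and `annotations` are assumed to already exist (taken from the question).
fit.data <- Maaslin2(
  input_data      = df,                    # raw counts, not pre-normalized
  input_metadata  = annotations,
  output          = "./maaslin_defaults",  # hypothetical output folder
  fixed_effects   = c("condition"),
  normalization   = "TSS",                 # package default, spelled out
  transform       = "LOG",                 # package default, spelled out
  analysis_method = "LM"                   # package default, spelled out
)
```

Leaving out `normalization`, `transform`, and `analysis_method` entirely should give the same configuration, since these are the values Maaslin2 falls back to.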

Hope that is helpful,
Cheers,
Jacob Nearing
