I am very new to bioinformatics and statistics in general, but I am trying to understand and analyze gut microbiome associations with MaAsLin2. However, I don’t understand which model to use with my OTU table. I have the table both as transcripts per million (TPM) and as CLR-transformed data. According to the instructions, it seems I can use both LM and CPLM. However, my data has a lot of zeros, so should I use ZINB? Which model to use, and how, has been extremely confusing to me. I read the notes but am still struggling. Also, if anyone could recommend some tutorials on the differences between these modelling techniques, I would be grateful.

To answer your question, there are many different parameters and models that can be used for microbiome data, and depending on the context those models can give different results. However, based on simulations from our group, MaAsLin2’s default settings perform well in terms of specificity and sensitivity on relative abundance data (Multivariable association discovery in population-scale meta-omics studies).

One thing to note is that if your data is already CLR-transformed, you should avoid applying any further normalization or transformation in MaAsLin2, as your data has already undergone this. If you’re looking for comments on the differences between various normalizations etc., you could check out this paper that I previously published: Microbiome differential abundance methods produce different results across 38 datasets | Nature Communications
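
To make the CLR point concrete, here is a minimal sketch of what a centered log-ratio transform does to a small table. It is written in Python with NumPy purely for illustration (MaAsLin2 itself is an R package). The toy counts, the `clr` helper, and the pseudocount of 0.5 are all assumptions made for this example, not anything taken from MaAsLin2 or from your data.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x features table.

    The pseudocount is an illustrative assumption: CLR takes logs,
    so zeros have to be replaced with something positive first.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log abundance
    # (the log of that sample's geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)

# Hypothetical toy table: 3 samples (rows) x 4 taxa (columns), with zeros.
toy = np.array([
    [120, 0, 30, 5],
    [0, 45, 10, 0],
    [60, 20, 0, 15],
])

clr_values = clr(toy)

# Each sample's CLR values are centered on zero, so some are negative.
print(clr_values.round(3))
print(clr_values.sum(axis=1).round(6))
print((clr_values < 0).any())
```

Because CLR values are centered on zero and include negatives, log-transforming or re-normalizing them a second time is not meaningful. As far as I know, the corresponding MaAsLin2 settings are `normalization = "NONE"` and `transform = "NONE"`, paired with `analysis_method = "LM"`, since the count-based models (CPLM, NEGBIN, ZINB) expect non-negative input. Please check this against the MaAsLin2 documentation for your version.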

