The bioBakery help forum

Maaslin2 handling of covariates


First, Maaslin2 is awesome. Thank you for this wonderful tool!

I am running an analysis with one predictor of interest and 3 categorical covariates. I’ve run Maaslin2 (lm analysis method, with min prevalence and abundance thresholds to filter out rare features), and the output shows pairwise comparisons for the covariates as well as for the predictor of interest. Is there a way to make Maaslin2 perform comparisons only for the predictor of interest and not for the covariates (to reduce the number of comparisons performed), while still including the covariates in the input so that they can be controlled for?

Thank you!

Hi @fquerdasi - one easy way to do that is as follows:

  1. Let’s say you have saved the MaAsLin 2 results in R as follows:
     fit_data <- Maaslin2(input_data, input_metadata, ...) # Your MaAsLin 2 run
     maaslin2_all_results <- fit_data$results # Save results table
  2. Assuming that your primary predictor of interest is named MAIN, you can subset the results table as follows (filter() and %>% come from dplyr):
     library(dplyr) # Provides %>% and filter()
     maaslin2_results <- maaslin2_all_results %>% filter(metadata == 'MAIN') # Discard covariate associations
  3. Once you have done that, you can re-calculate the q-values over the remaining rows only:
     maaslin2_results$qval <- p.adjust(maaslin2_results$pval, method = 'BH') # FDR correction using 'BH'
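
To see what the subset-and-readjust steps do, here is a minimal, self-contained sketch in base R. The results table is made up for illustration (it is not real MaAsLin 2 output), the covariate name COVAR and the column qval_main are hypothetical, and the sketch assumes the original q-values were computed over all rows of the table.

```r
# Toy stand-in for fit_data$results (hypothetical values, not real MaAsLin 2 output)
maaslin2_all_results <- data.frame(
  feature  = c("sp1", "sp2", "sp3", "sp1", "sp2", "sp3"),
  metadata = c("MAIN", "MAIN", "MAIN", "COVAR", "COVAR", "COVAR"),
  pval     = c(0.001, 0.02, 0.04, 0.5, 0.8, 0.9),
  stringsAsFactors = FALSE
)

# q-values adjusted over all six tests (assumed to mimic the original output)
maaslin2_all_results$qval <- p.adjust(maaslin2_all_results$pval, method = "BH")

# Keep only the predictor of interest, then re-adjust over those rows alone
maaslin2_results <- maaslin2_all_results[maaslin2_all_results$metadata == "MAIN", ]
maaslin2_results$qval_main <- p.adjust(maaslin2_results$pval, method = "BH")

# Compare the original and the re-calculated q-values side by side
print(maaslin2_results[, c("feature", "pval", "qval", "qval_main")])
```

In this toy table the re-calculated q-values come out smaller because fewer tests are being corrected for. That is not guaranteed in general: the direction of the change depends on how the p-values of the discarded rows compare with those of the rows you keep.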

Hope this helps. Let me know if anything is unclear.

Hi @himel.mallick,

Thank you for the detailed explanation! This method works great.


Hello @himel.mallick,

Thanks a lot for your nice explanation in the other thread.

To handle the multiple-comparison correction (q-values) for my covariates (sequencing_depth, treatment) and my variable of interest (phenotype), I also followed the steps outlined above.

I am wondering about the results. In the original output files generated by MaAsLin 2, a total of 88 associations between microbes and the phenotype are reported, of which 78 are significant (q-value threshold = 0.25, which is the MaAsLin 2 default). I am not a statistician, but I would guess this is quite a high number of significant associations?

After recalculating the q-values for my variable of interest only, 81 associations would be significant at the same q-value threshold of 0.25.

Is there any rule of thumb for choosing an appropriate q-value threshold?

For context, here is the case I am working on: skin microbiome, 65 samples, a total of 490 different species; I used min_prevalence = 0.25 and min_abundance = 0.0001.

Hi @plicht - an appropriate q-value threshold is highly context-dependent, and I’m not aware of a way to estimate an optimal threshold on a per-dataset basis. The usual recommendation is to combine the effect size estimate (i.e., strong vs. weak association), the data distribution, and domain knowledge before calling a 'statistically significant' association 'biologically relevant'.

As @Kelsey_Thompson mentioned elsewhere in a different context, one way to do this is to check the box plots or scatter plots to make sure your results hold up and are not driven by a few potential outliers. You can also sort your results by effect size instead of (or in combination with) statistical significance, and pinpoint the most relevant results using a meaningful effect size threshold, ideally informed by prior domain knowledge.
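
Sorting and shortlisting by effect size can be sketched in base R as below. The table is made up for illustration; only the column names coef and qval follow the MaAsLin 2 results table. The cutoff min_abs_coef = 0.5 is a hypothetical placeholder, not a recommended value, and should be replaced by whatever is meaningful in your domain.

```r
# Toy results for the predictor of interest (hypothetical values)
results <- data.frame(
  feature = c("sp1", "sp2", "sp3", "sp4"),
  coef    = c(-2.1, 0.05, 0.9, -0.3),   # effect size estimates
  qval    = c(0.01, 0.001, 0.20, 0.40),
  stringsAsFactors = FALSE
)

qval_threshold <- 0.25  # same threshold as used in the run
min_abs_coef   <- 0.5   # hypothetical effect size cutoff; choose from domain knowledge

# Keep associations that pass both the significance and the effect size cutoff
keep <- results$qval <= qval_threshold & abs(results$coef) >= min_abs_coef
shortlist <- results[keep, ]

# Rank the shortlist by absolute effect size, strongest first
shortlist <- shortlist[order(-abs(shortlist$coef)), ]
print(shortlist)
```

Here sp2 has the smallest q-value but is dropped because its effect is tiny, which is the kind of result that ranking by significance alone would have put first. The shortlisted features are then the ones worth inspecting in the box or scatter plots.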

Hope this helps!