Maaslin2 handling of covariates

Hi,

First, Maaslin2 is awesome. Thank you for this wonderful tool!

I am running an analysis with one predictor of interest and 3 categorical covariates. I’ve run maaslin2 (lm analysis method, min prevalence and abundance thresholds to filter out rare features), and the output shows pairwise comparisons for the covariates as well as the predictor of interest. Is there a way to make maaslin2 only perform comparisons for the predictor of interest and not the covariates (to reduce the number of comparisons performed) while still including covariates in the input so that they can be controlled for?

Thank you!
Fran

Hi @fquerdasi - one easy way to do that is as follows:

  1. Let’s say you have saved the MaAsLin 2 results in R as follows:
fit_data <- Maaslin2(input_data, input_metadata, ...) # Your MaAsLin 2 run
maaslin2_all_results <- fit_data$results # Save results table
  2. Assuming that your primary predictor of interest is named MAIN, you can simply subset the results table as:
library(tidyverse)
maaslin2_results <- maaslin2_all_results %>% filter(metadata == 'MAIN') # Discard covariate associations
  3. Once you have done that, you can simply re-calculate the q-values as follows:
maaslin2_results$qval <- p.adjust(maaslin2_results$pval, method = 'BH') # FDR correction using 'BH'
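
If you want to try these three steps without a MaAsLin 2 run at hand, here is a minimal base-R sketch on a made-up results table. The column names `feature`, `metadata`, `value`, `pval` and `qval` follow the MaAsLin 2 results table; the feature names, the covariate name `COV1`, the extra column `qval_main` and all p-values are invented for illustration. It also assumes the original q-values were computed over every row of the table.

```r
# Toy stand-in for fit_data$results (all values are invented)
maaslin2_all_results <- data.frame(
  feature  = rep(c("sp1", "sp2", "sp3", "sp4"), times = 2),
  metadata = rep(c("MAIN", "COV1"), each = 4),
  value    = rep(c("case", "yes"), each = 4),
  pval     = c(0.001, 0.02, 0.04, 0.60, 0.30, 0.50, 0.70, 0.90),
  stringsAsFactors = FALSE
)

# Assumption: q-values in the full table are a BH correction over all 8 tests
maaslin2_all_results$qval <- p.adjust(maaslin2_all_results$pval, method = "BH")

# Keep only the predictor of interest, then re-adjust over those 4 tests
maaslin2_results <- maaslin2_all_results[maaslin2_all_results$metadata == "MAIN", ]
maaslin2_results$qval_main <- p.adjust(maaslin2_results$pval, method = "BH")

# Compare the original and the re-calculated q-values side by side
print(maaslin2_results[, c("feature", "pval", "qval", "qval_main")])
```

In this toy table the re-calculated q-values come out smaller because the covariate rows only carry large p-values; with real data the direction of the change depends on the covariate p-values.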

Hope this helps. Let me know if anything is unclear.

Hi @himel.mallick,

Thank you for the detailed explanation! This method works great.

Best,
Fran

Hello @himel.mallick ,

Thanks a lot for your nice explanation in the other thread.

Regarding q-values and multiple comparisons for my covariates (sequencing_depth, treatment) and my variable of interest (phenotype), I also followed the steps outlined above.

I am wondering about the results: in the original output files generated by MaAsLin 2, a total of 88 associations of microbes with the phenotype are found, of which 78 are significant (q-value threshold = 0.25, which is the MaAsLin 2 default). I am not a statistician, but I would guess this is quite a high number of significant associations?

After recalculating the q-values only for my variable of interest, 81 associations would be significant at the same q-value threshold of 0.25.

Is there any rule of thumb for choosing an appropriate q-value threshold?

To explain the case I am working on: skin microbiome, 65 samples, 490 different species in total; I used min_prevalence = 0.25 and min_abundance = 0.0001.

Hi @plicht - an appropriate q-value threshold is highly context-dependent, and I’m not aware of a way to estimate an optimal threshold on a per-dataset basis. The usual recommendation is to combine the effect size estimate (i.e., strong vs. weak association), the data distribution, and domain knowledge before calling a 'statistically significant' association 'biologically relevant'.

As @Kelsey_Thompson mentioned elsewhere in a different context, this can be done, for example, by checking the box or scatter plots to make sure your results hold up and are not driven by a few potential outliers. You can also sort your results by effect size instead of statistical significance (or by a combination of the two) and pinpoint the most relevant results using a meaningful effect size threshold, possibly based on prior domain knowledge.
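
As a rough illustration of sorting by effect size and combining it with the q-value, here is a small base-R sketch on a made-up table. The `coef` and `qval` columns follow the MaAsLin 2 results table; the feature names, the values, and in particular `coef_threshold` are invented, so treat the cutoff as a placeholder to be replaced by something that makes sense for your data.

```r
# Toy stand-in for the MAIN-only results table (all values are invented)
maaslin2_results <- data.frame(
  feature = c("sp1", "sp2", "sp3", "sp4", "sp5"),
  coef    = c(0.05, -1.80, 0.90, -0.10, 2.40),
  qval    = c(0.01, 0.03, 0.20, 0.24, 0.60),
  stringsAsFactors = FALSE
)

q_threshold    <- 0.25  # the MaAsLin 2 default mentioned in this thread
coef_threshold <- 0.5   # hypothetical cutoff; choose it from domain knowledge

# Rank by absolute effect size, largest first
ranked <- maaslin2_results[order(-abs(maaslin2_results$coef)), ]

# Keep associations that pass both the q-value and the effect-size cutoff
shortlist <- ranked[ranked$qval <= q_threshold &
                    abs(ranked$coef) >= coef_threshold, ]
print(shortlist)
```

Here `sp1` has the smallest q-value but a negligible coefficient, so it drops out of the shortlist, while `sp5` has the largest coefficient but fails the q-value cutoff. The shortlisted features are still worth checking in the box or scatter plots.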

Hope this helps!

Hi @himel.mallick, I was going through this topic and the solution that you have suggested.

I am not sure I understood correctly. Does this approach of subsetting the results and recalculating the q-values apply only to longitudinal datasets (e.g. effect of intervention vs. control over time + environmental covariates), or also to cross-sectional datasets (differences in the microbiota between health and disease + environmental covariates)? I don’t know if I explained my doubt clearly. Many thanks

This is a great solution!
If I extend this solution to multiple group comparisons within the variable MAIN, should I filter out each comparison and recalculate the q-values separately, like this?
[E.g. if the main variable has 3 groups A, B, C and the reference is A, the comparisons are B vs A and C vs A. Should I filter B vs A from the results and recalculate the q-values, and so on?]
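
To make the two options concrete, here is a base-R sketch of what each would look like. This is not a recommendation for either one, since that choice is exactly what I am asking about. It assumes the `value` column of the results table holds the non-reference level (B or C) for each row; the table itself, the covariate `COV1`, and the column names `qval_pooled` and `qval_by_contrast` are made up for illustration.

```r
# Toy results for a 3-level MAIN with reference A (all values are invented)
maaslin2_all_results <- data.frame(
  feature  = rep(c("sp1", "sp2", "sp3"), times = 3),
  metadata = rep(c("MAIN", "MAIN", "COV1"), each = 3),
  value    = rep(c("B", "C", "yes"), each = 3),
  pval     = c(0.001, 0.03, 0.50, 0.01, 0.04, 0.80, 0.20, 0.60, 0.90),
  stringsAsFactors = FALSE
)

# Filtering on metadata keeps both contrasts (B vs A and C vs A)
main <- maaslin2_all_results[maaslin2_all_results$metadata == "MAIN", ]

# Option 1: one BH correction across both contrasts (6 tests)
main$qval_pooled <- p.adjust(main$pval, method = "BH")

# Option 2: a separate BH correction within each contrast (3 tests each)
main$qval_by_contrast <- ave(main$pval, main$value,
                             FUN = function(p) p.adjust(p, method = "BH"))
print(main)
```

Option 1 is what the `filter(metadata == 'MAIN')` step above already does when MAIN has more than two levels; option 2 is the per-comparison variant I describe in the brackets.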

Thank you