Maaslin2 handling of covariates

fquerdasi · July 23, 2021, 5:15pm

Hi,

First, Maaslin2 is awesome. Thank you for this wonderful tool!

I am running an analysis with one predictor of interest and 3 categorical covariates. I’ve run maaslin2 (lm analysis method, min prevalence and abundance thresholds to filter out rare features), and the output shows pairwise comparisons for the covariates as well as the predictor of interest. Is there a way to make maaslin2 only perform comparisons for the predictor of interest and not the covariates (to reduce the number of comparisons performed) while still including covariates in the input so that they can be controlled for?

Thank you!
Fran

himel.mallick · July 25, 2021, 9:06am

Hi @fquerdasi - one easy way to do that is as follows:

Let’s say you have saved the MaAsLin 2 results in R as follows:

fit_data <- Maaslin2(input_data, input_metadata, ...) # Your MaAsLin 2 run
maaslin2_all_results <- fit_data$results # Save results table

Assuming that your primary predictor of interest is named MAIN, you can simply subset the results table as:

library(tidyverse)
maaslin2_results <- maaslin2_all_results %>%
  filter(metadata == 'MAIN') # Discard covariate associations

Once you have done that, you can simply re-calculate the q-values as follows:

maaslin2_results$qval <- p.adjust(maaslin2_results$pval, method = 'BH') # FDR correction using 'BH'
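
For anyone who wants to try the two steps without a full MaAsLin 2 run, here is a minimal sketch on a toy table. The data frame is a made-up stand-in for fit_data$results (only the feature, metadata, and pval columns are mimicked, and the feature names, covariate name, and p-values are invented). It uses base R subsetting in place of filter() so that it runs without tidyverse.

```r
# Toy stand-in for fit_data$results; all values are invented for illustration.
maaslin2_all_results <- data.frame(
  feature  = rep(c("taxon_1", "taxon_2", "taxon_3"), times = 2),
  metadata = rep(c("MAIN", "covariate_1"), each = 3),
  pval     = c(0.001, 0.020, 0.400, 0.300, 0.600, 0.900),
  stringsAsFactors = FALSE
)

# BH correction over all six tests (predictor of interest and covariate together),
# kept in a separate column for comparison.
maaslin2_all_results$qval_all <- p.adjust(maaslin2_all_results$pval, method = "BH")

# Step 1: keep only the rows for the predictor of interest.
maaslin2_results <- maaslin2_all_results[maaslin2_all_results$metadata == "MAIN", ]

# Step 2: re-run BH correction on the remaining p-values only.
maaslin2_results$qval <- p.adjust(maaslin2_results$pval, method = "BH")

print(maaslin2_results)
```

Comparing qval_all with qval shows how the correction changes once the covariate tests are dropped. In general the recomputed q-values can be smaller or larger than the original ones, depending on how the discarded p-values were distributed.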

Hope this helps. Let me know if anything is unclear.

fquerdasi · July 26, 2021, 8:29pm

Hi @himel.mallick,

Thank you for the detailed explanation! This method works great.

Best,
Fran

plicht · January 5, 2022, 5:40pm

Hello @himel.mallick ,

Thanks a lot for your nice explanation in the other thread.

To handle the multiple-comparison correction across my covariates (sequencing_depth, treatment) and my variable of interest (phenotype), I also followed the steps outlined above.

I am wondering about the results: in the original output files generated by MaAsLin2, a total of 88 associations of microbes with the phenotype are found, of which 78 are significant (q-value threshold = 0.25, which is the MaAsLin2 default). I am not a statistician, but I would guess this is quite a high number of significant associations?

After recalculating the q-values for my variable of interest only, 81 associations would be significant at the same q-value threshold of 0.25.

Is there a rule of thumb for choosing an appropriate q-value threshold?

To explain the case I am working on: skin microbiome, 65 samples, 490 different species in total. I used min_prevalence = 0.25 and min_abundance = 0.0001.

himel.mallick · January 6, 2022, 10:11pm

Hi @plicht - an appropriate q-value threshold is highly context-dependent, and I’m not aware of a way to estimate an optimal threshold on a per-dataset basis. The usual recommendation is to use a combination of effect size estimate (i.e., strong vs. weak association), data distribution, and domain knowledge before calling a 'statistically significant' association 'biologically relevant'.

As @Kelsey_Thompson mentioned elsewhere in a different context, this can be done, for example, by checking the box or scatter plots to make sure your results appear to be true, unaffected by a few potential outliers. You can also sort your results by effect size instead of statistical significance or a combination thereof and pinpoint the most relevant results based on a meaningful effect size threshold possibly based on prior domain knowledge.
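
As a rough sketch of the effect-size ranking described above, assuming a results table with the coef and qval columns that MaAsLin 2 reports: the toy values and the min_abs_coef cutoff below are hypothetical and only illustrate the mechanics. A meaningful cutoff has to come from domain knowledge.

```r
# Toy stand-in for a MaAsLin 2 results table; all values are invented.
results <- data.frame(
  feature = c("taxon_1", "taxon_2", "taxon_3", "taxon_4"),
  coef    = c(0.05, -1.20, 0.80, -0.10),
  qval    = c(0.01, 0.03, 0.20, 0.40),
  stringsAsFactors = FALSE
)

max_q        <- 0.25 # significance threshold (the MaAsLin 2 default)
min_abs_coef <- 0.5  # hypothetical effect-size cutoff, chosen only for this example

# Keep associations that pass the q-value threshold.
significant <- results[results$qval <= max_q, ]

# Rank by absolute effect size, largest first, instead of by significance.
ranked <- significant[order(-abs(significant$coef)), ]

# Optionally keep only associations above the effect-size cutoff.
relevant <- ranked[abs(ranked$coef) >= min_abs_coef, ]

print(relevant)
```

Here taxon_1 has the smallest q-value but a very small coefficient, so it drops out once the effect-size cutoff is applied. The remaining candidates are the ones worth checking in the box or scatter plots.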

Hope this helps!

Alessandro_Atzeni · November 2, 2023, 1:41pm

Hi @himel.mallick, I was going through this topic and the solution that you have suggested.

I am not sure I understood correctly. Should this approach of subsetting the results and recalculating the q-values be used only for a longitudinal dataset (e.g. effect of intervention vs control over time + environmental covariates), or also for a cross-sectional dataset (differences in the microbiota between health and disease + environmental covariates)? I don’t know if I was able to explain my doubt correctly. Many thanks

Jaydee · December 22, 2023, 5:46pm

This is a great solution!
If I extend this solution to multiple group comparisons within the variable MAIN, should I filter out each comparison and recalculate the q-values separately?
[e.g. if the main variable has 3 groups A, B, C and the reference is A, the comparisons are B vs A and C vs A. Should I filter B vs A from the results and recalculate the q-values, then do the same for C vs A?]
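
To make the question concrete, here is a small sketch of the two options on a toy table. It assumes the value column of the results table holds the non-reference level of each comparison (B or C); the p-values are invented. It only shows what each option computes and is not a recommendation for either one.

```r
# Toy stand-in for the MAIN rows of a MaAsLin 2 results table; values are invented.
results <- data.frame(
  feature  = rep(c("taxon_1", "taxon_2"), times = 2),
  metadata = "MAIN",
  value    = rep(c("B", "C"), each = 2), # assumed: level compared against reference A
  pval     = c(0.01, 0.20, 0.03, 0.04),
  stringsAsFactors = FALSE
)

main <- results[results$metadata == "MAIN", ]

# Option 1: one BH correction over all MAIN comparisons (B vs A and C vs A pooled).
main$qval_pooled <- p.adjust(main$pval, method = "BH")

# Option 2: a separate BH correction within each comparison.
main$qval_per_contrast <- ave(
  main$pval, main$value,
  FUN = function(p) p.adjust(p, method = "BH")
)

print(main)
```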

Thank you

hzk184987245 · August 17, 2024, 7:41am

Hello @himel.mallick, MaAsLin2 is a great tool, but I’ve encountered some issues:

I have several covariates and a continuous variable A, and I want to find the relationship between variable A and microbiome features using a model like Taxa ~ LM (variable A + covariate1 + covariate2 + covariate3…). After obtaining all the results and filtering them based on a maximum significance threshold of q ≤ 0.25, I found 136 features significantly associated with variable A (all_results.tsv). Following some advice, I subset the final MaAsLin2 results table to focus on the main effects of variable A and re-computed the q-values to detect significant microbiome features. I found that only 20 features met the q ≤ 0.25 threshold after this re-computation. Here’s my code:
maas.result <- Maaslin2(
  input_data = taxonomy,
  input_metadata = sample_metadata,
  output = 'output',
  min_abundance = 0,
  min_prevalence = 0,
  max_significance = 0.25,
  normalization = 'TSS',
  transform = 'LOG', # AST
  fixed_effects = c("Breed", "Strain", "Age", "Batch", "A"),
  reference = c("Breed,Y"),
  standardize = TRUE,
  plot_heatmap = FALSE,
  plot_scatter = FALSE)

Question 1: Which result should I trust, the 136 features or the 20? If the 136 features are more favorable for my further analysis, can I use them directly without considering the recalculated q-values?
Question 2: Is it necessary to recalculate the q-values? I believe the objective is to correct for the influence of covariates and identify microbiome features biologically related to the variable. If recalculating q-values is an important step, why is it not included in the MaAsLin2 user manual?
Question 3: If I have other variables, should I analyze them together in the same model or construct separate models?
fixed_effects = c("variable A", "variable B", "variable C", "covariate")
or
fixed_effects = c("variable A", "covariate")
fixed_effects = c("variable B", "covariate")
fixed_effects = c("variable C", "covariate")
Thank you very much!
