First, Maaslin2 is awesome. Thank you for this wonderful tool!
I am running an analysis with one predictor of interest and 3 categorical covariates. I’ve run MaAsLin2 (lm analysis method, min prevalence and abundance thresholds to filter out rare features), and the output shows pairwise comparisons for the covariates as well as for the predictor of interest. Is there a way to make MaAsLin2 perform comparisons only for the predictor of interest and not for the covariates (to reduce the number of comparisons performed), while still including the covariates in the input so that they can be controlled for?
Thanks a lot for your nice explanation in the other thread.
Regarding q-values and multiple comparisons for my covariates (sequencing_depth, treatment) and my variable of interest (phenotype), I also followed the steps outlined above.
I am wondering about the results: in the original output files generated by MaAsLin2, a total of 88 associations of microbes with the phenotype are found, of which 78 are significant (q-value threshold = 0.25, which is the MaAsLin2 default). I am not a statistician, but I would guess this is quite a high number of significant associations?
After recalculating the q-values for my variable of interest only, and again using a q-value threshold of 0.25, 81 associations would be significant.
Is there any rule of thumb, where one could estimate appropriate q-value thresholds?
Maybe to explain my case I am working on: Skin microbiome, 65 samples, total of 490 different species, I used min_prevalence = 0.25, min_abundance = 0.0001
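For readers following along, the recalculation described above can be sketched in R roughly as follows. This is only an illustration on a made-up table, not MaAsLin2's own code: the column names (feature, metadata, pval) mirror those in all_results.tsv, the p-values are invented, and Benjamini-Hochberg (the MaAsLin2 default correction) is assumed.

```r
# Toy stand-in for all_results.tsv; in practice one would read the real file,
# e.g. read.delim("output/all_results.tsv"). All numbers are invented.
results <- data.frame(
  feature  = rep(c("sp1", "sp2", "sp3", "sp4"), times = 2),
  metadata = rep(c("phenotype", "sequencing_depth"), each = 4),
  pval     = c(0.001, 0.02, 0.04, 0.60, 0.0005, 0.0008, 0.002, 0.003),
  stringsAsFactors = FALSE
)

# q-values over every test in the table (predictor and covariate pooled)
results$qval_all <- p.adjust(results$pval, method = "BH")

# q-values over the predictor of interest only
main <- results[results$metadata == "phenotype", ]
main$qval_main <- p.adjust(main$pval, method = "BH")

print(main[, c("feature", "pval", "qval_all", "qval_main")])
```

The count of associations passing a threshold can move in either direction after subsetting, because BH q-values depend on both the number of tests and the distribution of p-values in the set being corrected.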
Hi @plicht - an appropriate q-value threshold is highly context-dependent, and I’m not aware of any way to estimate an optimal threshold on a per-dataset basis. The usual recommendation is to always use a combination of effect size estimate (i.e., strong vs. weak association), data distribution, and domain knowledge before calling a 'statistically significant' association 'biologically relevant'.
As @Kelsey_Thompson mentioned elsewhere in a different context, this can be done, for example, by checking the box or scatter plots to make sure your results appear to be true, unaffected by a few potential outliers. You can also sort your results by effect size instead of statistical significance or a combination thereof and pinpoint the most relevant results based on a meaningful effect size threshold possibly based on prior domain knowledge.
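As a rough sketch of the effect-size ranking suggested above, the snippet below filters a results table on both q-value and absolute coefficient and then sorts by effect size. The table is invented, and the coefficient cutoff of 0.5 is a purely hypothetical value; a meaningful cutoff would have to come from domain knowledge.

```r
# Invented results for one predictor; coef plays the role of the effect size
res <- data.frame(
  feature = c("sp1", "sp2", "sp3", "sp4", "sp5"),
  coef    = c(0.15, -2.10, 0.90, -0.05, 1.40),
  qval    = c(0.01, 0.03, 0.20, 0.02, 0.30),
  stringsAsFactors = FALSE
)

q_cutoff    <- 0.25  # MaAsLin2 default max_significance
coef_cutoff <- 0.5   # hypothetical; pick from prior domain knowledge

# Keep associations passing both filters, then sort by absolute effect size
keep   <- res$qval <= q_cutoff & abs(res$coef) >= coef_cutoff
ranked <- res[keep, ]
ranked <- ranked[order(-abs(ranked$coef)), ]
print(ranked)
```

The box or scatter plots for the features that survive both filters are then the natural ones to inspect for outliers.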
Hi @himel.mallick, I was going through this topic and the solution that you have suggested.
I am not sure I understood it correctly. Should this approach of subsetting the results and recalculating the q-values be used only for a longitudinal dataset (e.g. effect of intervention vs. control over time + environmental covariates), or also for a cross-sectional dataset (differences in the microbiota between health and disease + environmental covariates)? I hope I have explained my doubt clearly. Many thanks
This is a great solution!
If I extend this solution to multiple group comparisons in the variable MAIN, should I filter out each comparison and recalculate the q-values like this?
[E.g. the main variable has 3 groups A, B, C, and the reference is A, so the comparisons are B vs A and C vs A. Should I filter B vs A from the results and recalculate the q-values, and then do the same for C vs A?]
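To make the two options in this question concrete, here is a small sketch on an invented table. It assumes the `value` column holds the non-reference level of each contrast (B or C), as in all_results.tsv, and it uses Benjamini-Hochberg throughout. It only shows how each variant would be computed; which one is appropriate is a judgment call that the thread does not settle.

```r
# Invented results for a 3-level variable MAIN with reference level A
res <- data.frame(
  feature  = rep(c("sp1", "sp2", "sp3"), times = 2),
  metadata = "MAIN",
  value    = rep(c("B", "C"), each = 3),   # B vs A, C vs A
  pval     = c(0.01, 0.04, 0.30, 0.002, 0.20, 0.50),
  stringsAsFactors = FALSE
)

# Option 1: one correction across all MAIN contrasts together
res$qval_joint <- p.adjust(res$pval, method = "BH")

# Option 2: a separate correction within each contrast
res$qval_per_contrast <- ave(
  res$pval, res$value,
  FUN = function(p) p.adjust(p, method = "BH")
)

print(res)
```

Correcting within each contrast adjusts over fewer tests at a time, so it is generally the less conservative of the two.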
Hello @himel.mallick, MaAsLin2 is a great tool, but I’ve encountered some issues:
I have several covariates and a continuous variable A, and I want to find the relationship between variable A and microbiome features using a model like Taxa ~ LM (variable A + covariate1 + covariate2 + covariate3…). After obtaining all the results and filtering them based on a maximum significance threshold of q ≤ 0.25, I found 136 features significantly associated with variable A (all_results.tsv). Following some advice, I subset the final MaAsLin2 results table to focus on the main effects of variable A and re-computed the q-values to detect significant microbiome features. I found that only 20 features met the q ≤ 0.25 threshold after this re-computation. Here’s my code:
maas.result = Maaslin2(
  input_data = taxonomy,
  input_metadata = sample_metadata,
  output = 'output',
  min_abundance = 0,
  min_prevalence = 0,
  max_significance = 0.25,
  normalization = 'TSS',
  transform = 'LOG',  # AST
  fixed_effects = c("Breed", "Strain", "Age", "Batch", "A"),
  reference = c("Breed,Y"),
  standardize = TRUE,
  plot_heatmap = FALSE,
  plot_scatter = FALSE)
Question 1: Which result should I trust, the 136 or the 20 features? If the 136 features are more favorable for my further analysis, can I use them directly without considering the recalculated q-values?
Question 2: Is it necessary to recalculate the q-values? I believe the aim is to correct for the influence of covariates and identify microbiome features biologically related to the variable. If recalculating q-values is an important step, why is it not included in the MaAsLin2 user manual?
Question 3: If I have other variables, should I analyze them together in the same model or construct separate models?
fixed_effects = c("variable A", "variable B", "variable C", "covariate")
or
fixed_effects = c("variable A", "covariate")
fixed_effects = c("variable B", "covariate")
fixed_effects = c("variable C", "covariate")
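The difference between the two setups in Question 3 can be illustrated with base R's `lm` on a single simulated feature. This is a rough stand-in rather than MaAsLin2 itself, and it assumes the LM method behaves like an ordinary per-feature linear model. The data, effect sizes, and variable names are all invented.

```r
set.seed(1)
n <- 60
meta <- data.frame(
  A         = rnorm(n),
  covariate = rnorm(n)
)
# Simulated variable B is correlated with A
meta$B <- 0.7 * meta$A + rnorm(n, sd = 0.5)
# Simulated feature abundance driven by B and the covariate, not by A directly
feature <- 1.0 * meta$B + 0.3 * meta$covariate + rnorm(n)

# One joint model: each coefficient is adjusted for the other variables
joint <- lm(feature ~ A + B + covariate, data = meta)

# A separate model for A: B is left out
separate <- lm(feature ~ A + covariate, data = meta)

print(coef(summary(joint))["A", ])
print(coef(summary(separate))["A", ])
```

When the variables are correlated, the coefficient for A in the separate model may absorb part of B's signal, whereas the joint model estimates each effect conditional on the others. The two setups therefore answer somewhat different questions.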
Thank you very much!