Hi, I am using MaAsLin3 version 1.2.0 with R version 4.5.2.
I have two feature tables with identical samples and metadata. All parameters (default normalisation and transformation, formula, etc.) are identical too. I have also checked that the absolute abundance across samples is identical for each feature that is present in both feature tables.
However, the pval_joint for the same feature differs a lot between the two feature tables: one is significant (~0.04) and the other is not (~0.7). From what I understand, the joint p-value is computed for each feature and usually changes significantly when the covariates in the formula change. It should not change much when the number of features changes.
I hope someone can enlighten me on this. Thank you.
Hi,
Could you share the abundance and prevalence rows of the associations in question? It’d be useful to see what the individual p-values are. Also, when you say the absolute abundance is identical do you mean the feature abundance values in the different input tables or something else?
Will
Hi Will,
Yes, two input tables.
These are the results from input table 1:
And, input table 2:
Ah, what’s different about those is the null hypothesis. In the abundance models by default, to account for compositionality, each coefficient is compared against the median coefficient for that metadatum over the features. If you have a bunch of additional features in one dataset, that could move the median and therefore the null hypothesis. In this case, the null hypothesis gets much closer to your observed coefficient, so the p-value is a lot higher.
It’s worth noting that if all you care about is relative abundance associations, you can just turn off the median comparison and these will come out to have about the same p-value. However, if you’re trying to infer absolute abundance associations, the inclusion of extra features does change your results in this case.
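To make the effect of the median comparison concrete, here is a minimal Python sketch with made-up coefficients. The feature names, the numbers, and the helper `median_null` are all hypothetical, and this is not MaAsLin's actual code. It only illustrates that the same coefficient can sit far from the median in one table and close to it in another.

```python
import statistics

def median_null(coefs):
    """Median coefficient across features for one metadata variable (hypothetical helper)."""
    return statistics.median(coefs)

# Hypothetical abundance coefficients for one metadata variable in table 1
table1 = {"Feature1": 1.2, "Feature2": 0.1, "Feature3": -0.4}

# Table 2: the same features plus hypothetical extra features,
# most of which have coefficients close to Feature1's
table2 = {**table1, "Extra1": 1.1, "Extra2": 1.3, "Extra3": 1.5, "Extra4": 1.0}

null1 = median_null(table1.values())  # median over 3 features
null2 = median_null(table2.values())  # median over 7 features

# Distance between Feature1's coefficient and the null it is tested against
offset1 = table1["Feature1"] - null1
offset2 = table2["Feature1"] - null2

print(f"table 1: null = {null1}, Feature1 offset = {offset1:.2f}")
print(f"table 2: null = {null2}, Feature1 offset = {offset2:.2f}")
```

In this toy case the null moves from 0.1 to 1.1, so Feature1's coefficient ends up much closer to the null in table 2, which is the situation where a p-value would be expected to rise.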
Thanks for the insight, Will. I re-ran with median_comparison_abundance = FALSE and some features now have similar, or even identical, p-values. Just curious: what factors may cause slight differences in p-values between input tables? Below are the results for tables 1 and 2 respectively:
Feature 3 has an identical p-value, while Features 1 and 2 differ from the third decimal place onwards.
It looks like your abundance coefficient is changing, so presumably your abundance values are changing between the 2 models. The joint p-value is based on whichever of the abundance or prevalence p-values is lower, so if the logistic model doesn’t change (because presence/absence doesn’t change between the datasets) but the abundance model does (e.g. if your total sum scaling is affected by additional features), the joint p-value will change for the cases where the abundance model had the lower p-value.
Assuming that "abundance values" means the abundances in the input tables, the values are identical for all three features across the two input tables.
For Features 1 and 2, the coef, stderr and pval_individual are NA for the prevalence model. However, Feature 3 has values for these. Could this be the reason why there are small differences in the joint p-value across input tables, although the abundance values are identical and the median comparison for abundance is disabled? Or is it due to something else?
Apologies for more questions. I am trying to understand how it works in the background.
Are the abundance tables identical for all features in all samples though? Even if the values are identical for those 3 features, if one table has extra features, that will increase the denominator in total sum scaling ([particular feature abundance] / [total feature abundance]), which would affect the relative abundance that MaAsLin uses.
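As a quick illustration of the denominator effect, here is a minimal Python sketch of total sum scaling on a single sample. The counts, feature names, and the helper `total_sum_scale` are made up for the example and are not taken from MaAsLin.

```python
def total_sum_scale(sample):
    """Divide each feature's abundance by the sample's total abundance."""
    total = sum(sample.values())
    return {feature: value / total for feature, value in sample.items()}

# One hypothetical sample as it appears in table 1
sample_table1 = {"Feature1": 50.0, "Feature2": 30.0, "Feature3": 20.0}

# The same sample in table 2: identical values for the shared features,
# plus one hypothetical extra feature
sample_table2 = {**sample_table1, "Extra1": 100.0}

rel1 = total_sum_scale(sample_table1)
rel2 = total_sum_scale(sample_table2)

print(rel1["Feature1"])  # 50 / 100 = 0.5
print(rel2["Feature1"])  # 50 / 200 = 0.25
```

The raw value of Feature1 is 50 in both tables, but its relative abundance halves once the extra feature is included. If the extra features contribute a different share in each sample, the scaled values change by a different factor per sample, which is enough to shift a fitted coefficient slightly.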
When the prevalence model has NA (in your case, all the features were always present, so there’s no presence/absence to evaluate), the joint p-value is the same as the individual p-value of the non-NA model. (You can see this for Features 1 and 2 in the table.) So this shouldn’t be what’s causing the difference.
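The combination rule can be sketched in a few lines of Python. The NA handling follows what is described above. The Beta(1, 2) step for the case where both p-values are available is my reading of the MaAsLin 3 documentation, so treat it as an assumption, and `joint_p_value` is a hypothetical name rather than a MaAsLin function.

```python
import math

def joint_p_value(p_abundance, p_prevalence):
    """Combine abundance and prevalence p-values (hypothetical sketch).

    None or NaN stands in for an NA p-value.
    """
    available = [
        p for p in (p_abundance, p_prevalence)
        if p is not None and not math.isnan(p)
    ]
    if not available:
        return float("nan")
    if len(available) == 1:
        # One model is NA: the joint p-value is the other model's p-value
        return available[0]
    p_min = min(available)
    # Assumption: the minimum is passed through the Beta(1, 2) CDF,
    # which has the closed form 1 - (1 - p)^2
    return 1.0 - (1.0 - p_min) ** 2

print(joint_p_value(0.04, None))  # prevalence model is NA
print(joint_p_value(0.04, 0.5))   # both models available
```

Under this sketch, a feature that is present in every sample keeps its abundance p-value as the joint p-value, so any change in its joint p-value has to come from the abundance model.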