Hi,
I’m working on a longitudinal experiment with two time points (baseline and end of intervention). I’m interested in evaluating the associations between the changes (deltas) in taxa and the changes in biomarkers/metabolites.
A straightforward—though perhaps not the most efficient—approach would be to calculate variables representing these changes (e.g., log change in taxa, relative change in biomarkers), and then assess their associations using traditional methods like Spearman correlation or linear models.
Is there a more efficient or appropriate way to approach this using a MaAsLin* model?
Thanks!
Hi,
All the MaAsLin models answer exactly that question with linear models. They’ll tell you: for each one-unit change in your variable of interest (the covariate), how many doublings of the taxon’s relative abundance are there. The wiki is a good place to get started.
Will
Hi Will,
Thanks for your reply! I understand that MaAsLin uses linear models to evaluate associations, and I’m familiar with the general framework. However, my question is a bit more specific.
I’m not trying to model associations at a single time point, or even to model repeated measures by including both time points in the input. I’m specifically interested in associating the change in taxa (e.g., log fold-change between baseline and end) with the change in biomarkers/metabolites over the same period—essentially, correlating delta with delta.
While including both time points in the model might adjust for subject-level effects, it doesn’t directly test whether the within-subject change in specific taxa is associated with the within-subject change in biomarkers.
So I’m wondering whether there’s a recommended way to structure the input to MaAsLin2—or an alternative approach within its framework—that specifically targets change vs. change associations.
Appreciate your help!
Hi,
I might be answering the wrong question, but linear models already correlate the delta with the delta: they answer, for a 1 unit increase (delta) in X, how much does Y change by (delta)? If your X is metabolites and your Y is taxon, this will tell you how much a delta in your metabolites is associated with a delta in your taxon.
However, maybe you’re specifically looking for something that handles paired samples by regressing the difference in taxa on the difference in covariates (akin to a one-sample paired t-test rather than a two-sample t-test in the two-group comparison case). MaAsLin could be made to work for a low-dimensional case by computing the difference in your taxa (outcome) and the difference in your metabolites (covariate) and running a model with normalization="NONE", transform="NONE". However, if you need to correlate many taxa with many metabolites, you might want to look into something like HAllA with precomputed differences.
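To make the paired idea above concrete, here is a minimal Python sketch of a delta-on-delta regression for a single taxon and a single metabolite. It is not MaAsLin code: it only illustrates what the model estimates once the differences are precomputed and no further normalization or transform is applied. The subject values and the helper name `ols_slope_intercept` are hypothetical, made up for illustration.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical per-subject values (illustrative numbers, not real data).
# Each position is one subject, measured at baseline and at the end.
metab_baseline = [1.0, 2.0, 3.0, 4.0]
metab_end = [2.0, 2.5, 5.0, 4.5]
taxon_log_baseline = [-3.0, -2.0, -4.0, -1.0]  # log relative abundance
taxon_log_end = [-2.0, -1.5, -2.0, -0.5]

# Within-subject differences (end minus baseline).
delta_metab = [e - b for b, e in zip(metab_baseline, metab_end)]
delta_taxon = [e - b for b, e in zip(taxon_log_baseline, taxon_log_end)]

# Regress the taxon delta (outcome) on the metabolite delta (covariate).
slope, intercept = ols_slope_intercept(delta_metab, delta_taxon)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The slope is the change in the taxon delta associated with a one-unit change in the metabolite delta. In an RCT, the treatment arm would typically be added as a further covariate, which this sketch leaves out.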
Will
Thanks, Will—your second point is exactly what I was referring to.
This is an RCT setup, and I’m specifically interested in modeling change from baseline, i.e., regressing the within-subject difference in taxa on the difference in metabolites (delta vs. delta).
Your suggestion to manually compute the deltas and run MaAsLin2 with normalization="NONE" and transform="NONE" makes sense.
As for HAllA, I’m familiar with it and considered it, but couldn’t find a clear benchmark or documented use case for applying it in an RCT setting focused on within-subject changes. If you know of any examples, I’d be happy to check them out.
One thing I’m still unsure about:
Is there a best practice for how to compute the change in taxa?
Would a log-ratio between time points be more appropriate than an absolute difference, especially considering the compositional nature of the data?
Would appreciate any thoughts or references on that.
I’ll ask some people in our group who might know about a benchmark of HAllA on within-subject changes, but I don’t know of any off the top of my head.
For computing the change in taxa, I’d log transform the relative abundances and take the difference on the log scale. Some people get super bent out of shape about relative vs. absolute abundances, but if you’re just clear that your numbers are changes in log relative abundance, that should be fine. If you really need absolute abundances, you might try using spike-ins or qPCR of marker genes in the future.
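As a rough illustration of the log-scale difference described above, the sketch below computes the per-taxon change in log2 relative abundance for one subject. The function name, the choice of log base 2, and the `pseudocount` parameter are assumptions for illustration: the thread does not say how zeros should be handled, and a pseudocount is just one common workaround for log(0).

```python
import math


def log2_rel_abundance_change(counts_baseline, counts_end, pseudocount=0.5):
    """Per-taxon change in log2 relative abundance (end minus baseline).

    `pseudocount` is an assumption added to every count so that zero
    counts do not produce log(0); set it to 0.0 if no counts are zero.
    """
    def log2_rel(counts):
        adjusted = [c + pseudocount for c in counts]
        total = sum(adjusted)
        return [math.log2(a / total) for a in adjusted]

    base = log2_rel(counts_baseline)
    end = log2_rel(counts_end)
    return [e - b for b, e in zip(base, end)]


# Hypothetical counts for three taxa in one subject (not real data).
baseline = [10, 30, 60]
end = [20, 30, 50]

# No zeros here, so the pseudocount can be switched off.
changes = log2_rel_abundance_change(baseline, end, pseudocount=0.0)
print(changes)
```

A value of 1 means the taxon's relative abundance doubled between time points, and -1 means it halved. Because these are relative abundances, a change in one taxon shifts the others, so the values should be reported as changes in log relative abundance and not as absolute changes.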